We have users in several countries, some of whom expect usernames to support characters in their language (for example é).
Modern server-side databases can easily store such characters, but the Basic Auth implementation that Collect currently uses is technically limited to ASCII. Trying to use non-ASCII UTF-8 characters in either the username or password (under Collect's General Settings > Server) results in undefined behavior on some servers and errors on others.
For example, when authenticating against the Rails gem Authlogic, Collect's OkHttp sends data encoded as Latin-1, resulting in the server misinterpreting café if the expected headers are missing.
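A minimal Java sketch of how the misinterpretation plays out, assuming the client sends UTF-8 bytes and the server decodes them as Latin-1:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        // "é" is two bytes in UTF-8 (0xC3 0xA9)...
        byte[] utf8Bytes = "café".getBytes(StandardCharsets.UTF_8);

        // ...so decoding those bytes as Latin-1 turns one character into two
        System.out.println(new String(utf8Bytes, StandardCharsets.ISO_8859_1)); // cafÃ©
    }
}
```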
Collect could implement RFC 7617 for Basic HTTP auth -- see the nice explanation here.
TL;DR: Sending credentials as UTF-8 under the right circumstances would let servers interpret non-ASCII characters correctly.
This seems like a simple change, and we'd be willing to help contribute if it's feasible.
This seems like a place to start digging; I'm not yet sure where exactly we'd need to make the change:
```java
assertThat(request.getHeader("Authorization"), equalTo("Basic dXNlcjpwYXNz"));
```
Naturally, I find this request reasonable and I appreciate you using é in your example.
I'm a little bit confused about the directionality here. My understanding is that the server should indicate it can receive UTF-8. Is Collect sending UTF-8 no matter what?
It seems like the right thing to do is for Collect to send the username/password as ASCII unless it sees the
charset="UTF-8" authentication parameter, in which case it should send them as UTF-8. Does that match your understanding? If so, this wouldn't affect servers that don't send the parameter, so I don't think the behavior is controversial.
In terms of implementation, coincidentally the library we use for basic auth just added support for UTF-8 credentials: https://github.com/rburgst/okhttp-digest/issues/66#issuecomment-636297069. However, we set up the authenticators preemptively, before receiving an auth challenge. Two options I can think of are:
- if preemptive auth fails and the parameter is set in the challenge, try again with UTF-8 encoding (and probably save this property of the server to avoid an extra request each time)
- look for non-ASCII characters in the username and password and if they're found, send as UTF-8 encoded and hope the server is happy with that
I'm not totally thrilled with either option. Maybe these quick thoughts give you additional ideas.
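For what it's worth, OkHttp itself already parses the charset parameter out of challenges, so the reactive half of option 1 could look roughly like this. This is a sketch, not Collect's actual authenticator setup, and the class name is made up:

```java
import java.io.IOException;
import okhttp3.Authenticator;
import okhttp3.Challenge;
import okhttp3.Credentials;
import okhttp3.Request;
import okhttp3.Response;
import okhttp3.Route;

class CharsetAwareBasicAuthenticator implements Authenticator {
    private final String username;
    private final String password;

    CharsetAwareBasicAuthenticator(String username, String password) {
        this.username = username;
        this.password = password;
    }

    @Override
    public Request authenticate(Route route, Response response) throws IOException {
        for (Challenge challenge : response.challenges()) {
            if (!challenge.scheme().equalsIgnoreCase("Basic")) continue;

            // OkHttp exposes the challenge's charset: UTF-8 when the server sent
            // charset="UTF-8", ISO-8859-1 otherwise
            String credential = Credentials.basic(username, password, challenge.charset());

            // Don't loop forever on bad credentials: only retry if this header
            // differs from the one we already sent (e.g. the preemptive attempt)
            if (credential.equals(response.request().header("Authorization"))) {
                return null;
            }
            return response.request().newBuilder()
                    .header("Authorization", credential)
                    .build();
        }
        return null; // no Basic challenge we can answer
    }
}
```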
You're right, I got that part backwards. I now doubt that I captured Authlogic's role properly. Perhaps café was being changed by OkHttp, NOT Authlogic (I can't confirm right now). UPDATE: Actually, nothing is being "changed"; the character encoding is just understood as Latin-1 instead of UTF-8, so the é is misinterpreted. The workaround here is a decent short-term fix: re-encode the string on the server.
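For anyone following along, the re-encoding trick boils down to reinterpreting the mis-decoded string's bytes; in Java terms (the server-side fix would be the equivalent in its own language):

```java
// What a Latin-1-decoding server ends up with for UTF-8 input
String garbled = "cafÃ©";

// Reinterpret the same bytes as UTF-8 to recover the original text
String fixed = new String(
        garbled.getBytes(java.nio.charset.StandardCharsets.ISO_8859_1),
        java.nio.charset.StandardCharsets.UTF_8); // "café"
```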
Anyway, great to hear there's library support! Good timing.
Option 2 sounds in line with what the StackOverflow post suggests:
> As of 2018, modern browsers will usually default to UTF-8 if a user enters non-ASCII characters for username or password (even if the server does not use the charset parameter).
Easy enough to implement and test, and probably wouldn't break anything. Option 1 sounds a bit fragile just because of the extra steps involved.
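A rough sketch of option 2, keeping preemptive auth and just picking the charset up front (the helper is hypothetical, not an existing Collect class):

```java
import java.nio.charset.StandardCharsets;
import okhttp3.Credentials;

class BasicAuthHeaders {
    // ASCII credentials keep today's behavior (OkHttp's default encoding is
    // ISO-8859-1); anything else is sent as UTF-8 and we hope the server copes.
    static String build(String username, String password) {
        boolean isAscii = StandardCharsets.US_ASCII.newEncoder()
                .canEncode(username + ":" + password);
        return isAscii
                ? Credentials.basic(username, password)
                : Credentials.basic(username, password, StandardCharsets.UTF_8);
    }
}
```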
Is this something you'd accept as a PR, or does it need more thought? (I don't immediately have time to submit, but could get to it later.)
Since there's a widely-accepted approach, I think this is ready to implement whenever you're ready!
CC @Grzesiek2010 @seadowg just in case they see something else that might need discussion.
This would be great to go ahead with.
@cooperka that code reference looks like where I'd start. It'd probably make sense to add a second test there for basic auth with special characters and then make it pass.
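Something like this for the second test, purely illustrative: dXNlcjpjYWbDqQ== is Base64 of the UTF-8 bytes of user:café, mirroring the existing user:pass assertion (surrounding test setup elided):

```java
assertThat(request.getHeader("Authorization"), equalTo("Basic dXNlcjpjYWbDqQ=="));
```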