Support for special characters in usernames (Basic Auth)


We have users in several countries, some of whom expect usernames to support characters in their language (for example é).

Modern server-side databases can easily store such characters, but the Basic Auth implementation that Collect currently uses is technically limited to ASCII characters only.

Current behavior

Trying to use non-ASCII UTF-8 characters in either the username or the password (under Collect's General Settings > Server) results in undefined behavior on some servers and errors on others.

For example, Collect's OkHttp (I originally attributed this to the Rails gem Authlogic) sends data encoded as Latin-1, resulting in the server interpreting café as caf� if the expected headers are missing.

Possible solution

Collect could implement RFC 7617 for Basic HTTP auth -- see nice explanation here.

TL;DR: Sending data as UTF-8 in certain circumstances could allow servers to process non-ASCII characters properly.
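
To make the mismatch concrete, here is a minimal sketch using only the JDK (not Collect's actual code) of how the same credentials produce different header values depending on the charset. The class and method names are made up for illustration, and the last step assumes a server that decodes the credentials as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, for illustration only.
class BasicAuthEncoding {
    // Builds a Basic Authorization header value, encoding "user:pass" with the given charset.
    static String header(String username, String password, Charset charset) {
        byte[] raw = (username + ":" + password).getBytes(charset);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    // Decodes the credentials the way a server assuming `charset` would.
    static String decode(String header, Charset charset) {
        byte[] raw = Base64.getDecoder().decode(header.substring("Basic ".length()));
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is café; the escape avoids source-file encoding issues.
        String latin1 = header("caf\u00e9", "pass", StandardCharsets.ISO_8859_1);
        String utf8 = header("caf\u00e9", "pass", StandardCharsets.UTF_8);
        System.out.println(latin1); // Basic Y2Fm6TpwYXNz
        System.out.println(utf8);   // Basic Y2Fmw6k6cGFzcw==
        // A UTF-8 server reading the Latin-1 header sees U+FFFD instead of é.
        System.out.println(decode(latin1, StandardCharsets.UTF_8));
    }
}
```

For pure-ASCII credentials such as user:pass both charsets give the same header, which is why the problem only shows up with characters like é.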

Seems a simple change, and we'd be willing to help contribute if this seems feasible.

Code reference

This seems like a place to start digging; I'm not yet sure where exactly we'd need to make the change:

# src/test/java/org/odk/collect/android/openrosa/
assertThat(request.getHeader("Authorization"), equalTo("Basic dXNlcjpwYXNz"));

Naturally, I find this request reasonable and I appreciate you using é in your example. :smiley:

I'm a little bit confused about the directionality here. My understanding is that the server should indicate it can receive UTF-8. Is Collect sending UTF-8 no matter what?

It seems like the right thing to do is for Collect to send the username/password as ASCII unless it sees the charset="UTF-8" authentication parameter, in which case it should send them as UTF-8. Does that match your understanding? If so, this wouldn't affect servers that don't send the parameter, so I don't think the behavior is controversial.
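
As a rough illustration of that rule, the check on the challenge could look something like the sketch below. The class name, method name and the "ODK" realm are hypothetical, and the regex is a simplification rather than a full WWW-Authenticate parser (it would not cope well with several challenges in one header). RFC 7617 says the charset value is matched case-insensitively, which is why the pattern ignores case. The fallback is ISO-8859-1, which is byte-for-byte identical to ASCII for ASCII-only credentials.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class ChallengeCharset {
    // Matches charset="UTF-8" (quotes optional, any letter case) anywhere in the header value.
    private static final Pattern UTF8_PARAM =
            Pattern.compile("\\bcharset\\s*=\\s*\"?UTF-8\"?", Pattern.CASE_INSENSITIVE);

    // Returns UTF-8 only when the server's challenge advertises it.
    static Charset forChallenge(String wwwAuthenticate) {
        if (wwwAuthenticate != null && UTF8_PARAM.matcher(wwwAuthenticate).find()) {
            return StandardCharsets.UTF_8;
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        System.out.println(forChallenge("Basic realm=\"ODK\", charset=\"UTF-8\""));
        System.out.println(forChallenge("Basic realm=\"ODK\""));
    }
}
```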

In terms of implementation, coincidentally the library we use for basic auth just added support for UTF-8 credentials. However, we set up the authenticators preemptively before receiving an auth challenge. Two options I can think of are:

  1. if preemptive auth fails and the parameter is set in the challenge, try again with UTF-8 encoding (and probably save this property of the server to avoid an extra request each time)
  2. look for non-ASCII characters in the username and password and if they're found, send as UTF-8 encoded and hope the server is happy with that

Neither makes me feel totally thrilled. Maybe some of those quick thoughts give you additional ideas.
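
For what it's worth, option 2 above could be as small as the sketch below. The names are hypothetical and this is not wired into Collect's authenticator setup. It only decides which charset to use, on the assumption that the result is then handed to whatever builds the header (for example OkHttp's Credentials.basic overload that takes a Charset).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class CredentialCharset {
    // True when every UTF-16 code unit is in the 7-bit ASCII range.
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    // Keep the existing encoding for ASCII-only credentials; switch to UTF-8 otherwise.
    static Charset choose(String username, String password) {
        return isAscii(username) && isAscii(password)
                ? StandardCharsets.ISO_8859_1
                : StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(choose("user", "pass"));
        System.out.println(choose("caf\u00e9", "pass"));
    }
}
```

Servers that only ever see ASCII credentials get exactly the same bytes as today, so the change is limited to accounts that currently don't work reliably anyway.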

You're right, I got that part backwards. I now doubt that I captured Authlogic's role properly. Perhaps caf� was changed by OkHttp, NOT Authlogic (but I can't confirm right now).

UPDATE: Actually nothing is being "changed"; the character encoding is just understood as Latin-1 instead of UTF-8, so the é is misinterpreted. Workaround here is a decent short-term fix to re-encode the string on the server.

Anyway, great to hear there's library support! Good timing :clap:

Option 2 sounds in line with what the StackOverflow post suggests:

As of 2018, modern browsers will usually default to UTF-8 if a user enters non-ASCII characters for username or password (even if the server does not use the charset parameter).

Easy enough to implement and test, and probably wouldn't break anything. Option 1 sounds a bit fragile just because of the extra steps involved.

Is this something you'd accept as a PR, or does it need more thought? (I don't immediately have time to submit but could get to it later.)

Since there's a widely-accepted approach, I think this is ready to implement whenever you're ready!

CC @Grzesiek2010 @seadowg just in case they see something else that might need discussion.

This would be great to go ahead with.

@cooperka that code reference looks like where I'd start. It'd probably make sense to add a second test there for basic auth with special characters and then make it pass in OpenRosaServerClientProvider.