Support for special characters in usernames (Basic Auth)

Motivation

We have users in several countries, some of whom expect usernames to support characters in their language (for example é).

Modern server-side databases can easily store such characters, but the Basic Auth implementation that Collect currently uses is technically limited to ASCII characters only.

Current behavior

Trying to use non-ASCII UTF-8 characters in either the username or the password (under Collect's General Settings > Server) results in undefined behavior on some servers and errors on others.

For example, Collect's OkHttp (rather than the Rails gem Authlogic, as first written) sends data encoded as Latin-1, resulting in the server interpreting café as caf� if the expected headers are missing.
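
To illustrate the mismatch, here is a minimal sketch, assuming the client encodes the text as Latin-1 and the server decodes the same bytes as UTF-8. The class and method names are hypothetical and exist only for this example:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Collect, OkHttp, or Authlogic.
class EncodingMismatchDemo {

    /** Encodes text as Latin-1 bytes, then decodes those bytes as if they were UTF-8. */
    static String latin1BytesReadAsUtf8(String text) {
        byte[] sent = text.getBytes(StandardCharsets.ISO_8859_1);
        // In Latin-1, é is the single byte 0xE9, which is not a complete
        // UTF-8 sequence, so the decoder substitutes U+FFFD for it.
        return new String(sent, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00e9 is é; the escape keeps this file independent of source encoding.
        System.out.println(latin1BytesReadAsUtf8("caf\u00e9"));
    }
}
```

ASCII-only text survives this round trip unchanged, which is why the problem only shows up with characters like é.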

Possible solution

Collect could implement RFC 7617 for Basic HTTP auth -- see nice explanation here.

TL;DR: Sending credentials as UTF-8 in certain circumstances could allow servers to interpret non-ASCII characters correctly.

Seems a simple change, and we'd be willing to help contribute if this seems feasible.

Code reference

This seems like a place to start digging; I'm not yet sure where exactly we'd need to make the change:

# src/test/java/org/odk/collect/android/openrosa/OpenRosaServerClientProviderTest.java
assertThat(request.getHeader("Authorization"), equalTo("Basic dXNlcjpwYXNz"));

Naturally, I find this request reasonable and I appreciate you using é in your example. :smiley:

I'm a little bit confused about the directionality here. My understanding is that the server should indicate it can receive UTF-8. Is Collect sending UTF-8 no matter what?

It seems like the right thing to do is for Collect to send the username/password as ASCII unless it sees the charset="UTF-8" authentication parameter, in which case it should send them as UTF-8. Does that match your understanding? If so, this wouldn't affect servers that don't send the parameter, so I don't think the behavior is controversial.
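
A rough sketch of that rule, using only the Java standard library. The class and method names are hypothetical, and the fallback here is Latin-1 rather than strict ASCII (an assumption; the two produce identical bytes for ASCII-only credentials):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Pattern;

// Hypothetical helper; not Collect's or okhttp-digest's actual API.
class ChallengeAwareBasicAuth {

    // Looks for a charset=UTF-8 parameter, compared case-insensitively,
    // with or without surrounding quotes.
    private static final Pattern UTF8_PARAM =
            Pattern.compile("charset\\s*=\\s*\"?UTF-8\"?", Pattern.CASE_INSENSITIVE);

    /** Picks the credential charset from a WWW-Authenticate header value. */
    static Charset charsetFor(String wwwAuthenticate) {
        if (wwwAuthenticate != null && UTF8_PARAM.matcher(wwwAuthenticate).find()) {
            return StandardCharsets.UTF_8;
        }
        // Assumed fallback for servers that do not advertise UTF-8.
        return StandardCharsets.ISO_8859_1;
    }

    /** Builds the Authorization header value for the chosen charset. */
    static String authorizationHeader(String username, String password, Charset charset) {
        byte[] raw = (username + ":" + password).getBytes(charset);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        Charset charset = charsetFor("Basic realm=\"example\", charset=\"UTF-8\"");
        System.out.println(authorizationHeader("caf\u00e9", "pass", charset));
    }
}
```

Servers that never send the parameter would keep getting the same bytes as before, which is the non-controversial part.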

In terms of implementation, the library we use for basic auth coincidentally just added support for UTF-8 credentials: https://github.com/rburgst/okhttp-digest/issues/66#issuecomment-636297069. However, we set up the authenticators preemptively, before receiving an auth challenge. Two options I can think of are:

  1. if preemptive auth fails and the parameter is set in the challenge, try again with UTF-8 encoding (and probably save this property of the server to avoid an extra request each time)
  2. look for non-ASCII characters in the username and password and if they're found, send as UTF-8 encoded and hope the server is happy with that
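
For what it's worth, option 2 could look roughly like the sketch below. It uses only the Java standard library rather than the real OkHttp or okhttp-digest calls, and the class and method names are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; not part of Collect, OkHttp, or okhttp-digest.
class NonAsciiAwareBasicAuth {

    /** True when every character is in the 7-bit ASCII range. */
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    /** Uses UTF-8 only when the credentials contain non-ASCII characters. */
    static Charset pickCharset(String username, String password) {
        return isAscii(username) && isAscii(password)
                ? StandardCharsets.ISO_8859_1
                : StandardCharsets.UTF_8;
    }

    /** Builds the preemptive Authorization header value. */
    static String authorizationHeader(String username, String password) {
        Charset charset = pickCharset(username, password);
        byte[] raw = (username + ":" + password).getBytes(charset);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        System.out.println(authorizationHeader("user", "pass"));
        // \u00e9 is é; the escape keeps this file independent of source encoding.
        System.out.println(authorizationHeader("caf\u00e9", "pass"));
    }
}
```

ASCII-only credentials produce exactly the same header as before, so only users with non-ASCII credentials would see any change.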

Neither makes me feel totally thrilled. Maybe some of those quick thoughts give you additional ideas.


You're right, I got that part backwards. I now doubt that I captured Authlogic's role properly. Perhaps caf� was changed by OkHttp, NOT Authlogic (but I can't confirm right now).

UPDATE: Actually nothing is being "changed"; the character encoding is just understood as Latin-1 instead of UTF-8, so the é is misinterpreted. Workaround here is a decent short-term fix: re-encode the string on the server.

Anyway, great to hear there's library support! Good timing :clap:

Option 2 sounds in line with what the StackOverflow post suggests:

As of 2018, modern browsers will usually default to UTF-8 if a user enters non-ASCII characters for username or password (even if the server does not use the charset parameter).

Easy enough to implement and test, and probably wouldn't break anything. Option 1 sounds a bit fragile just because of the extra steps involved.

Is this something you'd accept as a PR, or needs more thought? (I don't immediately have time to submit but could get to it later.)


Since there's a widely-accepted approach, I think this is ready to implement whenever you're ready!

CC @Grzesiek2010 @seadowg just in case they see something else that might need discussion.

This would be great to go ahead with.

@cooperka that code reference looks like where I'd start. It'd probably make sense to add a second test there for basic auth with special characters and then make it pass in OpenRosaServerClientProvider.