We have users in several countries, some of whom expect usernames to support characters in their language (for example é).
Modern server-side databases can easily store such characters, but the Basic Auth implementation that Collect currently uses is technically limited to ASCII. Trying to use non-ASCII UTF-8 characters in either the username or password (under Collect's General Settings > Server) results in undefined behavior on some servers and errors on others.
For example, when authenticating against the Rails gem Authlogic, Collect's OkHttp sends data encoded as Latin-1, resulting in the server misinterpreting café if the expected headers are missing.
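A minimal Java sketch of how the misinterpretation plays out, assuming the client sends UTF-8 bytes and the server decodes them as Latin-1:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        // "é" is two bytes in UTF-8 (0xC3 0xA9)...
        byte[] utf8Bytes = "café".getBytes(StandardCharsets.UTF_8);

        // ...so decoding those bytes as Latin-1 turns one character into two
        System.out.println(new String(utf8Bytes, StandardCharsets.ISO_8859_1)); // cafÃ©
    }
}
```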
Collect could implement RFC 7617 for Basic HTTP auth -- see the nice explanation here.
TL;DR: Sending credentials as UTF-8 under the right circumstances would let servers interpret non-ASCII characters correctly.
This seems like a simple change, and we'd be willing to help contribute if it's feasible.
This seems like a place to start digging; I'm not yet sure where exactly we'd need to make the change:
```java
assertThat(request.getHeader("Authorization"), equalTo("Basic dXNlcjpwYXNz"));
```
Naturally, I find this request reasonable and I appreciate you using é in your example.
I'm a little bit confused about the directionality here. My understanding is that the server should indicate it can receive UTF-8. Is Collect sending UTF-8 no matter what?
It seems like the right thing to do is for Collect to send the username/password as ASCII unless it sees the
charset="UTF-8" authentication parameter, in which case it should send them as UTF-8. Does that match your understanding? If so, this wouldn't affect servers that don't send the parameter, so I don't think the behavior is controversial.
In terms of implementation, coincidentally the library we use for basic auth just added support for UTF-8 credentials: https://github.com/rburgst/okhttp-digest/issues/66#issuecomment-636297069. However, we set up the authenticators preemptively, before receiving an auth challenge. Two options I can think of are:
- if preemptive auth fails and the parameter is set in the challenge, try again with UTF-8 encoding (and probably save this property of the server to avoid an extra request each time)
- look for non-ASCII characters in the username and password and if they're found, send as UTF-8 encoded and hope the server is happy with that
I'm not totally thrilled with either option. Maybe these quick thoughts give you additional ideas.
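For what it's worth, OkHttp itself already parses the charset parameter out of challenges, so the reactive half of option 1 could look roughly like this. This is a sketch, not Collect's actual authenticator setup, and the class name is made up:

```java
import java.io.IOException;
import okhttp3.Authenticator;
import okhttp3.Challenge;
import okhttp3.Credentials;
import okhttp3.Request;
import okhttp3.Response;
import okhttp3.Route;

class CharsetAwareBasicAuthenticator implements Authenticator {
    private final String username;
    private final String password;

    CharsetAwareBasicAuthenticator(String username, String password) {
        this.username = username;
        this.password = password;
    }

    @Override
    public Request authenticate(Route route, Response response) throws IOException {
        for (Challenge challenge : response.challenges()) {
            if (!challenge.scheme().equalsIgnoreCase("Basic")) continue;

            // OkHttp exposes the challenge's charset: UTF-8 when the server sent
            // charset="UTF-8", ISO-8859-1 otherwise
            String credential = Credentials.basic(username, password, challenge.charset());

            // Don't loop forever on bad credentials: only retry if this header
            // differs from the one we already sent (e.g. the preemptive attempt)
            if (credential.equals(response.request().header("Authorization"))) {
                return null;
            }
            return response.request().newBuilder()
                    .header("Authorization", credential)
                    .build();
        }
        return null; // no Basic challenge we can answer
    }
}
```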
You're right, I got that part backwards. I now doubt that I captured Authlogic's role properly. Perhaps café was being changed by OkHttp, NOT Authlogic (I can't confirm right now). UPDATE: Actually, nothing is being "changed"; the character encoding is just understood as Latin-1 instead of UTF-8, so the é is misinterpreted. The workaround here is a decent short-term fix: re-encode the string on the server.
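For anyone following along, the re-encoding trick boils down to reinterpreting the mis-decoded string's bytes; in Java terms (the server-side fix would be the equivalent in its own language):

```java
// What a Latin-1-decoding server ends up with for UTF-8 input
String garbled = "cafÃ©";

// Reinterpret the same bytes as UTF-8 to recover the original text
String fixed = new String(
        garbled.getBytes(java.nio.charset.StandardCharsets.ISO_8859_1),
        java.nio.charset.StandardCharsets.UTF_8); // "café"
```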
Anyway, great to hear there's library support! Good timing.
Option 2 sounds in line with what the StackOverflow post suggests:
> As of 2018, modern browsers will usually default to UTF-8 if a user enters non-ASCII characters for username or password (even if the server does not use the charset parameter).
Easy enough to implement and test, and probably wouldn't break anything. Option 1 sounds a bit fragile just because of the extra steps involved.
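A rough sketch of option 2, keeping preemptive auth and just picking the charset up front (the helper is hypothetical, not an existing Collect class):

```java
import java.nio.charset.StandardCharsets;
import okhttp3.Credentials;

class BasicAuthHeaders {
    // ASCII credentials keep today's behavior (OkHttp's default encoding is
    // ISO-8859-1); anything else is sent as UTF-8 and we hope the server copes.
    static String build(String username, String password) {
        boolean isAscii = StandardCharsets.US_ASCII.newEncoder()
                .canEncode(username + ":" + password);
        return isAscii
                ? Credentials.basic(username, password)
                : Credentials.basic(username, password, StandardCharsets.UTF_8);
    }
}
```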
Is this something you'd accept as a PR, or does it need more thought? (I don't immediately have time to submit, but could get to it later.)
Since there's a widely-accepted approach, I think this is ready to implement whenever you're ready!
CC @Grzesiek2010 @seadowg just in case they see something else that might need discussion.
This would be great to go ahead with.
@cooperka that code reference looks like where I'd start. It'd probably make sense to add a second test there for basic auth with special characters and then make it pass.
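Something like this for the second test, purely illustrative: dXNlcjpjYWbDqQ== is Base64 of the UTF-8 bytes of user:café, mirroring the existing user:pass assertion (surrounding test setup elided):

```java
assertThat(request.getHeader("Authorization"), equalTo("Basic dXNlcjpjYWbDqQ=="));
```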