We have users in several countries, some of whom expect usernames to support characters in their language (for example é).
Modern server-side databases can easily store such characters, but the Basic Auth implementation that Collect currently uses is technically limited to ASCII characters only.
Current behavior
Trying to use non-ASCII UTF-8 characters in either the username or password (under Collect's General Settings > Server) results in undefined behavior on some servers and errors on others.
For example, servers using the Rails gem Authlogic interpret café as caf� if the expected headers are missing.
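The misinterpretation can be reproduced without any server at all: if the UTF-8 bytes of the credentials are decoded as Latin-1, the é turns into two other characters. A minimal sketch (class and method names are illustrative, not from Collect's codebase):

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Re-decode a string's UTF-8 bytes as Latin-1 (ISO-8859-1) to
    // reproduce the misinterpretation described above.
    public static String misread(String s) {
        byte[] utf8Bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(misread("café")); // prints "cafÃ©"
    }
}
```

Whether the result displays as Ã© or as a replacement character like � depends on how the receiving end handles the mismatched bytes, but the underlying problem is the same: UTF-8 bytes decoded with the wrong charset.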
Naturally, I find this request reasonable and I appreciate you using é in your example.
I'm a little bit confused about the directionality here. My understanding is that the server should indicate it can receive UTF-8. Is Collect sending UTF-8 no matter what?
It seems like the right thing to do is for Collect to send the username/password as ASCII unless it sees the charset="UTF-8" authentication parameter, in which case it should send them as UTF-8. Does that match your understanding? If so, this wouldn't affect servers that don't send the parameter, so I don't think the behavior is controversial.
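For reference, the charset parameter arrives in the server's WWW-Authenticate challenge (per RFC 7617). A minimal check for it might look like this; this naive string match is only a sketch, and a real implementation should use a proper challenge parser such as the one OkHttp already provides:

```java
import java.util.Locale;

public class CharsetParam {
    // Returns true if a Basic challenge advertises charset="UTF-8",
    // e.g.  WWW-Authenticate: Basic realm="ODK", charset="UTF-8"
    public static boolean advertisesUtf8(String wwwAuthenticate) {
        if (wwwAuthenticate == null) {
            return false;
        }
        String lower = wwwAuthenticate.toLowerCase(Locale.ROOT);
        return lower.contains("basic") && lower.contains("charset=\"utf-8\"");
    }
}
```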
In terms of implementation, coincidentally the library we use for basic auth just added support for UTF-8 credentials: https://github.com/rburgst/okhttp-digest/issues/66#issuecomment-636297069. However, we set up the authenticators preemptively, before receiving an auth challenge. Two options I can think of are:
1. if preemptive auth fails and the parameter is set in the challenge, try again with UTF-8 encoding (and probably save this property of the server to avoid an extra request each time)
2. look for non-ASCII characters in the username and password and, if they're found, send them UTF-8 encoded and hope the server is happy with that
Neither option totally thrills me. Maybe those quick thoughts give you additional ideas.
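Option 2 (pick the encoding based on the characters in the credentials) could be sketched roughly like this, using only the standard library; names are illustrative and this isn't how Collect actually builds the header:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    // Use the traditional Latin-1 encoding for ASCII-only credentials
    // (so nothing changes for existing servers), and UTF-8 only when
    // non-ASCII characters are present.
    public static String headerValue(String user, String password) {
        Charset cs = isAscii(user) && isAscii(password)
                ? StandardCharsets.ISO_8859_1
                : StandardCharsets.UTF_8;
        String encoded = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(cs));
        return "Basic " + encoded;
    }
}
```

For ASCII-only credentials the two charsets produce identical bytes, which is why this approach shouldn't affect existing servers.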
You're right, I got that part backwards. I now doubt that I captured Authlogic's role properly; perhaps caf� was produced by OkHttp, not Authlogic (I can't confirm right now). UPDATE: Actually, nothing is being "changed". The character encoding is just understood as Latin-1 instead of UTF-8, so the é is misinterpreted. The workaround here, re-encoding the string on the server, is a decent short-term fix.
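The server-side re-encode workaround is reversible precisely because no information is lost: the Latin-1 misread maps each byte to exactly one character, so re-encoding those characters back to bytes and decoding as UTF-8 recovers the original string. The actual workaround would be in Ruby on the server, but the idea in Java terms is:

```java
import java.nio.charset.StandardCharsets;

public class ReencodeWorkaround {
    // Undo a Latin-1 misread of UTF-8 data: turn the characters back
    // into the bytes they came from, then decode those bytes as UTF-8.
    public static String fix(String misread) {
        byte[] originalBytes = misread.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }
}
```

The Ruby equivalent would use `String#force_encoding`/`String#encode` on the received credential before comparison.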
Anyway, great to hear there's library support! Good timing.
Option 2 sounds in line with what the StackOverflow post suggests:
> As of 2018, modern browsers will usually default to UTF-8 if a user enters non-ASCII characters for username or password (even if the server does not use the charset parameter).
Easy enough to implement and test, and probably wouldn't break anything. Option 1 sounds a bit fragile just because of the extra steps involved.
Is this something you'd accept as a PR, or needs more thought? (I don't immediately have time to submit but could get to it later.)
@cooperka that code reference looks like where I'd start. It'd probably make sense to add a second test there for basic auth with special characters and then make it pass in OpenRosaServerClientProvider.