Some QR codes with non-ASCII contents fail to scan

Xiphware · March 16, 2026, 7:52pm

I have observed similar failures, with the newer QR decoder unable to scan QR codes that the previous version could. My investigation suggested two related factors. First, the offending QR codes always appeared to include non-ASCII text (e.g. accented characters or, say, Arabic). Second, it depended on which QR code generator had been used to create them (!). Specifically, the popular QR4OFFICE extension would generate unscannable QR codes if the (rich) text happened to contain accented characters, whereas the exact same string encoded by a different tool (e.g. https://qrcode.tec-it.com/en) would generate a decodable QR code.

So my current belief is that some QR encoders are encoding non-ASCII characters (e.g. full Unicode?) in a perhaps ‘non-standard’ manner, which the newer QR decoder cannot handle. That is to say, the newer QR decoder in Collect appears to be a lot more sensitive to how non-ASCII data is encoded by the various existing QR encoders. It’s also the case that using a different QR encoder (e.g. https://www.the-qrcode-generator.com) for the same data with accents and/or Arabic characters does generate readable QR codes for me.
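To make the "same text, different encoding" idea concrete, here is a minimal Python sketch. It is only an illustration, not what QR4OFFICE or any other generator actually does: the helper name `payloads` and the choice of charsets are my own assumptions. A QR code in byte mode stores raw bytes, so two generators given the same string can end up storing different byte sequences.

```python
# Illustration only: how one string becomes different bytes under different
# charsets. The charsets below are assumptions, not taken from any real tool.
TEXT = "H\u00e9l\u00e8ne \u00e1"  # "Hélène á"


def payloads(text: str) -> dict:
    """Return the bytes a hypothetical encoder might store, per charset."""
    return {name: text.encode(name) for name in ("utf-8", "latin-1")}


if __name__ == "__main__":
    for name, data in payloads(TEXT).items():
        # bytes.hex(" ") prints space-separated hex pairs (Python 3.8+)
        print(f"{name:8} {len(data):2d} bytes  {data.hex(' ')}")

    # Arabic has no Latin-1 representation at all, so a Latin-1-only
    # encoder cannot store it faithfully.
    try:
        "\u067e".encode("latin-1")
    except UnicodeEncodeError:
        print("latin-1 cannot encode U+067E")
```

The UTF-8 payload is 11 bytes and the Latin-1 payload is 8 bytes for the same 8 characters, so a decoder has to know (or guess) which one it was given.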

So I can’t say whether the problem is that the new QR decoder in Collect is buggy, or that some of the existing QR encoders out there (e.g. QR4OFFICE and QRCode.js) are doing a lousy job of encoding non-ASCII text.

LN · March 19, 2026, 6:54pm

Splitting this out because I suspect it's a different issue.

I tried to encode "Hélèneپ á" with https://davidshimjs.github.io/qrcodejs/ and when I tried to scan it with Collect's barcode question type the viewfinder toggled from yellow to green back and forth and I never got a reading. Other scanners I tried gave me a result but with different unexpected characters ranging from boxes to Japanese. When I encoded the text with https://www.qr-code-generator.com/ all scanners I tried including Collect could read it quickly.

I think it's the responsibility of the person creating a code to make sure that the character encoding is unambiguous. It looks to me like Collect is able to read bits from the failing QR code but then when it tries to decode them as UTF-8 that fails, explaining the back and forth between green and yellow. Other scanners seem to fall back to some kind of decoding but it doesn't produce a usable output.
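A rough Python sketch of the behavior I'm describing, under the assumption that the failing code holds bytes that are not valid UTF-8. This is not Collect's actual code: `strict_utf8` and `lenient_guess` are hypothetical names, and Shift_JIS is just one fallback charset another scanner might plausibly try.

```python
from typing import Optional

# Assumed payload: "Hélène á" stored as Latin-1 rather than UTF-8.
BAD_PAYLOAD = "H\u00e9l\u00e8ne \u00e1".encode("latin-1")
GOOD_PAYLOAD = "H\u00e9l\u00e8ne \u00e1".encode("utf-8")


def strict_utf8(payload: bytes) -> Optional[str]:
    """Decode as UTF-8 or give up; None means 'bits read, text unusable'."""
    try:
        return payload.decode("utf-8")
    except UnicodeDecodeError:
        return None


def lenient_guess(payload: bytes, charset: str = "shift_jis") -> str:
    """Always return something, even if the guess at the charset is wrong."""
    return payload.decode(charset, errors="replace")


if __name__ == "__main__":
    print(strict_utf8(GOOD_PAYLOAD))   # decodes cleanly
    print(strict_utf8(BAD_PAYLOAD))    # None: 0xE9 is not followed by a continuation byte
    print(lenient_guess(BAD_PAYLOAD))  # some text, but not what was encoded
```

The strict path never yields a reading for the bad payload, while the lenient path yields a reading that is wrong, which matches the two kinds of behavior I saw.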

The default Android scanner on my Android device shows "Unknown encoding" which could be a slight improvement to make.

Xiphware · March 19, 2026, 7:22pm

That jibes with what I observed, although I didn’t specifically notice a yellow/green back-and-forth… but perhaps it just wasn’t that obvious to me.

I agree that those generating these QR codes should test them. The problem I see is that, by and large, QR codes mostly ‘just work’, so the majority of folks simply don’t know about potentially different encodings of non-ASCII text (e.g. Arabic, kanji, accents, etc.), or that some QR generators/decoders don’t actually handle it correctly (but are fine when you give them plain ASCII).

If it is possible to do something (in the new scanner) that explicitly indicates that the code you are trying to scan is malformed and unscannable, that’d at least be a clue to the user that something is wrong with the code. Whereas you otherwise just keep futilely moving the phone around thinking that Collect is broken…
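One way this could work, sketched in Python purely as an illustration: count consecutive frames where a code was detected but its bytes would not decode as UTF-8, and surface a hint once that count passes a threshold. Everything here is hypothetical (the class name, the threshold, the message text); I have no idea how the real scanner is structured.

```python
from typing import Optional, Tuple

# Hypothetical user-facing message; wording is an assumption.
HINT = "This code was read but its text encoding is not valid UTF-8."


class DecodeFailureTracker:
    """Counts consecutive undecodable reads and emits a hint past a threshold."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold
        self.failures = 0

    def on_frame(self, payload: Optional[bytes]) -> Tuple[Optional[str], Optional[str]]:
        """Return (text, hint). payload is None when no code is in the frame."""
        if payload is None:
            return None, None  # nothing detected; leave the counter alone
        try:
            text = payload.decode("utf-8")
        except UnicodeDecodeError:
            self.failures += 1
            hint = HINT if self.failures >= self.threshold else None
            return None, hint
        self.failures = 0  # a good read resets the counter
        return text, None
```

With `threshold=3`, the third consecutive bad frame would return the hint instead of silently retrying, so the user gets told the code itself is the problem.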