Column names in a choice list cannot begin with numbers: Invalid XML - 'The content of elements must consist of well-formed character data or markup'

I don't know whether this is a parser bug or a poorly documented restriction (not mentioned here). I'm posting so that either it can be fixed, or someone who hits the same error message can find this post.

The error message in this post and this post is clearer, but in both of those cases the issue was a space in the filename, and the error message doesn't mention the filename or that spaces are forbidden characters.


When uploading an XLSX form definition, I found that it failed at the XML stage with a fairly unclear error (and with no XML file available in which to look up column 870655).

After paring down the XLSX, I discovered that the problem was reference-data columns in the choices sheet whose names began with numbers rather than letters. Prefixing these with a letter allowed the form to validate and upload; e.g. a column named 2023_status would fail, but status_2023 would pass.
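A small illustration of why the column name matters, assuming (as the error message suggests) that each extra choices column ends up as an XML element name in the generated form. It uses Python's standard ElementTree parser rather than the Java SAX parser Central uses, on the assumption that any conforming XML parser rejects an element name starting with a digit. The `<item>` snippets are hypothetical, not actual pyxform output.

```python
import xml.etree.ElementTree as ET

# Hypothetical choice item whose extra column is named "status_2023":
# the element name starts with a letter, so this is well-formed XML.
good = "<item><name>a</name><status_2023>x</status_2023></item>"

# The same item with the column named "2023_status": an XML element name
# may not start with a digit, so this is NOT well-formed XML.
bad = "<item><name>a</name><2023_status>x</2023_status></item>"

def parses(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML, otherwise print the error and return False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError as err:
        print(f"ParseError: {err}")
        return False

print(parses(good))  # True
print(parses(bad))   # prints the ParseError message, then False
```

The parser only reports a position in the XML text, not the spreadsheet column that produced it, which would explain why the upload error is so hard to trace back to the XLSX.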

Central version
versions:
65d38c5de66dc07245632a19f3458035337f1215 (v2023.4.0)
95326b9ad66ec31c93bdb68c29f8797975d93fd2 client (v2023.4.0)
63fdf150e1ed81e3b1059050f7b1ba323931ab24 server (v2023.4.0)

Keywords
XML is invalid
The content of elements must consist of well-formed character data or markup
org.xml.sax.SAXParseException

Hi @ahblake,
The rule is at least mentioned for variable names here: https://docs.getodk.org/xlsform/#the-survey-sheet, likewise for choice names, see https://docs.getodk.org/xlsform/#the-choices-sheet, and in the new ODK Template :partying_face:, see ODK XLSForm Template.

Also here: https://xlsform.org/en/#setting-up-your-worksheets

Names have to start with a letter or an underscore. Names can only contain letters, digits, hyphens, underscores, and periods. Names are case-sensitive.
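The quoted rule is simple enough to check with a regular expression before uploading. This is a minimal sketch, not part of pyxform or Central; `is_valid_name` is a hypothetical helper, and the pattern is an ASCII-only approximation (XML itself also accepts many non-ASCII letters, which this ignores).

```python
import re

# ASCII-only approximation of the quoted rule: first character a letter or
# underscore; the rest letters, digits, hyphens, underscores, or periods.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")

def is_valid_name(name: str) -> bool:
    """Return True if the whole name matches the rule (empty names fail)."""
    return NAME_PATTERN.fullmatch(name) is not None

for candidate in ["status_2023", "_status", "2023_status", "my status", "a.b-c"]:
    print(candidate, is_valid_name(candidate))
```

Here `2023_status` (leading digit) and `my status` (space) are rejected, while the other three pass.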

There have also been previous discussions, as you mentioned above, e.g.
Error: is an invalid xml tag. Names must begin with a letter, colon, or underscore, subsequent characters can include numbers, dashes, and periods.

Unfortunately, the XLSForm Online validator does not check the syntax of choice names (it only warns about duplicates)! It checks names only on the survey tab, giving an error like this:

Error: [row : 4] Invalid question name [WallElement’01] Names must begin with a letter, colon, or underscore. Subsequent characters can include numbers, dashes, and periods.
So here, even a colon seems to be allowed.

Yes, the documentation mentions that choice/variable names must begin with a letter or underscore, but it doesn't mention this for the names of data columns in choice lists.

The error message is also abstruse, unlike the one for an invalid choice/variable name, which states the rule clearly. The post you linked to is the same post that I already linked to.

Three changes would help other people avoid this in future:

  • a documentation update stating that dataset columns in the choices sheet must also begin with a letter or underscore
  • an error message change to match the message for invalid variable names
  • an edit to this note in the ODK template indicating that valid extra columns must begin with a letter or underscore
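A rough sketch of what the second proposal could look like: a check over the choices sheet's header row that reports each bad column in the same style as the survey-tab error quoted above. Everything here is hypothetical (the function name, the message wording, and the assumption that columns containing `::`, such as translated labels, are not used as element names); it is not how pyxform actually validates.

```python
import re

# ASCII-only approximation of the documented name rule.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")

def check_choices_columns(headers: list[str]) -> list[str]:
    """Return one error message per choices-sheet column whose name breaks the rule.

    Columns containing '::' (e.g. 'label::English (en)') are skipped, assuming
    they are translation/media columns rather than XML element names.
    """
    errors = []
    for index, header in enumerate(headers, start=1):
        if "::" in header:
            continue
        if NAME_PATTERN.fullmatch(header) is None:
            errors.append(
                f"Error: [column : {index}] Invalid choices column name [{header}] "
                "Names must begin with a letter or underscore. "
                "Subsequent characters can include numbers, dashes, and periods."
            )
    return errors

headers = ["list_name", "name", "label", "label::English (en)", "2023_status", "status_2023"]
for message in check_choices_columns(headers):
    print(message)
```

With these headers only `2023_status` (column 5) is reported, which points straight at the offending column instead of a character offset in XML the user never sees.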

Thanks, I removed the duplicate link. And I fully agree with your proposals.