CSV data export with carriage returns is badly formatted

on the submission page, the csv export is almost always badly formatted. I can't say exactly what's wrong. Lines jump and the file is unusable. What's more, the export is very limited: csv. excel, json impossible. The only good export is Odata, which I use. I would like to know if other users have this problem, if something is being developed to improve exports, and the formatting and choice of formats (xlsx, json, gpkg etc...). Maybe we could think about offering a better export experience to users.

Maybe it's encoding issue...?

I never had any problem with csv export. Could you share some exemple ?

1 Like

Thank you @mathieubossaert ,
Here is an example where the comment field is "cut" and this moves the line one line down.
This doesn't happen on all comments though. And I can't find any difference. I tried to reproduce the error but without success.
export_csv_bug.csv (2.0 KB)

Maybe the content of the rows is just bottom aligned (Excel default)? Perhaps, you can see/check it, when you activate wrap text for the comment cell/column.

I have had this happen numerous times and put it down to something in the text comment causing a newline / delimit issue but as I don't work with the CSV I hadn't investigated.

I pulled some submissions to find one and compare the CSV to the raw submission text - first one I found broke a submission across five rows, when I looked at the content of the text field, it contained carriage returns at the same point where the CSV had a newline. A few others had the same issue, so it appears that carriage returns in text fields cause the CSV to have a newline at each one, which screws up the record when opened.

These fields were instance lookups, not user entry/edit and there was no " surrounding them to indicate a block of text. A user entered field in another record was wrapped in " eg
(Pilcrows (¶) added for clarity)

Note: The source CSV for the lookup did have " around the text block with CRs.

value,value,"user entered text with¶
a carriage return",value,value,vs instance lookup¶
with a carriage return,value,value

When opening directly in Excel, the user entered text with CR didn't cause a newline issue and the " were stripped, but the instance looked up text with no " did have the newline issue.

When importing and specifying delimiters, even with the text qualifier set to ", the import didn't strip the " and breaks to a newline after the CR on both user entered and instance looked up.


Unlike your file however, after the carriage return in mine,

  • there wasn't a delimiter for the remaining columns in the form,
    • Grand collectif.;;;;;;;;;;;;;;
  • the delimiter is a , not a ; and
  • there are no extra double quotes at the end of the text block before the delimiter.
    • "Le collectif blanc ne fait pas parti de la commune de ban st martin""";

Is this the raw CSV from Central or has it been opened/saved?

Was the entered text in your form field actually as below, or were there any " in it?:

Grand collectif.¶
Le collectif blanc ne fait pas parti de la commune de ban st martin


It appears that a calculated text field containing CRs will definitely have problems:

value,value,The¶
quick¶
brown¶
mongoose,value,value

And user entered text with CRs may have problems:

value,value,"Jumped¶
over the¶
lazy grey¶
cobra",value,value

2 Likes

As noted at https://docs.getodk.org/central-submissions/#downloading-submissions-as-csvs

Excel will not import CSVs with Unicode characters like ã, ß, and 箸 correctly if you double-click the file or open it from the File menu. You must use the Text Import Wizard and specify a file origin of Unicode (UTF-8, 65001) and the comma delimiter.

In this case, @Ri_Ri's export_csv.bug.csv file doesn't look like it came from Central. It is semi-colon (;) separated instead of comma (,) separated. Further, it has characters like ée which means whatever software produced it didn't import or export in a Unicode-safe manner. My guess is that this CSV was opened in Excel in a French-locale (that would explain the semi-colon) without going through the text wizard (that would explain the weird characters.

@Ri_Ri, what app are you using to open the CSV from Central? And how exactly are you opening it (double-click, file menu, something else)?

3 Likes

thank you everyone for all this information and your time for this bug.
Indeed, I modified the file to make it slightly anonymous. I also put it in semicolon to test if it changed anything.

The quote and double-quote works well for submissions.

I think @ahblake hit on something with carriage return, on smartphone only. Here's a test I just did, a line is created for each carriage return.
Capture

Would it be possible to prevent users from making a carriage return in text fields? Either on Excel, or directly on the xlsform?

Thank you @yanokwa for the link of text wizard import.

I'd try using a constraint to prevent carriage return inputs. Whether you need to use a carriage return or an ascii code I'm not sure as I haven't tested this.

not(contains(., '
')

thank you @ahblake , ok, i tried : not(contains(., '\r') or not(contains(., '\n'). It didn' work :confused: .

Does the CSV export function provide for carriage return? or is it my Excel and its French encoding that's causing the problem?

no one else seems to have had this problem ?

@yanokwa With Text Import Wizard it's ok for special characters, but there's still the problem of carriage returns.

did this also not work?

not(contains(., '
')

The CR is entered in excel with alt-enter / option-enter. These CR when used in hints/notes etc are recognised. ^M will probably also not work if \r etc don't.

I've certainly had the broken CSV issue as above, but I use OData to get my submissions, so it's not a problem per se.

It did not work too :confused:

It's what i use too (OData), but it will be great if csv export work with the carriage return.

Had the same issue with your file on LibreOffice.
Thanks for the discussion.

Can you share an export of your CSV directly from Central. Do not open it. Just export a few submissions that have the problem. If it isn't something you can share publicly, email it to me at yanokwa@getodk.org.

Hi @yanokwa ,

Here's an example of a CSV export directly from odk central (from a test dataset)
test_export (8).csv (1.9 KB)

Hi @Ri_Ri

I tried the provided CSV and used "Data > Get Data (Power Query) > Text/CSV" option to load the CSV and here is the screenshot of the data, it has correctly handled the CRs:

PS: "Text Import Wizard" is a legacy feature, I couldn't make it work for the provided file.

2 Likes

I created a small form that tried to get carriage returns into it a few different ways

  • Collect
    • instance lookup to a text field from a CSV - doesn't show CR in field but is preserved
    • instance lookup to a calculate from a CSV - CR preserved, displaying value as hint renders CRs
    • user entry virtual keyboard - not possible
    • user entry attached keyboard - not possible
    • user entry pasting text - doesn't show CR in field but is preserved
  • Enketo
    • instance lookup to a text field from a CSV - doesn't show CR in field but is preserved
    • instance lookup to a calculate from a CSV - CR preserved, displaying value as hint strips CRs
    • user entry attached keyboard - not possible
    • user entry pasting test - strips CR

Opening the CSV of submissions (Excel M365 for Mac, separate export for each open/import attempt)

  • 'Open': no problems
  • Legacy text import wizard, select UTF-8, comma delimited: formatting broken at CR
  • Get data from text, select UTF-8, comma delimited, based on first 200 rows: no problems

I then tried with a much larger CSV (180 columns, 6398 rows)

  • 'Open': formatting broken at a CR (row 47, instance lookup text with CR, rest of record in column1 row48)
  • Legacy text import wizard, select UTF-8, comma delimited: formatting broken at CR (row 47, rest of record in column1 row48)
  • Get data from text, select UTF-8, comma delimited, based on first 200 rows: formatting broken at a CR (row 47, rest of record in column1 row48)
  • Get data from text, select UTF-8, comma delimited, based on entire dataset: broken at row 2033, instance lookup text with CR

I reduced the export down to just the day with an issue (8 rows)

  • 'Open' - no problems
  • Legacy import - broken, rest of record in column1 row7
  • Get data from text - no problems

So, depending on the size of the file, and method of open/import, it can succeed or break at different locations. I'm no closer to narrowing it down. :person_shrugging:

2 Likes

Thanks so much for this detailed test! Can you share (or email me) the much larger CSV with 6398 rows. I'd like to try it with Excel for Windows to see if the problem exists there.

I can't send this file (but can demo on a call if it gets to that), but I exported them again in Windows, separate copies for each test.

If the legacy wizard is hidden you can reenable it;

Click on “File” –> “Options” –> “Data” and set the corresponding checkmarks for reactivating the “Text Import Wizard” in Excel. Start the text import by clicking on “Data” –>”Get Data” –> “Legacy Wizards” –> “From Text (Legacy)”.

Windows 10 / Office365 v2402

  • large CSV (6398 rows) only first error noted.

    • 'open' broken at a CR, row 2033. multiple CRs in a field split record over multiple rows, rest of record in incorrect columns
    • legacy import wizard, broken at a CR, row 47, rest of record in column1, row48
    • get data from text, UTF-8, comma, first 200 rows. Result 16189 rows loaded, 1142 errors (type mismatches), broken at a CR, row 47, rest of record in column1, row48
    • get data from text, UTF-8, comma, entire dataset. Hung for hours, later preview finished but would not load unless re-previewed on first 200 rows.
  • pruned CSV (8 rows capturing row 47 above)

    • 'open' - no problems
    • legacy import wizard, broken at CR, rest of record in column1 row7
    • get data from text, UTF-8, comma, first 200 rows, no problems
  • pruned CSV (10 rows capturing row 2033 above)

    • 'open' - broken at row 2, multiple CRs in a field split record over multiple rows, rest of record in incorrect columns
    • legacy import wizard, Broke at rows 1, 2, 8, 10. record split over 1 to multiple rows, rest of record in incorrect columns.
    • get data from text, UTF-8, comma, first 200 rows. Result: 14 rows loaded, no errors reported. Broken at row 2, multiple CRs in a field split record over multiple rows, rest of record in incorrect columns