Naming of repeat CSVs in Briefcase export in v1.12.1

Matthew_White · September 30, 2018, 9:57pm

(@Francisco_Carballo, you can probably skip this post: I'll reply to your post in a separate post below.)

@ggalmazor, it's been a while since I dug into some of this, but I just tested a form with nested groups and repeat groups, and I think I have a better sense of the expected behavior. The form I used is titled "nested" and has the following structure:

nested.xlsx (9.6 KB)
nested.xml (6.8 KB)

field1
group1
- field2
- group2
  - field3
  - group3
    - field4
  - repeat1
    - field5
- repeat2
  - field6
  - group4
    - field7
  - repeat3
    - field8
repeat4
- field9
- group5
  - field10
  - group6
    - field11
  - repeat5
    - field12
- repeat6
  - field13
  - group7
    - field14
  - repeat7
    - field15

I actually don't know a lot about the specifics of PARENT_KEY and KEY. The only thing that odkmeta assumes is that the PARENT_KEY of a row of a repeat group CSV exactly equals the KEY of the corresponding row of the parent CSV.

I think this is right, and Briefcase 1.12.1 matches my expectation here.
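To make that assumption concrete, here is a minimal Python sketch of the check. It is not odkmeta's or Briefcase's code. Only the KEY and PARENT_KEY column names come from this thread; the CSV contents, key values, and the `orphan_rows` helper are made up for illustration.

```python
import csv
import io

# Made-up CSV contents: only the KEY and PARENT_KEY column names are real.
PARENT_CSV = """KEY,field1
key-a,1
key-b,2
"""

CHILD_CSV = """PARENT_KEY,KEY,field9
key-a,child-1,x
key-a,child-2,y
key-c,child-3,z
"""


def orphan_rows(parent_text, child_text):
    """Return the KEYs of child rows whose PARENT_KEY matches no parent KEY."""
    parent_keys = {row["KEY"] for row in csv.DictReader(io.StringIO(parent_text))}
    return [
        row["KEY"]
        for row in csv.DictReader(io.StringIO(child_text))
        if row["PARENT_KEY"] not in parent_keys  # exact string equality
    ]


orphans = orphan_rows(PARENT_CSV, CHILD_CSV)
```

In this made-up data, `child-3` points at a parent key that doesn't exist, so it is the only row reported.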

It looks like earlier versions of Briefcase took a slightly different approach from Briefcase 1.12.1 when a repeat group is nested within another repeat group. In earlier versions, the CSV filename for a nested repeat group omitted the name of its parent repeat group, along with anything above it, but still included the names of any groups between the repeat group and its parent repeat group. Briefcase 1.12.1 instead uses the fully qualified name of the nested repeat group, so the filename includes the parent repeat group's name as well as the names of any groups in between.

When I use Briefcase 1.12.1 for the nested form, it exports the following files:

nested.csv
nested-group1-group2-repeat1.csv
nested-group1-repeat2.csv
nested-group1-repeat2-repeat3.csv
nested-repeat4.csv
nested-repeat4-group5-repeat5.csv
nested-repeat4-repeat6.csv
nested-repeat4-repeat6-repeat7.csv

Briefcase 1.10.1 exports the following files:

nested.csv
nested-group1-group2-repeat1.csv
nested-group1-repeat2.csv
nested-repeat3.csv
nested-repeat4.csv
nested-group5-repeat5.csv
nested-repeat6.csv
nested-repeat7.csv
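As a way of pinning down the difference, here is a rough Python sketch that reproduces both file lists from the form structure. It is my own model of the observed behavior, not Briefcase's actual code. The `FORM` tuples and the `repeat_filenames` function are hypothetical names.

```python
# Hypothetical model of the "nested" form: (kind, name, children).
FORM = [
    ("field", "field1", []),
    ("group", "group1", [
        ("field", "field2", []),
        ("group", "group2", [
            ("field", "field3", []),
            ("group", "group3", [("field", "field4", [])]),
            ("repeat", "repeat1", [("field", "field5", [])]),
        ]),
        ("repeat", "repeat2", [
            ("field", "field6", []),
            ("group", "group4", [("field", "field7", [])]),
            ("repeat", "repeat3", [("field", "field8", [])]),
        ]),
    ]),
    ("repeat", "repeat4", [
        ("field", "field9", []),
        ("group", "group5", [
            ("field", "field10", []),
            ("group", "group6", [("field", "field11", [])]),
            ("repeat", "repeat5", [("field", "field12", [])]),
        ]),
        ("repeat", "repeat6", [
            ("field", "field13", []),
            ("group", "group7", [("field", "field14", [])]),
            ("repeat", "repeat7", [("field", "field15", [])]),
        ]),
    ]),
]


def repeat_filenames(form_title, nodes, fully_qualified):
    """List CSV filenames: the main file, then one per repeat group."""
    names = [form_title + ".csv"]

    def walk(children, path):
        for kind, name, kids in children:
            if kind == "repeat":
                names.append("-".join([form_title] + path + [name]) + ".csv")
                # Fully qualified: keep the whole path, including this repeat.
                # Older style: start over, so nested repeats drop their ancestors.
                walk(kids, (path + [name]) if fully_qualified else [])
            elif kind == "group":
                walk(kids, path + [name])

    walk(nodes, [])
    return names


new_style = repeat_filenames("nested", FORM, fully_qualified=True)
old_style = repeat_filenames("nested", FORM, fully_qualified=False)
```

With `fully_qualified=True` this yields the 1.12.1 list, and with `fully_qualified=False` the 1.10.1 list, in the same order as shown above.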

I think using the fully qualified repeat group name, as Briefcase 1.12.1 does, is safer for forms that contain two repeat groups with the same name. I would guess that's pretty rare, especially since XLSForm doesn't allow it. However, when I test such a form with Briefcase 1.10.1, the two repeat groups with the same name are written to the same file.

I'd be happy with either approach: the current approach seems slightly safer, but the older approach also seems reasonable, and it wouldn't require any odkmeta changes.

While we're on the subject of duplicate names, I think there's still a slight chance that Briefcase 1.12.1 could run into duplicate CSV filenames if two repeat groups have different names but the same fully qualified name. For example, if a repeat group named r is nested within a group named g, and a second repeat group named g-r sits at the same level as g, then no two names are duplicates, yet both repeat groups have the fully qualified name g-r. However, I think the risk here is probably vanishingly small, and I think Briefcase 1.12.1 does the right thing by throwing an error. But I thought I'd mention this in case it makes sense to include it as a test case.
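If it helps as a starting point for that test case, here is a small Python sketch of the collision. It assumes filenames are built by joining the form title and the path with hyphens, as in the file lists above. The `find_filename_collisions` helper and the sample paths are hypothetical, not taken from Briefcase.

```python
from collections import defaultdict


def find_filename_collisions(form_title, repeat_paths, sep="-"):
    """Map each filename produced by more than one repeat path to those paths."""
    by_name = defaultdict(list)
    for path in repeat_paths:
        filename = sep.join([form_title, *path]) + ".csv"
        by_name[filename].append(path)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


# Repeat "r" inside group "g", a sibling repeat literally named "g-r",
# and an unrelated repeat for contrast.
paths = [("g", "r"), ("g-r",), ("other",)]
collisions = find_filename_collisions("nested", paths)
```

Here both `("g", "r")` and `("g-r",)` map to `nested-g-r.csv`, which is the case where an exporter would need to raise an error or otherwise disambiguate.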

This sounds right to me!

This sounds like a good strategy to me, and using a hyphen as the separator means that odkmeta won't have to change.