What is the general goal of the feature?
To allow more control over how variable names appear in the data after being submitted. Currently, the file name and the group name are automatically added before the actual name given to the variable. It would be good to be able to turn off file name prefixes and group name prefixes.
For example, in my survey named household_survey, all my variable names end up in my sheet looking like this:
household_survey-today
household_survey-enum
household_survey-basic-name_f
household_survey-basic-name_l
household_survey-basic-gender
household_survey-basic-age
But I want them to be named like this:
today
enum
name_f
name_l
gender
age
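Until there is such an option, the renaming above can be done on the exported headers. Here is a minimal sketch in Python, assuming the prefixes are always joined with "-" as in my example and that the variable names themselves never contain a hyphen. The function name strip_prefixes is my own, not part of any ODK tool.

```python
def strip_prefixes(headers, sep="-"):
    """Keep only the part of each header after the last separator.

    Assumes the variable name itself never contains `sep`.
    """
    return [h.rsplit(sep, 1)[-1] for h in headers]


headers = [
    "household_survey-today",
    "household_survey-enum",
    "household_survey-basic-name_f",
    "household_survey-basic-name_l",
    "household_survey-basic-gender",
    "household_survey-basic-age",
]

short = strip_prefixes(headers)
print(short)
```

This only works after the data have been exported, which is exactly the manual step I would like to avoid in the live sheet.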
I've also described this issue in this thread: How to remove survey name from headers after submission in ODK Collect? - #3 by notaplatypus
What are some example use cases for this feature?
Having the file and group names makes it difficult to do preliminary analyses on my data as they come in (i.e. not yet complete or ready to move over to other software, and so names cannot be changed yet), as all the variable names look the same. The columns have a fixed width, so all of them show household_ before being cut off. I then need to select the cell with the name to see the full name in the formula box. This is tedious, especially when presenting ODK to stakeholders and introducing the concept of data collection and analysis to them, showing how survey responses get turned into data. Sure, we can guess at some of the variable names by looking at the responses (dates, names), but when there are a few questions in a row that contain integers, it gets confusing and cumbersome to keep having to check the variable name.
The front of a variable name is what is read first and having all variable names share the same file name at the start just creates clutter and makes names unnecessarily long. This also makes me reconsider my practice of naming my files with the most recent date, as I don't want to further add to the length of my variable names. This issue is carried through to other programs such as R where calling a column name requires a lot of typing and renders keyboard arrow shortcuts useless.
I understand that it is currently set up the way it is to ensure that variable names are unique, but ODK Validate already checks for unique names.
These are the errors I get in XLSForm Offline when the variable is repeated in the same group:
ODK XLSForm Offline Errors:
There are more than one survey elements named 'name' (case-insensitive) in the section named 'test_form'.
This is when the variable name is repeated across two groups:
ODK XLSForm Offline Errors:
There has been a problem trying to replace ${name} with the XPath to the survey element named 'name'. There are multiple survey elements with this name.
Given that there are already checks in place to make sure that names are unique regardless of group, the final variable name is needlessly long with both file names and group names as prefixes.
Additionally, a good feature to have could be to allow non-unique names during form building and, upon submission, append numerals to the end of names that are repeated. This is the common approach most operating systems take when saving a file with a duplicate name, adding (1) or (2) etc. Software like R/RStudio does something similar when joining data frames, adding suffixes to columns that share a name.
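To illustrate what I mean by appending numerals, here is a rough sketch in Python. It is only an illustration of the idea, not how ODK does or should implement it. The function name dedupe_names and the "_1", "_2" suffix style are my own assumptions.

```python
def dedupe_names(names):
    """Return the names in order, with a numeric suffix on repeats.

    The first occurrence keeps its name; later ones become
    name_1, name_2, ... skipping any suffix already in use.
    """
    used = set()
    result = []
    for name in names:
        candidate = name
        n = 1
        # Keep trying suffixes until the candidate is not taken.
        while candidate in used:
            candidate = f"{name}_{n}"
            n += 1
        used.add(candidate)
        result.append(candidate)
    return result


columns = dedupe_names(["name", "age", "name", "name"])
print(columns)
```

Note that this sketch compares names case-sensitively, whereas the XLSForm check quoted above is case-insensitive, so a real implementation would have to decide which rule to follow.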
This will save a lot of collective time and effort for users as I'm sure many are already manually removing the unwanted prefixes as the first step of data analysis. If there is an option to turn off either or both file name prefixes and group name prefixes, it will be a welcome improvement.
Thanks.