1. What is the issue? Please be detailed.
I am using ruODK (v1.4.0) in R (R version 4.3.1) to import data from a current survey we are collecting using ODK. For previous surveys, I have used the option within ruODK::odata_submission_rectangle(names_sep = NULL) to remove the group name prefixes from the variable names that are loaded into R. Currently I am getting an error related to the GPS coordinates collected by ODK:
Error in `tidyr::unnest_wider()`:
ℹ In column: `coordinates`.
ℹ In row: 1.
Caused by error:
! Can't unnest elements with missing names.
ℹ Supply `names_sep` to generate automatic names.
2. What steps can we take to reproduce this issue?
Try importing a survey with GPS coordinates to R using ruODK.
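A minimal sketch that reproduces the same tidyr error without an ODK server. It assumes the geopoint arrives as an unnamed list of numbers (longitude, latitude, altitude), which is what the `coordinates` column in the error suggests; the data frame and its values are made up for illustration.

```r
library(tidyr)

# Hypothetical stand-in for unparsed submissions: each row holds an
# unnamed vector of coordinates, as in a GeoJSON point.
df <- tibble::tibble(
  id = 1:2,
  coordinates = list(c(115.86, -31.95, 12), c(116.02, -32.01, 8))
)

# Without names_sep, tidyr has no names to build columns from and errors.
failed <- inherits(try(unnest_wider(df, coordinates), silent = TRUE), "try-error")

# With names_sep, tidyr numbers the unnamed elements instead.
wide <- unnest_wider(df, coordinates, names_sep = "_")
names(wide)
```

This only illustrates the tidyr behaviour behind the message; it says nothing about how ruODK calls `unnest_wider()` internally.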
3. What have you tried to fix the issue?
I have tried supplying names_sep with different patterns, including the default "_". While this prevents the error and allows me to create a dataframe, all of the group names are included as prefixes to the variable names, creating lengthy variable names, which is what the names_sep = NULL option is designed to avoid.
I just tried to run the ruODK package on R and am still getting the same error. If I run the standard code, I get all the group names (which I find cumbersome in analysis). When I run
Thanks for the notice, I’ll take a look tonight! For now, you could possibly use a column delimiter like `__`, then sanitise the column names to get rid of the group name and delimiter?
Thanks @Florian_May - that would be great. The automated delimiting does not work well in my file because I have some variable names that contain _, which makes this messy. I can just replace strings (e.g. replace mainform = "") in R or Stata, but that is a bit tedious because I have quite a few groups and would have to do this for each group separately. Great if you could take a look - we can definitely use some less elegant coding fixes for this otherwise.
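If the group names are known, the per-group replacement can be done in one pass rather than once per group. The sketch below assumes hypothetical group names (`mainform`, `household`) and made-up columns, and that group names contain no regex metacharacters.

```r
# Hypothetical data frame standing in for the imported submissions.
subs <- data.frame(
  mainform_age = c(34, 51),
  mainform_village_name = c("A", "B"),
  household_roof_type = c("tin", "thatch")
)

# Known group names (assumed); variable names may still contain "_".
groups <- c("mainform", "household")

# One anchored pattern that matches any group prefix plus its separator.
pattern <- paste0("^(", paste(groups, collapse = "|"), ")_")
names(subs) <- sub(pattern, "", names(subs))
names(subs)
```

Because the pattern is anchored at the start and lists the groups explicitly, underscores inside variable names such as `village_name` are left alone.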
Apologies, that wasn't formatted clearly in my reply above. I meant to suggest using a double underscore `__` as separator (assuming you're using only single underscores in your variable names) and dplyr's mutate_all ("Mutate multiple columns" in the dplyr reference) as a quick patch until I publish a fix.
I just tried the dplyr package but it looks like it is not finding any objects with double underscore. When I run it with just single underscore, it gives me the expected error (names no longer unique). Should there be a double underscore after the group name in my .csv?
To clarify, I meant for you to set `odata_submission_get(sep="__")` so that ruODK uses that double underscore to separate group name prefixes from variable names.
Then you can use dplyr to remove anything up to and including the double underscore.
Does that work?
And to confirm, are your variable names unique without group names?
Sorry, I missed that. I tried to run `sep="__"` in ruODK 1.5.1, but this option is not recognized. ChatGPT then recommended the code below, but this somehow does not change the headers (i.e. they still just have _ instead of double underscore) - what am I missing?
# 3) Pull the main table
submissions <- odata_submission_get(
  table = "Submissions",
  local_dir = "media", # downloads & relinks attachments; folder will be created
  wkt = TRUE, # geopoints as WKT; omit for GeoJSON + extracted lon/lat columns
  parse = FALSE,
  expand = TRUE # inlines repeat subtables; this is not a separator setting
)
# 2) Rectangle (flatten) with your custom separator
subs <- odata_submission_rectangle(
  data = submissions,
  names_sep = "___", # <-- your desired separator
  form_schema = form_schema() # optional: pass if using GeoJSON to avoid unnesting geofields
)
I’m working on adding support for different names_sep to odata_submission_get.
If you have a moment, you can try this as ruODK v1.5.2 (fresh from github).
You will shortly be able to omit group names from submission data but this comes with one caveat: this will cause errors if the form contains repeated groups.
In the unparsed submission data, ODK Central represents repeated groups as unnamed lists containing the repeated field names in each group. Removing the group name will then lead to repeated, i.e. non-unique, column names.
Therefore I’ve implemented this as follows:
`odata_submission_get(names_sep=NULL)` will drop the group name unless the group contains repeats, in which case it keeps the group name to prevent duplicate column names.
`odata_submission_get(names_sep="__", clean_names=FALSE)` will separate group names with a double underscore (new: and does not run `janitor::clean_names()`, which would turn the double underscores into single underscores). You can then add `|> dplyr::rename_with(~ stringr::str_replace(.x, ".*?__", ""))` to drop the group name, but this will again fail on non-unique names if the form has repeats. Also, this does not parse coordinates nicely yet (into _latitude, _longitude, _altitude etc.).
Your best option may be to keep group prefixes, then rename column names as needed in the presentation / visualisation steps.
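To illustrate the duplicate-name caveat, here is a small sketch in base R. The column names are hypothetical, and two groups sharing a field name stand in for the repeat case; this is not how ruODK implements `names_sep=NULL`, only one way to drop prefixes without creating collisions.

```r
# Hypothetical column names with "__" as group separator.
nms <- c("meta__instance_id", "household__name", "child__name", "child__age")

# Strip everything up to and including the last "__".
bare <- sub("^.*__", "", nms)

# Keep the prefixed name wherever the bare name would collide.
collides <- bare %in% bare[duplicated(bare)]
safe <- ifelse(collides, nms, bare)
safe
```

Columns whose bare names are unique lose their prefix, while the colliding `name` columns keep theirs, so the result can always be assigned back with `names(df) <- safe`.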
Thanks @Florian_May. I upgraded to version 1.5.2 and am running the code below in R, but am still seeing group names - is that because of the repeats in my data?
svc <- odata_service_get() # tibble with names like "Submissions", "Submissions.child_group"
submissions <- odata_submission_get(
  table = "Submissions",
  local_dir = "media", # downloads & relinks attachments; folder will be created
  wkt = TRUE, # geopoints as WKT; omit for GeoJSON + extracted lon/lat columns
  parse = FALSE,
  expand = TRUE, # inlines repeat subtables; this is not a separator setting
  names_sep = NULL
)
subs <- odata_submission_rectangle(
  data = submissions,
  names_sep = "___" # <-- your desired separator
)
I just realized this may be a problem with the double underscore __, which R somehow seems to reformat. When I use an _xxx delimiter instead, the following code seems to work for me with version 1.5.2:
subs <- odata_submission_rectangle(
  data = submissions,
  names_sep = "_xxx" # <-- my separator
)
You can disable the name sanitization with `clean_names=FALSE`. `odata_submission_get` handles this argument as well as `names_sep`, so you can set `parse=TRUE` and will not need to rectangle the data yourself.
That also works. The data frame looks slightly different though with all missing values showing up as NA (and the overall variables as strings) - can I change that?
My current file does not have any repeats - do you want me to test it with a repeat as well?
Hmm, the NAs and all strings indicate that this patch has upset correct form schema parsing. This is definitely something I want to fix in the long run but this will take some dedicated time.
Good to hear confirmation that your form has no repeats, this explains why you can drop group names without errors. My test form has repeats and I had to re-enable group names for those.
Given all this, I feel you'll get the best (most correct and well-formed) output by keeping the group names and accepting the extra step of relabeling columns in postprocessing.
Is this option something you'd be comfortable with?
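One way to do that relabeling step is a lookup vector applied after import. The sketch below uses base R with made-up column names and labels; columns not listed in the lookup keep their prefixed names.

```r
# Hypothetical parsed submissions with group prefixes kept.
subs <- data.frame(
  meta_instance_id = c("uuid:1", "uuid:2"),
  mainform_age = c(34, 51),
  mainform_village_name = c("A", "B"),
  household_roof_type = c("tin", "thatch")
)

# Lookup: prefixed name -> short label for analysis (assumed labels).
labels <- c(
  mainform_age = "age",
  mainform_village_name = "village",
  household_roof_type = "roof"
)

# Rename only the columns present in the lookup.
hit <- names(subs) %in% names(labels)
names(subs)[hit] <- unname(labels[names(subs)[hit]])
names(subs)
```

Keeping the lookup in one place means the parsed data stays as ruODK delivers it, and the short names can be adjusted without re-downloading anything.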
I think that can work. I guess the ideal solution would be one where the system would create a different separator for groups; if that was possible, the non-repeat variables could easily be cleaned, and the repeat vars could just stay as is. How does the system handle this when I pull down the data manually from ODK Central and click on the “remove group names” checkbox? I have never had any problems with this option.