Odkmeta input csv format

@Matthew_White

Dear Matthew,

I have a similar problem as @Amanda_Berman.
I tried with CSV files exported as ASCII, but I still get the following error:

column header label not found
invalid label() suboption
invalid survey() option

Do you know how this can be solved?

Best,
Daan

@Daan_S, odkmeta can issue this error message if the format of the survey file differs from what odkmeta expects. Is the .csv file comma-separated instead of semicolon-separated? You might also want to double-check that the .csv file is ASCII, not Unicode.

It was semicolon-separated but I replaced all the semicolons by comma's. I was working on a Mac, but I couldn't get the .csv file in ASCII. On Windows it worked in the end, but there are still problems with the generated do file. I still don't have the value labels of my variables and also not the correct variable labels. Do you know why this is the case?

Are you saying that the do-file runs without error, but that it doesn't correctly attach variable and value labels? I haven't encountered that issue before. Do you see a pattern among the variables that do not have labels? Also, how are you exporting the data from your server to .csv?

Thanks so much for the help.

Yes, except for this error: "today variable survey_date could not be converted using the mask MDY"
Half of the variable labels are missing and the other half are a copy of the variable name.
The variables of which the labels are missing are mostly the ones with multiple answers, for these variables multiple variables are also generated, one for each possible answer.

In the do-file I also see that not all the variable values have quotation marks.
In my dataset I have labels in two languages. I specified to take the English labels, but in the do-file labels in the other language also appear. I don't know if this is part of the problem?

I got the data from the ngo I'm working with, so I'll have to ask them how they exported the data from their server to .csv.

Yes, except for this error: "today variable survey_date could not be converted using the mask MDY"
Half of the variable labels are missing and the other half are a copy of the variable name.

The do-file has to run to completion without any error to succeed. odkmeta formats date/time variables before attaching labels. Since the do-file resulted in an error while formatting those variables, it hadn't started its labeling work yet. The labels you see may just be the ones autogenerated by insheet.

It sounds like you need to update the locals at the top of the do-file. The help file describes the purpose of those:

The do-file starts with the definitions of several local macros; these are constants that the do-file uses. For instance, local macro `datemask' is the mask of date values in the .csv files. The local macros are automatically set to default values, but they may need to be changed depending on the data.

If that doesn't work, I recommend checking that you have raw, unmodified data from the NGO. For example, if you open a .csv file in Excel and save it, even without making other changes, that can reformat dates and times.

In the do-file I also see that not all the variable values have quotation marks.

Note that if a value label does not contain a space, it usually doesn't need to be enclosed by quotes.

Hope that helps!

2 Likes

Thank you so much!

I updated the locals and used raw data, I however still get this error: "today variable survey_date could not be converted using the mask YMD”.
The ‘survey_date’ variable contains dates like this: 2017-04-18 00:00:00.0.
I think I specified the right order since I tried with YMD and YMDhms, but still get the converting error.

Fields of type date and today are exported from ODK Briefcase without a time stamp, but it looks like survey_date, which is a today field, includes a time stamp. I've only seen that when the raw data has been modified. I'd recommend using ODK Briefcase to export the raw data again (or asking the NGO to do so).

If that doesn't work, I think you'll have to manually modify the do-file. The do-file formats date and time variables in a section of the do-file with the comment * Date and time variables. It formats fields of type date and today using the date() function, which expects a date string, not one with a time stamp. You'd have to change the code to use a different function. Alternatively, the do-file is designed to be fairly modular, so you should be able to remove the section entirely, then implement your own date/time formatting after the do-file completes.

Thank you so much for all the help!
PS: a video tutorial explaining the tool would have helped me a lot and I think it could also be very helpful for other students wanting to use the tool in the future.

Facing the same problem. Any solution for this problem? I perfectly execute the sample prepared by Matthew that contains date but will get an error (datetime variable SubmissionDate could not be converted using the mask MDYhms) in the do file when I use the my data set that is generate by briefcase with submission date like 3/2/2019 9:32.

Please help

Hi @Naod!

The odkmeta do-file uses the Stata clock() function to convert dateTime strings to Stata %tc values. As part of that, the do-file specifies a "mask" that describes how the dateTime strings are formatted. See here for the Stata documentation about this mask.

The default mask that the do-file uses for dateTime strings is "MDYhms". However, you can use the `datetimemask' local macro at the top of the do-file to specify a different mask (see my post above). Your example value of 3/2/2019 9:32 seems like it might require the mask "MDYhm" (without the seconds component) or perhaps "DMYhm".

However, it seems unusual that your dateTime data does not include seconds. Because of that, I recommend the following steps:

  1. Re-export the data from Briefcase.
  2. Do not edit any CSV export (or even open it and re-save it) in Excel or any other software before running the odkmeta do-file: the do-file requires the raw data.
  3. Next, try running the do-file without changing `datetimemask'.
  4. If the do-file still results in an error, consider changing `datetimemask'. Consult the Stata documentation for the correct mask.

There is also a feature request to have Briefcase export date and time strings using ISO 8601, which I think would make things easier in cases like this by increasing standardization.

Hope this helps!

1 Like

Thank you Matthew, I have responded in my latest email. Still there are some very important points that you pointed out, specially the editing part of the csv file and the second in the
datetimemask which i did not notice.

1 Like

Thank you so much Matthew. On top of the above recommendations by Matthew it is good to check the version being used for the ODK extraction. I was using version ODK-Briefcase 1.13 and immediate shift to the ODK-Briefcase 1.12.2 has resulted for perfect do file creation. Probably it could be a lesson for others too.

1 Like

I'm glad that you were able to generate a do-file successfully!

In general, I recommend using the latest version of ODK Briefcase with odkmeta, as that will have the latest features and bug fixes. However, as you've seen, there are differences between Briefcase v1.12 and and v1.13, specifically around the names of repeat group CSV files. (See this thread for more information.) Most of my time lately has been on ODK Central, but I'm also working on a new version of odkmeta that will use the same repeat group CSV filename pattern as the latest version of Briefcase.

Until that's released, one option is to use the latest version of Briefcase, then replace the repeat group CSV filenames in the odkmeta do-file so that they match what Briefcase exports.

The only odkmeta-related difference between Briefcase v1.12 and v1.13 that I know of has to do with the repeat group CSV filenames. Given that, I'm curious how you resolved the error with your dateTime field. Did you not encounter the error when you used Briefcase v1.12 to export the data?

Yes Matthew, I always try to use latest versions but as discussed before challenged in Briefcase v1.13 for odkmeta.
I have generated my entire CSV files successfully using version 1.12. But while doing some data cleaning on the CSV file, the usual error “datetime variable SubmissionDate could not be converted using the mask MDYhms” will appear. So all I am doing is, avoid all the date information from the CSV files while doing the data cleaning and generate the Stata version -it is okay so long as there is no date information. I just create another CSV file that hold the date variables with the key and finally change that one as well to a Stata file. And I merge the two Stata files using the Key-uuid.
If there is no date information editing the CSV file will not disturb the odkmeta from generating the do file and the Stata files. But preferably doing the edition on the Stata files is advisable than on the CSVs.
emphasized text

Hi Matthew,
I am just looking at the data that I have generated using odkmeta in Stata. For select multiple response questions the variables are splitted based on the number of options in the question. My expectation for each newly generated variable (var_1, var_2,...) is just to see a single response, but that is not the case. In the first variable values from the other options will be stored and the same is true for the other variables (sometimes it put (#1/5), (#2/5),... and none in some of the cases). I think it is counting the response for each response but what I need is just to see the responses for var_1, without the other variables and the same is true for the others. So is that it normal or did i miss something?

Thanks in advance

Hi @Naod!

In cases like this, it's often helpful to examine the do-file itself to see what Stata commands it's running. In this case, odkmeta first uses the split command to split a select_multiple variable into multiple variables. Later, it assigns a value label to those variables. To see what it's doing with the dataset, it might be helpful to add pause commands to the do-file, examining the dataset before and after the split (perhaps using the browse command). Note that the split variables do not count responses.

These split variables are useful for some tasks, but you may need to further transform them for some types of analysis. For example, there's a longstanding GitHub issue about automatically generating dummy variables for select fields.

Hi Matt,

So does it mean the actual results of the select-multiple questions is displayed in the usual manner? (For each variable in yes or no manner) because i am getting the same result for all the data sets I am running. For a while I have written a do file to transfer the result to the normal way but will try to check what you have suggested above.