Web-based Stata cleaning code generator

Hi All,
I'd like to introduce a simple new WebApp: ODK -> STATA

The app produces Stata cleaning code for your ODK-generated dataset. Just upload your XLSForm to get started.

The code will

  • Attach a label of your choice to each variable
  • Attach the list_name values as value labels
  • Optionally split select_multiple columns into dummy variables
  • Label the select_multiple dummies
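To give a rough idea of what the steps above involve, here is a minimal Python sketch. It is not the app's actual implementation: the function names `stata_quote` and `generate_label_code` are made up for illustration, and it assumes the XLSForm survey and choices sheets have already been read into lists of dicts with `type`, `name`, `label` and `list_name` keys. It also assumes that select_multiple answers are stored as space-separated strings and that choice names are valid inside Stata variable names.

```python
def stata_quote(text):
    """Wrap text in Stata compound double quotes, so embedded " is safe."""
    return '`"' + str(text) + '"' + "'"


def generate_label_code(survey, choices, split_multiples=True):
    """Return Stata labelling code as a string (illustrative sketch only)."""
    # Group the choices sheet by list_name, keeping the sheet order.
    lists = {}
    for choice in choices:
        lists.setdefault(choice["list_name"], []).append(
            (str(choice["name"]), choice["label"])
        )

    lines = []
    # Stata value labels need integer values, so skip lists with text names.
    numeric_lists = set()
    for list_name, items in lists.items():
        if all(name.lstrip("-").isdigit() for name, _ in items):
            pairs = " ".join(f"{name} {stata_quote(lab)}" for name, lab in items)
            lines.append(f"label define {list_name} {pairs}, replace")
            numeric_lists.add(list_name)

    for row in survey:
        parts = str(row.get("type", "")).split()
        var = row.get("name", "")
        # Groups, repeats and notes do not become variables in the data.
        if not parts or not var or parts[0].startswith(("begin", "end", "note")):
            continue
        label = row.get("label", "")
        if label:
            lines.append(f"label variable {var} {stata_quote(label)}")
        if len(parts) == 2 and parts[0] == "select_one" and parts[1] in numeric_lists:
            lines.append(f"label values {var} {parts[1]}")
        elif len(parts) == 2 and parts[0] == "select_multiple" and split_multiples:
            # One 0/1 dummy per choice; padding with spaces avoids "1" matching "11".
            for name, choice_label in lists.get(parts[1], []):
                dummy = f"{var}_{name}"
                lines.append(
                    f'generate byte {dummy} = strpos(" " + {var} + " ", " {name} ") > 0'
                    f" if !missing({var})"
                )
                lines.append(
                    f"label variable {dummy} {stata_quote(label + ': ' + choice_label)}"
                )
    return "\n".join(lines)


survey = [
    {"type": "text", "name": "hh_name", "label": "Household name"},
    {"type": "select_one yesno", "name": "consent", "label": "Consent given?"},
    {"type": "select_multiple crops", "name": "crops", "label": "Crops grown"},
]
choices = [
    {"list_name": "yesno", "name": "1", "label": "Yes"},
    {"list_name": "yesno", "name": "0", "label": "No"},
    {"list_name": "crops", "name": "1", "label": "Maize"},
    {"list_name": "crops", "name": "2", "label": "Rice"},
]
print(generate_label_code(survey, choices))
```

The printed text is meant to be pasted into a do-file. Note that `label values` only works if Stata imported the select_one variable as numeric, which is another assumption of this sketch.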

While I love odkmeta, the wonderful Stata package by IPA that does all this and more within Stata, I sometimes need to produce basic cleaning code without being able to meet all of its requirements (e.g. the CSV files are not easily available).

I would love to get your thoughts and feedback on the app!

Give it a spin: https://odk-stata.streamlit.app/
Run it locally and/or contribute on GitHub: https://github.com/JonasWeinert/ODK_CleaningcodeGenerator



Hi @JonasWeinert! I'm always excited to see activity in the ODK + Stata space. I've been busy working on ODK Central and haven't had the chance to update odkmeta in a long while. I like the idea of wrapping this functionality in a web app — something easier to use than running a command in Stata.

Could you say more about what requirements you aren't able to meet? I think odkmeta mostly just needs the XLSForm (in CSV format).

Since you're using Python, I'm curious about whether you've taken a look at https://github.com/PMA-2020/odk2stata. Also, since your repository mentions R, I thought I'd point out https://github.com/ropensci/ruODK as well.


Hi @Matthew_White!
Thank you for the two links! Since both are open source as well, I'll see how we can combine our powers :)
The mention of R in the repo description was indeed just an idea, and it may no longer be needed precisely because of ruODK.

Re odkmeta: I sometimes

  • need to load the data from file formats other than CSV (e.g. xlsx, json)
  • and/or don't have access to the CSV files of the data
  • have data that already contains split select_multiple fields
  • need to adjust the XLSForm label columns (to remove non-ASCII characters) and change the column names.
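As an illustration of the last point, here is a minimal sketch of stripping non-ASCII characters from a label column and renaming it to plain `label`. It uses only the Python standard library. The helper names `to_ascii` and `clean_labels` and the example column name are hypothetical, and the rows are assumed to be dicts read from the XLSForm survey sheet.

```python
import unicodedata


def to_ascii(text):
    """Drop accents and any other non-ASCII characters from text."""
    # NFKD splits accented letters into base letter + combining mark;
    # encoding with errors="ignore" then discards everything non-ASCII.
    decomposed = unicodedata.normalize("NFKD", str(text))
    return decomposed.encode("ascii", "ignore").decode("ascii")


def clean_labels(rows, label_column="label"):
    """Return copies of rows with label_column renamed to 'label' and ASCII-only."""
    cleaned = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != label_column}
        new_row["label"] = to_ascii(row.get(label_column, ""))
        cleaned.append(new_row)
    return cleaned


rows = [{"name": "age", "label::Français (fr)": "Âge du répondant"}]
print(clean_labels(rows, label_column="label::Français (fr)"))
```

Characters without an ASCII base letter are simply dropped, so the result is worth checking by eye for labels in non-Latin scripts.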

Thank you for your input Matthew!
