Bulk population of a new form with data from csv file

Central v 1.2 added functions for editing submissions and the first thing my team wanted to use this for was to provide a really simple and stupid system for management of data about students at our university. Essentially we needed

(1) about 10 people to be able to access the system from all around the world
[ODK = secure global access]

(2) to be able to open and add data / edit data or remove data from a student record
[Central v1.2 makes this possible]

(3) to set up some automatic reports based on R markdown
[easy enough, see here]

Almost immediately, someone said...

"We already have a lot of data about students in an excel / CSV format, can we upload all of this data in bulk?"

Various people have asked about this over the years and I've never had much reason to need to do it because editing wasn't possible, but v1.2 changes this.

The following is a highly alpha and fairly hacky solution that takes advantage of the ability of ODK Briefcase to push submission xml files to a central project. It uses R because that's my comfort ground, but could work similarly using python etc.

Also note that this includes some coding, but the time required even for longer forms is probably much less than manually entering a huge amount of records. It also removed the risk of data clerks introducing errors and need for double entry system.

I think that this is inteded to be used at initial setup (i.e. you create a form, populate it and then use ODK as normal to add new records and data)

I can't promise this won't brick your project, so recommend trying with something neutral in the first instance.

Procedure

For explanation of this approach, I'll use a really simple form design

Survey

Choices

I also have an existing database in CSV format

fig2

I start by making a single submission to the form on central via Enketo. I then download this to workstation using ODK Briefcase. The submission.xml file contains all we need to create new versions that are populated with data from csv.

Essentially we have an xml tag for each field and we can use R/Python etc to parse new data in to a proforma.

The instanceID appears to be a set of hex codes which we can generate randomly on the assumption that we'll never get duplicates of such a complicated code length.

R CODE

00_Prepopulate_Form.R.txt (2.8 KB)

Running this script does the following

  • Reads the database
  • Creates new random UUIDs for each line in database
  • Parses data from csv in to an xml proforma based on the example we downloaded with briefcase
  • Saves new xml files in correctly named instance subfolders

Which can then be uploaded using ODK Briefcase push

and these can be opened and edited on central

Initial version of the data for LaWayne

Edited version of the data for LaWayne

4 Likes

This is very helpful. I was about to post a question on this subject.

I have data in Excel format which needs to be uploaded to certain complex forms on a Central 1.2 server. I am not fully familiar with R, but the logic presented here looks super useful. I will give it a try.

Anyother thoughts and details will be highly appreciated.

@Syed_Muhammad_Qadeer
If you come up with a solution using python or similar please post an example here!