MySQL Data to ODK Aggregate

htuser · March 27, 2019, 11:54pm

Dear All,
I'm new to ODK, but I have been a CSPro application developer for 15 years.
We would like to run a longitudinal survey using an existing database stored in MySQL. In CSPro, it would be very easy to import and sync the data using Excel2CSPro.
Do you have or know of a solution for migrating the MySQL database to an ODK Aggregate server or to ODK Collect? Enumerators would use the data in ODK Collect for the survey.
Please let me know.
Thanks in advance for your response,

danbjoseph · March 28, 2019, 3:41pm

Welcome to the ODK forum, @htuser ! We're glad you're here. When you get a chance, please introduce yourself here. I'd also encourage you to add a real picture as your avatar because it helps build community!

Longitudinal data collection support in ODK is something that is on the roadmap:

github.com/getodk/roadmap

Entity-based data collection

opened 09:59AM - 26 Jun 18 UTC

closed 09:03PM - 13 May 23 UTC

admbtlr

See [the forum table of contents](https://forum.getodk.org/t/odk-ecosystem-longi…tudinal-data-collection-table-of-contents/22234) - https://github.com/getodk/central/issues/298 adds Datasets of Entities generated from form Submissions and attached to follow-up forms using the existing CSV mechanism.

<details>
<summary>2018 strawman proposal from @admbtlr </summary>

## User Stories

*As a health worker, I want to be able to collect a medical record every time a patient visits my health facility, so that I can keep track of the patient's progress over time*

*As a census taker, I want to visit a village every year and record population data*

*As a vaccine delivery driver, I want to keep track of the quantities of vaccines that I deliver to cold storage facilities during my weekly deliveries*

*As a regional vaccine administrator, I want to download CSV files that show the quantities of vaccine that have been delivered to all the cold storage facilities in my region over the last six months*

## Proposed Implementation

For the sake of this explanation, I'm going to use the following terminology:

- **Entity** refers to the thing about which data is collected. The kind of thing -- the "entity type" -- will depend on the use case. So in the above user stories, the entity types would be "patient", "village", "cold storage facility", "cold storage facility"
- **Record** refers to one round of data collection. So in the above user stories, a record would be
  1. the details of patient's visit to a health facility
  2. an annual set of population data for a village
  3. the quantities delivered to a health facility in a given week
  4. again, the quantities delivered to a health facility in a given week

The simplest solution is probably to have two separate forms, one to collect the details of an entity ("the Entity Form") and one to collect the details of each visit ("the Record Form"). A Record must have one (and only one) Entity associated with it. An Entity can have multiple Records associated with it.

### The Entity Form

Forms for creating entities must have a certain field (or fields) marked as an "identifying field". This would be for example a patient's name and DOB, or a village name and region, or a cold storage facility name and ID number. These identifying fields can then be used as labels in the CSV file that the Record Form uses to enable a data collector to choose the linked Entity.

Entity Forms can also have fields marked as "filter fields". These will be used to reduce the number of options shown in the list of Entities (see *Getting Entity lists onto devices* below).

### The Record Form

Forms for creating records must have one attribute called `entity_type_id`; this attribute can only contain the UUID of an Entity Form. They must also have one field called `entity_id`. This field should be of type `select_one_external` (see *Getting Entity lists onto devices* below).

### Getting Entity lists onto devices

The first question in a Record Form should be a selection of the associated Entity. This question should be of type `select_one_external`. The values will then be loaded into the form from an external CSV file that is downloaded from the server. The CSV file should have the following format:

```
list_name,name,label,<filter_field_1>,<filter_field_2>,...
entities,<instanceID>,<identifying field value>,<filter field 1 value>,<filter field 2 value>,...
entities,<instanceID>,<identifying field value>,<filter field 1 value>,<filter field 2 value>,...
...
```

[More](http://xlsform.org/#external) on external CSV files in X(LS)Forms.

These CSV files should be generated automatically by ODK Central, and updated every time a new Entity Form is submitted. It should then be possible to use the automatic form update functionality to keep the CSV file up to date.

_[Question: if a media file is updated - in this case the CSV - does that count as an updated form? or would ODK Central have to automatically make a new version of the form each time it updates the CSV file?]_

### Local Entities

A common use case is to create an Entity and then immediately create a Record for that Entity. In an offline scenario, this is not possible with the spec so far. It is therefore necessary to add a mechanism for adding Entities locally, within ODK Collect.

Every time the Entity form is completed, the data should be written to a local CSV file (or a local database?). There should then be a mechanism whereby the local CSV file is merged with the downloaded CSV file whenever the Record Form is opened.

It might make sense to clean up the local CSV file every time a new CSV file is downloaded from the server, but it's questionable whether this will be necessary (one reason: if an Entity is deleted on the server, it will still be in the local CSV and the merge will make it available in the form).

## Required Changes

### XForm Spec

- addition of concept of an Entity Form and a Record Form (not sure if this is totally necessary, but ODK Central will need to recognise an Entity Form so that it can do the automatic generation of CSV files)
- addition of identifying fields and filter fields

### ODK Central

- automatic generation of CSV files from Entity instances
- automatic form update after generation of CSV file (is this necessary?)
- a UI to enable display of Records by Entity

### ODK Collect

- ability to store a local Entity Instances CSV file and merge it with a downloaded CSV

## Additional Notes

### De-duplication of Entities

It would make sense to build some duplication detection and resolution into ODK Central. Ideally, it would only be possible to do data collection on entities that have come from Central, so that they will always have to go through this de-duping, but this is obviously not acceptable if I want to register a patient and then make a case report on them in a totally offline setting.

I could see a possible solution using a kind of tombstone for de-duped entities, so that a process might look like this:

- while offline, I register patient `dd6c32a4` using Form A
- `dd6c32a4` is now marked as "pending" on my device, which means I can submit case reports against it, but it's not on Central
- I then do a case report on `dd6c32a4` using Form B
- when eventually online, I submit both to Central
- it turns out that patient `dd6c32a4` is an exact duplicate of an existing patient, `19f44a40`, who already has case reports
- (more details about how exactly de-duping works here)
- my case report is switched to refer to the existing patient, `19f44a40`
- patient `dd6c32a4` is replaced in Central with a tombstone that refers to `19f44a40`
- all incoming case reports for `dd6c32a4` will be switched to refer to `19f44a40`
- once my device has updated its entity list, I will no longer be able to make a case report against `dd6c32a4`

For the specifics of the de-duping process, I would probably use a combination of approaches. First you need to find possible matches, probably using a trigram algorithm (or possibly Levenshtein distances) on identifying fields such as name, village, etc. There's a really good trigram module for Postgres. This is then combined with matches on other fields (e.g. date of birth or geopoint) to calculate a similarity score. You can then figure out values and say something like "if it's over 95%, just merge them automatically" and "if it's over 80%, flag them as probable dupes", and provide a simple interface that displays the data with yes/no buttons. I've done something like this for de-duping patient lists in DRC and it worked pretty well.

</details>
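The de-duplication scoring described at the end of the proposal can be illustrated with a rough Python sketch. It is not how ODK Central or the Postgres trigram module works: the trigram function is a simplified whole-string version, and the field names (`name`, `dob`) and the 0.7/0.3 weights are assumptions made up for the example. Only the 95% and 80% thresholds come from the proposal text.

```python
def trigrams(text):
    """Return the set of 3-character windows of a padded, lower-cased string."""
    # Padding (two spaces in front, one behind) loosely imitates common
    # trigram implementations; this is a simplification, not pg_trgm itself.
    padded = "  " + text.lower().strip() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def name_similarity(a, b):
    """Shared trigrams divided by all trigrams (Jaccard similarity, 0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def classify(candidate, existing, name_weight=0.7, dob_weight=0.3):
    """Combine name similarity with an exact date-of-birth match.

    The weights are hypothetical; the thresholds follow the proposal:
    over 0.95 merge automatically, over 0.80 flag as a probable duplicate.
    """
    score = name_weight * name_similarity(candidate["name"], existing["name"])
    if candidate["dob"] == existing["dob"]:
        score += dob_weight
    if score > 0.95:
        return "merge", score
    if score > 0.80:
        return "flag", score
    return "distinct", score


# Example: compare a newly registered patient with one already on the server.
new_patient = {"name": "Marie Dupont", "dob": "1990-01-01"}
known_patient = {"name": "Marie Dupont", "dob": "1990-01-01"}
label, score = classify(new_patient, known_patient)
print(label, round(score, 2))
```

In a real deployment the candidate pairs would more likely be found inside the database, and anything labelled `flag` would go to a person for a yes/no decision, as the proposal suggests.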

You might also take a look at ODK-X for some of the more advanced functionality that you're looking to implement.

htuser · March 28, 2019, 9:18pm

Thank you very much @danbjoseph. While waiting for longitudinal data collection support in ODK, I'm taking a look at ODK-X.
Best regards,

yanokwa · April 1, 2019, 5:17am

@htuser If return visits are not very frequent, you can get longitudinal behavior in ODK Collect. You can build a CSV with the data you want to reference in the form, attach it to the follow-up form, and then use pulldata() to access it.
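As a rough illustration of the CSV step, here is a minimal Python sketch that writes rows to a CSV with a lookup key in the first column. It assumes the rows have already been fetched from MySQL as dictionaries (for example with any DB-API driver). The function name `rows_to_pulldata_csv` and the column names `hh_id`, `head_name`, and `members` are hypothetical.

```python
import csv
import io


def rows_to_pulldata_csv(rows, key_column, out):
    """Write dict rows to `out` as CSV, with `key_column` as the first column.

    Raises ValueError on an empty export or a repeated key, since a lookup
    key that appears twice would make the lookup ambiguous.
    """
    rows = list(rows)
    if not rows:
        raise ValueError("no rows to export")
    columns = [key_column] + [c for c in rows[0] if c != key_column]
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    seen = set()
    for row in rows:
        key = row[key_column]
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen.add(key)
        writer.writerow(row)
    return columns


# Example rows, standing in for the result of a MySQL query.
rows = [
    {"hh_id": "H001", "head_name": "Marie", "members": 4},
    {"hh_id": "H002", "head_name": "Jean", "members": 6},
]
buf = io.StringIO()  # in practice: open("households.csv", "w", newline="")
rows_to_pulldata_csv(rows, "hh_id", buf)
print(buf.getvalue())
```

If the file were saved as `households.csv` and attached to the follow-up form, a calculation such as `pulldata('households', 'head_name', 'hh_id', ${hh_id})` could then look up the stored name for the household entered in the form. The CSV has to be regenerated and re-attached whenever the source data changes, which is why this approach suits infrequent return visits.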

htuser · April 1, 2019, 7:46am

Thank you @yanokwa. Since we need frequent updates, pulldata() isn't right for me. Another question: do you have a solution for parsing an ODK questionnaire into a MySQL-specific schema with multiple tables and columns? Please see this project as an example: https://github.com/ilri/odktools
Thanks in advance for your response.

yanokwa · April 1, 2019, 6:52pm

No, I'm not aware of such a solution.

htuser · May 3, 2019, 8:39pm

@danbjoseph, I can't find any way to use the ODK-X tools to upload data to ODK Aggregate. Do you have any ideas? Where are you with the longitudinal data migration feature?
Please let me know.
Thanks in advance,

danbjoseph · May 4, 2019, 6:09am

ODK-X is a separate set of data management tools. ODK Aggregate is not part of the ODK-X tools. You'll need to use ODK Survey, Tables, and Services. The ODK-X docs have a sample application to help understand the functionality of ODK-X. Check out the Getting Started User Guide: https://docs.opendatakit.org/odk-x/getting-started-2-user/

ekemeh · May 19, 2019, 8:31pm

Hi, my name is Eric and I am from Ghana. Good to be here. I am learning a lot here.