Joining Two Tables in Exported CSV

Hello All

Anyone here who can share the processes of joining two tables or help join the parent table to repeat table; I'm a bit lost.

Thanks in advance!

Hi,

The data set you attached includes peoples names. I advise you remove the attached excel from the site and replace it with a version without any personal information if necessary.

Normally there is a column in the second tab (the name can change depending on the exact system) which can be used to make the join. How you manage that join varies somewhat depending if the data inthe second tab can be aggregated using sums and counts etc. or not.

2 Likes

Hello,
This file version are without people names.
Can you create an example out of this file? I seem not to figure it out how to join the parent sheet to repeat sheet.

Thanks in advance!

Updated Raw Data.xlsx (447.2 KB)

You don't say what software you have available or what level of programming knowledge you have.

Probably the easiest way of doing this (without programming) is via a join in SQL. You need to join the tables on the "_parent_index" and "_index". I'm not able to upload an example file due to limitations of file types. You can do this in LibreOffice Base or SQLite after importing the csv files.

e.g.

SELECT "tbl_Updated_Raw_Data".*, "tbl_hh_repeat".* FROM "tbl_Updated_Raw_Data" JOIN "tbl_hh_repeat" ON "tbl_hh_repeat"."_parent_index" = "tbl_Updated_Raw_Data"."_index"

Personally I would use Pandas dataframes in Python, but it involves some coding.

Thank you, @Farmer_Tony! This helps, though I was looking for a more simpler approach. I'm using MS Excel. I'll try it out.

Thanks

Hi @Nana_Kwasi_Appiah! If you're using Excel, another option might be to use something like VLOOKUP(). See the following post for a description of what that approach could look like:

2 Likes

If you're using MS Office you can do the same thing in Access. I would recommend a database to manage your data if you can, rather than Excel. It's more extensible and much easier to analyse and report.

If you do want to use Excel take a look at INDEX, MATCH.

1 Like

Hi Nana,

I haven't sited exactly which variables you wanted to join back from the hh repeat to the main interview data set. But MS Excel has quite a number of simple and efficient functions to do that job.
My favorite is the index function which you can read about. Though it might take you some time to master.
I've use the simplest of them [vlookup] and have joined back some few variables and reattached your file.
-A few things to note about vlookup function:

  1. Its right handed; meaning what you are looking up should be positioned to the right of the unique identifier.
  2. it needs to specify the column number of the array that contains what you are looking up - we can achieve this by using another function like count or column or match to count for us to avoid manual counting. I've used match as its simple.

This would be enough:
=VLOOKUP($W3,hh_repeat!$A$3:$W$7033,MATCH(Ghana_Enumeration_17.12.19!AF$2,hh_repeat!$A$2:$W$2,0),0)
But when vlookup finds nothing, it returns a 0 (zero) yet this means something to data
Modify the formula as below to solve that;
=IF(VLOOKUP($W3,hh_repeat!$A$3:$W$7033,MATCH(Ghana_Enumeration_17.12.19!AF$2,hh_repeat!$A$2:$W$2,0),0)="","",VLOOKUP($W3,hh_repeat!$A$3:$W$7033,MATCH(Ghana_Enumeration_17.12.19!AF$2,hh_repeat!$A$2:$W$2,0),0))
Also, when what you're looking for is genuinely not there, vlookup will return #N/A which is not good looking to data.
Control for it as;
=IFERROR(IF(VLOOKUP($W3,hh_repeat!$A$3:$W$7033,MATCH(Ghana_Enumeration_17.12.19!AF$2,hh_repeat!$A$2:$W$2,0),0)="","",VLOOKUP($W3,hh_repeat!$A$3:$W$7033,MATCH(Ghana_Enumeration_17.12.19!AF$2,hh_repeat!$A$2:$W$2,0),0)),"")Updated Raw Data_with formulae.xlsx (598.9 KB)

Beyond excel, you can read about stata.
Hope this helps.
JosephKi?

1 Like

Great work @JosephKi and thanks for taking the time to look into this.

In reference to your questions: unique ID used is empty in the hh_repeat. WHY? I'm not sure why. That is just the output file I get after exporting from ONA as a CSV file.

Thanks brother, I'll take a critical look and build on your formulaes.

Best!

1 Like

My conviction is the process of exporting from ona. Is it possible you can pull from ona using briefcase?

First try pulling the submissions. Most likely, it'll come out differently but better with all its unique identifiers as well.

Regards.

Hello @JosephKi thanks for the suggestion. But for some reason it seems Briefcase (ODK Briefcase v1.17.3) cannot pull from ONA. None of the options (Aggregate Server and Central Server) seems to be working with ONA.

Maybe I'm missing something.

Best!

Hello Nana.

Sorry about that.

Unfortunately I haven't actually worked with ONA a lot - I have only received datasets extracted from there. And thus can't be of help in that respect.

Maybe the programmers here can guide regarding ONA and Briefcase compatibility.

And my suggestion is for you to go ahead and work with the data as they are. Good enough you had their the variable _index which is unique in both sets and could facilitate the merging process very well. You can also create other unique identifiers in case you're not contented with it - but the same uid has to identify corresponding observations in both files. So If the suggestion on the joining formula worked well, just go ahead.

Best.

JosephKi?

1 Like

Thank you Joe, for the support!
:handshake:

1 Like