TRUE/FALSE values instead of expected values

Hi,

When I download my data, some variables show TRUE or FALSE as values instead of the expected data type (e.g., integers, option choices).

For example, I want to know how many ex-convicts in a household currently have a job. However, in my raw data, this variable contains TRUE, FALSE, or NA instead of the expected numbers.

[Screenshot 2025-01-13 165339]

Could you help me understand why this is happening and how I can retrieve the correct information?

Thanks in advance!

Your screenshot looks like it's from Stata. What server are you downloading your data from? And how did you get it into Stata? Once we know those things, we can better advise.

Hi Yaw!

Thanks for your reply.

I'm using ODK Central. I load the data into Stata using this command:

import delimited "C:\Users\pc\IDinsight Dropbox\data.csv", clear

I tried adding the option bindquote(strict), but I still had the issue.

I also tried re-downloading the data; now I only have empty cells for the variables where I previously had TRUE/FALSE.

I'm not very familiar with Stata so this is a bit of a guess! I think this could mean that 444,365 of your responses did not include an answer to the question, 45 had an answer of 0 (interpreted by Stata as FALSE), and 30 had an answer with a value other than 0 (interpreted by Stata as TRUE).

What if you look at the raw CSV in Excel or a text file? Are you seeing 30 rows with numeric values other than 0? If so, there's something wrong with how Stata is either being told or inferring the type of that column. Are you defining a schema somewhere? Can you force the type to integer?
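One way to run that check without Excel or Stata guessing types is to tally the literal text stored in the column. Here is a minimal Python sketch, assuming the export is a UTF-8 CSV with a header row. The function name `count_raw_values` and the column name `n_employed` used below are placeholders, not names from your form.

```python
import csv
from collections import Counter


def count_raw_values(csv_path, column):
    """Tally the literal text found in one column, with no type inference."""
    counts = Counter()
    # utf-8-sig also accepts files that start with a byte-order mark.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            # Every cell is kept as a plain string, so "TRUE", "0", "2"
            # and "" (empty cell) are counted exactly as written in the file.
            counts[row[column]] += 1
    return counts
```

Calling `count_raw_values("data.csv", "n_employed")` returns a counter keyed by the exact cell text. If "TRUE" or "FALSE" show up as keys, they are already in the file and no import option will bring the numbers back.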

Thanks Hélène for your reply!

I do see the TRUE/FALSE values no matter how I open the data.
I also understand the same from the results (45 households with 0 and 35 households with a value different from 0), but I need to know which values are behind the TRUE to build my indicators (could be 1, 2, or 3).

Hi @Claire_Ricard Claire,

Echoing @LN's answer, as I have experienced this issue a few times myself.

  1. It seems like you're working with variables that are only filled in for a small number of observations in your whole dataset. In such cases, systems (and in particular Stata/R readers) often guess the variable type. When a column has lots of missing values, for some reason it is usually cast as logical (TRUE/FALSE). To resolve this, you may need to explicitly tell the system to treat these variables (and possibly others) as specific types. Personally, I load my entire datasets as text first to avoid this issue, and then I adjust the types as needed.

  2. You need to understand where the problem is:

  • either the problem is already present in your CSV file (i.e. the TRUE/FALSE values also appear when you open the CSV in Microsoft Excel). In that case there is nothing you can do at your level, and you need to go back to the people who exported the data from ODK and shared the dataset with you. If an intermediary step or software (for example an ODK API wrapper) was used to retrieve the ODK data and save it as CSV, that is probably what introduced the issue.
  • or the problem is not present in your CSV file (when opening it in Excel you can find the proper values). In that case you can avoid the issue by using stringcols or numericcols to force the columns listed in numlist to be read as string or numeric (see the Stata forum), e.g. to import the 1st, 2nd and 3rd columns as strings:

import delimited "C:\Users\pc\IDinsight Dropbox\data.csv", stringcols(1,2,3) clear

In any case, I highly doubt the issue comes from ODK itself, unless there is a major problem with how the form was designed.
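The "load everything as text first, then adjust the types" habit from point 1 can also be sketched outside Stata. This is only an illustration in Python, assuming a UTF-8 CSV with a header row and assuming that missing answers appear as empty cells or as NA. `load_as_text` and `to_int_or_none` are hypothetical helper names.

```python
import csv


def load_as_text(csv_path):
    """Read every cell as a string so no column type is guessed."""
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def to_int_or_none(text):
    """Convert one cell to an integer, keeping missing answers as None."""
    text = (text or "").strip()
    if text == "" or text.upper() == "NA":
        return None
    # int() raises ValueError on text such as "TRUE" or "FALSE", so an
    # export that already contains logical values fails loudly here
    # instead of being converted silently.
    return int(text)
```

The conversion is applied only to the columns you choose, for example `[to_int_or_none(r["n_employed"]) for r in load_as_text("data.csv")]`, where the file and column names are again placeholders.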

Hi Thalie,

Thank you for your reply.

When I opened the CSV in Excel, I noticed the TRUE/FALSE values. I downloaded the data again, and this time, the TRUE/FALSE values were gone (I only see NAs now). I also checked the archives of my previously downloaded datasets and found one that contains the actual values for my problematic variables. I'm not sure why or how the data differed during those downloads, but I can now combine the archived data with my most recent dataset to get the information I need.

Thank you all for your help! I’ll keep you updated if I uncover the reason behind this issue.

Hi @Claire_Ricard, could you clarify how you "download" the datasets? It would help to know: manually from ODK Central (by clicking the Download button), or through a different mechanism?

Hi Thalie,

Apologies for my late reply. I downloaded the data by clicking the "download" button on ODK Central.

After investigating further, I discovered that I can recover all the correct values when I download the data for shorter periods. This makes me wonder if Excel could be converting the values to TRUE/FALSE when the dataset is too large.

Fortunately, I was able to reconstruct my entire database with the correct values, so everything is resolved now.

Thank you all for your support!
