Help: CSV Data set from ODK Briefcase has string entries instead of numeric codes

Dear ODK Community,

I have successfully collected survey data using the ODK technology thanks
to information shared on the ODK website and the ODK Community.

However, I have started preparing my data set for analysis in Stata and
have found that the CSV files exported from ODK Briefcase contain the choice
names (strings) rather than numeric codes. For example, for gender, “male”
appears in the data instead of “1”, and “female” instead of “2”. I have just
realized that the data appear this way because of how I named the choices on
the “choices” sheet of my ODK form file. I thought numbers were not
acceptable in the ‘name’ column of the “choices” sheet, but I have since
noticed that numbers can be used; I missed that when I was designing my
form.

Because I am now faced with raw data that shows string entries instead of
numeric codes, I wanted to check whether anyone knows how to solve this in
ODK (or another, faster way to switch the data entries from strings to
numeric codes). The obvious options to me now are recoding the entries to
numeric values manually (a lot of work, and prone to errors) or importing
the data into Stata and recoding there, which is also a cumbersome fix.

Please let me know if this issue could be resolved in a simpler way.

Thanks,

Mike

Mike,

I'd write a Python script to process the CSV and do the find/replace.
I'm not a Stata expert, but there has to be a way to do a find/replace
of values. https://github.com/matthew-white/odkmeta might be able to
help.

Yaw
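
A minimal sketch of the find/replace approach suggested above, using only Python's standard csv module. The column name "gender", the codes 1 and 2, and the helper names recode_rows and recode_csv are assumptions for illustration; the real codebook has to list every coded column and choice name used in the form.

```python
import csv
import io

# Hypothetical codebook: column name -> {choice name: numeric code}.
# Extend this with every select_one column in the form.
CODEBOOK = {
    "gender": {"male": 1, "female": 2},
}


def recode_rows(rows, codebook):
    """Yield copies of rows with codebook columns replaced by numeric codes."""
    for row in rows:
        out = dict(row)
        for column, mapping in codebook.items():
            value = out.get(column)
            if value in ("", None):
                continue  # leave unanswered questions blank
            if value not in mapping:
                # Fail loudly rather than silently keeping an unmapped string.
                raise ValueError(f"unmapped value {value!r} in column {column!r}")
            out[column] = mapping[value]
        yield out


def recode_csv(src, dst, codebook=CODEBOOK):
    """Read CSV text from src and write the recoded CSV to dst."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(recode_rows(reader, codebook))


# Small in-memory demonstration with made-up data.
sample = io.StringIO("id,gender\n1,male\n2,female\n3,\n")
result = io.StringIO()
recode_csv(sample, result)
print(result.getvalue())
```

For real files, open both the input and the output with newline="" as the csv module documentation recommends. This sketch only handles single-choice columns; a select_multiple column, which holds several space-separated choice names in one cell, would need to be split before recoding.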

--
Need ODK consultants? http://nafundi.com provides form design, server
setup, professional support, and software development for ODK.

--

Post: opendatakit@googlegroups.com
Unsubscribe: opendatakit+unsubscribe@googlegroups.com
Options: http://groups.google.com/group/opendatakit?hl=en


You received this message because you are subscribed to the Google Groups
"ODK Community" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to opendatakit+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Yaw,
Thanks for your suggestions.
I will check with the odkmeta group.

Regards,

Mike

Hi Mike,

The easiest option may be for you to look into the Stata 'encode' command.

~lb
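
One point worth checking before relying on 'encode': by default it assigns codes 1, 2, 3, ... to the distinct strings in alphabetical order, so "female" would become 1 and "male" 2, the reverse of the coding asked for in the original question. Defining the value label first (with 'label define') and passing it to encode's label() option lets you control the codes. The Python sketch below only imitates the default alphabetical behaviour so the effect is easy to see; the function name encode_like and the sample data are made up for illustration, and it is not Stata's implementation.

```python
def encode_like(values):
    """Map each distinct non-empty string to 1, 2, ... in sorted order.

    Empty strings are treated as missing and returned as None
    (an assumption of this sketch).
    """
    distinct = sorted({v for v in values if v != ""})
    codes = {label: i for i, label in enumerate(distinct, start=1)}
    return [codes.get(v) for v in values], codes


gender = ["male", "female", "female", "", "male"]
coded, labels = encode_like(gender)
print(coded)   # [2, 1, 1, None, 2]
print(labels)  # {'female': 1, 'male': 2}
```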

Thanks for your suggestion, Lloyd. I think encode and other Stata commands
for generating new variables are my real options now.

Mike

As noted in the documentation, choice values are strings, which is why
they are surrounded by double quotes.

--
Mitch Sundt
Software Engineer
University of Washington
mitchellsundt@gmail.com