Jim VanPeursem TSC Application - 2019-01-10

Name
Jim VanPeursem (@jvp)

Organization
In transition from Africa to the USA

What contributions (e.g., issue triage, tech support, documentation, bug fixes) have you made to the ODK community?
I'm just getting started with ODK, so essentially none to date. I was based in Africa, using a competing commercial solution (Farmforce) to gather data on our agricultural interventions, employing under-educated youth using smartphones. This gave me insight into some of the unique challenges of community-based data gathering in East Africa.

How do you believe your contributions have benefited ODK?

What do you believe the top priorities for ODK are?
To me, a data collection tool such as ODK should optimize:
1) Robust and narrowly focused input types. One of the challenges of community-based data collection is the noise and bias introduced by the people collecting the data. The more focused the data entry types, and the clearer the correct way to record each field, the less of this type of error creeps in.
2) Efficient sync, so that remote data collection and subsequent syncing are as low-cost and efficient (in time and battery) as possible.
3) Human consumption of the collected data. Simple export is great, but one way to enhance the utility of the platform would be to offer the ability to join data sets (e.g. combine a prior community demographic data set with a health data set to analyze health by age, gender, etc.), simple rule-based ETL (extract, transform, load) steps, etc., to make consumption and analysis of multiple data sets easier and more powerful.
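To make the data-set joining idea in point 3 concrete, here is a minimal Python sketch, assuming two hypothetical form exports that share a `person_id` field. The records, the field names (`gender`, `age`, `vaccinated`) and the helper functions are illustrative assumptions, not part of ODK. It performs an inner join on the shared ID and then computes a simple rate per gender; grouping by an age band would work the same way.

```python
from collections import defaultdict

# Hypothetical records from two separate forms; names and values are illustrative only.
demographics = [
    {"person_id": "P1", "gender": "female", "age": 34},
    {"person_id": "P2", "gender": "male", "age": 51},
    {"person_id": "P3", "gender": "female", "age": 8},
]
health = [
    {"person_id": "P1", "vaccinated": True},
    {"person_id": "P2", "vaccinated": False},
    {"person_id": "P3", "vaccinated": True},
]

def inner_join(left, right, key):
    """Merge records that share the same key; rows without a match are dropped."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

def rate_by(rows, group_field, flag_field):
    """Fraction of rows where flag_field is true, for each value of group_field."""
    tallies = defaultdict(lambda: [0, 0])  # group -> [true count, total count]
    for row in rows:
        tally = tallies[row[group_field]]
        tally[0] += bool(row[flag_field])
        tally[1] += 1
    return {group: hits / total for group, (hits, total) in tallies.items()}

joined = inner_join(demographics, health, "person_id")
print(rate_by(joined, "gender", "vaccinated"))  # {'female': 1.0, 'male': 0.0}
```

An inner join is used so that people missing from either form are left out of the analysis rather than counted with incomplete data.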

How will you help the ODK community accomplish those priorities?
In my career, I have served in numerous technical leadership positions. I have also had significant standards leadership experience, working with many industry leaders to create and shape the initial Java-for-mobile platform (J2ME) in its formative years. More recently I spent 2 years in Africa working for a commercial company, introducing automation and remote data collection, and integrating our farm management system with our banking partner's core banking system, offering truly end-to-end loan origination and payoff automation. I am hoping that the diversity of my experience can help the continuing evolution of ODK into an even more powerful tool for its users.

How many hours a week can you commit to participating on the TSC?
4+ during our transition back to the States. More once settled.

What other mobile data collection projects, social good projects, or open source projects are you involved with?
Most recently, I used Farmforce as a farm management system in my role as CTO of GAFCo, the Great African Food Company (https://www.greatafricanfood.com/). GAFCo was a commercial entity, partnering with World Vision and Vision Fund (microfinance) to connect poor farmers to the global food market, in an effort to improve their incomes. Earlier in our formation, we used iFormBuilder as our initial data collection platform. Earlier in my career, I was the chair of Motorola's Open Source Review Board, the body that helped shepherd proprietary source code toward external open source publication, as well as the inbound use of open source in our commercial products, within the license obligations of each project.

Please share any links to public resources (e.g., resume, blog, Github) that help support your application.
https://www.linkedin.com/in/jimvanpeursem/
Unfortunately my blog needed to be shut down toward the end of our tenure in Tanzania as a result of new laws cracking down on free speech.

Hi @jvp,
a couple of questions for you:

What kind of extra checks would you implement apart from the existing ones? Constraints and relevance conditions are already in place and can also contain quite complex functions or regex.

Did you experience problems syncing with Collect?

ETL happens after data collection occurs. There are some free, open source ETL tools. I personally use Pentaho Data Integration (PDI, also known as Kettle). I connect it to my MySQL database and apply the transformations I need to get the metrics that matter for checking the quality of my dataset.
The kind of transformation each user needs, and the way to present the data, is not the same for everyone, so how would you choose a "standard" way for users?
Wouldn't it be better to offer an export that can then be consumed by many different tools to get the result needed?


Hi @jvp! Thanks for your application.

I am curious what brings you to ODK at this point in your career? All you've told us is that you're "Just getting started". What are your plans for using or contributing to ODK besides joining the TSC?

Echoing @aurdipas above, I'm curious about "Robust and narrowly focused input types."
I would think some of this would be good survey design and proper use of the functionalities ODK provides. The best knives in the world still need a good chef to wield them. Do you have ideas for technical improvements to ODK that could help meet this priority?

Wow, sorry for the late reply. We went on a trip to find a new home and I left my normal laptop back home. Sorry about that.

The primary reason for raising this point is not a critique of ODK as it stands today, so much as the desire for a continued focus on improving the accuracy of the data collected. ODK already has powerful form logic and constraints, but the more a form designer can do to minimize in-field data errors, the better. When analyzing data, the task of filtering outliers and normalizing data can be challenging. How do you know whether an outlier is a valid outlier or an invalid data point? Being vigilant about building in methods to (more easily) constrain data input is a worthy goal.

No. However, based on my experience with two other data collection frameworks, we had significant problems in the field due to the inefficiency (and lack of error resilience) of their sync protocols.

True. ETL definitely happens after the data collection occurs, and is a back-end task. One of my first impressions of ODK, however, is that there's a fair amount of setup involved and it isn't for the faint of heart. Easing the setup and data analysis might make it more popular with newcomers and the non-techie crowd. For example, one of the simple analysis steps that clients have asked me for is simply replacing the integer values returned by a pick-list with their text labels. This first level of simple ETL could make ODK easier for users on the simpler end of the spectrum.

As an alternative, ODK could offer suggestions for, and pre-built integration with, some of the existing ETL frameworks out there.
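The pick-list relabeling step mentioned above is small enough to sketch. This minimal Python example assumes a CSV export where a select-one column holds the choice `name` codes (`"1"`, `"2"`, ...) and a hypothetical lookup built from the form's choices sheet. The column name `main_crop`, the choice list, and the `relabel` helper are illustrative assumptions, not an existing ODK feature.

```python
import csv
import io

# Hypothetical lookup mirroring an XLSForm "choices" sheet: list_name -> {name: label}.
CHOICES = {
    "crop": {"1": "Maize", "2": "Beans", "3": "Sunflower"},
}

# Hypothetical mapping from exported column to the choice list it uses.
COLUMN_LISTS = {"main_crop": "crop"}

def relabel(rows, column_lists, choices):
    """Return copies of rows with coded pick-list values replaced by text labels.

    Codes missing from the lookup are kept as-is so bad entries remain visible.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        for column, list_name in column_lists.items():
            if column in new_row:
                code = new_row[column]
                new_row[column] = choices[list_name].get(code, code)
        out.append(new_row)
    return out

# A tiny stand-in for an exported CSV file.
export = io.StringIO("farmer_id,main_crop\nF01,1\nF02,3\nF03,9\n")
rows = list(csv.DictReader(export))
for row in relabel(rows, COLUMN_LISTS, CHOICES):
    print(row)
```

Unknown codes (like `9` above) are passed through unchanged rather than dropped, so a data-entry error stays visible in the output instead of silently disappearing.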

This is part of the TSC questionnaire that's missing. "Why do you care" or "What experience do you have". After a career in tech development, I switched direction in 2016 and decided to start giving back. I spent two years in Africa volunteering as CTO for a medium-sized startup (~180 people). As part of this, we used two different (competing) data collection frameworks, utilizing kids in the communities to collect data on our efforts and the people we were working with.

When the end of that project was nearing, I was looking for ways to leverage my experience and continue to contribute. I contacted Yaw about ODK and decided to throw my hat in the ring. I admit I'm very new to ODK, and probably don't have enough direct experience yet, but that's just how the timing worked out. I plan to also start contributing code-wise when we get a bit more settled.

I am 100% comfortable with either winning or losing this ballot. My ego is not tied to this at all. I'm just looking for ways to help out, and am flexible, regardless.

True, yet what if you don't have a chef on your team, or anyone with any culinary training? In Tanzania, our biggest challenge was the education of the staff, especially those out in the bush. It is rare to find anyone who finishes their secondary education. The only way we succeeded was by: 1) keeping the surveys as constrained as possible to reduce errors, and 2) lots of hands on training and practice again and again and again. Even with that, there was still a significant effort required in scrubbing the data before it could be used.
