Basic Data Analysis with R (Scripts, Markdown & Dashboard)

During the @TAB meeting on 2021-05-12 we discussed how to perform basic analysis in R, free software for statistical analysis, visualisation and report generation.

R is incredibly powerful, and new users should take a look at these resources for a more in-depth view of how to get started.

You'll need to install a copy of R and the RStudio GUI in order to use the examples below.

You'll also need this toy ODK data set in CSV format.
data.csv (570.1 KB)

What people want to do with their ODK data is pretty varied, but in the first instance most people will want some summaries and a couple of bar charts.

Here I provide a basic example of how to do this in R.

There are two scripts that you need to download, both in this zip file:

R_Analysis_Scripts.zip (2.1 KB)

The easiest way to work with these is to open RStudio, then click the File menu and select New Project. Save the project in a new folder, then open that folder. Paste the data set and the scripts you downloaded above into this folder, then click the .Rproj file to open the project.

In the RStudio File menu you can then use Open File to load the scripts.

To run them, simply press the Source button.

Basic Analysis

For most people, the Basic.Data.Summaries.in.R file provides a useful starting point. This script will:

  • Open the data set
  • Select some useful variables
  • Save a summary table to the R project folder
  • Save another summary, split by a user defined variable
  • Save a barchart

Armed with a bit of basic knowledge from the training resources above, you'll be up and running with your own analysis pretty quickly.
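The steps listed above can be sketched in base R roughly as follows. This is not the downloaded script, just an illustration: the toy data frame, its column names (age, sex, village), the output file names and the use of tempdir() are all assumptions made so the example runs on its own.

```r
# Toy stand-in for the ODK export; the column names are hypothetical.
# With the real file you would start from: df <- read.csv("data.csv")
df <- data.frame(
  age     = c(23, 35, 41, 29, 52, 38),
  sex     = c("Female", "Male", "Female", "Female", "Male", "Male"),
  village = c("A", "A", "B", "B", "C", "C"),
  notes   = c("", "", "", "", "", "")
)

# Select some useful variables
df <- df[, c("age", "sex", "village")]

# Overall summary: number of records and mean age
overall <- data.frame(n = nrow(df), mean_age = mean(df$age))

# The same summary, split by a user-defined variable
group_var <- "sex"
by_group <- aggregate(df["age"], by = df[group_var], FUN = mean)
names(by_group)[2] <- "mean_age"

# Save both tables (tempdir() here; use your project folder in practice)
out_dir <- tempdir()
write.csv(overall, file.path(out_dir, "summary_overall.csv"), row.names = FALSE)
write.csv(by_group, file.path(out_dir, "summary_by_group.csv"), row.names = FALSE)

# Save a bar chart of record counts per group as a PDF
pdf(file.path(out_dir, "barchart.pdf"), width = 6, height = 4)
barplot(table(df[[group_var]]), main = "Records by group", ylab = "Count")
invisible(dev.off())
```

Changing group_var to another column name (for example "village") splits the summary and the chart by that variable instead.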

R Markdown

Many users want to share a summary report with their stakeholders. R Markdown is a nice way to make reports in HTML or PDF format. You can also put these online as a form of basic dashboard.

The Basic.Data.Summaries.in.Rmd script does the same things as the basic analysis script, but makes a PDF report. You'll see that the key difference here is that you can window dress the documentation with text, show the code that was used to create the tables and charts and other things.

Simply open the .Rmd file in RStudio and click Knit to create the PDF in your R project folder.

R Markdown is very versatile and can export a whole bunch of different formats.
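The Knit button has a console equivalent in rmarkdown::render(), which is handy if you want to rebuild a report from a script. Here is a minimal sketch, assuming the rmarkdown package and pandoc are available (RStudio bundles pandoc). The tiny .Rmd written here is a hypothetical stand-in, not the file from the zip, and it renders to HTML because pdf_document additionally needs a LaTeX installation.

```r
# Build a tiny stand-in .Rmd (hypothetical content) in a temporary folder.
fence <- strrep("`", 3)  # the three-backtick chunk delimiter
rmd_path <- file.path(tempdir(), "example_report.Rmd")
writeLines(c(
  "---",
  'title: "Example report"',
  "output: html_document",
  "---",
  "",
  "Some text describing the analysis.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmd_path)

# Render only if the tools are present; otherwise out_file stays NA.
out_file <- NA_character_
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_path, output_format = "html_document",
                                quiet = TRUE)
}
out_file
```

With your own report you would pass the path to your .Rmd file instead of rmd_path; the output_format argument overrides whatever the file's header says.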

Dashboards

There are lots of ways to make interactive dashboards in R. Some are very sophisticated, but are also a bit harder to use. Perhaps the simplest way is to use R Markdown to create an HTML page or site.

Open the Basic.Data.Summaries.in.Rmd file and change line 3
from
output: pdf_document

to

output: rmdformats::readthedown

This will create an attractive HTML report that can be served from pretty much any website (I generally use github.io sites for this, see example here). Note that the rmdformats package needs to be installed for this output format to work.

Hopefully this post will provide a starting point for people who have so far used Excel for their analysis. Using a professional package for statistical analysis can seem daunting and does come with a pretty steep learning curve, but it is worth it in the long run.

Example output files
Example_Outputs.zip (1.0 MB)

Great!

Will check this out.

Paul

Great resources, thanks for posting these!

To get data directly from ODK Central into R, you can use the R package ruODK.

Installing R, RStudio (or any other IDE), ruODK and its dependencies can be a hassle on corpo-bricked Windows machines. urODK, the companion package to ruODK, provides a pre-built RStudio Server at Binder.

ruODK provides a template RMarkdown workbook with the basic "download and parse my data" workflow:

rmarkdown::draft("my_example.Rmd", "odata", package="ruODK")

The workbook contains instructions to configure ODK Central credentials, to download and parse your own data, and some initial insight and visualisations as a starting point for your own analysis.

In addition to the resources posted by @chrissyhroberts, R for applied epidemiology and public health looks useful.

Spatial data needs some special (spatial?) steps to visualise. Follow the worked examples of the ruODK vignette "Spatial data" to go from the three ODK types (geopoint, geotrace, geoshape) to native spatial objects in R.

If you write a paper or publication using ruODK, you can find the citation for ruODK here and for ODK in this thread.

So this means that if you don't know how to use R, you cannot have a dashboard from ODK?

Hi @Dushime

This is just one option of many for creating dashboards.

You could also use Power BI, HTML, Python/pandas, or a very basic Excel, Word or PDF document on a GitHub page.

I prefer R because it is open source, relatively easy to learn and has integrated pipelines for turning an analysis into an HTML page as described above.

Good,
I am using Excel, and the challenge I am facing is that instead of getting labels I am getting the coded values. For example, if the variable sex is coded as 1=Female and 2=Male, Excel displays the data as 1 and 2 instead of Female and Male.
Do you know how to correct that in Excel, since that is what I am using?

You should probably make that change in your ODK form. Why save data in the wrong format in the first place? My opinion is that you should always save descriptive values in your data frames, so biological sex in your example should be MALE or FEMALE.

In Excel, I think you are stuck with using find and replace, which is why you should never do analysis in Excel. Too much scope for human error. Investing in learning how to use a scripted language for data analysis will benefit you in so many ways.
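For comparison, this is roughly what the same recoding looks like in R, using base R's factor(). It is only a sketch: the data frame and the column name sex are made up to match the 1=Female, 2=Male example above.

```r
# Hypothetical export where sex was stored as 1/2 codes
df <- data.frame(id = 1:5, sex = c(1, 2, 2, 1, 1))

# Map codes to labels; any code not listed in `levels` becomes NA
df$sex <- factor(df$sex, levels = c(1, 2), labels = c("Female", "Male"))

# Count records per label
table(df$sex)
```

Because the mapping lives in the script, it is applied the same way every time the data is re-exported, which is the main advantage over find and replace.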

This excellent open source book is a great way to get started with R. The learning curve is quite steep, but I promise you it is worth it if you are planning to spend much of your life working with data.

https://r4ds.had.co.nz/
