Xlsformstatalabel – ODK variable and value labelling in Stata

Here is a packaged xlsformstatalabel with a Stata command (xlsformlabel) that reads your ODK XLSForm and generates variable and value labels based on the form. You can directly include this in your data management code

Usage:

import delimited "baseline_data.csv", clear
xlsformlabel using "baseline_form.xlsx", formname(baseline) case(lower) capture do

Key features: variable & value labels from XLSForm, select_one and select_multiple support, case control (lower/upper/preserve), group prefix, and note variable removal.

Installation:

net install xlsformstatalabel, from("https://raw.githubusercontent.com/samwel2000/xlsformstatalabel/master/")

A note on scope: The package labels variables and values exactly as defined in your XLSForm, it's up to the user to ensure variable names are Stata-compliant, repeat groups are merged, and any other data management is handled on their side.

A colleague ran into a challenge: their form had non-compliant variable names (exceeding Stata's 32-character limit), and the labelling failed. For that case, I wrote additional code to rename variables to compliant names, update the XLSForm accordingly, and then run xlsformlabel. Sharing it here in case others hit the same issue:

import delimited "$path\Raw data\DQA_Early_Learning_v2.csv", case(preserve) bindquote(strict) clear

// Clean up long variable names to Stata-compliant 32-char varnames
qui desc v*, varlist
local label_vars ""
local new_label_vars ""
foreach var of varlist `r(varlist)' {
    local lbl : variable label `var'
    local label_vars "`label_vars' `lbl'"
    local newlbl = subinstr("`lbl'", "_", "", .)
    local newlbl = substr("`newlbl'", 1, 32)
    local new_label_vars "`new_label_vars' `newlbl'"
}
rename (`r(varlist)') (`new_label_vars')
tempfile dataset
save `dataset', replace

// Update the XLSForm to reflect new names
import excel "$path\Tools\DQA_Early_Learning_v2.xlsx", ///
    sheet("survey") case(lower) firstrow clear
local n : word count `label_vars'
forvalues i = 1/`n' {
    local old : word `i' of `label_vars'
    local new : word `i' of `new_label_vars'
    replace name = "`new'" if name == "`old'"
}
replace labelenglishen = subinstr(labelenglishen, `"""', "", .)
export excel "$path\Tools\DQA_Early_Learning_v2.xlsx", ///
    sheet("survey", replace) firstrow(variables)

use `dataset', clear

// Install and apply labels
which xlsformlabel
if _rc {
    net install xlsformstatalabel, from("https://raw.githubusercontent.com/samwel2000/xlsformstatalabel/master/")
}
xlsformlabel using "$path\Tools\DQA_Early_Learning_v2.xlsx", ///
    savepath("$path\Dofiles") formname(Early_Learning) ///
    labelcolumn(labelenglishen) capture noteremove groupremove do

More details on the GitHub repository. https://github.com/samwel2000/xlsformstatalabel

2 Likes