Here is a packaged xlsformstatalabel with a Stata command (xlsformlabel) that reads your ODK XLSForm and generates variable and value labels based on the form. You can directly include this in your data management code
Usage:
import delimited "baseline_data.csv", clear
xlsformlabel using "baseline_form.xlsx", formname(baseline) case(lower) capture do
Key features: variable & value labels from XLSForm, select_one and select_multiple support, case control (lower/upper/preserve), group prefix, and note variable removal.
Installation:
net install xlsformstatalabel, from("https://raw.githubusercontent.com/samwel2000/xlsformstatalabel/master/")
A note on scope: The package labels variables and values exactly as defined in your XLSForm, it's up to the user to ensure variable names are Stata-compliant, repeat groups are merged, and any other data management is handled on their side.
A colleague ran into a challenge: their form had non-compliant variable names (exceeding Stata's 32-character limit), and the labelling failed. For that case, I wrote additional code to rename variables to compliant names, update the XLSForm accordingly, and then run xlsformlabel. Sharing it here in case others hit the same issue:
import delimited "$path\Raw data\DQA_Early_Learning_v2.csv", case(preserve) bindquote(strict) clear
// Clean up long variable names to Stata-compliant 32-char varnames
qui desc v*, varlist
local label_vars ""
local new_label_vars ""
foreach var of varlist `r(varlist)' {
local lbl : variable label `var'
local label_vars "`label_vars' `lbl'"
local newlbl = subinstr("`lbl'", "_", "", .)
local newlbl = substr("`newlbl'", 1, 32)
local new_label_vars "`new_label_vars' `newlbl'"
}
rename (`r(varlist)') (`new_label_vars')
tempfile dataset
save `dataset', replace
// Update the XLSForm to reflect new names
import excel "$path\Tools\DQA_Early_Learning_v2.xlsx", ///
sheet("survey") case(lower) firstrow clear
local n : word count `label_vars'
forvalues i = 1/`n' {
local old : word `i' of `label_vars'
local new : word `i' of `new_label_vars'
replace name = "`new'" if name == "`old'"
}
replace labelenglishen = subinstr(labelenglishen, `"""', "", .)
export excel "$path\Tools\DQA_Early_Learning_v2.xlsx", ///
sheet("survey", replace) firstrow(variables)
use `dataset', clear
// Install and apply labels
which xlsformlabel
if _rc {
net install xlsformstatalabel, from("https://raw.githubusercontent.com/samwel2000/xlsformstatalabel/master/")
}
xlsformlabel using "$path\Tools\DQA_Early_Learning_v2.xlsx", ///
savepath("$path\Dofiles") formname(Early_Learning) ///
labelcolumn(labelenglishen) capture noteremove groupremove do
More details on the GitHub repository. https://github.com/samwel2000/xlsformstatalabel