Using ODK to collect species and habitat localities, as well as pressures and threats to ecosystems

Primary Topic / Field of Application

Ecology, Nature conservation

Context

Who we are

The "Conservatoire d'espaces naturels du Languedoc-Roussillon" is a nature conservancy NGO based in Montpellier in the South of France.
Our team of around 30 people includes ecologists, naturalists, agro-ecology specialists, project managers, administrative staff and GIS administrators, working at 7 sites in the region and managing approximately 12,000 ha (about 30,000 acres). In September we will merge with our neighbors from Midi-Pyrénées to fit the boundaries of our new region, and together we will become the Conservatoire d'espaces naturels d'Occitanie, employing 60 people.
At the national level there are 30 Conservatoires d'espaces naturels employing around 1,000 people, federated under the "Fédération des C.E.N." and its dedicated team.

Traditionally, naturalists and ecologists write their field notes in a paper notebook, and then manually transcribe their data into a computer when they are back in the office, via a custom web tool, spreadsheet, or GIS file. With spreadsheets and GIS files, a further operation is then required to consolidate the data into our central GIS database.

As a consequence, it could take a month before the data was available to other colleagues for analysis. Approximately 10 years ago we developed a dedicated web interface for data entry, which saved considerable time by removing the need for action by a GIS engineer, and made data available as soon as it was entered.
We also had some prior experience (from 2007) around collecting data with a PDA and Arcpad, which was particularly interesting because it effectively made this data available in real time.

Since 2006, our Geographical Information System (GIS), located in Montpellier, has been organized around a central PostgreSQL/PostGIS database connected to several open-source tools: QGIS, Lizmap, JasperStudio, Redash, and ODK Aggregate.
ODK and Redash are the most recent tools we added to our IT infrastructure.

At the regional level, 25 colleagues and direct partners are now using ODK against our database. @nathalie_H and I are presently the principal form designers on the team.

This was our presentation at FOSS4G-fr 2018:

16mai_Cauchy_Bossaert-CENLR_0.pdf (3.0 MB)
(in it you can see our colleagues out in the field and the web tools we use with ODK).

How we met ODK

In 2015, a colleague from another region (@Remy_CLEMENT) showed us how they used GeoODK to help field technicians report their field work (grazing, tree cutting...).
A year later we discussed creating a dedicated form for our common naturalist database. We spent two days in Lyon creating the form and writing SQL queries to pull data directly from ODK Aggregate into our own database.
Two weeks later, 2 colleagues became beta testers and subsequently adopted our tool. They estimated that the form saved them 5 days per person; I had spent five days creating the tool.

In 2016 @Remy_CLEMENT and I gave our first course to colleagues from other regions. This blueprint has evolved into a standard course offering, which we have now run four times for approximately 25 people from National Parks, CEN, botanical conservatories, regional parks, and various NGOs. The course is divided into two parts: 1.5 days on form design, and 1.5 days on installing ODK Aggregate and creating SQL views and triggers to interact with other databases.

Why We Use ODK for Mobile Data Collection

Our field season is quite dense, with long days and little time to spend in front of a computer, so computer work was typically postponed to the end of the field season (not our most fun time of the year...). ODK gave us a good way to transform this previously unrewarding computer time into more actual days in the field, and more time for interesting data analysis and report writing.

Our days in the field are long and spent in poorly connected areas, so our mobile data tool had to allow offline work and provide a stable, trusted storage system. The tool also had to be as easy to use as possible and had to constrain user input so that the acquired data was as reliable as possible (e.g. controlled vocabularies and defined input types). The mobile devices themselves had to offer good battery life in the field and be protected against water and dust.

Form users

Our ODK users are mainly colleagues and specialists in a naturalist domain (plants, animals, mushrooms, natural habitats), but for some studies they may be farmers, wine growers, etc.

Form logic

The form described in this Showcase is our main form, initially created in 2016. The first version allowed users to collect basic information about species and habitats. Each subsequent revision (in 2017, 2019 and 2020) improved upon it by adding more adaptive questions and choice lists according to the observations. This year we added three new features (described here), but the 2019 version probably represents our most improved version, and is the result of Jean Baïsez's university work.

Our form is used as a notebook for collecting species and habitat data, and it can also record threats or pressures on nature as well as management advice. All these data types are geo-located using ODK geo widgets and may additionally be documented with pictures taken with the phone.
Here is a logical schema of the form.

Tips and tricks

within the form...

1. Choice lists from a large external CSV file

Our reference taxonomic list comes from the National Natural History Museum.
This file contains more than 400,000 taxa for animals, plants, mushrooms, etc.
So we had to find a way to easily find the species we want in this big list.
We use the search() function in combination with 'startswith'.

| type | name | label | hint | constraint | calculation | required | appearance | default |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| text | recherche_espece_animale | Nom de l'espèce animale : | au moins 3 lettres | string-length(${recherche_espece_animale})>2 | | yes | | |
| select_one list_espece | lb_nom_animalia | Sélectionnez l'espèce : | | | | yes | quick search('espece_animale', 'startswith', 'lb_nom_key', ${recherche_espece_animale}) | |
| calculate | cd_nom_animalia | | | | pulldata('espece_animale','cd_nom_key','lb_cd_nom_key',${lb_nom_animalia}) | | | |
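
The three rows above (typed text, filtered select, then lookup) can be mimicked outside Collect, for instance to check a CSV file before deploying it. This is only a rough Python sketch, not how ODK implements search() or pulldata(): the sample rows, the helper names and the case-insensitive matching are all assumptions made for illustration.

```python
# Invented sample rows standing in for the 'espece_animale' CSV media file.
ESPECE_ANIMALE = [
    {"lb_nom_key": "Phoenicurus ochruros", "lb_cd_nom_key": "key-1", "cd_nom_key": "id-1"},
    {"lb_nom_key": "Phoenicurus phoenicurus", "lb_cd_nom_key": "key-2", "cd_nom_key": "id-2"},
    {"lb_nom_key": "Erithacus rubecula", "lb_cd_nom_key": "key-3", "cd_nom_key": "id-3"},
]


def search_startswith(rows, column, text, min_length=3):
    """Rows whose `column` starts with `text`; nothing until `min_length` letters are typed."""
    if len(text) < min_length:  # mirrors the constraint string-length(...) > 2
        return []
    prefix = text.lower()  # assumption: matching ignores case
    return [row for row in rows if row[column].lower().startswith(prefix)]


def pulldata(rows, wanted_column, key_column, key):
    """Value of `wanted_column` in the first row where `key_column` equals `key`, else ''."""
    for row in rows:
        if row[key_column] == key:
            return row[wanted_column]
    return ""


matches = search_startswith(ESPECE_ANIMALE, "lb_nom_key", "pho")
print([row["lb_nom_key"] for row in matches])
print(pulldata(ESPECE_ANIMALE, "cd_nom_key", "lb_cd_nom_key", "key-1"))
```

The minimum length matters with a list of several hundred thousand taxa: one or two letters would return far too many rows to display comfortably.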

2. Choice list styling

The form highlights the official species name in the species list, and shows its synonym in another format.

3. Personalized settings and metadata

The form can be easily customized or personalized; for example, I may only want to use the form for collecting plant locations (lat+long). Each digit from 0 to 9 represents a different setting: 1 to 6 for the thematic sub-forms, and 7 to 9 for geo-location types.

  1. animals
  2. plants
  3. mushrooms
  4. natural habitats
  5. threat / pressure
  6. general observation
  7. point
  8. line
  9. polygon

Such personal settings do not yet exist in ODK, so we found a workaround using the phone number from Collect's personal settings.
If the phone number in Collect's personal settings is not set, we initialize it with an "all options" combination, "0123456789":

| type | name | calculation |
| --- | --- | --- |
| phonenumber | phonenumber | |
| calculate | custom_setting | coalesce(${phonenumber}, '0123456789') |

To implement this customization, we use a test in the choice_filter or relevant column of the associated group or field. For example, the choices offered by the geo-location method select_one are filtered from the settings with this test:

contains(${custom_setting},filter)

and the choices sheet looks like this:

| list_name | name | label | filter |
| --- | --- | --- | --- |
| metode_geo | point | point | 7 |
| metode_geo | long_lat | coordinates input | 7 |
| metode_geo | line | line | 8 |
| metode_geo | polygon | polygon | 9 |
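
To see which choices a given phone number would leave visible, the filter can be replayed in a few lines of Python. This is a sketch of the logic only, under the assumption that an unset phone number reaches the form as an empty value; the function names are made up.

```python
# The choices sheet above, as (name, filter) pairs.
METODE_GEO = [
    ("point", "7"),
    ("long_lat", "7"),
    ("line", "8"),
    ("polygon", "9"),
]


def custom_setting(phonenumber):
    """coalesce(${phonenumber}, '0123456789'): fall back to 'all options' when empty."""
    return phonenumber if phonenumber else "0123456789"


def visible_choices(phonenumber):
    """Names of the choices kept by contains(${custom_setting}, filter)."""
    setting = custom_setting(phonenumber)
    return [name for name, choice_filter in METODE_GEO if choice_filter in setting]


print(visible_choices(""))     # no phone number set: every choice stays visible
print(visible_choices("27"))   # plants (2) with point geometries (7)
```

Because each setting is a single digit, the order of the digits in the phone number does not matter.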

4. Form adapts to the observation type

The form adapts its input fields according to the type of the observation; specifically, plant descriptions differ from animal descriptions. For example, it is not relevant to ask for the behavior (French: 'comportement') of a plant, but it is for animals. Drilling down deeper, behaviors can vary between birds, spiders, amphibians...

| type | name | label | relevant | choice_filter |
| --- | --- | --- | --- | --- |
| select_one comportement | comportement | Comportement | ${type_observation} = 'Animalia' and ${groupe} != '' | (filter = ${groupe}) or (filter = 1) |

where the "choices" are:

| list_name | name | label | filter |
| --- | --- | --- | --- |
| comportement | 18-Nid vu avec un adulte couvant | 18-Nid vu avec un adulte couvant | Oiseaux |
| comportement | construction toile | construction toile | Arachnides |

The ${groupe} field is calculated from the selected species, using the CSV media files:

| type | name | calculation |
| --- | --- | --- |
| calculate | groupe | concat(pulldata('espece_animale', 'groupe', 'cd_nom_key', ${cd_nom_animalia}), pulldata('espece_plante', 'groupe', 'cd_nom_key', ${cd_nom_plantae}), pulldata('espece_champi', 'groupe', 'cd_nom_key', ${cd_nom_fungi})) |

Here we check the 3 CSV species lists to find the taxonomic group. This could certainly be improved: instead of reading all three CSV files, we should only check the one corresponding to the value selected in ${type_observation}.
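
The improvement suggested above, reading only the relevant CSV, could look like the following Python sketch. It illustrates the idea only and is not the form's calculation: the 'Plantae' and 'Fungi' values of ${type_observation} are assumptions (only 'Animalia' appears in this post), and the sample rows are invented.

```python
# Assumption: ${type_observation} takes one of these values; only 'Animalia' is confirmed.
CSV_FOR_TYPE = {
    "Animalia": "espece_animale",
    "Plantae": "espece_plante",
    "Fungi": "espece_champi",
}

# Invented sample content for the three CSV media files.
CSV_FILES = {
    "espece_animale": [{"cd_nom_key": "a1", "groupe": "Oiseaux"}],
    "espece_plante": [{"cd_nom_key": "p1", "groupe": "sample-plant-group"}],
    "espece_champi": [{"cd_nom_key": "f1", "groupe": "sample-fungus-group"}],
}


def lookup_groupe(type_observation, cd_nom, csv_files=CSV_FILES):
    """Read only the CSV matching the observation type, instead of all three."""
    csv_name = CSV_FOR_TYPE.get(type_observation)
    if csv_name is None:  # e.g. a habitat or threat observation: no species list to read
        return ""
    for row in csv_files[csv_name]:
        if row["cd_nom_key"] == cd_nom:
            return row["groupe"]
    return ""


print(lookup_groupe("Animalia", "a1"))
```

In the XLSForm itself, the same selection would presumably be expressed by nesting if() calls around a single pulldata() per kingdom.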

on the database side

5. External tool integration

Uploaded pictures are automatically saved as files on our server by a database task (described on a French page that is still to be translated) and shown in other tools, such as QGIS and our custom-written web tool:

Note: a web tool account is automatically created for an ODK user if they were not previously known to the system.

Solved issues and unresolved "problems"

Map to show previous location where form was filled

ODK Collect can now show on a map the previous instances of a form whose first question is a geopoint. Although this feature is very interesting, it is not really helpful for our generalist form, but it may be useful for others, such as phytosociological surveys. We will keep an eye on the evolution of the geo widgets, as we are sure they will help us a lot.
Alternatively, we could use a start-geopoint at the beginning of the form to show previous instances on a map and help users easily find their data.

Add a summary of observed species

At the end of each location inventory, it would be very useful to show the list of all species observed in that place by the user. This is currently technically possible, but our colleagues only asked for this feature very recently.

Rename each loop with the species name instead of rank

We have been asked recently to add this feature to help with form navigation. We need to investigate what is possible in group naming with a variable, and will keep an eye on ODK form navigation developments.

Relevant forum topics

About "autocomplete" search in select_one for species :

About custom / personalized settings :

Add more metadata field -> Custom form metadata - #5 by tomsmyth

About form navigation:

About choices list with long labels and html styling:

Screenshots

User details (name / email)

Those values are obtained from ODK Collect's metadata. If they are not available or not set, the user can fill them in or change the default values. Users may also add other people who accompanied them in the field.

Geo/location method choice

This is the first question for the current observation. This list of choices can be personalized, as described previously.


We have since added the option to manually enter latitude and longitude decimal values, for people who may be filling in the form at home from written notes.

Geopoint example


In the future we would like to be able to move the map under the central target to give the user more accuracy.

Observation type

What kind of observation do we want to record for this first location? This list can be personalized by the user, as described previously.

Auto-completion and list styling

Our taxonomy uses synonyms to name taxa. At any given time, one of these synonyms is the valid name for the taxon. Knowledge of the taxonomy evolves over time, so taxa can be split or merged. Here we use HTML styling to show both the valid name and the synonym (if applicable) in the selection list. The user types 3 letters to choose a species or genus from the list.

Observation details

Here is an example for an animal for each age group: adult, juvenile, undefined. The user must indicate the gender of the observed animal. If they select "male" the form will display a question to enter the number of adult males. At the end of the form we calculate and show the total number of animals seen in this observation.

Personal settings

Here is the standard screen showing the user's settings in Collect, containing the phone number and other options. As we saw before, the phone number is used to store the user's personal settings (1 to 6 for the thematic sub-forms, and 7 to 9 for geo-location types).

Data processing tools

ODK is tightly integrated into our information system. We have used PostgreSQL and PostGIS since 2006, and we retrieve data directly from Aggregate's PostgreSQL database server using foreign data wrappers, views, and a cron task that saves new observations into our historical database.
Our main database connects to Aggregate's database via a PostgreSQL foreign data wrapper (FDW). Each form table is created as a foreign table in the central database. We then create a view to format the data as needed, and every X minutes we integrate only the new data (rows whose "_URI" is not yet in our main database).
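
The "only new data" step is essentially an anti-join on "_URI". The Python sketch below reproduces it with the standard sqlite3 module so that it can be run anywhere; it is not our production code. The in-memory SQLite tables merely stand in for the PostgreSQL foreign table and the main table, and the table names and the `species` column are invented.

```python
import sqlite3


def make_db():
    """In-memory stand-in: `odk_form` plays the foreign table, `observations` the main table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE odk_form (_URI TEXT PRIMARY KEY, species TEXT)")
    conn.execute("CREATE TABLE observations (_URI TEXT PRIMARY KEY, species TEXT)")
    return conn


def integrate_new(conn):
    """Copy submissions whose _URI is not yet in `observations`; return the number copied."""
    before = conn.total_changes
    conn.execute(
        """
        INSERT INTO observations (_URI, species)
        SELECT f._URI, f.species
        FROM odk_form AS f
        WHERE NOT EXISTS (SELECT 1 FROM observations AS o WHERE o._URI = f._URI)
        """
    )
    conn.commit()
    return conn.total_changes - before
```

Running `integrate_new` repeatedly is harmless: once every "_URI" has been copied, later runs insert nothing, which is what makes a simple periodic task sufficient.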

This database is used by several tools, such as QGIS to create maps, and Redash and JasperStudio to generate web-based or static PDF dashboards.

This is a Redash screenshot showing how ODK (green line) has been replacing web input (blue line) for data entry since our initial adoption of ODK in 2015:

Resources (link to xls and multimedia files)

Here are the files (media files have been truncated) to try out this form: sicen_2020.zip (2.3 MB)

These are PostgreSQL functions to transform binary data (photo) stored in PostgreSQL into files: https://framagit.org/mathieubossaert/sql_divers/snippets/3620

Perspectives

on the data collection side...

We now cover 80% of our needs. Once an issue in JavaRosa is fixed, we will reach 90%!

on the server side

Using the new ODK Central instead of Aggregate is going to be a big step forward:

Conclusion

ODK has become a core tool in our information system. The three most important factors are:

  1. it is easy to integrate into our existing database environment, thanks to its PostgreSQL backend,
  2. we don't need to expend time, money and effort on developing a custom mobile data acquisition app,
  3. we can instead focus on translating our existing field methods into XLSForm; minimal informatics skills are needed to create our forms.

Acknowledgments

Many thanks to @Xiphware and @LN for their advice.
Thanks to @Xiphware for the time devoted to proofreading this showcase and for the corrections and suggestions made.
Thank you to all ODK contributors for the tools you make and the quality of the discussions over the forum.

Thanks for this kind of stuff.

Thanks for the detailed write up! I'm working on the web portal for Threatened Species and Communities in Western Australia, and the stakeholders are looking at electronic data capture. This post provides a very useful case study for our stakeholders.

Good evening to all and best wishes for 2021 !

The difference between ODK and years is that you will never be disappointed by ODK :wink:
To prove it, I am improving our general form for the coming field season, thanks to features I had not yet used:

  • last saved values
  • naming repeat group

And thanks to lines of the documentation I had not yet read :slight_smile: :

  • the search() function can query several columns of a CSV file
  • the resulting list in a select_one can be sorted by the integer value of a "sortby" column in the CSV

I will describe the enhancements later

2021 update of the form

Three enhancements to our main form are detailed here, all made possible by using existing features of Collect, thanks to a more attentive reading of the documentation and some guru help :slight_smile:

1 - Facilitate and generalize the use of personal settings per form.

In the previous version of the form, we used the user's phone number, stored in Collect, to allow form customization. Each digit from 0 to 9 stands for one parameter, which gives at most ten parameters (see point 3 of the first message of this topic).
This "hack" is quite functional but limited, because we can set only one phone number in the app, and not all the forms in the app use the same parameters...

In combination with the user settings (user identity and email address defined at Collect's level), we will use the "last saved values" feature, introduced in Collect v1.21.0, to define user settings in each form.

The logic is simple:

Each parameter is set to its last saved value. On first use of the form, defaults are set to true, or to the value from Collect's user settings when possible (username and email).
Here is the "default" column for the user name:

coalesce(${last-saved#user_name},${username})

${username} is a field of type "username".

And here for the "geopoint" setting (Do you want to create geopoints within the form?):

coalesce(${last-saved#utiliser_geopoint},'true')

The user can modify each value, and the next time they use the form, this last saved value will be the default.
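
The effect of these defaults over successive uses of the form can be replayed with a small Python sketch. It only models the logic: the dictionary standing in for the last saved record and the `collect_user` value are hypothetical.

```python
def coalesce(first, second):
    """Like the XLSForm coalesce(): the first argument unless it is empty, else the second."""
    return first if first not in (None, "") else second


def default_value(field, last_saved, fallback):
    """Default shown for `field`: its last saved value if any, otherwise `fallback`."""
    return coalesce(last_saved.get(field), fallback)


# First use of the form: nothing has been saved yet, so the fallbacks apply.
print(default_value("utiliser_geopoint", {}, "true"))
print(default_value("user_name", {}, "collect_user"))

# Later use: the user answered 'false' last time, and that answer becomes the default.
print(default_value("utiliser_geopoint", {"utiliser_geopoint": "false"}, "true"))
```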

Those settings are useful to adapt the form:

  • show/hide groups of questions,
  • filter select options (filter column: contains(${user_settings},filter))

Here is the extract of the 2021 version of the form related to user settings and preferences. Empty columns have been dropped.

| type | name | label | calculation | required | appearance | default |
| --- | --- | --- | --- | --- | --- | --- |
| begin group | settings | Préférences → Géométries | | | field-list | |
| select_one boolean | utiliser_geopoint | Saisir des points ? | | yes | columns | coalesce(${last-saved#utiliser_geopoint},'true') |
| select_one boolean | utiliser_geotrace | Saisir des lignes ? | | yes | columns | coalesce(${last-saved#utiliser_geotrace},'true') |
| select_one boolean | utiliser_geoshape | Saisir des polygones ? | | yes | columns | coalesce(${last-saved#utiliser_geoshape},'true') |
| end group | | | | | | |
| begin group | settings2 | Préférences → Thématiques | | | field-list | |
| select_one boolean | animalia | Données de faune ? | | yes | columns | coalesce(${last-saved#animalia},'true') |
| select_one boolean | plantae | Données de flore ? | | yes | columns | coalesce(${last-saved#plantae},'true') |
| select_one boolean | fungi | Données de champignons ? | | yes | columns | coalesce(${last-saved#fungi},'true') |
| select_one boolean | habitat | Données sur les habitats ? | | yes | columns | coalesce(${last-saved#habitat},'true') |
| select_one boolean | pression_menace | Pressions / menaces ? | | yes | columns | coalesce(${last-saved#pression_menace},'true') |
| select_one boolean | observation_generale | Observations générales / jalons ? | | yes | columns | coalesce(${last-saved#observation_generale},'true') |
| calculate | user_settings | | concat(if(${utiliser_geopoint} = 'true','point',''),if(${utiliser_geotrace} = 'true','line',''),if(${utiliser_geoshape} = 'true','polygon',''),if(${animalia} = 'true','animalia',''),if(${plantae} = 'true','plantae',''),if(${fungi} = 'true','fungi',''),if(${habitat} = 'true','habitat',''),if(${pression_menace} = 'true','pression_menace',''),if(${observation_generale} = 'true','observation_generale','')) | | | |
| end group | | | | | | |
| begin group | protocole_etude | Protocole et étude | | | field_list | |
| select_one list_etude | id_etude | Etude | | yes | quick search('etudes') | |
| select_one list_protocole | id_protocole | Protocole | | yes | quick search('protocoles') | |
| end group | | | | | | |
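
To check what the user_settings calculation produces for a given set of answers, the concat(if(...)) chain can be mirrored in Python. This is a sketch for reasoning about the form, not part of it; the function names are made up, while the question names and tokens are those of the table above.

```python
# (question name, token) pairs taken from the user_settings calculation above.
SETTINGS = [
    ("utiliser_geopoint", "point"),
    ("utiliser_geotrace", "line"),
    ("utiliser_geoshape", "polygon"),
    ("animalia", "animalia"),
    ("plantae", "plantae"),
    ("fungi", "fungi"),
    ("habitat", "habitat"),
    ("pression_menace", "pression_menace"),
    ("observation_generale", "observation_generale"),
]


def user_settings(answers):
    """Concatenate the token of every preference answered 'true'."""
    return "".join(token for name, token in SETTINGS if answers.get(name) == "true")


def is_enabled(settings, filter_value):
    """Equivalent of contains(${user_settings}, filter)."""
    return filter_value in settings


answers = {"utiliser_geopoint": "true", "utiliser_geotrace": "false", "plantae": "true"}
settings = user_settings(answers)
print(settings)
print(is_enabled(settings, "point"), is_enabled(settings, "line"))
```

Since the test is a substring match on the concatenated string, it relies on no token being contained in another one, which holds for the nine tokens used here.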

2 - How to select a species in a list by typing the first letters of its name or of a code

This code is built by concatenating the first 3 letters of the genus with the first 3 letters of the species (for example, the little bird Phoenicurus ochruros is coded "pho och").
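
As an illustration, such a code could be derived from a scientific name as in the Python sketch below. In our case the code is generated on the database side; this function, its name and its handling of names without a species epithet are assumptions. The space separator follows the "pho och" example.

```python
def species_code(scientific_name, separator=" "):
    """First 3 letters of the genus, then first 3 letters of the species, in lowercase."""
    parts = scientific_name.split()
    if not parts:
        return ""
    if len(parts) == 1:  # assumption: a genus alone is coded with its first 3 letters
        return parts[0][:3].lower()
    return (parts[0][:3] + separator + parts[1][:3]).lower()


print(species_code("Phoenicurus ochruros"))
```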

As a bonus, users wanted the list to show official names first, not synonyms: genera first, followed by species and finally subspecies, in alphabetical order.

This turned out to be easy to do, even though I had no idea how to do it before I read the search() function documentation once again. It mentions that you can search in one or more columns (I had ignored or forgotten that), and that you can order the list arbitrarily with a column named "sortby" in the CSV file.

3. "search(csvName, 'startswith', columnsToSearch, searchText)": This search expression includes all distinct rows that start with the specified text in the specified column(s) (e.g., "search('hhplotdata', 'startswith', 'respondentname', ${nameprefix})"). The third parameter specifies either a single column name to search, or a comma-separated list of column names to search. Rows with matches in any specified column will be included.
There are just two additional notes on usage:

1. Choices will be ordered, by default, in the order that they appear in your .csv file. If you want to specify a different ordering, include a numeric column in your .csv file named sortby; choices will be ordered numerically, according to the sortby column (if present).

So I just had to create a new column in the CSV file for the species code (code_espece_key), and another one, "sortby", containing the rank of each row once the list is ordered by taxonomic rank (genera first), then valid names before synonyms, then name:

SELECT lb_nom_key,
       cd_nom_key,
       lb_cd_nom_key,
       regne AS regne_key,
       groupe,
       group1_inpn,
       code_espece_key,
       rank() OVER (
           ORDER BY CASE WHEN rang = 'GN' THEN 1 ELSE 2 END,
                    CASE WHEN lb_nom_key ILIKE '%nom valide%' THEN 3 ELSE 4 END,
                    lb_nom_key
       ) AS sortby
FROM total
WHERE ...
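
For readers more at ease with Python than with window functions, the ordering expressed by the query can be sketched as a sort key. This only illustrates the ordering logic: the sample rows are invented, plain Python string comparison will not necessarily match the database collation, and unlike rank() it does not give equal numbers to ties.

```python
def sort_key(row):
    """Genera first, then valid names before synonyms, then alphabetical order."""
    genus_first = 1 if row["rang"] == "GN" else 2
    valid_first = 3 if "nom valide" in row["lb_nom_key"].lower() else 4
    return (genus_first, valid_first, row["lb_nom_key"])


def add_sortby(rows):
    """Return the rows in display order, each with a 1-based integer `sortby`."""
    ordered = sorted(rows, key=sort_key)
    return [dict(row, sortby=position) for position, row in enumerate(ordered, start=1)]


# Invented sample rows; the labels only imitate the styled names of the real file.
SAMPLE = [
    {"rang": "ES", "lb_nom_key": "Phoenicurus ochruros (nom valide)"},
    {"rang": "ES", "lb_nom_key": "Motacilla ochruros"},
    {"rang": "GN", "lb_nom_key": "Phoenicurus (nom valide)"},
]

for row in add_sortby(SAMPLE):
    print(row["sortby"], row["lb_nom_key"])
```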

The appearance column then looks like this:

quick search('espece_animale', 'startswith', 'code_espece_key,lb_nom_key', ${recherche_espece_animale}) 
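
The behaviour described in the documentation excerpt (a match in any of the listed columns, then ordering by sortby) can be approximated in Python to test a CSV extract before loading it on a phone. This is a sketch and not Collect's implementation; the sample rows are invented and the case-insensitive matching is an assumption.

```python
# Invented sample rows with the two searched columns and the sortby column.
ROWS = [
    {"code_espece_key": "pho pho", "lb_nom_key": "Phoenicurus phoenicurus", "sortby": "3"},
    {"code_espece_key": "pho och", "lb_nom_key": "Phoenicurus ochruros", "sortby": "2"},
    {"code_espece_key": "eri rub", "lb_nom_key": "Erithacus rubecula", "sortby": "1"},
]


def search_columns(rows, columns, text):
    """Rows where any of the comma-separated `columns` starts with `text`, ordered by sortby."""
    prefix = text.lower()  # assumption: matching ignores case
    names = [name.strip() for name in columns.split(",")]
    hits = [row for row in rows if any(row[name].lower().startswith(prefix) for name in names)]
    return sorted(hits, key=lambda row: int(row["sortby"]))


by_code = search_columns(ROWS, "code_espece_key,lb_nom_key", "pho och")
by_name = search_columns(ROWS, "code_espece_key,lb_nom_key", "Phoen")
print([row["lb_nom_key"] for row in by_code])
print([row["lb_nom_key"] for row in by_name])
```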

Select a taxon by its name

or by its code

3 - How to facilitate checking or modifying collected data in the field

In other words, how to improve repeat group naming and form navigation.

Thanks to @LN, I was able to reproduce here what I had first achieved in a simple form.

Each place (geo-widget question) is now named using its creation time (e.g. place at 18:06).

And within each location instance, observations are named using the species or habitat name, or the pressure/threat type.

Things to investigate before the field season:

For species lists: obtain a "last saved value"-like behavior over preloaded data and the search() function, or be able to do the same as the search() function over external CSV files in a more standards-compliant way (form datasets, select_one_from_file), that is, query 2 columns (starts with) and order the results in the proposed list.

Here is the new form and its media files (truncated to a few lines).
sicen_2021.zip (68.8 KB)

And the SQL code to generate the taxonomic CSV files from the French reference (https://inpn.mnhn.fr/programme/referentiel-taxonomique-taxref)
vue_espece_animale.txt (4.8 KB)

Thank you Mathieu for these interesting advances.
For your second point, I had the same request last year and I found a solution using the '%' character, which stands for any string. The text to search for (searchText) was calculated like this:
concat(substring-before(${searchtext_latin}, ''), '%', substring-after(${searchtext_latin}, ''))

Thanks again

Hi Guilhem,
Welcome to the ODK community forum :wink: Don't hesitate to take some time to introduce yourself here.
Thanks for the tip, but could you share a concrete example? I do not understand how to use it. I never had the idea to check whether % would be escaped or not in the SQLite lists :slight_smile: ...
Did you notice any performance impact on large lists? I mean, LIKE clauses are not easy to index.

I was sure I had attached it, but I must not have gone through to the end of the process; here it is.
Here is an extract of a form where I use %: saisie_flore_simple.xlsx (14.7 KB)
I didn't notice any performance issues, only on display if the list is too long. That's why the list is only displayed once I have at least 4 characters, with at least one in the species name.

Nice trick! I will try it in the form.
If it is not too slow over a big list, it could avoid having to create a specific column for the code in the media file.

Some improvements for the 2022 field season:

  • Upgrade to TAXREF v15 (national reference for French species taxonomy)
  • Add a question about accuracy if the geopoint is placed on a map
  • Simple creation of a station, to add data later
  • Add the name of the person who made the determination
  • Ability to hide user preferences (using last-saved or coalesce):
    if hiding is set → last-saved values, else default values are used
  • Add 3 different geopoints (3 different GPS accuracy thresholds)
  • Add entry of breeding pairs and chicks for birds
  • Add a summary of the species observed on a site,
    using a calculation: join('<br/>', ${repeat_question})
  • Add the organization (mail domain) for future filtering of the studies and protocols relevant to the user's organization
  • Studies and protocols are now in the choices sheet
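
The species summary in the list above is simply the answers of the repeated question joined with HTML line breaks. A minimal Python equivalent, with a made-up function name and invented sample names, would be:

```python
def species_summary(species_names):
    """Join the species recorded in each repeat instance with HTML line breaks."""
    return "<br/>".join(species_names)


print(species_summary(["Phoenicurus ochruros", "Erithacus rubecula"]))
```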

Here is the form with a light taxonomic list (approx. the first 1000 lines):
SicenODK_2022.zip (89.0 KB)

And the SQL views to generate the species lists (CSV media files):
views_to_generate_styled_species_media_file_from_taxref.zip (3.3 KB)
