ODK Fingerprints & Biometrics framework

Collecting biometric data is pretty cool and has lots of potential for applications in ODK. One of those is to link data across time.

This is a brief presentation of work done by our team at LSHTM with @seadowg, @dr_michaelmarks and others, to make a system for capturing fingerprints during ODK data collection with ODK Collect.

The primary motivation here was to be able to do successive waves of data collection in the field and then later (at the desk) to check (using fingerprint data) that data which are supposed to come from the same individual across time match up at the biometric level.

Although it could have real potential in a future, entity-based version of ODK, we definitely don't want to do fingerprint matching in the field (i.e. getting ODK Collect to pull up someone's data) at this stage. Not least, this is because there are many security issues that would come with this, but also because entity-based data collection needs to precede such development.

Code base

Code for this open source project can be found here https://github.com/LSHTM-ORK/ODK_Biometrics

System design of the Biometrics Framework

The novel biometrics system consists of two components. The first component is “Keppel”, a smartphone app designed to run on the Android operating system. This app provides an I/O interface between the ODK Collect app and an ANSI INCITS 378-2004 compliant electronic fingerprint reader/sensor device. The app has to be sideloaded (it isn't on the Google Play Store yet).

A really important point here is that the system is not simply taking photographs of fingerprints. The data are stored as a compact encoded template, which has a very 'lite' impact on the size of the data stored in ODK and requires no attachments. The fingerprint template is captured as plain text that is stored and encrypted along with the other ODK data.

The Keppel smartphone app was designed using Android Studio and the Android Software Development Kit (SDK) https://developer.android.com/studio. The initial version of the app works only with the low cost (<£50) Mantra MFS100 Biometric C-Type Fingerprint Scanner (Mantra Softech Inc. www.mantratec.com), functionality for which was based on code templates provided within the Mantra MFS100 Software Development Kit (https://download.mantratecapp.com/).

The app was designed with a view to making the addition of further biometric sensors relatively simple. A software ‘demo’ scanner is also included, and this allows users to test their fingerprint supported ODK forms without having a scanner connected.

Using Keppel App to capture fingerprint templates

The app integrates with ODK Collect's External app widget using the uk.ac.lshtm.keppel.android.SCAN intent. An example XML form can be found here and an XLSForm version can be found here.
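The screenshots of the form haven't survived here, so as a rough sketch of what the survey sheet might contain, the Python snippet below generates one text question per finger of the right hand. In ODK, the External app widget is configured with an appearance of "ex:" followed by the intent action, so each row uses ex:uk.ac.lshtm.keppel.android.SCAN. The field names and labels are hypothetical, and any extra parameters Keppel may accept are not covered; check the example forms in the repository before relying on this.

```python
import csv
import sys

# ODK Collect's External app widget is set up with an "ex:" appearance that
# names the intent action; this one launches Keppel.
KEPPEL_APPEARANCE = "ex:uk.ac.lshtm.keppel.android.SCAN"

# Hypothetical field names, one per finger.
FINGERS = ["thumb", "index", "middle", "ring", "pinkie"]


def survey_rows(hand="right"):
    """Return XLSForm survey rows: a header, then one question per finger."""
    rows = [("type", "name", "label", "appearance")]
    for finger in FINGERS:
        rows.append((
            "text",  # Keppel hands the template back as a plain-text string
            f"{hand}_{finger}",
            f"Scan {hand} {finger}",
            KEPPEL_APPEARANCE,
        ))
    return rows


if __name__ == "__main__":
    # Print as CSV, ready to paste into the survey sheet of an XLSForm.
    csv.writer(sys.stdout).writerows(survey_rows())
```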

To capture all the fingers of one hand, your form would look like this.

And on the screen:

Clicking 'launch' opens the external app.

Pressing 'capture' then activates the scanner.

Once the template has been captured, the data are returned to ODK Collect as plain text.
N.B. Here I'm using the demo scanner.

This whole process is pretty quick. Each scan takes just a couple of seconds.

Matching fingerprint templates

The second component of the system is the Keppel Command Line Interface (CLI), a Java/Kotlin application designed to run on the command line of a desktop or laptop computer. The application is able to compare any two ANSI INCITS 378-2004 fingerprint templates and generate a simple score that describes the overall similarity between them. The Keppel CLI was based on code provided in the Mantra MFS100 Software Development Kit. Calls to the CLI take the form
keppel match -p [template1] [template2]
where [template1] and [template2] are either the templates themselves as plain text (with the -p flag) or, with no flag, paths to standalone files containing copies of the fingerprint templates of interest.

To use this part of the system, you'd download the CSV file from ODK Central, extract the data from the columns relating to the fingerprinting and run the CLI once for each pair of templates.
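The extraction step can be sketched in Python with the standard csv module: read each wave's export, keep the template column keyed by participant ID, and pair up IDs that appear in both waves. The column names ('participant_id', 'right_thumb') are hypothetical and will depend on your form. As a cheap sanity check, the sketch only keeps cells that start with 464d5200, which is how both example templates in this post begin ("FMR" plus a null byte, the format identifier at the start of the record); this catches empty or obviously broken cells before they reach the CLI.

```python
import csv
import io

# Both example templates in this post start with 464d5200: "FMR" plus a null
# byte, the format identifier at the start of the minutiae record.
FMR_MAGIC = "464d5200"


def looks_like_template(value):
    """Cheap sanity check: a hex string that starts with the 'FMR' header."""
    value = value.strip().lower()
    if not value.startswith(FMR_MAGIC):
        return False
    try:
        bytes.fromhex(value)  # rejects non-hex characters and odd lengths
    except ValueError:
        return False
    return True


def load_templates(csv_file, id_column, template_column):
    """Map participant ID -> template for one wave's CSV export."""
    templates = {}
    for row in csv.DictReader(csv_file):
        value = row.get(template_column) or ""  # missing/empty cells become ""
        if looks_like_template(value):
            templates[row[id_column]] = value.strip()
    return templates


def pair_waves(wave1, wave2):
    """Yield (id, template_wave1, template_wave2) for IDs present in both."""
    for pid in sorted(wave1.keys() & wave2.keys()):
        yield pid, wave1[pid], wave2[pid]


if __name__ == "__main__":
    # Toy in-memory CSVs standing in for two downloaded exports.
    wave1 = load_templates(io.StringIO("participant_id,right_thumb\nA,464d5200aa\n"),
                           "participant_id", "right_thumb")
    wave2 = load_templates(io.StringIO("participant_id,right_thumb\nA,464d5200bb\n"),
                           "participant_id", "right_thumb")
    for pair in pair_waves(wave1, wave2):
        print(pair)
```

Each resulting tuple is one call to the CLI.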

Comparing two templates takes around one second of compute time. Because each comparison is independent of the others, scaling up is an embarrassingly parallel workload, and the CLI calls can be scripted and run in parallel from tools such as R, Python and C++.
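As a minimal sketch of the parallel approach in Python, assuming the keppel executable is on your PATH and prints its result on standard output, each comparison can be launched with subprocess.run and the calls spread over a thread pool. Threads are enough here because each one simply waits on an external process. The function names are illustrative, and the runner can be swapped out (for example, for testing without a scanner or CLI installed).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_keppel(template_one, template_two, executable="keppel"):
    """Run one `keppel match -p` comparison and return its printed output."""
    result = subprocess.run(
        [executable, "match", "-p", template_one, template_two],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def match_all(pairs, runner=run_keppel, max_workers=4):
    """Compare every (id, template_one, template_two) pair in parallel.

    The comparisons are independent, so threads that each wait on an
    external process are enough; results are keyed by id.
    """
    pairs = list(pairs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = pool.map(lambda pair: runner(pair[1], pair[2]), pairs)
        return {pid: out for (pid, _, _), out in zip(pairs, outputs)}
```

With roughly one second per comparison, raising max_workers towards the number of CPU cores should cut wall-clock time roughly in proportion, though that will depend on your machine.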

Without the -p flag, the core function requires each template to be stored on a single line in its own text file.
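If you prefer the file-based mode, a small Python helper can write each template to its own file as a single line, stripping any whitespace or line breaks picked up during export. The labels used as file names (e.g. "A_wave1") are hypothetical; the sketch writes no trailing newline, since the text only says the template must sit on one line.

```python
import os
import tempfile


def write_template_files(templates, directory):
    """Write each template to its own single-line text file.

    templates maps a label (e.g. "A_wave1") to a hex template string;
    returns a dict mapping the same labels to file paths.
    """
    paths = {}
    for label, template in templates.items():
        path = os.path.join(directory, f"{label}.txt")
        with open(path, "w", encoding="ascii") as handle:
            # Remove all whitespace, including line breaks, so the file is one line.
            handle.write("".join(template.split()))
        paths[label] = path
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = write_template_files(
            {"A_wave1": "464d5200aa", "A_wave2": "464d5200bb"}, tmp)
        # These paths can then be passed to `keppel match` without -p.
        print(paths["A_wave1"], paths["A_wave2"])
```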

From version 0.3, the following options are available:

-p
Treats TEMPLATE_ONE and TEMPLATE_TWO as plain text rather than as file paths. This option is very useful for scripted analysis from R or Python.

Example [templates truncated]

keppel match -p 464d520020323000000001080000013c016200c500c5... 464d520020323000000000f00000013c016200c500c...

-ms
Return whether templates match along with score like "match_210.124"

-m
Return whether templates match (either "match" or "mismatch")

-t FLOAT
Threshold (score) to be used to determine whether templates are a match or mismatch

-h, --help
Show this message and exit
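When scripting against these options, it can help to build the argument list in one place and parse the -ms output back into a boolean and a number. The Python sketch below does both. It places all options before the two templates, following the example call above; the function names are illustrative, and it assumes that mismatches are printed in the same label_score pattern as "match_210.124" (e.g. "mismatch_12.5"), which the option description doesn't confirm.

```python
def build_match_command(template_one, template_two, plain_text=True,
                        output=None, threshold=None, executable="keppel"):
    """Assemble the argument list for one `keppel match` call.

    output: None for the plain score, "m" for match/mismatch only,
    "ms" for match/mismatch plus score (e.g. "match_210.124").
    """
    cmd = [executable, "match"]
    if plain_text:
        cmd.append("-p")  # templates are hex strings, not file paths
    if output in ("m", "ms"):
        cmd.append("-" + output)
    elif output is not None:
        raise ValueError(f"unknown output mode: {output!r}")
    if threshold is not None:
        cmd += ["-t", str(threshold)]  # score cut-off for match vs mismatch
    return cmd + [template_one, template_two]


def parse_match_score(text):
    """Split -ms output such as "match_210.124" into (is_match, score).

    Assumes mismatches are printed the same way, e.g. "mismatch_12.5".
    """
    label, _, score = text.strip().rpartition("_")
    if label not in ("match", "mismatch"):
        raise ValueError(f"unexpected keppel output: {text!r}")
    return label == "match", float(score)
```

For example, parse_match_score("match_210.124") gives (True, 210.124).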

Real world tests

We're currently in the process of doing a formal evaluation of how well this system works for linking data from different time points across a longitudinal survey. Hopefully we'll be publishing this in the next few months, but here's a sneak peek of the results.

In this study we asked 200 people to scan each finger of their right hand twice. This allows us to compare the first and second scan to see how the system performs. It also allows us to compare fingers from person A with fingers from person B. Overall, that makes it possible for us to investigate how often we'll see false positive results (in the mismatched pairs) and false negative results (in the matched pairs).

Key finding 1 is that the quality of the match goes down as you move along the hand from thumb to pinkie. Matched pairs of scans from the thumb perform best, whilst those from the pinkie perform worst. In short, if you only scan one finger, it may be best to choose the thumb.
Having said that, they're all pretty good and there's fairly good separation between the distribution of scores in the matched and mismatched pairs of scans (see chart below).

This looks great, but false positives and false negatives do happen here (see how the distributions overlap a little at the black dots [outliers] at the low end of the matched group and the top end of the mismatched group). This could cause problems even if the rate at which they occur is pretty low. In a study of 10,000 people, a false negative rate of 1% adds up to 100 cases where you'd get a false negative result.
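To make the arithmetic concrete, here is a minimal Python sketch of how false negative and false positive rates fall out of the two score distributions at a given threshold, treating scores above the threshold as matches. It also reproduces the 1% of 10,000 calculation. The function names are illustrative, and any scores you feed it should come from your own matched and mismatched pairs.

```python
def error_rates(matched_scores, mismatched_scores, threshold):
    """Return (false_negative_rate, false_positive_rate) at a threshold.

    A pair is called a match when its score is above the threshold, so a
    matched pair at or below it is a false negative and a mismatched pair
    above it is a false positive.
    """
    false_neg = sum(1 for s in matched_scores if s <= threshold)
    false_pos = sum(1 for s in mismatched_scores if s > threshold)
    return false_neg / len(matched_scores), false_pos / len(mismatched_scores)


def expected_errors(n_people, rate):
    """Expected number of erroneous calls in a study of n_people."""
    return round(n_people * rate)


if __name__ == "__main__":
    # The example from the text: a 1% false negative rate among 10,000 people.
    print(expected_errors(10_000, 0.01))  # 100
```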

Key finding 2 is that you get much lower false negative and false positive rates if you combine the scores for multiple fingers. In the chart below we see that when we scan the thumb, index and middle finger, then add up the scores, we get a much better result. Here we called a pair a positive match if its combined score was above 75. In this small study, combining the scores from three fingers gave us false negative and false positive rates of zero. No system is perfect, so there will still be a few problems, but the conclusion is that if you capture three fingerprints and combine scores, you can get a very good diagnostic on whether any two data records collected with ODK Collect actually come from the same individual.
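The combined-score rule described here is simple enough to sketch directly in Python: sum the per-finger scores for the thumb, index and middle finger and call a match only when the total is above 75. This assumes the per-finger scores come straight from the CLI; the threshold of 75 is the one quoted for this small study and may not carry over to other devices or populations, and the function names are illustrative.

```python
# The three fingers and the cut-off (combined score above 75) quoted in the text.
FINGERS = ("thumb", "index", "middle")
COMBINED_THRESHOLD = 75


def combined_score(scores_by_finger, fingers=FINGERS):
    """Sum the per-finger similarity scores for one pair of records."""
    return sum(scores_by_finger[finger] for finger in fingers)


def same_person(scores_by_finger, threshold=COMBINED_THRESHOLD):
    """Call the pair a match only if the summed score is above the threshold."""
    return combined_score(scores_by_finger) > threshold
```

Scores for other fingers can stay in the dictionary; only the three listed fingers are summed.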

Future directions

We're keen to expand the range of devices that this system works with. In theory it should be fairly easy to add new fingerprint scanners, and the framework should allow for things like iris scanners to be added, though we'd need to add new functions to the CLI to handle different types of biometrics. The fingerprint templates follow an ANSI/ISO standard, so many reader devices will output data that are compatible with the existing fingerprint CLI.

I think that there's also scope to add Bluetooth connectivity and functions for reader devices for things like RFID / PIT chips (@Florian_May!)

As an open source project we are of course very keen for others to get involved.

Funding & Ethics

This work was funded by the UK Department of Health and Social Care using UK Aid funding managed by the NIHR (PR-OD-1017-20001). Ethical permission for the study was granted by the London School of Hygiene & Tropical Medicine Observational Research Ethics Committee (Ref. 22562).



Hi,
This is pretty cool. Thanks for sharing.

One question I have when it comes to biometrics is how viable this is to promote to data collectors/data collection organisations, and for what purposes, as it feels slightly intimidating to hold biometric data on people, or even to encourage/teach people how to do this. If someone was surveying me, and asked to scan all my fingers, I'd be intimidated, as that's kind of what happens when I pass through security at an international border, for example...and not very many other places. (Do you have anything to share on how comfortable people feel having their fingers scanned?)

In many instances, people collecting data hold a certain power imbalance over people they are collecting data from. So it feels like with this capability, there's also an assumption that someone being asked for their fingerprints might feel pressured to share that data, without an ability to say no, thank you.

I'm not sure how to frame my question very well, but is there a way to flag up data protection inside an app? For example, we flag up the 'auto GPS' in ODK and allow people to turn it off. How do we flag to people that this is possibly more sensitive data being collected? Maybe this is just me having grown up with CSI: Miami or something, but I think of fingerprints as something that police use to lock people up.

With the development of easy tools like this, how do we also share easy 'application' learning/teaching that brings people along on the journey of when/where/how to collect and handle data like this? Some organisations are quite sophisticated and can handle biometric data. I think a lot of other organisations using it won't be as large and won't have the scope to implement sufficient data protection practices/policies.

There's quite a range of data handling tactics out there - most of them aren't very robust. So I get pretty scared pretty quickly with things like this - mainly because I don't understand all the risks, or best practices around all of it. I'd love to talk and ask more about it, but don't know if I even have good vocabulary to explore it.

As part of your project, have you developed some sort of guidance around how/what/when to apply this?

Would be really interested in that!
Thanks!
Janna

Hi @janna
These are some really good points that are well made and very worth highlighting to the community. I'm certainly no expert on ethics, but do have some experience from my work in biomedical research. I think it really is a super complicated discussion and one that I take very seriously.

Note that below when I talk about research or studies, I am meaning stuff you do where you collect some data. By extension of my thinking in the context of fingerprints, I am only really talking about collecting data from or about humans, both living and dead.

So informed consent and participatory engagement are key to all things in good research practice and of course if anyone feels pressured at any time to provide any information, we've failed in our responsibilities and the minimum standard for informed consent has not been reached. There's a huge body of literature out there to draw from and I encourage anyone using ODK to become familiar with the basics of research ethics as well as international, regional and local data laws.

From my own perspective, we have it easier than many as there are very strict systems in place within academia and government led research. The work of our team is always guided by the principles of the Declaration of Helsinki and is governed by local and institutional ethics boards. From our perspective as professional biomedical researchers, these are our guides as to what is appropriate for different activities. We are lucky to be able to collaborate with and defer to the experience of experts in medical ethics. By co-developing studies with stakeholders from communities that will be participating in research that affects them, we also have great opportunities to do research in a way that supports the autonomy and rights of participants, rather than exploiting them. Like I say, we're the lucky ones!

You are right that not all users of ODK are governed by the same well-developed regulatory frameworks, and this is something our community should concern itself with, but almost certainly not govern. I guess what I am saying is that we should care a great deal about it, but defer action and responsibility to each user and their conscience, as we do now implicitly.

I wonder if there is perhaps a discussion to be had about adding some links on topics like ethics and data protection to the ODK docs? Maybe we could link to an online course? I'm very supportive of the idea that we could provide some pointers to help people develop skills in this area, whilst not supportive of being prescriptive in who can use ODK and related tools, and for what.

Some basic thoughts follow, with specific respect to the fingerprint system described above:

  • You don't want or need to do this for every study. It's only valuable if there are actual issues with the credibility of linkage based on other systems like study ID cards, checking official ID and so on. Keep in mind that many people in many places have no official documentation, so the challenge of confirming identity is very real in a lot of settings.

  • If you are for instance doing a clinical trial where you inject someone with something that could have potentially nasty side effects, then you might well want or need to be able to prove beyond reasonable doubt that the person was who they said they were. This could be for both medical and legal reasons, or just for purposes of trial data integrity. You need iron-clad data to get a new medical product approved (like a vaccine).

  • This isn't just about linking data sets together, there's also the huge issue of confirmation of identity when adhering to laws about rights of participant access to data concerning them. i.e. the biometrics help you to prove a person is the owner of the data, so preventing you from inadvertently disclosing their other sensitive data to people who might claim to be the person whose data it is. Also rights to be removed from a study etc.

  • I've worked for many years on studies where most participants were non-literate, so ink fingerprints were default 'proof' for consent. These couldn't ever be used to confirm identity after the fact as they were smudgy and we had no method for comparing one to another. This low-tech approach was fairly widely acceptable in many places where we worked, though I think we will need to do acceptability studies to see if this changes now that we are asking people to use an electronic scanner (i.e. your concerns about feelings of intimidation). Again, we won't be the first to cover this ground and others have already done a lot of work around this topic.

  • Taking part in the process of fingerprinting is participatory and so appropriate as proof of consent to participate in a study, so long as those running it are behaving ethically (you are right about power dynamics, although this is heterogeneous in different settings). Electronic consent is an important topic. At the moment paper consent forms are still widely used and work is ongoing to understand how the future may feature electronic biometric proofs for consent.

  • Personally, I don't think that we should hold back from promoting new technology just because it could feasibly be used in unintended ways (see arguments passim against everything from flint axeheads to a broad and thorough understanding of genetics). OK, so there's probably no good way to use Novichok, but within reason I think that most people who go to the bother of setting this up will use this specific tech for good purposes, after having been trained in how not to inadvertently make things worse for the same people they set out to help. I am sure that everyone in the ODK community is keen that anyone using ODK does so with consideration to ethics and data protection impact assessment.

  • Open source makes for good transparency. The tools presented here don't save anyone's biometric data anywhere mysterious. The data are on a Central server controlled by the people running the survey, and the underlying code is both simple and auditable. There are no third parties involved, which I think is great.

  • Asking any questions in a survey which covers what international data laws would describe as 'sensitive' or 'highly sensitive' is essentially equivalent to collecting biometric data. GDPR, for instance, would quite rightly expect us to treat the fingerprint template and a person's HIV viral status or sexual preference with the same degree of extreme care. As for any technology that facilitates data collection, ODK could feasibly be used to illegally harvest and store vast quantities of highly sensitive data that are not biometric, but there's not really a distinction (no distinction at all in GDPR). Sensitive data should be collected with sensitivity.

  • We could also say that the fact that not all ODK Central projects are encrypted at the project level provides opportunities for malicious third parties to access highly sensitive data over the internet. As with biometrics, how tightly you control things depends on the case-by-case circumstances. At our institute, we insist that, with very rare exceptions, all studies should implement project-level encryption. This is because there's generally very little that you can do on Central that can't be done offline using R or similar tools, whilst having human-readable data available on the web behind a simple password that could be compromised is pretty dangerous.

  • Meanwhile, the new tools that allow editing via Enketo and future implementation of entity based data collection may make us revisit this requirement for universal project level encryption. That's because the potential benefits of the new functions will change that risk to benefit calculation, as will the maturity of Central itself.

Hope this isn't too all over the place and makes some sense.

Chrissy


This is great work @chrissyhroberts

Working with research teams in Kenya we built a similar platform although it didn't integrate this well with ODK.

One of the lessons learnt from COVID-19 is to use contactless biometrics. Have you explored iris biometrics?

Paul

Question: does this work on turtles? (rather surprised somebody else hasn't already asked this... 🙂)


Turtles are identified with microchips much like domestic pets or even horses, e.g. these. The 15 digit PIT tag number is read with a scanner. These scanners can connect to Windows desktops (hard to lug around in the field) via USB cables or Bluetooth.

Reading a 15 digit number off a tiny display at night while wrestling a 90kg Flatback turtle can be pretty error-prone, so I wish there were a direct way for the scan result to go into Collect.

We'd need a custom Android app for each type of scanner to be used as external app from inside Collect. The tricky part I'm currently investigating is communication from computer (later, Android) with the scanner.

@chrissyhroberts,
Thank you so much for your thoughtful and helpful response! Even as I wrote my question, I hesitated, as I agree we can't stop working on useful things just because they might be misused! However, I appreciate your openness to engage. I love the examples you share; they really help me imagine! And good idea about a list of resources. There's a ton of resources out there, but I rarely see the "principles" linked up to the "practical", if that makes sense. More to come... 🙂


Hi @paul_macharia - I agree that there's a lot of value in contactless biometrics, and we spent a lot of time investigating this during our recent work on Ebola vaccines in DR Congo. Ultimately, we hit a barrier in that we couldn't find any cheap, 'off the shelf' devices that you could just buy, plug in and use. I think that there's a bit more coming to market now, so this is probably worth revisiting.

A further barrier was that when we did some work to investigate how acceptable it would be to gather iris images, it wasn't promising. In part, this may have been because of the context of that outbreak, which took place in an area that had suffered from many years of political and civil instability, so may not have been an ideal setting to try new things with biometrics. In the end we didn't use fingerprints either.

I think that with a bit of investment we may end up with a nice package for adding multiple readers, sensors and devices to ODK, but for now it is fingerprints.

@Florian_May I'm guessing that you can't just plug in via an OTG cable and use the scanner like a keyboard. We use Bluetooth and wired barcode scanners in our laboratories, and these just dump plain text into a text field in ODK.

I like peripherals that do this as you don't need any apps.
I'm guessing that you've tried a variety of readers and found that this is the one for you, but I suppose what makes it possible to do these things more easily is when the manufacturers provide an SDK for making your own solutions.

I think that the other solution is probably to make your own hardware, programming it to send plain text to ODK over Bluetooth. In my head there's an open hardware solution based on an Arduino Nano or Raspberry Pi Pico, connected to an RFID sensor that reads FDX-B tags.

Would probably cost about $20 to make.

Very true @chrissyhroberts

A number of smartphones have built-in, high-quality front-facing cameras that could be another option for contactless biometrics.

One other technology I have seen being tested is palm prints as a contactless biometric. Using the smartphone camera, high quality images are used to create unique identification platforms.

The idea that your team is working on an open technology to plugin multiple biometric technologies is a great one.

Please feel free to share future updates...

Asante
Paul Macharia
