1. What is the issue? Please be detailed.
If you configure a form as auto-delete it does appear to delete the form; however, if you look at the "View Sent Form" window you see a list of sent forms and some time metadata (form name, "Sent on ...", and "Deleted on ..."). This is obviously an insanely dangerous data leak in the event a collector's device is collected by the wrong people. There must be a way to expunge this potentially fatal data, no? I'm just not finding it, either in the Collect UI or in the form configuration. I'm creating forms using XLSForm; is there perhaps an option to clear this data using XML?
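For concreteness, this is roughly how I'm enabling the behavior in the XLSForm settings sheet (auto_send and auto_delete are existing settings-sheet columns; the form_title/form_id values are placeholders). I haven't found any comparable column that would also clear the sent-forms list:

```
settings sheet:
| form_title | form_id      | auto_send | auto_delete |
| My Survey  | my_survey_v1 | true      | true        |
```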
2. What steps can we take to reproduce this issue?
It seems to be universal. I note that this was brought up in 2017, though it wasn't clear whether as a bug or a feature: "Deleted submission remains in View Sent Form". It is clearly a bug, and a dangerous one for enumerators working in delicate environments.
3. What have you tried to fix the issue?
@LN's suggestion to hide the menu option is helpful, but it has become fairly standard for even lower-tech adversaries to employ phone forensics devices and extract all data from a phone for later analysis, so it is best that no data or metadata be left on the phone, hidden or visible.
It would seem that if encrypted forms are used, metadata should also be expunged by default.
4. Upload any forms or screenshots you can share publicly below.
Hint: A problem might then be that there would no longer be a way to check on "sent" cases that never arrived in the dataset at the server level.
A workaround might be to have a final question (e.g. an acknowledgement type) in the form to anonymise the data (names etc.) and the instance_name before finalisation/send.
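A minimal sketch of the instance_name half of this idea (the field names are invented for illustration): once the final acknowledgement is answered, the instance name collapses to a generic value instead of the respondent's name:

```
survey sheet:
| type        | name            | label                                   |
| text        | respondent_name | Respondent name                         |
| acknowledge | anonymise       | I confirm this record can be anonymised |

settings sheet:
| instance_name                                         |
| if(${anonymise} != '', 'record', ${respondent_name})  |
```

Note that previously entered answers can't be overwritten this way; the sensitive field values themselves still rely on encryption and delete-after-send.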
I trust you know there is a (manual) option in the settings menu, but this will also delete saved (unfinalised/unsent) forms. (And there is another, broader "Delete" option.)
The delete saved forms option is indeed helpful as a manual step, thank you. There are use cases where it is more important that the enumerator's activities not leave a cookie trail than that the enumerator can review sent forms.
The use case of concern would demand a higher level of data hygiene from the enumerator, using this process, than can be reliably counted on: they would have to remember to manually reset the data after each collection, and if they forget and get rolled up, they might not be unrollable.
This seems like a very reasonable ask, even if it is an edge case, because of the possible magnitude of the consequences, however unlikely. People operate in areas where not everybody is friendly and appreciative of even the best intentions.
Can you describe in greater detail what your role is and what you're trying to do? If you're the form designer, you should be able to choose what information to include in the form instance name as described in the documentation. In particular, the documentation states the following:
A sent form's instance_name is maintained after it is deleted. This makes it possible to confirm what work has been completed even if submissions are configured to delete after send. However, it does mean sensitive data should be avoided in instance_name.
This is a really great suggestion if you want to both have a meaningful instance name for any drafts saved and make sure that sensitive data does not remain on the device.
It would not address the desire to have a meaningful submission name on Central. If that's the need you have, it would be helpful to understand your workflow and how you're using instance name on Central in greater detail.
Does this mean that your data collectors are using devices that do not auto lock? Or maybe older Android devices or ones by untrusted companies? If your devices auto lock and run Android 10 or greater, they should also be encrypted and therefore virtually impossible to get any data from without the device pin/biometrics.
Which metadata are you concerned about? Do you consider e.g. the count of submissions made per form to be sensitive? In general, I expect that folks using submission encryption would either leave instance name blank or set it to something generic (e.g. "Sent").
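For example, a settings sheet that pins the instance name to a generic string literal would look something like this (the form values are placeholders):

```
settings sheet:
| form_title | form_id      | instance_name |
| My Survey  | my_survey_v1 | 'Sent'        |
```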
ODK tends to give users a lot of power and responsibility. That helps make it a very flexible platform but it does mean that form/project designers need to do more cost/benefit analysis than if ODK prescribed more of a set workflow.
I understand that's the premise; however, it is not in practice a reliable assumption. There are BFU techniques with good success rates, and if the device is still biometrically unlockable (typically within 48 hours of a BFU unlock/refresh), recovering plaintext data is generally trivial even without chip-off or even JTAG. If a panic password is set and the collector is able to access it and nuke the phone quickly enough, plaintext recovery is understood to be more than a little challenging, especially if the DE key is erased and the hardware address burned, though not life-or-death comfortingly impossible. A problem with CE passwords is that mnemonicity is systematic, but even an arbitrary 6-digit CE PIN has a median decode time of 3 minutes with a low-end RTX, and the hardware DE key is extractable from many chipsets (though probably not all).
I do not consider the expenditure and equipment necessary to recover and decrypt post-nuke data relevant to our use case, nor is it fruitful since the "deleted" data from the forms is quite plausibly recoverable along with the rest of the ghostly digital refuse left behind.
Adversaries fall into two categories. The first generally respects rule of law, and a collector would have rights and recourse to them. Such regions generally are not meaningfully inhibited by device encryption, and there is no shortage of vendors globally offering decryption/forensic extraction of locked and encrypted devices (even BFU), whether as a software package, a hardware device, or a service. I would not trust a collector's safety to device encryption even assuming they have the right to refuse to "voluntarily" decrypt, but in such environments collectors would generally have sufficient legal rights and operate within them.
An alternative modality involves adversaries who do not presume a right to device privacy; in such environments, refusing to decrypt a device is not a survivable option, whether the formality of a hearing is provided or not. The concern is not to protect collectors who are violating local laws, but situations where doing good things runs afoul of local political turmoil and the adversaries are not adhering to local laws they find inconvenient, even if the collectors are. A password is only as strong as the fingers that enter it.
Yes, I'm thinking of the list of transactions, a transaction count, and the time/date "sent". I admit I haven't tried a blank form (it didn't occur to me, TBH), and I'm assuming (perhaps wrongly, I'll test later) that the list in the "View Sent Form" tab would still have the "Sent on ..." and "Deleted on ..." dates. If blank form names would inhibit recording that data, my bad.
Very fair and much appreciated guidance. I appreciate the link - that's some great work in explaining the habitats monitoring use case and sets a standard I'm not sure I can meet.
I think my concern is a little better illustrated by the adversarial modality above, but a use case might add more color. I apologize for describing a fictional situation:
A fictional example:
A collector is gathering data on civil rights in refugee camps in a contested region with limited recourse to rule of law. The data collection is authorized by the nominal government, but regional control is maintained by militias who are engaged in a low-level genocidal campaign against some otherwise locally tolerated minority. The government has authorized and requested data on the status of regionally repressed minorities. The collector is tasked with gathering data from these minorities, who are not externally visible and would be subject to problematic treatment if they could be identified by the militia in control.
It is believed by the militia that the collector's submissions would be from members of the targeted minority and the collector's movements are tracked. The timing of the report submissions, when correlated with the collector's movement reports, provides the militia with a target list for elimination.
It is not relevant whether this correlation is meaningful or not.
If the View Sent Form tab is blank, the collector has some plausible deniability. There is still, obviously, residual risk. But just as a mobile collection system is extremely useful, significantly simplifying flash polls and similar work, and ODK's exfiltration and encryption model is really solid and a good match for such needs, so too have unfriendly adversaries come to understand that mobile phones are critical sources of information. They have expended meaningful capital to ensure they gain and maintain access, and they generally turn first to captured digital devices.
Given that there seem to be a lot of projects leveraging ODK in democracy-challenged parts of the world, at least contemplating unfriendly encounters seems relevant and useful.
If the collector's movements are being tracked, the adversary knows much more about the data being collected than they can learn from submission count and sent datetime.
It sounds like the question now is if we can identify a scenario where submission count and datetime are sensitive enough to warrant purging.
The example is explicitly fictitious and suffers from some construction flaws. The premise is that the collector is moving through a population with some small percentage of the targeted sub-population mixed into a larger population that tolerates them, but under the control of an intolerant force. This force, we assume, respects the privacy of the home but monitors public movements.
I appreciate it is a forced example and there would be many opportunities to mitigate risk in any real-world scenario.
An alternative fictitious example might be collectors at a public protest collecting sentiment or documenting specific acts by one side or the other. The collector is rolled up by one side that suspects the collector of acting on behalf of the side opposite their own; absolute polarity is irrelevant, merely that it is opposite and the collector and their digital device are in the physical control of one polarity and suspected of acting on behalf of the opposite.
For completeness' sake, let's assume the collector has been rolled up with a large number of other protesters, and that those doing the rolling are willing to disappear leaders of the side to which they are opposed but intend to catch-and-release those determined to be low-level individuals. This might be recognized as an operative principle in managing dissent in many regions that either don't have the resources to disappear dissenters en masse, don't have the political capital to do so, or internally reject mass actions but believe their cause justifies selective action.
Carried digital devices are almost always considered critical evidence and will be cracked and decrypted to extract contact information and media as part of standard procedure (interesting side note: memes have become so central to strategic messaging/cognitive warfare that commercial packages to assist in forensic recovery include generalized multi-lingual OCR as a standard feature).
Should ODK Collect be discovered, it can reasonably be assumed that the entity will search it for relevant data. A similar scenario seems to have been anticipated by the developers in including the options to force encryption on finalization and to delete after send: well-implemented, well-designed features that significantly mitigate the risks of collecting data in contested environments.
It is my opinion that, in the above scenario, the residual timestamp metadata would be sufficient indication that the collector was acting in a manner warranting involuntary discriminatory action that would likely be unwelcome to the collector, and that the absence of such data would limit the collector's risk much as "encrypt after finalize" and "delete after send" already do.
As a fairly simple extension of the very well implemented "encrypt" and "delete after send" features, I'd suggest that when both are enabled the timestamp records also be expunged from the Collect device, as it is clearly the form designer's intent to minimize collected-data exposure by enabling encryption and submission deletion. I certainly appreciate, though, that there may be use cases where it is desirable for the collector to retain "receipts" even after the encrypted/expunged data package is gone.
I find the discussion interesting and useful for making data collection safer.
But I think there can be situations where collecting, transferring, and storing such sensitive digital data should be avoided altogether. For example, several countries decided to exclude ethnic questions from their official Population and Household Surveys. Direct (informal) contact with the targeted groups and their local participation might be preferable options. Also, using Enketo (web forms) together with browser cache cleaning might be an option.
Certainly it is true that there are lines of questioning that are too sensitive to risk collectors' lives gathering. I'd like to move that boundary as far to the "safe" side as possible and minimize the shadows that give nefarious activities freer operation.
As for Enketo, it is a useful tool and with a browser like Orbot is quite secure. It does, however, require a better internet connection than the Collect app (it'd be awesome if the Collect app could integrate a Tor backend or IPFS store and forward proxy or a Briar transport mechanism someday). A lot of places just don't have reliable internet service, either because there is no mobile data available (fairly common) or because it is being blocked (usually transient).
I've just read this through and wanted to summarise some thoughts.
It would be good to provide some sources for these claims. I'd be interested to know if there are currently reliable ways to extract data from phones using Android's File-Based Encryption (standard from Android 10 upwards). I could totally imagine the feasibility of cracking into a device is dependent on the strength of the password set for it (as that will be used to encrypt the device's key), but it's not something I've delved into really.
That aside, I think we can simplify the discussion by imagining a scenario where someone is coerced into unlocking their device so that an attacker can just look at the data in View Sent Forms. In a case where forms are encrypted and deleted after send, the only data visible from a submission point of view (as others have pointed out) would be the submission title (the "instance name"), the time the submission was sent, and the time the submission was deleted (which will most likely be exactly the same). The instance name is within the control of the form designer, so I don't think we need to consider it as ever sensitive (maybe I'm wrong), so it's just the timestamps that are potentially problematic here.
I think all the scenarios discussed here where that data could be sensitive are described as fictional (apologies if I've skipped one). @gessel do you have a concrete example where these timestamps could be considered sensitive? As much as it's good to discuss hypothetical situations (especially when it comes to security), we generally find having a real world example of a need or problem is the best way to get to a solution.
With respect to demonstrable cracks: I can't share any direct experiences, so you'll have to take that for what it's worth. It is trivial to find companies that claim the capability and provide detailed operational scenarios for how they can extract data, under what conditions, and from which devices (not all conditions, not all devices, not all vendors, if we're to accept their public claims as limiting). Given their repeat sales, which are discoverable and verifiable on contract summary sites like highergov.com, it would seem risky to place trust in device vendor claims over the evidence that LE and Mil budgets are being allocated to such services on a repeat basis.
If you want to "do your own research" I suggest Milipol in Paris. If you can plausibly represent a, say, regional government with legitimate terrorism concerns then vendors will share some unpublished capabilities. You might even get a demo.
A little FUD is not paranoid: a zero day crack, especially a class break of mobile devices is worth many, many millions if it is protected and quickly becomes worthless if exposed. Unpatched zero days in the forensic extraction community will not be even hinted at outside a SCIF.
Absent an online demonstration of restricted capabilities, perhaps we can agree that the history of encryption suggests that assuming a complex device like a phone is secure has never been reliable. Maybe Android 10 is the first encryption scheme ever to be enduringly secure, but maybe it is better to assume the security it provides is, at best, an inconvenience. Note too that it is well documented that not all security services need to rely on technical means to gain access to encrypted devices, or have meaningful qualms about physical methods of doing so.
As for scenarios - they are "fictionalized" not "fictional." This is a public forum, the concern is literally mortal peril, albeit clearly and unarguably an edge case. My assumption, and I appreciate it may have been unfounded, was that this data leak would be easy to plug at relatively low cost and that the savings, while rarely and hopefully never realized, would be of sufficient value to justify the cost.
My modest suggestion is that if a form is configured for both encryption and deletion, the View Sent Forms entry also be expunged, which would require only client logic. An extension of the settings sheet to add a "clear_send_records" option would be more flexible and would also satisfy the goal; a sketch of what that might look like is below.
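To make the shape of that suggestion concrete, a hypothetical settings sheet might look like this. The clear_send_records column does not exist in the XLSForm spec today and is purely illustrative; public_key, auto_send, and auto_delete are the existing columns that already signal the form designer's intent:

```
settings sheet:
| form_title | form_id      | public_key       | auto_send | auto_delete | clear_send_records |
| My Survey  | my_survey_v1 | <base64 RSA key> | true      | true        | true               |

(clear_send_records is hypothetical; the other columns are existing XLSForm settings.)
```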