Remove the default sitewide Admin access to human readable data

chrissyhroberts · October 6, 2022, 9:32am

The problem

When using Central, the sitewide admin is able to access all human-readable data in all projects across the Central installation. In a multi-user or service-model deployment, this is problematic.

From a data security perspective, the current situation is not ideal, because the sitewide admin exists only to set up projects, manage users and so on. They rarely have a direct role in a project, or any responsibility for the stewardship of its data. In my setup at LSHTM, I'm directly involved in about 12 of the ~200 projects I host at any given time. The only way I can make the system compatible with GDPR, local standards etc. is either (1) to get myself named on the official paperwork as an individual with rights of access to the data or (2) to ask everyone to use project-level encryption. Doing (1) is often problematic because [a] a lot of people don't think about this until it is too late and [b] many studies can't justify it, as there has to be a coherent reason why I'd need access. As it stands, I have to use option (2) and ask everyone to encrypt everything. Sadly, this locks them out of the benefits of editing and of any future entity-based implementations of ODK. The problem is perhaps worse for commercial providers, where there's even less justification for admins being able to see a project's data rather than simply managing the service.

The goal of this feature request is

Sitewide admins should not be able to see human-readable data on Central projects unless they are also assigned a project role in the appropriate tab of the project. This gives a more transparent view of who can access human-readable data on any given project. In practical terms, any sitewide admin would also need to be a project manager on any project where they want to see human-readable data.
An option to lock project roles with a password kept by the project manager. This acts as a safeguard that stops the sitewide admin from simply assigning themselves a project role at any time [which would defeat the point of the previous feature]. Any change to the project roles would then require the settings to be unlocked with a password known only to the project manager. This places full control of access rights to project data with the project manager.
Project managers should be able to add web users to their project with data collector or project viewer roles, removing the need for the sitewide admin to manage this. I think keeping project manager assignments with the sitewide admin is sensible.
System logs should record system actions and project logs should record project actions. Any changes to data should be logged and contained within the project environment. No project data should be accessible outside the control of the project roles.

What are some example use cases for this feature?

Use case 1

Current situation

Bélèn runs ODK Klood, an ODK Central hosting service. One project is a major clinical trial of a COVID-23 vaccine that is coincidentally recruiting participants in the neighbourhood in which Bélèn lives. As sitewide admin, Bélèn can see all the trial data and learns a lot of things she didn't know about her friends from the block.

Proposed situation

Bélèn runs ODK Klood, an ODK Central hosting service. One project is a major clinical trial of a COVID-23 vaccine that is coincidentally recruiting participants in the neighbourhood in which Bélèn lives. As sitewide admin, Bélèn creates the project and assigns several project roles, including a project manager. Bélèn does not need to see the data and so does not assign herself a project role. The project manager then locks the project roles, preventing Bélèn from later accessing the data without the project manager first unlocking them. Bélèn enjoys living in her community, knowing that she has supported the health of her friends and neighbours without compromising herself or exposing trial data.

Unknown element

Momo is a really cool IT expert who manages the installation of Central and its databases for ODK Klood. Whether Momo can see all the data is a mystery to this narrator, but in this idealised world where ODK Klood is a thing, Momo has no reason to see the data and so they can't.

Use case 2

Current situation

Harpo is working to support children who live on the streets in London. He plans to establish a central database of these children, which will be accessed and edited [all via Central] by stakeholders from the local authority, by teams from Harpo's charity and also by social workers on the ground. Harpo has read about ODK Central and realises that the new entity-based features will be the perfect solution. Being short of funds and also low on IT support, Harpo applies to a zero-cost Central hosting provider that he finds on the internet. It all goes really well until Harpo discovers that the sitewide admin of the system is a member of a criminal gang who has stolen the data for horrible reasons. Things end badly.

Proposed situation

Harpo is working to support children who live on the streets in London. He plans to establish a central database of these children, which will be accessed and edited by stakeholders from the local authority, by teams from Harpo's charity and also by social workers on the ground. Harpo has read about ODK Central and realises that the new entity-based features will be the perfect solution. Being short of funds and also low on IT support, Harpo applies to a zero-cost Central hosting provider that he finds on the internet. It all goes really well and the children have much better lives.

Unknown element

So this somewhat sensationalist use case only really works if Momo-tui, the IT person, is also unable to see the human-readable data. It is really only here to illustrate that the person with the greatest data stewardship needs and responsibilities is the project manager, not the sitewide admin. The project manager should have the power to lock the admin (or anyone else) out without compromising UX functions like editing and entity-based features.

LN · October 7, 2022, 9:42pm

Providing a unified project-level log is something we intend to do.

I hope you’ll pardon my gentle joke here – what I’m hearing is “I’m an ODK Central service provider and I’m not trustworthy!” How to be a responsible and law-abiding service provider is outside of the bounds of what we try to focus on but I’ll try to provide a few useful pointers and ideas. In general, we expect that self-hosted Central is used within organizations by individuals who can vet each other and with terms of service appropriate for the location that the system is used in.

I recommend you review this with a local lawyer in the context of your service offering and corresponding policy. My understanding is that data policies always include provisions for service providers to provide the service (but I don’t know your local/institutional rules and responsibilities).

That is, it’s ok for a service provider to have theoretical access to data, it’s just not ok for that service provider to actually access data outside of what is required for providing the service. You can look into GDPR's ‘grounds of legitimate interest’ for more. Service providers are responsible for providing a secure service and for having clear terms governing that service.

I think you're coming at this from the perspective of someone who uses hosted services managed by others. In that case, you don't have visibility into the access that they may have to provide the service. If you read their terms of service, that will give you some indication.

Platform providers almost always have full access to data on some level. In practice, access is limited through secrets protection policies and legal contracts. Because many platforms involve self-service account creation, you get the illusion that your account is fully segmented but in practice almost all platforms have admins (and often people like customer service representatives) who could get access unless client-side encryption is used (e.g. https://docs.getodk.org/encrypted-forms/ or Google's new-ish https://cloud.google.com/blog/products/workspace/new-google-workspace-security-features available for limited products).

ODK Central makes this access obvious because the Sitewide Administrator role can see all resources through the frontend. But fundamentally and as you note in your scenario regarding Momo-tui, most platforms have some version of this property.

I think what you’re reacting most strongly to is that Sitewide Administrators can log into the frontend and see everything. We could limit what is visible in the frontend but I think that would be less transparent – it would obscure the fact that the person administering the server does in fact have full access. Perhaps the real issue here is that you as an individual have a Sitewide Administrator account when "Sitewide Administrator" doesn't really describe what you want to do.

What I'm reading from your proposed solutions is that you’re trying to split service administration and access granting. It sounds like you want to be the access-granter without being the service administrator. Central is designed with the idea that the two are the same individual(s). Furthermore, it sounds like you’re not just the access-granter but also a tenant.

If this describes what you’re trying to do, one thing you could have the service administrator do is create a Sitewide Administrator for you and block that user from logging into the frontend and/or from accessing most of the API at the nginx layer. That way, that single Administrator would be able to use the API to e.g. create projects, create users and assign users to projects, and nothing else. You as an individual could have another account to access the projects you need frontend access to. You could use email tags to use the same email address for both your access-granter and tenant roles.
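To make the "API only, and nothing else" idea more concrete, here is a minimal sketch of the decision a proxy in front of Central could make for that restricted account. It is written in Python rather than as nginx config, and it assumes the proxy can already tell which requests come from the restricted account (for example via a dedicated hostname or source IP; that part is not shown). The route list is an assumption based on the ODK Central REST API as I understand it (log in, create a project, create a user, assign a project role); `PROVISIONER_ALLOWLIST` and `is_allowed` are hypothetical names, so check the routes against the API docs for your Central version.

```python
import re

# HYPOTHETICAL allowlist for the restricted "access-granter" Sitewide Administrator.
# Routes are assumptions based on the ODK Central REST API; verify them
# against the API docs for the Central version you run.
PROVISIONER_ALLOWLIST = [
    ("POST", re.compile(r"^/v1/sessions$")),  # log in (obtain a session token)
    ("POST", re.compile(r"^/v1/projects$")),  # create a project
    ("POST", re.compile(r"^/v1/users$")),     # create a web user
    ("GET", re.compile(r"^/v1/roles$")),      # list roles to look up role ids
    # assign an actor to a project role: /v1/projects/{projectId}/assignments/{roleId}/{actorId}
    ("POST", re.compile(r"^/v1/projects/\d+/assignments/[^/]+/\d+$")),
]


def is_allowed(method: str, path: str) -> bool:
    """Return True only if the request matches an allowlisted route; deny everything else."""
    method = method.upper()
    path = path.split("?", 1)[0]  # the query string does not affect the decision
    return any(m == method and pattern.match(path) for m, pattern in PROVISIONER_ALLOWLIST)


if __name__ == "__main__":
    print(is_allowed("POST", "/v1/projects"))                                # True
    print(is_allowed("POST", "/v1/projects/3/assignments/manager/42"))       # True
    print(is_allowed("GET", "/v1/projects/3/forms/survey/submissions.csv"))  # False
```

Deny-by-default is the point of the design: any route not listed, including submission downloads, is refused. As the next paragraph points out, this is a guardrail against accidents, not proof against a malicious access-granter, since the allowed routes are enough to create and assign a new user.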

That would provide additional guardrails against accidents but please note that it’s not enough to conclusively prevent you from gaining access to data. If you are malicious, you could create a new user using an email address that you have access to, assign that user to get access to a project that you know exists, and use that to log into the frontend or use the API.

For even more protection, you as the access-granter could do something like send a request to the service administrator that they then have the authority to approve/reject. This could be largely automated and would guarantee that truly only the service administrator has theoretical access to tenants’ data.
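The request/approve flow could look something like the following minimal sketch: the access-granter can only file requests, and only the service administrator's decision changes anything. Everything here (`AccessRequest`, `ApprovalQueue`, the status names) is hypothetical and kept in memory; a real version would persist requests, authenticate both parties and, on approval, call Central's API to make the actual role assignment.

```python
import itertools
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class AccessRequest:
    request_id: int
    project_id: int
    user_email: str
    role: str
    status: Status = Status.PENDING


class ApprovalQueue:
    """HYPOTHETICAL in-memory queue: the access-granter submits, the service admin decides."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self._requests: dict[int, AccessRequest] = {}

    def submit(self, project_id: int, user_email: str, role: str) -> AccessRequest:
        # Called by the access-granter: it only records the request and grants nothing.
        req = AccessRequest(next(self._next_id), project_id, user_email, role)
        self._requests[req.request_id] = req
        return req

    def decide(self, request_id: int, approve: bool) -> AccessRequest:
        # Called by the service administrator: each request can be decided only once.
        req = self._requests[request_id]
        if req.status is not Status.PENDING:
            raise ValueError(f"request {request_id} was already {req.status.value}")
        req.status = Status.APPROVED if approve else Status.REJECTED
        # On approval, a real system would now make the role assignment in Central.
        return req

    def pending(self) -> list[AccessRequest]:
        return [r for r in self._requests.values() if r.status is Status.PENDING]
```

Because the access-granter's only capability is `submit`, theoretical access to tenant data stays with whoever holds the `decide` side.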

You're essentially describing another access level here; let’s call it a Project Provisioner. I believe it would have to be governed by unusual rules like “can only assign Project Manager roles to projects that have no Project Manager” that don’t fit cleanly into a typical permissioning system. There are some interesting ideas there, some of which we have explored a bit. I don’t see them becoming an area of focus in the near future given what we have ongoing. As far as I can see, it would only benefit service providers who want to split service administration and access granting. That said, if others have a similar need, please do comment.
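To illustrate why such a rule sits awkwardly in a typical permissioning system, here is a minimal sketch of the “can only assign Project Manager roles to projects that have no Project Manager” check. It is hypothetical (not something Central implements); the role name `"manager"`, the `Project` structure and `provisioner_assign` are assumptions made for the example. Note that the decision depends on the current state of the project, not just on who is asking and which role they want to grant, which a plain role/permission table cannot express.

```python
from dataclasses import dataclass, field


class ProvisionerError(Exception):
    """Raised when a HYPOTHETICAL Project Provisioner attempts a disallowed assignment."""


@dataclass
class Project:
    project_id: int
    # role name -> ids of the users holding that role on this project
    assignments: dict[str, set[int]] = field(default_factory=dict)


def provisioner_assign(project: Project, role: str, user_id: int) -> None:
    """HYPOTHETICAL rule: a Project Provisioner may only assign a Project Manager,
    and only to a project that does not have one yet."""
    if role != "manager":
        raise ProvisionerError("a Project Provisioner may only assign the Project Manager role")
    if project.assignments.get("manager"):
        raise ProvisionerError(f"project {project.project_id} already has a Project Manager")
    project.assignments.setdefault("manager", set()).add(user_id)
```

Once a project has its manager, further grants (data collectors, project viewers) would come from that manager rather than from the provisioner, in line with the original feature request.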

ODK Cloud provides single-tenant offerings. Each customer has a fully-isolated ODK Central instance. ODK has theoretical access to customer data but our policies (e.g., need to know basis, two factor auth) and the law guide what we can do.

Either Bélèn has a robust service provider agreement or they should not be trusted as a service provider. If Bélèn is believed to be accessing data against the terms they have stated in the service agreement, legal action should be taken against them.

There is very little we can do in this case. We have no control over code running on other people’s servers. Unfortunately, there is always the risk that a service provider claims that they run Central and actually provides something that only very remotely resembles it. That’s why it’s important to read terms of service and validate the trustworthiness of service providers. Adding roles and layers of security to Central won’t help here (because they can be removed or tampered with). We offer a zero-trust option: client-side encrypted submissions. Zero-trust will always have limitations.

chrissyhroberts · October 10, 2022, 10:28am

Thanks so much @LN for this very deeply considered and detailed response, which I've found somewhat transformative in reshaping my thinking.

I've been cogitating on this comment in particular...

I think you're coming at this from the perspective of someone who uses hosted services managed by others. In that case, you don't have visibility into the access that they may have to provide the service. If you read their terms of service, that will give you some indication.

You are correct, I certainly was thinking of this more from a consumer perspective.

I think I've always approached this from a mindset that there's a hard relationship between need and ability; for instance, I assumed that because I should not access the human-readable data, there should be a mechanism to actively prevent me (as a service provider) from doing so. In reality, I, like you @LN, simply don't (and never will) access the data of our users, because we're trustworthy individuals working for trusted organisations. That fictional rogue Bélèn made the mistake of going and looking at the data, which is an active breach of the trust (and laws) that users place in their service, whilst Harpo didn't do his due diligence on the criminal platform he opted to work with. Both failures really stem from placing trust in an untrustworthy service provider.

I love this comment, which gets to the nub of the thing ...

what I’m hearing is “I’m an ODK Central service provider and I’m not trustworthy!”

That's definitely not the message I wanted to convey, but maybe more like "I’m an ODK Central service provider and I’m worried that my users won't believe that I'm trustworthy!” Some 500 projects have worked with us over the years, and thinking more about your descriptions of the trust relationships between the service provider and users, that's probably the best evidence I have that we both have, and deserve to have, the trust of our users.

But I think that the most helpful thing you've said is this.

it’s ok for a service provider to have theoretical access to data

Until now, I guess I just didn't think it was OK. A lot of the motivation behind my original concept for these features was born out of my concern that there was some need to minimise the 'service provider' role to the barest minimum number of people (i.e. even within the group of just three people who have this theoretical ability in our setup at LSHTM). Having thought more since reading your response, I realise that it really is OK, because I am trustworthy, and so is everyone involved in our provision. No one on our team would ever consider using this theoretical access for malign purposes, or outside the basic requirements of service provision and support!

You're right, there are already 'zero-trust' setups, which users can implement. If they don't, then it becomes inevitable that someone is theoretically able to see the human-readable data. A reliable provider like ODK Cloud, Kobo, Ona, LSHTM etc. simply won't ever look.

It's interesting to further consider that my thinking was also driven by the perspective that users might expect the totality of their data security to be delegated to the software (through feature developments such as those I proposed above), when in fact a lot of the protection comes from the covenant of trust between the users and the people and teams who are the service providers. The professional acumen and track records of the services themselves are part of the security!

Thanks again @LN for an absolutely fascinating discussion that's given me a great deal to think about.

Chrissy

LN · October 11, 2022, 4:00pm

This is very well summarized.

I think it's also helpful to think about how this applies to other parts of life in metropolitan areas. We give our credit card with the number on it in plain text to folks helping us at stores and restaurants. We get into vehicles controlled by people we've never met. We give our bank customer representatives the answer to our secret question so they can access our full account details. There are often no hard systems preventing abuse. But there are contracts, trust relationships, trainings, auditing, risk of being fired, etc.

I realize now that my comment could be taken to mean I actually believe you are untrustworthy. Thank you for taking my tongue-in-cheek comment as I intended it and rephrasing it more elegantly!

While it may not be strictly required, it is good practice to try to limit access as much as possible. In your case, since you are both an administrator and a user, I strongly encourage you to have separate accounts for those two hats you wear if you don't already.

Ideally, the credentials you use when you have your administrator hat on would have an extra layer of security, e.g. protected by a private key only you have access to. You can also go further and do things like what I described with limiting frontend access. The goal should be that those credentials are used only as-needed and that the usage can be easily audited.

The important thing you're doing is assessing where weak points are in access policies and looking for ways to harden them. In some cases there will be software solutions so I don't mean to shut down that line of thinking entirely. But you can go really far with a good operational setup.