Limiting Entity access for App Users and Data Collectors

Currently, when a form has an Entity List attached, all Entities are always sent to all users that request the Entity List. For some projects, this is a significant barrier to adoption of Entities either because of privacy concerns or because their Entity Lists are so large that it’s not practical to always download them fully. We’ve seen projects do things like split their forms (register_north, register_south) and Entity Lists (participants_north, participants_south) to get around this limitation.

We have been exploring how to approach this need for some months and have identified a number of related issues:

  • App Users are often defined as roles (e.g., vaccinator) rather than individuals because the absence of bulk tools in Central makes it hard to manage App Users as individuals.
  • Confusion between roles: App Users and Web Users with the Data Collector role feel like they should be very similar but they’re currently represented very differently.
  • It’s easy to get lost while managing users: you can configure users and access from multiple pages.

This led us to explore ways to fundamentally rework user management in Central. We made some prototypes and got some great feedback from user testing – shoutout to everyone that gave feedback so far!

For now, we have decided to come back to the underlying problem of limiting Entity access for App Users and Data Collectors and have some ideas for making improvements in that area in a phased way without taking on all user management issues at once.

The phases we’re considering are:

Phase 1: Single ownership

In this phase, there are two options for Entity access:

  1. Data Collectors and App Users can access all Entities in a project.
  2. Data Collectors and App Users can access only the Entities they have created. This should benefit projects with many Entities where users only need access to their own.
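
The two options above can be pictured as a simple filter over the Entity List. This is only an illustrative sketch: the `Entity` class, its `owner_id` field, and the `own_only` setting are hypothetical names, not Central's data model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    uuid: str
    label: str
    # Hypothetical field: id of the App User / Data Collector who created the
    # Entity. None stands for an Entity with no such owner (e.g. CSV upload).
    owner_id: Optional[int]


def visible_entities(entities, actor_id, own_only):
    """Return the Entities an actor would receive under each Phase 1 option."""
    if not own_only:
        # Option 1: everyone gets the full list.
        return list(entities)
    # Option 2: only the Entities the actor created.
    return [e for e in entities if e.owner_id == actor_id]


entities = [
    Entity("a1", "Household 1", owner_id=10),
    Entity("b2", "Household 2", owner_id=11),
    Entity("c3", "Household 3", owner_id=None),
]

print([e.label for e in visible_entities(entities, actor_id=10, own_only=True)])
print(len(visible_entities(entities, actor_id=10, own_only=False)))
```

In this sketch the unowned Entity `c3` reaches nobody under option 2 until an owner is assigned to it.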

In this phase, we would enable changing Entity ownership through the API only. Changing Entity ownership is important to support cases where Entities were created through CSV upload and need to be given App User/Data Collector owners or where Entities created by one individual need to be reassigned to another individual.

Phase 2: Single ownership with the ability to change owners

If single ownership is enabled, Project Managers will be able to change entity owners. This should benefit projects with many entities where users only need access to a subset.

Phase 3: Group ownership

Project Managers can create groups and assign users to them, where users can be part of multiple groups. This should benefit projects where users need access to entities associated with roles or locations.
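
As a rough sketch of how group ownership could decide what a user receives, assume each Entity is tagged with one or more group names and a user sees an Entity when they share at least one group with it. The mapping shape and the group names below are hypothetical and only illustrate the idea.

```python
def visible_for_user(entity_groups, user_groups):
    """Return uuids of Entities that share at least one group with the user.

    entity_groups: dict mapping Entity uuid -> set of group names (hypothetical).
    user_groups: iterable of group names the user belongs to.
    """
    user_groups = set(user_groups)
    return sorted(
        uuid for uuid, groups in entity_groups.items() if groups & user_groups
    )


entity_groups = {
    "tree-001": {"north"},
    "tree-002": {"south"},
    "tree-003": {"north", "vaccinators"},
}

# A user in one group, and a user in two groups.
print(visible_for_user(entity_groups, {"north"}))
print(visible_for_user(entity_groups, {"south", "vaccinators"}))
```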

We’d love to hear from you on these approaches:

  • How might phase 1 and 2 work within your projects?
  • What potential challenges or concerns do you have?
  • What questions are top of mind for limiting Entity access?

Really excited to see this, as it's a size (especially when entities get media...!), performance, and privacy issue for me and many others I know.

  • How might phase 1 and 2 work within your projects?
    • P1 - I would have multiple app users that need to access the same subset of the entity list(s), and in most cases they would not create the entities themselves. If I could accept a single 'app user' for a region and rely on deviceID and/or entered user info, then this would allow using the API to change ownership of each subset of the list to limit access.
      • You described it as access 'all entities OR the entities they have created (i.e. ownership)'. Is there a possibility for 'generic/project wide entities AND entities they have created/own'?
    • P2 - This looks like P1 but with a Central interface for changing ownership, which is ok for small entity lists but impractical for large ones unless there is a filter and select all UI.
    • P3 - this looks like it solves most of my issues!
  • What potential challenges or concerns do you have?
    • As above, OR vs AND access and ability to change many through Central
    • If an owner is an app user and access is only to that app user, can the submission be edited in Enketo as that user is not the app user and so wouldn't have access to the entity items?
  • What questions are top of mind for limiting Entity access?
    • would CSV upload allow assigning ownership at time of creation?
    • would ownership change with an update to an entity by app user B vs creation of an entity by app user A? (assuming set as all users access all, but ownership is known)
    • only tangentially related, but will CSV upload allow updating existing entities vs only adding new at some point?

Hi @ahblake, thanks for the questions and feedback!

If an owner is an app user and access is only to that app user, can the submission be edited in Enketo as that user is not the app user and so wouldn't have access to the entity items?

Currently only Project Managers or System Administrators can edit submissions. Entity filtering doesn't apply to them (yet?) so they would be able to see and edit all Entity data when opening an existing submission.

would CSV upload allow assigning ownership at time of creation?

We do want to make it so that a CSV upload would allow assigning ownership at the time of creation. That could be part of P2 with a database id or we may first migrate users to having human-friendly usernames.

would ownership change with an update to an entity by app user B vs creation of an entity by app user A? (assuming set as all users access all, but ownership is known)

We currently don't think that updates will change ownership. We may at some point introduce a new form design action for changing ownership. That would be similar conceptually to the create or update actions.

only tangentially related, but will CSV upload allow updating existing entities vs only adding new at some point?

That's the hope! We haven't taken it on yet because it's complex but you can use the pyodk merge function to achieve that goal if you're comfortable with some scripting.
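
To make the idea of a merge concrete, here is a conceptual sketch of the "create new, update existing" split such a script performs. It is not pyodk's implementation and does not call pyodk; the `plan_merge` helper and the `id` matching column are hypothetical.

```python
def plan_merge(existing, incoming, key="id"):
    """Split incoming rows into rows to create and rows to update.

    existing / incoming: lists of dicts. `key` is the (hypothetical) column
    used to match an incoming CSV row to an existing Entity.
    """
    existing_by_key = {row[key]: row for row in existing}
    to_create, to_update = [], []
    for row in incoming:
        current = existing_by_key.get(row[key])
        if current is None:
            to_create.append(row)   # no match: new Entity
        elif current != row:
            to_update.append(row)   # match with changed values: update
        # identical rows need no action
    return to_create, to_update


existing = [
    {"id": "t1", "label": "Oak", "height": "4"},
    {"id": "t2", "label": "Elm", "height": "7"},
]
incoming = [
    {"id": "t1", "label": "Oak", "height": "5"},
    {"id": "t2", "label": "Elm", "height": "7"},
    {"id": "t3", "label": "Ash", "height": "2"},
]

to_create, to_update = plan_merge(existing, incoming)
print([r["id"] for r in to_create], [r["id"] for r in to_update])
```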


With this three-phase concept, and perhaps its implementation later, is there still a limit on how big an entity list can be? What size of entity list is expected to work under this setup before hitting a real performance or practical issue?

For my trees use case, I am looking at perhaps half a million entities. With the current entity feature, that would be just too overwhelming. If the entities have locations to show on a map, the limitation will be even more obvious. At the moment, I use an attached CSV or JSON when the dataset is at this scale. When entity features are really essential, the only option is to create numerous entity lists, which makes data management more cumbersome.

It is a bit difficult to strike a balance between performance and entity management now. The ideal situation is to have both optimized: managing just one entity list with good performance (subsetting entities by access group).

Phase 3 does seem to address my need if the performance can be improved as well.

This is very useful. I used to call this requirement "Differential Access" of entities to users.

In my workstream, such a feature is highly needed.

Glad to see the progress.


Thanks for your questions and thoughts @chun_hing_yap and @Syed_Muhammad_Qadeer!

Is your question about Central or about Collect/clients? The biggest performance issues we're aware of are on Collect both related to data transfer and display/filtering. This functionality is aimed at helping with Collect performance by reducing the number of Entities individual devices get. Are you getting performance issues on Central?

Would that be half a million to a single device or would you be able to divide up the list with functionality like what's described in this thread?

Entities are served to Collect as attached CSVs and Entity Lists are optimized in Collect so I'm surprised this is your solution. Is it because when you use a manually-built CSV/JSON you can change the subset of items attached? For example, the first 100,000 on week 1, then the next 100,000 on week 2, etc?

How many individuals need to access the same group of Entities in your scenario? Could that group share an App User token as a stopgap before we introduce a formal group feature?

As a follow up to limiting Entity access for App Users and Data Collectors and after receiving feedback on user needs, we are exploring single ownership with the ability to bulk change owners as an improved version of Phase 2:

Users will be able to select multiple Entity rows and scroll or search for an App User or Data Collector to transfer ownership to.
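
Conceptually, a bulk transfer is one owner applied to a set of selected rows. The sketch below shows that operation on a plain mapping; the `bulk_transfer` helper, the uuids, and the owner ids are hypothetical and do not represent Central's interface or API.

```python
def bulk_transfer(owners, selected_uuids, new_owner_id):
    """Return a new uuid -> owner mapping with the selected Entities reassigned.

    owners: dict mapping Entity uuid -> current owner id (or None).
    The input mapping is left unchanged so the transfer can be reviewed first.
    """
    unknown = set(selected_uuids) - set(owners)
    if unknown:
        raise KeyError(f"Unknown Entities: {sorted(unknown)}")
    updated = dict(owners)
    for uuid in selected_uuids:
        updated[uuid] = new_owner_id
    return updated


owners = {"a1": 10, "b2": 10, "c3": None}

# Select two rows and hand them to the actor with id 11.
updated = bulk_transfer(owners, ["b2", "c3"], new_owner_id=11)
print(updated)
```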

@ahblake does this get closer to solving for your use case more quickly?

@chun_hing_yap if you use an App User as a group as @LN mentioned, might this get you closer to what you need? For example, searching by tree species or region to assign ownership?


Phase 1: Single ownership
Currently, I do not have a workflow in which an entity can only be accessed by the App User who created it.

Advantage: Very high privacy.

Phase 2: Single ownership with the ability to change owners
Currently, I do not have a workflow in which an entity is by default accessed by the App User who created it and ownership can then be changed to another App User.

Advantage: High privacy and added flexibility to change ownership. However, an entity cannot be shared among App Users.

Phase 3: Group ownership
I assume an entity is by default accessed by the App User who created it, and that ownership can then be changed to multiple App Users.

I would favor the following:

  1. By default, all App Users can access a newly created entity, assuming no ownership was assigned during entity creation.
  2. An entity with assigned ownership can only be accessed by the relevant App User.

On the other hand, you could consider having an 'active' flag for each entity so that an admin user can activate or deactivate the entity for access. This might sound like the existing delete flag, but it serves a different purpose. It would benefit workflows that only need a subset of entities loaded onto the device, improving entity performance in ODK Collect by reducing data transfer and display/filtering work. By default, when an entity is created, the 'active' flag is 1/true. Data is transferred only when the 'active' flag is 1/true.

Suggestions:
'active' flag and no ownership assigned = all App Users can access the entities whose active flag is 1/true

'active' flag and single or group ownership assigned = only specific App Users can access the entities whose active flag is 1/true
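
The two suggested rules can be written as a single access check. This is a sketch of the proposal only: the `active`, `owner_ids`, and `owner_groups` keys are hypothetical names and nothing like them exists in Central today.

```python
def can_access(entity, user_id, user_groups=frozenset()):
    """Apply the suggested 'active' flag and ownership rules to one entity.

    entity: dict with hypothetical keys 'active' (bool, default True),
    'owner_ids' (set of user ids) and 'owner_groups' (set of group names).
    """
    if not entity.get("active", True):
        # Inactive entities are never transferred, whoever owns them.
        return False
    owner_ids = entity.get("owner_ids", set())
    owner_groups = entity.get("owner_groups", set())
    if not owner_ids and not owner_groups:
        # Active and no ownership assigned: every App User can access it.
        return True
    # Active with ownership: only matching users or group members.
    return user_id in owner_ids or bool(owner_groups & set(user_groups))


unowned = {"active": True}
owned = {"active": True, "owner_ids": {10}}
grouped = {"active": True, "owner_groups": {"north"}}
inactive = {"active": False, "owner_ids": {10}}

print(can_access(unowned, 99), can_access(owned, 99), can_access(inactive, 10))
```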


It is related to data transfer and display/filtering. Group ownership is definitely the feature that I would choose, even though it will come later, in Phase 3. In addition, a generic way to subset entities, such as introducing an 'active' flag, would be useful for forms that don't require assigned ownership.

Currently, I rely more on pulldata with an attached CSV because it performs better than an instance list with entities. AFAIK, pulldata accesses a db file. Has the optimization of Entity Lists removed the difference in lookup performance between pulldata and instance lists?

An instance list is definitely more versatile than pulldata, as it can do aggregations such as count and sum. It is really an indispensable function for entities. However, the aggregation functions are limited: average, mode, and median are unavailable. Perhaps it would be good to have some advanced examples of using an instance list with other ODK functions to derive aggregated values, or else to state that this is not possible in the current version.

Hi @norlowski and users,

Thanks for this discussion and apologies for the very late reply :frowning:

Phase 1

Option 1 is the one we currently use, as it is the only one available for the moment.
It will still be a good option for small entity lists, such as lists of studies, species, or sites, that feed select questions (even from a map).
But as the list grows, it becomes hard to use. I have in mind 10m grid cells with thousands of cells (entities).
At the moment, we only use them to feed select questions, but once we are able to create or update entities within repeats, we will want to update properties (such as species seen). Such entities are created externally and bulk uploaded (or created through the API).
Entities created by App Users (ponds, trees) can be seen by all other users as long as the lists are not too big. Limiting access to the entity owner does not make much sense for us (I mean right now, with the use cases I have in mind). Different App Users can have some forms in common and then use the same entity lists.

Phase 2

Manually changing entity ownership will not be useful for us. We'll prefer API management.

Phase 3

This one really fits our needs (or our ideas), and it is what we plan to use with group-based user management: a user would belong to one or more thematic groups and to one or more geographical groups.

Mainly about Collect map rendering:

  • how to highlight or hide visited/updated entities during form filling
  • how to show only entities that are close to the enumerator (only the 100 cells around me instead of all the 10000). I have to test the distance() function over thousands of entities :wink:
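
For the second point, the underlying computation is a nearest-neighbour selection by distance. Below is a rough sketch using the haversine great-circle formula. It is plain Python for illustration, not Collect's `distance()` function or an XLSForm expression, and the `lat`/`lon` keys and cell ids are hypothetical.

```python
import heapq
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_cells(cells, lat, lon, n=100):
    """Return the n cells closest to (lat, lon), nearest first."""
    return heapq.nsmallest(
        n, cells, key=lambda c: haversine_m(lat, lon, c["lat"], c["lon"])
    )


cells = [
    {"id": "far", "lat": 1.0, "lon": 1.0},
    {"id": "c2", "lat": 0.0, "lon": 0.002},
    {"id": "c0", "lat": 0.0, "lon": 0.0},
    {"id": "c1", "lat": 0.0, "lon": 0.001},
]

print([c["id"] for c in nearest_cells(cells, lat=0.0, lon=0.0, n=2)])
```

`heapq.nsmallest` avoids sorting the whole list when only a small number of cells is wanted out of many thousands.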