I have been excited about Entities since they were introduced, and I now have a health project where I would like to use them. However, the documentation earlier indicated a limit of 50,000. I was wondering whether that is still the case, and I would like to learn from other members of the community who may have used Entities with more than 50,000.
The project will list an estimated 280,000, maybe more or fewer, with follow-up actions for listing household (hh) members, then death and pregnancy notifications, etc.
Does anyone have any experience with Entities at that scale?
I just created a project and bulk uploaded 215,000 entities in one go. It took 23 seconds. All the properties were processed and are there. No doubt you could upload more!
So Central can definitely handle the upload and processing no problem.
I haven't tested the mapping workflows with this - perhaps adding a select_one_from_file and trying to load all the points on a map might be a bad idea, but other than that I can't see why there would be issues - good luck with testing!
Thanks for your question @Iammash and I'm glad you're feeling excited for Entities!
We are keeping the 50,000 warning for now but it's very hard to make a blanket statement because there are many factors that affect Entities performance.
The biggest thing I would have you consider for 280,000 is that the whole list will be transferred to every device each time a form update is requested. Depending on the connection that devices are on, this could be a big deal. I'd recommend making a spreadsheet with random values and the number of columns you expect, creating an Entity List from it in a test project, and trying to use it in a test form. You'll want to make sure that you try network conditions similar to the ones you expect in your real context. If you want to tell us how many columns you expect and what kind of data they will contain, the ODK team may be able to do some testing for you.
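One way to build the test spreadsheet suggested above is a short script. This is a minimal sketch, not an official ODK tool: the function name, the `prop_N` column names, and the random 12-letter values are all made up for illustration, and it assumes Central's CSV upload expects a `label` column with the remaining columns treated as properties.

```python
import csv
import io
import random
import string


def random_text(rng, length=12):
    """Return a random lowercase string (placeholder property value)."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def write_test_entities(out, n_rows, n_properties, seed=0):
    """Write a header plus n_rows of random rows to the file-like object `out`.

    Assumption: a `label` column followed by hypothetical property columns
    named prop_1..prop_N. Replace these with your real property names.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same file
    property_names = [f"prop_{i + 1}" for i in range(n_properties)]
    writer = csv.writer(out)
    writer.writerow(["label"] + property_names)
    for i in range(n_rows):
        label = f"Household {i + 1}"
        writer.writerow([label] + [random_text(rng) for _ in property_names])
    return property_names
```

To produce a file, open it with `open("test_entities.csv", "w", newline="", encoding="utf-8")` and pass it as `out` with `n_rows=280_000` and the number of properties you expect. The size of the resulting file gives a rough idea of what each device would have to download.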
By default, Collect requests a form update every 15 minutes. If you don't need to have very fresh data, you may want to consider reducing that frequency to avoid constantly requesting and processing the data.
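To get a feel for what the update frequency means for data usage, here is a rough back-of-the-envelope sketch. It takes the statement above at face value (the whole list is transferred on every form update), and the 20 MB list size is purely a placeholder; measure your own test list instead of relying on it.

```python
def daily_transfer_mb(list_size_mb, update_interval_minutes, hours_online=24):
    """Estimate updates per day and total MB downloaded per device per day.

    Assumes every update transfers the full Entity List, as described above.
    """
    updates_per_day = (hours_online * 60) // update_interval_minutes
    return updates_per_day, updates_per_day * list_size_mb


# Placeholder size of 20 MB; the 15-minute interval is Collect's default.
print(daily_transfer_mb(list_size_mb=20, update_interval_minutes=15))   # (96, 1920)
# The same list with hourly updates.
print(daily_transfer_mb(list_size_mb=20, update_interval_minutes=60))   # (24, 480)
```

Devices are rarely online all day, so `hours_online` can be lowered to match realistic field conditions.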
Another thing you will want to consider is how you want to access and use Entities in your form. If you're always looking up a single Entity by system ID (the name column used from selects), that should always perform very well. If you're filtering on other columns or using more complex lookup expressions, performance will depend on the CPU and RAM of your devices.
Thanks for sharing your experience, @spwoodcock! How many properties do those Entities have?
Yes, definitely so. In general, it should be possible to use choice filters to only need to show a subset on the map.
Unfortunately, performance depends on the mapping engine. I believe OpenStreetMap is currently the most performant, but even then I'd try to filter down to at most 500 Entities to map.
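Before relying on a choice filter, it may help to check whether the column you plan to filter on actually keeps each subset under that rough limit of 500. The sketch below counts rows per filter value in the Entity CSV; the function name and the `enumeration_area` column used in the usage note are hypothetical.

```python
import csv
import io
from collections import Counter

MAX_ON_MAP = 500  # rough upper bound suggested above, not a hard limit


def oversized_groups(csv_file, filter_column, limit=MAX_ON_MAP):
    """Return {filter value: row count} for groups with more than `limit` rows."""
    counts = Counter(row[filter_column] for row in csv.DictReader(csv_file))
    return {value: n for value, n in counts.items() if n > limit}
```

For example, `oversized_groups(open("test_entities.csv", newline="", encoding="utf-8"), "enumeration_area")` would list any area whose Entities would exceed the limit when shown on a map, which suggests where a finer-grained filter might be needed.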
Good morning all, and apologies for the late reply; I have been somewhere with network challenges.
Thanks @spwoodcock, I think I will try to load up Entities like you did and test with the devices likely to be used by field workers.
@ahblake could you share the specs of the device you had challenges with? I will mostly work with geopoints, which I am guessing would be less problematic than geotraces.
Thanks @LN for the detailed explanation. Our use case does not need to pull data every now and then, but there may be a case when they need to capture household member data against the registered household almost immediately.
Now that I think about it, I may need to really think about this. Here is more context.
The system is to be used for a sample registration system that will collect some baseline data, followed by event data like pregnancies, births and deaths. We expect community workers to collect baseline data in their assigned enumerated areas using the household listing form, then capture household member information in the household roster form, then capture pregnancy events in another form linked to household members who are female and above a certain age. For each pregnancy, we want to capture the outcome later. There is also the capturing of death events against the listed members. I am wondering whether to use Entities or just the old select_one from CSV with search(). Otherwise, I need only a few properties for the Entities, 3 to 5.
Entities sound like a good match for this. They will allow you to split your workflow across multiple forms that all make offline updates to the same Entity.
Am I understanding correctly that all Entities that a data collector needs to work with will be originally created by that same person? If that's the case, then you're right that pulling Entity data from the server is not very important for this use case. You could turn off updates entirely or you could let them run and accept that they may fail. That's probably acceptable in this case and at least will let you push updates to the form definition if needed. Currently there's no way to separate out form definition updates and Entity updates.