We are getting ready to start a project that will enroll around 200,000 households and 1 million people. It would be nice to take advantage of Entities to link the census, events, and follow-ups. Here are two ideas:
Since Entities are limited (by performance) to around 50,000 rows, perhaps we could use a separate form for each province (around 10 provinces). That way, we'd only have around 20,000 homes per province.
This still would not help with a map. I'm guessing map performance would drop significantly above about 500 points. We are grouping by cluster, so if there were a way to filter by data collector or cluster, we could get down to around 300 homes. I think this would allow us to select the house from a map.
Could we dynamically remove data from an Entity List over time (using the API)? If we were trying to connect visit 1 with visit 2, perhaps we could remove the Entity after visit 2 was completed. With this, we could probably keep the Entity List down to a few thousand records.
My current understanding is that Entities are not yet ready for projects at this scale. Let me know if I'm missing something. Thanks!
It's unfortunately really hard to make definitive statements about performance because there are so many factors involved.
We give 50k as a rough estimate of where performance starts to degrade, based on two major factors: device RAM and the network connection used for transfer. Both play a big role because currently the entire Entity List must be held in memory and must also be transferred with each update.
Both are also heavily affected by the width of the data: how many properties there are and how long their values are on average. Transfer speed is also affected by how repetitive the data is. For example, a list with many properties that each have the value 'yes' or 'no' will compress to a much smaller transfer size than a list with a few properties that each have unique values.
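To see the repetitiveness effect for yourself, here is a rough Python sketch that builds two synthetic CSVs and compares their raw and gzip-compressed sizes: one with 20 yes/no properties per row, and one with 4 properties holding random 16-letter strings. It assumes gzip is a fair stand-in for whatever compression is used during transfer, and the column names and sizes are made up for illustration.

```python
import gzip
import random
import string


def gzip_size(text: str) -> int:
    """Size in bytes of the UTF-8 text after gzip compression."""
    return len(gzip.compress(text.encode("utf-8")))


def repetitive_csv(rows: int, props: int, seed: int = 0) -> str:
    """Many properties, each holding only 'yes' or 'no'."""
    rng = random.Random(seed)
    lines = [",".join(f"q{i}" for i in range(props))]
    for _ in range(rows):
        lines.append(",".join(rng.choice(("yes", "no")) for _ in range(props)))
    return "\n".join(lines)


def unique_csv(rows: int, props: int, length: int = 16, seed: int = 0) -> str:
    """Few properties, each holding a random (effectively unique) string."""
    rng = random.Random(seed)
    lines = [",".join(f"p{i}" for i in range(props))]
    for _ in range(rows):
        values = ("".join(rng.choices(string.ascii_letters, k=length)) for _ in range(props))
        lines.append(",".join(values))
    return "\n".join(lines)


if __name__ == "__main__":
    samples = (("yes/no", repetitive_csv(1000, 20)), ("unique", unique_csv(1000, 4)))
    for name, text in samples:
        print(f"{name}: raw={len(text.encode('utf-8'))} bytes, gzip={gzip_size(text)} bytes")
```

The two raw sizes come out roughly similar, but the yes/no list should compress to something much smaller; exact numbers will vary with the data.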
RAM usage will change significantly with Collect v2024.3.0, which is currently in beta while we work through some remaining issues. This version will store Entity Lists in a database, which significantly reduces memory needs for larger Entity Lists. The whole list will still need to be transferred and processed, however.
All this to say, you may be able to have significantly more than 50k rows, depending on the shape of your data. I would encourage you to generate sample data in the shape you expect for a particular project, upload it as an Entity List, and test it empirically on the devices you will use. Or, if you tell us more about the shape you expect and the devices you will use, we may be able to give you a sense of what's possible.
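For the empirical check suggested above, a sample Entity List of a given shape could be generated with something like this Python sketch. The property names (has_water, has_power, head_name, village) are hypothetical placeholders, and it assumes the upload accepts a CSV with a label column plus one column per property; check the Central documentation for the exact columns your version expects.

```python
import csv
import io
import random
from typing import Dict, Iterator, List, TextIO

# Hypothetical properties: swap in the ones your project will actually use.
CHOICE_PROPS = ["has_water", "has_power"]  # short, repetitive values
TEXT_PROPS = ["head_name", "village"]      # longer, mostly unique values


def sample_entities(rows: int, text_len: int = 20, seed: int = 0) -> Iterator[Dict[str, str]]:
    """Yield one dict per fake household, shaped like the expected Entity List."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz "
    for i in range(rows):
        row = {"label": f"Household {i + 1}"}
        for prop in CHOICE_PROPS:
            row[prop] = rng.choice(("yes", "no"))
        for prop in TEXT_PROPS:
            row[prop] = "".join(rng.choices(alphabet, k=text_len))
        yield row


def write_entity_csv(fh: TextIO, rows: int, **kwargs) -> int:
    """Write a header plus `rows` sample rows to an open file; return rows written."""
    fieldnames: List[str] = ["label"] + CHOICE_PROPS + TEXT_PROPS
    writer = csv.DictWriter(fh, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for row in sample_entities(rows, **kwargs):
        writer.writerow(row)
        count += 1
    return count


def entity_csv_text(rows: int, **kwargs) -> str:
    """Same as write_entity_csv, but returns the CSV as a string (handy for size checks)."""
    buf = io.StringIO()
    write_entity_csv(buf, rows, **kwargs)
    return buf.getvalue()
```

To produce a file for upload, open it with newline="" and call write_entity_csv(fh, 50_000), then raise the row count or text_len until the devices you plan to use start to struggle.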
Yes, absolutely.
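Splitting the households by province as proposed could be scripted roughly like this Python sketch. It groups rows by a hypothetical province column, serializes each group as its own CSV, and flags any province above a chosen row limit. The 50,000 default is just the rough estimate from this thread, not a hard limit.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, str]


def split_by_province(rows: List[Row], key: str = "province") -> Dict[str, List[Row]]:
    """Group household rows by the value in their `key` column."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def oversized(groups: Dict[str, List[Row]], limit: int = 50_000) -> Dict[str, int]:
    """Return {province: row_count} for provinces above the limit."""
    return {name: len(rows) for name, rows in groups.items() if len(rows) > limit}


def to_csv_text(rows: List[Row], fieldnames: List[str]) -> str:
    """Serialize one province's rows as CSV text (one file per Entity List)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # 200,000 fake households spread evenly over 10 provinces.
    households = [{"label": f"HH {i}", "province": f"P{i % 10}"} for i in range(200_000)]
    groups = split_by_province(households)
    print({name: len(rows) for name, rows in sorted(groups.items())})
    print("over limit:", oversized(groups))
```

With the perfectly even fake data, every province gets exactly 20,000 rows and none is flagged; real provinces will be uneven, which is what the check is for.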
Yes, we haven't optimized the display of large numbers of points, on the assumption that data can generally be filtered.
That sounds great. You could have a cluster property and a choice filter for that.
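In the form itself, the filter would be an XLSForm choice_filter expression such as cluster = ${cluster}, where cluster is a hypothetical property on each Entity and ${cluster} a hypothetical question earlier in the form. Before fieldwork, it may be worth checking that no single cluster exceeds the number of points you are comfortable showing on the map. Here is a Python sketch of that check, assuming each Entity row has a cluster value and using the ~500-point guess from earlier in this thread as the threshold:

```python
from collections import Counter
from typing import Dict, Iterable, List

Entity = Dict[str, str]


def cluster_sizes(entities: Iterable[Entity], key: str = "cluster") -> Counter:
    """Count how many Entities fall in each cluster."""
    return Counter(e[key] for e in entities)


def filter_by_cluster(entities: List[Entity], cluster: str, key: str = "cluster") -> List[Entity]:
    """Keep the Entities a filter like cluster = ${cluster} would show for one cluster."""
    return [e for e in entities if e[key] == cluster]


def clusters_over(entities: Iterable[Entity], max_points: int = 500) -> Dict[str, int]:
    """Return clusters whose point count exceeds max_points, largest first."""
    sizes = cluster_sizes(entities)
    return {c: n for c, n in sizes.most_common() if n > max_points}
```

Any cluster this reports would probably need to be split further (for example by data collector) before relying on the map.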
Yes, absolutely.
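If you go the API route, removing Entities once visit 2 is done could look roughly like this Python sketch (standard library only). The endpoints are based on a reading of the ODK Central API docs (POST /v1/sessions for a session token, DELETE /v1/projects/{projectId}/datasets/{name}/entities/{uuid} to delete one Entity); please confirm them against the docs for your Central version. The visit2_done property and the shape of the entity records (a uuid field plus properties) are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List


def session_request(base_url: str, email: str, password: str) -> urllib.request.Request:
    """Build the request that logs in and returns a session token."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/sessions",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def delete_entity_request(base_url: str, token: str, project_id: int,
                          dataset: str, entity_uuid: str) -> urllib.request.Request:
    """Build the DELETE request for a single Entity in an Entity List (dataset)."""
    path = (f"/v1/projects/{project_id}/datasets/{urllib.parse.quote(dataset)}"
            f"/entities/{urllib.parse.quote(entity_uuid)}")
    return urllib.request.Request(
        base_url + path,
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )


def completed_uuids(entities: List[Dict[str, str]], flag: str = "visit2_done") -> List[str]:
    """Pick the Entities whose hypothetical visit-2 flag is 'yes'."""
    return [e["uuid"] for e in entities if e.get(flag) == "yes"]


def delete_completed(base_url: str, email: str, password: str, project_id: int,
                     dataset: str, entities: List[Dict[str, str]]) -> List[str]:
    """Log in, then delete every completed Entity; return the UUIDs deleted."""
    with urllib.request.urlopen(session_request(base_url, email, password)) as resp:
        token = json.load(resp)["token"]
    deleted = []
    for entity_uuid in completed_uuids(entities):
        req = delete_entity_request(base_url, token, project_id, dataset, entity_uuid)
        # urlopen raises HTTPError on non-2xx responses, so reaching here means success.
        with urllib.request.urlopen(req):
            deleted.append(entity_uuid)
    return deleted
```

base_url is assumed to have no trailing slash (for example https://central.example.org, a placeholder). Trying it against a test project before touching real data seems wise.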
It's certainly not as smooth as we eventually want it to be! As you've described, with some creativity it's possible to make it work, but you need to decide whether the manual interventions (splitting up the data, automating Entity removal, etc.) are within reach and worth it for you.
I encountered a similar challenge with follow-ups in a previous project, though it involved fewer records.
My workaround was to load the previous visit data into a CSV file and use fields such as (in your case) data collector or cluster to easily filter the records for the next follow-up.
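A Python sketch of that CSV workaround: it turns previous-visit records into rows for an external choice list (select_one_from_file), with the name and label columns that XLSForm expects plus cluster and collector columns to filter on, for example with a choice_filter such as collector = ${collector}. The input field names (household_id, head_name, cluster, collector) are hypothetical; map them to whatever your previous-visit export contains.

```python
import csv
import io
from typing import Dict, List

FIELDNAMES = ["name", "label", "cluster", "collector"]


def followup_choices(previous_visits: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Turn previous-visit records into choice rows for the follow-up form."""
    rows = []
    for rec in previous_visits:
        rows.append({
            "name": rec["household_id"],                          # value stored when selected
            "label": f'{rec["household_id"]} - {rec["head_name"]}',  # text shown to the enumerator
            "cluster": rec["cluster"],                            # filter column
            "collector": rec["collector"],                        # filter column
        })
    # Sort so each cluster's households appear together in the choice list.
    rows.sort(key=lambda r: (r["cluster"], r["name"]))
    return rows


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize the choice rows as CSV text for attaching to the form."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```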