Understanding invalid polygon (Self-Intersecting) issues in Collect

We are doing research to better understand why invalid (self-intersecting) polygons occur during data collection. We’ve seen this as a challenge specifically in agriculture, environmental monitoring, and public health. Our goal is to understand the contexts and behaviours that cause these errors, so we can help data collectors prevent them and reduce rework for project managers.

:thought_balloon: Problem we’ve heard so far

  • Data collectors sometimes create invalid polygons due to GPS drift or user error, like crossing over already mapped areas.

  • Current workflows lack real-time validation or clear guidance on fixing geometry issues.

  • Errors are discovered after submission, requiring rework. This is very hard to do afterwards because the data isn't available and editing in Enketo is hard.

:brain: What we want to better understand

1. Where the errors occur

  • When is the invalid point created? Was it the last point, the one collected before, or it became invalid because of a much earlier point or something else?

2. Recording behaviours

  • How do users decide which recording method to use?

  • During automatic mode, what are your data collectors typically doing with their phone? For example, is it in their hand and they are looking at it the entire time, or in a pocket?

  • If they are using placement or manual mode, where is their phone?

3. Strategies to prevent invalid polygons

  • What strategies have helped you and the data collectors reduce these errors?

This work builds on a previous discussion about recording methods and whether those should be defined in the form design or remain in ODK Collect. From that discussion we learned it’s important for users to choose their method, and the use cases shared were incredibly helpful.

We’d love to continue that conversation as we explore how we can design a solution to prevent invalid polygons during data collection based on common behaviours.

  • When is the invalid point created?
    • The most common intersecting polygons I see are where they're technically invalid but approximately correct in the eyes of the data collector
      • 1: using a crossing to indicate a very narrow section or a point intersection instead of having >=2 vertices at the isthmus or two polygons
      • 2: trying to create a very thin section to bridge to another area, resulting in a crossing (should be reported separately)
      • 3: slight overlaps with detailed geometry due to time constraints / difficulty placing accurately. Depending on the shape, any point could cause the intersection
  • How do users decide which recording method to use?
    • I request placement by tapping in almost all cases and have a constraint to ensure that elevation or precision are 0
  • During automatic mode, what are your data collectors typically doing with their phone? For example, is it in their hand and they are looking at it the entire time, or in a pocket?
    • not applicable in almost all cases
  • If they are using placement or manual mode, where is their phone?
    • tablet is handheld in front of user
  • What strategies have helped you and the data collectors reduce these errors?
    • counselling user when bad geometry uploaded

I ran 'check validity' over almost 10000 polygons from a project to see the sorts of errors that QGIS found, looks to be ~1-2% error rate in field geometry.

Duplicate nodes: Don't think these are actual Collect polygon issues, included for interest
  • 21x "2 duplicate node(s) starting at vertex 1" due to first = second. (In the analysis output, all had 17 DPs of precision, generally as 8DPs of values, 6-7 0s or 9s, then 2-3 values at the end - the source geometry didn't have values like this, they were 7-8DPs if desktop created and not modified, and 14-15DPs if modified in Collect so appears to be a conversion issue in the check.)
  • 24x "2 duplicate nodes" "at vertex !=1" occurs when there are consecutive duplicate points elsewhere in the polygon.
    • Not sure how either of these duplicates occurred - desktop duplicate or Collect (I couldn't intentionally create a duplicate in Collect even at max Mapbox zoom). Checking the source, most of the geometries with duplicate vertices had 7-8DP precision - I assume these were desktop-created and not modified. 3x of the 24 that errored had 14DP precision, so appear to be Collect modified, but perhaps only the non-duplicates were adjusted. I'm going to assume these came about from errant extra clicks with geometry snapping or similar in desktop
  • source geometry for polygon with 14DP that returned 'duplicate vertices at vertex 5' but only a match to 7DP:
geometry

SRID=4326;POLYGON ((-10.18806845920256 -10.596776664266088 0, -10.188079265162 -10.59677407029217 0, -10.18808165801846 -10.596777987775411 0, -10.18807025260708 -10.596780962257768 0, -10.18806846016878 -10.596776671971696 0, -10.18806845920256 -10.596776664266088 0))
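For anyone who wants to reproduce the "match only to 7DP" observation outside QGIS, here is a minimal sketch. It is not how QGIS's check works internally; `near_duplicate_vertices` is a hypothetical helper that simply rounds consecutive vertices to a chosen number of decimal places and reports the pairs that collapse to the same value. The reported position is the 1-based index of the first vertex in each pair, which is an assumption and may not match QGIS's numbering.

```python
def near_duplicate_vertices(ring, decimals=7):
    """Return 1-based positions i where vertex i and vertex i+1 are equal after rounding.

    `ring` is a list of (x, y) tuples. Hypothetical helper, not a QGIS API.
    """
    hits = []
    for i in range(len(ring) - 1):
        (x1, y1), (x2, y2) = ring[i], ring[i + 1]
        same_x = round(x1, decimals) == round(x2, decimals)
        same_y = round(y1, decimals) == round(y2, decimals)
        if same_x and same_y:
            hits.append(i + 1)
    return hits


# The 14DP polygon quoted above (Z values dropped).
ring = [
    (-10.18806845920256, -10.596776664266088),
    (-10.188079265162, -10.59677407029217),
    (-10.18808165801846, -10.596777987775411),
    (-10.18807025260708, -10.596780962257768),
    (-10.18806846016878, -10.596776671971696),
    (-10.18806845920256, -10.596776664266088),
]

print(near_duplicate_vertices(ring, decimals=7))   # pair found at 7DP
print(near_duplicate_vertices(ring, decimals=14))  # no pair at full precision
```

Running the same function at different `decimals` values is a quick way to see at which precision two vertices start being treated as the same point.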

  • 1x ring 0 not closed - first != last and there is a crossing (possibly a desktop goof from the DPs but the user also didn't check/correct if so):
unclosed ring geometry and details


SRID=4326;POLYGON ((-10.2209927 -10.2306696 0, -10.2209833 -10.2306667 0, -10.2209871 -10.2306509 0, -10.2209745 -10.2306521 0, -10.220976 -10.2306425 0, -10.2209954 -10.2306454 0, -10.2209833 -10.2306696 0))

  • 165x 'segments x and y of line 0 intersect at ...', 152x of which have 13-14DP so appear to be Collect created/modified.

Again, ok from afar


Trying to create two areas that meet at a point


Trying to create three separate areas in one polygon with intersections in the links


Some great feedback here already - I can't add much more to the details given by @ahblake!

(I tend to collect geoms in a layer above the survey tool then inject the values into Collect / Webforms).

If I was gathering data in Collect, ideally I would want some real-time feedback once a geom becomes invalid. It's pretty easy to implement one of the many polygon intersection algorithms out there. Once a point is clicked that causes an intersection, the geom could highlight with a prompt to warn the user.
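To make the idea concrete, here is a rough sketch of a per-point check using the standard orientation (cross product) test. It is only an illustration, not Collect's code: the function names are made up, coordinates are treated as planar (reasonable at field scale), only the newly added edge is tested, and segments that merely touch or overlap collinearly are not reported.

```python
def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1 left turn, -1 right turn, 0 collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def segments_cross(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 properly cross (shared endpoints do not count)."""
    return (_orient(p1, p2, q1) * _orient(p1, p2, q2) < 0
            and _orient(q1, q2, p1) * _orient(q1, q2, p2) < 0)


def crossing_edges(points, candidate):
    """Indices of existing edges crossed by the new edge points[-1] -> candidate.

    Edge i runs from points[i] to points[i + 1]. Hypothetical helper.
    """
    if len(points) < 3:
        return []
    new_start = points[-1]
    # Skip the last existing edge: it shares new_start with the new edge.
    return [i for i in range(len(points) - 2)
            if segments_cross(points[i], points[i + 1], new_start, candidate)]


recorded = [(0, 0), (4, 0), (4, 4)]
print(crossing_edges(recorded, (2, -2)))  # new edge cuts back across the first edge
print(crossing_edges(recorded, (0, 4)))   # new edge is fine
```

Because the function returns the offending edge indices rather than a yes/no, a UI could highlight exactly the segments involved and still let the user continue.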

However as seen in some of the examples by @ahblake above, the enumerator might be walking a large polygon, and there may be field constraints preventing them getting to a place to correct the intersection (particularly at bottlenecks). The best workflow to handle this would be (1) to allow them to modify vertices, so the two conflicting points can be adjusted as needed (probably not trivial…) (2) or failing that, ensure the intersection message is only a warning and not a blocker for continuing.

Other Considerations

  • Another thing to consider is whether any community members either (1) actually want intersecting geoms, as they better represent the reality in the field, or (2) don't care about intersecting geoms. If either of those cases is true, then it makes more sense to implement this as optional form logic.
  • Complex intersecting geoms (as shown above) are tricky to post process. But many are simple enough. Lots of GIS tools have make_valid functions. A simple buffer(0) will fix some geoms, or even a convex hull if appropriate. I would say in many cases this is more of a small data cleaning challenge, rather than something that needs to be handled by the enumerator.
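Of the fixes listed above, the convex hull is the easiest to show without a GIS library, so here is a small pure-Python sketch using Andrew's monotone chain. The function name is my own, and this is not what `make_valid` or `buffer(0)` do: a hull throws away every concavity, so it only suits fields that are roughly convex to begin with.

```python
def convex_hull(points):
    """Return the convex hull of (x, y) points as a closed, counter-clockwise ring."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts  # not enough distinct points to form a polygon

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        # Drop points that would make a right turn or lie on a straight line.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    hull = lower[:-1] + upper[:-1]
    return hull + [hull[0]]  # repeat the first vertex to close the ring


# A "bow-tie": the first and third edges cross in the middle.
bow_tie = [(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]
print(convex_hull(bow_tie))
```

For the bow-tie this returns the enclosing square, which is valid but has a larger area than either lobe, so whether it is "appropriate" depends on what the polygon is used for.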

My examples are all manually placed, so there should be less potential for intersections compared to walking over already covered ground when manual/automatic recording.

You can move any vertex of a polygon while adding with tap and drag, but you can only delete last or add after last. Editing that allows deleting any vertex or adding a vertex between two existing vertices could make it easier to fix a polygon when moving alone can't, but this could be way overcomplicating things!

I agree: while ~200 of my polygons were invalid, they were almost all adequate for what I needed, so a warning or optional constraint would be preferable to a hard stop when an intersection exists.


My caveat is that generally we collect geodata in the form of points and lines. We have a small number of forms where polygons feature but they are not ‘widely used’ - i.e. only a small number of fully geo-trained enumerators use these forms, and often to ground-truth previously mapped data.

With an Android device’s internal GPS I would only use manual or placement. I have used automatic placement with a high accuracy (2 - 50cm) external device. I generally advise manual recording, as placement is probably only as good as the base map available (which in @ahblake’s case is often very good!). Conversely it can be a good way to hybrid digitise (i.e. place the point on current location and then adjust ‘visually’ if necessary to account for potential drift / poor signal). This is intense, interactive data collection!

During capture of a polygon (or line) I always expect the device to be screen-active and in the hand (even in automatic capture). Otherwise it’s a game of pin the tail on the donkey.

When collecting the data, showing the start vertex in a different colour would be helpful, particularly if there are not many distinguishing features on the ground (e.g. an arbitrary start point). This would help avoid overshoot which is a common cause of error for me.

Unless web forms are going to have full digitising capability (move / add / delete vertices) then an external GIS-type editor is probably more efficient - giving enumerators too many ways to clean their own data could be counter productive for producing ‘good’ primary data (as with many things there is a danger of “perfect” being the enemy of “done” or more likely “adequate for the purposes”).

There is a whole other story about overlapping polygons…


Thank you all for sharing your experiences and common scenarios! Seeing screenshots of the invalid polygons is very insightful. It gives us a better sense of the types of shapes that are causing issues.

I should have mentioned earlier that we’re also looking into overlapping polygons, since we know that’s a major pain point. We’ll share more updates soon and cross-link the posts so everything stays connected :blush:

A couple thoughts I’ve had in the past… if you temporarily make the polygon (geoshape) into a line (geotrace) - then it becomes much more readily editable in Enketo without it constantly complaining.

I’m very curious about this too. In some cases we’ve seen, it has been from a bunch of points clustered together (eg they stopped for a few minutes whilst still recording?), and with the GPS fix bouncing around a bit it is highly likely you’d get some lines crossing. A quick fix for this situation could be to, say, coalesce points that are adjacent in the sequence and very close together (eg <1m?) into 1 point. [Requiring adjacency would avoid inadvertently coalescing when you return to near the same location to, say, cross back over a bridge…].

Something like this could, say, be done iteratively by the user: keep hitting ‘coalesce’ till the polygon is no longer intersecting itself, or they observe it has simply gone too far and the shape radically changes.
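A rough sketch of what such a coalesce step might look like, under a few assumptions of mine: points are (lon, lat) in degrees in recording order, distance uses an equirectangular approximation (fine at a 1 m scale), a run is every consecutive point within the threshold of the run's first point, and each run is replaced by its average. None of this is existing Collect behaviour.

```python
import math

EARTH_RADIUS_M = 6371000.0


def distance_m(a, b):
    """Approximate ground distance in metres between two (lon, lat) points in degrees."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return EARTH_RADIUS_M * math.hypot(x, y)


def _mean(run):
    n = len(run)
    return (sum(p[0] for p in run) / n, sum(p[1] for p in run) / n)


def coalesce(points, threshold_m=1.0):
    """Merge runs of consecutive points that stay within threshold_m of the run's first point."""
    merged, run = [], []
    for p in points:
        if run and distance_m(run[0], p) > threshold_m:
            merged.append(_mean(run))
            run = []
        run.append(p)
    if run:
        merged.append(_mean(run))
    return merged


track = [
    (0.0, 0.0),            # standing still: three fixes within ~0.25 m
    (0.000001, 0.0),
    (0.000002, 0.000001),
    (0.001, 0.0),          # ~111 m away
    (0.0, 0.0000005),      # back near the start, but not adjacent in the sequence
]
print(len(coalesce(track)))  # the first three fixes collapse into one point
```

The last point in `track` is close to the first cluster but is kept separate because only neighbours in the sequence are merged, which is the bridge case mentioned above. The "keep hitting coalesce" idea could map to re-running with a slightly larger `threshold_m` each time.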

Just some thoughts. :thinking:


Thanks for looking into this important issue.

When mapping agricultural fields we observe self-intersections with more than a third of all polygons.

When is the invalid point created?

The most frequent errors are small self-intersections, on the scale of a few meters:

My guess is that these errors occur due to shifting GPS signals when enumerators walk slowly or stay at the same place for some time. But this is just a guess.

Other frequent errors (but not in a technical sense) are overlapping polygons. These could be avoided if already registered polygons (from entities) could be displayed during registration to give visual (or acoustic?) indications that they are registering a new polygon inside an existing one.

How do users decide which recording method to use?

We train them to use automatic mode. But given the frequent errors produced, we are thinking of changing this to the method where the enumerator taps each time she/he changes direction, which also has potential for errors.

Our enumerators use automatic registration of the polygon when they walk around the irregular shaped fields. With very small fields (about 0.25 ha) the frequency of point registration is set to 5 seconds, larger fields at the standard rate.

Suggestion: it would be great if this frequency could be predefined in the XLSForm.

During automatic mode, what are your data collectors typically doing with their phone? For example, is it in their hand and they are looking at it the entire time, or in a pocket?

Good question. I actually don’t know. I guess it’s in one of their hands. But since we are working in rough environments it might be that arms and hands are moved around to balance.

What strategies have helped you and the data collectors reduce these errors?

None in the field yet, as we don’t know the reasons or how to prevent them. We developed scripts for QGIS to remove these errors (self-intersections, holes, overlaps, etc.) for a whole layer. We then upload the modified data again into Central (as entities). This method is quite efficient but still not ideal.

Correcting polygons in Central is tedious and very time consuming.

Just displaying errors to the enumerator AFTER he has walked around a field (which can take a lot of time, and you do not want to repeat it) is not a good idea either. Even if he walks around the field again, the probability is high that similar errors will occur again.

The user experience of the enumerators should also remain as intuitive as it is. That’s why we have preferred automatic mode until now. They basically should not get bothered with technical issues.

So here is my wish-list:

  • Enable Collect to ‘smooth’ lines and remove these small self-intersections on the go. Don’t know if that is technically possible within Collect. But if it is, it should be optional and preferably defined in the XLSForm.

  • Define the standard frequency for automatic point registration in the XLSForm.

  • Optionally display already registered geometries from an entity to give enumerators a visual and/or acoustic warning when an overlap happens (just like geofencing).

Thanks to the ODK team for looking into these issues. Happy to help.


Hi all,

Thanks for the great feedback and comments. I will try to add my observations.

I agree with Dast, and I am also wondering whether that could come from enumerators registering points too frequently when they are in manual mode.
I have also trained the enumerators to only tap to register a point when they significantly change direction. That reduces the risk of having points too close to each other which, with the drift, may end up generating self-intersections (and also creates lighter files with fewer coordinates).

Another issue I face is when enumerators are mapping fields that are split into 2 parts connected by a small path.

→ I never really investigated that, but it seems that it can happen anywhere in the shape, as well as at corners and small paths.

→ Mostly in their hand, registering points too often & manually.

Thanks Aly! Happy to help


Hi everybody, and thanks a lot for the discussion!

Here is my contribution; a lot has already been said.

-> Using "Automatic location recording" : when the enumerator didn't move enough during the recording interval
-> Using manual editing : when the user goes too fast and doesn't care about polygon's validity (doesn't know background consequences)
Mostly when people do set a

I'm pretty sure they always keep their phone in their hands.
And I think they choose the recording method based on the size of the patch they need to draw, its accessibility and its visibility on the aerial basemap.
