Expectations for stability of ODK Suite tool ecosystem

Hi all,

I am not sure if this is the right place to post this type of issue, but this seemed too development-oriented for the community section. I am happy to move the discussion elsewhere if necessary.

I just spent the day investigating, finding workarounds, and fixing some surprise issues that interrupted some existing workflows for my organization, where we employ Collect, Briefcase, Pyxform, and a custom, Aggregate-compatible server. The issues were blocking a simple update of the form for our annual malaria indicator survey.

I will be following up with relevant issue reports shortly. However, I wanted to raise a question about the general expectations around the stability of community versions of the ODK Suite tools since this is not the first time we have been broken like this.

Specifically, should I expect that tools honor semantic version conventions? Said another way, should users and tool integrators expect stability if the major (first) version numbers don't change? If not, how is compatibility easily confirmed outside of trial and error?

Essentially, I am trying to judge whether we can rely on the community process or whether we need to devote more effort organizationally to ensure stability of these tools on our own.



Hi @Batkinson,

I wanted to describe the process and the challenges to add a bit of context for the general public before jumping into the specific problem you are having.

Changes to the tools require a detailed issue to be filed so we know what the PR will be measured against. Incoming PRs must address that issue. Each PR is code-reviewed and tested by QA to verify functionality. About a week before release, we ship a beta. Betas usually run for a week, but are extended if users report problems. During the beta period, we also do regression testing on multiple devices, multiple operating systems, and multiple servers. The tests QA runs are published in the README of the tool repo. If problems do make it into production, we've historically been hyper-responsive when alerted to them and shipped fixes in a few hours or days.

All that said, there are a lot of pieces to this ecosystem, they interact in unexpected ways, and there isn't as much test coverage as we'd like (although this is getting better all the time). We try to honor semantic version conventions and aim for no breakage on production releases. We sometimes deviate, but typically only to fix broken behavior, and generally with TSC approval. When we do break things, we highlight that in the beta announcement and ask for feedback.

The community process is as effective as the people who participate in it. Most users get a lot of value out of a subset of the available functionality and that is what they try and we get the most feedback on. If you are a developer who depends on these tools, you'll have more insight into the potential problems and your active participation will help improve the tools for everyone.

I’m sorry you had to spend time fixing surprise problems. If you can share what those problems are, we will work with you to resolve them, and make the necessary changes to catch that class of problem.



Hi Yaw,

There is definitely no need to apologize. I appreciate the sentiment, but I included that information not to provoke sympathy, but to help draw a picture for the community - that decisions are having an effect and perhaps not a desired one. I do not know if we are the only people experiencing these kinds of issues, but I thought it might be of interest.

As for sharing, I definitely will be filing issues where appropriate. However, some of these issues are likely not serviceable as defects for the latest versions of the software. They are issues between existing versions of the software, or forms based on older versions of the software, which is unavoidable in long-term real-world deployments. I was trying to understand the philosophy towards ensuring stability for these types of deployments, and perhaps through it, determine a better approach to maximize the chances that things will not unexpectedly break over time. It seems like the project leans hard on the latest and greatest, while stability over time takes a distant back seat. I want to check my assumptions before making a move.

I understand the moving pieces and that comes with any system. However, the concerning part of what I am seeing is that it appears to be a pattern of conscious decision making. At a minimum, I want to raise that we are struggling a bit with those decisions over here. This likely is not the best mode to have a detailed conversation about it, so I will not dig into that here. The details are important on these, and I will reach out to help resolve the specific issues we are experiencing.

I essentially wanted to get a sense of whether there is a better way of evaluating the risk of updating so we are not broken by what appears to be a minor release. I was hoping for something along the lines of: "Yes, we are using strict semantic versions: if the major version number is different, it's likely to include a breaking change. If only the minor version is different, it is compatible and includes new features. If only the final number is different, no breaking or feature change is included - only defect fixes." Another way to manage this is with LTS releases.
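To make concrete what a contract like that would buy an integrator, here is a minimal sketch of the kind of upgrade gate we could automate if strict semver held. This is purely illustrative: `parse_semver` and `is_safe_upgrade` are hypothetical helpers of my own, not part of any ODK tool.

```python
def parse_semver(version):
    """Split a MAJOR.MINOR.PATCH string into an integer tuple."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)

def is_safe_upgrade(current, candidate):
    """Under strict semver, moving from `current` to `candidate` should be
    non-breaking exactly when the major version is unchanged and the
    candidate is not older than the version already deployed."""
    cur, cand = parse_semver(current), parse_semver(candidate)
    return cand[0] == cur[0] and cand >= cur
```

With that guarantee, a deployment script could flag `is_safe_upgrade("1.15.0", "2.0.0")` for manual review while letting minor and patch bumps through, instead of relying on trial and error.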

With regards to participation, I agree that the party experiencing issues (namely us this time) is in a prime position to help. In fact, that is why I am posting (I already investigated and deployed a working solution before sending this). I wanted to share that we are experiencing some pain under the existing process and decision making and it does not appear to be simply that things fell through the cracks; the specifics suggest that these things are being done knowingly and intentionally. Perhaps what we are attempting is at odds with the goals of the community, but I suspect that is not the case. It may be that some of the assumptions decision makers are operating under could use some adjustment. In that case, we might be able to offer some counter-examples.

My take-away from your response is that we will just need to devote more effort to ensure we don't get unexpectedly broken: either being more involved in testing and informing development decisions, or perhaps spending more time evaluating updates. I should be direct with you: that seems like a heavy burden given what we want. We like what you released. We just want to make sure that the principle of least surprise applies over time. In other words, if we make a small change to a form and want to repeat the same process next year, we should not have to spend a surprising amount of resources and effort to do it. That is effectively what is happening to us again. We are fortunate to have resources to devote to doing this, but expecting development time to ensure users can come back to their ODK-based workflows seems at odds with what I thought was a goal of the project - being a sustainable data collection tool.

I know it is not easy fielding this type of feedback and you should know that we appreciate what you are doing and thank you all for making ODK what it is - you know who you are. We love what you made and are making, we want to keep running it and contributing back what we can. :heart:

Thank you for always fielding the challenging ones Yaw. :smiley:

Hi @Batkinson

I can confirm that this forum is very open and helpful for standard users (like me); there are a lot of developers in the forum, but they are welcoming of us mortals and our issues :wink:

Generally the community tools are very stable and release notes have been pretty comprehensive. When functionality is deprecated, there is plenty of warning. As with the progression of any complex software suite, most of the bugs get ironed out in the beta stages, which are very transparent.
As with all production software, it is perhaps worth waiting for a .n release rather than going with a .0 release so that any bugs that do make it into production are ironed out. The ODK tools have a small footprint, so if you have the resources it is always worth testing yourself before deployment into production.
If there are bespoke elements in your infrastructure, interoperability becomes an issue and you may want to employ more rigorous testing of interoperability before deployment.
I use quite a few open source systems in my line of work and can honestly say that the ODK community is one of the most vibrant and interactive I have come across.

I would suggest keeping a regular eye on the releases forum, perhaps joining the beta program, and reviewing the published roadmaps. All of the above (imho) are being excellently maintained, even when compared to the best commercial tools in this software space.


Thanks for adding your thoughts and suggestions @DavidM.

I think your point was taken already. It is on us to make sure we defend ourselves against surprises like these again. As I said before, I think that is unreasonable given the particulars here, but I got the answer to my question. We will adjust our expectations accordingly.

My sense is that the instability we are experiencing is related to a desire to make rapid strides on improvements, which is definitely something to encourage. I feel that these particular events might be worthy of a retrospective to balance the picture on how well that's working. We are feeling increasing pressure to fork, despite not wanting to go that route.

I am not a stranger to using, and developing, open source tools myself and I am not attacking ODK or its maintainers. I consider myself part of ODK. However, the reason I raised this is that I felt our experience was something worth raising. Given that your experience has been overwhelmingly good, that is all the more reason to raise it. A good experience does not erase our mixed experience. Voicing it offers an opportunity to inform and, if desired, improve.

Opportunities come and go, so no hard feelings either way. :slight_smile:

I look forward to seeing specifics -- it's really hard to figure out how we can do better without them.

That sounds like exactly the opposite of what we aim for, so it is certainly a bit crushing to hear! Given all of the processes and checks that @yanokwa has outlined, I'm surprised that you still have that perception. Those of us who are involved in the day-to-day certainly are constrained by our capabilities and time, but please be assured that we are trying our best to balance providing new functionality that users need and making sure not to break things for existing deployments.

That is the intent, but what @yanokwa is getting at is that with such open-ended tools, it can be hard to know what a breaking change is. There is constant back and forth between those involved in the day-to-day about how users will be affected by changes we make, how best to communicate them, and when to make those changes. We are constantly adding new analytics and new ways to get feedback from users. But at the end of the day, we know that we are not aware of all that users are doing. Doing things like describing your infrastructure in the showcase, sharing complex forms to be part of our testing strategy, and letting us know when you are planning a big deployment are examples of ways that users can help make sure that the tools work as well as they can for them.

In your case, since it sounds like some time may pass between use of ODK tools, I would strongly urge you to first try things on versions you knew previously worked, then upgrade, and then immediately let someone know if you experience something unexpected.

Crushing ... :thinking: ... I did not expect that and that is not what I was looking to achieve. I am sorry. Apparently I am not communicating what I want. I think it might help to talk about the specifics, but I wanted to avoid that here. This was about general expectation-setting. I will try to catch someone on Slack.

I agree that it can be difficult in general. I don't know if it applies in these particular cases. There is actually evidence that suggests knowledge of breaking behavior, but the change was made anyway. It is unreasonable to expect breaking without knowledge, but that's not the scenario as I understand it. I don't want to drag everyone through it here, so as I said, I will try to connect through a more appropriate channel.

In general I agree. Whatever we're doing seems to be outside of the norm for some reason and "publishing" for the odd cases is likely the best way to ensure the tweaky bits of what we're doing don't stop working. The trick with these is that they do not seem tweaky at all.

This is not true in general for us. We have teams of people using Collect daily. We use Briefcase regularly, but less frequently. The problem here is that we have a fairly involved yearly malaria indicator survey to help us determine a number of things, including how we're doing with our other interventions. The scale of the survey is fairly large and we are leveraging features that are likely not commonly used in the community. The pretty-printing change to the xlsform converter was what got us last year. While many ODK users were not relying on that, we were. We have to hand-edit a couple of questions that xlsform cannot handle, but javarosa handles happily. Much of the heavy lifting that we need to do year-to-year has been hammered out, but compatibility of the ODK suite has been where we have had to spend more time than expected. Anyway, my goal in sharing wasn't to overwhelm or put more responsibility on the project.

I was assuming knowledge of breakage and looking for some guidance on how/when to avoid an upgrade. I think it will make more sense in the context of the particular issues. For example: https://github.com/opendatakit/briefcase/issues/768

So just to add to this - I also have had pressure to add functionality, but rather than fork, what we have landed on is to use the development talent we have to develop around, rather than within, the ecosystem.
We use the standard products and augment them by developing in a mirrored database and BI tools rather than building a bespoke Aggregate alternative. This way we can rely on the stability of the core tools and tweak our own developments without impact. We can also test the standard products independently of our developments.
... If I was looking to bespoke an app server or similar, I would probably look to organisations with an intimate knowledge of the product for assistance, such as Nafundi :slight_smile:

The problems we are experiencing are not the result of lack of product knowledge, but the breaking of a published spec. In any case, thanks @DavidM. I suspect we are not doing the same thing, but that's what makes open communities fun.

If there are particular aspects of a published spec that have indeed 'broken' in the course of an update, it would be very helpful to know precisely which spec, and what exactly broke. Typically any changes to a formal spec doc - eg OpenRosa, ODK XForm spec, XLSForm, ODK Central API, etc - go through quite a bit of scrutiny (by ODK developers and TSC) to identify any potential breaking changes to existing forms. So if we've missed something it would be great to know what.

@Xiphware, I am pretty sure you can drop those quotes. It was broken. Ona was affected too, and given the details I hope you can see why.

The particular change I was referring to had to do with submissionList from the Briefcase API. The doc, which pre-dates the current docs but is still present today, specified that cursor values are opaque; that the values are not interpreted.
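For anyone following along, "opaque" here means a client stores the cursor and echoes it back verbatim on the next request, never inspecting its contents. A minimal sketch of a client honoring that contract is below; the `fetch` callable is a stand-in for whatever HTTP layer you use, and the element names are illustrative, loosely modeled on the `submissionList` response rather than copied from Briefcase's actual code.

```python
import xml.etree.ElementTree as ET

def next_submission_batch(fetch, form_id, cursor=None):
    """Request one page of submission ids through a caller-supplied
    fetch(path, params) callable. The resumption cursor is treated as
    an opaque token: stored and echoed back verbatim, never parsed."""
    params = {"formId": form_id}
    if cursor is not None:
        params["cursor"] = cursor  # passed back exactly as received
    body = fetch("/view/submissionList", params)
    root = ET.fromstring(body)
    # Match on local names so the sketch works with or without namespaces.
    ids = [el.text for el in root.iter()
           if el.tag == "id" or el.tag.endswith("}id")]
    new_cursor = None
    for el in root.iter():
        if el.tag.endswith("resumptionCursor"):
            new_cursor = el.text
    return ids, new_cursor
```

The point is what the sketch does not do: there is no branch that parses the cursor's format, so a server is free to change its internal cursor representation without breaking the client.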

The reason I assumed that there was prior knowledge, and perhaps I was wrong on the timing, was that it was stated in the code implementing the change.

I have filed detailed bug reports with code that addressed the issues for us, where appropriate. I believe that should be enough to help identify and fix the immediate issues.

However, while fixing the immediate issues in the mainline is nice, it does little to address the likelihood of this happening again. I would like to help outline what we're seeing because I don't believe it is unique to us. I also feel this could have been avoided with a simple major version bump when making incompatible changes, which I suspect will become more likely as JavaRosa changes more frequently.

I am not sure it should go here, and Slack has been silent for the last couple of days. I am not clear what channel would be appropriate for this one anyway.

Any recommendations on how to proceed?

Thanks for providing the greater detail. So for the record, the two issues that I think prompted the post are:

  1. Forms with relative references cause pull and export to fail
  2. Pulls no longer work due to 'Unknown cursor format'

It doesn't look like either of these changes were intentional or expected.

For the first issue, it's not clear what the root cause is. Let's dig in at https://github.com/opendatakit/briefcase/issues/767 over the next few days and try to get an answer. We can report back here once we do and figure out what process changes might help.

For the second issue, this is a mistake (both in not catching it in review and in not generalizing the Ona fix). We'll be sending out a fix based on your patch in a few days. Here, I think we can do better. The key to that improvement is to grow the number of folks who participate in the dev and test process. I'm open to ideas the community has on how we can do that. I'm even more open to community members who would like to take the lead on growing that number. If we could get consistent help from one or two more people, it'd make a meaningful difference.

I’ve dug a little deeper and wanted to share a post mortem.

The first issue was a bug that affected custom servers and it has been fixed in the latest release of ODK Briefcase v1.16. If you have a custom server, helping us increase test coverage or helping with QA would help catch this class of problem. I don't think there is anything else we can do as far as process.

The second issue was more subtle. Forms with nested repeats converted with new versions of pyxform break on tools that use JavaRosa <2.12 (pre Dec 2018). This breakage is due to JavaRosa <2.12 having a bug that has now been fixed.

We had not previously identified the full extent of the bug, so we did not anticipate the interaction that @batkinson ran into. I believe this is the first we’ve heard of it because nested repeats are not very common. The bug we were fixing was described here and was communicated to users here.

In terms of versioning, knowing everything we know now, we still would not have changed the major version. This StackOverflow thread captures some of the principles we follow when bug fixes break backwards compatibility.

We’d really appreciate it if folks could immediately post unexpected behavior to the support category. Those who were recently in the code may be able to quickly spot the source of the issue and patch it without you needing to dive deep. That would likely have been the case with the first issue. It’s totally OK if the issue turns out not to be a bug or to be in non-ODK tools.

Going forward, we’ll add pyxform and JavaRosa version information to XLSForm Online/Offline so it’s a little faster to track down issues. Very open to any other ideas folks have to try to catch this class of problem.
