Proposal: change Collect versioning from major.minor.patch to year.release

ODK tools have all historically used something that looks like semantic versioning. I think the biggest advantage is that version strings that look like vX.X.X are familiar. Here are some issues:

  1. It's unclear what "breaking changes" are in user-facing tools (as opposed to APIs). We now have 31 minor versions of Collect. Downgrades are typically possible but rarely practical, and they may result in loss of local data. Should some of those have been major version increases?
  2. Making a major version change feels like a really big deal and it's unclear what should trigger it. We have many small changes that will break some users in the next release (removing support for unscoped storage, removing support for the Aggregate 0.9.x API, removing support for file-based settings import [used 9 times in the last year]). Should those trigger a major version change even though we expect the number of affected users will be very small?
  3. Should multiple project support trigger a major version change because it's a big addition? Even though we're designing it so that most users should be entirely unaffected?
  4. We almost certainly have additional big changes coming up with explicit support for entity-based workflows. We'll sandbox those changes so that they only show up if a project configuration requires it. It will likely be a set of features that we expand on over time. Should the start of these additions trigger a major version change? Reaching some threshold of functionality?
  5. We display a version number in Collect so that team leaders can quickly verify that all their data collectors are on the same version. This works well but it doesn't give teams any sense of how recent their version is. We see various problems from users running really old versions. If they saw v2017.2 instead of v1.5.0, they might be more inclined to upgrade.
  6. We release Collect when we have a set of changes ready so the pace of updates is irregular. When a user says they're running v1.21.1, that gives us no context. However, something like 2019.3.1 (first patch of the third release of 2019) would tell us a little bit more.

Items 1-4 suggest that we have decisions to make about Collect versioning coming up no matter what. Given that I don't have a clear sense of what a major version change means for Collect, and that I'd love to avoid hand-wringing over when we make one, I'd like to propose we change the versioning scheme to

YYYY.release.patch

That would also give us some nice little benefits like 5 and 6 and I can't currently think of downsides (that's where you all come in :grimacing:). @Grzesiek2010 pointed out that the release number could be interpreted as a month which is possibly misleading but I think even that misinterpretation carries more information than our current versioning scheme.
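To make the proposed format concrete, here is a minimal sketch of how a downstream script might parse it. The `CollectVersion` and `parse_version` names are hypothetical (nothing in Collect defines them), and treating a missing patch segment as 0 is an assumption.

```python
from typing import NamedTuple


class CollectVersion(NamedTuple):
    """Hypothetical container for a YYYY.release.patch version string."""
    year: int
    release: int
    patch: int


def parse_version(text: str) -> CollectVersion:
    """Parse 'v2021.2', '2021.2' or '2019.3.1'; a missing patch is assumed to be 0."""
    parts = text.strip().lstrip("v").split(".")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected YYYY.release[.patch], got {text!r}")
    year, release, *rest = (int(p) for p in parts)
    patch = rest[0] if rest else 0
    return CollectVersion(year, release, patch)


# Named tuples compare field by field, so ordering falls out for free.
newer = parse_version("2019.3.1")
older = parse_version("v2017.2")
print(newer > older)
```

Because the year comes first, sorting these values also sorts releases chronologically.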

JetBrains, the company behind the IntelliJ development tools (which power Android Studio), moved to this style of versioning in 2016 (with somewhat more complexity because they have many subproducts). As a user, I've found it helpful and their reasoning in the linked article resonates with me.

I don't have strong feelings about other ODK tools moving to this convention. I don't think it needs to be a coordinated effort.

ODK's tools are mostly user-facing (as opposed to developer-facing) and so semantic versioning (aka SemVer) doesn't make a lot of sense, other than it's what we've generally done.

SemVer also doesn't make sense because, at least on the Play Store, it's getting harder to stay on some arbitrary version to avoid getting a breaking change.

We are moving toward more regular release cycles and we typically want people on the newest versions. Calendar-based versioning captures that pretty nicely. It's also where the industry is heading (probably because humans associate version numbers that go up quickly with high-quality software).

If we are going to switch, and I bias toward switching, I'd switch to something like CalVer where it's purely date based. I like YY.MM.patch because it's short, sweet, and Ubuntu uses it. So if we released something today, it'd be 21.02.00.
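As a rough sketch of the YY.MM.patch idea, the version string can be derived directly from a release date. The `calver_yy_mm` helper is hypothetical, and the specific day used below is an assumption; only the month and year come from the example above.

```python
from datetime import date


def calver_yy_mm(release_date: date, patch: int = 0) -> str:
    """Format a date as YY.MM.patch, zero-padding every segment to two digits."""
    return f"{release_date.year % 100:02d}.{release_date.month:02d}.{patch:02d}"


# A release in February 2021 (the day of the month is an arbitrary assumption).
print(calver_yy_mm(date(2021, 2, 15)))
```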

If we are going to make a versioning change, I think we should move most of the tools to it because I don't love the overhead of having two systems. Maybe we keep SemVer for libraries/APIs and CalVer for user-facing tools.


I did not know about CalVer or notice that Ubuntu versions were year-based! I’d prefer to use a 4-digit year so it’s really clear what it means but I’m happy for the next segment to be a month rather than a release count.


It's a good thing to make version numbers more meaningful.
And I agree with those arguments:

Generalizing CalVer would help us know whether the Collect version used in the field is fully compatible with the Central server.
Considering just XLSForm capabilities and form features/widgets: when I develop a form with the latest version of Central (and associated libraries), there is a risk of problems for users running an older version of Collect.

Then I would be able to clearly warn users: if you want to run SICEN_2021, you need to upgrade to at least Collect version 2021_x_y.

A common CalVer scheme across ODK tools could help :slight_smile:
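For illustration, a warning like that could be generated with a few lines of scripting. This is only a sketch: `version_key` and `upgrade_warning` are hypothetical helpers, the dotted format follows the proposal above, and the minimum version `2021.1` is a made-up placeholder.

```python
def version_key(text: str) -> tuple:
    """Turn '2021.2' or 'v2021.2.1' into a 3-part tuple, padding with zeros."""
    parts = [int(p) for p in text.strip().lstrip("v").split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def upgrade_warning(form_name: str, installed: str, minimum: str):
    """Return a warning string if `installed` is older than `minimum`, else None."""
    if version_key(installed) >= version_key(minimum):
        return None
    return (f"To run {form_name} you need Collect {minimum} or later "
            f"(this device has {installed}).")


# '2021.1' is a placeholder minimum, not a real compatibility requirement.
print(upgrade_warning("SICEN_2021", "2020.4.1", "2021.1"))
```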


Yeah, I'm also happy, so that solves my concerns.


I’d like to retract that statement, please!

Here are a few reasons I prefer an incrementing release count within a year over a month:

  • It allows us to talk about an upcoming release in an evergreen way. For example, if we were to switch the versioning scheme now, we could refer to the next release as 2021.2 instead of v1.31. We’re not yet sure what month the release will be ready in so I’m not sure how we’d refer to it with a month-based scheme. We don’t want to say “the next release” because that’s confusing to read once that release is out.
  • We wouldn’t have to give any thought to versioning betas. In a month-based scheme, let’s say a beta is ready on the 15th of a month. We think we will release in the current month but we’re not guaranteed to. We could use the month that the beta is ready in but then that’s really strange for a beta that’s ready on the 30th of one month and will definitely be released in the next. One of the reasons I want to change the versioning scheme is to simplify decision-making. I think an incrementing count better achieves that.
  • A count-based scheme provides information about what the previous release was. In a month-based scheme, we have no way of knowing what version came before 2022.7 without looking it up. If a user’s goal is to stay one version behind the latest release and they’re running 2022.3, they can’t quickly know whether they’re on the desired version.
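
The last point can be sketched in code. This assumes releases are numbered from 1 within each year (as in "third release of 2019" above); `previous_release` is a hypothetical helper, and it deliberately gives up at a year boundary because the previous year's release count can't be inferred from the version string alone.

```python
from typing import Optional


def previous_release(version: str) -> Optional[str]:
    """Return the preceding release in the same year, or None for the first one."""
    year, release = (int(p) for p in version.strip().lstrip("v").split(".")[:2])
    if release <= 1:
        # The previous release belongs to the prior year; its number is unknown here.
        return None
    return f"{year}.{release - 1}"


print(previous_release("2022.3"))
```

No equivalent function can be written for a month-based scheme without a lookup table of which months actually had releases.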

@LN all great points and all points I hadn't thought of. I'm convinced by the incrementing minor segment (as opposed to month) now.


Thanks @LN that makes a lot of sense and I also feel convinced.


The @TAB has discussed this proposal and agrees with proceeding with this change. How and when this gets rolled out is up to the core team.


Count-based versioning is great for downstream packages because it's hackable as a decimal number, which makes version comparisons easy.

E.g. ruODK switches its parsing behaviour for form_schema based on the backwards incompatible change from XML to JSON format for the form schema between 0.7 and 0.8. As long as versions can be parsed as decimal numbers, and backwards incompatible changes are reflected only in major/minor components, ruODK's users have an easy life.
My hacks should however have no influence on the core team's decision. Just wanted to share my happiness with count-based versioning :slight_smile:
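
One caveat with the decimal approach, shown as a sketch in Python rather than as a description of how ruODK actually parses versions: it holds while the second segment stays in single digits, but a tenth release in a year would sort before the ninth. Comparing integer tuples avoids that. Both helper names below are hypothetical.

```python
def as_decimal(version: str) -> float:
    """Read the first two segments as a decimal number, e.g. '0.8' -> 0.8."""
    major, minor = version.split(".")[:2]
    return float(f"{major}.{minor}")


def as_tuple(version: str) -> tuple:
    """Read every segment as an integer, e.g. '2021.10' -> (2021, 10)."""
    return tuple(int(p) for p in version.split("."))


# Decimal comparison works for single-digit segments...
print(as_decimal("0.8") > as_decimal("0.7"))
# ...but treats '2021.10' as 2021.1, which sorts before 2021.9.
print(as_decimal("2021.10") > as_decimal("2021.9"))
# Tuple comparison keeps the intended order.
print(as_tuple("2021.10") > as_tuple("2021.9"))
```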
