ODK Collect testsuite failures

Xiphware · September 16, 2024, 2:42am

1. What is the issue? Please be detailed.

Running the ODK Collect instrumented tests consistently fails against an (unmodified) base v2024.2.0 clone.

2. What steps can we take to reproduce this issue?

Fork ODK Collect from https://github.com/getodk/collect with TAG='v2024.2.0'
update gradle.properties with the recommended heap size;

org.gradle.jvmargs=-Xmx4096m
test.heap.max=4g

open project in Android Studio [Android Studio Koala | 2024.1.1]
Run 'All Tests' (all the instrumented tests) under collect_app/src/androidTest/java, against a plugged-in device [Samsung Galaxy A12, Android 12/API 31]

When running all the instrumented tests on an unmodified copy of the v2024.2.0 release (other than the heap size change), several of the test suites consistently fail, at least when run against this particular phone [see below for subsequent attempts against different API levels and/or different emulated hardware].

The failures are:

DeletingRepeatGroupsTest - skipped (0/0) [this test is always skipped it seems, just mentioning it here for completeness]
SavePointTest - 7 fail/0 pass
All fail with: "RecentAppsRule does not support this API level!"
AuditTest - 7 fail/0 pass
All fail with: "RecentAppsRule does not support this API level!"
DynamicPreLoadedDataSelects - 1 fail/2 pass
- displayWarningWhenQueryIsBad fails with:
  "androidx.test.espresso.NoMatchingViewException: No views in hierarchy found matching:"
BulkFinalizationTest - 11 fail/0 pass
All fail with: "RecentAppsRule does not support this API level!"
MatchExactlyTest - 7 fail/0 pass
All fail with: "does not support this API level!"
PreviouslyDownloadedOnlyTest - 6 fail/0 pass
All fail with: "does not support this API level!"
AutoSendTest - 7 fail/0 pass
All fail with: "does not support this API level!"
FormMetadataSettingsTest - 1 fail/3 pass
- metadataProperties_shouldBeReloadedAfterSwitchingProjects fails with
  "androidx.test.espresso.NoMatchingViewException: No views in hierarchy found matching:"
FillBlankFormTest - 1 fail/24 pass
- formsWithDate_ShouldSaveFormsWithSuccess fails with
  "androidx.test.espresso.NoMatchingViewException: No views in hierarchy found matching:"

Rerunning the entire test suite several times always appears to produce the same result [however, rerunning individual subsets of the testsuite produces different results; described below]

3. What have you tried to fix the issue?

As background, I've been working on a custom fork of ODK Collect, mostly rebranding but with a few minor functional differences. Obviously, I'm rather concerned about inadvertently breaking something, no matter how minor my changes, so I've been constantly running and rerunning the extensive Collect testsuites. What I observed was a consistent set of tests that would fail now and then, sometimes depending on whether I was running my changes on a physical device - mostly a fairly recent Samsung Galaxy A12 (Android 12), but also an older (but still supported) Huawei Android 5.1 phone - or against various flavors of API-level emulators.

So in order to reestablish a true test baseline I've gone back to an unmodified fork of Collect (ie without any customization whatsoever) to see what did/not work. Those results - against the Samsung A12 - are summarized above.

Obviously, some of the tests appear to be complaining about API levels. After some digging around the source code in the tests, there appears to be a check in RecentAppsRule.kt against two very precise API levels: 30 and 34 (but nothing in between!?)

    override fun before() {
        assertTrue(
            "${this.javaClass.simpleName} does not support this API level!",
            SUPPORTED_SDKS.contains(Build.VERSION.SDK_INT)
        )

        if (Build.VERSION.SDK_INT == 30) {
            removeRecentAppsTooltips()
        }
    }

    companion object {
        private val SUPPORTED_SDKS = listOf(30, 34)

So I guess my First Question is: what is so special about API 30 and 34 in particular, such that six of the instrumented testsuites (those listed above with the API level message) will always fail on any other API level? And if it is basically a static version check, why are these suites not simply skipped instead?
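To make the skip-versus-fail question concrete, here is a minimal sketch in plain Java, so it runs without Android or JUnit on the classpath. The names `ApiLevelGate` and `decide` are hypothetical and not part of Collect; the supported list simply mirrors the SUPPORTED_SDKS excerpt above. In a JUnit 4 rule, the SKIP outcome would roughly correspond to calling org.junit.Assume.assumeTrue(...) where the rule currently calls assertTrue(...), which JUnit 4 normally reports as an ignored test rather than a failure.

```java
import java.util.List;

// Hypothetical helper, not part of Collect: models "skip instead of fail"
// for a rule that only supports certain API levels.
class ApiLevelGate {
    enum Outcome { RUN, SKIP }

    // Mirrors the SUPPORTED_SDKS list quoted from RecentAppsRule.kt.
    static final List<Integer> SUPPORTED_SDKS = List.of(30, 34);

    // Returns RUN when the device API level is supported, SKIP otherwise.
    // In a JUnit 4 rule, SKIP would map to a failed assumption
    // (Assume.assumeTrue) instead of a failed assertion.
    static Outcome decide(int sdkInt) {
        return SUPPORTED_SDKS.contains(sdkInt) ? Outcome.RUN : Outcome.SKIP;
    }

    public static void main(String[] args) {
        // 31 is the Samsung A12 from this post, 22 the older Huawei.
        for (int sdk : new int[] {22, 30, 31, 34}) {
            System.out.println("API " + sdk + " -> " + decide(sdk));
        }
    }
}
```

Under this shape, a run on the API 31 phone would presumably list those suites as skipped rather than as failures.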

Re-run failures under Emulator

In order to work around these invariable test failures, I then re-ran each of the failed instrumented testsuites (but not the entire suite!), this time against an emulated API 34 device under Android Studio. The new results for the previously failed testsuites are:

SavePointTest - 7/7 pass
AuditTest - 7/7 pass
BulkFinalizationTest - 11/11 pass
MatchExactlyTest - 7/7 pass
PreviouslyDownloadedOnlyTest - 6/6 pass
AutoSendTest - 7/7 pass
DynamicPreLoadedDataSelects - 3/3 pass
- I also reran just the displayWarningWhenQueryIsBad test on its own and it also passes when run alone
FormMetadataSettingsTest - 4/4 pass
- I also reran just the metadataProperties_shouldBeReloadedAfterSwitchingProjects test on its own and it also passes when run alone
FillBlankFormTest - 25/25 pass
- I also reran just the formsWithDate_ShouldSaveFormsWithSuccess test on its own and it also passes when run alone

So success; everything - particularly the API-sensitive testcases - passes if you run it under a suitable emulator! Or so it seems...

Re-run just non-API test failures against hardware

As described, all the non-API test failures against hardware appear to be due to a view required by the test not being shown. So in order to check whether this might just be a timing issue, I then re-ran these specific sub-testsuites, as well as running just the individual test itself alone. The results (again, against actual hardware) are:

DynamicPreLoadedDataSelects - 1 fail/2 pass
- displayWarningWhenQueryIsBad still fails with:
  "androidx.test.espresso.NoMatchingViewException: No views in hierarchy found matching:" when running the suite (ie all 3 tests)
- displayWarningWhenQueryIsBad also still fails when run alone
FormMetadataSettingsTest - 4/4 pass (!!!)
- metadataProperties_shouldBeReloadedAfterSwitchingProjects now passes when running the suite (4 tests) and it also passes when running the single test alone.
  *** However, as mentioned at the beginning, this single test consistently fails when I run the entire instrumented testsuite on the device! ***
FillBlankFormTest - 1 fail/24 pass
- formsWithDate_ShouldSaveFormsWithSuccess still fails with
  "androidx.test.espresso.NoMatchingViewException: No views in hierarchy found matching:" when I run the suite (ie all 25 tests)
- formsWithDate_ShouldSaveFormsWithSuccess also still fails when run alone

Hopefully you can see something odd is going on here, especially around the FormMetadataSettingsTest sub-testsuite and the metadataProperties_shouldBeReloadedAfterSwitchingProjects specifically; this specific testcase will run fine alone, or when running its parent sub-testsuite, or when run against an emulator.

Second Question: why does this one test consistently fail only on a physical device and only when running the entire instrumented testsuite?

Third Question: why do these two specific tests always fail due to a missing view when run against a physical device, but always work when run against the emulator?
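The distinction behind the Second and Third Questions can be written down as a small triage rule. This is only a sketch of the reasoning in this post, in plain Java; the class, enum and method names are hypothetical, and the categories are a rule of thumb rather than a diagnosis.

```java
// Hypothetical helper, not part of Collect: classifies a test by where it
// fails on a given device, following the reasoning in this post.
class FailureTriage {
    enum Kind { PASSES, ENVIRONMENT_MISMATCH, ORDER_OR_STATE_DEPENDENT }

    // failsAlone: the test fails when it is the only test run.
    // failsInFullSuite: the test fails as part of the whole suite.
    static Kind classify(boolean failsAlone, boolean failsInFullSuite) {
        if (failsAlone) {
            // Fails even in isolation: the expectation probably does not
            // match this device or OS build.
            return Kind.ENVIRONMENT_MISMATCH;
        }
        // Passes alone but fails in the suite: something run earlier
        // probably affects it.
        return failsInFullSuite ? Kind.ORDER_OR_STATE_DEPENDENT : Kind.PASSES;
    }

    public static void main(String[] args) {
        // Hardware results reported above.
        System.out.println("displayWarningWhenQueryIsBad: " + classify(true, true));
        System.out.println("formsWithDate_ShouldSaveFormsWithSuccess: " + classify(true, true));
        System.out.println("metadataProperties_shouldBeReloadedAfterSwitchingProjects: "
                + classify(false, true));
    }
}
```

Read this way, the Third Question is about two tests whose expectations do not hold on this phone, while the Second Question is about a test that is sensitive to whatever ran before it.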

Re-run everything under Emulator

Finally, for completeness, and since the instrumented tests appear to (consistently!) behave slightly differently when run against an emulator vs a physical device, I also re-ran the entire instrumented testsuite against the API 34 emulator [again, remember I had only rerun the hardware-failing sub-testsuites against the emulator before]. The results were also somewhat unexpected: unsurprisingly, all the API-dependent tests passed fine, but a new failure now consistently appeared:

MatchExactlyTest (6/7)
- whenMatchExactlyEnabled_clickingFillBlankForm_andClickingRefresh_whenThereIsAnError_showsNotification_andClickingShowDetails_showsErrorDetails
  "Could not find "Demo project""

This test failure is consistent: in 4 of 4 attempts at running the entire instrumented testsuite against the emulator (so about 5hrs total duration...) this test, alone, always failed. However, running this single test on its own always appears to work.

Fourth Question: why does this one test consistently fail only on the emulator and only when running the entire instrumented testsuite?

4. Conclusion

I would dearly love to be able to rely on all the instrumented testcases for Collect passing when run against an unmodified clone of the latest Collect source from git, but that doesn't appear to be the case. Oddly, although I am seeing very consistent behavior in terms of what is passing and failing (suggesting it's not particularly sensitive to timing?), the tests are not failing in what I'd call a predictable manner: the outcome appears to depend on whether I'm running on an emulator vs a physical device, and on whether the specific testcase is run individually, as part of its parent sub-testsuite, or as part of the entire instrumented testsuite. And if specific tests need a specific API level to be meaningful, it would be nice if they could be skipped when this prerequisite isn't met, with perhaps a similar 'skipped' message.

Although the exercise has given me the confidence of a 'deterministic' baseline - to know what to expect when running full regression tests against any changes I might make to Collect, it has been rather painful...

It would be great if anybody can confirm what I'm seeing and described here, and whether there are known issues with the Collect instrumented testsuite (eg my 4x Questions).

FWIW all the unit tests under collect_app/src/test/java run 100% OK every time.

5. Build and runtime environment

If it matters, I'm running Android Studio Koala | 2024.1.1 on a MacBook Pro M3 Max, 64GB with Sonoma 14.6.1. My dedicated test device is a Samsung Galaxy A12 (with an entirely clean reinstall of OS, with only the basic Samsung and Google apps installed), 128GB, running Android 12 (API level = "31-ext12" under device manager).

FWIW my other test device has been an older Huawei LUA-L02 running Android 5.1 (API level = "22" under device manager), but I didn't enumerate all its test intricacies here as it would probably just muddy the water more...

I'd love to be able to hit "Run 'All Tests'..." and go mow the lawn for a couple hours and come back to a "Yer all good, you didn't screw anything up!" message. But alas not.

LN · September 18, 2024, 11:17pm

There are a lot of subtle differences between Android API levels, and the instrumented test infrastructure is generally fairly brittle. If you want to dig deeper into any of these, we would welcome suggested improvements. In the meantime, our solution has been to target specific API levels and generally run instrumented tests on Firebase Test Lab. You can see a gradle task for this at https://github.com/getodk/collect/blob/66e9c1f10f707b803654bc57f7930b805edf2742/build.gradle#L96

We generally run specific tests to support development and then run the full suite on Test Lab when we believe a set of changes are ready for merge.

Xiphware · September 20, 2024, 4:30am

After re-running the displayWarningWhenQueryIsBad testcase against the latest v2024.3.0-beta.4, I observed the same behavior: it runs fine on the emulator but fails on (my) hardware. Upon taking a closer look at the testcase itself:

    @Test
    public void displayWarningWhenQueryIsBad() {
        rule.setUpProjectAndCopyForm("external-csv-search-broken.xml", Collections.singletonList("external-csv-search-produce.csv"))
                .fillNewForm("external-csv-search-broken.xml", "external-csv-search")
                .answerQuestion("Produce search", "blah")
                .swipeToNextQuestion("Produce")
                .assertText("no such column: c_wat (code 1 SQLITE_ERROR): , while compiling: SELECT c_name, c_label FROM externalData WHERE c_wat LIKE ?");
    }

This test would appear to be failing - on hardware - because the actual error displayed is, from a screengrab:

"no such column: c_ wat (code 1
SQLITE_ERROR(1]):, while compiling: SELECT c_name, c_label FROM externalData WHERE c_wat LIKE ?"

specifically, there's a disparity between the expected "...(code 1 SQLITE_ERROR)" and actual "...(code 1 SQLITE_ERROR(1])" string chars.

Perhaps this (and other?) testcase's asserts could be loosened so they are less sensitive to the exact error string produced, which I imagine could vary slightly depending on the specific version of the underlying lib that is throwing it?
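As a rough illustration of what "loosening" could mean here, the sketch below matches the stable parts of the message and tolerates whatever appears between SQLITE_ERROR and "while compiling:". It is plain Java with a hypothetical class and method name; how such a check would be wired into the test's assertText(...) call is left open, and the "variant" string in main is an assumed example rather than a verbatim copy of the device output.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Collect: accepts the SQLite warning
// regardless of what a given SQLite build appends after "SQLITE_ERROR".
class LooseSqliteErrorMatch {
    private static final Pattern BAD_QUERY_WARNING = Pattern.compile(
            Pattern.quote("no such column: c_wat (code 1 SQLITE_ERROR")
                    + ".*?" // build-specific suffix and punctuation
                    + Pattern.quote("while compiling: SELECT c_name, c_label "
                            + "FROM externalData WHERE c_wat LIKE ?"),
            Pattern.DOTALL);

    static boolean matchesBadQueryWarning(String displayed) {
        return BAD_QUERY_WARNING.matcher(displayed).find();
    }

    public static void main(String[] args) {
        String query = "SELECT c_name, c_label FROM externalData WHERE c_wat LIKE ?";
        // Exact text the current test asserts on.
        String expectedByTest =
                "no such column: c_wat (code 1 SQLITE_ERROR): , while compiling: " + query;
        // Assumed variant with extra characters after SQLITE_ERROR.
        String variant =
                "no such column: c_wat (code 1 SQLITE_ERROR[1]): , while compiling: " + query;
        System.out.println(matchesBadQueryWarning(expectedByTest));
        System.out.println(matchesBadQueryWarning(variant));
        System.out.println(matchesBadQueryWarning("some unrelated text"));
    }
}
```

The trade-off is that a looser match would also accept other changes in that middle part of the message, so it only makes sense if the column name and the query are the parts the test really cares about.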

Xiphware · September 24, 2024, 4:40am

To follow up on the other failing test (formsWithDate_ShouldSaveFormsWithSuccess), rerunning now against the current ODK Collect release v2024.2.4, I observed the same behavior: it runs fine on the emulator but fails on (my) hardware. Upon taking a closer look:

    @Test
    public void formsWithDate_ShouldSaveFormsWithSuccess() {
        //TestCase17
        rule.startAtMainMenu()
                .copyForm("1560_DateData.xml")
                .startBlankForm("1560_DateData")
                .assertText("Jan 01, 1900")
                .swipeToEndScreen("01/01/00")
                .clickFinalize()

                .copyForm("1560_IntegerData.xml")
                .startBlankForm("1560_IntegerData")
                .assertText("5")
                .swipeToEndScreen("5")
                .clickFinalize()

                .copyForm("1560_IntegerData_instanceID.xml")
                .startBlankForm("1560_IntegerData_instanceID")
                .assertText("5")
                .swipeToEndScreen()
                .clickFinalize();
    }

This testcase is probably consistently failing on the assert

.assertText("Jan 01, 1900")

because what appears onscreen on the actual device is "01 Jan 1900"

At least we now know why... hopefully this'll save somebody else some grief.
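The two date strings above look like the same date rendered with different layouts, presumably because the device's locale or date-format setting differs from the emulator's (that cause is an assumption; nothing in this thread confirms it). A minimal plain-Java sketch of a check that accepts either rendering follows. The class name, method name and the two patterns are hypothetical, inferred only from the two strings seen here, and not taken from Collect's code.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Collect: accepts a date rendered in
// either of the two layouts reported in this thread.
class DateTextCheck {
    // Assumed patterns, inferred only from "Jan 01, 1900" and "01 Jan 1900".
    static final List<String> PATTERNS = List.of("MMM dd, yyyy", "dd MMM yyyy");

    // True if the displayed text is the expected date in any known layout.
    static boolean showsDate(String displayed, LocalDate expected) {
        for (String pattern : PATTERNS) {
            DateTimeFormatter formatter =
                    DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH);
            if (formatter.format(expected).equals(displayed)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(1900, 1, 1);
        System.out.println(showsDate("Jan 01, 1900", date)); // emulator text
        System.out.println(showsDate("01 Jan 1900", date));  // device text
        System.out.println(showsDate("1900-01-01", date));   // not accepted
    }
}
```

An alternative with the same effect would be for the test to build its expected string with whatever formatter the app itself uses, instead of hard-coding the literal.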

Xiphware · September 24, 2024, 5:13am

So the only outstanding one is metadataProperties_shouldBeReloadedAfterSwitchingProjects:

This single testcase consistently fails, but only when running the entire instrumented testsuite (!), and only on (my) hardware [it passes when running the entire instrumented testsuite on the emulator]. But running the testcase metadataProperties_shouldBeReloadedAfterSwitchingProjects on its own consistently passes, as does running the FormMetadataSettingsTest suite on its own (in both cases on hardware). It'd be nice to understand why it behaves this way, but it looks like a pretty involved testcase, and I don't have the bandwidth to debug it further right now (especially as it is so time-consuming to reliably reproduce...)
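For anyone who picks this up later: one common way a test ends up passing alone but failing in a full run is state left behind by an earlier test in the same process. Whether that is what happens here is not established. The toy below, in plain Java with entirely hypothetical names, only illustrates the pattern.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only, not Collect code: shared state that survives
// from one "test" to the next within the same process.
class SharedStateDemo {
    static final List<String> projects = new ArrayList<>();

    // An earlier test that adds a project and forgets to clean up.
    static void earlierTest() {
        projects.add("Leftover project");
    }

    // A later test that implicitly assumes it starts from a clean slate.
    static boolean laterTest() {
        projects.add("Demo project");
        return projects.size() == 1;
    }

    public static void main(String[] args) {
        // Run alone: the assumption holds.
        projects.clear();
        System.out.println("alone: " + laterTest());

        // Run after the earlier test: the leftover entry breaks it.
        projects.clear();
        earlierTest();
        System.out.println("after earlierTest: " + laterTest());
    }
}
```

If something like this is the cause, comparing which tests run before the failing one on hardware versus on the emulator might narrow it down.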

But I think I've got a good handle on what is/not expected to work in the instrumented testsuite now, and why, and on what permutations of API and hardware-vs-emulator. Hopefully this will be useful for posterity.