ODK Central v0.6 Beta

Now that I have the Audit functionality working (having fixed a quirk of the XLS > XML conversion when using an old version of XLSForm Offline), I am trying to get audit download working well from Central 0.6.

When I use the Download all records button, the audit.csv file doesn't seem to contain the audit trails for every individual instance. I only seem to get an audit.csv file containing the audit trail of the initial submission.

When I download from Central via Briefcase, these all get concatenated into a single CSV, and each individual CSV also ends up in the media folder.
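
For anyone who needs a combined file in the meantime, that kind of concatenation can be approximated with a short script. This is only a rough sketch: it assumes each submission's audit file sits at `<export dir>/<instance ID>/audit.csv`, which is a hypothetical layout rather than what Central or Briefcase necessarily produce, and the `instance ID` column name and the `merge_audits` function are my own inventions. It also assumes every audit file has the same header.

```python
import csv
from pathlib import Path


def merge_audits(export_dir, out_path):
    """Concatenate every <instance>/audit.csv under export_dir into one CSV.

    Assumed (hypothetical) layout: export_dir/<instance ID>/audit.csv.
    Returns the number of audit rows written.
    """
    rows_written = 0
    header_written = False
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for audit in sorted(Path(export_dir).glob("*/audit.csv")):
            with open(audit, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                header = next(reader, None)
                if header is None:
                    continue  # empty file, nothing to copy
                if not header_written:
                    # Write the header once, with an extra leading column
                    writer.writerow(["instance ID"] + header)
                    header_written = True
                for row in reader:
                    # Tag each row with the folder name it came from
                    writer.writerow([audit.parent.name] + row)
                    rows_written += 1
    return rows_written
```

Because every row carries the folder name it came from, the individual trails can still be told apart after merging.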

Thoughts?

I can't find an issue or notice, so @issa will need to refresh my memory on where this is documented, but it is a known issue that there is no collision resolution on export when multiple attachment files have the same name. This Briefcase issue highlights some of the challenges with collision resolution.

Given that Central 0.7 will produce the same single big CSV export as Briefcase (see the release criteria), do you feel like you still would like to export the individual audit files? If so, do you have a sense of how you'd like them to be named?
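
On the naming question, one possibility is to prefix each attachment with the instance ID of the submission it belongs to, so that two `audit.csv` files can never collide. The sketch below only illustrates that idea; `unique_attachment_name` is a hypothetical helper and not how Central or Briefcase name files.

```python
import re


def unique_attachment_name(instance_id, filename):
    """Prefix an attachment name with its instance ID to avoid collisions.

    Characters that are awkward in file names (such as the ':' in
    'uuid:...') are replaced with '_'.
    """
    safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", instance_id)
    return f"{safe_id}-{filename}"
```

With this scheme `unique_attachment_name("uuid:abc-123", "audit.csv")` gives `uuid_abc-123-audit.csv`.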

i haven't documented this particular quirk anywhere, mostly because i had no idea it existed :slight_smile: but also because it sounds like it is not a Central-related issue. either way, at least 0.7 should resolve this one.

Because Central lets you download the individual audit trail metadata if needed, I'd be happy if the Download everything button gave you one CSV with all changes together.

Trying out a few more things.
We pushed some old datasets on to Central using Briefcase.
1000+ forms, 20-odd questions with 2 nested repeats.

Downloads fine using Briefcase.
Won't download using the direct download button (Failed - Network error).

Is this a timeout issue that we need to configure at our end, @MatthewMac, or a Central issue? Not sure.

One more thought
Should we have a purge submissions button?
We often put up a version of a form, train people on it and then want to wipe the data before real data collection starts.

Currently, to do that you would need to delete the form and then upload a new version (with either a different name or version ID).

@dr_michaelmarks apologies for jumping in on this, we are also testing ODK Central at the moment.

re download size: We generate ca 10k submissions per 4 month turtle nesting season with a fair amount of attachments (ca 100-150 GB of photos per season).
@issa what will happen if I hit the "export all submissions" button?
Purging successfully extracted submissions might be the way to go there.

re test submissions: We always include a "training" flag in forms: real turtles are e.g. Natator depressus (Flatback turtle) or Caretta caretta (Loggerhead turtle), and the training turtle is Corolla corolla (Hatchback turtle). Analysis simply excludes these training records. This avoids having to switch between training and production forms or settings, and the risk of accidentally sending production data into the training bucket.
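
A minimal sketch of that exclusion step, assuming the exported rows are dictionaries (for example from `csv.DictReader`). The field name `species` and the choice value `corolla-corolla` are hypothetical placeholders, not the real form's names.

```python
# Hypothetical choice value(s) that mark a training record
TRAINING_SPECIES = {"corolla-corolla"}


def production_records(rows, species_field="species"):
    """Yield only rows whose species is not a training placeholder."""
    for row in rows:
        value = row.get(species_field, "").strip().lower()
        if value not in TRAINING_SPECIES:
            yield row
```

Filtering at analysis time keeps the training records on the server, so nothing has to be deleted.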

I think it will time out (as it does for me on 1,500 submissions without attachments).
Approaches I can think of:

  1. Keep using Briefcase
  2. Generate file in background and provide link to it - have seen this done on other platforms when the file size is too big for a standard immediate download to work
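
To make the second approach concrete, here is a rough in-memory sketch of the "build in the background, hand out a link later" pattern. It is not how Central works; `ExportJobs`, `build_export` and the returned link are all hypothetical, and a real server would persist jobs rather than keep them in a dictionary.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class ExportJobs:
    """Run export builds in worker threads and report their status."""

    def __init__(self, max_workers=2):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}

    def start(self, build_export, *args):
        """Queue build_export(*args) and return a job ID immediately."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(build_export, *args)
        return job_id

    def status(self, job_id):
        """Return the job state, plus the download link once it is ready."""
        future = self._jobs[job_id]
        if not future.done():
            return {"state": "running"}
        if future.exception() is not None:
            return {"state": "failed", "error": str(future.exception())}
        # build_export is expected to return the link (or path) to the file
        return {"state": "ready", "link": future.result()}
```

The browser would call `start` once, then poll `status` until it reports `ready`, so no single request has to stay open for the whole export.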

Purging successfully extracted submissions might be the way to go there.

I definitely don't think you want to permanently purge exported submissions from Central - this would not be OK from a Good Clinical Practice perspective for medical/clinical research studies, where you need to maintain the original database/records. For example, some of our Aggregate servers have data on >100,000 individuals in a study.

A training flag might work for some people but doesn't work well for us in clinical studies, where we want people to practice using the forms exactly as they will in real life / hospital etc.
Maybe we just have 2 versions of the form, where one has ID XYZ_training, but personally I still think a purge submissions function could be useful.

I think it will time out (as it does for me on 1,500 submissions without attachments).

it shouldn't. central is architected to continually stream data out as it is processed, which under normal conditions should not cause a timeout. if something else is happening in your setup, then either we have done something we did not intend on the server side to prevent our designed behaviour, or else the additional layers present in your environment are causing problems.
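
To illustrate the general idea of streaming an export (this is not Central's actual code, just a sketch of the technique with a hypothetical `stream_csv` function), each row is turned into CSV text and handed on as soon as it is read, rather than after the whole file has been built:

```python
import csv
import io


def stream_csv(header, rows):
    """Yield CSV text one line at a time instead of building the whole file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)

    def emit(row):
        # Format a single row, then empty the buffer for the next one
        writer.writerow(row)
        chunk = buffer.getvalue()
        buffer.seek(0)
        buffer.truncate(0)
        return chunk

    yield emit(header)
    for row in rows:  # rows may be a lazy iterator, e.g. a database cursor
        yield emit(row)
```

As long as chunks keep flowing like this, a timeout that only fires when a connection goes quiet should have nothing to trip on.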

as for questions around draft or training forms, we are aware that it is a big pain point and we will have some kind of answer to that concern at some point. there are a lot of other smaller questions that will need to be gradually answered before we can get there.

@issa re: the timeout @dr_michaelmarks reported
Smaller forms (around 200 entries - zip 90 KB) download just fine. The larger form (12 fields, 1200 entries) throws a timeout error on the console:

nginx_1     | 2019/09/10 15:01:46 [error] 10#10: *18 upstream timed out (110: Connection timed out) while reading upstream, client: <my workstation ip>, server: , request: "GET /v1/projects/6/forms/baseline_hh_survey/submissions.csv.zip HTTP/1.1", upstream: "http://172.19.0.4:8383/v1/projects/6/forms/baseline_hh_survey/submissions.csv.zip", host: "<server>.lshtm.ac.uk", referrer: "https://<server>.lshtm.ac.uk/"

Nothing unusual about the server, and there's no proxy to the workstation.

Any ideas?

not immediately. sounds like i have some investigation to do. how long does it hang before timing out?

Unqualified comment from the sidelines:
Is the ODK Central that times out behind another proxy with its own timeout settings?
What's the setup and deployment?
Are there logs? (even inside containers e.g. written to /var/log/odk/std_*)

@issa - 60 seconds exactly. Consistently.

@Florian_May Looks like the Nginx proxy is timing out trying to retrieve from Central.
Logs in the service_1 container don't suggest anything wrong -
Nothing in /var/log/odk/stderr.log and only
::ffff:172.19.0.5 - - [11/Sep/2019:10:21:36 +0000] "GET /v1/projects/6/forms/baseline_hh_survey/submissions.csv.zip HTTP/1.0" 200 -
in stdout.log

that's really helpful, thanks! i'll take a look.

Some feedback on App User & Collect setup

  1. The QR codes need to have a human-readable component, i.e.
    Project name
    App Username
    This is really important when a supervisor is setting up lots of devices (especially when doing it remotely in a different country from the main IT support team) so they know each device is being configured with the correct settings.
    This will become even more important once it is possible to set up different app users with different form access on a given project.

  2. A related issue is that it's very difficult to look at the resulting URL in the settings and make sense of it, i.e. easily know which server / project you are linking to. This is important when we have field supervisors overseas and we need them to be able to easily check that they are connected to the correct project / server.

For example, in Collect I'd like it to then show that my URL was XYZ
My project was ABC
And my app Username was 123

That way I can be confident my remote user has set the device up using the correct QR code/settings

@LN these suggestions would have to be implemented in Collect i guess?

It's not only Collect though - underneath the actual QR code image that Central generates there would ideally be some text that says:
URL
Project name/number
App User Name

Something like this would be ideal

  1. Also an option in Central to download / email the QR code would be very valuable to help remote users set up devices
    (Ideally automatically naming the QR code file based on Project name / App user-nickname)
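
A rough sketch of what the caption text and automatic file name could look like. Both helpers (`qr_label`, `qr_filename`) are hypothetical and only build strings; drawing the caption under the QR image would need an imaging library and is left out. Note that this simple version replaces any non-ASCII characters in names with hyphens.

```python
import re


def qr_label(server_url, project_name, app_user):
    """Build the human-readable caption to show with a QR code."""
    return "\n".join([
        f"URL: {server_url}",
        f"Project: {project_name}",
        f"App User: {app_user}",
    ])


def qr_filename(project_name, app_user, extension="png"):
    """Build a file name from the project name and app user name."""
    def clean(text):
        # Collapse anything that is not an ASCII letter or digit into '-'
        return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-")

    return f"{clean(project_name)}_{clean(app_user)}.{extension}"
```

For example, `qr_filename("Baseline HH Survey", "Tablet 12 / Anna")` gives `Baseline-HH-Survey_Tablet-12-Anna.png`.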

If I understand correctly, @dr_michaelmarks, you'd like the image that includes the QR code to also include identifying information because it might be emailed, printed out, etc, is that right? That is, there's sufficient context for the QR code when it's displayed from Central but a supervisor setting up devices would not be logged in to Central.

I also agree that it should be possible to review the server/project/app user configuration from clients. The project id is listed at the end of the URL but it's just a numeric identifier. Perhaps as a first step the project name could be appended to the URL (slugified) and ignored by the server? For username, I think that if the QR code JSON included the app user name associated with the username key, the username would show up in the server settings.
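
A sketch of both ideas under stated assumptions: the slug rule, the example URL and the shape of the settings JSON (a `general` object holding `server_url` and `username`) are illustrative guesses and have not been checked against what Central emits or what Collect accepts. All three function names are hypothetical.

```python
import json
import re
import unicodedata


def slugify(name):
    """Lowercase a name and reduce it to ASCII letters, digits and hyphens."""
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def labelled_server_url(base_url, project_name):
    """Append the slugified project name to the project URL."""
    return f"{base_url.rstrip('/')}/{slugify(project_name)}"


def settings_json(server_url, app_user_name):
    """Build settings JSON carrying the app user name under 'username'.

    The structure here is an assumption made for illustration only.
    """
    return json.dumps({
        "general": {"server_url": server_url, "username": app_user_name},
        "admin": {},
    })
```

With these, `labelled_server_url("https://example.org/v1/key/TOKEN/projects/6", "Turtle Nesting")` ends in `/projects/6/turtle-nesting`, which a supervisor can read at a glance.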