ODK Central v0.6 Beta

If R is an option, you can access audit logs via
https://dbca-wa.github.io/ruODK/reference/audit_get.html

Next you might want to filter actions by those involving projects, then parse them into something readable (rectangular).

What is a question that project audit logs could answer?


@dr_michaelmarks I've now enabled the server to send external mail through our internal relay.


In GCP compliance (for medical research) it's useful to be able to track things that happen to the database, so project-level audit logs are helpful for that.

The advantage of this is that it allows a low-level admin to create projects and hand them over to individual researchers without the low-level admin being able to play about with someone's project.

But what's to stop this person from creating a dummy user on another email account, assigning it manager on the project anyway, and doing with the project what they will? And if they wouldn't because honor system, then why not make them a full administrator?

In general, Central follows a paradigm many (most? all?) permissions systems follow, which is that no user is ever allowed to grant a right that the user themself does not possess.
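The rule described above can be sketched as a subset check. This is only an illustration in Python: the role names and rights below are hypothetical and are not Central's actual roles or verbs.

```python
# Hypothetical roles and rights, for illustration only.
# Central's real role and verb names may differ.
ROLES = {
    "admin": {"project.create", "project.update", "project.delete", "assignment.create"},
    "manager": {"project.update", "assignment.create"},
    "project_creator": {"project.create"},
}


def can_grant(grantor_rights, role):
    """Allow a grant only if every right in the role is already held by the grantor."""
    return ROLES[role] <= set(grantor_rights)


# A full admin can hand out the manager role...
print(can_grant(ROLES["admin"], "manager"))
# ...but a user who can only create projects cannot, because they lack
# rights (such as project.update) that the manager role carries.
print(can_grant(ROLES["project_creator"], "manager"))
```

Under this rule, a "create-only" admin could not later assign themselves or a dummy account as manager, which is the scenario raised in the question above.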

E.g. listing updates to projects (details in column "details"):

Or get entire log and filter later:
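For readers not using R, the same two approaches can be sketched in Python. This is a rough sketch, not ruODK's implementation: it assumes the audit log is served at `/v1/audits` with an `action` query parameter, and that each entry carries `action`, `loggedAt` and `details` fields. The server URL and sample entries are made up, and no network call is made.

```python
from urllib.parse import urlencode


def audits_url(base_url, action=None):
    """Build the audit-log URL, optionally filtered by action (assumed parameter name)."""
    url = base_url.rstrip("/") + "/v1/audits"
    if action:
        url += "?" + urlencode({"action": action})
    return url


def project_events(entries):
    """Filter an already-downloaded log to project actions and flatten to simple rows."""
    rows = []
    for entry in entries:
        if entry.get("action", "").startswith("project."):
            rows.append({
                "loggedAt": entry.get("loggedAt"),
                "action": entry["action"],
                "details": entry.get("details"),
            })
    return rows


# Approach 1: ask the server for project updates only.
print(audits_url("https://central.example.org", action="project.update"))

# Approach 2: fetch everything, filter later (sample entries are invented).
sample_log = [
    {"action": "project.update", "loggedAt": "2019-09-10T15:00:00Z", "details": {"name": "New name"}},
    {"action": "user.create", "loggedAt": "2019-09-10T15:01:00Z", "details": None},
]
print(project_events(sample_log))
```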

Now that I have the Audit functionality working (having fixed a quirk of the XLS > XML conversion when using an old version of XLSForm Offline), I'm trying to get Audit download working well from Central 0.6.

When I use the "Download all records" button, the audit.csv file doesn't seem to have all the audit trails for each individual instance. I only seem to get an audit.csv file containing the initial submission's audit trail.

When I download from Central via Briefcase, these do all get concatenated into a single CSV, and each individual CSV also ends up in the media folder.

Thoughts?

I can't find an issue or notice, so @issa will need to refresh my memory on where this is documented, but it is a known issue that there is no collision resolution on export when multiple attachment files have the same name. This Briefcase issue highlights some of the challenges with collision resolution.

Given that Central 0.7 will produce the same single big CSV export as Briefcase (see the release criteria), do you feel like you still would like to export the individual audit files? If so, do you have a sense of how you'd like them to be named?

i haven't documented this particular quirk anywhere, mostly because i had no idea it existed :slight_smile: but also because it sounds like it is not a Central-related issue. either way, at least 0.7 should resolve this one.

Because Central lets you download the individual audit trail metadata if needed, I'd be happy if the "download everything" button gave you one CSV with all changes together.


Trying out a few more things.
We pushed some old datasets onto Central using Briefcase:
1000+ forms, 20-odd questions with 2 nested repeats.

Downloads fine using Briefcase.
Won't download using the direct download button (Failed - Network error).

Is this a timeout issue that we need to configure at our end, @MatthewMac, or a Central issue? Not sure.

One more thought:
Should we have a purge submissions button?
We often put up a version of a form, train people on it and then want to wipe the data before real data collection starts.

Currently, to do that you would need to delete the form and then upload a new version (with either a different name or version ID).

@dr_michaelmarks apologies for jumping in on this, we are also testing ODK Central at the moment.

re download size: We generate ca 10k submissions per 4-month turtle nesting season with a fair amount of attachments (ca 100-150 GB of photos per season).
@issa what will happen if I hit the "export all submissions" button?
Purging successfully extracted submissions might be the way to go there.

re test submissions: We always include a "training" flag in forms, e.g. real turtles are Natator depressus (Flatback turtle) or Caretta caretta (Loggerhead turtle), and the training turtle is the Corolla corolla (Hatchback turtle). Analysis simply excludes these training records. This solves the problem of switching between training and production forms/settings and of accidentally sending production data into the training bucket.
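Excluding the training records at analysis time could look roughly like this in Python. The field name `species` and the value `corolla-corolla` are assumptions for illustration; the actual form fields are not shown in this thread.

```python
# Values that mark a record as training data (hypothetical choice value).
TRAINING_SPECIES = {"corolla-corolla"}


def production_records(records, species_field="species"):
    """Return only the records whose species is not a training placeholder."""
    return [r for r in records if r.get(species_field) not in TRAINING_SPECIES]


# Invented sample submissions.
submissions = [
    {"id": 1, "species": "natator-depressus"},
    {"id": 2, "species": "corolla-corolla"},   # training record
    {"id": 3, "species": "caretta-caretta"},
]
print([r["id"] for r in production_records(submissions)])
```

The training data stays in the database, so nothing is purged; it is simply filtered out downstream.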

I think it will time out (as it does for me on 1,500 submissions without attachments).
Approaches I can think of:

  1. Still using Briefcase
  2. Generate the file in the background and provide a link to it - I have seen this done on other platforms when the file size is too big for a standard immediate download to work

Purging successfully extracted submissions might be the way to go there.

I definitely don't think you want to permanently purge exported submissions from Central - this would not be OK from a Good Clinical Practice perspective for medical/clinical research studies, where you need to maintain the original database/records. For example, some of our Aggregate servers have data on >100,000 individuals in a study.

Might work for some people, but it doesn't work well for us in clinical studies, where we want people to practice using the forms exactly as they will in real life / hospital etc.
Maybe we just have 2 versions of the form where one has ID XYZ_training, but personally I still think a purge submissions function could be useful.


I think it will time out (as it does for me on 1,500 submissions without attachments).

it shouldn't. central is architected to continually stream data out as it is processed, which under normal conditions should not cause a timeout. if something else is happening in your setup, then either we have done something we did not intend on the server side to prevent our designed behaviour, or else the additional layers present in your environment are causing problems.
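The general idea of streaming an export as it is processed can be sketched with a Python generator. This is not Central's code (Central is not written in Python); it only illustrates emitting each CSV row as soon as it is ready instead of building the whole file first.

```python
import csv
import io


def stream_csv(rows, header):
    """Yield the CSV one line at a time so the client keeps receiving data."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    yield buf.getvalue()
    for row in rows:
        # Reuse the buffer: clear it, write one row, hand that row out.
        buf.seek(0)
        buf.truncate(0)
        writer.writerow(row)
        yield buf.getvalue()


# Rows can come from any iterator, e.g. a database cursor; here a made-up one.
for chunk in stream_csv(iter([(1, "a"), (2, "b")]), ["id", "value"]):
    print(repr(chunk))
```

A stream like this only avoids timeouts while chunks keep flowing; a long pause between two chunks can still trip a read timeout in a proxy sitting in front of the service.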

as for questions around draft or training forms, we are aware that it is a big pain point and we will have some kind of answer to that concern at some point. there are a lot of other smaller questions that will need to be gradually answered before we can get there.


@issa re: the timeout @dr_michaelmarks reported:
Smaller forms (around 200 entries - zip 90 KB) download just fine. The larger form (12 fields, 1200 entries) throws a timeout error on the console:

nginx_1     | 2019/09/10 15:01:46 [error] 10#10: *18 upstream timed out (110: Connection timed out) while reading upstream, client: <my workstation ip>, server: , request: "GET /v1/projects/6/forms/baseline_hh_survey/submissions.csv.zip HTTP/1.1", upstream: "http://172.19.0.4:8383/v1/projects/6/forms/baseline_hh_survey/submissions.csv.zip", host: "<server>.lshtm.ac.uk", referrer: "https://<server>.lshtm.ac.uk/"

Nothing unusual about the server, and there's no proxy to the workstation.

Any ideas?

not immediately. sounds like i have some investigation to do. how long does it hang before timing out?

Unqualified comment from the sidelines:
Is the ODK Central that times out behind another proxy with its own timeout settings?
What's the setup and deployment?
Are there logs? (even inside containers e.g. written to /var/log/odk/std_*)


@issa - 60 seconds exactly. Consistently.

@Florian_May Looks like the Nginx proxy is timing out trying to retrieve from Central.
Logs in the service_1 container don't suggest anything wrong -
Nothing in /var/log/odk/stderr.log and only
::ffff:172.19.0.5 - - [11/Sep/2019:10:21:36 +0000] "GET /v1/projects/6/forms/baseline_hh_survey/submissions.csv.zip HTTP/1.0" 200 -
in stdout.log
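For what it's worth, 60 seconds is nginx's default `proxy_read_timeout`, which applies between two successive reads from the upstream rather than to the whole response, so the cutoff is consistent with the service sending nothing for a minute. When sifting through many such lines, a small Python helper can pull out the relevant parts. The regular expression below is a sketch written against the one error line quoted in this thread, not a general nginx log parser.

```python
import re

# Pattern tailored to the "upstream timed out" line quoted in this thread.
NGINX_UPSTREAM_TIMEOUT = re.compile(
    r'upstream timed out \((?P<errno>\d+): (?P<reason>[^)]*)\) while (?P<phase>[^,]+), '
    r'.*?request: "(?P<method>\S+) (?P<path>\S+) [^"]*", upstream: "(?P<upstream>[^"]+)"'
)

SAMPLE = (
    'nginx_1     | 2019/09/10 15:01:46 [error] 10#10: *18 upstream timed out '
    '(110: Connection timed out) while reading upstream, client: <my workstation ip>, '
    'server: , request: "GET /v1/projects/6/forms/baseline_hh_survey/submissions.csv.zip HTTP/1.1", '
    'upstream: "http://172.19.0.4:8383/v1/projects/6/forms/baseline_hh_survey/submissions.csv.zip", '
    'host: "<server>.lshtm.ac.uk", referrer: "https://<server>.lshtm.ac.uk/"'
)


def parse_upstream_timeout(line):
    """Return the matched fields as a dict, or None if the line is not a timeout."""
    match = NGINX_UPSTREAM_TIMEOUT.search(line)
    return match.groupdict() if match else None


info = parse_upstream_timeout(SAMPLE)
print(info["phase"], info["path"], info["upstream"])
```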


that's really helpful, thanks! i'll take a look.

Some feedback on App User & Collect setup

  1. The QR codes need to have a human-readable component,
    i.e.
    Project name
    App Username
    This is really important when a supervisor is setting up lots of devices (especially when doing it remotely in a different country from the main IT support team) so they know each device is being configured with the correct settings.
    This will become even more important once it is possible to set up different app users with different form access on a given project.

  2. A related issue is that it's very difficult to look at the resulting URL in the settings and make sense of it, i.e. to easily know which server / project you are linking to. This is important when we have field supervisors overseas and we need them to be able to easily check that they are connected to the correct project / server.
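As a stopgap for the second point, a supervisor's support team could decode the server URL with a few lines of Python. This sketch assumes the URL has the shape `https://<server>/v1/key/<token>/projects/<id>`; the example URL and token are invented, and the project appears only as a numeric ID, not by name.

```python
import re
from urllib.parse import urlparse


def describe_server_url(url):
    """Extract the server host and numeric project ID from a Collect server URL."""
    parsed = urlparse(url)
    match = re.search(r"/projects/(\d+)", parsed.path)
    return {
        "server": parsed.hostname,
        "project_id": int(match.group(1)) if match else None,
    }


# Invented example URL; the token is a placeholder.
print(describe_server_url("https://central.example.org/v1/key/abc123/projects/6"))
```

This still leaves the supervisor mapping an ID to a project name by hand, which is why a human-readable label next to the QR code would help.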
