Info about accessing submission attachments in external S3 storage

Hi all,

I'm loving the external S3 storage support added in 2024.2.0 (announcement: ODK Central v2024.2: Submission deletes via API and S3 media storage).

This post is mostly to share info for anyone searching for something similar, rather than to ask a question. Hope that's ok :smile:
(Also I added this to the development section as it's probably more for devs)


Originally I wanted to post a question requesting that the S3 keys for submission attachments are included in the submission JSON/CSV somewhere.

This was primarily because I am using a public access S3 bucket, so it made sense to simply construct the S3 URL using the key and access the data (e.g. to embed multiple submission photos in a web page from their S3 URLs).

However, I realised the typical use case would involve a private bucket, so I went about it another way instead.

Getting the S3 URLs for submission attachments

This is quite simple in hindsight.
The main process is: list attachments --> request attachment --> get pre-signed S3 URL --> do what you want with the URL! (download, display the img, etc).

  1. List the submissions for the form (i.e. get the UUID of the submission you are interested in). Each submission has an instanceId, for example:

        instanceId: "uuid:e83db2b4-5e82-4e61-bc32-04750e511aff"

    It's also possible to get submission UUIDs via the OData endpoint.

  2. List the attachments for a given submission UUID:

    [{"name":"1731676401897.jpg","exists":true}, ...]

    The 'name' value corresponds to the name column of the submission_attachments table in the Central database, and is generated to be unique.

    This is the value used to download the attachment in the next step.

  3. Request a pre-signed URL for each attachment:


    Returns (example):


    Note that Central seamlessly handles both storage cases: it either sends the blob directly from the database, or provides a pre-signed URL for downloading it from the S3 bucket.
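To make steps 2 and 3 a bit more concrete, here is a minimal Python sketch (standard library only) that turns the attachment list from step 2 into the per-attachment endpoints requested in step 3, following the /v1/projects/{PROJECT_ID}/forms/{FORM_ID}/submissions/{SUBMISSION_UUID}/attachments/{ATTACHMENT_NAME} pattern. The helper name and the base URL in the usage are hypothetical, and skipping entries with "exists": false (files Central expects but has not received yet) is an assumption about what you'd want.

```python
import json
from urllib.parse import quote


def submission_attachment_urls(
    base_url: str,
    project_id: int,
    form_id: str,
    instance_id: str,
    attachments_json: str,
) -> dict[str, str]:
    """Map each uploaded attachment name to its Central download endpoint.

    attachments_json is the JSON body returned when listing a submission's
    attachments (step 2), e.g. [{"name": "1731676401897.jpg", "exists": true}].
    """
    submission = (
        f"{base_url.rstrip('/')}/v1/projects/{project_id}"
        f"/forms/{quote(form_id, safe='')}"
        f"/submissions/{quote(instance_id, safe=':')}"  # keep the "uuid:" prefix readable
    )
    urls: dict[str, str] = {}
    for attachment in json.loads(attachments_json):
        # Assumption: "exists": false means the file has not been uploaded yet,
        # so there is nothing to download for it.
        if not attachment.get("exists"):
            continue
        name = attachment["name"]
        urls[name] = f"{submission}/attachments/{quote(name, safe='')}"
    return urls
```

Each resulting URL is the authenticated endpoint from step 3; requesting it (with your Central credentials) either returns the blob or redirects to the pre-signed S3 URL.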

Like I said above, this seems obvious in hindsight, but I didn't realise Central could provide pre-signed URLs to access the images.
(Originally I thought the only way to access the S3 data was from a submission .zip dump.)

Hope this helps someone!


Great news! I have yet to start fiddling with the external S3 storage feature, but it is already looking great from the info in your post.

One thing I notice in your "Returns (example)" URL is the X-Amz-Expires query parameter:

This means the signed URL only works for 60 seconds after it's generated, which might fall short in some scenarios (e.g. storing an image URL to display later).

From the AWS Docs:

Provides the time period, in seconds, for which the generated presigned URL is valid. For example, 86400 (24 hours). This value is an integer. The minimum value you can set is 1, and the maximum is 604800 (seven days). A presigned URL can be valid for a maximum of seven days because the signing key you use in signature calculation is valid for up to seven days.
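To check whether a stored URL is still usable, the signing time and lifetime can be read straight back out of it. A minimal sketch, assuming a standard SigV4 pre-signed URL that carries X-Amz-Date (signing time in the form 20241115T131500Z) and X-Amz-Expires (seconds); the function names are hypothetical:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def presigned_url_expiry(url: str) -> datetime:
    """Return the UTC time at which a SigV4 pre-signed URL stops working."""
    params = parse_qs(urlsplit(url).query)
    # X-Amz-Date: signing time in ISO 8601 basic format, always UTC ("Z").
    signed_at = datetime.strptime(
        params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    # X-Amz-Expires: validity period in seconds (1 to 604800 per the AWS docs).
    return signed_at + timedelta(seconds=int(params["X-Amz-Expires"][0]))


def is_expired(url: str, now: Optional[datetime] = None) -> bool:
    """True once the URL's validity window has passed (boundary counts as expired)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= presigned_url_expiry(url)
```

If a stored URL has expired, requesting the attachment endpoint from Central again should give you a freshly signed one.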

It would be a great feature to make this configurable somehow, or even better, to expose it as a query parameter of the authenticated endpoint /v1/projects/{PROJECT_ID}/forms/{FORM_ID}/submissions/{SUBMISSION_UUID}/attachments/{ATTACHMENT_NAME}

Very good point!

Looks like this is currently hardcoded at 60s (which is quite strict):

It could be added as a param to the endpoint you reference and passed down the chain --> util.blob.blobResponse --> s3.urlForBlob --> minioClient.presignedGetObject:

Alternatively it could be an env variable configuration.
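A rough sketch of how the two options could combine, written in Python purely for illustration (Central itself is Node.js). The environment variable name and the precedence order are hypothetical: a query parameter overrides the environment variable, which overrides the current 60 s default, and the result is clamped to the 1 to 604800 second range from the AWS docs.

```python
import os
from typing import Mapping, Optional

DEFAULT_EXPIRY_SECONDS = 60           # Central's current hardcoded value
MIN_EXPIRY_SECONDS = 1                # AWS minimum
MAX_EXPIRY_SECONDS = 7 * 24 * 60 * 60  # 604800 s, the AWS maximum (7 days)

# Hypothetical variable name; Central does not define this today.
ENV_VAR = "S3_PRESIGN_EXPIRY_SECONDS"


def resolve_expiry(
    query_value: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Pick the pre-signed URL lifetime: query param > env var > default."""
    env = os.environ if env is None else env
    raw = query_value or env.get(ENV_VAR)
    if raw is None:
        return DEFAULT_EXPIRY_SECONDS
    try:
        seconds = int(raw)
    except ValueError:
        # Invalid input falls back to the default in this sketch.
        return DEFAULT_EXPIRY_SECONDS
    # Clamp into the range S3 accepts for X-Amz-Expires.
    return max(MIN_EXPIRY_SECONDS, min(seconds, MAX_EXPIRY_SECONDS))
```

Silently falling back on invalid input keeps the sketch short; a real endpoint would probably reject it with a 400 instead.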

I could easily open a PR for this, though I've no idea whether it's a desirable change for the dev team :pray:


Yes! But I believe that if your bucket is public, all of the query parameters are superfluous and could be omitted; Central just doesn't have special handling for that case. In particular, the expiration time period is not relevant.
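For the public-bucket case, stripping the query string from the URL Central redirects to leaves a plain object URL. A minimal sketch with a hypothetical helper name; it only works if the bucket allows anonymous reads, otherwise S3 will typically answer with 403 Access Denied:

```python
from urllib.parse import urlsplit, urlunsplit


def public_object_url(presigned_url: str) -> str:
    """Drop the X-Amz-* signing parameters from a pre-signed S3 URL.

    Only useful for a public bucket: without the signature, S3 treats
    the request as anonymous.
    """
    parts = urlsplit(presigned_url)
    # Keep scheme, host and object path; discard query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```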

@punkch are you expecting to use a private bucket?


The docs recommend using a private bucket, and I imagine that's the main use case, no?

I had quite a unique case to use a public bucket & was also just being lazy :laughing:


Ah yes, got it! I didn't read your original post carefully enough to see you were sharing info mostly for the private bucket context. Yes, we do expect that to be the common path.

We'll give some thought to whether and how it makes sense to configure signed link expiration and get back to you. The maximum possible time is 7 days.


Awesome! That could be a useful feature. I personally have no pressing need for it right now, though, so it might be worth seeing whether many others would use it.

(Having said that, it's quite a minor change.)


Quick update on this (apologies for the bump) in case someone finds it useful.

I just wanted to clarify my earlier statement that calling /v1/projects/{PROJECT_ID}/forms/{FORM_ID}/submissions/{SUBMISSION_UUID}/attachments/{ATTACHMENT_NAME} returns the pre-signed URL.

While this is true, it does so as a redirect response pointing to the S3 URL. So if your client follows redirects automatically (-L with cURL, or the default behaviour of Python requests / aiohttp), you will end up downloading the file content (the blob) instead of getting the URL.

Here is a snippet returning all of the pre-signed URLs for a given submission, if you wished to display the images elsewhere (say embedded in a website):

    # This is a method of a Central client class that provides self.base
    # (e.g. "https://central.example.org/v1/"), self.session (an
    # aiohttp.ClientSession with auth configured) and self.verify.
    import logging
    from asyncio import gather
    from typing import Optional

    log = logging.getLogger(__name__)

    REDIRECT_STATUSES = (301, 302, 303, 307, 308)

    async def getSubmissionAttachmentUrls(
        self,
        projectId: int,
        xFormId: str,
        submissionUuid: str,
    ) -> dict[str, str]:
        """Get a dictionary of attachment names and their pre-signed URLs.

        Args:
            projectId (int): The ID of the project on ODK Central.
            xFormId (str): The XForm to get the details of from ODK Central.
            submissionUuid (str): The UUID of the submission on ODK Central.

        Returns:
            dict[str, str]: A dictionary mapping attachment names to URLs.
        """
        attachments = await self.listSubmissionAttachments(projectId, xFormId, submissionUuid)
        if not attachments:
            return {}

        async def fetch_url(attachment: dict) -> tuple[str, Optional[str]]:
            """Fetch the pre-signed URL for a given attachment filename."""
            filename = attachment["name"]
            url = f"{self.base}projects/{projectId}/forms/{xFormId}/submissions/{submissionUuid}/attachments/{filename}"

            # Prevent the redirect and blob download, instead get the S3 URL
            async with self.session.get(url, ssl=self.verify, allow_redirects=False) as result:
                if result.status not in REDIRECT_STATUSES:
                    # No redirect: an error, or the blob is served from the database
                    log.error(f"Couldn't get an S3 URL for {filename} from Central (HTTP {result.status})")
                    return filename, None

                # The redirect target is the pre-signed S3 URL
                s3_url = result.headers.get("Location")
                if not s3_url:
                    log.error(f"Central redirected without a Location header for {filename}")
                    return filename, None

            return filename, s3_url

        # Skip attachments that have not been uploaded yet ("exists": false)
        urls = await gather(*(fetch_url(a) for a in attachments if a.get("exists", True)))

        return {filename: url for filename, url in urls if url is not None}

The key part here is using allow_redirects=False for the GET request, then checking for the redirect response code, and extracting the URL from the Location header.
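For anyone not using aiohttp, the same trick works with only the standard library. This is a minimal synchronous sketch, assuming you already have a Central session token sent as "Authorization: Bearer <token>"; get_presigned_url is a hypothetical helper name, and a 2xx response is treated as "no S3 URL" (the blob was served directly). The redirect handler returns None so urllib refuses to follow the 3xx, which then surfaces as an HTTPError whose headers carry Location.

```python
import urllib.error
import urllib.request
from typing import Optional

REDIRECT_CODES = {301, 302, 303, 307, 308}


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that never follows redirects."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None stops urllib from following the redirect; the 3xx
        # response is then raised as an HTTPError we can inspect.
        return None


def get_presigned_url(attachment_url: str, token: str) -> Optional[str]:
    """Return the URL Central redirects to, or None if it serves the blob itself."""
    opener = urllib.request.build_opener(_NoRedirect)
    req = urllib.request.Request(
        attachment_url, headers={"Authorization": f"Bearer {token}"}
    )
    try:
        with opener.open(req):
            # 2xx: the blob came back directly (e.g. stored in the database).
            return None
    except urllib.error.HTTPError as err:
        if err.code in REDIRECT_CODES:
            return err.headers.get("Location")
        raise  # a real error (401, 404, ...)
```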