Host ODK Central Docker images on GitHub container registry

Context

For "DIY" users free to choose their own infrastructure, ODK Central comes with a nice and easy orchestration (docker-compose) and guidance on deployment and hosting (DigitalOcean).

For the "need it faster" users, GetODK (and others) offer fully hosted ODK Central instances.

Some ODK users, however, work for organisations big enough to have their own infrastructure (e.g. Kubernetes) and policies (e.g. HIPAA: in-house only), which means they have to re-implement the DevOps side all by themselves within their own constraints.

Gap

ODK Central now uses no fewer than nine Docker images, four of them custom-built.
The third user group needs pre-built, hosted versions of these images to deploy Central on infrastructure of their own.

Current approach

Images can be built locally.

git clone https://github.com/getodk/central.git && cd central
git pull                    # update an existing clone
git submodule update -i     # fetch the client and server submodules
docker build . -f service.dockerfile -t dbcawa/odk_service:1.0.1.0
docker build . -f nginx.dockerfile -t dbcawa/odk_nginx:1.0.1.0
docker build . -f enketo.dockerfile -t dbcawa/odk_enketo:1.0.1.0
docker build . -f secrets.dockerfile -t dbcawa/odk_secrets:1.0.1.0

These unofficial images are hosted at:
https://hub.docker.com/r/dbcawa/odk_service
https://hub.docker.com/r/dbcawa/odk_nginx
https://hub.docker.com/r/dbcawa/odk_enketo
https://hub.docker.com/r/dbcawa/odk_secrets

Disclaimer: these images are not affiliated with or supported by GetODK; use them at your own risk.

Suggested better approach

Pre-built, up-to-date Docker images. See also https://github.com/getodk/central/issues/165.

This would mean that Docker images are built automatically, with the build under the control of the GetODK core team. Together with some high-level docs, this could make ODK Central easier to deploy on non-supported infrastructure.

A distant dream would be a working Helm chart for ODK Central, which would reduce ODK Central deployment to a single helm install command.

Update: thinking the above out loud at https://github.com/dbca-wa/central/blob/master/README_k8s.md

I'm definitely in favour of hosted images as they would speed up deployments and upgrades and also avoid problems (like the one we ran into with 1.0) where a machine can run Central but can't build it. However, I'm interested to understand why you think the third group "needs" this?

Yo dowg!

With "need" I mean that pre-built Docker images would make non-standard deployments more accessible by cutting out the "docker build" steps.
In combination with alternatives to docker-compose, e.g. Helm charts, or even just Kubernetes YAML configs for each volume and container, this could make non-standard deployments a bit easier without taking away from the main "docker-compose" and "go faster, use hosted" ways.

In my particular use case, I need to deploy ODK Central on our own infrastructure. We run Azure / Kubernetes / Rancher, so my tasks are:

  • Build images. Dead easy if you have access to an environment with Docker; harder if not. This hurdle can be completely eliminated through pre-built images. A secondary docker-compose file can use those pre-built images (see the sketch after this list).
  • Deploy ODK Central using images and config settings from docker-compose.
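
For illustration, a minimal sketch of such an override file. The service names and image tags are assumptions (matching the unofficial dbcawa images above); check them against the service names in Central's docker-compose.yml:

# docker-compose.images.yml — hypothetical override pointing the four
# custom-built services at pre-built images instead of local builds
version: "3"
services:
  service:
    image: dbcawa/odk_service:1.0.1.0
  nginx:
    image: dbcawa/odk_nginx:1.0.1.0
  enketo:
    image: dbcawa/odk_enketo:1.0.1.0
  secrets:
    image: dbcawa/odk_secrets:1.0.1.0

Run with docker-compose -f docker-compose.yml -f docker-compose.images.yml up -d --no-build to skip the local builds entirely.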

I'm experimenting with Kompose to translate docker-compose into a Helm chart and Kubernetes YAML config files: https://github.com/dbca-wa/central/blob/master/README_k8s.md.
Once I get that to work, the deployment to Kubernetes through Rancher reduces to "upload YAML files to Rancher in the browser".
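
As a taste of the output, here is a trimmed, hypothetical example of the Deployment Kompose might generate for the service container. The io.kompose.service labels follow Kompose's convention; the image name and port are assumptions based on the images above and the backend's default port:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: service
  labels:
    io.kompose.service: service
spec:
  replicas: 1
  selector:
    matchLabels:
      io.kompose.service: service
  template:
    metadata:
      labels:
        io.kompose.service: service
    spec:
      containers:
        - name: service
          image: dbcawa/odk_service:1.0.1.0   # pre-built image, no local build
          ports:
            - containerPort: 8383             # assumed backend port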

Lastly, a clarification: Deploying Central via Rancher requires pre-built images, but no Helm chart. One can also create workloads (containers), volumes, and configmaps (settings) by hand, which I describe for ODK Central <= 0.9 at Installing ODK central in microsoft azure cloud - #2 by Florian_May. However, doing so manually with 19 moving parts (containers, volumes, configs) is immensely tedious and error-prone.

Having a Helm chart however could facilitate deployment of Central to any Kubernetes service.

My questions:

  • What are the costs and risks of letting GH Actions build Docker images? (A minimal workflow sketch follows this list.)
    • Presumably, once the build is working (it does here), it's automated and wouldn't require maintenance.
    • Ideally, Docker tagging should follow GH tagging: master branch = "latest" tag, GH tag = Docker image tag.
  • What are the costs and risks of hosting these images on e.g. GetODK's GH Packages?
    • Setting up an "organization token" will take a few minutes.
    • Is download bandwidth limited, and are there any costs at high usage?
  • If I wanted to share a working ODK Central deployment using Kubernetes / Rancher, what would be the most appropriate place for it?
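
To make the first question concrete, here is a minimal sketch of such a workflow for one image, assuming GitHub's container registry (ghcr.io) and the community docker/* actions; the image name under getodk is a placeholder, not a GetODK decision:

# .github/workflows/build-images.yml — hypothetical sketch for one image
name: build-images
on:
  push:
    branches: [master]
    tags: ['v*']
jobs:
  service:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: recursive
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      # map master -> "latest" and git tags -> image tags, per the suggestion above
      - id: meta
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/getodk/central-service   # placeholder name
          tags: |
            type=raw,value=latest,enable={{is_default_branch}}
            type=ref,event=tag
      - uses: docker/build-push-action@v5
        with:
          context: .
          file: service.dockerfile
          push: true
          tags: ${{ steps.meta.outputs.tags }}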

Right, that makes sense. You need to reference actual images in a repository rather than a dockerfile (which compose can do).

Getting the build working takes time, and then GitHub's org pricing is tiered on Actions: https://github.com/pricing. I'm not 100% clear on whether we'd have to use GitHub Actions (as opposed to CircleCI) if that was a blocker. My assumption is that you can push from other places, but it might be that GitHub wants some serious lock-in on this one.

Again, it's setup and then pricing tiers: https://github.com/pricing. As far as I can see, it's priced on the amount you're hosting rather than on bandwidth. Out of interest, is there a reason you're suggesting GitHub's new container service over Docker Hub? Central's pre-built images are already hosted there.

I'd say here in the "Showcase" category. It'd be nice to look at it as something that can be made part of Central, but we'd need to discuss maintenance etc for that.


Thanks for all the answers!
I went with GH Actions & Packages simply because it's all in one place.
Of course the images could be built elsewhere (CircleCI) or hosted elsewhere (DockerHub).

Storage for public repos is free, and it looks like GHA minutes won't be exhausted by commit frequency × build minutes. So, AFAICT, a free lunch.

I totally missed https://hub.docker.com/r/getodk/pyxform-http. Are there other official ODK images for Central?

I'll post to showcases once/if I've got Central on Kubernetes to work.


the way i see it:

the service image could easily be prebuilt and distributed as a packaged container. but, building this image takes very little time to begin with.

the nginx image needs a little work: right now the SHAs/version tags of the submodule repos are built into it (files/prebuild/write-version.sh) as a part of the build process. somehow that information would have to be provided at run-time instead. prebuilding this image would definitely save some time.

the enketo image absolutely cannot be prepackaged. this is a limitation with enketo: its configuration is incorporated into its source code at build time.

the secrets image just contains local secret keys. it weighs nothing and does nothing at build time. but it's also pretty easy to prebuild, it's just a COPY command for a single script file.

so, the two images that can be easily prebuilt right now are really cheap to build anyway. the rest take some work and a mountain of work, respectively.


Help me understand:
The nginx image runs a command that generates version tags. If GHA rebuilt the image on every push, wouldn't that solve the problem by keeping the version info up to date?

Re the Enketo config: isn't that similar to the service config? In Kubernetes, files can be provided through a ConfigMap, which is a key-value store of filename → file content. ConfigMaps are mounted as volumes into the running container. This provides any file-based config (such as https://github.com/dbca-wa/central/blob/master/enketo.dockerfile#L13-L14) at runtime, as sketched below.
Does https://github.com/dbca-wa/central/blob/master/enketo.dockerfile#L6-L11 mean that "mounting config at runtime" won't work?
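
For illustration, a minimal sketch of the ConfigMap mechanism described here; the config keys are placeholders, and the mount path assumes ENKETO_SRC_DIR=/srv/src/enketo_express as in the dockerfile linked above:

apiVersion: v1
kind: ConfigMap
metadata:
  name: enketo-config
data:
  config.json: |
    {
      "linked form and data server": { "server url": "https://odk.example.org" }
    }

# fragment of the enketo pod template — mounts the rendered file over the baked-in one:
    volumeMounts:
      - name: enketo-config
        mountPath: /srv/src/enketo_express/config/config.json
        subPath: config.json
  volumes:
    - name: enketo-config
      configMap:
        name: enketo-config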

nginx:
no. it's a cross-dependency. the frontend is served statically by nginx. it needs to report to the user the version of the frontend and the backend and the central repositories. this cannot be done just by building the image when the frontend is updated.

enketo:
no. like, when enketo is compiled from source, it bakes its configuration in. it is not like a normal service where upon boot enketo will read from a file on disk.


So the nginx and service images would have to be rebuilt together whenever either the frontend or the backend changes? Would that keep the version numbers in sync?

And Enketo's config requires the user-defined value domain, got it. So until Enketo allows providing domain at runtime, every user has to build the image themselves with their own value for domain.

So the nginx and service images would have to be rebuilt together whenever either the frontend or the backend changes? Would that keep the version numbers in sync?

yes.

and the enketo situation is afaik being addressed. so this dream may yet happen. i'd be all for prebuilding more.


Yay! Nginx/service could easily be built and pushed by GH actions. Is that something you'd consider a PR for?

Good news about enketo too. Is that discussion happening in the open?

The Enketo PR to pull client settings out of the build is at https://github.com/enketo/enketo-express/pull/210. The next step will be to update the image: https://github.com/enketo/enketo-express/issues/161.


I'm playing around with building our own enketo image. So far, I've removed these two lines:

COPY files/enketo/config.json.template ${ENKETO_SRC_DIR}/config/config.json.template
COPY files/enketo/config.json.template ${ENKETO_SRC_DIR}/config/config.json

from the Dockerfile, and removed this line from start-enketo.sh:

/bin/bash -c "SECRET=$(cat /etc/secrets/enketo-secret) LESS_SECRET=$(cat /etc/secrets/enketo-less-secret) API_KEY=$(cat /etc/secrets/enketo-api-key) envsubst '\$DOMAIN:\$SECRET:\$LESS_SECRET:\$API_KEY:\$SUPPORT_EMAIL' < ${CONFIG_PATH}.template > $CONFIG_PATH"

I will mount the completely rendered config.json into the container at run time. As far as I can see, this works without any problems...

Did I miss something?

This will work as long as you’re ok with anyone who gets the URL of your server being able to make API requests and/or you have some other way of limiting access to the Enketo install. What you’ve done is used the default, insecure API keys instead of generating ones specific to your server install. Having API access to Enketo means being able to read all form definitions on that server, publish new forms, and possibly read some form data.

This is not secure enough for the standard install but could possibly meet your needs.

EDIT: I now see you’re copying in the generated config file with secrets. As far as I know those won’t be picked up until you rebuild but you should double check this.

Now I remember that this commit came in last year, which should make it possible to pick up new configuration items without rebuilding. We intended to do some verification around it but never had a chance to.


Is this tangentially related to SSL certificates for DB connections?
If the Enketo secrets can be provided at runtime, could SSL certificates be provided too? (I remember that between knex and slonik, one needs a file path, the other file contents.)

It works like a charm for us. We've also tested whether we could create Enketo links when the API key in Central differs from the one in Enketo. The result: no.

We've also removed the secrets from the Docker volume, so we won't lose them if somebody accidentally executes docker-compose down.
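
For anyone doing the same on Kubernetes, a hedged sketch: keep the three Enketo keys in a Secret and mount it where start-enketo.sh expects them (/etc/secrets, per the line quoted earlier); names and values are placeholders:

apiVersion: v1
kind: Secret
metadata:
  name: enketo-secrets
stringData:
  enketo-secret: "<long random string>"
  enketo-less-secret: "<long random string>"
  enketo-api-key: "<long random string>"

# fragment of the enketo pod template:
    volumeMounts:
      - name: enketo-secrets
        mountPath: /etc/secrets
        readOnly: true
  volumes:
    - name: enketo-secrets
      secret:
        secretName: enketo-secrets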

As far as I can tell, there is no longer any technical obstacle to providing pre-built Docker images. If you are interested, I could make a PR with a new setup.


This could be interesting and would simplify a possible deployment in Kubernetes. Am I right?

It'd be great to see a draft PR so we can see what the changes are. If you have a public repo, you can share that too. My hope is that we can add this to Central.

Thank you, @yanokwa! I'll have a look in two weeks, if I can fit this task into our next sprint.
