Docker build failed due to full disk

Hi,

So I was running a version upgrade for Central when I encountered a DISK FULL error, and the upgrade script aborted. I checked, and although my disk was at around 90%, there was still about 2GB of space left, so the script should not have aborted. Anyhow, I increased the disk size and tried the upgrade again. Now the script is failing on something else and not moving forward. Please see the logs:

root@ip-172-31-18-130:/home/ubuntu/central# git pull
Already up to date.
root@ip-172-31-18-130:/home/ubuntu/central# git submodule update -i
root@ip-172-31-18-130:/home/ubuntu/central# docker-compose build
[+] Building 1.5s (13/18)
 => [central-service internal] load build definition from service.dockerf  0.1s
 => => transferring dockerfile: 1.21kB                                     0.0s
 => [central-service internal] load .dockerignore                          0.0s
 => => transferring context: 67B                                           0.0s
 => [central-enketo internal] load build definition from enketo.dockerfil  0.2s
 => => transferring dockerfile: 931B                                       0.0s
 => [central-enketo internal] load .dockerignore                           0.2s
 => => transferring context: 67B                                           0.0s
 => ERROR [central-secrets internal] load metadata for docker.io/library/  1.4s
 => [central-nginx internal] load build definition from nginx.dockerfile   0.1s
 => => transferring dockerfile: 1.05kB                                     0.0s
 => [central-nginx internal] load .dockerignore                            0.1s
 => => transferring context: 67B                                           0.0s
 => [central-secrets internal] load .dockerignore                          0.1s
 => => transferring context: 67B                                           0.0s
 => [central-secrets internal] load build definition from secrets.dockerf  0.0s
 => => transferring dockerfile: 103B                                       0.0s
 => [central-enketo internal] load metadata for ghcr.io/enketo/enketo-exp  0.9s
 => ERROR [central-nginx internal] load metadata for docker.io/jonasal/ng  1.2s
 => CANCELED [central-enketo 1/6] FROM ghcr.io/enketo/enketo-express:5.0.  0.3s
 => => resolve ghcr.io/enketo/enketo-express:5.0.2@sha256:db8efad28c0d836  0.0s
 => => sha256:409cbf6f7ec105082cc4cc8015272c0c93c3c8c7f3c 9.20kB / 9.20kB  0.0s
 => => sha256:db8efad28c0d836060a277fcbf43058e61cf05a052b 3.27kB / 3.27kB  0.0s
 => ERROR [central-enketo internal] load build context                     0.1s
 => => transferring context: 25B                                           0.0s
------
 > [central-secrets internal] load metadata for docker.io/library/node:16.17.0:
------
------
 > [central-nginx internal] load metadata for docker.io/jonasal/nginx-certbot:2.4.1:
------
------
 > [central-enketo internal] load build context:
------
failed to solve: failed to walk: resolve : lstat /var/lib/docker/overlay2/diff: no such file or directory
root@ip-172-31-18-130:/home/ubuntu/central#

Can someone help me troubleshoot this?

Thanks,
Saad

Hi,

I tried the upgrade again around 24 hours later (with no changes made), and it got past the error on its own. The installation seems to have succeeded, but not all Docker containers are coming online; specifically, nginx fails to start. These are the errors:

root@ip-172-31-18-130:/home/ubuntu/central# docker-compose up -d
[+] Running 8/9
 ⠿ Container central-postgres-1            Started                         2.3s
 ⠿ Container central-secrets-1             Started                         2.4s
 ⠿ Container central-enketo_redis_main-1   Started                         2.3s
 ⠿ Container central-enketo_redis_cache-1  Started                         2.1s
 ⠿ Container central-pyxform-1             Started                         1.8s
 ⠿ Container central-mail-1                Started                         2.2s
 ⠿ Container central-enketo-1              Started                         3.2s
 ⠿ Container central-service-1             Started                         4.2s
 ⠸ Container central-nginx-1               Starting                        1.3s
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown
root@ip-172-31-18-130:/home/ubuntu/central#

And this is the output of docker-compose ps:

root@ip-172-31-18-130:/home/ubuntu/central# docker-compose ps
NAME                           COMMAND                  SERVICE              STATUS              PORTS
central-enketo-1               "docker-entrypoint.s…"   enketo               running             8005/tcp
central-enketo_redis_cache-1   "docker-entrypoint.s…"   enketo_redis_cache   running             6379/tcp
central-enketo_redis_main-1    "docker-entrypoint.s…"   enketo_redis_main    running             6379/tcp
central-mail-1                 "/bin/entrypoint.sh …"   mail                 running             25/tcp
central-nginx-1                "/bin/bash /scripts/…"   nginx                created             0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp
central-postgres-1             "docker-entrypoint.s…"   postgres             running             5432/tcp
central-pyxform-1              "gunicorn --bind 0.0…"   pyxform              running
central-secrets-1              "docker-entrypoint.s…"   secrets              exited (0)
central-service-1              "docker-entrypoint.s…"   service              running             8383/tcp

Regards,
Saad

How much free space did you have before running docker-compose build? My guess is that it wasn't much, and that the temporary images Docker downloaded for the build ate up all your space. The build probably aborted before filling your entire drive, to keep your server operable.
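To answer that question before the next build, a quick free-space check like the sketch below can help. It assumes Docker keeps its data under the default /var/lib/docker (the path that shows up in the error above); the function names are hypothetical and not part of Central's upgrade tooling.

```shell
# Sketch only: hypothetical helpers, not part of Central.
# Assumes Docker's data lives under the default /var/lib/docker.

# Print the available space, in whole GiB, on the filesystem holding $1.
available_gb() {
  # -P: POSIX output, one line per filesystem; -k: sizes in 1K blocks.
  # Field 4 of the second line is the "Available" column.
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Succeed (exit 0) if the filesystem holding $1 has at least $2 GiB free.
check_free_space() {
  avail=$(available_gb "$1") || return 1
  [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}
```

For example, check_free_space /var/lib/docker 5 || echo "Less than 5 GiB free" before building (the 5 GiB margin is just an illustrative number). Running docker system df also shows how much of that space Docker's images, containers, and build cache are using.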

The error message about bash not being found suggests that your Docker images are corrupt in some way. I'm assuming you have backups? If so, I'd try a reboot, and then a docker-compose build --no-cache and see if that works better.
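One way to test the corrupt-image theory without relying on a shell inside the image is to create (but not start) a container from it, export its filesystem, and look for bash in the listing. This is a sketch under assumptions: the image names central-nginx and central-service are inferred from the build log (which names built images like central-secrets), the helper names are hypothetical, and a badly damaged image may fail at the create or export step instead.

```shell
# Sketch only: hypothetical helpers for checking whether an image contains a binary.

# Succeed if a tar listing on stdin has bin/<name> or usr/bin/<name>
# (usr/bin covers images where /bin is a symlink to usr/bin).
listing_has_binary() {
  grep -Eqx "(\./)?(usr/)?bin/$1"
}

# Succeed if image $1 contains binary $2. Returns 2 if no container could be
# created from the image. The container is created but never started, so no
# shell inside the image is needed.
image_has_binary() {
  cid=$(docker create "$1") || return 2
  docker export "$cid" | tar -tf - | listing_has_binary "$2"
  status=$?
  docker rm "$cid" >/dev/null   # clean up the throwaway container
  return "$status"
}
```

For example, image_has_binary central-nginx bash || echo "central-nginx: bash not found, image is likely damaged".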

Free space was around 2GB. I had a 16GB disk earlier, which I increased to 30GB later.

I rebooted and ran the command, and got this error:

root@ip-172-31-18-130:/home/ubuntu/central# docker-compose build --no-cache
[+] Building 3.1s (29/59)
 => [central-service internal] load build definition from service.dockerf  0.1s
 => => transferring dockerfile: 1.21kB                                     0.0s
 => [central-service internal] load .dockerignore                          0.0s
 => => transferring context: 67B                                           0.0s
 => [central-nginx internal] load build definition from nginx.dockerfile   0.1s
 => => transferring dockerfile: 1.05kB                                     0.0s
 => [central-nginx internal] load .dockerignore                            0.1s
 => => transferring context: 67B                                           0.0s
 => [central-enketo internal] load build definition from enketo.dockerfil  0.1s
 => => transferring dockerfile: 931B                                       0.0s
 => [central-enketo internal] load .dockerignore                           0.1s
 => => transferring context: 67B                                           0.0s
 => [central-secrets internal] load build definition from secrets.dockerf  0.1s
 => => transferring dockerfile: 103B                                       0.0s
 => [central-secrets internal] load .dockerignore                          0.1s
 => => transferring context: 67B                                           0.0s
 => [central-nginx internal] load metadata for docker.io/library/node:16.  1.7s
 => [central-nginx internal] load metadata for docker.io/jonasal/nginx-ce  1.7s
 => [central-enketo internal] load metadata for ghcr.io/enketo/enketo-exp  1.0s
 => [central-enketo 1/6] FROM ghcr.io/enketo/enketo-express:5.0.2@sha256:  0.0s
 => [central-enketo internal] load build context                           0.0s
 => => transferring context: 160B                                          0.0s
 => CACHED [central-enketo 2/6] WORKDIR /srv/src/enketo_express            0.0s
 => [central-enketo 3/6] COPY files/enketo/config.json.template /srv/src/  0.1s
 => [central-enketo 4/6] COPY files/enketo/config.json.template /srv/src/  0.1s
 => [central-enketo 5/6] COPY files/enketo/start-enketo.sh /srv/src/enket  0.0s
 => CANCELED [central-enketo 6/6] RUN apt-get update; apt-get install get  1.7s
 => [central-secrets internal] load build context                          0.1s
 => => transferring context: 111B                                          0.0s
 => CACHED [central-nginx 1/2] FROM docker.io/library/node:16.17.0@sha256  0.0s
 => [central-service internal] load build context                          0.9s
 => => transferring context: 89.02kB                                       0.8s
 => CACHED [central-service stage-1  2/13] WORKDIR /usr/odk                0.0s
 => ERROR [central-service stage-1  3/13] RUN echo "deb http://apt.postgr  1.0s
 => [central-secrets 2/2] COPY files/enketo/generate-secrets.sh ./         0.2s
 => CACHED [central-nginx stage-1  1/13] FROM docker.io/jonasal/nginx-cer  0.0s
 => CANCELED [central-nginx internal] load build context                   1.1s
 => => transferring context: 17.98MB                                       1.1s
 => CANCELED [central-nginx stage-1  2/13] RUN apt-get update; apt-get in  1.3s
 => [central-secrets] exporting to image                                   0.1s
 => => exporting layers                                                    0.1s
 => => writing image sha256:9dbd9adba57a2aef082824f3495124b161aa8fc39ff83  0.0s
 => => naming to docker.io/library/central-secrets                         0.0s
 => CANCELED [central-service intermediate 2/8] COPY . .                   0.2s
------
 > [central-service stage-1  3/13] RUN echo "deb http://apt.postgresql.org/pub/repos/apt/ $(grep -oP 'VERSION_CODENAME=\K\w+' /etc/os-release)-pgdg main" | tee /etc/apt/sources.list.d/pgdg.list;   curl https://www.postgresql.org/media/keys/ACCC4CF8.asc | gpg --dearmor > /etc/apt/trusted.gpg.d/apt.postgresql.org.gpg;   apt-get update;   apt-get install -y cron gettext postgresql-client-9.6:
#0 0.985 runc run failed: unable to start container process: exec: "/bin/sh": stat /bin/sh: no such file or directory
------
failed to solve: process "/bin/sh -c echo \"deb http://apt.postgresql.org/pub/repos/apt/ $(grep -oP 'VERSION_CODENAME=\\K\\w+' /etc/os-release)-pgdg main\" | tee /etc/apt/sources.list.d/pgdg.list;   curl https://www.postgresql.org/media/keys/ACCC4CF8.asc | gpg --dearmor > /etc/apt/trusted.gpg.d/apt.postgresql.org.gpg;   apt-get update;   apt-get install -y cron gettext postgresql-client-9.6" did not complete successfully: exit code: 1
root@ip-172-31-18-130:/home/ubuntu/central#

Do you have up-to-date full disk backups?

Not for this server.

Without backups, your options are limited.

I'd take a snapshot, and on that snapshot, try these steps below. Do this on a snapshot first because it may cause permanent data loss.

Steps to try on a snapshot
  1. Make sure the containers are running
    docker-compose up --detach
    
  2. Delete the service and nginx containers
    docker-compose rm --stop nginx
    docker-compose rm --stop service
    
  3. Make sure all other containers, especially central-postgres-1, are still running
    docker ps --format "{{.Names}} : {{.Status}}"
    
  4. Prune unused (dangling) images
    docker image prune
    
  5. Now try a build, stop, and up to see if that works.
    docker-compose build --no-cache
    docker-compose stop
    docker-compose up --detach
    

If the steps above work, take another snapshot of the original, then do the steps on the original.
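The steps above can also be collected into one script to run on the snapshot. This is a minimal sketch, not an official procedure: it assumes it is run from the central directory, that the postgres container is named central-postgres-1 as in the docker-compose ps output above, and the DRY_RUN switch and function names are hypothetical. docker-compose rm and docker image prune will still ask for confirmation.

```shell
#!/bin/sh
# Sketch of the "steps to try on a snapshot". Run ONLY on a snapshot.
# Assumption: postgres container name (override with PG_CONTAINER=...).
set -eu
PG_CONTAINER=${PG_CONTAINER:-central-postgres-1}

# Run a command, or just print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Succeed if a running container has exactly the name $1.
is_running() {
  docker ps --format '{{.Names}}' | grep -Fqx "$1"
}

recover() {
  # 1. Make sure the containers are running.
  run docker-compose up --detach
  # 2. Delete the service and nginx containers (stopping them first).
  run docker-compose rm --stop nginx
  run docker-compose rm --stop service
  # 3. Check that the other containers, especially postgres, are still up.
  run docker ps --format '{{.Names}} : {{.Status}}'
  if [ "${DRY_RUN:-0}" != 1 ] && ! is_running "$PG_CONTAINER"; then
    echo "$PG_CONTAINER is not running; stopping here." >&2
    return 1
  fi
  # 4. Prune unused images.
  run docker image prune
  # 5. Rebuild without the cache, stop, and bring everything back up.
  run docker-compose build --no-cache
  run docker-compose stop
  run docker-compose up --detach
}

# Only act when explicitly asked to, e.g. "sh recover-snapshot.sh run".
if [ "${1:-}" = run ]; then
  recover
fi
```

To preview, run DRY_RUN=1 sh recover-snapshot.sh run, then drop DRY_RUN=1 to execute it (recover-snapshot.sh is just a placeholder file name).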


Hi,

I followed the steps. It did not work.

root@ip-172-31-18-130:/home/ubuntu/central# docker-compose up --detach
[+] Running 8/9
 ⠿ Container central-enketo_redis_main-1   Started                         1.3s
 ⠿ Container central-pyxform-1             Started                         1.9s
 ⠿ Container central-postgres-1            Started                         2.1s
 ⠿ Container central-mail-1                S...                            2.1s
 ⠿ Container central-secrets-1             Started                         2.1s
 ⠿ Container central-enketo_redis_cache-1  Started                         2.1s
 ⠿ Container central-enketo-1              Started                         2.8s
 ⠿ Container central-service-1             Started                         3.6s
 ⠿ Container central-nginx-1               Starting                        5.1s
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown