NethServer Version: NS8 on AlmaLinux
Module: Loki
Hi all,
I’m trying to update Loki on two different servers running AlmaLinux, but I get this error:
<7>podman-pull-missing ghcr.io/nethserver/loki:1.3.1
<7>podman-pull-missing docker.io/traefik:v3.3.5 docker.io/grafana/loki:3.4.3
Trying to pull docker.io/library/traefik:v3.3.5...
Getting image source signatures
Copying blob sha256:3726c0c457c1ef0b6f1451755983ad25f602ecf58be4c89254ea1eddd17376a4
Copying blob sha256:dcc8cd112e3beb9d5fb4b1bcf11d884caeb7e5e00fefe1d07fab43ebc40386e9
Copying blob sha256:f18232174bc91741fdf3da96d85011092101a032a93a388b79e99e69c2d5c870
Copying blob sha256:da1600f8cecfd444899afd69b681a02879ce5607d0278fefa5f8747744c75cc6
Copying config sha256:66c037adf0b4eeeb4b1dbcbfc7520eae76ce967e73f02e1d569808930129ab3b
Writing manifest to image destination
66c037adf0b4eeeb4b1dbcbfc7520eae76ce967e73f02e1d569808930129ab3b
<7>extract-image ghcr.io/nethserver/loki:1.3.1
flock: getting lock took 0.000004 seconds
time="2025-05-12T11:50:37+02:00" level=error msg="While recovering from a failure (saving incomplete layer metadata), error deleting layer \"4070ba230d95721c40f7eacac8440354c8b8fa7622b9ef5e18f763abb97a4e4c\": open /run/user/1003/containers/overlay-layers/.tmp-mountpoints.json3170055619: no space left on device"
Error: creating container storage: open /run/user/1003/containers/overlay-layers/.tmp-mountpoints.json3877251958: no space left on device
Traceback (most recent call last):
File "/usr/local/agent/actions/update-module/05pullimages", line 81, in <module>
).check_returncode()
^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/subprocess.py", line 502, in check_returncode
raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '('extract-image', 'ghcr.io/nethserver/loki:1.3.1')' returned non-zero exit status 125.
The problem is supposedly missing space on the device, but space is available:
~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 4.0M 0 4.0M 0% /dev
tmpfs 3.8G 1.4M 3.8G 1% /dev/shm
tmpfs 1.6G 151M 1.4G 10% /run
/dev/mapper/vg-lv_root 79G 41G 35G 54% /
/dev/sda2 493M 268M 189M 59% /boot
tmpfs 769M 2.0M 767M 1% /run/user/1008
tmpfs 769M 1.7M 767M 1% /run/user/1001
tmpfs 769M 1.9M 767M 1% /run/user/1004
tmpfs 769M 386M 384M 51% /run/user/1003
tmpfs 769M 84K 769M 1% /run/user/1002
tmpfs 769M 1.9M 767M 1% /run/user/1007
tmpfs 769M 1.9M 767M 1% /run/user/1006
tmpfs 769M 2.2M 767M 1% /run/user/1009
tmpfs 769M 1.7M 767M 1% /run/user/1005
tmpfs 769M 1.9M 767M 1% /run/user/1012
tmpfs 769M 1.8M 767M 1% /run/user/1010
tmpfs 769M 232K 769M 1% /run/user/1016
tmpfs 769M 1.9M 767M 1% /run/user/1011
tmpfs 769M 1.9M 767M 1% /run/user/1015
tmpfs 769M 1.9M 767M 1% /run/user/1014
tmpfs 769M 1.9M 767M 1% /run/user/1013
shm 63M 0 63M 0% /var/lib/containers/storage/overlay-containers/0de3c14fa843af0eac62ea698506e986816404c3374a4b345b9e13707d7516c7/userdata/shm
overlay 79G 41G 35G 54% /var/lib/containers/storage/overlay/f9353c261b6449388d2e3fb2198bf527f1523d1a6e0226f02abf851aa4900b4a/merged
shm 63M 0 63M 0% /var/lib/containers/storage/overlay-containers/5ab3d124c0abcb983a50ce0e409b13eb06dcca195ea5225cfedf15abe17394d8/userdata/shm
overlay 79G 41G 35G 54% /var/lib/containers/storage/overlay/e0b719cda18bc6283c94cb555e428a7064aca5e746463394efe69eb6baae3b37/merged
shm 63M 0 63M 0% /var/lib/containers/storage/overlay-containers/eccb8dfc9695adc86826aa74ee760ccde6e73b421b71d60ec3de94d8aeaae172/userdata/shm
overlay 79G 41G 35G 54% /var/lib/containers/storage/overlay/61d8da72af99c37d0e8f125eae00a03a4c5c19b42ebc0b3e28f876dd8d695488/merged
shm 63M 0 63M 0% /var/lib/containers/storage/overlay-containers/7ab06652bd1b952bff91512f2661e22ff89e641f9e35ee485a08e4a55de562a4/userdata/shm
overlay 79G 41G 35G 54% /var/lib/containers/storage/overlay/6e8f623dfb557a2b529279b4f30e090882716f24872b3b57c5cd71b639113786/merged
shm 63M 0 63M 0% /var/lib/containers/storage/overlay-containers/83d6310f5455abea445cb49196cf7d048a102357a394002abe08d03a96ad0297/userdata/shm
overlay 79G 41G 35G 54% /var/lib/containers/storage/overlay/74919ab069cdf54cbc30898c768a038dda14868b106726b7689f91aeac712c4a/merged
shm 63M 0 63M 0% /var/lib/containers/storage/overlay-containers/2a26ff31f6d715c41331376de98ecfdafb6a6799fbe2ce739717013317bd2b13/userdata/shm
overlay 79G 41G 35G 54% /var/lib/containers/storage/overlay/64f188c71dc558761ea965a820ba0b12252ac5511b7bc0bcd0c399307d50be2d/merged
tmpfs 769M 0 769M 0% /run/user/0
Could someone help me?
Thank you in advance!
mrmarkuz (Markus Neuberger), May 12, 2025, 12:12pm
Let’s check podman info to see the filesystem in use and other parameters:
runagent -m loki1 podman info
Please also check Loki’s used space:
runagent -m loki1 podman system df
WARNING! Before changing things it’s always good to have a backup.
Maybe it helps to increase the tmpfs size by editing /etc/systemd/logind.conf
and setting the following (10% is the default):
RuntimeDirectorySize=20%
After a reboot, the tmpfs size should be doubled.
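For reference, a minimal sketch of that edit (the [Login] section header is logind.conf’s stock layout; the new size only applies to user sessions started after the reboot):
```
# /etc/systemd/logind.conf
# RuntimeDirectorySize caps the per-user tmpfs mounted at /run/user/<uid>.
# The default is 10% of physical RAM; raise it to 20%:
[Login]
RuntimeDirectorySize=20%
```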
Maybe related:
GitHub issue (opened 03:37PM, 24 Jan 23 UTC; closed 03:57PM the same day; labels: kind/bug, locked):
### Issue Description
When `/var/lib/containers` is located on an XFS filesystem, it is impossible to remove a container when no free space is left on that filesystem. Moreover, `podman` ends up in a bad state where the container is no longer visible in the summary but the container's storage is left behind.
This situation could, for example, be caused by a container that has exhausted its storage: it becomes impossible to remove such a container.
In my reproduction, `/var/lib/containers` resides on an `XFS` filesystem. I have not been able to reproduce this issue with `ext4`.
### Steps to reproduce the issue
1. I am reproducing this in a VM, so initialize the environment first:
```
mkdir reproduction
cd reproduction
vagrant init generic/centos9s
vagrant up
vagrant ssh
```
2. Inside the VM, install the required packages, mount `/var/lib/containers` on an XFS filesystem, and set the necessary SELinux attributes:
```
sudo yum install -y xfsprogs podman
sudo fallocate -l 300M /xfs.bin
sudo mkfs.xfs /xfs.bin
sudo mount -t xfs -o loop /xfs.bin /var/lib/containers
sudo chcon -u system_u -t container_var_lib_t /var/lib/containers
```
3. Start a container that fills its own storage and exits
```
sudo podman pull docker.io/library/alpine:3.17
sudo podman run --name test docker.io/library/alpine:3.17 sh -c 'dd if=/dev/zero of=/bigfile || exit 1'
```
4. Try to remove the container, see the error message
```
$ sudo podman rm test
Error: removing container a3dcb9bde158e64c40429476da0362a6b305d36b0b05b60a732305d2fc2ec08a root filesystem: 2 errors occurred:
* open /var/lib/containers/storage/overlay-layers/.tmp-layers.json16945529: no space left on device
* open /var/lib/containers/storage/overlay-containers/.tmp-containers.json2124646358: no space left on device
```
5. Verify that, despite the error above, the container is gone from the `podman container ls -a` list; however, the disk for `/var/lib/containers` is still full, which means the container's storage was left behind:
```
$ sudo podman container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
$ df -h /var/lib/containers
Filesystem Size Used Avail Use% Mounted on
/dev/loop0 295M 295M 32K 100% /var/lib/containers
```
### Describe the results you received
Error message when deleting a container, container gone from the list of containers while container storage is left behind.
### Describe the results you expected
Container successfully removed.
### podman info output
```shell
host:
arch: amd64
buildahVersion: 1.28.0
cgroupControllers:
- cpuset
- cpu
- io
- memory
- hugetlb
- pids
- rdma
- misc
cgroupManager: systemd
cgroupVersion: v2
conmon:
package: conmon-2.1.5-1.el9.x86_64
path: /usr/bin/conmon
version: 'conmon version 2.1.5, commit: 48adb81a22c26f0660f0f37d984baebe7b9ade98'
cpuUtilization:
idlePercent: 94.9
systemPercent: 1.74
userPercent: 3.37
cpus: 2
distribution:
distribution: '"centos"'
version: "9"
eventLogger: journald
hostname: centos9s.localdomain
idMappings:
gidmap: null
uidmap: null
kernel: 5.14.0-205.el9.x86_64
linkmode: dynamic
logDriver: journald
memFree: 221835264
memTotal: 1864462336
networkBackend: netavark
ociRuntime:
name: crun
package: crun-1.7.2-2.el9.x86_64
path: /usr/bin/crun
version: |-
crun version 1.7.2
commit: 0356bf4aff9a133d655dc13b1d9ac9424706cac4
rundir: /run/crun
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
os: linux
remoteSocket:
path: /run/podman/podman.sock
security:
apparmorEnabled: false
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: false
seccompEnabled: true
seccompProfilePath: /usr/share/containers/seccomp.json
selinuxEnabled: true
serviceIsRemote: false
slirp4netns:
executable: /bin/slirp4netns
package: slirp4netns-1.2.0-2.el9.x86_64
version: |-
slirp4netns version 1.2.0
commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
libslirp: 4.4.0
SLIRP_CONFIG_VERSION_MAX: 3
libseccomp: 2.5.2
swapFree: 2147479552
swapTotal: 2147479552
uptime: 0h 7m 18.00s
plugins:
authorization: null
log:
- k8s-file
- none
- passthrough
- journald
network:
- bridge
- macvlan
volume:
- local
registries:
search:
- registry.access.redhat.com
- registry.redhat.io
- docker.io
store:
configFile: /etc/containers/storage.conf
containerStore:
number: 0
paused: 0
running: 0
stopped: 0
graphDriverName: overlay
graphOptions:
overlay.mountopt: nodev,metacopy=on
graphRoot: /var/lib/containers/storage
graphRootAllocated: 308969472
graphRootUsed: 308936704
graphStatus:
Backing Filesystem: xfs
Native Overlay Diff: "false"
Supports d_type: "true"
Using metacopy: "true"
imageCopyTmpDir: /var/tmp
imageStore:
number: 1
runRoot: /run/containers/storage
volumePath: /var/lib/containers/storage/volumes
version:
APIVersion: 4.3.1
Built: 1669638068
BuiltTime: Mon Nov 28 12:21:08 2022
GitCommit: ""
GoVersion: go1.19.2
Os: linux
OsArch: linux/amd64
Version: 4.3.1
```
### Podman in a container
No
### Privileged Or Rootless
Privileged
### Upstream Latest Release
Yes
### Additional environment details
_No response_
### Additional information
I have been able to reproduce the issue with `XFS` filesystem, but not with `ext4` filesystem.
GitHub issue (opened 05:09AM, 22 Apr 22 UTC; label: kind/bug):
/kind bug
**Description**
Running `podman rm` (or `podman ps` or any other command) fails on a freshly booted system (runRoot empty) when graphRoot is full.
In my particular use case, we have a filesystem dedicated to the podman graphRoot, so when that hits maximum capacity our user could no longer delete stopped images to free space.
**Steps to reproduce the issue:**
I've reproduced this on my laptop as follows, as root:
```
# truncate -s 200M /tmp/btr
# mkfs.btrfs /tmp/btr
# mount /tmp/btr /mnt/t
# /src/podman/bin/podman --runroot /run/containers.test --root /mnt/t/containers ps
# (eventually at this point run something)
# dd if=/dev/urandom of=/mnt/t/filler bs=1M
<ENOSPC>
# for f in {1..100}; do dd if=/dev/urandom of=/mnt/t/filler.$f bs=4k count=4 status=none || break; done
<ENOSPC> (rationale is single big file isn't enough to fill 100% of the FS)
# rm -rf /run/containers.test # (simulate reboot)
# /src/podman/bin/podman --runroot /run/containers.test --root /mnt/t/container ps
ERRO[0000] [graphdriver] prior storage driver overlay failed: write /mnt/t/container/overlay/metacopy-check582242757/l1/.tmp-f3761660769: no space left on device
Error: write /mnt/t/container/overlay/metacopy-check582242757/l1/.tmp-f3761660769: no space left on device
# (same result with podman rm)
# touch '/run/containers.test/overlay/metacopy()-false' '/run/containers.test/overlay/native-diff()-true'
# /src/podman/bin/podman --runroot /run/containers.test --root /mnt/t/container ps
<works>
```
**Describe the results you received:**
ENOSPC error for something that shouldn't require space
**Describe the results you expected:**
actually listing files, or allowing some to be deleted.
**Additional information you deem important (e.g. issue happens only occasionally):**
There are various tests made (rightly so) on the overlay directory that are cached in /run.
I see various ways of working around this:
- move the cache to the storage we're testing. This is related to a specific graphRoot, so it'd make sense to cache it there instead; that'd make the cached result persistent so it wouldn't go away on reboot, and would allow this to work. That's probably for the best: what if someone changes their graphRoot without resetting their runRoot?
- disable these checks for commands that shouldn't care about these (ps, rm probably won't go about creating new overlays, so don't need to know)
- allow test failures and handle them as whatever result is safe for some commands (e.g. ps, rm); that's pretty hacky and probably not reliable
**Output of `podman version`:**
I've reproduced on today's main:
```
Client: Podman Engine
Version: 4.0.0-dev
API Version: 4.0.0-dev
Go Version: go1.17.8
Git Commit: 78ccd833906087d171f608d66a0384135dc80717
Built: Fri Apr 22 13:53:53 2022
OS/Arch: linux/amd64
```
**Output of `podman info --debug`:**
shouldn't be needed, ask if you really want it.
**Package info (e.g. output of `rpm -q podman` or `apt list podman`):**
built from sources.
**Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)**
Yes
Hi @mrmarkuz !
~]# runagent -m loki1 podman info
host:
arch: amd64
buildahVersion: 1.37.6
cgroupControllers:
- memory
- pids
cgroupManager: systemd
cgroupVersion: v2
conmon:
package: conmon-2.1.12-1.el9.x86_64
path: /usr/bin/conmon
version: 'conmon version 2.1.12, commit: eb379dceb7efebd9a9d6b3349a57424d83483065'
cpuUtilization:
idlePercent: 98.67
systemPercent: 0.6
userPercent: 0.73
cpus: 4
databaseBackend: sqlite
distribution:
distribution: almalinux
version: "9.5"
eventLogger: file
freeLocks: 2045
hostname: cloud-hosting
idMappings:
gidmap:
- container_id: 0
host_id: 1003
size: 1
- container_id: 1
host_id: 296608
size: 65536
uidmap:
- container_id: 0
host_id: 1003
size: 1
- container_id: 1
host_id: 296608
size: 65536
kernel: 5.14.0-503.23.2.el9_5.x86_64
linkmode: dynamic
logDriver: journald
memFree: 1112879104
memTotal: 8057352192
networkBackend: netavark
networkBackendInfo:
backend: netavark
dns:
package: aardvark-dns-1.12.2-1.el9_5.x86_64
path: /usr/libexec/podman/aardvark-dns
version: aardvark-dns 1.12.2
package: netavark-1.12.2-1.el9.x86_64
path: /usr/libexec/podman/netavark
version: netavark 1.12.2
ociRuntime:
name: crun
package: crun-1.16.1-1.el9.x86_64
path: /usr/bin/crun
version: |-
crun version 1.16.1
commit: afa829ca0122bd5e1d67f1f38e6cc348027e3c32
rundir: /run/user/1003/crun
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
os: linux
pasta:
executable: /usr/bin/pasta
package: passt-0^20240806.gee36266-6.el9_5.x86_64
version: |
pasta 0^20240806.gee36266-6.el9_5.x86_64
Copyright Red Hat
GNU General Public License, version 2 or later
<https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
remoteSocket:
exists: false
path: /run/user/1003/podman/podman.sock
rootlessNetworkCmd: pasta
security:
apparmorEnabled: false
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: true
seccompEnabled: true
seccompProfilePath: /usr/share/containers/seccomp.json
selinuxEnabled: false
serviceIsRemote: false
slirp4netns:
executable: /usr/bin/slirp4netns
package: slirp4netns-1.3.1-1.el9.x86_64
version: |-
slirp4netns version 1.3.1
commit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236
libslirp: 4.4.0
SLIRP_CONFIG_VERSION_MAX: 3
libseccomp: 2.5.2
swapFree: 7912058880
swapTotal: 8589930496
uptime: 1225h 5m 52.00s (Approximately 51.04 days)
variant: ""
plugins:
authorization: null
log:
- k8s-file
- none
- passthrough
- journald
network:
- bridge
- macvlan
- ipvlan
volume:
- local
registries:
docker.io:
Blocked: false
Insecure: false
Location: docker.io
MirrorByDigestOnly: false
Mirrors:
- Insecure: false
Location: ghcr.io/nethserver/docker.io
PullFromMirror: ""
Prefix: docker.io
PullFromMirror: ""
search:
- registry.access.redhat.com
- registry.redhat.io
- docker.io
store:
configFile: /home/loki1/.config/containers/storage.conf
containerStore:
number: 0
paused: 0
running: 0
stopped: 0
graphDriverName: overlay
graphOptions: {}
graphRoot: /home/loki1/.local/share/containers/storage
graphRootAllocated: 83880697856
graphRootUsed: 42968928256
graphStatus:
Backing Filesystem: extfs
Native Overlay Diff: "true"
Supports d_type: "true"
Supports shifting: "false"
Supports volatile: "true"
Using metacopy: "false"
imageCopyTmpDir: /var/tmp
imageStore:
number: 6
runRoot: /run/user/1003/containers
transientStore: false
volumePath: /home/loki1/.local/share/containers/storage/volumes
version:
APIVersion: 5.2.2
Built: 1738640782
BuiltTime: Tue Feb 4 04:46:22 2025
GitCommit: ""
GoVersion: go1.22.9 (Red Hat 1.22.9-2.el9_5)
Os: linux
OsArch: linux/amd64
Version: 5.2.2
and
~]# runagent -m loki1 podman system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 6 3 454.6MB 106.5MB (23%)
Containers 0 0 0B 0B (0%)
Local Volumes 2 0 192.2MB 192.2MB (100%)
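(Side note, not the fix in this case: if byte usage were actually the problem, the reclaimable image space above could be freed with podman’s prune command. `--volumes` is deliberately left out because Loki keeps its data in those local volumes; they only show as 100% reclaimable because no container was running at the moment of the check.)
```
# Remove dangling images only; never add --volumes for a Loki module,
# its data lives in the local volumes listed above.
runagent -m loki1 podman image prune
```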
davidep (Davide Principi), May 12, 2025, 2:05pm
Probably the tmpfs mounted on /run/user/1003 (is 1003 Loki’s UID?) has exhausted its inodes.
In this case, either reboot the node or stop/start the Loki user session:
systemctl stop user@$(id -u loki1)
systemctl start user@$(id -u loki1)
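To verify the diagnosis, check the inode columns rather than the byte columns (1003 is Loki’s UID, as seen in the error messages above):
```
# df -h looked fine because bytes were free; IFree/IUse% tell the real story.
df -i /run/user/1003
# After the stop/start, IUse% should drop back to a few percent.
```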
Yeah! @davidep you nailed it! How can I avoid this happening again?
davidep (Davide Principi), May 12, 2025, 2:37pm
You hit a bug that has already been fixed. However, once Loki fills the tmpfs, a manual recovery is required. See Loki memberlist-kv startup error · Issue #7426 · NethServer/dev · GitHub
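Beyond updating to the fixed version, a small watchdog could catch this class of problem early. A minimal sketch, not something NS8 ships; the 90% threshold and the mount pattern are arbitrary choices:
```
#!/bin/sh
# Warn when any per-user tmpfs under /run/user has used more than
# THRESHOLD percent of its inodes (ipcent = the "IUse%" column of df -i).
THRESHOLD=90
df -i --output=ipcent,target | awk -v t="$THRESHOLD" '
  $2 ~ /^\/run\/user\// {
    sub(/%/, "", $1)
    if ($1 + 0 > t) printf "WARNING: %s%% inodes used on %s\n", $1, $2
  }'
```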