On Sunday I tried to update my server running NethServer on top of AlmaLinux. Unfortunately, I have become a bit lazy about taking snapshots before updates because everything went very smoothly over the last few years. After rebooting the server, NethServer didn’t start up; I couldn’t even access the cluster-admin page. The log files were flooded with messages that the containers couldn’t be started because of missing env files, but the files were where they should be. I tried to figure out what happened, but I’m not experienced enough with Podman containers, so I decided to try downgrading AlmaLinux to 9.7 to get the server running again (this time I took a snapshot of my system first). Luckily that worked very easily, and after the downgrade the system was running as before.
Keep this in mind before upgrading your AlmaLinux; it may affect Rocky Linux as well. Remember to snapshot your system before upgrades: it lets you revert changes, and not all downgrades work so flawlessly.
Now I’m trying to figure out why the upgrade didn’t work and to learn about containers.
Hi Corinna, welcome to our community and thank you for the report!
I’m experiencing a very similar issue on Rocky Linux 9.8. Rootless services such as Traefik fail to start after reboot, making the cluster inaccessible.
The problem is reproducible both after a 9.7 → 9.8 upgrade followed by a reboot and on a fresh Rocky Linux 9.8 installation. In the journal I found messages pointing to systemd failures when starting rootless services:
Jun 03 10:08:45 rocky-linux98 systemd[13089]: Starting Rootless module/traefik1 agent...
Jun 03 10:08:45 rocky-linux98 systemd[13108]: agent.service: Failed at step CHDIR spawning /usr/bin/chmod: No such file or directory
Jun 03 10:08:45 rocky-linux98 systemd[13109]: agent.service: Failed at step CHDIR spawning /usr/local/bin/agent: No such file or directory
Jun 03 10:08:45 rocky-linux98 systemd[13089]: agent.service: Main process exited, code=exited, status=200/CHDIR
Jun 03 10:08:45 rocky-linux98 systemd[13089]: agent.service: Failed with result 'exit-code'.
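In case it helps anyone scanning their own journal for the same symptom, here is a minimal Python sketch that pulls the failing step, unit, and path out of lines like the ones above. The regular expression and the function name `find_failed_steps` are my own illustration, based only on the message layout shown here; other systemd messages may be formatted differently. Note that `status=200/CHDIR` is the exit status systemd uses when changing to the unit's working directory fails, so the "No such file or directory" most likely refers to that directory rather than to the listed binaries.

```python
import re

# Pattern derived from the journal lines quoted above (an assumption,
# not an official systemd log format definition).
FAILED_STEP = re.compile(
    r"systemd\[(?P<pid>\d+)\]: (?P<unit>\S+): "
    r"Failed at step (?P<step>[A-Z_]+) spawning (?P<path>\S+): (?P<reason>.+)$"
)


def find_failed_steps(lines):
    """Return one dict per 'Failed at step ...' line found in `lines`."""
    results = []
    for line in lines:
        match = FAILED_STEP.search(line.rstrip("\n"))
        if match:
            results.append(match.groupdict())
    return results


# Sample input copied from the journal excerpt in this post.
SAMPLE = [
    "Jun 03 10:08:45 rocky-linux98 systemd[13089]: Starting Rootless module/traefik1 agent...",
    "Jun 03 10:08:45 rocky-linux98 systemd[13108]: agent.service: Failed at step CHDIR spawning /usr/bin/chmod: No such file or directory",
    "Jun 03 10:08:45 rocky-linux98 systemd[13109]: agent.service: Failed at step CHDIR spawning /usr/local/bin/agent: No such file or directory",
    "Jun 03 10:08:45 rocky-linux98 systemd[13089]: agent.service: Main process exited, code=exited, status=200/CHDIR",
]

if __name__ == "__main__":
    for hit in find_failed_steps(SAMPLE):
        print(hit["unit"], hit["step"], hit["path"], hit["reason"], sep=" | ")
```

To check a real system, the same function could be fed the lines of a saved journal export instead of `SAMPLE`.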
Although I have not verified this with a downgrade, your observation that the issue disappears on 9.7 points to the systemd update as a possible cause.
Relevant versions:
systemd-252-55.el9_7.9.rocky.0.1.x86_64
systemd-252-67.el9_8.2.rocky.0.1.x86_64
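To tell at a glance which minor release a package string like the two above belongs to, a small Python sketch such as the following could be used. It assumes the usual `name-version-release.arch` layout of RPM package names and an `.elX_Y` tag inside the release field; the helper names `parse_rpm_name` and `el_minor` are hypothetical and not part of any RPM tooling.

```python
import re


def parse_rpm_name(package):
    """Split 'name-version-release.arch' into its four parts."""
    base, arch = package.rsplit(".", 1)
    name, version, release = base.rsplit("-", 2)
    return {"name": name, "version": version, "release": release, "arch": arch}


def el_minor(release):
    """Return (major, minor) from an '.elX_Y' tag, or None if absent."""
    match = re.search(r"\.el(\d+)_(\d+)", release)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Package strings copied from the list above.
PACKAGES = [
    "systemd-252-55.el9_7.9.rocky.0.1.x86_64",
    "systemd-252-67.el9_8.2.rocky.0.1.x86_64",
]

if __name__ == "__main__":
    for package in PACKAGES:
        parts = parse_rpm_name(package)
        print(parts["name"], parts["version"], parts["release"], el_minor(parts["release"]))
```

Both packages share the upstream version 252, so the difference between the working and the failing system lies in the distribution release field (55 on 9.7 versus 67 on 9.8).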
We’re currently investigating the issue to identify the exact cause. As a precaution, the NS8 Rocky Linux repositories continue to serve Rocky Linux 9.7 packages until this blocker issue is resolved.
Since you are an AlmaLinux user, I recommend staying on 9.7 for the time being and postponing the upgrade to 9.8 until we have identified and fixed the issue.
The fix passed QA verification on Rocky Linux. The testing commands 1) install a patched core package [on node 1], 2) upgrade Rocky Linux to version 9.8, and 3) reboot the node.