Addressing random failures during NS8 Core updates

There have been reports of apparently random failures during NS8 Core updates. Symptoms may vary; for example, the cluster-admin becomes inaccessible, and an “API not found” error is displayed when trying to access /cluster-admin via HTTP.

What these failed updates have in common is that some directories involved in the Core update are left with incorrect permissions. The most notable examples are /etc and /var, whose broken permissions cause many other errors. For example:

[root@rl1 ~]# ls -ld /etc/
drwx------. 99 root root 8192 Jan  9 11:33 /etc/

To get a full list of directories with incorrect permissions, run this command:

( cd / && ls -ld $(grep '/$' /var/lib/nethserver/node/state/coreimage.lst) ) | grep -- '------'
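
If you only need the affected paths, for example to pass them to another command, the same check can be written as a small shell function. This is a minimal sketch, not part of NS8. The function name list_bad_dirs is hypothetical. It assumes GNU stat is available and that the entries in coreimage.lst are relative to / (as the cd / in the command above suggests). It flags the same pattern as the grep: directories whose group and other permission bits are all cleared.

```shell
# Sketch (hypothetical helper, not an NS8 tool): print the directories listed
# in a coreimage.lst-style file whose group and other bits are all cleared,
# i.e. the "------" pattern matched by the ls/grep command.
# ASSUMPTIONS: GNU stat (coreutils) is available; list entries are relative
# to ROOT, as the "cd /" in the original command implies.

# list_bad_dirs ROOT LIST
#   ROOT: directory the list entries are relative to (normally /)
#   LIST: list file; only lines ending in "/" are treated as directories
list_bad_dirs() {
    root=$1
    list=$2
    grep '/$' "$list" | while IFS= read -r entry; do
        entry=${entry#/}              # tolerate absolute entries as well
        path="$root/$entry"
        [ -d "$path" ] || continue    # skip entries missing on this node
        # %A prints the symbolic mode, e.g. "drwx------"; the last six
        # characters are the group and other bits
        case "$(stat -c '%A' "$path")" in
            ????------) printf '/%s\n' "$entry" ;;
        esac
    done
}
```

For example, list_bad_dirs / /var/lib/nethserver/node/state/coreimage.lst prints one path per line; empty output means no listed directory matched.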

While we work to reproduce and isolate the bug, I want to share a possible remediation procedure:

  1. Restore the permissions of the affected directories, e.g., chmod -c 0755 ...
  2. Reboot the node.
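
Step 1 can be prepared as a dry run that prints the chmod commands instead of executing them, so each one can be reviewed before anything changes. This is a minimal sketch with a hypothetical function name, plan_chmod_fix. It assumes GNU stat is available and that 0755, the mode used in the example above, is correct for every flagged directory, which should be checked case by case.

```shell
# Sketch (hypothetical helper): dry run of step 1. For each directory given
# as an argument, print a chmod command if its group and other bits are all
# cleared. Nothing is modified; review the output before running it.
# ASSUMPTIONS: GNU stat is available; 0755 is the right mode for every
# flagged directory (verify this before applying the commands).
plan_chmod_fix() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue     # ignore paths that are not directories
        # %A prints the symbolic mode, e.g. "drwx------"
        case "$(stat -c '%A' "$dir")" in
            ????------) printf 'chmod -c 0755 %s\n' "$dir" ;;
        esac
    done
}
```

For example, plan_chmod_fix /etc /var prints chmod -c 0755 /etc only if /etc currently has no group or other permissions (as in drwx------). After reviewing the printed commands, run them and then reboot the node as in step 2.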

Ref Mattermost


The bug is tracked in NethServer/dev issue #7250 on GitHub: “Corrupted system directory permissions after Core update”.

As a workaround, follow the instructions above.
