Read-only filesystem after CentOS 7.4 update and reboot

NethServer Version: 7.3
Module: server-manager

Some time after reboot, the filesystem becomes read-only on two different (testing) virtual machines.

VM server with openldap account provider
Updating to CentOS 7.4 (some expected warnings):

warning: /etc/nsswitch.conf created as /etc/nsswitch.conf.rpmnew
warning: /etc/dnsmasq.conf created as /etc/dnsmasq.conf.rpmnew
Warning: slapd.service changed on disk. Run 'systemctl daemon-reload' to reload units.
warning: /etc/ssh/sshd_config created as /etc/ssh/sshd_config.rpmnew
warning: /etc/chrony.conf created as /etc/chrony.conf.rpmnew
warning: /etc/chrony.keys created as /etc/chrony.keys.rpmnew
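
The `.rpmnew` warnings above mean that rpm kept the locally modified file and wrote the packaged default next to it. A minimal sketch for reviewing those files, assuming `find`, `sort` and `diff` are available; the function names `list_rpmnew` and `diff_rpmnew` are made up for illustration and are not part of NethServer:

```shell
# List every *.rpmnew file below a directory (default: /etc), sorted.
list_rpmnew() {
    find "${1:-/etc}" -type f -name '*.rpmnew' 2>/dev/null | sort
}

# Show how each packaged default differs from the file currently in use.
diff_rpmnew() {
    list_rpmnew "$1" | while IFS= read -r new; do
        current="${new%.rpmnew}"
        echo "== $current"
        diff -u "$current" "$new" || true
    done
}

# Example (read-only, safe to run): diff_rpmnew /etc
```

Both functions only read files, so they can be run before deciding whether any of the new defaults need to be merged.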

After applying the CentOS 7.4 updates I rebooted the server, and the system time was wrong.
The server-manager returned Nethgui: 500 - Internal server error when accessing “date and time” and “Services”, then responded slowly, and finally could not be reached at all.

Read-only filesystem messages, probably causing issues like:

  • Many errors regarding rrdtool and collectd.
  • Multiple services failed to start.
Sep 13 22:22:49 server.domain.tld kernel: XFS (dm-0): unknown mount option [acl].
Sep 13 22:22:49 server.domain.tld systemd-journal[508]: Journal started
Sep 13 22:22:46 server.domain.tld systemd[1]: systemd 219 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 -SECCOMP +BLKID +ELFUTIL
Sep 13 22:22:46 server.domain.tld systemd[1]: Detected virtualization kvm.
Sep 13 22:22:46 server.domain.tld systemd[1]: Detected architecture x86-64.
Sep 13 22:22:46 server.domain.tld systemd[1]: Set hostname to <server.domain.tld>.
Sep 13 22:22:50 server.domain.tld systemd[1]: Started LVM2 metadata daemon.
Sep 13 22:22:50 server.domain.tld systemd[1]: Starting LVM2 metadata daemon...
Sep 13 22:22:50 server.domain.tld systemd-remount-fs[511]: mount: / not mounted or bad option
Sep 13 22:22:50 server.domain.tld systemd-remount-fs[511]: In some cases useful info is found in syslog - try
Sep 13 22:22:50 server.domain.tld systemd-remount-fs[511]: dmesg | tail or so.
Sep 13 22:22:50 server.domain.tld systemd-remount-fs[511]: /bin/mount for / exited with exit status 32.
Sep 13 22:22:50 server.domain.tld systemd[1]: systemd-remount-fs.service: main process exited, code=exited, status=1/FAILURE
Sep 13 22:22:50 server.domain.tld systemd[1]: Failed to start Remount Root and Kernel File Systems.
Sep 13 22:22:50 server.domain.tld systemd[1]: Unit systemd-remount-fs.service entered failed state.
Sep 13 22:22:50 server.domain.tld systemd[1]: systemd-remount-fs.service failed.
Sep 13 22:22:50 server.domain.tld systemd[1]: Starting Configure read-only root support...

Sep 13 22:33:28 server.domain.tld dbus[884]: [system] Activating via systemd: service name='org.freedesktop.timedate1' unit='dbus-org.freedesktop.timedate1.service'
Sep 13 22:33:28 server.domain.tld dbus-daemon[884]: dbus[884]: [system] Activating via systemd: service name='org.freedesktop.timedate1' unit='dbus-org.freedesktop.timedate1.service'
Sep 13 22:33:28 server.domain.tld systemd[1]: systemd-timedated.service failed to run 'start' task: Read-only file system
Sep 13 22:33:28 server.domain.tld systemd[1]: Failed to start Time & Date Service.
Sep 13 22:33:28 server.domain.tld systemd[1]: Unit systemd-timedated.service entered failed state.
Sep 13 22:33:28 server.domain.tld systemd[1]: systemd-timedated.service failed.
Sep 13 22:33:28 server.domain.tld systemd[1]: Starting Time & Date Service...
Sep 13 22:33:53 server.domain.tld dbus[884]: [system] Failed to activate service 'org.freedesktop.timedate1': timed out
Sep 13 22:33:53 server.domain.tld dbus-daemon[884]: dbus[884]: [system] Failed to activate service 'org.freedesktop.timedate1': timed out
Sep 13 22:33:53 server.domain.tld httpd[2341]: [EXCEPTION] RuntimeException 1383145266: Socket read error (in /usr/share/nethesis/Nethgui/System/EsmithDatabase.php:376)
Sep 13 22:33:53 server.domain.tld systemd[1]: smwingsd.service: main process exited, code=exited, status=11/n/a
Sep 13 22:33:53 server.domain.tld systemd[1]: Unit smwingsd.service entered failed state.
Sep 13 22:33:53 server.domain.tld systemd[1]: smwingsd.service failed.
Sep 13 22:33:53 server.domain.tld httpd[2341]: [EXCEPTION] RuntimeException 1383145263: Socket write error (in /usr/share/nethesis/Nethgui/System/EsmithDatabase.php:367)
Sep 13 22:33:53 server.domain.tld httpd[2341]: [EXCEPTION] RuntimeException 1383145263: Socket write error (in /usr/share/nethesis/Nethgui/System/EsmithDatabase.php:367)
Sep 13 22:33:53 server.domain.tld httpd[2341]: [EXCEPTION] UnexpectedValueException 1350909145: Nethgui\System\EsmithDatabase: internal database command failed (in /usr/share/nethesis/Nethgui/System/EsmithDataba
Sep 13 22:34:13 server.domain.tld collectd[1217]: rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/server.domain.tld/cpu-0/cpu-interrupt.rrd) failed: /var/lib/collectd/rrd/server.domain.tld/cpu-0/cpu-interrup

Sep 13 22:42:02 server.domain.tld systemd[1]: chronyd.service failed to run 'start' task: Read-only file system
Sep 13 22:42:02 server.domain.tld systemd[1]: Failed to start NTP client/server.
Sep 13 22:42:02 server.domain.tld systemd[1]: Unit chronyd.service entered failed state.
Sep 13 22:42:02 server.domain.tld systemd[1]: chronyd.service failed.
Sep 13 22:42:02 server.domain.tld systemd[1]: Starting NTP client/server...

Sep 13 22:45:13 server.domain.tld httpd[1814]: [WARNING] Nethgui\System\NethPlatform: fsockopen(): unable to connect to unix:///var/run/smwingsd.sock:-1 (Connection refused)
Sep 13 22:45:13 server.domain.tld httpd[1814]: [WARNING] Invalid socket (111): Connection refused. Fall back to exec().
Sep 13 22:45:13 server.domain.tld sudo[2222]:   srvmgr : TTY=unknown ; PWD=/usr/share/nethesis/nethserver-manager ; USER=root ; COMMAND=/sbin/e-smith/db configuration getjson
Sep 13 22:45:13 server.domain.tld dbus[892]: [system] Activating via systemd: service name='org.freedesktop.timedate1' unit='dbus-org.freedesktop.timedate1.service'
Sep 13 22:45:13 server.domain.tld dbus-daemon[892]: dbus[892]: [system] Activating via systemd: service name='org.freedesktop.timedate1' unit='dbus-org.freedesktop.timedate1.service'
Sep 13 22:45:13 server.domain.tld systemd[1]: systemd-timedated.service failed to run 'start' task: Read-only file system
Sep 13 22:45:13 server.domain.tld systemd[1]: Failed to start Time & Date Service.
Sep 13 22:45:13 server.domain.tld systemd[1]: systemd-timedated.service failed.
Sep 13 22:45:13 server.domain.tld systemd[1]: Starting Time & Date Service...
Sep 13 22:45:38 server.domain.tld dbus[892]: [system] Failed to activate service 'org.freedesktop.timedate1': timed out
Sep 13 22:45:38 server.domain.tld dbus-daemon[892]: dbus[892]: [system] Failed to activate service 'org.freedesktop.timedate1': timed out
Sep 13 22:45:38 server.domain.tld httpd[1814]: [EXCEPTION] UnexpectedValueException 1350896938: Nethgui\System\EsmithDatabase: internal database command failed! (in /usr/share/nethesis/Nethgui/System/EsmithDatab
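
The log above shows systemd-remount-fs failing on /, which would leave the root filesystem in the read-only state it was mounted with at early boot. A small sketch for confirming that from a shell, assuming the usual `/proc/mounts` layout (device, mount point, type, options); `is_mounted_readonly` is a hypothetical helper name:

```shell
# Succeeds if mount point $1 is listed as read-only in the mounts table $2
# (default: /proc/mounts). The last matching entry wins, since later
# mounts shadow earlier ones.
is_mounted_readonly() {
    awk -v mp="$1" '
        $2 == mp { opts = $4 }
        END { exit ((("," opts ",") ~ /,ro,/) ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}

# Example:
# if is_mounted_readonly /; then echo "/ is mounted read-only"; fi
```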

VM server with AD account provider

No connection to server-manager
Many “Read-only file system” error messages, mostly causing errors like:

Sep 13 23:47:03 server.domain.tld esmith::event[3294]: Event: nethserver-firewall-base-save FAILED
Sep 13 23:47:03 server.domain.tld esmith::event[3145]: Action: /etc/e-smith/events/trusted-networks-modify/S94firewall-adjust FAILED: 1 [0.565999]
Sep 13 23:47:03 server.domain.tld esmith::event[3145]: Event: trusted-networks-modify FAILED
Sep 13 23:47:03 server.domain.tld nethserver-config-network[1163]: Action: /etc/e-smith/events/interface-update/S95trusted-networks-modify FAILED: 1 [3.295498]
Sep 13 23:47:03 server.domain.tld nethserver-config-network[1163]: Event: interface-update FAILED
Sep 13 23:47:03 server.domain.tld systemd[1]: nethserver-config-network.service: main process exited, code=exited, status=1/FAILURE
Sep 13 23:47:03 server.domain.tld systemd[1]: Failed to start Reconfigure newtork interfaces.
Sep 13 23:47:03 server.domain.tld systemd[1]: Unit nethserver-config-network.service entered failed state.
Sep 13 23:47:03 server.domain.tld systemd[1]: nethserver-config-network.service failed.

Hi @dnutan
we got around the read-only filesystem by booting an earlier kernel, but that has not given us access to the files shared through AD. We have DHCP and internet access, but we cannot get sssd to run properly. We are currently removing kernel-3.10.0-693.2.2.el7.
There is a reference to this issue on the centos.org site: https://wiki.centos.org/Manuals/ReleaseNotes/CentOS7#head-281c090cc4fbc6bb5c7d4cd82a266fce807eee7c

samba share with sssd authentication is broken. This is being worked on upstream. A workaround is to downgrade the samba packages to an earlier version.

If anyone has better ideas please add. TIA

We upgraded our machines from CR repository a couple of days ago, but we didn’t reboot them.

Except for the issue with Samba authentication, we haven’t hit any other bugs so far.

Edit

@dnutan I see this in your log:

Sep 13 22:22:49 server.domain.tld kernel: XFS (dm-0): unknown mount option [acl].

Could you try to remove the acl option?

@compsos: booting from previous kernel (3.10.0-514.26.2) instead of the latest (3.10.0-693.2.2) worked.
@giacomo: same issue after removing acl mount option and booting from kernel 3.10.0-693.2.2

Edit: actually, removing acl helped, but there was another unknown option contributing to the issue:

XFS (dm-0): unknown mount option [user_xattr]
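
A quick way to pull the rejected options out of the kernel log, sketched under the assumption that the messages keep the `unknown mount option [...]` wording shown above; `unknown_mount_options` is an illustrative name:

```shell
# Print each distinct option the kernel rejected, one per line.
# Reads log text on stdin, e.g. from `dmesg` or `journalctl -k -b`.
unknown_mount_options() {
    sed -n 's/.*unknown mount option \[\([^]]*\)\].*/\1/p' | sort -u
}

# Example: dmesg | unknown_mount_options
```

As seen in this thread, user_xattr only showed up after acl had been removed, so the list from a single boot may be incomplete.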

The manual has some info on the mount options:

NethServer adds a special fstab key inside the configuration e-smith db. Each prop of fstab has the form mountpoint=options.

# config show fstab 
fstab=configuration
    /=defaults,acl,user_xattr
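
Since each prop is a comma-separated option list, the two options XFS rejects can be filtered out of whatever value is stored. A minimal sketch; `strip_acl_xattr` is a hypothetical helper, and the option names are the two reported in this thread:

```shell
# Remove the acl and user_xattr entries from a comma-separated option list.
# Falls back to "defaults" if nothing is left.
strip_acl_xattr() {
    result=$(printf '%s\n' "$1" | tr ',' '\n' \
        | grep -v -x -e 'acl' -e 'user_xattr' \
        | paste -s -d ',' -)
    printf '%s\n' "${result:-defaults}"
}

# Example: strip_acl_xattr 'defaults,acl,user_xattr'
```

This keeps any other options that may be present, instead of resetting the value to plain defaults.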

These are test-bed machines, just to expose problems that could help others.

On the other server, whose root partition is also formatted with XFS, I tried this instead of editing the fstab file directly:

# config show fstab 
fstab=configuration
    /=defaults,acl,user_xattr
# config setprop fstab / defaults
# signal-event fstab-update 

But after a call to signal-event nethserver-samba-update the config prop was rebuilt with the additional options, as per the code, and a reboot led to a read-only filesystem again. I redid the workaround.
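
Because the prop can be rebuilt behind your back, it may be worth checking it again before every reboot. A sketch, assuming `config getprop fstab /` prints the options stored for / (that exact invocation is an assumption, it is not shown in this thread); `has_xfs_rejected_opts` is an illustrative name:

```shell
# Return success (0) if the option list still contains acl or user_xattr.
has_xfs_rejected_opts() {
    case ",$1," in
        *,acl,*|*,user_xattr,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: `config getprop fstab /` prints the options stored for "/".
# if has_xfs_rejected_opts "$(config getprop fstab /)"; then
#     echo "fstab prop for / contains acl/user_xattr again; redo the workaround before rebooting"
# fi
```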

Just wanted to confirm: with kernel 3.10.0-514.el7 everything seems to work on my “all software installed” test machine. I switched it from LDAP to AD, rebooted, and it worked!
Samba is working, yum is working, and I don’t get errors in the dashboard, so it looks good as a workaround.

Hi
we are now stable after removing the latest kernel and going back to 3.10.0-514.26.2.el7.x86_64. systemctl status is now OK, with no errors from sssd. Before that it showed:

update failed: SERVFAIL
; TSIG error with server: tsig verify failure
update failed: SERVFAIL
; TSIG error with server: tsig verify failure
update failed: SERVFAIL
; TSIG error with server: tsig verify failure
update failed: SERVFAIL
: tkey query failed: GSSAPI error: Major = Unspecified GSS failure. Minor code may provide more information, Minor = Server not found in Kerberos database.
: tkey query failed: GSSAPI error: Major = Unspecified GSS failure. Minor code may provide more information, Minor = Server not found in Kerberos database.
: tkey query failed: GSSAPI error: Major = Unspecified GSS failure. Minor code may provide more information, Minor = Server not found in Kerberos database.

Samba downgraded to

samba-common-libs-4.6.2-8.el7.x86_64
samba-common-4.6.2-8.el7.noarch
samba-4.6.2-8.el7.x86_64
samba-common-tools-4.6.2-8.el7.x86_64
samba-client-libs-4.6.2-8.el7.x86_64
samba-client-4.6.2-8.el7.x86_64
samba-libs-4.6.2-8.el7.x86_64
nethserver-samba-2.0.7-1.ns7.noarch

So what we are left with is that the ibays can be seen from a Linux workstation when access is open to guests.
The only other clue I have is a NAS unit that will join the domain, like the Windows workstations, but then blocks access, like the server, repeatedly asking for username and password.
But the NAS has shown something else. When setting the NTP server to the IP or the server name, it comes back with SERVERNAME.servername.domainname. The first servername is in capitals. Is this a clue to the problem?

I just replicated the issue.

The problem seems tied to the filesystem options (acl and user_xattr) enabled by nethserver-samba.
These options are useful only when using EXT4, since XFS already has them enabled by default.

The mount fails only with kernel 3.10.0-693.2.2.el7.x86_64. I’ve read all the docs and the RPM changelog, but I can’t figure out the cause. Kernel behavior also isn’t supposed to change during the RHEL 7.x life cycle :(

The workaround is this one:

  1. Reboot with a working kernel (3.10.0-514-*). You can verify the running kernel using uname -r

  2. Execute:

     config setprop fstab / defaults
     signal-event fstab-update

  3. Reboot with the new kernel
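
The steps above can be sketched as a guarded script, so the prop is only changed while a known-good kernel is running. This is an illustration, not the official fix: `is_working_kernel` and `apply_workaround` are made-up names, and the 3.10.0-514 pattern is taken from the kernels reported as working in this thread.

```shell
# Succeeds if the given kernel release belongs to the 3.10.0-514 series,
# which this thread reports as booting correctly.
is_working_kernel() {
    case "$1" in
        3.10.0-514.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Apply step 2 only when step 1 has been done.
apply_workaround() {
    if ! is_working_kernel "$(uname -r)"; then
        echo "Running kernel $(uname -r) is not a 3.10.0-514 kernel; boot an older kernel first." >&2
        return 1
    fi
    config setprop fstab / defaults && signal-event fstab-update
}

# Run apply_workaround as root on the NethServer host, then reboot into the new kernel.
```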

I will try to write down a NethServer issue with a fix, then I will report it to the main thread.

A testing fix is out:

The fix is working well on both servers.

Not related to this issue: If applied to a server updated to CentOS 7.4, there’s an error (probably due to the bugged upstream update):

Action: /etc/e-smith/events/nethserver-samba-update/S30nethserver-samba-libwbclient FAILED: 2 [0.160511]

It worked for me after updating with this fix.
Thanks

Samba AD fix here
http://community.nethserver.org/t/centos-7-4-1708-do-not-upgrade-if-using-samba-shared-folders/7801/36

Fix released.

But there is no fix at the moment that makes the update safe, is there?

If the fix is installed before rebooting, it also fixes a machine that has already been updated.

Hi there

I’m not able to access my shares on NethServer with Samba AD after the update. I applied the update and also these steps:
Workaround

Reboot with a working kernel ( 3.10.0-514-*). You can verify the running kernel using uname -r
Execute:
config setprop fstab / defaults
signal-event fstab-update
Reboot with the new kernel

But it always asks for username and password when accessing the network share folder.

Can anyone help?

This now worked for me:
yum --enablerepo=nethforge-testing update sssd-libwbclient
signal-event nethserver-samba-update

Try this. Works for me

http://community.nethserver.org/t/centos-7-4-1708-do-not-upgrade-if-using-samba-shared-folders/7801/36

2 posts were split to a new topic: SSSD ldap error: Cannot contact any KDC