For some time now, there have been increasing problems with my Nethserver 7, which has been running in production for about 3 years. The most important module for me is Dokuwiki.
Issues:
the users and groups keep disappearing (Account provider generic error: SSSD exit code 1)
/var/log/messages: Jan 4 16:52:45 daho-nethserver sssd: tkey query failed: GSSAPI error: Major = Unspecified GSS failure. Minor code may provide more information, Minor = Server not found in Kerberos database.
backups fail again and again
brand new: Cockpit no longer starts; the old user interface on port 980 is still running
the root partition is 80% full, as the Zabbix database continues to grow
What I did to cover the issues:
DC issues
systemctl restart nsdc - brings back the users and groups, but not permanently
Restore a config backup - brings back the users and groups, but not permanently
yum reinstall nethserver-dc - my last intervention some minutes ago; I don't know the long-term result.
Backup issue: restic unlock -r /mnt/backup-dokuwiki resolves the locks, but not permanently. Next day, next lock.
Cockpit: no solution. I use the old Server Manager.
Zabbix-DB:
I have tried to shrink the DB, but have not been able to complete the procedure to the end.
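Before shrinking anything, it can help to confirm how full the root partition is and which directories take the space. The following is only a minimal sketch: the function names are made up for illustration, and /var/lib as a starting point is an assumption, not a statement about where the Zabbix database lives on this server.

```shell
#!/bin/sh
# Print the use percentage (without the % sign) from `df -P` output on stdin.
# df -P prints a header line, then one line per filesystem; column 5 is e.g. "80%".
parse_df_use() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Use percentage of the root filesystem.
root_use_percent() {
    df -P / | parse_df_use
}

# Ten largest directories (in KiB) below a starting point, default /var/lib
# (an assumed starting point), without crossing into other filesystems (-x).
largest_dirs() {
    du -x -k --max-depth=2 "${1:-/var/lib}" 2>/dev/null | sort -rn | head -n 10
}
```

Calling root_use_percent and then largest_dirs as root shows whether the database directory really is the main consumer before any cleanup is attempted.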
BTW:
I thought it would be a good idea not to put much energy into the aging installation and migrate to NS8 - DokuWiki exists there as a module.
My attempt to migrate from NS7 to NS8 has so far failed because I could not restore the DokuWiki content. The previous support here in the forum was not effective.
My first priority is to get access to cockpit again. The other errors annoy me and make me nervous in terms of overall stability and the increase in errors.
If it were possible to quickly restore the DokuWiki content in the NS8 installation, I would not put any more effort into the NS7 server and shut it down.
Either way, I need further support with both.
This is most likely a problem with the default DNS for the DC. I have faced this problem almost 20 times, and each time I learnt something new.
As for DokuWiki, kindly confirm whether the module has an NS7 to NS8 migration.
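One way to test the DNS theory is to ask every configured resolver for the Kerberos SRV record of the AD domain. This is a rough sketch, not a NethServer tool: the function names are hypothetical, dig (from bind-utils) is assumed to be installed, and the domain has to be supplied by you.

```shell
#!/bin/sh
# Print the nameserver addresses configured in a resolv.conf-style file.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Ask each configured nameserver for the Kerberos SRV record of a domain.
# An empty answer means that resolver cannot locate a KDC for the domain.
check_kerberos_srv() {
    domain=$1
    for ns in $(list_nameservers); do
        printf '%s: ' "$ns"
        dig +short "@$ns" "_kerberos._tcp.$domain" SRV | tr '\n' ' '
        echo
    done
}
```

Usage would be something like check_kerberos_srv followed by your AD domain name. A resolver that returns nothing here would fit the "Server not found in Kerberos database" message from sssd.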
When the locks occur, is a related restic/backup process running in the background?
To get some info on the lock:
restic list locks
restic cat lock your-lock-id # can give the PID holding the lock
Is prune configured, and how often does it run? The prune command locks the repository exclusively, preventing other processes from accessing it while pruning.
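The two restic commands above can be combined into a small loop that prints, for each lock, the PID recorded in it and whether that process is still alive. Treat this as a sketch under assumptions: the function names are invented, the repository password is assumed to come from RESTIC_PASSWORD or RESTIC_PASSWORD_FILE, and the PID check only makes sense when run as root on the host that created the lock.

```shell
#!/bin/sh
# Extract the "pid" value from the JSON that `restic cat lock <id>` prints.
lock_pid() {
    sed -n 's/.*"pid"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Report whether a PID still belongs to a process we can signal on this host.
pid_state() {
    if [ -z "$1" ]; then
        echo unknown
    elif kill -0 "$1" 2>/dev/null; then
        echo running
    else
        echo gone
    fi
}

# For every lock in the repository, print its ID, PID and the PID's state.
report_locks() {
    repo=$1
    restic -q -r "$repo" list locks | while read -r id; do
        pid=$(restic -q -r "$repo" cat lock "$id" | lock_pid)
        echo "$id pid=$pid $(pid_state "$pid")"
    done
}
```

For example, report_locks /mnt/backup-dokuwiki. A lock whose PID is reported as gone is stale, while a running PID points to a backup or prune job that is still holding the repository.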
# systemctl -l status cockpit-user.socket cockpit.socket cockpit
● cockpit-user.socket - Cockpit Web Service Socket for Users
Loaded: loaded (/usr/lib/systemd/system/cockpit-user.socket; enabled; vendor preset: disabled)
Active: active (listening) since Thu 2024-01-04 16:48:55 CET; 3h 6min ago
Docs: man:cockpit-ws(8)
Listen: [::]:9191 (Stream)
Jan 04 16:48:55 daho-nethserver.home.dargels.de systemd[1]: Listening on Cockpit Web Service Socket for Users.
● cockpit.socket - Cockpit Web Service Socket
Loaded: loaded (/usr/lib/systemd/system/cockpit.socket; enabled; vendor preset: disabled)
Active: active (listening) since Thu 2024-01-04 16:48:55 CET; 3h 6min ago
Docs: man:cockpit-ws(8)
Listen: [::]:9090 (Stream)
Jan 04 16:48:55 daho-nethserver.home.dargels.de systemd[1]: Starting Cockpit Web Service Socket.
Jan 04 16:48:55 daho-nethserver.home.dargels.de systemd[1]: Listening on Cockpit Web Service Socket.
● cockpit.service - Cockpit Web Service
Loaded: loaded (/usr/lib/systemd/system/cockpit.service; static; vendor preset: disabled)
Drop-In: /etc/systemd/system/cockpit.service.d
└─nethserver.conf
Active: inactive (dead) since Thu 2024-01-04 16:58:02 CET; 2h 57min ago
Docs: man:cockpit-ws(8)
Process: 5703 ExecStart=/usr/libexec/cockpit-ws (code=exited, status=0/SUCCESS)
Process: 5699 ExecStartPre=/usr/sbin/remotectl certificate --ensure --user=root --group=cockpit-ws --selinux-type=etc_t (code=exited, status=0/SUCCESS)
Main PID: 5703 (code=exited, status=0/SUCCESS)
Jan 04 16:56:32 daho-nethserver.home.dargels.de systemd[1]: Starting Cockpit Web Service...
Jan 04 16:56:32 daho-nethserver.home.dargels.de remotectl[5699]: /usr/bin/chcon: can't apply partial context to unlabeled file ‘/etc/cockpit/ws-certs.d/99-nethserver.cert’
Jan 04 16:56:32 daho-nethserver.home.dargels.de remotectl[5699]: remotectl: couldn't change SELinux type context 'etc_t' for certificate: /etc/cockpit/ws-certs.d/99-nethserver.cert: Child process exited with code 1
Jan 04 16:56:32 daho-nethserver.home.dargels.de systemd[1]: Started Cockpit Web Service.
Jan 04 16:56:32 daho-nethserver.home.dargels.de cockpit-ws[5703]: Using certificate: /etc/cockpit/ws-certs.d/99-nethserver.cert
I seem to recall that the same SELinux message regarding cockpit is logged without causing any problems. So no concern here.
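To make anything else in the cockpit log stand out, the two known SELinux lines can be filtered away and the journal printed without the pager, so that long lines are not cut off at the terminal width. A small sketch, with function names chosen only for illustration:

```shell
#!/bin/sh
# Drop the two SELinux relabel messages that remotectl logs on each start,
# so that any other cockpit message stands out. Reads journal lines on stdin.
drop_known_selinux_noise() {
    grep -v -e "chcon: can't apply partial context" \
            -e "couldn't change SELinux type context"
}

# Full-length cockpit journal for the current boot, without the pager.
cockpit_journal() {
    journalctl -u cockpit.service -b --no-pager -l | drop_known_selinux_noise
}
```

Running cockpit_journal right after systemctl start cockpit should leave only the start/stop lines plus whatever is actually new.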
You might have to dig deeper (or try to start cockpit service and check status again) to find something relevant preventing cockpit from starting. Otherwise:
-- Logs begin at Thu 2024-01-04 16:48:44 CET, end at Thu 2024-01-04 21:28:54 CET. --
Jan 04 16:49:11 daho-nethserver.home.dargels.de systemd[1]: Starting Cockpit Web Service...
Jan 04 16:49:11 daho-nethserver.home.dargels.de remotectl[2438]: /usr/bin/chcon: can't apply partial context to unlabele
Jan 04 16:49:11 daho-nethserver.home.dargels.de remotectl[2438]: remotectl: couldn't change SELinux type context 'etc_t'
Jan 04 16:49:11 daho-nethserver.home.dargels.de systemd[1]: Started Cockpit Web Service.
Jan 04 16:49:11 daho-nethserver.home.dargels.de cockpit-ws[2441]: Using certificate: /etc/cockpit/ws-certs.d/99-nethserv
Jan 04 16:56:32 daho-nethserver.home.dargels.de systemd[1]: Starting Cockpit Web Service...
Jan 04 16:56:32 daho-nethserver.home.dargels.de remotectl[5699]: /usr/bin/chcon: can't apply partial context to unlabele
Jan 04 16:56:32 daho-nethserver.home.dargels.de remotectl[5699]: remotectl: couldn't change SELinux type context 'etc_t'
Jan 04 16:56:32 daho-nethserver.home.dargels.de systemd[1]: Started Cockpit Web Service.
Jan 04 16:56:32 daho-nethserver.home.dargels.de cockpit-ws[5703]: Using certificate: /etc/cockpit/ws-certs.d/99-nethserv
lines 1-11/11 (END)
-- Logs begin at Fri 2024-01-05 10:12:34 CET, end at Fri 2024-01-05 10:15:03 CET. --
Jan 05 10:13:00 daho-nethserver.home.dargels.de systemd[1]: Starting Cockpit Web Service...
Jan 05 10:13:00 daho-nethserver.home.dargels.de systemd[1]: Started Cockpit Web Service.
Jan 05 10:13:00 daho-nethserver.home.dargels.de cockpit-ws[2443]: Using certificate: /etc/cockpit/ws-certs.d/99-nethserv
lines 1-4/4 (END)
after restart service
Jan 05 10:17:35 daho-nethserver.home.dargels.de systemd[1]: Starting Cockpit Web Service...
Jan 05 10:17:35 daho-nethserver.home.dargels.de remotectl[3055]: /usr/bin/chcon: can't apply partial context to unlabele
Jan 05 10:17:35 daho-nethserver.home.dargels.de remotectl[3055]: remotectl: couldn't change SELinux type context 'etc_t'
Jan 05 10:17:35 daho-nethserver.home.dargels.de systemd[1]: Started Cockpit Web Service.
Jan 05 10:17:35 daho-nethserver.home.dargels.de cockpit-ws[3058]: Using certificate: /etc/cockpit/ws-certs.d/99-nethserv
lines 1-9/9 (END)
Jan 5 10:15:03 daho-nethserver sshd[2679]: Accepted keyboard-interactive/pam for root from 10.99.3.2 port 63649 ssh2
10.99.3.2 is my IP when I'm connected to the OPNSense VPN.
I disconnected and, lo and behold, cockpit starts.
Cockpit was accessible for only a few seconds, just enough time to add 10.99.3.0/255.255.255.0 to the trusted networks; then the session was closed.
I have to find out why I keep getting kicked out, even when I'm not on the VPN.
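As a quick sanity check that the VPN client address is really covered by the trusted network entry, the address and netmask can be compared bitwise. This is a plain shell sketch with made-up function names; it does not read or change any NethServer configuration.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 lies inside network $2 with netmask $3.
ip_in_network() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Example: the VPN client address against the trusted network entry.
if ip_in_network 10.99.3.2 10.99.3.0 255.255.255.0; then
    echo "10.99.3.2 is inside 10.99.3.0/255.255.255.0"
fi
```

If the address is covered and the session still drops, the trusted network entry is probably not the cause of the disconnects.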