High load, errors, long wait times for gui panels

NethServer Version: 7 final
Module:

One of my NS7 pre-production servers is suddenly very slow to present the dashboard. I think I have this narrowed down to the dashboard request, but I'm not sure, because even a visit to the Software Center takes twice as long and pushes the load to 5+. I have never run into this with v7. I was going to update it; I haven't been paying attention to it since I updated it to Nextcloud 11.0.2.
This instance was normally running at .1 load at any given time since it was built. It’s only got samba AD, file sharing and nextcloud with 2 users because I haven’t had time to migrate the v6.8 instance to it. Any thoughts?
Apr 22 15:01:07 server7c systemd: Stopping user-0.slice.
Apr 22 15:53:03 server7c httpd: [NOTICE] Nethgui\Authorization\User: userrootauthenticated
Apr 22 15:53:12 server7c httpd: [WARNING] NethServer\Tool\GroupProvider: Account provider connection timed out
Apr 22 15:53:12 server7c httpd: [WARNING] Connection timed out
Apr 22 15:53:18 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/10eth-unmapped exit code 9
Apr 22 15:53:20 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/10green-dhcp exit code 9
Apr 22 15:53:22 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/20admin-user exit code 9
Apr 22 15:53:24 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/40nethserver-dc exit code 9
Apr 22 15:53:28 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/40shorewall exit code 9
Apr 22 15:55:06 server7c sshd[11447]: Accepted password for root from 192.168.124.126 port 58780 ssh2
Apr 22 15:55:07 server7c httpd: [ERROR] NethServer\Tool\GroupProvider: AccountProvider_Error_11
Apr 22 15:55:07 server7c httpd: [ERROR] Resource temporarily unavailable
Apr 22 15:55:09 server7c systemd: Created slice user-0.slice.
Apr 22 15:55:09 server7c systemd: Starting user-0.slice.
Apr 22 15:55:09 server7c systemd: Started Session 251 of user root.
Apr 22 15:55:09 server7c systemd: Starting Session 251 of user root.
Apr 22 15:55:09 server7c systemd-logind: New session 251 of user root.
Apr 22 15:55:28 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/10eth-unmapped exit code 9
Apr 22 15:55:31 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/10green-dhcp exit code 9
Apr 22 15:55:33 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/20admin-user exit code 9
Apr 22 15:55:38 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/40nethserver-dc exit code 9
Apr 22 15:55:41 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/40password_strength exit code 9
Apr 22 15:55:43 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/40shorewall exit code 9
Apr 22 15:55:45 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/80backup-data exit code 9
Apr 22 15:59:18 server7c httpd: [ERROR] NethServer\Tool\GroupProvider: AccountProvider_Error_11
Apr 22 15:59:18 server7c httpd: [ERROR] Resource temporarily unavailable
Apr 22 15:59:39 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/10eth-unmapped exit code 9
Apr 22 15:59:41 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/10green-dhcp exit code 9
Apr 22 15:59:43 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/20admin-user exit code 9
Apr 22 15:59:45 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/40nethserver-dc exit code 9
Apr 22 15:59:47 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/40password_strength exit code 9
Apr 22 15:59:49 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/40shorewall exit code 9
Apr 22 15:59:51 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/80backup-data exit code 9
Apr 22 16:01:04 server7c systemd: Started Session 252 of user root.
Apr 22 16:01:04 server7c systemd: Starting Session 252 of user root.
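
For anyone wanting to see at a glance which todos.d scripts fail and how often, here is a rough Python sketch that tallies the admin-todos errors in a messages excerpt. The regular expression is only inferred from the lines pasted above, and `count_todo_failures` and `SAMPLE_LOG` are names made up for this example.

```python
import re
from collections import Counter

# Pattern inferred from the pasted log lines, e.g.:
# "... admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/10eth-unmapped exit code 9"
TODO_ERROR = re.compile(
    r"admin-todos: \[ERROR\] admin-todos: (?P<script>\S+) exit code (?P<code>\d+)"
)

def count_todo_failures(log_text):
    """Return a Counter keyed by (script name, exit code)."""
    counts = Counter()
    # finditer works whether entries are one per line or run together
    for match in TODO_ERROR.finditer(log_text):
        script = match.group("script").rsplit("/", 1)[-1]
        counts[(script, int(match.group("code")))] += 1
    return counts

# A few entries copied from the excerpt above
SAMPLE_LOG = """\
Apr 22 15:53:18 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/10eth-unmapped exit code 9
Apr 22 15:53:20 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/10green-dhcp exit code 9
Apr 22 15:55:07 server7c httpd: [ERROR] Resource temporarily unavailable
Apr 22 15:55:28 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/10eth-unmapped exit code 9
"""

for (script, code), n in count_todo_failures(SAMPLE_LOG).most_common():
    print(f"{script}: exit code {code} x{n}")
```

Every script failing with the same exit code on each page load would suggest a shared cause rather than a problem in any single script.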

It seems you’re experiencing a problem accessing the nsdc container.

The web interface is slow because you have scripts trying to connect to the DC without success.
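
One quick way to test that idea is to check whether a TCP connection to the DC opens at all within a few seconds. A minimal sketch, assuming the container's LDAP port (389) is the one of interest; `can_connect` is a helper invented here, and the address in the usage comment is a placeholder, not the real nsdc address.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False

# Hypothetical usage: replace 192.0.2.10 with the nsdc container address
# print(can_connect("192.0.2.10", 389))
```

A False result after the full timeout would match the "connection timed out" warnings in the log, while an immediate False would point more toward a closed port.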

I’m not seeing anything else in the logs, maybe the log level isn’t high enough.
This is post-reboot; note the bold line.

**Apr 24 09:16:36 server7c systemd: Started Shorewall IPv4 firewall.**

Apr 24 09:17:20 server7c systemd: sssd.service start operation timed out. Terminating.
Apr 24 09:17:20 server7c sssd[be[domain.com]]: Shutting down
Apr 24 09:17:20 server7c sssd[be[legacy]]: Shutting down
Apr 24 09:17:20 server7c sssd[nss]: Shutting down
Apr 24 09:17:20 server7c sssd[pam]: Shutting down
Apr 24 09:17:20 server7c systemd: Failed to start System Security Services Daemon.
Apr 24 09:17:20 server7c systemd: Unit sssd.service entered failed state.
Apr 24 09:17:20 server7c systemd: sssd.service failed.
Apr 24 09:17:20 server7c systemd: Reached target User and Group Name Lookups.
Apr 24 09:17:20 server7c systemd: Starting User and Group Name Lookups.
Apr 24 09:17:20 server7c systemd: Starting Login Service…
Apr 24 09:17:20 server7c systemd: Starting Permit User Sessions…
Apr 24 09:17:21 server7c systemd: Started Permit User Sessions.
Apr 24 09:17:21 server7c systemd: Starting Wait for Plymouth Boot Screen to Quit…
Apr 24 09:17:21 server7c systemd: Started Command Scheduler.
Apr 24 09:17:21 server7c systemd: Starting Command Scheduler…
Apr 24 09:17:21 server7c systemd: Starting Terminate Plymouth Boot Screen…
Apr 24 09:17:21 server7c systemd: Started Login Service.
Apr 24 09:17:21 server7c systemd-logind: Watching system buttons on /dev/input/event0 (Power Button)
Apr 24 09:17:21 server7c systemd-logind: Watching system buttons on /dev/input/event1 (Sleep Button)
Apr 24 09:17:21 server7c systemd-logind: Watching system buttons on /dev/input/event4 (Video Bus)
Apr 24 09:17:21 server7c systemd-logind: New seat seat0.
Apr 24 09:17:21 server7c systemd: Received SIGRTMIN+21 from PID 263 (plymouthd).
Apr 24 09:17:21 server7c systemd: Started Terminate Plymouth Boot Screen.
Apr 24 09:17:21 server7c systemd: Started Wait for Plymouth Boot Screen to Quit.
Apr 24 09:17:21 server7c systemd: Started Getty on tty1.
Apr 24 09:17:21 server7c systemd: Starting Getty on tty1…
Apr 24 09:17:21 server7c systemd: Reached target Login Prompts.
Apr 24 09:17:21 server7c systemd: Starting Login Prompts.
Apr 24 09:17:21 server7c systemd: Reached target Multi-User System.
Apr 24 09:17:21 server7c systemd: Starting Multi-User System.
Apr 24 09:17:21 server7c systemd: Starting Update UTMP about System Runlevel Changes…
Apr 24 09:17:21 server7c systemd: Started Update UTMP about System Runlevel Changes.
Apr 24 09:17:21 server7c systemd: Startup finished in 493ms (kernel) + 2.966s (initrd) + 1min 53.591s (userspace) = 1min 57.052s.
Apr 24 09:18:36 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/10eth-unmapped exit code 9
Apr 24 09:18:37 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/10green-dhcp exit code 9
Apr 24 09:18:39 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/20admin-user exit code 9
Apr 24 09:18:43 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/40nethserver-dc exit code 9
Apr 24 09:18:45 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/40password_strength exit code 9
Apr 24 09:18:47 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/40shorewall exit code 9
Apr 24 09:21:41 server7c chronyd[661]: Selected source 45.33.43.25
Apr 24 09:30:26 server7c systemd: Starting Cleanup of Temporary Directories…
Apr 24 09:30:35 server7c systemd: Started Cleanup of Temporary Directories.
Apr 24 09:42:22 server7c sshd[2134]: Accepted password for root from 192.168.124.126 port 51303 ssh2
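
The sssd start timeout shows up in the boot time as well: the "Startup finished" line above reports 1min 53.591s in userspace. Below is a small sketch for pulling the per-phase times out of such a line so boots can be compared. It only handles the `min`, `s` and `ms` units seen in this thread, and `startup_phases` and `to_seconds` are names made up for the example.

```python
import re

UNIT_SECONDS = {"min": 60.0, "s": 1.0, "ms": 0.001}
DURATION = re.compile(r"(\d+(?:\.\d+)?)(min|ms|s)")
# One or more duration tokens followed by the phase name in parentheses
PHASE = re.compile(r"((?:\d+(?:\.\d+)?(?:min|ms|s)\s*)+)\((kernel|initrd|userspace)\)")

def to_seconds(text):
    """Convert a span such as '1min 53.591s' to seconds."""
    return sum(float(number) * UNIT_SECONDS[unit] for number, unit in DURATION.findall(text))

def startup_phases(line):
    """Map each boot phase named in a 'Startup finished' line to its duration in seconds."""
    return {phase: to_seconds(span) for span, phase in PHASE.findall(line)}

# Line copied from the boot log above
line = ("Apr 24 09:17:21 server7c systemd: Startup finished in 493ms (kernel) "
        "+ 2.966s (initrd) + 1min 53.591s (userspace) = 1min 57.052s.")
for phase, seconds in startup_phases(line).items():
    print(f"{phase}: {seconds:.3f}s")
```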

sssd pam log;

(Mon Apr 24 09:15:31 2017) [sssd[pam]] [sss_dp_init] (0x0010): Failed to connect to monitor services.
(Mon Apr 24 09:15:31 2017) [sssd[pam]] [sss_process_init] (0x0010): fatal error setting up backend connector
(Mon Apr 24 09:15:31 2017) [sssd[pam]] [pam_process_init] (0x0010): sss_process_init() failed

This is getting on my nerves.

[root@server7c ~]# systemctl status nsdc
● nsdc.service - NethServer Domain Controller container
   Loaded: loaded (/usr/lib/systemd/system/nsdc.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-04-24 09:15:13 MST; 9h ago
     Docs: man:systemd-nspawn(1)
 Main PID: 986 (systemd-nspawn)
   Status: "Container running."

[root@server7c ~]# systemctl status sssd.service
● sssd.service - System Security Services Daemon
   Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/sssd.service.d
           └─journal.conf
   Active: failed (Result: timeout) since Mon 2017-04-24 17:55:51 MST; 35min ago
  Process: 16302 ExecStart=/usr/sbin/sssd -D -f (code=exited, status=0/SUCCESS)

Apr 24 17:54:33 server7c.domain.com sssd[be[domain.com]][16306]: Starting up
Apr 24 17:54:35 server7c.domain.com sssd[nss][16308]: Starting up
Apr 24 17:54:35 server7c.domain.com sssd[pam][16309]: Starting up
Apr 24 17:55:50 server7c.domain.com systemd[1]: sssd.service start operation timed out. Terminating.
Apr 24 17:55:51 server7c.domain.com systemd[1]: Failed to start System Security Services Daemon.
Apr 24 17:55:51 server7c.domain.com systemd[1]: Unit sssd.service entered failed state.
Apr 24 17:55:51 server7c.domain.com systemd[1]: sssd.service failed.
Apr 24 17:55:51 server7c.domain.com sssd[be[legacy]][16307]: Shutting down
Apr 24 17:55:51 server7c.domain.com sssd[be[domain.com]][16306]: Shutting down
Apr 24 17:55:52 server7c.domain.com sssd[pam][16309]: Shutting down

Apr 24 16:47:05 nsdc-server7c.domain.com samba[24]: ../source4/dsdb/dns/dns_update.c:324: Failed SPN update - NT_STATUS_IO_TIMEOUT
Apr 24 16:57:04 nsdc-server7c.domain.com samba[24]: ../source4/dsdb/dns/dns_update.c:295: Failed DNS update - NT_STATUS_IO_TIMEOUT

[root@server7c log]# nslookup server7c
Server:         127.0.0.1
Address:        127.0.0.1#53

Name:   server7c.domain.com
Address: 192.168.124.227

[root@server7c log]# nslookup google.com
Server:         127.0.0.1
Address:        127.0.0.1#53

Non-authoritative answer:
Name:   google.com
Address: 216.58.219.14
Name:   google.com
Address: 216.58.219.14
Name:   google.com
Address: 216.58.219.14

It took most of yesterday, and although I was unable to find a direct cause for all this, I was able to get the many pending updates to complete, which included sssd-1.14.0-43.el7_3.14.x86_64.

It may have been this;

* When an SSSD process needed to be restarted because it was being blocked by a
long-running task, a deadlock sometimes occurred during the restart. For
example, this problem occurred when the sssd_be process was enumerating a large
domain. This update fixes a bug in the watchdog code, which prevents the
deadlock. (BZ#1418943)

Enumeration is disabled, so the bug shouldn’t be hit.
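
To double-check that on a given box, the `enumerate` option in each `[domain/...]` section of sssd.conf can be read with a few lines of Python. This is only a sketch: the sample configuration is invented for illustration and is not the file from this server, and `domains_with_enumeration` is a made-up helper name.

```python
import configparser

def domains_with_enumeration(conf_text):
    """Return the names of sssd domains whose `enumerate` option is true."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    enabled = []
    for section in parser.sections():
        if not section.startswith("domain/"):
            continue
        # sssd treats a missing `enumerate` option as false
        if parser.getboolean(section, "enumerate", fallback=False):
            enabled.append(section[len("domain/"):])
    return enabled

# Hypothetical configuration, for illustration only
SAMPLE_CONF = "\n".join([
    "[sssd]",
    "domains = example.org, other",
    "services = nss, pam",
    "",
    "[domain/example.org]",
    "id_provider = ad",
    "",
    "[domain/other]",
    "id_provider = ldap",
    "enumerate = true",
])

print(domains_with_enumeration(SAMPLE_CONF))
```

On a real system the text would come from /etc/sssd/sssd.conf, which is normally readable by root only.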

Well, I still have the broken snapshot but I’m out of troubleshooting ideas at this point. If it happens to anyone else later maybe we can use my snapshot to troubleshoot as long as I have it.

Nope, something is still not right.

Apr 26 09:57:28 server7c systemd: Starting Update UTMP about System Runlevel Changes...
Apr 26 09:57:28 server7c systemd: Started Update UTMP about System Runlevel Changes.
Apr 26 09:57:28 server7c shorewall: Processing /etc/shorewall/started ...
Apr 26 09:57:29 server7c logger: Shorewall started
Apr 26 09:57:29 server7c shorewall: done.
Apr 26 09:57:29 server7c systemd: Started Shorewall IPv4 firewall.
Apr 26 09:57:29 server7c systemd: Startup finished in 596ms (kernel) + 3.123s (initrd) + 1min 12.448s (userspace) = 1min 16.168s.
Apr 26 09:58:11 server7c chronyd[683]: Selected source 74.122.204.3
Apr 26 09:58:37 server7c httpd: [ERROR] NethServer\Tool\GroupProvider: AccountProvider_Error_11
Apr 26 09:58:37 server7c httpd: [ERROR] Resource temporarily unavailable
Apr 26 09:58:53 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/10eth-unmapped exit code 9
Apr 26 09:58:54 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/10green-dhcp exit code 9
Apr 26 09:58:56 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/20admin-user exit code 9
Apr 26 09:58:58 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/40nethserver-dc exit code 9
Apr 26 09:59:00 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/40password_strength exit code 9
Apr 26 09:59:02 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/40shorewall exit code 9
Apr 26 09:59:05 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/80backup-data exit code 9
Apr 26 09:59:15 server7c chronyd[683]: Selected source 208.75.89.4
Apr 26 10:01:13 server7c systemd: Created slice user-0.slice.
Apr 26 10:01:14 server7c systemd: Starting user-0.slice.
Apr 26 10:01:14 server7c systemd: Started Session 1 of user root.
Apr 26 10:01:14 server7c systemd: Starting Session 1 of user root.
Apr 26 10:01:19 server7c systemd: Removed slice user-0.slice.
Apr 26 10:01:19 server7c systemd: Stopping user-0.slice.
Apr 26 10:02:17 server7c httpd: [WARNING] NethServer\Tool\GroupProvider: Account provider connection timed out
Apr 26 10:02:31 server7c httpd: [WARNING] Connection timed out
Apr 26 10:02:46 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/10eth-unmapped exit code 9
Apr 26 10:02:49 server7c sshd[2051]: Accepted password for root from 192.168.124.126 port 51527 ssh2
Apr 26 10:02:49 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/20admin-user exit code 9
Apr 26 10:02:50 server7c systemd: Created slice user-0.slice.
Apr 26 10:02:50 server7c systemd: Starting user-0.slice.
Apr 26 10:02:50 server7c systemd: Started Session 2 of user root.
Apr 26 10:02:50 server7c systemd: Starting Session 2 of user root.
Apr 26 10:02:50 server7c systemd-logind: New session 2 of user root.
Apr 26 10:02:53 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/40shorewall exit code 9
Apr 26 10:06:37 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/10eth-unmapped exit code 9
Apr 26 10:06:40 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/10green-dhcp exit code 9
Apr 26 10:06:42 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/20admin-user exit code 9
Apr 26 10:06:44 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/40nethserver-dc exit code 9
Apr 26 10:06:46 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/40password_strength exit code 9
Apr 26 10:06:48 server7c admin-todos: [ERROR] admin-todos: /etc/nethserver/todos.d/40shorewall exit code 9

Clicking Software Center brought this to a halt, and I can’t believe that action requires that many PHP threads. This has the latest Nextcloud with the PHP 5.6 dependency.

Any GUI web page request takes forever and sends the load and wait into the stratosphere. As soon as the request and display complete, the load and wait drop back to zero. The log viewer is slow and the dashboard is the real killer. It sure seems like something is up with Apache and PHP.

httpd-admin error log;

[Wed Apr 26 09:57:01.709054 2017] [suexec:notice] [pid 943] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Wed Apr 26 09:57:02.102114 2017] [ssl:error] [pid 943] AH02217: ssl_stapling_init_cert: Can't retrieve issuer certificate!
[Wed Apr 26 09:57:02.102159 2017] [ssl:error] [pid 943] AH02235: Unable to configure server certificate for stapling
[Wed Apr 26 09:57:02.102181 2017] [ssl:warn] [pid 943] AH01906: RSA server certificate is a CA certificate (BasicConstraints: CA == TRUE !?)
[Wed Apr 26 09:57:02.102203 2017] [ssl:warn] [pid 943] AH01909: RSA certificate configured for server7c.domain.com:443 does NOT include an ID which matches the server name
[Wed Apr 26 09:57:02.134476 2017] [auth_digest:notice] [pid 943] AH01757: generating secret for digest authentication ...
[Wed Apr 26 09:57:02.135099 2017] [lbmethod_heartbeat:notice] [pid 943] AH02282: No slotmem from mod_heartmonitor
[Wed Apr 26 09:57:02.135443 2017] [ssl:warn] [pid 943] AH01873: Init: Session Cache is not configured [hint: SSLSessionCache]
[Wed Apr 26 09:57:02.135702 2017] [ssl:error] [pid 943] AH02217: ssl_stapling_init_cert: Can't retrieve issuer certificate!
[Wed Apr 26 09:57:02.135720 2017] [ssl:error] [pid 943] AH02235: Unable to configure server certificate for stapling
[Wed Apr 26 09:57:02.135730 2017] [ssl:warn] [pid 943] AH01906: RSA server certificate is a CA certificate (BasicConstraints: CA == TRUE !?)
[Wed Apr 26 09:57:02.135742 2017] [ssl:warn] [pid 943] AH01909: RSA certificate configured for server7c.domain.com:443 does NOT include an ID which matches the server name
[Wed Apr 26 09:57:15.369415 2017] [mpm_prefork:notice] [pid 943] AH00163: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips PHP/5.4.16 configured -- resuming normal operations
[Wed Apr 26 09:57:15.369466 2017] [core:notice] [pid 943] AH00094: Command line: '/usr/sbin/httpd -f /etc/httpd/admin-conf/httpd.conf -c MaxConnectionsPerChild 12 -D FOREGROUND'
[Wed Apr 26 09:58:37.787420 2017] [:error] [pid 1588] [client 192.168.124.126:51417] [ERROR] NethServer\\Tool\\GroupProvider: AccountProvider_Error_11, referer: https://server7c:980/en-US/Shutdown
[Wed Apr 26 09:58:37.832329 2017] [:error] [pid 1588] [client 192.168.124.126:51417] [ERROR] Resource temporarily unavailable\n, referer: https://server7c:980/en-US/Shutdown
[Wed Apr 26 10:02:17.568136 2017] [:error] [pid 1924] [client 192.168.124.126:51494] [WARNING] NethServer\\Tool\\GroupProvider: Account provider connection timed out, referer: https://server7c:980/en-US/Sssd
[Wed Apr 26 10:02:17.599921 2017] [:error] [pid 1924] [client 192.168.124.126:51494] [WARNING] Connection timed out\n, referer: https://server7c:980/en-US/Sssd
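
The referer on those entries shows which panel was being loaded when the account provider lookup failed. A rough sketch for listing them, with the pattern inferred only from the entries pasted above; `provider_failures` is a name invented for the example.

```python
import re
from urllib.parse import urlsplit

# Inferred from the pasted entries: "[LEVEL] ...GroupProvider: message, referer: URL"
ENTRY = re.compile(
    r"\[(?P<level>ERROR|WARNING)\] \S*GroupProvider: (?P<message>.*?), referer: (?P<referer>\S+)"
)

def provider_failures(log_text):
    """Return (level, message, panel) for each GroupProvider entry in the log text."""
    results = []
    for match in ENTRY.finditer(log_text):
        # The panel is taken to be the last path segment of the referer URL
        panel = urlsplit(match.group("referer")).path.rsplit("/", 1)[-1]
        results.append((match.group("level"), match.group("message"), panel))
    return results

# Two entries copied from the error log above
SAMPLE = r"""
[Wed Apr 26 09:58:37.787420 2017] [:error] [pid 1588] [client 192.168.124.126:51417] [ERROR] NethServer\\Tool\\GroupProvider: AccountProvider_Error_11, referer: https://server7c:980/en-US/Shutdown
[Wed Apr 26 10:02:17.568136 2017] [:error] [pid 1924] [client 192.168.124.126:51494] [WARNING] NethServer\\Tool\\GroupProvider: Account provider connection timed out, referer: https://server7c:980/en-US/Sssd
"""

for level, message, panel in provider_failures(SAMPLE):
    print(f"{panel}: {level} {message}")
```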

That’s a lot of swap use so I’ve given it another gig.
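
For tracking swap use over time, the totals can be read from /proc/meminfo. A minimal sketch; the sample numbers are invented and are not readings from this server, and `swap_usage_kib` is a made-up helper name.

```python
def swap_usage_kib(meminfo_text):
    """Return (total, used) swap in KiB from the text of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # The first token after the colon is the numeric value
            fields[name.strip()] = int(parts[0])
    total = fields["SwapTotal"]
    return total, total - fields["SwapFree"]

# Hypothetical values, for illustration only
SAMPLE_MEMINFO = """\
MemTotal:        2046948 kB
MemFree:          101232 kB
SwapTotal:       2097148 kB
SwapFree:        1048576 kB
"""

total, used = swap_usage_kib(SAMPLE_MEMINFO)
print(f"swap used: {used} of {total} KiB")

# On the server itself:
# with open("/proc/meminfo") as handle:
#     print(swap_usage_kib(handle.read()))
```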