Certificate Weirdness

Jimbo · February 4, 2020, 10:11am

I have NethServer running (system version NethServer release 7.7.1908 (final), kernel release 3.10.0-1062.4.3.el7.x86_64). It’s got three identities, each running with a different FQDN, one on the “main” server and two on virtual servers. I got a certificate from LetsEncrypt with alternative DNS names, so that one certificate was good for all three FQDNs. I tested them with https on Firefox and Chrome. All good.

Then a few days later, wanted to access the console (port 980) and got a message saying the certificate was bad. This is weird…port 80 and port 443 (http and https) work fine with the certificates, but browsing to 980 gives errors and Firefox won’t go past them.

I examined the certificate, and found it had been rejected because of its validity period: Not After 01/02/2020, 20:03:54 (Greenwich Mean Time).

The default certificate according to the server is the right one, it has three DNS alternatives, and it is valid
Not Before: Jan 30 19:29:27 2020 GMT
Not After : Apr 29 19:29:27 2020 GMT
So when I browse to port 980, it’s picking up another certificate from somewhere with a bad expiry date.
There are two other certificates on the server, according to the console: one which is for only one of the sites, and the other the NethServer’s self-signed original, but neither has an end date of 01/02/2020, 20:03:54

Can anyone suggest what is going on here?

Thanks

Jim

davidep · February 4, 2020, 10:20am

Hi @Jimbo,

Maybe the Apache instance on port 980 was still running with the old certificate in RAM.

Did you try to restart the Server Manager?

From SSH: systemctl restart httpd-admin

From the Server Manager itself (ensure you have SSH access to avoid being locked out!):

Services > Service > httpd-admin > Restart

Jimbo · February 4, 2020, 1:05pm

Thanks, I’ll try that later today when I’m onsite…don’t want to lose contact
Jim

Jimbo · February 4, 2020, 8:32pm

Well, I tried restarting the httpd-admin service but that made no difference. I wanted to restart the service that was supporting port 9090 (Cockpit) because that was suffering the same issue with certificates as 980.

However, there is nothing in Services that names either port 9090 or Cockpit, so I chose to reboot the server (after taking a full backup!). All came back OK, but the two ports 9090 and 980 still suffer time-expired certs. The regular ports 80 and 443 are working perfectly as expected with a single multi-name X.509 cert from LetsEncrypt.
So, unfortunately, I don’t think it’s a stale certificate issue, or if it is, it can’t be refreshed by a restart.

Thanks

Jim

Andy_Wismer · February 4, 2020, 8:37pm

@Jimbo

Can you read / display the cert details (e.g. from Firefox…)?
Issuer, etc…

That info might help find out WHERE this certificate came from, which might also solve the problem of where (in which directory) it may be “hiding”…
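If a browser is awkward, the same details can be pulled from a shell with openssl. This is only a rough sketch, assuming openssl and GNU date are installed; the helper names (served_cert, cert_expired) and server.example.com are made up for illustration and are not part of NethServer:

```shell
#!/bin/bash
# Print issuer, subject, serial, validity dates and SHA-256 fingerprint of
# the certificate actually served on HOST:PORT (hypothetical helper name).
served_cert() {
    local host=$1 port=$2
    openssl s_client -connect "${host}:${port}" -servername "${host}" </dev/null 2>/dev/null \
        | openssl x509 -noout -issuer -subject -serial -dates -fingerprint -sha256
}

# Return 0 (true) if an openssl "notAfter=..." line lies in the past at NOW
# (epoch seconds, defaulting to the current time). Relies on GNU "date -d".
cert_expired() {
    local line=$1 now=${2:-$(date +%s)}
    local end
    end=$(date -u -d "${line#notAfter=}" +%s) || return 2
    [ "$end" -le "$now" ]
}
```

Running `served_cert server.example.com 443` and then the same for 980 and 9090 should show, by comparing the fingerprints and issuers, whether the ports really serve different certificates.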

My 2 cents
Andy

davidep · February 4, 2020, 9:15pm

OK, then I suggest reapplying the certificate configuration from an SSH shell. An event may have been interrupted: a bug of this kind was fixed in the past months. Ensure you have the latest updates, then run

 signal-event certificate-update

Then restart httpd-admin again.

Jimbo · February 5, 2020, 11:11am

Sort of partial success…we are moving in the right direction.
I ran all updates, issued the “signal-event certificate-update” command and restarted httpd-admin as you suggested, and now Cockpit (port 9090) works on https, so that’s a major bonus: control traffic is back to being encrypted. Strangely, port 980 now refuses to respond at all…I’ve restarted httpd-admin again, just to be sure, but that made no difference.

So what it looks like is that there was a stale certificate on 9090, and the actions above have cleared it. Whatever has happened to port 980 is affecting both http and https and stopping it responding at all. In many ways this is acceptable: Cockpit is by far a better UI, and it’s secure now, so I don’t need httpd-admin. I may do another full reboot when I get close to the server, but for now it’s running/secure, so I have a holding position.

@Andy, thanks for your interest, I had a look at the certificates a while back, and did so again after your posting, but there was nothing there that gave any location or other info that helped in debugging, and now, following Davidep’s suggestion, the problem has changed from “bad certificate” to “connection timeout”.

Jim

davidep · February 5, 2020, 11:37am

Is httpd-admin running? Please attach the output of

systemctl status httpd-admin
journalctl -u httpd-admin

You could find more information in /var/log/httpd-admin/error_log and /var/log/messages.

Connection timeout? Is there something between your client and port 980?
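To tell “nothing is listening” apart from “something is blocking the path”, a quick TCP check from the server itself and then from the client can help. A rough sketch, assuming bash (for /dev/tcp) and the coreutils timeout command; port_open and check_ports are made-up helper names:

```shell
#!/bin/bash
# Return 0 if a TCP connection to HOST:PORT succeeds within 3 seconds.
# Uses bash's built-in /dev/tcp redirection, so no extra tools are needed.
port_open() {
    local host=$1 port=$2
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Report the state of each port listed after the host name.
check_ports() {
    local host=$1 port
    shift
    for port in "$@"; do
        if port_open "$host" "$port"; then
            echo "${port} open"
        else
            echo "${port} closed-or-filtered"
        fi
    done
}
```

For example, `check_ports 127.0.0.1 980 9090` on the server, then the same against the server’s address from the client. If a port is open locally but not from the client, a firewall or router in between is the likely suspect; `ss -tlnp` on the server shows which process is listening on each port.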

Jimbo · February 5, 2020, 12:14pm

[root@bastion ~]# systemctl status httpd-admin
● httpd-admin.service - Server Manager UI httpd instance
Loaded: loaded (/usr/lib/systemd/system/httpd-admin.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2020-02-05 10:52:34 GMT; 1h 14min ago
Docs: https://github.com/NethServer/nethserver-httpd-admin
Main PID: 16511 (httpd)
CGroup: /system.slice/httpd-admin.service
├─16511 /usr/sbin/httpd -f /etc/httpd/admin-conf/httpd.conf -c MaxConnectionsPerChild 12 -DFOREGROUND
├─16516 /usr/sbin/httpd -f /etc/httpd/admin-conf/httpd.conf -c MaxConnectionsPerChild 12 -DFOREGROUND
├─16517 /usr/sbin/httpd -f /etc/httpd/admin-conf/httpd.conf -c MaxConnectionsPerChild 12 -DFOREGROUND
├─16518 /usr/sbin/httpd -f /etc/httpd/admin-conf/httpd.conf -c MaxConnectionsPerChild 12 -DFOREGROUND
├─16519 /usr/sbin/httpd -f /etc/httpd/admin-conf/httpd.conf -c MaxConnectionsPerChild 12 -DFOREGROUND
└─16520 /usr/sbin/httpd -f /etc/httpd/admin-conf/httpd.conf -c MaxConnectionsPerChild 12 -DFOREGROUND

Feb 05 10:52:34 <server_name_obscured> systemd[1]: Started Server Manager UI httpd instance.
[root@bastion ~]#

[root@bastion ~]# journalctl -u httpd-admin
-- Logs begin at Tue 2020-02-04 20:05:55 GMT, end at Wed 2020-02-05 12:08:31 GMT. --
Feb 04 20:06:20 <server_name_obscured> systemd[1]: Started Server Manager UI httpd instance.
Feb 04 20:08:45 <server_name_obscured> httpd[3388]: [NOTICE] Nethgui\Module\Logout: user root logged out
Feb 04 20:23:04 <server_name_obscured> systemd[1]: Reloading Server Manager UI httpd instance.
Feb 04 20:23:04 <server_name_obscured> systemd[1]: Reloaded Server Manager UI httpd instance.
Feb 05 10:52:34 <server_name_obscured> systemd[1]: Stopping Server Manager UI httpd instance...
Feb 05 10:52:34 <server_name_obscured> systemd[1]: Stopped Server Manager UI httpd instance.
Feb 05 10:52:34 <server_name_obscured> systemd[1]: Started Server Manager UI httpd instance.
[root@bastion ~]#

But here is the pay-dirt, in httpd-admin/error_log: repeated versions of the following:

[Wed Feb 05 10:52:34.555371 2020] [mpm_prefork:notice] [pid 2906] AH00169: caught SIGTERM, shutting down
[Wed Feb 05 10:52:34.638419 2020] [suexec:notice] [pid 16511] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Wed Feb 05 10:52:34.639444 2020] [ssl:warn] [pid 16511] AH01909: RSA certificate configured for <server_name_obscured>:443 does NOT include an ID which matches the server name
[Wed Feb 05 10:52:34.675094 2020] [lbmethod_heartbeat:notice] [pid 16511] AH02282: No slotmem from mod_heartmonitor
[Wed Feb 05 10:52:34.675454 2020] [ssl:warn] [pid 16511] AH01873: Init: Session Cache is not configured [hint: SSLSessionCache]
[Wed Feb 05 10:52:34.675920 2020] [ssl:warn] [pid 16511] AH01909: RSA certificate configured for <server_name_obscured>:443 does NOT include an ID which matches the server name
[Wed Feb 05 10:52:34.736326 2020] [mpm_prefork:notice] [pid 16511] AH00163: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips PHP/5.4.16 configured -- resuming normal operations
[Wed Feb 05 10:52:34.736362 2020] [core:notice] [pid 16511] AH00094: Command line: '/usr/sbin/httpd -f /etc/httpd/admin-conf/httpd.conf -c MaxConnectionsPerChild 12 -D FOREGROUND'

So I need to get a certificate for the specific server name, or more simply, go for a wildcard. I’ll do that later when I’m on site again.
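Before requesting a new certificate, it may be worth checking which names the current one actually carries, since AH01909 is about the configured server name not appearing in the certificate. A sketch, assuming the Subject Alternative Name line has been copied from the output of `openssl x509 -noout -text`; san_has_name and the example.com / example.org names are illustrative only, names are assumed to be lowercase, and a wildcard is treated as covering exactly one extra label:

```shell
#!/bin/bash
# Return 0 if NAME is covered by a SAN list such as
# "DNS:www.example.com, DNS:*.example.org" (the format openssl prints).
san_has_name() {
    local name=$1 entries entry suffix label
    # One entry per line, without blanks or the "DNS:" prefix.
    entries=$(printf '%s\n' "$2" | tr ',' '\n' | tr -d ' \t' | sed 's/^DNS://')
    while IFS= read -r entry; do
        case $entry in
            "$name") return 0 ;;          # exact match
            \*.*)
                # Wildcard entry: strip the "*" and see what is left of NAME
                # once the remaining suffix is removed.
                suffix=${entry#\*}
                label=${name%"$suffix"}
                if [ "$label" != "$name" ] && [ -n "$label" ] \
                    && [ "${label#*.}" = "$label" ]; then
                    return 0              # exactly one label in front
                fi
                ;;
        esac
    done <<EOF
$entries
EOF
    return 1
}
```

Calling it with the server name from the warning and the SAN line from the certificate (for instance `san_has_name bastion.example.org 'DNS:www.example.com, DNS:*.example.org'`) returns success only if that name is listed or covered by a wildcard.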

Many thanks for your help (and patience!)

Jim

Jimbo · February 6, 2020, 7:13am

Now it gets curious…I did nothing, but on attempting to access the server today, both 980 and 9090 worked correctly. I’ll dig around, but it looks like the certificate update fixed it; it just seemed to take its time to do so…

Jim

davidep · February 6, 2020, 8:32am

Just out of curiosity: what is your web browser?

Jimbo · February 6, 2020, 8:44am

I am a bit promiscuous where browsers are concerned: I use Firefox by preference (showed the original problem, no way round it), Chrome as second choice (showed the original problem, gave a way of bypassing it), Internet Explorer (not tried) and Edge (option of desperation, not tried).

davidep · February 6, 2020, 8:47am

I use FF too: it does not like it when a web site changes its certificate. Maybe after a few hours it noticed the change and worked correctly.

Jimbo · February 6, 2020, 8:50am

Well, you gotta hope
Whatever, I greatly appreciate your time and persistence: Time was when I was hot on Linux CLI stuff, but not so much today and you gave me the help I needed. I’ll mark the thread accordingly!
Thanks
Jim