Can't delete expired certificate

Turbond · June 15, 2025, 12:10am

Under TLS Certificates I can’t delete an expired certificate; I just get a timeout error. The domain no longer exists and has been removed from the mail domain list, but this seems to be related to the renewal issues I’ve been getting. As far as NS8 is concerned, the certificate still needs renewing after I click delete. That renewal of course can’t succeed, so it causes an error and the certificate stays installed.

How do I force-remove the certificates and manually use Let’s Encrypt to recreate the correct ones? This is a production server, and it’s now causing me a headache on a Sunday morning/afternoon.

BTW, it shows a lovely green circle and tick saying it’s obtained, when in fact it’s not and it has expired. Please point me to the physical location of this in NS8 and I’ll delete it manually if need be, then upload a temporary certificate until this can be fixed.
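One way to see which certificate is actually being served, independently of the green tick in the UI, is to ask the server directly. A minimal sketch, assuming openssl and GNU date are installed on the machine running it and that the host answers TLS on port 443; the function names are hypothetical:

```shell
# Print the notAfter date (e.g. "Jul 15 00:00:00 2025 GMT") of the certificate
# served on HOST:PORT, sending HOST as the SNI name. Prints nothing on failure.
served_cert_enddate() {
  local host=$1 port=${2:-443}
  openssl s_client -connect "${host}:${port}" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate \
    | cut -d= -f2
}

# Print whole days (truncated toward zero) from NOW_EPOCH until ENDDATE.
# NOW_EPOCH defaults to the current time; a negative result means expired.
days_until() {
  local enddate=$1 now=${2:-$(date -u +%s)}
  local end
  # GNU date treats an empty string as "today", so reject it explicitly.
  [[ -n $enddate ]] || return 1
  end=$(date -u -d "$enddate" +%s) || return 1
  echo $(( (end - now) / 86400 ))
}
```

For example, days_until "$(served_cert_enddate mail.current_domain.info)" should print a negative number if the certificate Traefik hands out for that name has already expired, whatever the web UI says.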

Please note:

Last login: Sun Jun 15 12:24:00 2025 from 192.168.3.8
[root@kea ~]# api-cli run module/traefik1/delete-certificate --data '{"fqdn":"mail.deleted_domain.co.nz","type":"internal"}'
Warning: using user "cluster" credentials from the environment
<3>Timeout after about 30 seconds. Certificate not obtained for ['mail.current_doamin.info', 'kea.current_domain.info', 'mail.other_current_domain.co.nz'].
<3>
false

This is the issue that prevents the certificate from being deleted and, in turn, stops all the other certificates from renewing. This is indeed a bug.
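When running the delete-certificate action by hand, the --data JSON can be built by a small helper so that a typo or stray quote cannot produce malformed JSON. This is a hypothetical sketch (cert_payload is not part of NS8); the field names fqdn and type are taken from the command in the first post, and it does not address the timeout itself:

```shell
# Hypothetical helper: print the JSON payload for the traefik delete-certificate
# action. Rejects names containing characters that would need JSON escaping.
cert_payload() {
  local fqdn=$1 cert_type=${2:-internal}
  # Letters, digits, dots, hyphens and underscores only (underscores are
  # allowed because the anonymized names in this thread use them).
  local name_re='^[A-Za-z0-9._-]+$' type_re='^[a-z]+$'
  if [[ ! $fqdn =~ $name_re ]]; then
    echo "invalid FQDN: '$fqdn'" >&2
    return 1
  fi
  if [[ ! $cert_type =~ $type_re ]]; then
    echo "invalid type: '$cert_type'" >&2
    return 1
  fi
  printf '{"fqdn":"%s","type":"%s"}\n' "$fqdn" "$cert_type"
}
```

Usage, reproducing the original call: api-cli run module/traefik1/delete-certificate --data "$(cert_payload mail.deleted_domain.co.nz)"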

mrmarkuz · June 15, 2025, 8:43am

Regarding the timeout, please also check that the required port is open and that Loki is running.
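For the port part, a quick reachability test can use bash’s built-in /dev/tcp redirection, so no extra tools are needed. A sketch, assuming bash and GNU coreutils timeout; port 80 is the interesting one because Let’s Encrypt’s HTTP-01 challenge always connects on port 80 (that NS8’s Traefik uses HTTP-01 is an assumption here, not something stated in this thread). The function name is hypothetical:

```shell
# Succeed if a TCP connection to HOST:PORT completes within SECS seconds.
# Host and port are passed as arguments, not spliced into the command string.
port_open() {
  local host=$1 port=$2 secs=${3:-3}
  # The child bash opens fd 3 on /dev/tcp/HOST/PORT; a failed exec redirection
  # makes it exit non-zero, and timeout kills it if the connect hangs.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

Run it from a machine outside your network, e.g. port_open mail.current_domain.info 80 && echo open, since a check from the server itself only proves that something is listening, not that the firewall or NAT lets Let’s Encrypt in. For Loki, assuming the instance is loki1 as in the logs posted in this thread, runagent -m loki1 podman ps should list its running containers.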

Turbond · June 15, 2025, 6:54pm

As below. So it’s running, but getting errors.


2025-06-15T11:14:33+12:00 [1:loki1:agent@loki1] task/module/loki1/3fcebe2b-991b-49bb-b75a-8aabfd96431b: action "get-configuration" status is "completed" (0) at step validate-output.json
2025-06-15T12:33:28+12:00 [1:loki1:systemd] Starting Mark boot as successful...
2025-06-15T12:33:28+12:00 [1:loki1:systemd] Finished Mark boot as successful.
2025-06-15T12:36:28+12:00 [1:loki1:systemd] Created slice User Background Tasks Slice.
2025-06-15T12:36:28+12:00 [1:loki1:systemd] Starting Cleanup of User's Temporary Files and Directories...
2025-06-15T12:36:28+12:00 [1:loki1:systemd] Finished Cleanup of User's Temporary Files and Directories.
2025-06-15T12:45:01+12:00 [1:loki1:loki-server] level=error ts=2025-06-15T00:45:01.791034172Z caller=tail.go:230 component=querier org_id=fake traceID=1d56a6fca4f96567 msg="Error receiving response from grpc tail client" err="rpc error: code = Canceled desc = context canceled"
2025-06-16T05:58:39+12:00 [1:loki1:agent@loki1] task/module/loki1/03da7b65-59bb-4160-a19a-1f55c7053670: get-facts/50facts is starting
2025-06-16T05:58:39+12:00 [1:loki1:agent@loki1] task/module/loki1/03da7b65-59bb-4160-a19a-1f55c7053670: action "get-facts" status is "completed" (0) at step 50facts
2025-06-16T06:34:10+12:00 [1:loki1:loki-server] level=error ts=2025-06-15T18:34:10.026211597Z caller=tail.go:230 component=querier org_id=fake traceID=7e9e7124981dda0d msg="Error receiving response from grpc tail client" err="rpc error: code = Canceled desc = context canceled"
2025-06-16T06:48:19+12:00 [1:loki1:loki-server] level=error ts=2025-06-15T18:48:19.719309552Z caller=tail.go:230 component=querier org_id=fake traceID=149dc87e654965a9 msg="Error receiving response from grpc tail client" err="rpc error: code = Canceled desc = context canceled"
2025-06-16T06:48:52+12:00 [1:loki1:agent@loki1] task/module/loki1/268f9393-0080-4dcb-ab4f-e12555a9290e: get-configuration/10get is starting
2025-06-16T06:48:53+12:00 [1:loki1:agent@loki1] task/module/loki1/268f9393-0080-4dcb-ab4f-e12555a9290e: action "get-configuration" status is "completed" (0) at step validate-output.json

Turbond · June 16, 2025, 3:50am

After a bit more digging, I discovered that none of the certificates under TLS Certificates can be deleted or renewed, even after I changed the ACME server to staging. However, if I change my mail server FQDN, a new certificate is created (without error) that isn’t listed but works. Now I have to tell everyone the new mail server name, but at least I can use the mail FQDN to access cluster-admin without security exceptions, and my mail is working. I’m more than happy to provide logs and help debug this, as I’d like to find out what has gone wrong.

Is there a way to manually delete the TLS certificate store and recreate it? That looks like where the error is occurring.

mrmarkuz · June 16, 2025, 7:32am

Maybe it helps to reset _default_cert.yml to use a self-signed certificate? See also ns8-traefik/imageroot/actions/create-module/50create at commit 3249b89fb3812658d4cabc41e5c167976a5ea5dd in the NethServer/ns8-traefik repository on GitHub.
Please back up the system first, or at least back up the original _default_cert.yml file, just in case something goes wrong.
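A small guard for the file backup, meant to be pasted into the shell opened by runagent -m traefik1, where configs/_default_cert.yml is the relative path used in the reset steps of this reply. The function name backup_once is hypothetical; it refuses to overwrite an existing backup so a second run cannot clobber the original:

```shell
# Copy FILE to FILE.bak (preserving mode and timestamps) unless FILE.bak
# already exists. Prints the backup path when a new copy is made.
backup_once() {
  local file=$1 backup="$1.bak"
  if [[ ! -f $file ]]; then
    echo "no such file: $file" >&2
    return 1
  fi
  if [[ -e $backup ]]; then
    # Keep the first backup: it is the known-good original.
    echo "backup already exists, leaving it alone: $backup" >&2
    return 0
  fi
  cp -p -- "$file" "$backup" && echo "$backup"
}
```

Usage: backup_once configs/_default_cert.yml. To roll back later, cp -p configs/_default_cert.yml.bak configs/_default_cert.yml from the same environment.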

Enter the Traefik environment:

runagent -m traefik1

Reset _default_cert.yml:

cat <<EOF > configs/_default_cert.yml
tls:
  stores:
    default:
      defaultCertificate:
        certFile: /etc/traefik/selfsigned.crt
        keyFile: /etc/traefik/selfsigned.key
EOF
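The same reset can be wrapped in a function that writes the file atomically: the YAML goes to a temporary file in the same directory and is then renamed over _default_cert.yml, so a process watching configs/ (assumed here to be Traefik’s file provider) never reads a half-written file. The function name is hypothetical, and the YAML body is copied unchanged from the heredoc above:

```shell
# Atomically (re)write the self-signed default-certificate config.
# TARGET defaults to the relative path used inside runagent -m traefik1.
write_default_cert_config() {
  local target=${1:-configs/_default_cert.yml}
  local dir tmp
  dir=$(dirname -- "$target")
  # Temp name has no .yml suffix (assumption: Traefik only loads .yml/.yaml/.toml).
  tmp=$(mktemp "$dir/.default_cert.XXXXXX") || return 1
  cat >"$tmp" <<'EOF' || { rm -f -- "$tmp"; return 1; }
tls:
  stores:
    default:
      defaultCertificate:
        certFile: /etc/traefik/selfsigned.crt
        keyFile: /etc/traefik/selfsigned.key
EOF
  # mktemp creates the file as 0600; keep the old file's mode if there is one.
  if [[ -e $target ]]; then
    chmod --reference="$target" "$tmp"
  else
    chmod 0644 "$tmp"
  fi
  # rename(2) within one directory replaces the target in a single step.
  mv -f -- "$tmp" "$target"
}
```

Run write_default_cert_config with no argument from inside runagent -m traefik1 to target configs/_default_cert.yml.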