Banner Flood in NetH8 - "Done"

yummiweb · March 25, 2025, 11:55pm

Since the last core update, the web config has been barely usable.

Banners pop up every second with the message “Done” (but no further details).

Observed on various NetH8 systems with at least:
core1 3.6.1
ldapproxy1 1.1.0
loki1 1.2.2
metrics1 0.1.6
samba1 2.4.1
traefik1 3.0.1

After a page refresh in the browser, there’s silence for a few seconds, then it resumes, less quickly but steadily.

My browser: Waterfox 6.5.4

Regards, Yummiweb

mrmarkuz · March 26, 2025, 8:19am

It seems an app is restarting/reconfiguring all the time.

Could you please check the logs for errors or repeating entries?

I assume you already tried another browser to exclude browser issues…

yummiweb · March 27, 2025, 12:31am

As far as I can tell from the rhythm of the logs, this has something to do with “traefik1” and “mail1.” Accordingly, my mail proxy can no longer deliver emails to Dovecot. My email program can no longer establish an SMTP connection to “mail1.”

Below are two significant, recurring log sections.

By the way, the IP address 192.168.118.170 is my Mac (with the web config open in the browser). Testing other browsers is difficult because they always display the “Connection unstable” message.

traefik1

2025-03-27T01:09:01+01:00 [1:traefik1:agent@traefik1] task/module/traefik1/098b12c5-a7fc-44c9-b16c-b4e4e6b309a5: get-certificate/20readconfig is starting
2025-03-27T01:09:01+01:00 [1:traefik1:traefik] 192.168.118.170 - - [27/Mar/2025:00:09:01 +0000] “GET /cluster-admin/api/module/traefik1/task/098b12c5-a7fc-44c9-b16c-b4e4e6b309a5/context HTTP/2.0” 200 237 “-” “-” 61724 “cluster-admin-https@file” “http://127.0.0.1:9311” 20ms
2025-03-27T01:09:01+01:00 [1:traefik1:traefik] 192.168.118.170 - - [27/Mar/2025:00:09:01 +0000] “GET /cluster-admin/api/module/traefik1/task/098b12c5-a7fc-44c9-b16c-b4e4e6b309a5/context HTTP/2.0” 200 237 “-” “-” 61725 “cluster-admin-https@file” “http://127.0.0.1:9311” 30ms
2025-03-27T01:09:01+01:00 [1:traefik1:agent@traefik1] task/module/traefik1/098b12c5-a7fc-44c9-b16c-b4e4e6b309a5: action “get-certificate” status is “completed” (0) at step validate-output.json
2025-03-27T01:09:02+01:00 [1:traefik1:traefik] 192.168.118.170 - - [27/Mar/2025:00:09:02 +0000] “GET /cluster-admin/api/module/traefik1/task/098b12c5-a7fc-44c9-b16c-b4e4e6b309a5/context HTTP/2.0” 200 237 “-” “-” 61726 “cluster-admin-https@file” “http://127.0.0.1:9311” 16ms
2025-03-27T01:09:02+01:00 [1:traefik1:traefik] 192.168.118.170 - - [27/Mar/2025:00:09:02 +0000] “GET /cluster-admin/api/module/traefik1/task/098b12c5-a7fc-44c9-b16c-b4e4e6b309a5/context HTTP/2.0” 200 237 “-” “-” 61727 “cluster-admin-https@file” “http://127.0.0.1:9311” 38ms
2025-03-27T01:09:02+01:00 [1:traefik1:traefik] 192.168.118.170 - - [27/Mar/2025:00:09:02 +0000] “GET /cluster-admin/api/module/traefik1/task/098b12c5-a7fc-44c9-b16c-b4e4e6b309a5/status HTTP/2.0” 200 6933 “-” “-” 61728 “cluster-admin-https@file” “http://127.0.0.1:9311” 16ms
2025-03-27T01:09:09+01:00 [1:traefik1:agent@traefik1] task/module/traefik1/d790fa5b-600a-4a1a-b686-2a6f15288f12: get-certificate/20readconfig is starting
2025-03-27T01:09:09+01:00 [1:traefik1:agent@traefik1] task/module/traefik1/d790fa5b-600a-4a1a-b686-2a6f15288f12: action “get-certificate” status is “completed” (0) at step validate-output.json
2025-03-27T01:09:11+01:00 [1:traefik1:traefik] 192.168.118.170 - - [27/Mar/2025:00:09:11 +0000] “GET /cluster-admin/api/module/traefik1/task/d790fa5b-600a-4a1a-b686-2a6f15288f12/context HTTP/2.0” 200 237 “-” “-” 61729 “cluster-admin-https@file” “http://127.0.0.1:9311” 21ms
2025-03-27T01:09:11+01:00 [1:traefik1:traefik] 192.168.118.170 - - [27/Mar/2025:00:09:11 +0000] “GET /cluster-admin/api/module/traefik1/task/d790fa5b-600a-4a1a-b686-2a6f15288f12/context HTTP/2.0” 200 237 “-” “-” 61731 “cluster-admin-https@file” “http://127.0.0.1:9311” 27ms
2025-03-27T01:09:11+01:00 [1:traefik1:traefik] 192.168.118.170 - - [27/Mar/2025:00:09:11 +0000] “GET /cluster-admin/api/module/traefik1/task/d790fa5b-600a-4a1a-b686-2a6f15288f12/context HTTP/2.0” 200 237 “-” “-” 61732 “cluster-admin-https@file” “http://127.0.0.1:9311” 36ms
2025-03-27T01:09:11+01:00 [1:traefik1:traefik] 192.168.118.170 - - [27/Mar/2025:00:09:11 +0000] “GET /cluster-admin/api/module/traefik1/task/d790fa5b-600a-4a1a-b686-2a6f15288f12/context HTTP/2.0” 200 237 “-” “-” 61730 “cluster-admin-https@file” “http://127.0.0.1:9311” 47ms
2025-03-27T01:09:12+01:00 [1:traefik1:traefik] 192.168.118.170 - - [27/Mar/2025:00:09:12 +0000] “GET /cluster-admin/api/module/traefik1/task/d790fa5b-600a-4a1a-b686-2a6f15288f12/status HTTP/2.0” 200 6934 “-” “-” 61733 “cluster-admin-https@file” “http://127.0.0.1:9311” 21ms
2025-03-27T01:09:17+01:00 [1:traefik1:agent@traefik1] task/module/traefik1/a2b694e1-e07e-40c3-a78f-a08835bbd5d8: get-certificate/20readconfig is starting

mail1

2025-03-27T01:23:49+01:00 [1:mail1:postfix] deb833547fee9765d82dd712d5ec6a035483d023988623cddd6a03b132db2f1f
2025-03-27T01:23:49+01:00 [1:mail1:systemd] postfix.service: Consumed 2.240s CPU time.
2025-03-27T01:23:49+01:00 [1:mail1:systemd] postfix.service: Scheduled restart job, restart counter is at 13168.
2025-03-27T01:23:49+01:00 [1:mail1:systemd] Stopped postfix.service - Postfix MTA/MSA server.
2025-03-27T01:23:49+01:00 [1:mail1:systemd] postfix.service: Consumed 2.240s CPU time.
2025-03-27T01:23:49+01:00 [1:mail1:systemd] Starting get-certificate.service - Get TLS certificate from Traefik…
2025-03-27T01:23:51+01:00 [1:mail1:get-certificate] Certificate for mysrv.mydomain.tld is unchanged.
2025-03-27T01:23:51+01:00 [1:mail1:systemd] Finished get-certificate.service - Get TLS certificate from Traefik.
2025-03-27T01:23:51+01:00 [1:mail1:systemd] Starting postfix.service - Postfix MTA/MSA server…
2025-03-27T01:23:52+01:00 [1:mail1:postfix] systemctl --user --quiet is-enabled clamav.service
2025-03-27T01:23:53+01:00 [1:mail1:podman] 2025-03-27 01:23:53.015978317 +0100 CET m=+0.043328821 image pull ghcr.io/nethserver/mail-postfix:1.6.0
2025-03-27T01:23:53+01:00 [1:mail1:podman]
2025-03-27T01:23:53+01:00 [1:mail1:podman] 2025-03-27 01:23:53.171569653 +0100 CET m=+0.198920194 container create ce32a036bdda900e5a2e3027228081033277db92cff2520bf045e60e869226fe (image=ghcr.io/nethserver/mail-postfix:1.6.0, name=postfix, io.buildah.version=1.33.7, PODMAN_SYSTEMD_UNIT=postfix.service)
2025-03-27T01:23:53+01:00 [1:mail1:podman] 2025-03-27 01:23:53.301344332 +0100 CET m=+0.328694868 container init ce32a036bdda900e5a2e3027228081033277db92cff2520bf045e60e869226fe (image=ghcr.io/nethserver/mail-postfix:1.6.0, name=postfix, io.buildah.version=1.33.7, PODMAN_SYSTEMD_UNIT=postfix.service)
2025-03-27T01:23:53+01:00 [1:mail1:podman] 2025-03-27 01:23:53.324028981 +0100 CET m=+0.351379519 container start ce32a036bdda900e5a2e3027228081033277db92cff2520bf045e60e869226fe (image=ghcr.io/nethserver/mail-postfix:1.6.0, name=postfix, io.buildah.version=1.33.7, PODMAN_SYSTEMD_UNIT=postfix.service)
2025-03-27T01:23:55+01:00 [1:mail1:postfix/postfix-script] the Postfix mail system is not running
2025-03-27T01:23:56+01:00 [1:mail1:postfix/postfix-script] starting the Postfix mail system
2025-03-27T01:23:56+01:00 [1:mail1:postfix] postfix/postlog: starting the Postfix mail system
2025-03-27T01:23:56+01:00 [1:mail1:postfix/master] fatal: bind 0.0.0.0 port 25: Address in use
2025-03-27T01:23:57+01:00 [1:mail1:podman] 2025-03-27 01:23:57.550371387 +0100 CET m=+0.152226873 container remove ce32a036bdda900e5a2e3027228081033277db92cff2520bf045e60e869226fe (image=ghcr.io/nethserver/mail-postfix:1.6.0, name=postfix, PODMAN_SYSTEMD_UNIT=postfix.service, io.buildah.version=1.33.7)
2025-03-27T01:23:57+01:00 [1:mail1:postfix] ce32a036bdda900e5a2e3027228081033277db92cff2520bf045e60e869226fe
2025-03-27T01:23:57+01:00 [1:mail1:systemd] postfix.service: Consumed 1.933s CPU time.
2025-03-27T01:23:57+01:00 [1:mail1:systemd] postfix.service: Scheduled restart job, restart counter is at 13169.
2025-03-27T01:23:57+01:00 [1:mail1:systemd] Stopped postfix.service - Postfix MTA/MSA server.
2025-03-27T01:23:57+01:00 [1:mail1:systemd] postfix.service: Consumed 1.933s CPU time.
2025-03-27T01:23:57+01:00 [1:mail1:systemd] Starting get-certificate.service - Get TLS certificate from Traefik…
2025-03-27T01:23:58+01:00 [1:mail1:get-certificate] Certificate for mysrv.mydomain.tld is unchanged.
2025-03-27T01:23:59+01:00 [1:mail1:systemd] Finished get-certificate.service - Get TLS certificate from Traefik.
2025-03-27T01:23:59+01:00 [1:mail1:systemd] Starting postfix.service - Postfix MTA/MSA server…
2025-03-27T01:24:00+01:00 [1:mail1:podman]
2025-03-27T01:24:01+01:00 [1:mail1:systemd] Started libpod-bc93d8401940578266334063a652e01116445d250d48223b10449b836272f5b3.scope - libcrun container.
2025-03-27T01:24:01+01:00 [1:mail1:podman] 2025-03-27 01:24:01.118615142 +0100 CET m=+0.345231916 container init bc93d8401940578266334063a652e01116445d250d48223b10449b836272f5b3 (image=ghcr.io/nethserver/mail-postfix:1.6.0, name=postfix, PODMAN_SYSTEMD_UNIT=postfix.service, io.buildah.version=1.33.7)
2025-03-27T01:24:01+01:00 [1:mail1:podman] 2025-03-27 01:24:01.143741757 +0100 CET m=+0.370358559 container start bc93d8401940578266334063a652e01116445d250d48223b10449b836272f5b3 (image=ghcr.io/nethserver/mail-postfix:1.6.0, name=postfix, io.buildah.version=1.33.7, PODMAN_SYSTEMD_UNIT=postfix.service)

“mysrv.mydomain.tld” is, of course, a placeholder for my externally accessible domain…

Addendum:
The mail1 container is version 1.6.0

Addendum 2:
BOTH problem servers are running the latest CORE versions of NETH8.
BOTH were restarted about 30 hours ago (for different reasons).
Unfortunately, I can’t say at this point whether the problem occurred with the core update or only after the system reboot (all Debian 12). I believe the core update was before that.

Addendum 3:
I’ve now tried stopping the containers (to restart them), but it doesn’t work for individual containers:

SERVICE=mail1
# Note: the "echo" before systemctl only prints the command; nothing is executed
for userhome in /home/$SERVICE ; do moduleid=$(basename $userhome); echo ${moduleid}; echo systemctl stop user@$(id -u $moduleid); echo; done

And it doesn’t work for all containers either:

# Note: the "echo" before systemctl only prints the command; nothing is executed
for userhome in /home/*[0-9]; do moduleid=$(basename $userhome); echo ${moduleid}; echo systemctl stop user@$(id -u $moduleid); echo; done

Nothing is stopped, nothing is restarted.

Do I need to use different commands nowadays?

I’d like to keep the state for further analysis, but I also need to access emails again. So, unfortunately, I have to restart now.

davidep · March 27, 2025, 9:27am

Postfix has entered a crash-loop because port 25 is already used by another process (exim?).

Edit: I agree that the “banner flood” should not happen. It is probably an undesired behavior introduced by the latest update. However, the issue here seems to be the Postfix startup failure, so I changed the topic category to Support.
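Such a crash loop can be confirmed from a saved log excerpt by looking for the systemd restart counter and the Postfix bind error. The following is a minimal sketch that only parses text on stdin; the function name `summarize_crash_loop` is hypothetical, and it assumes the message formats shown in the mail1 log above.

```shell
# Hypothetical helper: summarize a systemd crash loop from log lines on stdin.
# Assumes the message formats seen in the mail1 log:
#   "... Scheduled restart job, restart counter is at 13169."
#   "... fatal: bind 0.0.0.0 port 25: Address in use"
summarize_crash_loop() {
    awk '
        /restart counter is at/ {
            # The counter is the last field; strip the trailing period.
            n = $NF
            sub(/\.$/, "", n)
            last = n
        }
        /fatal: bind .* Address in use/ {
            # Keep only the "address port N" part of the most recent bind error.
            bind = $0
            sub(/.*fatal: bind /, "", bind)
            sub(/: Address in use.*/, "", bind)
        }
        END {
            if (last != "") print "restart counter: " last
            if (bind != "") print "bind failure: " bind
        }
    '
}
```

For example, `summarize_crash_loop < mail1.log` (with `mail1.log` a hypothetical file holding the copied log lines) would report the latest restart counter and the address and port that could not be bound.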

yummiweb · March 27, 2025, 10:07am

There’s nothing unusual on the NETH8 host other than “mailutils,” which I use in scripts to send certain results. As far as I know, that’s just a toolset.

In fact, an Exim process seems to be running (it was already there), listening on both IPv4 and IPv6:

sudo lsof -i :25

exim4 44658 Debian-exim 4u IPv4 381425 0t0 TCP localhost:smtp (LISTEN)
exim4 44658 Debian-exim 5u IPv6 381426 0t0 TCP localhost:smtp (LISTEN)

Port 25 is also in use on the working machines, but under a different user context. There, of course, the listener comes from “mail1”:

master 21457 mail1 12u IPv4 408976 0t0 TCP *:smtp (LISTEN)
master 21457 mail1 13u IPv6 408977 0t0 TCP *:smtp (LISTEN)

For comparison, here’s another machine on which the NethServer 7 migration is still pending:

exim4 44658 Debian-exim 4u IPv4 381425 0t0 TCP localhost:smtp (LISTEN)
exim4 44658 Debian-exim 5u IPv6 381426 0t0 TCP localhost:smtp (LISTEN)
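The relevant difference is in the COMMAND and USER columns. As a sketch, assuming the default `lsof` column order (COMMAND, PID, USER, FD, TYPE, DEVICE, SIZE/OFF, NODE, NAME), a hypothetical `listen_owners` helper can reduce that output to unique command/user pairs of listening sockets:

```shell
# Hypothetical helper: reduce "lsof -i :PORT" output (stdin) to unique
# "COMMAND USER" pairs for sockets in LISTEN state.
# Assumes the default lsof column order: COMMAND PID USER FD TYPE ...
# The header line ends in "NAME", so it is skipped automatically.
listen_owners() {
    awk '$NF == "(LISTEN)" { print $1, $3 }' | sort -u
}
```

With `sudo lsof -i :25 | listen_owners`, a healthy node would print `master mail1`, while the affected one prints `exim4 Debian-exim`, matching the outputs quoted above.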

I have neither installed nor activated Exim myself. My installation history is:

history | grep install

apt-get install sudo
sudo apt-get install mc
sudo apt-get install curl
curl https://raw.githubusercontent.com/NethServer/ns8-core/ns8-stable/core/install.sh | bash
apt-get install rsync
apt-get install mailutils

“mailutils” was also installed on the working machines. However, these have not been rebooted since then, nor have they received any updates that would trigger a “mail1” restart. I will, of course, avoid that from now on.

But if I never enabled Exim as a service myself, why is it active now?

yummiweb · March 27, 2025, 10:19am

Exim appears to be activated at system startup:

systemctl is-enabled exim4

exim4.service is not a native service, redirecting to systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install is-enabled exim4
enabled

However, this was already the case before the “mail1” installation or the migration from NethServer 7:

exim4.service is not a native service, redirecting to systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install is-enabled exim4
enabled

And also on the working NETH8 system (older version):

exim4.service is not a native service, redirecting to systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install is-enabled exim4
enabled

Exim seems to be part of the “standard repertoire,” and I would have expected the NETH8 routines to disable it during the mail container installation. Was it re-enabled by a host system update? Or was there a routine that stopped the “native” Exim when “mail1” started, and that routine no longer works?

Addendum: Or, if no such routine exists, could one be implemented?

yummiweb · March 27, 2025, 10:33am

After terminating the Exim process (running as user Debian-exim), the Postfix “master” process of “mail1” was able to start.

I’m rebooting now. I expect the problem will repeat itself.

Result: the same situation. The “native” exim4 process is running.

Can you please reproduce this under Debian 12?

For now, I’ll use this:

sudo kill $(sudo lsof -i :25 | grep 'exim4' | head -n 1 | awk '{print $2}')

so that I can possibly set up an autostart.
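Rather than killing Exim after every boot, a possible alternative is to stop it and disable it on the host, assuming nothing on the host needs a local SMTP listener. This sketch uses only standard `systemctl` verbs; `disable_host_exim` and the `DRY_RUN` switch are hypothetical, added so the commands can be previewed first:

```shell
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop the host Exim now and keep it from starting at boot, so the
# containerized Postfix of mail1 can bind port 25. Must run as root.
disable_host_exim() {
    run systemctl stop exim4
    run systemctl disable exim4
}
```

Preview with `DRY_RUN=1 disable_host_exim`, then run it as root; afterwards `systemctl is-enabled exim4` should no longer report `enabled`. Since `exim4` is a SysV init script here, `disable` is forwarded to `systemd-sysv-install`, just like `is-enabled` in the output above.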

But it would be nice if someone could replicate this to find out if it’s a result of the latest core updates or one of the latest Debian updates.

And one more thing: the banners kept popping up in the web GUI. It seems some of them had been buffered. Only after a page reload did things clear up.

davidep · March 27, 2025, 10:52am

Exim4 can be installed by the Debian installer. I found a couple of threads about it:

yummiweb · March 28, 2025, 7:37pm

When I install Debian for server purposes, I deliberately install only “minimal” and “ssh,” and afterward, no Exim is installed. I just tested this on various (existing) Debian 12 installations (for server services), and at least there, Exim isn’t installed. I’ve also checked again on a Debian 12 / NETH8 installation for which I have a snapshot taken directly BEFORE the NETH8 installation. Port 25 isn’t used there either, and there’s no Exim there either.

This changes, however, with the installation of “mailutils.” What (in my understanding) was supposed to be just a toolset actually installs Exim and a bunch of other stuff. What a bummer.
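One way to spot this before it happens is to simulate the installation: `apt-get -s install mailutils` lists every package apt would install as an `Inst` line. The hypothetical helper below flags common MTA packages that run an SMTP listener (Exim, Postfix, Sendmail, OpenSMTPD) in that plan. Installing with `--no-install-recommends` may also avoid the MTA, assuming it is pulled in as a recommended package rather than a hard dependency.

```shell
# Hypothetical helper: read "apt-get -s install ..." output on stdin and
# print the MTA packages that would be installed.
# Exit status: 0 if at least one MTA package is in the plan, 1 otherwise.
mta_in_plan() {
    awk '
        $1 == "Inst" && $2 ~ /^(exim4|postfix|sendmail|opensmtpd)/ {
            print $2
            found = 1
        }
        END { exit found ? 0 : 1 }
    '
}
```

`apt-get -s install mailutils | mta_in_plan` prints the MTA packages and exits with status 0 if the plan contains any.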

Admittedly, I wasn’t paying attention. My belief that the host system wouldn’t affect the containers got in the way. But of course, that only holds for the packages themselves; the host must not use ports that container services need.

Why this ultimately happened:
The “mailutils” installation procedure suggested and installed Exim because it didn’t see any other MTA. It couldn’t detect that a container-based MTA was already available, and it apparently didn’t perform an alternative check, for example whether port 25 was already in use. I would have expected that, though; normally a corresponding warning appears at least when a service is set up. There wasn’t one here. But of course, this isn’t the fault of NETH8, but of the “mailutils” installation procedure.

The only question one might ask is whether there is a standard way to inform the host system which ports are already reserved or in use, so that subsequent installations can then deny use of these ports.

Thank you for your support!

yummiweb · March 28, 2025, 7:43pm

I would just like to briefly reiterate the following question that arose during troubleshooting:

I quote myself:

I’ve now tried stopping the containers (to restart them), but it doesn’t work for individual containers:
SERVICE=mail1
for userhome in /home/$SERVICE ; do moduleid=$(basename $userhome); echo ${moduleid}; echo systemctl stop user@$(id -u $moduleid); echo; done
And it doesn’t work for all containers either:
for userhome in /home/*[0-9]; do moduleid=$(basename $userhome); echo ${moduleid}; echo systemctl stop user@$(id -u $moduleid); echo; done
Nothing is stopped, nothing is restarted.

Do I need to use different commands nowadays?

Regards Yummiweb

davidep · March 31, 2025, 7:49am

Yes, this has also been discussed in the past, and the decision was that the installer checks only the ports used by the core apps: TCP 80 and 443.
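If an admin wants a similar check for additional ports before installing host packages or apps, a sketch could parse `ss -ltnH` (listening TCP sockets, numeric, no header). The helper `ports_in_use` and the choice of ports are illustrative assumptions, not part of the NS8 installer:

```shell
# Hypothetical helper: read "ss -ltnH" output on stdin and print each of the
# given ports that already has a listening socket.
# Column 4 of ss output is "Local Address:Port" (e.g. 0.0.0.0:25, [::1]:25,
# *:443), so the port is whatever follows the last colon.
ports_in_use() {
    awk -v ports="$*" '
        BEGIN { n = split(ports, wanted, " ") }
        {
            port = $4
            sub(/.*:/, "", port)
            listening[port] = 1
        }
        END {
            for (i = 1; i <= n; i++) {
                p = wanted[i]
                if (p in listening) print p
            }
        }
    '
}
```

`ss -ltnH | ports_in_use 25 80 443` prints each of the listed ports that already has a listener.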

The command above stops the whole app session, agent included, and this is not a good thing because the app becomes unresponsive. For this reason, the app session must not be stopped; only a restart is safe. As a side note, those commands assume the home dir is under /home, which is true in most cases.
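Following that advice, the earlier loop could restart the session instead of stopping it. A minimal sketch, assuming that restarting the `user@UID.service` unit is the safe operation meant here and that each app runs as a Unix user named after its module ID; `restart_app_session` and `DRY_RUN` are hypothetical:

```shell
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Restart (never stop) the systemd user session of each given module.
# Assumes each module ID is also a Unix user name (e.g. mail1). Needs root.
restart_app_session() {
    for moduleid in "$@"; do
        uid=$(id -u "$moduleid") || return 1
        run systemctl restart "user@${uid}.service"
    done
}
```

For example, `DRY_RUN=1 restart_app_session mail1` prints the command, and running it as root without `DRY_RUN` executes it. Unlike the original loop, the `id` lookup does not depend on the home directory being under /home.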

If you want to stop just a container, you have to impersonate the app user, and the best way is to use runagent (which is based on runuser):

Get a list of running agents (applications) on the local node:

runagent -l

Execute a command (the module must be on the local node):

runagent -m $moduleid systemctl try-restart --user unitname.service

yummiweb · March 31, 2025, 8:44am

Thank you for the clarification. I had overlooked the logical consequences of “Stop.” One more question, though, regarding the “runagent” method:

runagent -m $moduleid systemctl try-restart --user unitname.service

Your example shows the variable “$moduleid” and also “unitname,” which is probably supposed to be a placeholder.

As I understand it, shouldn’t the same value, e.g., “mail1” or “samba1,” be used for both “$moduleid” and “unitname”? However, the example deliberately uses different names, so my assumption may be incorrect.

Could you please clarify that? Thank you!

This is how it continued for me:

On the NETH8 host, I uninstalled “mailutils,” including the (recommended) packages it had installed. In my case, there were several packages:

sudo apt purge \
  libmailutils9 libltdl7 libidn12 gsasl-common libntlm0 mysql-common \
  libfribidi0 mailutils exim4-config exim4-base libgsasl18 \
  libpython3.11 mailutils-common libevent-2.1-7 libunbound8 mariadb-common \
  libgssglue1 guile-3.0-libs libmariadb3 exim4-daemon-light libgc1 libpq5 \
  libgnutls-dane0 libncurses6

“psmisc” was left out because it came with my NETH8 installation (it seems), so it has to stay.

How I now send emails via the NETH8 host independently of the NETH8 containers:

For this, I now use “nullmailer.” It takes the role of “sendmail” or “postfix” (for sending only) but without binding to port 25. “nullmailer” requires minimal configuration during installation. It can send either directly, via a proxy, or via a full-fledged MX with login (SMTP authentication).
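For reference, nullmailer reads its relays from `/etc/nullmailer/remotes`, one per line: host, protocol, then options such as `--port`, `--starttls`, `--user` and `--pass`. The sketch below only builds such a line for an authenticated STARTTLS relay; the helper name, host, port and credentials are placeholders:

```shell
# Hypothetical helper: print one /etc/nullmailer/remotes line for an SMTP
# relay that uses STARTTLS and authentication.
# Usage: nullmailer_remote <host> <port> <user> <password>
# (values containing spaces are not supported by this simple format)
nullmailer_remote() {
    if [ "$#" -ne 4 ]; then
        echo "usage: nullmailer_remote <host> <port> <user> <password>" >&2
        return 2
    fi
    printf '%s smtp --port=%s --starttls --user=%s --pass=%s\n' "$1" "$2" "$3" "$4"
}
```

For example, `nullmailer_remote mail.example.org 587 alerts secret | sudo tee /etc/nullmailer/remotes` writes a single relay entry; since the file then contains a password, it should not be world-readable.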

Regards, Yummiweb

davidep · March 31, 2025, 8:55am

An example runagent invocation that stops the Postfix service of the mail1 instance:

 runagent -m mail1 systemctl --user stop postfix.service
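In that example, the module ID and the unit name are different things: `-m` selects the app instance (`mail1`), while the unit is a systemd user service running inside that app (`postfix.service`). A hypothetical wrapper that makes the split explicit might look like this:

```shell
# Hypothetical wrapper around runagent: perform a systemctl --user action on
# one unit inside one module. Usage: module_unit <moduleid> <action> <unit>
#   <moduleid>  the app instance, e.g. mail1 (see "runagent -l")
#   <action>    a systemctl verb, e.g. try-restart, stop, status
#   <unit>      a user unit of that app, e.g. postfix.service
# Set DRY_RUN to print the command instead of running it.
module_unit() {
    if [ "$#" -ne 3 ]; then
        echo "usage: module_unit <moduleid> <action> <unit>" >&2
        return 2
    fi
    set -- runagent -m "$1" systemctl --user "$2" "$3"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

`module_unit mail1 try-restart postfix.service` then corresponds to the `runagent` form above; `runagent -l` lists the valid module IDs on the local node.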

Nullmailer seems a good choice: it is an SMTP client (not a server holding port 25). If you’re looking for a more NS8-integrated tool, we recently added a helper command that inherits the email notification settings. You can send messages with

runagent ns8-sendmail -h

That helper was developed to implement password expiration notifications.