Migrating a mail server instance on a separate data volume: another journey šŸ˜‰

Me again :blush:

So I'm about to migrate a fairly big NS7 mail server to NS8. I'm trying a hybrid setup with the mail repository stored on bind-mounted slow storage, like I successfully did with Nextcloud.
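
For Nextcloud, the trick boiled down to a bind mount on the slow disk plus disabling SELinux labeling on the container. A minimal sketch of the idea, with hypothetical paths and a throwaway image:

# sketch: bind-mount slow storage into a container with SELinux labeling disabled
podman run --rm --security-opt label=disable \
  --volume /mnt/slowdisk/data:/srv/data \
  docker.io/library/alpine:3.20 ls -la /srv/data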

Mail looks like another story. I have multiple questions:

  1. It looks like the script launches rsync container instances multiple times in a hardcoded way, but I can't find where. I need to add the infamous --security-opt label=disable parameter to avoid SELinux errors (see the sketch after these questions):
# ns8-migration.log excerpt
<7>podman-pull-missing ghcr.io/nethserver/rsync:3.9.2
<7>podman run --rm --privileged --network=host --workdir=/srv --env=RSYNCD_NETWORK=10.5.4.0/24 --env=RSYNCD_ADDRESS=cluster-localnode --env=RSYNCD_PORT=20005 --env=RSYNCD_USER=mail4 --env=RSYNCD_PASSWORD=17055676222ac0-2e0f-4db4-9769-bde36994bfe2 --env=RSYNCD_SYSLOG_TAG=mail4 --volume=/dev/log:/dev/log --replace --name=rsync-mail4 --volume=/home/mail4/.config/state:/srv/state --volume=rspamd-redis:/srv/volumes/rspamd-redis --volume=dovecot-data:/srv/volumes/dovecot-data --volume=clamav-cus-cfg:/srv/volumes/clamav-cus-cfg --volume=clamav-db:/srv/volumes/clamav-db --volume=clamav-cus:/srv/volumes/clamav-cus --volume=rspamd-data:/srv/volumes/rspamd-data --volume=rspamd-override:/srv/volumes/rspamd-override --volume=dovecot-lmtp:/srv/volumes/dovecot-lmtp --volume=postfix-queue:/srv/volumes/postfix-queue --volume=postfix-custom:/srv/volumes/postfix-custom ghcr.io/nethserver/rsync:3.9.2
  2. It could be linked to the problem above, but I'm not sure: there is a connection refused to the Dovecot API (http://127.0.0.1:9288/doveadm/v1). I suppose it is not running; any idea why?
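
Concretely, this is the invocation from the log above with the option added where I'd want it (everything else unchanged):

podman run --rm --privileged --security-opt label=disable --network=host --workdir=/srv --env=RSYNCD_NETWORK=10.5.4.0/24 --env=RSYNCD_ADDRESS=cluster-localnode --env=RSYNCD_PORT=20005 --env=RSYNCD_USER=mail4 --env=RSYNCD_PASSWORD=17055676222ac0-2e0f-4db4-9769-bde36994bfe2 --env=RSYNCD_SYSLOG_TAG=mail4 --volume=/dev/log:/dev/log --replace --name=rsync-mail4 --volume=/home/mail4/.config/state:/srv/state --volume=rspamd-redis:/srv/volumes/rspamd-redis --volume=dovecot-data:/srv/volumes/dovecot-data --volume=clamav-cus-cfg:/srv/volumes/clamav-cus-cfg --volume=clamav-db:/srv/volumes/clamav-db --volume=clamav-cus:/srv/volumes/clamav-cus --volume=rspamd-data:/srv/volumes/rspamd-data --volume=rspamd-override:/srv/volumes/rspamd-override --volume=dovecot-lmtp:/srv/volumes/dovecot-lmtp --volume=postfix-queue:/srv/volumes/postfix-queue --volume=postfix-custom:/srv/volumes/postfix-custom ghcr.io/nethserver/rsync:3.9.2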

Below is the entire log file; keep in mind that I restarted the process at some point:

<f..T...... default.txt
dr-xr-xr-x              6 2025/02/14 00:04:11 .
<7>podman-pull-missing ghcr.io/nethserver/rsync:3.9.2
<7>podman run --rm --privileged --network=host --workdir=/srv --env=RSYNCD_NETWORK=10.5.4.0/24 --env=RSYNCD_ADDRESS=cluster-localnode --env=RSYNCD_PORT=20005 --env=RSYNCD_USER=mail4 --env=RSYNCD_PASSWORD=17055676222ac0-2e0f-4db4-9769-bde36994bfe2 --env=RSYNCD_SYSLOG_TAG=mail4 --volume=/dev/log:/dev/log --replace --name=rsync-mail4 --volume=/home/mail4/.config/state:/srv/state --volume=rspamd-redis:/srv/volumes/rspamd-redis --volume=dovecot-data:/srv/volumes/dovecot-data --volume=clamav-cus-cfg:/srv/volumes/clamav-cus-cfg --volume=clamav-db:/srv/volumes/clamav-db --volume=clamav-cus:/srv/volumes/clamav-cus --volume=rspamd-data:/srv/volumes/rspamd-data --volume=rspamd-override:/srv/volumes/rspamd-override --volume=dovecot-lmtp:/srv/volumes/dovecot-lmtp --volume=postfix-queue:/srv/volumes/postfix-queue --volume=postfix-custom:/srv/volumes/postfix-custom ghcr.io/nethserver/rsync:3.9.2
Renamed 'users@fqdn' -> 'users'
...
Bogus entry ignored: `vmail@lebrass.be`
<6>Importing Always BCC Address...
<6>Importing smarthosts...
<6>Importing sender validation...
<6>Importing network table...
<7>BEGIN 
<7>DELETE FROM mynetworks
<7>COMMIT
renamed 'dkim.migration/default.private' -> 'dkim.migration/default.key'
./
./default.txt
./default.key
changed ownership of '/var/lib/rspamd/dkim/default.key' to 101:102
changed ownership of '/var/lib/rspamd/dkim/default.txt' to 101:102
removed 'dkim.migration/default.txt'
removed 'dkim.migration/default.key'
removed directory 'dkim.migration'
Traceback (most recent call last):
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/urllib3/util/connection.py", line 95, in create_connection
    raise err
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/urllib3/connection.py", line 239, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/lib64/python3.11/http/client.py", line 1303, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib64/python3.11/http/client.py", line 1349, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.11/http/client.py", line 1298, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.11/http/client.py", line 1058, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.11/http/client.py", line 996, in send
    self.connect()
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/urllib3/connection.py", line 205, in connect
    conn = self._new_conn()
           ^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fc388980150>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=9288): Max retries exceeded with url: /doveadm/v1 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc388980150>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mail4/.config/actions/import-module/50import_domains", line 30, in <module>
    mail.doveadm_query("mailboxCreate", {"mailbox": ["postmaster"], "user": "vmail"})
  File "/home/mail4/.config/pypkg/mail.py", line 275, in doveadm_query
    oresp = requests.post(f"http://127.0.0.1:{dport}/doveadm/v1", json=req, headers={"Authorization": "X-Dovecot-API " + atok}).json()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/requests/adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=9288): Max retries exceeded with url: /doveadm/v1 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc388980150>: Failed to establish a new connection: [Errno 111] Connection refused'))
""----------- start nethserver-mail Wed, 23 Jul 2025 15:29:24 +0200
mkdir: created directory '/var/lib/nethserver/nethserver-ns8-migration/nethserver-sogo'
Assertion failed
  File "/var/lib/nethserver/cluster/actions/import-module/50import", line 50, in <module>
    agent.assert_exp(add_module_result['exit_code'] == 0) # add-module is successful
[INFO] Created remote module instance mail1
[INFO] App nethserver-mail is bound to rsync://mail1@10.5.4.1:20004, waiting for task module/mail1/task/3b4e4436-f9de-465b-aaaf-4524db27112b
----------- start nethserver-mail Wed, 23 Jul 2025 15:30:06 +0200
[INFO] App nethserver-mail is bound to rsync://mail1@10.5.4.1:20004, waiting for task module/mail1/task/3b4e4436-f9de-465b-aaaf-4524db27112b
[INFO] Created remote module instance sogo1
[INFO] App nethserver-sogo is bound to rsync://sogo1@10.5.4.1:20006, waiting for task module/sogo1/task/db52c40f-7238-4bf2-8a0d-c31be1753309
----------- sync nethserver-mail Wed, 23 Jul 2025 15:40:50 +0200
<f+++++++++ accounts.json
<f+++++++++ clamd.json
<f+++++++++ domains.json
<f+++++++++ dovecot.json
<f+++++++++ groups.json

EDIT: Tried this without success: Cluster and sd error mail and dovecot updating - #8 by davidep.

Thanks!

Here is the migrate script.

I think it needs to be done in all the systemd unit files, as all the containers use volumes, but I'm not sure.

[mail1@ns8rockytest state]$ podman ps --format "{{.Names}}\t{{.Mounts}}"
clamav	[/var/lib/clamav /var/lib/clamav-unofficial-sigs /etc/clamav-unofficial-sigs]
rspamd	[/dev/log /var/lib/rspamd /etc/rspamd/override.d /var/lib/redis]
postfix	[/srv /etc/ssl/postfix /dev/log /var/lib/umail /var/spool/postfix /etc/postfix/main.cf.d]
dovecot	[/dev/log /etc/ssl/dovecot /var/lib/umail /var/lib/vmail /etc/dovecot/local.conf.d /var/lib/dovecot/dict]
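
As a starting point, this would list which of the module's user units reference volumes, and would therefore presumably need the option too (a sketch, assuming the units live under the module user's ~/.config/systemd/user):

runagent -m mail1 bash -c 'grep -l -- "--volume" ~/.config/systemd/user/*.service'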

Let's check if it's running (no output means it's OK):

runagent -m mail1 curl http://127.0.0.1:9288/doveadm/v1

Please also check which containers are started:

runagent -m mail1 podman ps -a

Thanks. I had found it before, but the thing is, the rsync parameters don't appear in that code, nor do the calls to podman. I'm probably missing something.

It doesn't. Actually, the dovecot container doesn't start; here is what I found in the log:

Jul 24 06:43:44 cloud8 dovecot[2500]: systemctl --user --quiet is-enabled clamav.service
Jul 24 06:43:44 cloud8 dovecot[2502]: Error: requires at least 1 arg(s), only received 0
Jul 24 06:43:44 cloud8 systemd[1549]: dovecot.service: Control process exited, code=exited, status=125/n/a
Jul 24 06:43:44 cloud8 systemd[1549]: dovecot.service: Failed with result 'exit-code'.
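
The "Error: requires at least 1 arg(s), only received 0" line looks like podman complaining that one of its subcommands got no arguments, which would point at a mangled ExecStart. A quick way to see what systemd actually runs (assuming the instance is mail1):

runagent -m mail1 systemctl --user show dovecot -p ExecStart --no-pager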

I believe I’ll try a normal migration first before bind-mounting.

Working. I really need to understand how this migrate script works, to be able to adapt it to my use case. There are parts of the logs for which I can't find corresponding lines in the code.

Or find a way to work around those SELinux issues.

Looking at the systemd files, only dovecot is using the dovecot-data volume.

OK, so part of the migration process runs on the NS8 side :sweat_smile: I guess part of the answer is here: ns8-mail/imageroot/actions/import-module/35migrate_maildirs at bc0d115073e8fc32ec3a53899aa892709ffa597b Ā· NethServer/ns8-mail Ā· GitHub
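
To map log lines to that code while the action runs, tailing the journal on the NS8 node and filtering on the instance name is crude but workable (assuming the instance is mail1):

journalctl -f | grep mail1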

Did you already edit the dovecot service file? It seems there’s an error.

Please share the systemd service file content:

systemctl --user cat dovecot --no-pager

Yes. I double-checked it and could not find anything wrong. I restored a snapshot in the meantime and can't check it anymore.

Still, I'd like to edit the source code that generates this file and the other rsync calls, in order to add the SELinux option before trying again.

The mail app is just installed with its service files, so there's no special source code that creates the file; see ns8-mail/imageroot/systemd/user/dovecot.service at main Ā· NethServer/ns8-mail Ā· GitHub
Maybe I'm missing something, but why not just add the SELinux option to the containers, as done with Nextcloud?

I did. It didn’t work :grimacing:

OK for the service file, but what about the calls to rsync (which also need the SELinux parameter)?

Sorry, I still don't get it. Doesn't it work like with Nextcloud? The rsync containers should be in the mail app environment: stop, edit and run again.
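
In other words, something along these lines (a sketch; rsync-mail1 follows the naming pattern seen in your log):

runagent -m mail1 podman stop rsync-mail1

then re-run the exact podman run command from the log, with --security-opt label=disable added as in the sketch earlier in the thread.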

Yes, because there was an error in the dovecot service file.

Ok… Will try again and report back.

I decided to go another way, by simply joining a worker node running on the slow storage. The least I can say is that it is not straightforward at all, even in this supported setup.

One of the problems is that the scripts don't log much (especially not the commands they launch); the logs are scattered between the target and source machines, and the process is definitely not defensive or robust enough. It fails often, for unclear reasons, even after studying the logs and the script thoroughly. Worse, when it fails you'd better start over, since it rarely survives an interruption.
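
For instance, re-running the sync step under shell tracing would at least record every command it launches (a sketch; the script path is hypothetical and depends on the nethserver-ns8-migration package layout):

# hypothetical path: point this at the app's migrate script
bash -x ./migrate 2>&1 | tee migrate-trace.log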

I’d love to report my findings but at this stage I can only say that it failed for unknown reasons. More to come.

I'm happy to report that, in the end, ā€œall went wellā€.

Actually, the migration process had finished (I didn't know) but had hung for whatever reason, and I had to restart it (500 GB transferred).

IMHO, this whole migration procedure should be moved out of the GUI and launched from the CLI, in order to get better control and visibility of what's happening. The logs need to be improved with a clear view of the different phases and progress reporting.

I'm a GUI addict, but in this case the added value is zero, and it makes everything unclear and fault-prone.

Thanks everybody, and @mrmarkus especially :blush:
