Migration to NS8 fails (test environment)

Hello,
We run NethServer mostly as an email server, with only one user using Nextcloud.

I set up a test environment with a copy of the NS7 VM and a freshly installed NS8 on Debian 12. Everything runs on Hyper-V 2019, with snapshots of the starting points so that I can start over.
I copied the NS7 VM into the LAN and changed its IP address to 192.168.11.151 and its DNS server to one installed specifically for this migration.

Connecting the NS7 server to the NS8 cluster was successful,
but when I start the email migration it copies about 7 GB and then fails.

NS7 can ping NS8 by its name (ns8.hinz.de) and its original IP address 192.168.11.152.
NS7 can ping the NS8 cluster by IP 10.5.4.1 (no DNS entry is set for this IP).
NS8 can ping NS7 by name (cloud.hinz.de) and IP 192.168.11.151.

The Roundcube version is nethserver-roundcubemail-1.5.2-1.ns7.noarch.
SOGo (nethserver-sogo-1.7.7-1.ns7.noarch) is in the list of installed RPMs, but we do not use it and it does not appear in the Software Center.
ns8-migration.log seems to point at SOGo just before the error.

On the NS8 Debian machine there is no firewall installed.

What can I do?

I have not yet found out how to upload a journalctl dump from the NS8 machine.

Failing command:

echo '{"app":"nethserver-mail","action":"start","migrationConfig":{"emailNode":1,"roundcubeNode":1}}' | /usr/bin/setsid /usr/bin/sudo /usr/libexec/nethserver/api/nethserver-ns8-migration/migration/update | jq

Modules installed on NS7:

nethserver-subscription-ui-3.6.10-1.ns7.noarch
nethserver-sogo-1.7.7-1.ns7.noarch
nethserver-lsm-1.2.4-1.ns7.noarch
nethserver-unbound-1.1.1-1.ns7.noarch
nethserver-mail-server-2.32.2-1.ns7.noarch
nethserver-base-3.9.1-1.ns7.noarch
nethserver-backup-config-2.5.3-1.ns7.noarch
nethserver-firewall-base-3.19.3-1.ns7.noarch
nethserver-httpd-admin-service-2.7.1-1.ns7.noarch
nethserver-collectd-3.1.1-1.ns7.noarch
nethserver-cockpit-lib-1.10.12-1.ns7.noarch
nethserver-mail-imapsync-2.32.2-1.ns7.noarch
nethserver-smartd-1.1.0-1.ns7.noarch
nethserver-subscription-inventory-3.6.10-1.ns7.x86_64
nethserver-memcached-1.2.0-1.ns7.noarch
nethserver-diagtools-1.0.4-1.ns7.noarch
nethserver-lang-en-1.4.6-27.ns7.noarch
nethserver-netdata-2.0.4-1.ns7.noarch
nethserver-fail2ban-1.7.3-1.ns7.noarch
nethserver-dnsmasq-1.7.2-1.ns7.noarch
nethserver-mysql-1.1.5-1.ns7.noarch
nethserver-sssd-1.7.1-1.ns7.noarch
nethserver-mail-smarthost-2.32.2-1.ns7.noarch
nethserver-duc-1.7.0-1.ns7.noarch
nethserver-openssh-1.8.0-1.ns7.noarch
nethserver-mail-filter-2.32.2-1.ns7.noarch
nethserver-antivirus-1.6.1-1.ns7.noarch
nethserver-lib-2.2.11-1.ns7.noarch
nethserver-ntp-1.1.3-1.ns7.noarch
nethserver-nextcloud-1.22.4-1.ns7.noarch
nethserver-phonehome-1.4.0-1.ns7.noarch
nethserver-ns8-migration-1.0.13-1.ns7.x86_64
nethserver-httpd-3.12.3-1.ns7.noarch
nethserver-lang-cockpit-1.4.6-27.ns7.noarch
nethserver-remi-php80-php-fpm-1.0.0-1.ns7.noarch
nethserver-directory-3.4.3-1.ns7.noarch
nethserver-yum-1.4.1-1.ns7.noarch
nethserver-roundcubemail-1.5.2-1.ns7.noarch
nethserver-mail-common-2.32.2-1.ns7.noarch
nethserver-nethforge-release-7-3.ns7.noarch
nethserver-php-1.3.0-1.ns7.noarch
nethserver-release-7-19.ns7.noarch
nethserver-cockpit-1.10.12-1.ns7.noarch
nethserver-httpd-admin-2.7.1-1.ns7.noarch
nethserver-mail-ipaccess-2.32.2-1.ns7.noarch
nethserver-backup-data-1.7.6-1.ns7.noarch
nethserver-subscription-3.6.10-1.ns7.noarch
nethserver-restore-data-2.0.7-1.ns7.noarch
nethserver-rh-mariadb105-1.0.0-1.ns7.noarch
nethserver-hosts-1.2.2-1.ns7.noarch

rpm -qa | grep sogo

sogo-ealarms-notify-4.0.2-1.ns7.x86_64
nethserver-sogo-1.7.7-1.ns7.noarch
sogo-activesync-4.0.2-1.ns7.x86_64
sogo-4.0.2-1.ns7.x86_64
sogo-tool-4.0.2-1.ns7.x86_64

ns8-migration.log:

=========== Join cluster Wed, 03 Jul 2024 09:45:48 +0200
Joined to cluster leader ns8.hinz.de
----------- start nethserver-mail Wed, 03 Jul 2024 09:47:11 +0200
mkdir: created directory ‘/var/lib/nethserver/nethserver-ns8-migration/nethserver-mail’
mkdir: created directory ‘/var/lib/nethserver/nethserver-ns8-migration/nethserver-roundcubemail’
mkdir: created directory ‘/var/lib/nethserver/nethserver-ns8-migration/nethserver-sogo’
/usr/sbin/ns8-bind-app: line 43: printf: null: invalid number
Validation errors: [node: Must be greater than or equal to 1]
[INFO] Created remote module instance mail1
[INFO] App nethserver-mail is bound to rsync://mail1@10.5.4.1:20012, waiting for task module/mail1/task/7f3c3810-7c3a-4ce7-ba06-233ba43ed99d
Assertion failed
  File "/var/lib/nethserver/cluster/actions/import-module/50import", line 50, in <module>
    agent.assert_exp(add_module_result['exit_code'] == 0) # add-module is successful

edit1: rpm sogo output added

Regards
Uwe

I will try to reproduce it, but I won’t have this volume of mail data :confused:

I migrated an NS7 with 400 MB of data without issue.

[root@NS2 ~]# rpm -qa | grep nethserver
nethserver-cockpit-1.10.12-1.ns7.noarch
nethserver-firewall-base-3.19.3-1.ns7.noarch
nethserver-hosts-1.2.2-1.ns7.noarch
nethserver-lsm-1.2.4-1.ns7.noarch
nethserver-directory-3.4.3-1.ns7.noarch
nethserver-antivirus-1.6.1-1.ns7.noarch
nethserver-duc-1.7.0-1.ns7.noarch
nethserver-postgresql-1.1.0-1.ns7.noarch
nethserver-phonehome-1.4.0-1.ns7.noarch
nethserver-base-3.9.1-1.ns7.noarch
nethserver-lib-2.2.11-1.ns7.noarch
nethserver-ntp-1.1.3-1.ns7.noarch
nethserver-nethforge-release-7-3.ns7.noarch
nethserver-conference-0.1.0-1.ns7.noarch
nethserver-rh-php73-php-fpm-1.0.0-1.ns7.noarch
nethserver-release-7-19.ns7.noarch
nethserver-memcached-1.2.0-1.ns7.noarch
nethserver-roundcubemail-1.5.2-1.ns7.noarch
nethserver-subscription-ui-3.6.10-1.ns7.noarch
nethserver-yum-1.4.1-1.ns7.noarch
nethserver-php-1.3.0-1.ns7.noarch
nethserver-dnsmasq-1.7.2-1.ns7.noarch
nethserver-smartd-1.1.0-1.ns7.noarch
nethserver-openssh-1.8.0-1.ns7.noarch
nethserver-httpd-admin-service-2.7.1-1.ns7.noarch
nethserver-unbound-1.1.1-1.ns7.noarch
nethserver-stephdl-1.1.9-1.ns7.sdl.noarch
nethserver-lang-cockpit-1.4.6-27.ns7.noarch
nethserver-subscription-inventory-3.6.10-1.ns7.x86_64
nethserver-backup-config-2.5.3-1.ns7.noarch
nethserver-sssd-1.7.1-1.ns7.noarch
nethserver-httpd-3.12.3-1.ns7.noarch
nethserver-mail-common-2.32.2-1.ns7.noarch
nethserver-mail-filter-2.32.2-1.ns7.noarch
nethserver-ns8-migration-1.0.13-1.ns7.x86_64
nethserver-mysql-1.1.5-1.ns7.noarch
nethserver-sogo-1.8.6-1.ns7.noarch
nethserver-collectd-3.1.1-1.ns7.noarch
nethserver-diagtools-1.0.4-1.ns7.noarch
nethserver-subscription-3.6.10-1.ns7.noarch
nethserver-netdata-2.0.4-1.ns7.noarch
nethserver-mail-smarthost-2.32.2-1.ns7.noarch
nethserver-backup-data-1.7.6-1.ns7.noarch
nethserver-mail-server-2.32.2-1.ns7.noarch
nethserver-cockpit-lib-1.10.12-1.ns7.noarch

Hypervisor: Proxmox
Target: Rocky Linux

So the connection went fine, you started to migrate the mail stack, and you hit an issue. However, we do not have the NS8 log from journalctl.

So retry, and when it fails, you can run:

journalctl > dump

It will create a file containing the errors from the migration.

Suspected issues: hypervisor, loss of network connectivity, disk full …

Well, I do not know, but maybe something went wrong on the NS8 side and we have no clue about it yet. The log on NS7 only states that something went wrong, which we already know.

In the meantime I will try to reproduce this on Debian.

How can I send you the journalctl dump?
Reduced to the last boot, it has 2,700 lines. Is that a little too much to post here?

Zip it and use a file-sharing website.

Or paste it in a gist.

Take care: if you have rebooted the VM, the dump may only cover the last boot, so everything from before is missing and needs to be recovered. journalctl should let you read the journal from an earlier boot (N-x).

Here you are:
https://www.hinz.de/extern/journalctl-dump.zip
It’s the whole journal; Debian does not cut it at reboot.
Uwe
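
When a dump covers the whole journal, one way to cut it down to the most recent boot is to split it at the `-- Boot <id> --` separator lines that appear in the dump between boots. The following is a minimal Python sketch under that assumption; the function name `lines_of_last_boot` is made up for illustration, and the separator pattern is based only on the marker format seen in this thread.

```python
import re

# Separator line between boots, in the format seen in this thread's dump:
# "-- Boot <32 hexadecimal characters> --" (assumed to be stable).
BOOT_MARKER = re.compile(r"^-- Boot [0-9a-f]{32} --\s*$")


def lines_of_last_boot(lines):
    """Return the lines that follow the last boot separator.

    If no separator is found, all lines are returned unchanged.
    """
    lines = list(lines)
    last_marker = -1
    for index, line in enumerate(lines):
        if BOOT_MARKER.match(line):
            last_marker = index
    return lines[last_marker + 1:]
```

To use it, open the dump file, pass the file object to `lines_of_last_boot`, and write the returned lines to a new, smaller file.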

Sorry, I think I misunderstood.
The last migration starts after the line: “-- Boot c4fb1f91eafa48169fa17fc18c5f8040 --”

Jul 03 09:48:58 ns8 agent@roundcubemail1[1798]: Error: copying system image from manifest list: reading blob sha256:992508d3075b04ec31e5110eb6e611fa5461d107a204d636f423e2f211b02b3e: Get "https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/99/992508d3075b04ec31e5110eb6e611fa5461d107a204d636f423e2f211b02b3e/data?verify=1719995889-9rv0Dzd%2FP1GPQ%2FEBcx0beM148Q8%3D": dial tcp: lookup production.cloudflare.docker.com on 192.168.11.150:53: write udp 192.168.11.152:60659->192.168.11.150:53: write: operation not permitted
Jul 03 09:48:58 ns8 agent@roundcubemail1[1798]: Traceback (most recent call last):
Jul 03 09:48:58 ns8 agent@roundcubemail1[1798]:   File "/usr/local/agent/bin/podman-pull-missing", line 35, in <module>
Jul 03 09:48:58 ns8 agent@roundcubemail1[1798]:     subprocess.run(['podman', 'pull', image_url]).check_returncode()
Jul 03 09:48:58 ns8 agent@roundcubemail1[1798]:   File "/usr/lib/python3.11/subprocess.py", line 502, in check_returncode
Jul 03 09:48:58 ns8 agent@roundcubemail1[1798]:     raise CalledProcessError(self.returncode, self.args, self.stdout,
Jul 03 09:48:58 ns8 agent@roundcubemail1[1798]: subprocess.CalledProcessError: Command '['podman', 'pull', 'docker.io/roundcube/roundcubemail:1.6.6-apache']' returned non-zero exit status 125.
Jul 03 09:48:58 ns8 agent@roundcubemail1[1798]: Traceback (most recent call last):
Jul 03 09:48:58 ns8 agent@roundcubemail1[1798]:   File "/usr/local/agent/actions/create-module/05pullimages", line 45, in <module>
Jul 03 09:48:58 ns8 agent@roundcubemail1[1798]:     agent.run_helper('podman-pull-missing', *images).check_returncode()
Jul 03 09:48:58 ns8 agent@roundcubemail1[1798]:   File "/usr/lib/python3.11/subprocess.py", line 502, in check_returncode
Jul 03 09:48:58 ns8 agent@roundcubemail1[1798]:     raise CalledProcessError(self.returncode, self.args, self.stdout,
Jul 03 09:48:58 ns8 agent@roundcubemail1[1798]: subprocess.CalledProcessError: Command '('podman-pull-missing', 'docker.io/mariadb:10.11.5', 'docker.io/roundcube/roundcubemail:1.6.6-apache')' returned non-zero exit status 1.
Jul 03 09:48:58 ns8 agent@roundcubemail1[1798]: task/module/roundcubemail1/627eeba2-17c7-4361-8fb0-30a98d6e95b7: action "create-module" status is "aborted" (1) at step 05pullimages
Jul 03 09:48:58 ns8 agent@cluster[455]: Assertion failed
Jul 03 09:48:58 ns8 agent@cluster[455]:   File "/var/lib/nethserver/cluster/actions/add-module/50update", line 223, in <module>
Jul 03 09:48:58 ns8 agent@cluster[455]:     agent.assert_exp(create_module_result['exit_code'] == 0) # Ensure create-module is successful
Jul 03 09:48:58 ns8 redis[761]: 1:M 03 Jul 2024 07:48:58.874 * 1 changes in 5 seconds. Saving...
Jul 03 09:48:58 ns8 redis[761]: 1:M 03 Jul 2024 07:48:58.874 * Background saving started by pid 36
Jul 03 09:48:58 ns8 traefik[1042]: 192.168.11.11 - - [03/Jul/2024:07:48:58 +0000] "GET /cluster-admin/api/module/roundcubemail1/task/627eeba2-17c7-4361-8fb0-30a98d6e95b7/context HTTP/2.0" 200 240 "-" "-" 342 "ApiServer-https@file" "http://127.0.0.1:9311" 29ms
Jul 03 09:48:59 ns8 agent@cluster[455]: task/cluster/aec31b5b-d70b-4689-82a0-1311e70ee936: action "add-module" status is "aborted" (2) at step 50update
Jul 03 09:48:59 ns8 agent@cluster[455]: Assertion failed
Jul 03 09:48:59 ns8 agent@cluster[455]:   File "/var/lib/nethserver/cluster/actions/import-module/50import", line 50, in <module>
Jul 03 09:48:59 ns8 agent@cluster[455]:     agent.assert_exp(add_module_result['exit_code'] == 0) # add-module is successful
Jul 03 09:48:59 ns8 agent@cluster[455]: task/cluster/448ac26e-9eee-4235-84e3-6b4f6198d394: action "import-module" status is "aborted" (2) at step 50import
Jul 03 09:48:58 ns8 redis[761]: 36:C 03 Jul 2024 07:48:58.907 * DB saved on disk
Jul 03 09:48:58 ns8 redis[761]: 36:C 03 Jul 2024 07:48:58.908 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
Jul 03 09:48:58 ns8 traefik[1042]: 192.168.11.11 - - [03/Jul/2024:07:48:58 +0000] "GET /cluster-admin/api/module/roundcubemail1/task/627eeba2-17c7-4361-8fb0-30a98d6e95b7/status HTTP/2.0" 200 2140 "-" "-" 343 "ApiServer-https@file" "http://127.0.0.1:9311" 8ms
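
In a dump of this size, the `status is "aborted"` lines are a quick way to see which task failed first and at which step. Below is a minimal Python sketch that collects them. The names `AbortedTask` and `find_aborted_tasks` are made up for illustration, and the pattern is derived only from the three "aborted" lines in the excerpt above, so other status lines may be formatted differently.

```python
import re
from typing import Iterable, List, NamedTuple


class AbortedTask(NamedTuple):
    task: str
    action: str
    step: str
    exit_code: int


# Pattern derived from the "aborted" lines in the excerpt (an assumption,
# not a documented log format).
TASK_STATUS = re.compile(
    r'task/(?P<task>\S+): action "(?P<action>[^"]+)" '
    r'status is "(?P<status>[^"]+)" \((?P<code>-?\d+)\) at step (?P<step>\S+)'
)


def find_aborted_tasks(lines: Iterable[str]) -> List[AbortedTask]:
    """Return every task reported as aborted, in log order."""
    found = []
    for line in lines:
        match = TASK_STATUS.search(line)
        if match and match.group("status") == "aborted":
            found.append(
                AbortedTask(
                    task=match.group("task"),
                    action=match.group("action"),
                    step=match.group("step"),
                    exit_code=int(match.group("code")),
                )
            )
    return found
```

Applied to the excerpt, the first entry in the result is the `create-module` task of roundcubemail1 at step `05pullimages`; the `add-module` and `import-module` aborts follow from it.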

The pull from the Docker registry failed when the image was being imported, so NS8 does not seem to be directly at fault, but maybe there is room for improvement.

see Line 140554

But @davidep, I am not sure I understand this line:

Jul 03 09:48:58 ns8 agent@roundcubemail1[1798]: Error: copying system image from manifest list: reading blob sha256:992508d3075b04ec31e5110eb6e611fa5461d107a204d636f423e2f211b02b3e: Get "https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/99/992508d3075b04ec31e5110eb6e611fa5461d107a204d636f423e2f211b02b3e/data?verify=1719995889-9rv0Dzd%2FP1GPQ%2FEBcx0beM148Q8%3D": dial tcp: lookup production.cloudflare.docker.com on 192.168.11.150:53: write udp 192.168.11.152:60659->192.168.11.150:53: write: operation not permitted

192.168.11.150:53 is the DNS server in this test environment. Why does anything try to write to it?
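
Read from left to right, that line says that the NS8 host (192.168.11.152) tried to send a DNS query over UDP to the configured resolver (192.168.11.150:53) in order to look up production.cloudflare.docker.com, and that the local write was refused with "operation not permitted". This kind of error is often associated with a local packet filter blocking outgoing traffic, but that is an assumption here, not a diagnosis. Below is a minimal Python sketch for pulling those fields out of such a line; the function name `parse_dns_lookup_error` is made up for illustration and the pattern is based only on this one line.

```python
import re
from typing import Dict, Optional

# Pattern based on the single error line quoted in this thread (assumption:
# other lookup errors may be worded differently).
DNS_LOOKUP_ERROR = re.compile(
    r"lookup (?P<host>\S+) on (?P<resolver>\S+): "
    r"write udp (?P<source>\S+)->(?P<destination>\S+): "
    r"write: (?P<reason>.+)$"
)


def parse_dns_lookup_error(line: str) -> Optional[Dict[str, str]]:
    """Return the parts of a failed DNS lookup message, or None if absent."""
    match = DNS_LOOKUP_ERROR.search(line.strip())
    if match is None:
        return None
    return match.groupdict()
```

The `source` field is the machine that sent the query and `resolver` is where it was sent, which makes it easier to see that the DNS server is only the target of the lookup and not the component that failed.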

Python errors are sometimes not easy to read. From what I understand, you failed to download this image from the Docker registry, and it is not the first time I have seen a hiccup of the Docker registry.

The concern is that we have a failure and we assume the culprit is NS8 or the NS7 migration.

Thanks a lot for your input.

By the way, I just succeeded in migrating 5 GB of mail from NS7 to NS8, so please retry.

|Size|Path|
|---|---|
|0|/var/lib/nethserver/vmail/vmail@ns7-pve2.org/Maildir|
|0|/var/lib/nethserver/vmail/vmail@ns7-pve2.org|
|5.2G|/var/lib/nethserver/vmail/|

I retried without resetting either the NS7 or the NS8 VM => same error, same command.
This time there was no error related to pullimages.

I will set up a new Rocky Linux VM as the base for NS8, as many of you seem to use it and I’m no Linux guy. (And I never understood this Docker container thing, so I hope NethServer will handle it for me.)
Then I will start anew.

Although many people promise to report back and never do, I hope I will :grin:

Uwe

We need logs to understand; maybe it is a new issue, who knows.

https://www.hinz.de/extern/NS8-Migration-logs.zip

I am willing to keep on testing the migration on debian if you want me to.
Should I?

I see no error so it works :stuck_out_tongue:

[Image: Screenshot 2024-07-04 123505]

:cry: This happens after a few seconds when I click “sync data”.
The differencing disk does not grow, so I am sure there is no data transfer.
There are more than 100 GB of email data.

ns8-migration.log shows:

=========== Join cluster Thu, 04 Jul 2024 11:33:25 +0200
Joined to cluster leader ns8.hinz.de
----------- start nethserver-mail Thu, 04 Jul 2024 11:34:01 +0200
mkdir: created directory ‘/var/lib/nethserver/nethserver-ns8-migration/nethserver-roundcubemail’
mkdir: created directory ‘/var/lib/nethserver/nethserver-ns8-migration/nethserver-mail’
mkdir: created directory ‘/var/lib/nethserver/nethserver-ns8-migration/nethserver-sogo’
/usr/sbin/ns8-bind-app: line 43: printf: null: invalid number
Validation errors: [node: Must be greater than or equal to 1]
[INFO] Created remote module instance mail1
[INFO] App nethserver-mail is bound to rsync://mail1@10.5.4.1:20012, waiting for task module/mail1/task/59249fdc-f426-464f-8217-18806eddf269
[INFO] Created remote module instance roundcubemail1
[INFO] App nethserver-roundcubemail is bound to rsync://roundcubemail1@10.5.4.1:20013, waiting for task module/roundcubemail1/task/d886e892-3c9c-40e9-a3b7-a39593a33a68
----------- sync nethserver-mail Thu, 04 Jul 2024 11:53:17 +0200
----------- sync nethserver-mail Thu, 04 Jul 2024 12:34:29 +0200
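
One thing that stands out in this log is that the two "sync" sections have no output at all. A small summary of how many lines each section contains makes that easy to spot in longer logs. Below is a minimal Python sketch; the function name `summarize_sections` is made up for illustration, and the header pattern (a run of `=` or `-` followed by a title) is assumed from the excerpts in this thread only.

```python
import re

# Section headers as they appear in the ns8-migration.log excerpts here:
# a run of "=" or "-" characters, a space, then a title that ends with a date.
SECTION_HEADER = re.compile(r"^(?:={5,}|-{5,}) (?P<title>.+)$")


def summarize_sections(lines):
    """Return (title, number of non-empty body lines) for each section."""
    sections = []
    for line in lines:
        match = SECTION_HEADER.match(line.rstrip("\n"))
        if match:
            # A new section starts; its body is empty so far.
            sections.append([match.group("title"), 0])
        elif sections and line.strip():
            # A non-empty line belongs to the most recent section.
            sections[-1][1] += 1
    return [(title, count) for title, count in sections]
```

A section with a count of 0, like the two sync attempts above, produced no log output on the NS7 side, so the reason for the failure would have to be looked for elsewhere, for example in the NS8 journal.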

Is the NS8 new?

No, it is still the same one as when we started here.
But before that: yes, a freshly installed Debian plus updates.
NS8 was installed with: curl https://raw.githubusercontent.com/NethServer/ns8-core/ns8-stable/core/install.sh | bash

I will reset both machines to the point before starting the migration and restart.

To be clear, this is just my point of view:

The NS8 enterprise version is based on Rocky Linux.
I have used Red Hat/CentOS-like distributions for more than 17 years, and I do like them.
I use Debian because of Proxmox, and I also used it on my laptop some time ago. It is nice, no issues so far.
I use Fedora on my laptop because it gets brand-new versions all the time.

Well, I would go with Rocky Linux, but NS8 should work on Debian. At least the tests that build the cluster and install modules are run on it (CentOS Stream, Debian 12, Rocky Linux 9).

Test module · NethServer/ns8-mail@33b3e0b · GitHub