NS7 to NS8 migration - sync data fails

I already have, but can try it again. After rebooting the NS8 node, what should I do next? Click the “sync data” button for one of those apps? Does it matter which one?

Try Nextcloud first. I think the reboot may help with it.

OK, rebooted the NS8 system, then did “Sync data” for Nextcloud. ns8-migration.log shows only:

----------- sync nethserver-nextcloud Sat, 23 Mar 2024 07:54:28 -0400

The UI in NS7 shows:
[screenshot]

The result of “Copy command” is:

[root@neth ~]#  echo '{"app":"nethserver-nextcloud","action":"sync"}' | /usr/bin/setsid /usr/bin/sudo /usr/libexec/nethserver/api/nethserver-ns8-migration/migration/update | jq
{
  "progress": "0.00",
  "time": "0.0",
  "exit": 0,
  "event": "migration-sync",
  "state": "running",
  "step": 0,
  "pid": 0,
  "action": ""
}
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
(snip: the same two rsync lines repeat 31 more times)
{
  "pid": 0,
  "status": "failed",
  "event": "migration-sync"
}
{
  "id": "1711195041",
  "type": "ApiFailed",
  "message": "sync nethserver-nextcloud failed"
}

The UI in NS8 shows:

The result of “Copy task trace,” run through a JSON decoder, is:

{

    "context":{
        "action":"import-module",
        "data":{
            "credentials":[
                "nextcloud1",
                "171331998a0a3ec-bf0f-4f21-81ec-0d3bdd9f7a7b"
            ],
            "port":20020,
            "volumes":[
                "nextcloud-app-data"
            ]
        },
        "extra":{
            "description":"ns8-action endpoint http://10.5.4.1:9311",
            "isNotificationHidden":false,
            "title":"module/nextcloud1/import-module"
        },
        "id":"11ff68bd-6a58-4eae-9084-81a530d40ba4",
        "parent":"",
        "queue":"module/nextcloud1/tasks",
        "timestamp":"2024-03-23T11:54:28.831251357Z",
        "user":"admin"
    },
    "status":"aborted",
    "progress":50,
    "subTasks":[
    ],
    "validated":true,
    "result":{
        "error":"<7>podman-pull-missing ghcr.io/nethserver/rsync:2.5.5\n<7>podman run --rm --privileged --network=host --workdir=/srv --env=RSYNCD_NETWORK=10.5.4.0/24 --env=RSYNCD_ADDRESS=cluster-localnode --env=RSYNCD_PORT=20020 --env=RSYNCD_USER=nextcloud1 --env=RSYNCD_PASSWORD=(redacted) --env=RSYNCD_SYSLOG_TAG=nextcloud1 --volume=/dev/log:/dev/log --name=rsync-nextcloud1 --volume=/home/nextcloud1/.config/state:/srv/state --volume=nextcloud-app-data:/srv/volumes/nextcloud-app-data --volume=restic-cache:/srv/volumes/restic-cache ghcr.io/nethserver/rsync:2.5.5\nError: creating container storage: the container name \"rsync-nextcloud1\" is already in use by 9eee0061bee0a5b4aba7ff563581e9a7bba76d3b0bcd425fa4591e19d20e8c99. You have to remove that container to be able to reuse that name: that name is already in use\nTraceback (most recent call last):\n File \"/usr/local/agent/actions/import-module/10recvstate\", line 49, in <module>\n agent.run_helper(*podman_cmd, core_env['RSYNC_IMAGE']).check_returncode()\n File \"/usr/lib/python3.11/subprocess.py\", line 502, in check_returncode\n raise CalledProcessError(self.returncode, self.args, self.stdout,\nsubprocess.CalledProcessError: Command '('podman', 'run', '--rm', '--privileged', '--network=host', '--workdir=/srv', '--env=RSYNCD_NETWORK=10.5.4.0/24', '--env=RSYNCD_ADDRESS=cluster-localnode', '--env=RSYNCD_PORT=20020', '--env=RSYNCD_USER=nextcloud1', '--env=RSYNCD_PASSWORD=(redacted)', '--env=RSYNCD_SYSLOG_TAG=nextcloud1', '--volume=/dev/log:/dev/log', '--name=rsync-nextcloud1', '--volume=/home/nextcloud1/.config/state:/srv/state', '--volume=nextcloud-app-data:/srv/volumes/nextcloud-app-data', '--volume=restic-cache:/srv/volumes/restic-cache', 'ghcr.io/nethserver/rsync:2.5.5')' returned non-zero exit status 125.\n",
        "exit_code":1,
        "file":"task/module/nextcloud1/11ff68bd-6a58-4eae-9084-81a530d40ba4",
        "output":""
    }

}
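
The useful part of a task trace like the one above is buried in the long result.error string. As a minimal sketch, assuming the trace has been saved to a file (trace.json is a made-up name) and jq is installed, the first "Error:" line can be pulled out like this. The helper name first_error is invented for the example.

```shell
#!/usr/bin/env bash
# Sketch: print the first "Error:" line hidden in a task trace's result.error.
# Assumes the trace was saved to a file; trace.json is a made-up name.
first_error() {
    # -r makes jq emit the raw string, so the escaped \n become real line breaks.
    jq -r '.result.error' "$1" | grep -m 1 '^Error:'
}

# Usage: first_error trace.json
```

For the trace shown here, that would surface the line saying the container name "rsync-nextcloud1" is already in use, which is the actual cause of the aborted import.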

The reboot didn’t solve the dead container name conflict. Get the container ID with:

runagent -m nextcloud1 podman ps -a

Then try to remove it, adding -f if required:

runagent -m nextcloud1 podman rm THEID

Then retry the sync.
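
The lookup and removal steps above can be sketched as one small function. This is only a sketch under a few assumptions: the leftover container is named rsync-<module> (as in the task trace), runagent passes the podman options through unchanged, and the function names are made up for the example. The --filter and --format options are standard podman ps options; podman treats the name filter value as a regular expression, hence the anchors.

```shell
#!/usr/bin/env bash
# Sketch: remove a leftover container that still holds the name the import task needs.
MODULE="${MODULE:-nextcloud1}"
NAME="rsync-${MODULE}"

# Wrapper (hypothetical name) so every podman call runs in the module's environment.
module_podman() { runagent -m "$MODULE" podman "$@"; }

remove_stale_container() {
    local id
    # Ask podman for the ID of the container with exactly this name.
    id="$(module_podman ps -a --filter "name=^${NAME}\$" --format '{{.ID}}')"
    if [ -z "$id" ]; then
        echo "no container named ${NAME}"
        return 0
    fi
    # Plain removal first; fall back to -f only if that fails.
    module_podman rm "$id" || module_podman rm -f "$id"
}

# Usage on the NS8 node: remove_stale_container
```

Running it for another module only needs MODULE set differently, for example MODULE=mail1.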

root@ns8:~# runagent -m nextcloud1 podman ps -a
CONTAINER ID  IMAGE                                 COMMAND               CREATED      STATUS      PORTS       NAMES
9eee0061bee0  ghcr.io/nethserver/rsync:2.5.0-dev.3  rsync --daemon --...  7 weeks ago  Created                 rsync-nextcloud1
root@ns8:~# runagent -m nextcloud1 podman rm 9eee0061bee0
9eee0061bee0
root@ns8:~# runagent -m nextcloud1 podman ps -a
CONTAINER ID  IMAGE       COMMAND     CREATED     STATUS      PORTS       NAMES
root@ns8:~#

Then retried the sync. It’s definitely doing something different. There’s still nothing in ns8-migration.log other than the one-line entry saying it has started, but it’s been running for 10+ minutes now without giving an error on either end. The NS8 end reports 50% complete, while the NS7 end reports 0%. Need to leave now; I’ll check back on it in a bit.

I wonder if this could be an issue with the migration script.

I have since upgraded rsync, figuring that a working migration to NS8 is more important than hotsync. It didn’t make a difference.

It’s now 11 hours since my last post. The NS8 machine is still reporting:
[screenshot]

ns8-migration.log has nothing new in it either. No errors, at least; I guess I’ll let it keep running.

OK, nearly 24 hours later. Neither UI says anything: no errors, no mention that anything’s running, no mention that anything completed, nothing. ns8-migration.log still has only:

----------- sync nethserver-nextcloud Mon, 25 Mar 2024 06:06:31 -0400

But it looks like the rsync container is still running on the NS8 box:

root@ns8:~# runagent -m nextcloud1 podman ps -a
CONTAINER ID  IMAGE                           COMMAND               CREATED       STATUS           PORTS       NAMES
c351ea34feff  ghcr.io/nethserver/rsync:2.5.5  rsync --daemon --...  23 hours ago  Up 23 hours ago              rsync-nextcloud1
root@ns8:~#

Can you see any hung process on the NS7 side?

I don’t know if it’s “hung”, but there is an rsync process there:

[root@neth ~]# ps aux | grep rsync
root     13328 22.6  0.1 335880 58960 ?        S    17:06   1:17 /usr/bin/rsync -z -r -a -H -A --delete --files-from=/tmp/tmp.IWQvbnpF1a --exclude-from=/tmp/tmp.2qgVTpIKCk / rsync://hotsyncuser@127.0.0.1/hotsync/
root     14194  0.0  0.0 112816   976 pts/0    S+   17:11   0:00 grep --color=auto rsync
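
To tell at a glance which job an rsync process belongs to, its destination can be checked. The sketch below is an illustration only: it assumes hotsync always targets a path containing /hotsync/ (as in the listing above) and that the migration tool targets rsync://user@host:port/data/... (as in the migrate traces in this thread). The function name classify_rsync is made up.

```shell
#!/usr/bin/env bash
# Sketch: label rsync processes by their destination.
# Reads "PID COMMAND..." lines on stdin and prints "PID label" for rsync processes.
classify_rsync() {
    local pid args
    while read -r pid args; do
        case "$args" in
            *rsync*/hotsync/*)            echo "$pid hotsync" ;;
            *rsync*rsync://*@*:*/data/*)  echo "$pid migration" ;;
            *rsync*)                      echo "$pid other" ;;
        esac
    done
}

# Usage on the NS7 side: ps -eo pid=,args= | classify_rsync
```

With the process shown above, PID 13328 would be labeled hotsync, which suggests it is not the migration sync at all.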

Can you share the contents of those tmp files?

But ain’t those for hotsync?

I wonder if hotsync blocks or kills the migration sync, because from the log I understand the process didn’t finish.

Good question, and the /hotsync in the path would suggest so.

[root@neth ~]# cat /tmp/tmp.HmqhQkJzZH
/var/lib/dokuwiki
/usr/share/dokuwiki
/etc/dokuwiki
/root
/var/lib/nethserver
/var/lib/collectd
/var/lib/rspamd
/var/lib/redis/rspamd
/var/lib/sogo/backups
/var/www/html
/usr/share/nextcloud/config/config.php
/var/lib/nethserver/backup/mysql/
/var/lib/mysql/mysql/
/var/lib/nethserver/secrets/mysql
/usr/share/nextcloud/apps
/etc/cron.hourly
/etc/cron.daily
/etc/cron.weekly
/etc/cron.monthly
/var/lib/nethserver/backup/backup-config.tar.xz
/var/lib/nethserver/backup/backup-config.tar.xz-content.md5
/var/lib/nethserver/backup/backup-config.tar.xz.md5
/etc/yum/vars
/etc/yum.repos.d
/etc/yum.conf

[root@neth ~]# cat /tmp/tmp.XC2vFuTPLI
/var/lib/nethserver/backup/restic/
/var/lib/nethserver/backup/duplicity/
/var/lib/nethserver/db
/root/.ssh
/var/log/lastlog
/var/lib/nethserver/secrets
/var/lib/rspamd/2tld.inc.local
/var/lib/rspamd/dkim_whitelist.inc.local
/var/lib/rspamd/dmarc_whitelist.inc.local
/var/lib/rspamd/mime_types.inc.local
/var/lib/rspamd/rspamd_dynamic
/var/lib/rspamd/spf_dkim_whitelist.inc.local
/var/lib/rspamd/spf_whitelist.inc.local
/var/lib/rspamd/surbl-whitelist.inc.local/var/lib/nethserver/certs
/var/lib/nethserver/openvpn-tunnels
/var/lib/nethserver/backup/
/var/lib/nethserver/db/
/var/log/
[root@neth ~]#

I’d try to exclude the migration tool directory from backup (and hotsync).

Add the line below to these exclusion rule files:

  • /etc/backup-data.d/custom.exclude
  • /etc/backup-config.d/custom.exclude

Here’s the line to add:

/var/lib/nethserver/nethserver-ns8-migration/

Hotsync should be restarted afterwards, but I don’t know how to do that.
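
Adding the exclusion line to both files can be sketched as below. The helper name add_exclude is made up; the file paths and the line itself are the ones given above. The function skips the append if the line is already there, and it first adds a newline when the file does not end with one. That guard is an assumption on my part: one entry in the exclude list pasted earlier shows two paths joined on a single line, and appending to a file that lacks a trailing newline is one way that can happen.

```shell
#!/usr/bin/env bash
# Sketch: add an exclusion line to a rules file without duplicating it.
add_exclude() {
    local line="$1" file="$2"
    touch "$file"
    # Already present as a whole line? Then there is nothing to do.
    if grep -qxF -- "$line" "$file"; then
        return 0
    fi
    # If the file does not end with a newline, add one first so the new
    # path is not glued onto the previous entry.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}

# Usage on the NS7 system:
#   for f in /etc/backup-data.d/custom.exclude /etc/backup-config.d/custom.exclude; do
#       add_exclude /var/lib/nethserver/nethserver-ns8-migration/ "$f"
#   done
```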

OK, added that line to those two files, and for the time being just disabled hotsync on the NS7 system. Now what? Just retry the sync? Should I first kill off the rsync process on the NS8 system that’s been running for four days now?

No, I think it is just waiting for incoming data. I bet the error occurs on the client side.

OK, left the NS8 end alone, and ran Sync Data again on the NS7 end. And it reports no errors. Cool. There’s still absolutely nothing in the log:

----------- sync nethserver-nextcloud Fri, 29 Mar 2024 07:23:13 -0400

And nothing at all is reported in the NS8 UI about this. So I’m not sure if anything really happened or not, but the lack of an error is new. Let’s see what happens when I try Mail:

.d..t...... ./
<f.st...... dump.rdb
<f..T...... default.private
<f..T...... default.txt
removed ‘roundcubemail.sql’
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]

…so that’s still failing. As before, the NS8 UI reports the action is 16% complete, and it will likely stay there forever:
[screenshot]

The result of “copy command” on the NS7 system doesn’t seem useful:

[root@neth ~]#  echo '{"app":"nethserver-mail","action":"sync"}' | /usr/bin/setsid /usr/bin/sudo /usr/libexec/nethserver/api/nethserver-ns8-migration/migration/update | jq
{
  "progress": "0.00",
  "time": "0.0",
  "exit": 0,
  "event": "migration-sync",
  "state": "running",
  "step": 0,
  "pid": 0,
  "action": ""
}
{
  "pid": 0,
  "status": "failed",
  "event": "migration-sync"
}
{
  "id": "1711711976",
  "type": "ApiFailed",
  "message": "sync nethserver-mail failed"
}

There is now an rsync container running under mail1 on the NS8 box:

root@ns8:~# runagent -m mail1 podman ps -a
CONTAINER ID  IMAGE                           COMMAND               CREATED             STATUS                 PORTS       NAMES
6eafb8b32658  ghcr.io/nethserver/rsync:2.6.0  rsync --daemon --...  About a minute ago  Up About a minute ago              rsync-mail1

Don’t worry about that.

The error is clear, but the context is not!

Referring to the migration tool developer documentation, let’s try to understand what’s going wrong. Try to manually run the sync of Mail, which also runs the sync of the mail-related apps.

cd /usr/share/nethesis/nethserver-ns8-migration/apps/nethserver-mail
bash -x ./migrate
[root@neth nethserver-mail]# bash -x ./migrate
+ set -e
+ source /etc/nethserver/agent.env
++ PATH=/usr/local/agent/pyenv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/agent/bin
++ AGENT_INSTALL_DIR=/usr/share/nethesis/nethserver-ns8-migration
++ AGENT_STATE_DIR=/var/lib/nethserver/nethserver-ns8-migration
+ source /var/lib/nethserver/nethserver-ns8-migration/agent.env
++ REDIS_ADDRESS=10.5.4.1:6379
++ AGENT_ID=node/2
++ REDIS_USER=node/2
++ REDIS_PASSWORD=(redacted)
+ source /var/lib/nethserver/nethserver-ns8-migration/environment
++ NODE_ID=2
++ USER_DOMAIN=directory.nh
+ cd /var/lib/nethserver/nethserver-ns8-migration/nethserver-mail
+ source bind.env
++ RSYNC_ENDPOINT=rsync://mail1@10.5.4.1:20022
++ RSYNC_PASSWORD=(redacted)
++ MODULE_INSTANCE_ID=mail1
++ MODULE_NODE_ID=1
++ IMPORT_TASK_ID=module/mail1/task/b55e4de1-67bb-47c8-a1f0-3ae468d4139b
+ hostname -d
+ echo directory.nh
+ /sbin/e-smith/db domains printjson
+ /sbin/e-smith/db accounts printjson
+ /sbin/e-smith/config printjson postfix
+ /sbin/e-smith/config printjson dovecot
+ /sbin/e-smith/config printjson rspamd
+ /sbin/e-smith/config printjson clamd
+ /usr/libexec/nethserver/list-groups
+ jq keys
+ /usr/libexec/nethserver/list-users
+ jq keys
+ : rsync://mail1@10.5.4.1:20022
+ export RSYNC_PASSWORD
+ : mail1
+ export MAIL_INSTANCE_ID=mail1
+ MAIL_INSTANCE_ID=mail1
+ rsync -i --remove-source-files user_domain.txt mail_domain.txt users.json groups.json domains.json accounts.json postfix.json dovecot.json rspamd.json clamd.json rsync://mail1@10.5.4.1:20022/data/state/
<f..T...... accounts.json
<f..T...... clamd.json
<f..T...... domains.json
<f..T...... dovecot.json
<f..T...... groups.json
<f..T...... mail_domain.txt
<f..T...... postfix.json
<f..T...... rspamd.json
<f..T...... user_domain.txt
<f..T...... users.json
+ [[ '' == \f\i\n\i\s\h ]]
+ rsync -i --archive --usermap=1-1000:100 --groupmap=1-1000:101 --exclude lucene-indexes/ --delete /var/lib/nethserver/vmail/ rsync://mail1@10.5.4.1:20022/data/volumes/dovecot-data/
.d..t...... admin@familybrown.org/Maildir/.INBOX.Fail2ban/
(snip)
+ [[ -f /var/lib/redis/rspamd/dump.rdb ]]
+ rsync -i --archive --usermap=1-1000:100 --groupmap=1-1000:101 --delete --exclude '*' /var/lib/redis/rspamd/ rsync://mail1@10.5.4.1:20022/data/volumes/rspamd-redis/
.d..t...... ./
+ rsync -i --archive --usermap=1-1000:100 --groupmap=1-1000:101 /var/lib/redis/rspamd/dump.rdb rsync://mail1@10.5.4.1:20022/data/volumes/rspamd-redis/persistent.rdb
<f.st...... dump.rdb
+ rsync -i --recursive --perms /etc/opendkim/keys/default.txt /etc/opendkim/keys/default.private rsync://mail1@10.5.4.1:20022/data/state/dkim.migration/
<f..T...... default.private
<f..T...... default.txt
+ [[ '' != \f\i\n\i\s\h ]]
+ migrate_deps
+ [[ -f /var/lib/nethserver/nethserver-ns8-migration/nethserver-webtop5/bind.env ]]
+ [[ -f /var/lib/nethserver/nethserver-ns8-migration/nethserver-roundcubemail/bind.env ]]
+ command /usr/share/nethesis/nethserver-ns8-migration/apps/nethserver-roundcubemail/migrate
+ /usr/share/nethesis/nethserver-ns8-migration/apps/nethserver-roundcubemail/migrate
removed ‘roundcubemail.sql’
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]

I’ve snipped lines dealing with individual mail accounts and the Redis and rsync passwords, but otherwise left it unchanged.
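
In the trace above, every transfer to port 20022 succeeds and the "Connection refused" only appears once the roundcubemail migrate script starts, so it may be worth checking each app's own endpoint. The sketch below reads RSYNC_ENDPOINT from every bind.env under the migration state directory and tries a plain TCP connection to it. The directory layout and variable name come from the trace; the function names are made up, and the assumption that each app has its own bind.env with its own port is exactly that, an assumption.

```shell
#!/usr/bin/env bash
# Sketch: check that each app's rsync endpoint on the NS8 node accepts connections.
STATE_DIR="${STATE_DIR:-/var/lib/nethserver/nethserver-ns8-migration}"

# Print "host port" for an endpoint such as rsync://mail1@10.5.4.1:20022
endpoint_hostport() {
    local ep="${1#rsync://}"   # drop the scheme
    ep="${ep#*@}"              # drop "user@"
    ep="${ep%%/*}"             # drop any path
    echo "${ep%%:*} ${ep##*:}"
}

# Try a TCP connection using bash's /dev/tcp redirection, giving up after 3 seconds.
probe() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

check_endpoints() {
    local env_file app endpoint host port
    for env_file in "$STATE_DIR"/*/bind.env; do
        [ -f "$env_file" ] || continue
        app="$(basename "$(dirname "$env_file")")"
        # Source the file in a subshell so its variables do not leak into this shell.
        endpoint="$(. "$env_file" && echo "$RSYNC_ENDPOINT")"
        read -r host port <<< "$(endpoint_hostport "$endpoint")"
        if probe "$host" "$port"; then
            echo "$app $host:$port open"
        else
            echo "$app $host:$port refused or unreachable"
        fi
    done
}

# Usage on the NS7 system, while a sync is pending: check_endpoints
```

An app reported as refused would point to a missing or stopped rsync container for that module on the NS8 side, which can then be checked with runagent -m MODULE podman ps -a as earlier in the thread.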