I already have, but can try it again. After rebooting the NS8 node, what should I do next? Click the “sync data” button for one of those apps? Does it matter which one?
Try Nextcloud first. I think the reboot can help for it.
OK, rebooted the NS8 system, then did “Sync data” for Nextcloud. ns8-migration.log shows only:
----------- sync nethserver-nextcloud Sat, 23 Mar 2024 07:54:28 -0400
The UI in NS7 shows:
The result of “Copy command” is:
[root@neth ~]# echo '{"app":"nethserver-nextcloud","action":"sync"}' | /usr/bin/setsid /usr/bin/sudo /usr/libexec/nethserver/api/nethserver-ns8-migration/migration/update | jq
{
"progress": "0.00",
"time": "0.0",
"exit": 0,
"event": "migration-sync",
"state": "running",
"step": 0,
"pid": 0,
"action": ""
}
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
{
"pid": 0,
"status": "failed",
"event": "migration-sync"
}
{
"id": "1711195041",
"type": "ApiFailed",
"message": "sync nethserver-nextcloud failed"
}
The UI in NS8 shows:
The result of “Copy task trace,” run through a JSON decoder, is:
{
"context":{
"action":"import-module",
"data":{
"credentials":[
"nextcloud1",
"171331998a0a3ec-bf0f-4f21-81ec-0d3bdd9f7a7b"
],
"port":20020,
"volumes":[
"nextcloud-app-data"
]
},
"extra":{
"description":"ns8-action endpoint http://10.5.4.1:9311",
"isNotificationHidden":false,
"title":"module/nextcloud1/import-module"
},
"id":"11ff68bd-6a58-4eae-9084-81a530d40ba4",
"parent":"",
"queue":"module/nextcloud1/tasks",
"timestamp":"2024-03-23T11:54:28.831251357Z",
"user":"admin"
},
"status":"aborted",
"progress":50,
"subTasks":[
],
"validated":true,
"result":{
"error":"<7>podman-pull-missing ghcr.io/nethserver/rsync:2.5.5\n<7>podman run --rm --privileged --network=host --workdir=/srv --env=RSYNCD_NETWORK=10.5.4.0/24 --env=RSYNCD_ADDRESS=cluster-localnode --env=RSYNCD_PORT=20020 --env=RSYNCD_USER=nextcloud1 --env=RSYNCD_PASSWORD=(redacted) --env=RSYNCD_SYSLOG_TAG=nextcloud1 --volume=/dev/log:/dev/log --name=rsync-nextcloud1 --volume=/home/nextcloud1/.config/state:/srv/state --volume=nextcloud-app-data:/srv/volumes/nextcloud-app-data --volume=restic-cache:/srv/volumes/restic-cache ghcr.io/nethserver/rsync:2.5.5\nError: creating container storage: the container name \"rsync-nextcloud1\" is already in use by 9eee0061bee0a5b4aba7ff563581e9a7bba76d3b0bcd425fa4591e19d20e8c99. You have to remove that container to be able to reuse that name: that name is already in use\nTraceback (most recent call last):\n File \"/usr/local/agent/actions/import-module/10recvstate\", line 49, in <module>\n agent.run_helper(*podman_cmd, core_env['RSYNC_IMAGE']).check_returncode()\n File \"/usr/lib/python3.11/subprocess.py\", line 502, in check_returncode\n raise CalledProcessError(self.returncode, self.args, self.stdout,\nsubprocess.CalledProcessError: Command '('podman', 'run', '--rm', '--privileged', '--network=host', '--workdir=/srv', '--env=RSYNCD_NETWORK=10.5.4.0/24', '--env=RSYNCD_ADDRESS=cluster-localnode', '--env=RSYNCD_PORT=20020', '--env=RSYNCD_USER=nextcloud1', '--env=RSYNCD_PASSWORD=(redacted)', '--env=RSYNCD_SYSLOG_TAG=nextcloud1', '--volume=/dev/log:/dev/log', '--name=rsync-nextcloud1', '--volume=/home/nextcloud1/.config/state:/srv/state', '--volume=nextcloud-app-data:/srv/volumes/nextcloud-app-data', '--volume=restic-cache:/srv/volumes/restic-cache', 'ghcr.io/nethserver/rsync:2.5.5')' returned non-zero exit status 125.\n",
"exit_code":1,
"file":"task/module/nextcloud1/11ff68bd-6a58-4eae-9084-81a530d40ba4",
"output":""
}
}
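The useful part of that trace is buried in one long string: podman names the ID of the container that already owns the name rsync-nextcloud1. As a minimal sketch (the function name is hypothetical, and it assumes the error keeps the wording shown above), the ID can be pulled out of the saved trace text like this:

```shell
# Sketch only: extract the container ID from a podman
# "name is already in use by <id>" error read on stdin.
conflicting_container_id() {
    # -o prints only the matching part; the ID is the last word of the match.
    grep -oE 'is already in use by [0-9a-f]{12,64}' | head -n 1 | awk '{print $NF}'
}
```

For example, `conflicting_container_id < trace.json` (trace.json being wherever the copied task trace was saved) prints the ID, or nothing if the message is not present.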
The reboot didn’t clear the name conflict left by the dead container. Get the container ID with:
runagent -m nextcloud1 podman ps -a
Then remove it, adding -f if required:
runagent -m nextcloud1 podman rm THEID
Then retry the sync.
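The same cleanup can be sketched as a small helper that refuses to touch a container that is actually running. This is only an illustration: the function name is made up, and it assumes `podman container inspect` and `podman rm` accept the container name in place of the ID.

```shell
# Sketch only: remove a leftover container by name unless it is running.
# Usage: remove_if_not_running MODULE CONTAINER_NAME
remove_if_not_running() {
    local module name status
    module="$1"
    name="$2"
    # inspect fails when no container has that name: nothing to clean up.
    status="$(runagent -m "$module" podman container inspect --format '{{.State.Status}}' "$name" 2>/dev/null)" || return 0
    case "$status" in
        running)
            echo "$name is running; leaving it alone"
            return 1
            ;;
        *)
            # Covers leftovers in states such as "created" or "exited".
            runagent -m "$module" podman rm "$name"
            ;;
    esac
}
```

Called as `remove_if_not_running nextcloud1 rsync-nextcloud1`, it would remove a container stuck in the Created state and leave a live rsync daemon alone.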
root@ns8:~# runagent -m nextcloud1 podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9eee0061bee0 ghcr.io/nethserver/rsync:2.5.0-dev.3 rsync --daemon --... 7 weeks ago Created rsync-nextcloud1
root@ns8:~# runagent -m nextcloud1 podman rm 9eee0061bee0
9eee0061bee0
root@ns8:~# runagent -m nextcloud1 podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
root@ns8:~#
Then retried the sync. It’s definitely doing something different: there’s still nothing in ns8-migration.log other than the one-line entry that it’s started, but it’s been running for 10+ minutes now without giving an error on either end. The NS8 end reports 50% complete, while the NS7 end reports 0%. Need to leave now; I’ll check back on it in a bit.
I wonder if this could be an issue with the migration script.
I have since upgraded rsync, figuring that working migration to NS8 is more important than hotsync. It didn’t make a difference.
It’s now 11 hours since my last post. The NS8 machine is still reporting:
ns8-migration.log has nothing new in it either. No errors, at least; I guess I’ll let it keep running.
OK, nearly 24 hours later. Neither UI says anything: no errors, no mention that anything’s running, no mention that anything completed, nothing. ns8-migration.log still has only:
----------- sync nethserver-nextcloud Mon, 25 Mar 2024 06:06:31 -0400
But it looks like the rsync container is still running on the NS8 box:
root@ns8:~# runagent -m nextcloud1 podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c351ea34feff ghcr.io/nethserver/rsync:2.5.5 rsync --daemon --... 23 hours ago Up 23 hours ago rsync-nextcloud1
root@ns8:~#
Can you see any hung process on the NS7 side?
I don’t know if it’s “hung”, but there is an rsync process there:
[root@neth ~]# ps aux | grep rsync
root 13328 22.6 0.1 335880 58960 ? S 17:06 1:17 /usr/bin/rsync -z -r -a -H -A --delete --files-from=/tmp/tmp.IWQvbnpF1a --exclude-from=/tmp/tmp.2qgVTpIKCk / rsync://hotsyncuser@127.0.0.1/hotsync/
root 14194 0.0 0.0 112816 976 pts/0 S+ 17:11 0:00 grep --color=auto rsync
Can you share the contents of those tmp files?
But ain’t those for hotsync?
I wonder if Hotsync blocks or kills the migration sync because from the log I understand the process didn’t finish.
Good question, and the /hotsync in the path would suggest so.
[root@neth ~]# cat /tmp/tmp.HmqhQkJzZH
/var/lib/dokuwiki
/usr/share/dokuwiki
/etc/dokuwiki
/root
/var/lib/nethserver
/var/lib/collectd
/var/lib/rspamd
/var/lib/redis/rspamd
/var/lib/sogo/backups
/var/www/html
/usr/share/nextcloud/config/config.php
/var/lib/nethserver/backup/mysql/
/var/lib/mysql/mysql/
/var/lib/nethserver/secrets/mysql
/usr/share/nextcloud/apps
/etc/cron.hourly
/etc/cron.daily
/etc/cron.weekly
/etc/cron.monthly
/var/lib/nethserver/backup/backup-config.tar.xz
/var/lib/nethserver/backup/backup-config.tar.xz-content.md5
/var/lib/nethserver/backup/backup-config.tar.xz.md5
/etc/yum/vars
/etc/yum.repos.d
/etc/yum.conf
[root@neth ~]# cat /tmp/tmp.XC2vFuTPLI
/var/lib/nethserver/backup/restic/
/var/lib/nethserver/backup/duplicity/
/var/lib/nethserver/db
/root/.ssh
/var/log/lastlog
/var/lib/nethserver/secrets
/var/lib/rspamd/2tld.inc.local
/var/lib/rspamd/dkim_whitelist.inc.local
/var/lib/rspamd/dmarc_whitelist.inc.local
/var/lib/rspamd/mime_types.inc.local
/var/lib/rspamd/rspamd_dynamic
/var/lib/rspamd/spf_dkim_whitelist.inc.local
/var/lib/rspamd/spf_whitelist.inc.local
/var/lib/rspamd/surbl-whitelist.inc.local/var/lib/nethserver/certs
/var/lib/nethserver/openvpn-tunnels
/var/lib/nethserver/backup/
/var/lib/nethserver/db/
/var/log/
[root@neth ~]#
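To tell the two kinds of rsync apart at a glance, the process list can be labelled by destination. This is a rough sketch with a hypothetical function name; it assumes hotsync always targets a path containing /hotsync/ and that the migration tool targets an rsync:// URL with /data/ in it.

```shell
# Sketch only: label rsync process lines read on stdin by their destination.
classify_rsync() {
    while IFS= read -r line; do
        case "$line" in
            *"/hotsync/"*)          echo "hotsync: $line" ;;
            *"rsync://"*"/data/"*)  echo "migration: $line" ;;
            *)                      echo "other: $line" ;;
        esac
    done
}
```

A possible invocation is `ps -eo pid,args | grep '[r]sync' | classify_rsync`; the bracket in the grep pattern keeps grep from matching itself.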
I’d try excluding the migration tool directory from backup (and hotsync).
Add the line below to both of these exclusion-rule files:
- /etc/backup-data.d/custom.exclude
- /etc/backup-config.d/custom.exclude
Here’s the line to add:
/var/lib/nethserver/nethserver-ns8-migration/
Hotsync should then be restarted, but I don’t know how to do that.
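Adding the exclusion by hand works fine; for anyone scripting it, here is a minimal sketch that appends the line only when it is missing. The function name is hypothetical, and the paths are the ones given above.

```shell
# Sketch only: append LINE to each FILE unless an identical line is already there.
# Usage: add_exclude LINE FILE...
add_exclude() {
    local line file
    line="$1"
    shift
    for file in "$@"; do
        # -x whole-line match, -F fixed string, -q quiet
        if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
            # Make sure the file ends with a newline first, so the new path
            # is not glued onto the end of the previous one.
            if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
                printf '\n' >> "$file"
            fi
            printf '%s\n' "$line" >> "$file"
        fi
    done
}
```

Usage would be `add_exclude /var/lib/nethserver/nethserver-ns8-migration/ /etc/backup-data.d/custom.exclude /etc/backup-config.d/custom.exclude`; running it twice leaves a single copy of the line in each file.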
OK, added that line to those two files, and for the time being just disabled hotsync on the NS7 system. Now what? Just retry the sync? Should I first kill off the rsync process on the NS8 system that’s been running for four days now?
No, I think it is just waiting for incoming data. I bet the error occurs on the client side.
OK, left the NS8 end alone, and ran Sync Data again on the NS7 end. And it reports no errors–cool. There’s still absolutely nothing in the log:
----------- sync nethserver-nextcloud Fri, 29 Mar 2024 07:23:13 -0400
And nothing at all is reported in the NS8 UI about this. So I’m not sure if anything really happened or not, but the lack of an error is new. Let’s see what happens when I try Mail:
.d..t...... ./
<f.st...... dump.rdb
<f..T...... default.private
<f..T...... default.txt
removed ‘roundcubemail.sql’
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
…so that’s still failing. As before, the NS8 UI reports the action is 16% complete, and it will likely stay there forever:
The result of “copy command” on the NS7 system doesn’t seem useful:
[root@neth ~]# echo '{"app":"nethserver-mail","action":"sync"}' | /usr/bin/setsid /usr/bin/sudo /usr/libexec/nethserver/api/nethserver-ns8-migration/migration/update | jq
{
"progress": "0.00",
"time": "0.0",
"exit": 0,
"event": "migration-sync",
"state": "running",
"step": 0,
"pid": 0,
"action": ""
}
{
"pid": 0,
"status": "failed",
"event": "migration-sync"
}
{
"id": "1711711976",
"type": "ApiFailed",
"message": "sync nethserver-mail failed"
}
There is now an rsync container running under mail1 on the NS8 box:
root@ns8:~# runagent -m mail1 podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6eafb8b32658 ghcr.io/nethserver/rsync:2.6.0 rsync --daemon --... About a minute ago Up About a minute ago rsync-mail1
Don’t worry about that.
The error is clear, but the context is not!
Referring to the migration tool’s developer documentation, let’s try to understand what’s going wrong. Try running the Mail sync manually; it also runs the sync for the mail-related apps:
cd /usr/share/nethesis/nethserver-ns8-migration/apps/nethserver-mail
bash -x ./migrate
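One caution before posting the result: bash -x echoes every variable the script sources, which may include passwords. A small filter can mask them first. This is a sketch with a hypothetical function name; it assumes secrets show up on trace lines of the form `++ SOMETHING_PASSWORD=value`.

```shell
# Sketch only: mask values of *PASSWORD variables in a `bash -x` trace on stdin.
redact_trace() {
    # Trace lines start with one or more "+" and a space; keep the variable
    # name and replace whatever follows the "=".
    sed -E 's/^([+]+ [A-Za-z0-9_]*PASSWORD=).*/\1(redacted)/'
}
```

It could be used as `bash -x ./migrate 2>&1 | redact_trace`, since the trace itself is written to stderr.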
[root@neth nethserver-mail]# bash -x ./migrate
+ set -e
+ source /etc/nethserver/agent.env
++ PATH=/usr/local/agent/pyenv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/agent/bin
++ AGENT_INSTALL_DIR=/usr/share/nethesis/nethserver-ns8-migration
++ AGENT_STATE_DIR=/var/lib/nethserver/nethserver-ns8-migration
+ source /var/lib/nethserver/nethserver-ns8-migration/agent.env
++ REDIS_ADDRESS=10.5.4.1:6379
++ AGENT_ID=node/2
++ REDIS_USER=node/2
++ REDIS_PASSWORD=(redacted)
+ source /var/lib/nethserver/nethserver-ns8-migration/environment
++ NODE_ID=2
++ USER_DOMAIN=directory.nh
+ cd /var/lib/nethserver/nethserver-ns8-migration/nethserver-mail
+ source bind.env
++ RSYNC_ENDPOINT=rsync://mail1@10.5.4.1:20022
++ RSYNC_PASSWORD=(redacted)
++ MODULE_INSTANCE_ID=mail1
++ MODULE_NODE_ID=1
++ IMPORT_TASK_ID=module/mail1/task/b55e4de1-67bb-47c8-a1f0-3ae468d4139b
+ hostname -d
+ echo directory.nh
+ /sbin/e-smith/db domains printjson
+ /sbin/e-smith/db accounts printjson
+ /sbin/e-smith/config printjson postfix
+ /sbin/e-smith/config printjson dovecot
+ /sbin/e-smith/config printjson rspamd
+ /sbin/e-smith/config printjson clamd
+ /usr/libexec/nethserver/list-groups
+ jq keys
+ /usr/libexec/nethserver/list-users
+ jq keys
+ : rsync://mail1@10.5.4.1:20022
+ export RSYNC_PASSWORD
+ : mail1
+ export MAIL_INSTANCE_ID=mail1
+ MAIL_INSTANCE_ID=mail1
+ rsync -i --remove-source-files user_domain.txt mail_domain.txt users.json groups.json domains.json accounts.json postfix.json dovecot.json rspamd.json clamd.json rsync://mail1@10.5.4.1:20022/data/state/
<f..T...... accounts.json
<f..T...... clamd.json
<f..T...... domains.json
<f..T...... dovecot.json
<f..T...... groups.json
<f..T...... mail_domain.txt
<f..T...... postfix.json
<f..T...... rspamd.json
<f..T...... user_domain.txt
<f..T...... users.json
+ [[ '' == \f\i\n\i\s\h ]]
+ rsync -i --archive --usermap=1-1000:100 --groupmap=1-1000:101 --exclude lucene-indexes/ --delete /var/lib/nethserver/vmail/ rsync://mail1@10.5.4.1:20022/data/volumes/dovecot-data/
.d..t...... admin@familybrown.org/Maildir/.INBOX.Fail2ban/
(snip)
+ [[ -f /var/lib/redis/rspamd/dump.rdb ]]
+ rsync -i --archive --usermap=1-1000:100 --groupmap=1-1000:101 --delete --exclude '*' /var/lib/redis/rspamd/ rsync://mail1@10.5.4.1:20022/data/volumes/rspamd-redis/
.d..t...... ./
+ rsync -i --archive --usermap=1-1000:100 --groupmap=1-1000:101 /var/lib/redis/rspamd/dump.rdb rsync://mail1@10.5.4.1:20022/data/volumes/rspamd-redis/persistent.rdb
<f.st...... dump.rdb
+ rsync -i --recursive --perms /etc/opendkim/keys/default.txt /etc/opendkim/keys/default.private rsync://mail1@10.5.4.1:20022/data/state/dkim.migration/
<f..T...... default.private
<f..T...... default.txt
+ [[ '' != \f\i\n\i\s\h ]]
+ migrate_deps
+ [[ -f /var/lib/nethserver/nethserver-ns8-migration/nethserver-webtop5/bind.env ]]
+ [[ -f /var/lib/nethserver/nethserver-ns8-migration/nethserver-roundcubemail/bind.env ]]
+ command /usr/share/nethesis/nethserver-ns8-migration/apps/nethserver-roundcubemail/migrate
+ /usr/share/nethesis/nethserver-ns8-migration/apps/nethserver-roundcubemail/migrate
removed ‘roundcubemail.sql’
rsync: failed to connect to 10.5.4.1 (10.5.4.1): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(126) [sender=3.1.2]
I’ve snipped lines dealing with individual mail accounts and the Redis and rsync passwords, but otherwise left it unchanged.
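Since the Mail transfers to port 20022 succeed and the refusal only appears once the roundcubemail script runs, it may help to check which rsync endpoints actually accept connections. The following is a sketch, not part of the migration tool: the function names are made up, and it assumes each app directory under the state directory has a bind.env with an unquoted RSYNC_ENDPOINT line like the one in the trace.

```shell
# Sketch only: report which RSYNC_ENDPOINTs from the bind.env files accept TCP connections.

# rsync://user@host:port[/path] -> "host port"
endpoint_hostport() {
    local ep
    ep="${1#rsync://}"   # drop the scheme
    ep="${ep#*@}"        # drop user@
    ep="${ep%%/*}"       # drop any path
    echo "${ep%:*} ${ep##*:}"
}

# Succeeds if a TCP connection to HOST PORT opens within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device and coreutils' timeout.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

check_bind_envs() {
    local state_dir f ep hp
    state_dir="${1:-/var/lib/nethserver/nethserver-ns8-migration}"
    for f in "$state_dir"/*/bind.env; do
        [ -f "$f" ] || continue
        ep="$(sed -n 's/^RSYNC_ENDPOINT=//p' "$f")"
        hp="$(endpoint_hostport "$ep")"
        if port_open "${hp% *}" "${hp#* }"; then
            echo "open   $ep ($f)"
        else
            echo "closed $ep ($f)"
        fi
    done
}
```

Running `check_bind_envs` on the NS7 side would list one line per app; an endpoint reported as closed has no rsync daemon listening on the NS8 side at that moment.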