NS7 to NS8 migration tool fails to connect

NethServer Version: 7.9.2009 and 8 beta
Module: ns8-migration
I'm trying to connect my NS7 to my new NS8, and I get an error after entering credentials and attempting to connect.

Using the provided command to get details gives this:

[root@neth ~]#  echo '{"action":"login","Host":"$IP","User":"$USER","Password":"$PASS","TLSVerify":"disabled"}' | /usr/bin/setsid /usr/bin/sudo /usr/libexec/nethserver/api/nethserver-ns8-migration/connection/update | jq
Odd number of elements in hash assignment at /usr/share/perl5/vendor_perl/esmith/db.pm line 273.
{
  "steps": 2,
  "pid": 18003,
  "args": "",
  "event": "nethserver-ns8-migration-save"
}
{
  "step": 1,
  "pid": 18003,
  "action": "S05generic_template_expand",
  "event": "nethserver-ns8-migration-save",
  "state": "running"
}
{
  "progress": "0.50",
  "time": "0.152543",
  "exit": 0,
  "event": "nethserver-ns8-migration-save",
  "state": "done",
  "step": 1,
  "pid": 18003,
  "action": "S05generic_template_expand"
}
{
  "step": 2,
  "pid": 18003,
  "action": "S90adjust-services",
  "event": "nethserver-ns8-migration-save",
  "state": "running"
}
{
  "progress": "1.00",
  "time": "1.007077",
  "exit": 0,
  "event": "nethserver-ns8-migration-save",
  "state": "done",
  "step": 2,
  "pid": 18003,
  "action": "S90adjust-services"
}
{
  "pid": 18003,
  "status": "success",
  "event": "nethserver-ns8-migration-save"
}
Traceback (most recent call last):
  File "/usr/lib64/python3.6/urllib/request.py", line 1349, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/usr/lib64/python3.6/http/client.py", line 1254, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1300, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1249, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1036, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.6/http/client.py", line 974, in send
    self.connect()
  File "/usr/lib64/python3.6/http/client.py", line 946, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "/usr/lib64/python3.6/socket.py", line 724, in create_connection
    raise err
  File "/usr/lib64/python3.6/socket.py", line 713, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/sbin/ns8-join", line 215, in <module>
    add_external_domain_response = call(api_endpoint, "add-external-domain", payload['token'], add_external_domain_request, False)
  File "/usr/sbin/ns8-join", line 45, in call
    post = request.urlopen(req, context=ctx)
  File "/usr/lib64/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python3.6/urllib/request.py", line 526, in open
    response = self._open(req, data)
  File "/usr/lib64/python3.6/urllib/request.py", line 544, in _open
    '_open', req)
  File "/usr/lib64/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib64/python3.6/urllib/request.py", line 1377, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/lib64/python3.6/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 111] Connection refused>
{
  "id": "1693864816",
  "type": "CommandFailed",
  "message": "See /var/log/messages"
}

And the information in /var/log/messages shows:

Sep  4 15:00:15 neth esmith::event[18003]: Event: nethserver-ns8-migration-save
Sep  4 15:00:15 neth esmith::event[18003]: expanding /etc/wireguard/wg0.conf
Sep  4 15:00:15 neth esmith::event[18003]: expanding /etc/httpd/conf.d/00ns8migration.conf
Sep  4 15:00:15 neth esmith::event[18003]: Action: /etc/e-smith/events/actions/generic_template_expand SUCCESS [0.152543]
Sep  4 15:00:15 neth systemd: Reloading.
Sep  4 15:00:15 neth systemd: [/usr/lib/systemd/system/netdata.service:49] Failed to parse capability in bounding/ambient set, ignoring: CAP_PERFMON
Sep  4 15:00:15 neth systemd: [/usr/lib/systemd/system/netdata.service:70] Unknown lvalue 'ProtectControlGroups' in section 'Service'
Sep  4 15:00:15 neth systemd: [/usr/lib/systemd/system/netdata.service:77] Unknown lvalue 'BindReadOnlyPaths' in section 'Service'
Sep  4 15:00:15 neth systemd: [/usr/lib/systemd/system/netdata.service:78] Unknown lvalue 'BindReadOnlyPaths' in section 'Service'
Sep  4 15:00:15 neth esmith::event[18003]: [INFO] service agent restart
Sep  4 15:00:15 neth systemd: Stopping NS8 agent...
Sep  4 15:00:15 neth systemd: Stopped NS8 agent.
Sep  4 15:00:15 neth systemd: Started NS8 agent.
Sep  4 15:00:15 neth agent: Task queue pop error: dial tcp $VPN_IP:6379: connect: connection refused
Sep  4 15:00:15 neth systemd: Reloading.
Sep  4 15:00:15 neth systemd: [/usr/lib/systemd/system/netdata.service:49] Failed to parse capability in bounding/ambient set, ignoring: CAP_PERFMON
Sep  4 15:00:15 neth systemd: [/usr/lib/systemd/system/netdata.service:70] Unknown lvalue 'ProtectControlGroups' in section 'Service'
Sep  4 15:00:15 neth systemd: [/usr/lib/systemd/system/netdata.service:77] Unknown lvalue 'BindReadOnlyPaths' in section 'Service'
Sep  4 15:00:15 neth systemd: [/usr/lib/systemd/system/netdata.service:78] Unknown lvalue 'BindReadOnlyPaths' in section 'Service'
Sep  4 15:00:15 neth esmith::event[18003]: [INFO] service wg-quick@wg0 restart
Sep  4 15:00:15 neth systemd: Stopping WireGuard via wg-quick(8) for wg0...
Sep  4 15:00:15 neth wg-quick: [#] ip link delete dev wg0
Sep  4 15:00:15 neth systemd: Stopped WireGuard via wg-quick(8) for wg0.
Sep  4 15:00:15 neth systemd: Starting WireGuard via wg-quick(8) for wg0...
Sep  4 15:00:15 neth wg-quick: [#] ip link add wg0 type wireguard
Sep  4 15:00:15 neth wg-quick: [#] wg setconf wg0 /dev/fd/63
Sep  4 15:00:15 neth wg-quick: [#] ip -4 address add 10.5.4.6 dev wg0
Sep  4 15:00:15 neth wg-quick: [#] ip link set mtu 1420 up dev wg0
Sep  4 15:00:15 neth wg-quick: [#] ip -4 route add 10.5.4.0/24 dev wg0
Sep  4 15:00:15 neth systemd: Started WireGuard via wg-quick(8) for wg0.
Sep  4 15:00:15 neth systemd: Reloading.
Sep  4 15:00:15 neth systemd: [/usr/lib/systemd/system/netdata.service:49] Failed to parse capability in bounding/ambient set, ignoring: CAP_PERFMON
Sep  4 15:00:15 neth systemd: [/usr/lib/systemd/system/netdata.service:70] Unknown lvalue 'ProtectControlGroups' in section 'Service'
Sep  4 15:00:15 neth systemd: [/usr/lib/systemd/system/netdata.service:77] Unknown lvalue 'BindReadOnlyPaths' in section 'Service'
Sep  4 15:00:15 neth systemd: [/usr/lib/systemd/system/netdata.service:78] Unknown lvalue 'BindReadOnlyPaths' in section 'Service'
Sep  4 15:00:16 neth esmith::event[18003]: [INFO] service httpd reload
Sep  4 15:00:16 neth systemd: Reloading The Apache HTTP Server.
Sep  4 15:00:16 neth systemd: Reloaded The Apache HTTP Server.
Sep  4 15:00:16 neth esmith::event[18003]: Action: /etc/e-smith/events/actions/adjust-services SUCCESS [1.007077]
Sep  4 15:00:16 neth esmith::event[18003]: Event: nethserver-ns8-migration-save SUCCESS

This seems to be the offending line:
neth agent: Task queue pop error: dial tcp $VPN_IP:6379: connect: connection refused

It looks like it may be attempting to dial the WireGuard address configured on NS8 and failing.
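
A quick way to check this from the NS7 side is to attempt a plain TCP connection to the same address and port. The following is a minimal Python sketch, not part of the migration tool; `can_connect` is a hypothetical helper name, and the host and port to pass are whatever appears in the agent log line (port 6379 there).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only tests TCP reachability
    and says nothing about the service listening behind the port.
    """
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable networks.
        return False
```

Calling it with the address shown in place of $VPN_IP and port 6379 would tell you whether the refusal is reproducible outside the agent; a `False` result only says the TCP connection failed, not why.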

Did you use an FQDN to reach the NS8 server? Does this FQDN resolve locally?

I used the IP directly and disabled TLS. I can telnet to the IP on the SSH port from NS7 without issue.

I can confirm I have the issue too; I'm looking into it.

OK, I started again on my real dev server. The difference from the laptop is that it has a bridged IP on my network, and this time I succeeded in connecting to the NS8 and migrated a Mattermost app.

What applications do you have installed on NS7 to migrate?
What resources did you allocate to the VM, or is it real hardware?

I succeeded in migrating an NS7 to a Rocky 9 NS8 with Mattermost and OpenLDAP (nethserver-directory). I used both IP and FQDN.

Both machines are virtual machines.

The old server does have a few bridged connections, not to mention the typical docker networks. The new machine doesn’t have any bridged connections (yet) - I just see the loopback, WAN interface, LAN interface, and wireguard interface.

There are many applications installed (including Nextcloud, Mail, Mattermost) and even more running in containers I plan to migrate manually.

I went to look again this weekend and now I see this error in the migration interface. Are there any known fixes for this before I start ripping things out of the new server?

Migrating ns7 to a working cluster could raise name conflicts. The user domain existence is checked, but other things like application HTTP routes are not.

If you really know what you’re doing, remove the conflicting user domain from the cluster and go on. Otherwise, migrate to a new cluster.

The cluster wasn’t “working”. It’s a fresh install that I made by following the migration steps: NethServer 7 migration — NS8 documentation

Now that Beta 2 is out I reset the new server and gave migration another try. The NS7 still had the new NS8 node configured, so I clicked the button to disconnect from that node. I waited until it confirmed it had been disconnected and then I tried again.

I’m still getting an error, and not even getting as far as the VPN connection setup. Here’s the error:

Odd number of elements in hash assignment at /usr/share/perl5/vendor_perl/esmith/db.pm line 273.
{
  "steps": 2,
  "pid": 16509,
  "args": "",
  "event": "nethserver-ns8-migration-save"
}
{
  "step": 1,
  "pid": 16509,
  "action": "S05generic_template_expand",
  "event": "nethserver-ns8-migration-save",
  "state": "running"
}
{
  "progress": "0.50",
  "time": "0.108469",
  "exit": 0,
  "event": "nethserver-ns8-migration-save",
  "state": "done",
  "step": 1,
  "pid": 16509,
  "action": "S05generic_template_expand"
}
{
  "step": 2,
  "pid": 16509,
  "action": "S90adjust-services",
  "event": "nethserver-ns8-migration-save",
  "state": "running"
}
{
  "progress": "1.00",
  "time": "0.7759",
  "exit": 256,
  "event": "nethserver-ns8-migration-save",
  "state": "done",
  "step": 2,
  "pid": 16509,
  "action": "S90adjust-services"
}
{
  "pid": 16509,
  "status": "failed",
  "event": "nethserver-ns8-migration-save"
}
Traceback (most recent call last):
  File "/usr/sbin/ns8-join", line 152, in <module>
    subprocess.run(['/sbin/e-smith/signal-event', '-j', 'nethserver-ns8-migration-save'], check=True)
  File "/usr/lib64/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/sbin/e-smith/signal-event', '-j', 'nethserver-ns8-migration-save']' returned non-zero exit status 1.
{
  "id": "1696202961",
  "type": "CommandFailed",
  "message": "See /var/log/messages"
}

I’m guessing it has something to do with this function that is splitting out DB configuration and for some reason comes up with an incorrect number of arguments for a proper key/value pair:

sub _db_string_to_type_and_hash ($)
{
    my ($arg) = @_;
    return ('', ()) unless defined $arg;

    # The funky regex is to avoid escaped pipes.
    # If you specify a negative limit empty trailing fields are omitted.
    return split(/(?<!\\)\|/, $arg, -1);
}

Looking into this further, it appears the DB configuration is being saved in a way that this function doesn’t expect. Pulled as JSON, it makes sense:

{"props":{"admin":"admin","":"","disabled":"disabled","User":"admin","TLSVerify":"disabled","LeaderIpAddress":"10.5.4.1","Password":"$PASSWORD","Host":"$HOSTNAME","enabled":"enabled","$SOME_WEIRD_STRING":"TLSVerify"},"name":"ns8","type":"configuration"}

but pulling it as a raw value (which is being passed in to the function above) returns this garbled mess:

configuration|||$SOME_WEIRD_STRING|TLSVerify|Host|$HOST|LeaderIpAddress|10.5.4.1|Password|$PASSWORD|TLSVerify|disabled|User|admin|admin|admin|disabled|disabled|enabled|enabled

A quick regex test shows that the first | in configuration||| gets caught by the regex.
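
To make the effect concrete, here is a small Python sketch that mimics the split with the same pattern. It is an illustration only, not the esmith code: `split_record` and `pair_up` are hypothetical helpers, and the sample record with the password `abc|def` is invented for the example.

```python
import re

# Same idea as the Perl pattern: a pipe not preceded by a backslash.
UNESCAPED_PIPE = re.compile(r"(?<!\\)\|")


def split_record(raw):
    """Split a raw record into (type, fields), keeping empty fields."""
    parts = UNESCAPED_PIPE.split(raw)
    return parts[0], parts[1:]


def pair_up(fields):
    """Pair fields as key/value.

    Returns (props, leftover); leftover is the unpaired final field when
    the number of fields is odd, otherwise None.
    """
    leftover = fields[-1] if len(fields) % 2 else None
    usable = fields[:-1] if leftover is not None else fields
    return dict(zip(usable[0::2], usable[1::2])), leftover


# Invented sample: the password "abc|def" contains an unescaped pipe.
SAMPLE = "configuration|Host|ns8.example.org|Password|abc|def|User|admin"

record_type, fields = split_record(SAMPLE)
props, leftover = pair_up(fields)
```

With the sample record, the fields after the type no longer pair up: `Password` is paired with `abc`, `def` becomes a key, and one field is left over, which is consistent with the "Odd number of elements" warning.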

@giacomo or @davidep - I see both of you in GitHub, perhaps you can provide some insight? :slight_smile:

I’m too nervous to play around with this RegEx in my prod instance, since I’m not sure if adjusting it will break other things.

Do you have a | (pipe) character in the NS8 admin password? The e-smith DB does not support strings containing it, and the UI validation logic may not protect the input data sufficiently.
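
One way to guard against this before a value reaches the DB is a simple pre-check. The sketch below is an assumption about how such a check could look, not the validation the NS7 UI actually performs; `props_with_pipe` is a hypothetical name.

```python
def props_with_pipe(props):
    """Return the names of properties whose value contains a pipe.

    Hypothetical pre-check: any name returned here would be split into
    extra fields when the record is read back from a pipe-delimited store.
    """
    return [name for name, value in props.items() if "|" in str(value)]
```

For instance, `props_with_pipe({"User": "admin", "Password": "abc|def"})` flags only `Password`, so the caller can refuse the input or ask for a different password.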

:face_with_raised_eyebrow: is this a… known fact?

Yes, it is a limitation or bug of e-smith that has never been fixed…

Thanks for sharing.
I’d lose the bet that this particular detail was never in the NethServer documentation?

No. Consider that user input must be validated and free strings, like passwords, are not stored in e-smith DB.

I’d suggest writing this detail into the migration procedure documentation.
