Failed even to start migration... any ideas?

So I am trying to make a preliminary run of the migration tool.
My NS8 is set up, just one node, nothing added to it from “stock”, fully updated.
NS7 also fully updated.
So I run the migration tool and this is what I got…


Also in NS8 I got this in the messages…
image
I am not sure about the “add node” thing, as there is already a node.

Any ideas?

I see a few time outs…


Logs

/var/log/messages
Search
Regexp filter

[Thu Jan  1 02:00:00 1970]Jan 13 19:25:01 quasar systemd: Started Session 148710 of user asterisk.
[Thu Jan  1 02:00:00 1970]Jan 13 19:25:01 quasar systemd: Started Session 148711 of user root.
[Thu Jan  1 02:00:00 1970]Jan 13 19:25:01 quasar systemd: Started Session 148712 of user root.
[Thu Jan  1 02:00:00 1970]Jan 13 19:26:01 quasar systemd: Started Session 148713 of user root.
[Thu Jan  1 02:00:00 1970]Jan 13 19:26:01 quasar systemd: Started Session 148714 of user asterisk.
[Thu Jan  1 02:00:00 1970]Jan 13 19:27:01 quasar systemd: Started Session 148715 of user root.
[Thu Jan  1 02:00:00 1970]Jan 13 19:27:01 quasar systemd: Started Session 148716 of user asterisk.
[Thu Jan  1 02:00:00 1970]Jan 13 19:27:01 quasar systemd: Started Session 148717 of user root.
[Thu Jan  1 02:00:00 1970]Jan 13 19:27:47 quasar dbus[763]: [system] Activating via systemd: service name='org.freedesktop.timedate1' unit='dbus-org.freedesktop.timedate1.service'
[Thu Jan  1 02:00:00 1970]Jan 13 19:27:47 quasar systemd: Starting Time & Date Service...
[Thu Jan  1 02:00:00 1970]Jan 13 19:27:47 quasar dbus[763]: [system] Successfully activated service 'org.freedesktop.timedate1'
[Thu Jan  1 02:00:00 1970]Jan 13 19:27:47 quasar systemd: Started Time & Date Service.
[Thu Jan  1 02:00:00 1970]Jan 13 19:28:01 quasar systemd: Started Session 148718 of user root.
[Thu Jan  1 02:00:00 1970]Jan 13 19:28:01 quasar systemd: Started Session 148719 of user asterisk.
[Thu Jan  1 02:00:00 1970]Jan 13 19:28:47 quasar dbus[763]: [system] Activating via systemd: service name='org.freedesktop.timedate1' unit='dbus-org.freedesktop.timedate1.service'
[Thu Jan  1 02:00:00 1970]Jan 13 19:28:47 quasar systemd: Starting Time & Date Service...
[Thu Jan  1 02:00:00 1970]Jan 13 19:28:47 quasar dbus[763]: [system] Successfully activated service 'org.freedesktop.timedate1'
[Thu Jan  1 02:00:00 1970]Jan 13 19:28:47 quasar systemd: Started Time & Date Service.
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:01 quasar systemd: Started Session 148721 of user root.
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:01 quasar systemd: Started Session 148720 of user asterisk.
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar /sbin/e-smith/db[13387]: /var/lib/nethserver/db/configuration: OLD wg-quick@ns8=service|Address|10.5.4.2|RemoteEndpoint|quasar-2.local.ilogic.gr:55820|RemoteKey|ZemISLFpuD/bFCQV892+lHOo10RW25AlQF0mlOd0eXE=|RemoteNetwork|10.5.4.0/24|status|enabled
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar /sbin/e-smith/db[13387]: /var/lib/nethserver/db/configuration: NEW wg-quick@ns8=service|Address|10.5.4.3|RemoteEndpoint|quasar-2.local.ilogic.gr:55820|RemoteKey|ZemISLFpuD/bFCQV892+lHOo10RW25AlQF0mlOd0eXE=|RemoteNetwork|10.5.4.0/24|status|enabled
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar esmith::event[13389]: Event: nethserver-ns8-migration-save
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar esmith::event[13389]: expanding /etc/httpd/conf.d/00ns8migration.conf
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar esmith::event[13389]: expanding /etc/wireguard/ns8.conf
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar esmith::event[13389]: Action: /etc/e-smith/events/actions/generic_template_expand SUCCESS [0.051618]
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar systemd: Reloading.
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar systemd: [/usr/lib/systemd/system/suricata.service:17] Unknown lvalue 'MemoryDenyWriteExecute' in section 'Service'
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar systemd: [/usr/lib/systemd/system/suricata.service:18] Unknown lvalue 'LockPersonality' in section 'Service'
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar systemd: [/usr/lib/systemd/system/suricata.service:19] Unknown lvalue 'ProtectControlGroups' in section 'Service'
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar systemd: [/usr/lib/systemd/system/suricata.service:20] Unknown lvalue 'ProtectKernelModules' in section 'Service'
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar systemd: [/usr/lib/systemd/system/netdata.service:9] Unknown lvalue 'LogNamespace' in section 'Service'
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar esmith::event[13389]: [INFO] service httpd reload
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar systemd: Reloading The Apache HTTP Server.
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar systemd: Reloaded The Apache HTTP Server.
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar systemd: Reloading.
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar systemd: [/usr/lib/systemd/system/suricata.service:17] Unknown lvalue 'MemoryDenyWriteExecute' in section 'Service'
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar systemd: [/usr/lib/systemd/system/suricata.service:18] Unknown lvalue 'LockPersonality' in section 'Service'
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar systemd: [/usr/lib/systemd/system/suricata.service:19] Unknown lvalue 'ProtectControlGroups' in section 'Service'
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar systemd: [/usr/lib/systemd/system/suricata.service:20] Unknown lvalue 'ProtectKernelModules' in section 'Service'
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar systemd: [/usr/lib/systemd/system/netdata.service:9] Unknown lvalue 'LogNamespace' in section 'Service'
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar esmith::event[13389]: [INFO] service wg-quick@ns8 restart
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar systemd: Stopping WireGuard via wg-quick(8) for ns8...
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar wg-quick: [#] ip link delete dev ns8
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar systemd: Stopped WireGuard via wg-quick(8) for ns8.
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar systemd: Starting WireGuard via wg-quick(8) for ns8...
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar wg-quick: [#] ip link add ns8 type wireguard
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar wg-quick: [#] wg setconf ns8 /dev/fd/63
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar wg-quick: [#] ip -4 address add 10.5.4.3 dev ns8
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar wg-quick: [#] ip link set mtu 1420 up dev ns8
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar wg-quick: [#] ip -4 route add 10.5.4.0/24 dev ns8
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar systemd: Started WireGuard via wg-quick(8) for ns8.
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar esmith::event[13389]: Action: /etc/e-smith/events/actions/adjust-services SUCCESS [0.292946]
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:12 quasar esmith::event[13389]: Event: nethserver-ns8-migration-save SUCCESS
[Thu Jan  1 02:00:00 1970]Jan 13 19:29:39 quasar cockpit-ws: couldn't read from connection: Peer sent fatal TLS alert: Unknown certificate
[Thu Jan  1 02:00:00 1970]Jan 13 19:30:01 quasar systemd: Started Session 148723 of user root.
[Thu Jan  1 02:00:00 1970]Jan 13 19:30:01 quasar systemd: Started Session 148724 of user asterisk.
[Thu Jan  1 02:00:00 1970]Jan 13 19:30:01 quasar systemd: Started Session 148725 of user root.
[Thu Jan  1 02:00:00 1970]Jan 13 19:30:01 quasar systemd: Started Session 148722 of user root.
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:01 quasar systemd: Started Session 148726 of user asterisk.
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:01 quasar systemd: Started Session 148727 of user root.
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: Traceback (most recent call last):
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: File "/usr/lib64/python3.6/urllib/request.py", line 1349, in do_open
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: encode_chunked=req.has_header('Transfer-encoding'))
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: File "/usr/lib64/python3.6/http/client.py", line 1254, in request
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: self._send_request(method, url, body, headers, encode_chunked)
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: File "/usr/lib64/python3.6/http/client.py", line 1300, in _send_request
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: self.endheaders(body, encode_chunked=encode_chunked)
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: File "/usr/lib64/python3.6/http/client.py", line 1249, in endheaders
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: self._send_output(message_body, encode_chunked=encode_chunked)
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: File "/usr/lib64/python3.6/http/client.py", line 1036, in _send_output
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: self.send(msg)
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: File "/usr/lib64/python3.6/http/client.py", line 974, in send
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: self.connect()
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: File "/usr/lib64/python3.6/http/client.py", line 946, in connect
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: (self.host,self.port), self.timeout, self.source_address)
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: File "/usr/lib64/python3.6/socket.py", line 724, in create_connection
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: raise err
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: File "/usr/lib64/python3.6/socket.py", line 713, in create_connection
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: sock.connect(sa)
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: TimeoutError: [Errno 110] Connection timed out
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: During handling of the above exception, another exception occurred:
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: Traceback (most recent call last):
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: File "/usr/sbin/ns8-join", line 190, in <module>
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: update_routes_response = call(api_endpoint, "update-routes", payload['token'], update_routes_request, False)
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: File "/usr/sbin/ns8-join", line 45, in call
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: post = request.urlopen(req, context=ctx)
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: File "/usr/lib64/python3.6/urllib/request.py", line 223, in urlopen
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: return opener.open(url, data, timeout)
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: File "/usr/lib64/python3.6/urllib/request.py", line 526, in open
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: response = self._open(req, data)
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: File "/usr/lib64/python3.6/urllib/request.py", line 544, in _open
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: '_open', req)
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: File "/usr/lib64/python3.6/urllib/request.py", line 504, in _call_chain
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: result = func(*args)
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: File "/usr/lib64/python3.6/urllib/request.py", line 1377, in http_open
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: return self.do_open(http.client.HTTPConnection, req)
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: File "/usr/lib64/python3.6/urllib/request.py", line 1351, in do_open
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: raise URLError(err)
[Thu Jan  1 02:00:00 1970]Jan 13 19:31:20 quasar cockpit-bridge: urllib.error.URLError: <urlopen error [Errno 110] Connection timed out>

Ensure the leader VPN endpoint is reachable by ns7. This is documented here:

https://docs.nethserver.org/projects/ns8/en/latest/system_requirements.html#worker-node-requirements

the API server response contains the leader VPN endpoint: it is a host address with a UDP port number used to set up a Wireguard VPN. The VPN endpoint is configured during the cluster creation procedure. Ensure it is not blocked by other network appliances.

In other words, try to use the leader fqdn ( quasar-2.local.ilogic.gr) not its IP.

I don’t know how much matters in a migration but this looks strange.

I noticed that in NS7 logs too…
NS7 has proper time though:
image

(following up on VPN endpoint… btw both NS7 and NS8 are VMs on the same machine)

1 Like

Well I read the doc section you pointed @davidep

  1. Not sure what is “worker node”, but I can definitely reach /cluster-admin/ both by FQDN or IP (from my own PC). Trying to ping NS8 IP and FQDN from inside NS7 both reply, but FQDN resolved to my EXTERNAL IP address (!?)… This is super weird because my NS7 setup has defined as DNS
  • first my own internal technitium dns …that has an entry for my NS8 resolving to internal IP (so it didn’t even ask that DNS)
  • second my router’s DNS, that has no entry for my NS8 and even reaching my external (domain host) DNS, that is authoritative, there is again no entry about NS8…
    …so this is VERY weird.
    In any case I added a DNS entry inside my NS7 own DNS (that I don’t use as it is VERY basic as you all know) and this resolved this weird behavior.
  1. My NS8 is reachable using HTTPS, although it is non-secure as I haven’t managed certs yet. I cannot verify if it is also reachable from my NS7, but there should be no firewall blocking between two VMs in the same network.

  2. Again the port used by VPN in NS8 shouldn’t be blocked by anything because NS8 has of course its own IP and is handling its ports by itself.

So… Not sure what fails. I will retry after the DNS fix I issued (although you can see in screen captures above I used IP not FQDN, so it shouldn’t be the problem).

BTW after a while my NS7 reported this:
image
…(which verifies it used IP)…

EDIT: Restarting migration process (pressing “connect to a different cluser”) and using the FQDN this time, results in the exact same errors.

You have to fix the DNS configuration to resolve that fqdn to the internal IP address.

You could try to do it from ns7 hosts page.


As alternative, recreate the ns8 cluster and set the IP address as VPN endpoint. That means the leader will be unreachable by nodes outside your LAN.

I already did fix the DNS issue (I write this above) - it resolves properly from within NS7 console now.
It did not solve the problem.

How can I check the current VPN endpoint though?

I have an interesting evolution (read: bad)…
When you talked about VPN endpoint, I checked the node FQDN and it was wrong (in “3 dots”).
So I changed it to the same FQDN that is reported by DNSes.
This didn’t fix anything.
So I rebooted the VM.

…and although I reach login prompt in the console (and even reports how I can access /cluser-admin/ over web), I cannot reach /cluster-admin/ any more!!! From FQDN OR IP!!!

Have to say it is VERY easy for NS8 to go boom currently… Definitely not ready for production.

OK newer news.
I started from scratch, from fresh NS8 VM, where I set the VPN related FQDN (or the leader node?) correctly from scratch, from the setup wizard.
This allowed migration in NS7 to query installed apps properly.
I don’t know if the migration will work but the issue of this thread is resolved.

So, I don’t know if the issue was that this FQDN was not set correctly from the beginning, or something else.
I do know that going to leader node and clicking to change FQDN FAILS the whole server. So this is definitely something to be looked at.