Struggling to establish the migration tunnel between ns7 and ns8

The migration tool doesn’t initialise correctly. It might be related to a broken network configuration caused by cloning the VM.

It looks like the wg tunnel doesn’t pass ns7 → ns8 packets

UDP is ok :

# nc -u -v -z ns8ip 55820
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to ns8ip:55820.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.01 seconds.

ns7 side :

# wg
interface: ns8
  public key: eeFBt/5GFP3xAIH7KSo3FK5zxiHHcv730/ZF05DD4wM=
  private key: (hidden)
  listening port: 49781

peer: Q4KHXnl0W30xvee9kLFG/7Kiz6lSZwU9h9pGTuY14Tc=
  endpoint: ns8ip:55820
  allowed ips: 10.5.4.0/24
  transfer: 0 B received, 2.31 KiB sent
  persistent keepalive: every 25 seconds

ns8 side :

# wg
interface: wg0
  public key: Q4KHXnl0W30xvee9kLFG/7Kiz6lSZwU9h9pGTuY14Tc=
  private key: (hidden)
  listening port: 55820

peer: eeFBt/5GFP3xAIH7KSo3FK5zxiHHcv730/ZF05DD4wM=
  endpoint: ns7ip:49781
  allowed ips: 10.5.4.5/32
  transfer: 2.75 KiB received, 1.85 KiB sent
  persistent keepalive: every 25 seconds

peer: ePFv1JlVWfmv/+4VNRS6hyKnf5R/6aXM4y1TN4OY0HI=
  allowed ips: 10.5.4.3/32
  persistent keepalive: every 25 seconds

peer: VADjvxTQPb4uFbPIynCUKDMnz8RJ1KmaXNhvzWr7AjI=
  allowed ips: 10.5.4.2/32
  persistent keepalive: every 25 seconds

peer: vFdZxzJagP9X85puBWHfjq8futckHlY5jAvtsQr6GjY=
  allowed ips: 10.5.4.4/32
  persistent keepalive: every 25 seconds

Notice the four peers? I tried to delete the extra ones using wg set wg0 peer <public key> remove, but they keep coming back when relaunching the migration tool. Is this expected?
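
A possible way to tell a peer that is actually in use from ones that never connected: wg show wg0 latest-handshakes prints one public key and one Unix timestamp per line, and the timestamp is 0 for a peer that has never completed a handshake. A minimal sketch, where the function name never_handshaked is made up for illustration:

```shell
# Print the public keys of peers that have never completed a handshake.
# Reads the output of `wg show <interface> latest-handshakes` on stdin:
# one "<public key><TAB><unix timestamp>" pair per line, 0 meaning "never".
never_handshaked() {
    awk '$2 == 0 { print $1 }'
}
```

Usage on NS8: wg show wg0 latest-handshakes | never_handshaked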

Thanks

The funny thing is also that on the ns7 side:

peer: Q4KHXnl0W30xvee9kLFG/7Kiz6lSZwU9h9pGTuY14Tc=
endpoint: 188.34.138.54:55820
allowed ips: 10.5.4.0/24
transfer: 0 B received, 2.31 KiB sent
persistent keepalive: every 25 seconds

but on the ns8 side:

peer: eeFBt/5GFP3xAIH7KSo3FK5zxiHHcv730/ZF05DD4wM=
endpoint: ns7ip:49781
allowed ips: 10.5.4.5/32
transfer: 2.75 KiB received, 1.85 KiB sent
persistent keepalive: every 25 seconds

I stopped shorewall and it worked… then stopped again. No idea why.

ns7 side:

# iptables -L -n -v
Chain INPUT (policy ACCEPT 16524 packets, 5459K bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 22391 packets, 12M bytes)
 pkts bytes target     prot opt in     out     source               destination         
[root@cloud ~]# 

What’s the exact error?

Which virtualization do you use?

Does the cloned VM use the same IP, hostname, MAC or something else that could cause issues?

Is ping working from NS7 wg to NS8 wg and the other way round?

Maybe try to disconnect NS7 migration tool and clean up NS8 as explained in Release notes — NS8 documentation (Updates are suspended during NS7 migration)

Maybe a firewall issue? Is UDP port 49781 open? (usually it’s 55820/UDP)

EDIT:

Are there some hints in the logs?

Thanks ! Let me try this before anything else.

If there’s some firewall device in between it can help to clear all connections from/to port 55820/UDP.

Here’s a working connection to compare:

NS8:

[root@node ~]# wg
interface: wg0
  public key: rlMdLPEG6O+MgWEWLzvh+tkjKoFBmxWCPz5zm8pzS0w=
  private key: (hidden)
  listening port: 55820

peer: 3f9TVTBRE0uAGeJj+yacvW2ugr0SA7V5hckEHS2XPxA=
  endpoint: 192.168.3.159:48574
  allowed ips: 10.5.4.2/32
  latest handshake: 11 seconds ago
  transfer: 11.15 KiB received, 66.12 KiB sent
  persistent keepalive: every 25 seconds

NS7:

[root@neth ~]# wg
interface: ns8
  public key: 3f9TVTBRE0uAGeJj+yacvW2ugr0SA7V5hckEHS2XPxA=
  private key: (hidden)
  listening port: 48574

peer: rlMdLPEG6O+MgWEWLzvh+tkjKoFBmxWCPz5zm8pzS0w=
  endpoint: 192.168.3.141:55820
  allowed ips: 10.5.4.0/24
  latest handshake: 2 minutes, 20 seconds ago
  transfer: 65.93 KiB received, 10.89 KiB sent
  persistent keepalive: every 25 seconds

Yes, that’s a Proxmox server with two guests that have public IPs. The Hetzner firewall is on but open for 55820.

Since nc is able to “connect” I guess that it’s ok ?
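
Note that UDP is connectionless, so ncat prints “Connected” and “UDP packet sent successfully” as soon as the datagram leaves the machine, whether or not anything on the other side received it; the “0 bytes received” in the output above is consistent with that. The latest handshake line in the wg output is a more direct indicator, and it is missing on both sides in the failing output. A small sketch to check for it (has_handshake is a hypothetical helper name):

```shell
# Succeeds if the `wg` output on stdin contains at least one
# "latest handshake:" line, i.e. at least one peer completed a handshake.
# wg omits that line for peers that have never completed one.
has_handshake() {
    grep -q 'latest handshake:'
}
```

Usage on NS7: wg show ns8 | has_handshake && echo "handshake seen" || echo "no handshake yet"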

I assume the firewall on the virtual network devices is disabled.

Error connecting to NS8: urllib.error.URLError: <urlopen error [Errno 110] Connection timed out>

That’s when it tries to connect to http://10.5.4.1:9311 (which works when testing locally)

Yes, definitely.

nope that’s the point indeed.

Yes, everything changed, so I believe there are some leftovers in the local firewall (shorewall / iptables), but resetting the rules and stopping shorewall doesn’t seem to help.

I did it and that cleared the unwanted peers, but it’s still not working.

Is the NS8 UDP port reachable from NS7?

nc -vz -u <ns8_ip> 55820

Is the right UDP port listening at the NS7?

ss -tulpn | grep 49781
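
The port to look for is whatever wg currently reports as the listening port on NS7, so the number in the grep has to match that value. A hedged sketch that checks ss -tulpn output for a UDP socket on a given port; udp_listening is an illustrative name, and the column position assumes the ss layout shown in this thread (Netid in the first column, local address in the fifth):

```shell
# Succeeds if the `ss -tulpn` output on stdin lists a UDP socket bound
# to the given local port.
# $1: port number, e.g. the value printed by `wg show ns8 listen-port`.
udp_listening() {
    awk -v port="$1" '
        $1 == "udp" {
            # Local address looks like *:48574 or [::]:48574; keep the last part.
            n = split($5, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}
```

Usage on NS7: ss -tulpn | udp_listening "$(wg show ns8 listen-port)" && echo listening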

Maybe check the configs of NS7 and NS8?

NS8:

[root@node ~]# cat /etc/wireguard/wg0.conf 
[Interface]
Address = 10.5.4.1/32
ListenPort = 55820
PrivateKey = cJUkbGjMOgVUW/d91tia6NtfodsV3kO8qpZ96Ni+9k4=

[Peer]
PublicKey = 3f9TVTBRE0uAGeJj+yacvW2ugr0SA7V5hckEHS2XPxA=
AllowedIPs = 10.5.4.2/32
PersistentKeepalive = 25

NS7:

[root@neth ~]# cat /etc/wireguard/ns8.conf 
# ================= DO NOT MODIFY THIS FILE =================
# 
# Manual changes will be lost when this file is regenerated.
#
# Please read the developer's guide, which is available
# at NethServer official site: https://www.nethserver.org
#
# 
[Interface]
Address = 10.5.4.2
PrivateKey = wHwAtnAcKRIv291SbV7EXd2Uocrfv7POgEmTOsT9UHQ=

[Peer]
PublicKey = rlMdLPEG6O+MgWEWLzvh+tkjKoFBmxWCPz5zm8pzS0w=
AllowedIPs = 10.5.4.0/24
Endpoint = node.ns8rockytest2.com:55820
PersistentKeepalive = 25

If there are hostnames used, is the DNS working correctly?

yes

# nc -vz -u cloud8.toucheatout.be 55820
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 188.34.138.54:55820.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.01 seconds.

Is the right UDP port listening at the NS7?

nope ! should NS7 listen too ??

#  cat /etc/wireguard/wg0.conf 
[Interface]
Address = 10.5.4.1/32
ListenPort = 55820
PrivateKey = redacted=

[Peer]
PublicKey = dbGiSk1MTIpAjx/Hjs5Kl9HT84X8Dj2uFCoocqcsSQU=
AllowedIPs = 10.5.4.6/32
PersistentKeepalive = 25



# cat /etc/wireguard/ns8.conf 
# ================= DO NOT MODIFY THIS FILE =================
# 
# Manual changes will be lost when this file is regenerated.
#
# Please read the developer's guide, which is available
# at NethServer official site: https://www.nethserver.org
#
# 
[Interface]
Address = 10.5.4.6
PrivateKey = redacted=

[Peer]
PublicKey = Q4KHXnl0W30xvee9kLFG/7Kiz6lSZwU9h9pGTuY14Tc=
AllowedIPs = 10.5.4.0/24
Endpoint = ns8fqdn:55820
PersistentKeepalive = 25

DNS is working perfectly (that was not the easy part; I had to reinstall ns8 to get it working)

Yes, to check on NS7:

[root@neth ~]# wg
interface: ns8
  public key: 3f9TVTBRE0uAGeJj+yacvW2ugr0SA7V5hckEHS2XPxA=
  private key: (hidden)
  listening port: 48574

Get the listening port and check using ss if it’s listening:

[root@neth ~]# ss -tulpn | grep 48574
udp    UNCONN     0      0         *:48574                 *:*                  
udp    UNCONN     0      0      [::]:48574              [::]:*

right, the port changed and it’s listening correctly

# ss -tulpn | grep 56886
udp    UNCONN     0      0         *:56886                 *:*                  
udp    UNCONN     0      0      [::]:56886              [::]:*   

trying with another client to validate that ns8 is working correctly.

Did you try to disable the NS7 firewall using

shorewall clear

To reenable the firewall on NS7:

signal-event firewall-adjust

EDIT:

Maybe there is still an old wg connection running on NS7 that conflicts? (There was nethserver-wireguard…)

ls /etc/wireguard

yes

nope…

I tried to connect with another client, it works. NS8 is ok, the problem is related to NS7, probably the network config is broken : the ping packets are not leaving or reaching ns7.

The routing table is ok. I’m out of ideas. And that’s just to set up a clone to test the migration path… Seriously considering migrating by hand.

Maybe the issue came from cloning. You could try to set up a test NS7 by using backup/restore to avoid the cloning.

You could also try migrating the original NS7, if something goes wrong during migration you could step back as explained here: GitHub - NethServer/nethserver-ns8-migration

EDIT:

Routing to compare:

NS8:

[root@node ~]# ip r
default via 192.168.3.11 dev eth0 proto static metric 100 
10.5.4.2 dev wg0 
192.168.3.0/24 dev eth0 proto kernel scope link src 192.168.3.141 metric 100

NS7:

[root@neth ~]# ip r
default via 192.168.3.11 dev eth0 
10.5.4.0/24 dev ns8 scope link 
192.168.3.0/24 dev eth0 proto kernel scope link src 192.168.3.159
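
To confirm which interface the kernel would actually pick for the tunnel address, ip route get can be used on either side. A small sketch that extracts the device name from its output (route_dev is a made-up helper name):

```shell
# Print the outgoing device from `ip route get <address>` output on stdin,
# i.e. the word that follows "dev" on the first line that has one.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}
```

On NS7, ip route get 10.5.4.1 | route_dev should print ns8 if the route through the tunnel is in effect.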

Still in prod… And no snapshots :grimacing: I really don’t want to break that server or rely on backup restores.

Still I could try to restore a backup on the target machine… Took me so much time already :slightly_frowning_face:

Routing tables are… WAIT

NS7 is NOT on a public address :scream::scream::scream::scream::skull_and_crossbones::skull_and_crossbones::skull_and_crossbones: It’s behind a nethsec !!! I have to forward UDP !

Silly me. Should take some rest. I’ll report back.

@mrmarkuz

Just to clarify: NS8 is the VPN server, correct? So there is no need to forward ports since afaik nethsec passes all the outgoing traffic (including UDP?) when NS7 tries to connect…

Yes, regarding the migration it could be seen as the server, but WireGuard is a peer-to-peer VPN and the peers need to be able to reach each other.
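
In practice, NS8 sends its replies to the source address and port of the last valid packet it received from that peer, which is what the endpoint line on the NS8 side shows. When NS7 sits behind a NAT firewall, that endpoint should be the firewall’s public address rather than NS7’s own. A minimal sketch to look it up from wg show wg0 endpoints, which prints one public key and one endpoint per line (peer_endpoint is a hypothetical name):

```shell
# Print the endpoint recorded for one peer.
# Reads `wg show <interface> endpoints` output on stdin:
# one "<public key><TAB><address:port>" pair per line, "(none)" if unknown.
# $1: the peer's public key.
peer_endpoint() {
    awk -v key="$1" '$1 == key { print $2 }'
}
```

Usage on NS8: wg show wg0 endpoints | peer_endpoint 'eeFBt/5GFP3xAIH7KSo3FK5zxiHHcv730/ZF05DD4wM='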

EDIT:

Maybe it helps to just restart the wireguard connections?

NS7:

systemctl restart wg-quick@ns8

NS8:

systemctl restart wg-quick@wg0
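
After restarting both sides, a short polling loop can show whether a handshake comes up. This is only a sketch: wait_for_handshake is a hypothetical helper, and it relies on wg show <interface> latest-handshakes printing a Unix timestamp of 0 for peers that have no handshake yet.

```shell
# Wait until any peer on the interface reports a handshake, or give up.
# $1: interface name, $2: timeout in seconds (default 30).
wait_for_handshake() {
    iface=$1
    timeout=${2:-30}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # Each line is "<public key><TAB><unix time>"; 0 means no handshake.
        if wg show "$iface" latest-handshakes |
            awk '$2 > 0 { ok = 1 } END { exit !ok }'; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

Usage on NS7: wait_for_handshake ns8 60 && echo "handshake ok" || echo "still no handshake"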

See also Migration to NS8 another try - #11 by mrmarkuz