MultiWAN provider marked as down

Hi,
I have a working NethServer release 7.6.1810 (final) running in a KVM virtual machine. I have two Internet providers: one is FTTH (named f_gen) and the other is an old ADSL line (named adsl).
I’m trying to set up a second FTTH connection with another provider (named fjazz) to replace the ADSL connection, with no luck. After the interface comes up and NethServer starts testing connectivity by pinging 8.8.8.8, the provider is marked as down.
The only difference with the first FTTH provider is that this new one uses DHCP addressing instead of a fixed IP address.
I’ve manually pinged 8.8.8.8 through that interface and found that it doesn’t work when I specify the output interface, which I guess is what NethServer does, but it works when I specify the source IP address instead:

[root@nethserver ~]# ping -c1 -I eth2 8.8.8.8
PING 8.8.8.8 (8.8.8.8) from 188.77.X.Y eth2: 56(84) bytes of data.

--- 8.8.8.8 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

[root@nethserver ~]#
[root@nethserver ~]# ping -c1 -I 188.77.X.Y 8.8.8.8
PING 8.8.8.8 (8.8.8.8) from 188.77.X.Y : 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=124 time=11.9 ms

For the other providers, it doesn’t matter if I use source IP address or device name:

[root@nethserver ~]# ping -c1 -I eth4 8.8.8.8
PING 8.8.8.8 (8.8.8.8) from 172.16.1.253 eth4: 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=123 time=41.5 ms

I’ve also captured traffic on the new provider’s interface while running “ping -I eth2 8.8.8.8”, and the only packets shown are ARP requests asking for the MAC address of 8.8.8.8. This seems to indicate that the server is trying to reach 8.8.8.8 directly on eth2 instead of routing the packets to the gateway received via DHCP. I guess this is why the provider is being marked as down, but I cannot understand what is causing this behavior.

Everything else seems fine: I’ve checked the shorewall configuration, routing tables, etc. Here is the output of several “ip” commands.

[root@nethserver ~]# ip rule
0: from all lookup local
999: from all lookup main
10000: from all fwmark 0x10000/0xf0000 lookup f_gen
10002: from all fwmark 0x30000/0xf0000 lookup adsl
20000: from 192.168.0.253 lookup f_gen
20000: from 188.77.X.Y lookup fjazz
20000: from 172.16.1.253 lookup adsl
26900: from all lookup f_gen
26902: from all lookup adsl
32765: from all lookup balance
32767: from all lookup default
[root@nethserver ~]# ip route show table fjazz
default via 188.77.A.B dev eth2 src 188.77.X.Y
188.77.A.B dev eth2 scope link src 188.77.X.Y
[root@nethserver ~]# ip route show table f_gen
default via 192.168.0.1 dev eth1 src 192.168.0.253
192.168.0.1 dev eth1 scope link src 192.168.0.253
[root@nethserver ~]# ip route show table adsl
default via 172.16.1.1 dev eth4 src 172.16.1.253
172.16.1.1 dev eth4 scope link src 172.16.1.253
[root@nethserver ~]#
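
One thing that may be worth checking in the “ip rule” output above is which tables a lookup can reach when the packet carries no firewall mark and no source address has been selected yet. I am assuming (this is not confirmed anywhere in this thread) that this is the situation for a socket bound only to a device, as with “ping -I eth2”. A minimal sketch: the function name unconditional_tables is made up for illustration, and the sample input is the rule list pasted above.

```shell
#!/bin/sh
# Print the table named by every rule of the exact form
#   "<prio>: from all lookup <table>"
# i.e. rules with no fwmark and no source-address match.
# "unconditional_tables" is a hypothetical helper name.
unconditional_tables() {
    awk '$2 == "from" && $3 == "all" && $4 == "lookup" { print $5 }'
}

# Sample input: the "ip rule" output pasted in this post.
# On a live system the equivalent is: ip rule | unconditional_tables
rules='0: from all lookup local
999: from all lookup main
10000: from all fwmark 0x10000/0xf0000 lookup f_gen
10002: from all fwmark 0x30000/0xf0000 lookup adsl
20000: from 192.168.0.253 lookup f_gen
20000: from 188.77.X.Y lookup fjazz
20000: from 172.16.1.253 lookup adsl
26900: from all lookup f_gen
26902: from all lookup adsl
32765: from all lookup balance
32767: from all lookup default'

printf '%s\n' "$rules" | unconditional_tables
```

With the rules above this prints local, main, f_gen, adsl, balance and default. The fjazz table is not in the list: it is only reachable through the “from 188.77.X.Y” rule. Whether that is the cause or just a side effect of the provider already being marked as down, I cannot tell.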

Any clues?

Have you tried to ping another IP?

Hi,
Yes I did, same result.

Is the Ethernet link OK even when the provider is marked as down? Is there anything interesting in the shorewall logs?

Hi Federico. Yes, everything is fine at the Ethernet level.

I’ve tested the following setup: I connected a simple router with two interfaces between the provider and NethServer, with DHCP addressing on the provider-facing interface and a static private network on the interface facing NethServer. In this scenario everything works as expected: pings are successful and the connection is no longer marked as down.

My conclusion is that something goes wrong when a provider has a dynamic address, but I cannot figure out what the problem is. In the “/etc/shorewall/providers” file, the gateway for the DHCP provider is set to “detect”. Maybe detection is not working properly and the routes are not populated accordingly?
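
To see at a glance which providers rely on gateway detection, the providers file can be filtered on its GATEWAY column. This is only a sketch, assuming the usual Shorewall column order (NAME NUMBER MARK DUPLICATE INTERFACE GATEWAY OPTIONS COPY). The helper name detect_providers and the sample rows are invented for illustration; they are not copied from my actual file.

```shell
#!/bin/sh
# Print "name interface" for every provider whose GATEWAY column
# (6th field) is the literal word "detect". Comment lines are skipped.
# "detect_providers" is a hypothetical helper name.
# Reads the files given as arguments, or stdin when none are given.
detect_providers() {
    awk '$1 !~ /^#/ && $6 == "detect" { print $1, $5 }' "$@"
}

# Hypothetical sample rows: numbers, marks and options are placeholders.
sample='#NAME NUMBER MARK DUPLICATE INTERFACE GATEWAY OPTIONS
f_gen 1 1 - eth1 192.168.0.1 balance=1
fjazz 2 2 - eth2 detect balance=1
adsl 3 3 - eth4 172.16.1.1 balance=1'

printf '%s\n' "$sample" | detect_providers
```

On the server itself this would be run as “detect_providers /etc/shorewall/providers”, and the gateway that was actually installed can then be compared against “ip route show table fjazz”.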

I have never encountered this problem… the only issue I have noticed is that some routers ban the server IP after X ping requests…
If you do a shorewall restart, does everything work fine? How long did you have to wait before the interface went down?

In this case it is not a problem of the IP being banned: ping -I “source-ip” works, whereas ping -I “interface” doesn’t. My guess is that the script that detects whether a provider is down uses the second form.
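
For anyone who wants to reproduce the comparison, the two binding modes can be tested side by side for each provider. A rough sketch: the names probe and PING_CMD are mine, and this is not the script NethServer itself uses to monitor providers.

```shell
#!/bin/sh
# PING_CMD can be overridden (e.g. for a dry run); defaults to the system ping.
PING_CMD=${PING_CMD:-ping}

# probe BIND [TARGET]
# Send one echo request with "-I BIND" and report whether a reply came back.
# BIND may be an interface name (eth2) or a source IP address.
# "probe" is a hypothetical helper name.
probe() {
    bind=$1
    target=${2:-8.8.8.8}
    if "$PING_CMD" -c 1 -W 2 -I "$bind" "$target" >/dev/null 2>&1; then
        echo "$bind -> $target: reply"
    else
        echo "$bind -> $target: no reply"
    fi
}

# Example, run as root on the firewall (replace 188.77.X.Y with the real address):
#   for b in eth1 192.168.0.253 eth2 188.77.X.Y eth4 172.16.1.253; do probe "$b"; done
```

If only the eth2 line reports no reply while its source address works, that matches what I see by hand.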

I’ve rebooted NethServer in case there was something wrong with the routing table, removed and re-added the provider, and so on, but no luck.