Shorewall hangs... How do you debug Shorewall?

I am running NethServer on a Dell Inc. Precision WorkStation T3500 (Intel(R) Xeon(R) CPU W3550 @ 3.07GHz x 8). Kernel release: 3.10.0-1160.45.1.el7.x86_64. Operating system: NethServer release 7.9.2009 (final).

It’s all been working fine for literally years. I had an Archer D2 wireless router on the “outside” (Red) and ran a couple of wired servers on the router ports and some clients on the wireless (it took the load off the server if they were going to the Internet). I’ve also got an “inner” LAN, for more secure clients, on a second NIC (Green).

I wanted to start using a VPN, so I upgraded to an ASUS RT AC86U wireless router, running Merlin, because it would allow me to run a single OpenVPN instance on the router instead of one instance on each client device. I’m using SurfShark for my VPN.

This set-up seems to have “issues” with Shorewall, but only on the NethServer (the only Linux/Shorewall box; the others are a Hik NVR and a Cisco SPA). At random times during the day, the clients on the local red net can’t contact the NethServer on the red net (through the router, but on the same LAN/subnet), though they can still contact other devices on the red net. Meanwhile, clients outside the local net (on the Internet) can contact the NethServer that I can’t reach from the red net???

This behaviour affects all my servers on the NethServer (mail servers and 3 virtual hosts), but nothing else. I can still ssh into the NethServer (thankfully!).

I spent a lot of time debugging the router, with help from folk who knew the router inside out, and we could find nothing wrong. Then, as part of the debug process, I rebooted the NethServer and the connections from clients to the NethServer came to life again. After that, I waited to see if the problem would occur again, and it did (it seems to recur at intervals of anywhere between 5 minutes and 4 hours), so this time I just restarted Shorewall. Sure enough, that cleared the problem.

I’ve since proven that the problem can be fixed reliably by restarting Shorewall, but that is a bit heavy handed on a production server, and I’d like to fix it. I have had a look at the logs, but I’m not so hot on analysing Shorewall logs. The configuration was set up automatically at system build time, so I would hope there was nothing wrong with that.

Has anyone come across this kind of weirdness? Any thoughts? How do you debug Shorewall???

First gamechanger question: have you already considered putting your SurfShark VPN arrangement “on standby” for a considerable amount of time (at least 1 week)?
Do you still have your old Archer D2 router, and could you swap it back in for long enough to test?

I have a 30 day, money back deal with SurfShark if it is unsatisfactory, and yes, I still have the old router.

UPDATE: I have lost communications again, and this time I couldn’t even ssh to the server, so I have the old router back in the network now because I need a reliable network, even if I can’t run a VPN on it.

The shorewall log might have some clue.
https://shorewall.org/troubleshoot.htm

Whoops… It wasn’t Shorewall at all… Fail2Ban had grabbed my router address and banned it. Now fixed by permanently unbanning the router address.
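
For anyone who lands here with the same symptom, a quick way to rule Fail2Ban in or out is to look at the banned list for each jail (`fail2ban-client status <jail>`) and see whether the router address is in it. Below is a minimal Python sketch of that check. It assumes the usual layout of the status report, with a `Banned IP list:` line; the sample text, the jail name `sshd`, and the addresses `192.168.1.1` and `203.0.113.7` are made up for illustration and are not taken from this thread.

```python
def banned_ips(status_text):
    """Return the addresses on the 'Banned IP list:' line of a jail status report."""
    marker = "Banned IP list:"
    for line in status_text.splitlines():
        if marker in line:
            # Everything after the marker is a whitespace-separated list of IPs.
            return line.split(marker, 1)[1].split()
    return []


def is_banned(status_text, ip):
    """True if the given address appears in the jail's banned list."""
    return ip in banned_ips(status_text)


# Illustrative sample only (hypothetical jail and addresses).
SAMPLE = """Status for the jail: sshd
|- Filter
|  |- Currently failed: 0
|  |- Total failed:     12
|  `- File list:        /var/log/secure
`- Actions
   |- Currently banned: 2
   |- Total banned:     5
   `- Banned IP list:   192.168.1.1 203.0.113.7
"""

if __name__ == "__main__":
    router = "192.168.1.1"  # hypothetical router address
    print(router, "banned:", is_banned(SAMPLE, router))
```

On a live box the text would come from running `fail2ban-client status <jail>` as root rather than from a pasted sample. If the router does show up, `fail2ban-client set <jail> unbanip <ip>` lifts the ban, and adding the address to Fail2Ban's `ignoreip` setting should stop it being banned again; how that setting is managed on NethServer may differ from a stock install.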


Nice hint for other troubles, @Jimbo :wink: