Nethsecurity Internet Connection State Unknown

NethSecurity Version: 23.05.5-ns.1.4.1
Module: Web GUI (?) / connectivity monitoring

Two issues: when logging into the Web GUI I am presented with
a) An error message that ns.update failed
b) Internet connection status “unknown” ← Probably this is the actual issue

a)

root@:~# /usr/libexec/rpcd/ns.update call check-system-update
{"error": "connection_error"}

b)

However, the Nethsecurity server is running just fine and is accessible via SSH. Internet connection is present, or I would not be posting this.

I will leave the system in the current state for now, ergo I would be happy to gather diagnostic information on what seems to be a Web GUI issue and not an actual internet connectivity loss.

More Details:

The realtime monitor does not work:

Netdata graphical elements are broken (besides CPU usage which seems to display correctly, the other graphical elements of key system metrics are bugged):

More information:

Via SSH:
[“xxxx” and “XXX” are redacted IPs and not actual IPs - actual IPs are correctly displayed WAN and peer IPs].

The point is, the link state is “UNKNOWN”.

root@castis:~# ip -br a
lo UNKNOWN 127.0.0.1/8 ::1/128
eth0 UP
eth1 UP
ifb-dns UNKNOWN xxxx::xxxx:xxxx:xxxx:xxxx/64
br-lan UP 192.168.0.1/24
pppoe-wan UNKNOWN XXX.XXX.XXX.XXX peer XXX.XXX.XXX.XXX/32 xxxx::xxxx:xxxx:xxxx:xxxx peer xxxx::xxxx:xxxx:xxxx:xxxx/128
ifb-pppoe-wan UNKNOWN xxxx::xxxx:xxxx:xxxx:xxxx/64

So, questions I have for those who know more (you kind folks):

The PPPOE client, the firewall, the routing, are working, as here I am with internet connectivity. It seems the OpenWRT side of things (LUCI is that what you call it?) broke.

Can I re-start the LUCI / GUI management service without rebooting the entire Nethsecurity system?

And no, it wouldn’t be terrible for me to reboot the Nethsecurity server, I can do that. And I will do that soon. =)

Just talking to myself here:

Rebooted the NethSecurity VM - the problem persists. DNS isn’t working (NS isn’t supplying DNS and it’s not making a connection to the upstream DNS provider).

I stopped NS and started up the Arista NGFW instance I have been using for the last few years until this experiment with NS. Here everything is fine.

So, I can say my ISP, GPON, switching, the KVM server (host is RHEL 9) and the rest of the infrastructure are OK (being entirely unchanged), but the Nethsecurity system decided today that it doesn’t want to work anymore.

I will start the NS VM behind the Arista NGFW (so NS will no longer route for anything, and also not make the PPPOE connection, but NS will not know that) and I will continue to have a look at it to see why it stopped working spontaneously.

Perhaps I will create a new VM, boot the unconfigured disk image, restore the configuration and see what happens.

Again, I am quite happy to deliver logs or other information, but I don’t know specifically what is advantageous to look at in particular for clues.


Please check and share the logs. You can get them on the logs page in the UI or from /var/log/messages.
If there are too many entries, you could use an online service like pastebin or GitHub gists to share the logs.
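If the log is long, a rough filter can help narrow it down before sharing. Here is a minimal sketch, assuming classic syslog lines like those in /var/log/messages. The keyword list and the `summarize` helper are only an illustration, not a NethSecurity tool:

```python
import re
from collections import Counter

# Matches classic syslog lines such as:
#   "Feb 16 01:28:53 host dnsmasq[1]: FAILED to start up"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s[\d:]{8}\s\S+\s(?P<proc>[^\s:\[]+)(?:\[\d+\])?:\s(?P<msg>.*)$"
)

# Hypothetical keyword list; adjust it to whatever you are hunting for.
KEYWORDS = ("fail", "error", "crash", "unable", "could not")


def summarize(lines):
    """Count suspicious messages per (process, message), most frequent first."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a syslog-style line
        msg = m.group("msg")
        if any(k in msg.lower() for k in KEYWORDS):
            counts[(m.group("proc"), msg)] += 1
    return counts.most_common()


SAMPLE = [
    "Feb 16 01:28:53 ns_server dnsmasq[1]: FAILED to start up",
    "Feb 16 01:28:58 ns_server dnsmasq[1]: FAILED to start up",
    "Feb 16 01:29:04 ns_server pppd[2833]: Unable to complete PPPoE Discovery",
    "Feb 16 01:29:08 ns_server netifd: Interface 'wan' is now up",
]

for (proc, msg), n in summarize(SAMPLE):
    print(f"{n:4d}  {proc}: {msg}")
```

To run it against a real log, pass the lines of the file instead of `SAMPLE`. A message that repeats every few seconds rises to the top, which is usually the one worth posting.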

Hi Markus

thanks for having a look.
I have switched browsers (Firefox, Chrome) and the issue persists. The VM has been restarted, and the issue still persists.

Again, the issue is: even though Nethsecurity makes the PPPOE connection, and WAN is up and running, the Web GUI reports that “Internet Connections Unknown” and any functionality dealing with the internet connection (reports, stats, status, update check, blocklist download, etc.) doesn’t work.

To the Logs:
This sticks out:

dnsmasq[1]: duplicate dhcp-host IP address 192.168.0.216 at line 58 of /var/etc/dnsmasq.conf.cfg01411c
Feb 16 01:28:53 castis dnsmasq[1]: FAILED to start up
procd: Instance dnsmasq::cfg01411c s in a crash loop 6 crashes, 0 seconds since last crash

Wow, so dnsmasq is broken because… there is a duplicate IP defined as both a DHCP static lease and as a separate DNS entry?! So NAT is broken and DNS is broken and everything? Because of a local DHCP static lease???
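The log message names a duplicate `dhcp-host` IP address, so one way to find the offending entries is to scan the generated config for IPv4 addresses that appear on more than one `dhcp-host=` line. This is a rough sketch only, assuming comma-separated `dhcp-host=` lines. The MAC addresses and hostnames in the sample are made up, and IPv6 entries are not handled:

```python
import ipaddress
from collections import defaultdict


def find_duplicate_dhcp_host_ips(conf_text):
    """Return {ip: [line numbers]} for IPv4 addresses on more than one dhcp-host line."""
    seen = defaultdict(list)
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.strip()
        if not line.startswith("dhcp-host="):
            continue
        for field in line[len("dhcp-host="):].split(","):
            try:
                ip = ipaddress.IPv4Address(field.strip())
            except ValueError:
                continue  # MAC address, hostname, tag, lease time, ...
            seen[str(ip)].append(lineno)
    return {ip: nums for ip, nums in seen.items() if len(nums) > 1}


# Hypothetical sample; a real file would be read from /var/etc/dnsmasq.conf.*
sample_conf = "\n".join([
    "dhcp-host=aa:bb:cc:00:00:01,192.168.0.216,printer",
    "dhcp-host=aa:bb:cc:00:00:02,192.168.0.50,nas",
    "dhcp-host=aa:bb:cc:00:00:03,192.168.0.216,laptop",
])

for ip, linenos in find_duplicate_dhcp_host_ips(sample_conf).items():
    print(f"{ip} is used by dhcp-host lines {linenos}")
```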

Here are all the logs from bootup (skipping the normal kernel stuff, which I can also provide if it is useful):

Feb 16 01:28:53 ns_server kernel: [ 4.429341] xt_time: kernel timezone is -0000
Feb 16 01:28:53 ns_server kernel: [ 4.445231] PPP generic driver version 2.4.2
Feb 16 01:28:53 ns_server kernel: [ 4.446778] NET: Registered PF_PPPOX protocol family
Feb 16 01:28:53 ns_server kernel: [ 4.448473] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
Feb 16 01:28:53 ns_server kernel: [ 4.449347] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
Feb 16 01:28:53 ns_server kernel: [ 6.615549] 8021q: adding VLAN 0 to HW filter on device eth0
Feb 16 01:28:53 ns_server kernel: [ 6.616405] br-lan: port 1(eth0) entered blocking state
Feb 16 01:28:53 ns_server kernel: [ 6.616995] br-lan: port 1(eth0) entered disabled state
Feb 16 01:28:53 ns_server kernel: [ 6.617664] device eth0 entered promiscuous mode
Feb 16 01:28:53 ns_server kernel: [ 6.618471] br-lan: port 1(eth0) entered blocking state
Feb 16 01:28:53 ns_server kernel: [ 6.619112] br-lan: port 1(eth0) entered forwarding state
Feb 16 01:28:53 ns_server kernel: [ 6.621709] 8021q: adding VLAN 0 to HW filter on device eth1
Feb 16 01:28:53 ns_server dnsmasq[1]: duplicate dhcp-host IP address 192.168.0.216 at line 58 of /var/etc/dnsmasq.conf.cfg01411c
Feb 16 01:28:53 ns_server dnsmasq[1]: FAILED to start up
Feb 16 01:28:53 ns_server dnsmasq[1]: duplicate dhcp-host IP address 192.168.0.216 at line 58 of /var/etc/dnsmasq.conf.cfg01411c
Feb 16 01:28:53 ns_server dnsmasq[1]: FAILED to start up
Feb 16 01:28:54 ns_server netifyd[5549]: Netify Agent/4.4.3 (openwrt; x86_64; conntrack; netlink; dns-cache; plugins; regex)
Feb 16 01:28:54 ns_server netifyd[5549]: Unable to hash file: /etc/netify.d/netify-sink.conf: No such file or directory
Feb 16 01:28:54 ns_server netifyd[5549]: Legacy category format detected: /etc/netify.d/netify-categories.json
Feb 16 01:28:54 ns_server netifyd[5549]: Legacy category format detected: /etc/netify.d/netify-categories.json
Feb 16 01:28:54 ns_server netifyd[5549]: Error opening directory: /etc/netify.d/domains.d: No such file or directory
Feb 16 01:28:54 ns_server netifyd[5549]: np-nfa: Netify Agent Flow Actions Plugin, v1.0.13
Feb 16 01:28:54 ns_server netifyd[5549]: np-nfa: Copyright (C) 2022 eGloo Incorporated.
Feb 16 01:28:54 ns_server netifyd[5549]: np-nfa: flow action targets: ctlabel, log, nftset
Feb 16 01:28:54 ns_server kernel: [ 11.869836] device br-lan entered promiscuous mode
Feb 16 01:28:54 ns_server kernel: [ 11.870757] device eth1 entered promiscuous mode
Feb 16 01:28:54 ns_server netifyd[5549]: np-stats: Netify Agent Stats Plugin v1.0.17 (C) 2021 eGloo Incorporated.
Feb 16 01:28:56 ns_server dpireport[3750]: INFO: Connected to socket
Feb 16 01:28:56 ns_server netifyd[5549]: Unhandled signal: RT35
Feb 16 01:28:58 ns_server dnsmasq[1]: duplicate dhcp-host IP address 192.168.0.216 at line 58 of /var/etc/dnsmasq.conf.cfg01411c
Feb 16 01:28:58 ns_server dnsmasq[1]: FAILED to start up
Feb 16 01:29:02 ns_server netifyd[5549]: nap-api-update: Error: 6
Feb 16 01:29:03 ns_server dnsmasq[1]: duplicate dhcp-host IP address 192.168.0.216 at line 58 of /var/etc/dnsmasq.conf.cfg01411c
Feb 16 01:29:03 ns_server dnsmasq[1]: FAILED to start up
Feb 16 01:29:04 ns_server pppd[2833]: Timeout waiting for PADO packets
Feb 16 01:29:04 ns_server pppd[2833]: Unable to complete PPPoE Discovery
Feb 16 01:29:04 ns_server pppd[2833]: Exit.
Feb 16 01:29:04 ns_server netifd: Interface 'wan' is now down
Feb 16 01:29:04 ns_server netifd: Interface 'wan' is setting up now
Feb 16 01:29:04 ns_server pppd[5662]: Plugin pppoe.so loaded.
Feb 16 01:29:04 ns_server pppd[5662]: PPPoE plugin from pppd 2.4.9
Feb 16 01:29:04 ns_server pppd[5662]: pppd 2.4.9 started by root, uid 0
Feb 16 01:29:04 ns_server pppd[5662]: PPP session is 174
Feb 16 01:29:04 ns_server pppd[5662]: Connected to xx:xx:xx:xx:xx:xx via interface eth1
Feb 16 01:29:04 ns_server kernel: [ 22.199902] pppoe-wan: renamed from ppp0
Feb 16 01:29:04 ns_server pppd[5662]: Renamed interface ppp0 to pppoe-wan
Feb 16 01:29:04 ns_server pppd[5662]: Using interface pppoe-wan
Feb 16 01:29:04 ns_server pppd[5662]: Connect: pppoe-wan <--> eth1
Feb 16 01:29:06 ns_server nginx: ::1 - - [16/Feb/2025:01:29:06 +0100] "GET /api/v2/stats HTTP/1.1" 400 255 "-" "python-urllib3/2.0.4"
Feb 16 01:29:08 ns_server pppd[5662]: Remote message: SRU=XXXXXX#SRD=XXXXXX#
Feb 16 01:29:08 ns_server pppd[5662]: PAP authentication succeeded
Feb 16 01:29:08 ns_server pppd[5662]: peer from calling number XX:XX:XX:XX:XX:XX authorized
Feb 16 01:29:08 ns_server pppd[5662]: local IP address XX.XXX.XX.XXX
Feb 16 01:29:08 ns_server pppd[5662]: remote IP address XX.XXX.XXX.XXX
Feb 16 01:29:08 ns_server pppd[5662]: primary DNS address XXX.XXX.XXX.XXX
Feb 16 01:29:08 ns_server pppd[5662]: secondary DNS address XXX.XXX.XXX.XXX
Feb 16 01:29:08 ns_server pppd[5662]: local LL address XXXX::XXXX:XXXX:XXXX:XXXX
Feb 16 01:29:08 ns_server pppd[5662]: remote LL address XXXX::XXXX:XXXX:XXXX:XXXX
Feb 16 01:29:08 ns_server netifd: Network device 'pppoe-wan' link is up
Feb 16 01:29:08 ns_server netifd: Interface 'wan' is now up
Feb 16 01:29:08 ns_server netifd: Network alias 'pppoe-wan' link is up
Feb 16 01:29:08 ns_server netifd: Interface 'wan_6' is enabled
Feb 16 01:29:08 ns_server netifd: Interface 'wan_6' has link connectivity
Feb 16 01:29:08 ns_server netifd: Interface 'wan_6' is setting up now
Feb 16 01:29:08 ns_server qosify: start interface pppoe-wan
Feb 16 01:29:08 ns_server mwan3-hotplug[6032]: mwan3 hotplug on wan not called because interface disabled
Feb 16 01:29:08 ns_server firewall: Reloading firewall due to ifup of wan (pppoe-wan)
Feb 16 01:29:08 ns_server firewall: Reloading firewall due to ifupdate of wan (pppoe-wan)
Feb 16 01:29:08 ns_server ddns-scripts[6259]: myddns_ipv4: PID '6259' started at 2025-02-16 01:29
Feb 16 01:29:08 ns_server ddns-scripts[6259]: myddns_ipv4: Service section disabled! - TERMINATE
Feb 16 01:29:08 ns_server ddns-scripts[6259]: myddns_ipv4: PID '6259' exit WITH ERROR '1' at 2025-02-16 01:29
Feb 16 01:29:08 ns_server dnsmasq[1]: duplicate dhcp-host IP address 192.168.0.216 at line 58 of /var/etc/dnsmasq.conf.cfg01411c
Feb 16 01:29:08 ns_server dnsmasq[1]: FAILED to start up
Feb 16 01:29:08 ns_server procd: Instance dnsmasq::cfg01411c s in a crash loop 6 crashes, 0 seconds since last crash
Feb 16 01:29:10 ns_server banIP-[4157]: start banIP processing (boot)
Feb 16 01:29:10 ns_server banIP-[4157]: add uplink 'XX.XXX.XX.XXX/32' to local allowlist
Feb 16 01:29:10 ns_server banIP-[4157]: add uplink 'XXXX::XXXX:XXXX:XXXX:XXX/128' to local allowlist
Feb 16 01:29:10 ns_server banIP-[4157]: initialize banIP nftables namespace
Feb 16 01:29:10 ns_server banIP-[4157]: start banIP download processes
Feb 16 01:29:23 ns_server login[1140]: root login on 'tty1'
Feb 16 01:29:23 ns_server dnsmasq[1]: duplicate dhcp-host IP address 192.168.0.216 at line 58 of /var/etc/dnsmasq.conf.cfg01411c
Feb 16 01:29:23 ns_server dnsmasq[1]: FAILED to start up
Feb 16 01:29:28 ns_server dnsmasq[1]: duplicate dhcp-host IP address 192.168.0.216 at line 58 of /var/etc/dnsmasq.conf.cfg01411c
Feb 16 01:29:28 ns_server dnsmasq[1]: FAILED to start up
Feb 16 01:29:33 ns_server dnsmasq[1]: duplicate dhcp-host IP address 192.168.0.216 at line 58 of /var/etc/dnsmasq.conf.cfg01411c
Feb 16 01:29:33 ns_server dnsmasq[1]: FAILED to start up
Feb 16 01:29:38 ns_server dnsmasq[1]: duplicate dhcp-host IP address 192.168.0.216 at line 58 of /var/etc/dnsmasq.conf.cfg01411c
Feb 16 01:29:38 ns_server dnsmasq[1]: FAILED to start up
Feb 16 01:29:43 ns_server dnsmasq[1]: duplicate dhcp-host IP address 192.168.0.216 at line 58 of /var/etc/dnsmasq.conf.cfg01411c
Feb 16 01:29:43 ns_server dnsmasq[1]: FAILED to start up
Feb 16 01:29:44 ns_server adblock-4.1.5[2665]: dns backend 'dnsmasq' not running or executable
Feb 16 01:29:46 ns_server dnsmasq[1]: duplicate dhcp-host IP address 192.168.0.216 at line 58 of /var/etc/dnsmasq.conf.cfg01411c
Feb 16 01:29:46 ns_server dnsmasq[1]: FAILED to start up
Feb 16 01:29:51 ns_server dnsmasq[1]: duplicate dhcp-host IP address 192.168.0.216 at line 58 of /var/etc/dnsmasq.conf.cfg01411c
Feb 16 01:29:51 ns_server dnsmasq[1]: FAILED to start up
Feb 16 01:29:56 ns_server dnsmasq[1]: duplicate dhcp-host IP address 192.168.0.216 at line 58 of /var/etc/dnsmasq.conf.cfg01411c
Feb 16 01:29:56 ns_server dnsmasq[1]: FAILED to start up
Feb 16 01:30:00 ns_server crond[2944]: USER root pid 7826 cmd /usr/bin/ns-objects-reload-dns
Feb 16 01:30:00 ns_server crond[2944]: USER root pid 7827 cmd sleep $(( RANDOM % 60 )); /usr/sbin/send-heartbeat
Feb 16 01:30:01 ns_server dnsmasq[1]: duplicate dhcp-host IP address 192.168.0.216 at line 58 of /var/etc/dnsmasq.conf.cfg01411c
Feb 16 01:30:01 ns_server dnsmasq[1]: FAILED to start up
Feb 16 01:30:06 ns_server dnsmasq[1]: duplicate dhcp-host IP address 192.168.0.216 at line 58 of /var/etc/dnsmasq.conf.cfg01411c
Feb 16 01:30:06 ns_server dnsmasq[1]: FAILED to start up
Feb 16 01:30:11 ns_server dnsmasq[1]: duplicate dhcp-host IP address 192.168.0.216 at line 58 of /var/etc/dnsmasq.conf.cfg01411c
Feb 16 01:30:11 ns_server dnsmasq[1]: FAILED to start up
Feb 16 01:30:11 ns_server procd: Instance dnsmasq::cfg01411c s in a crash loop 6 crashes, 0 seconds since last crash
Feb 16 01:30:28 ns_server nginx: 192.168.0.11 - - [16/Feb/2025:01:30:28 +0100] "GET / HTTP/1.1" 200 391 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"
Feb 16 01:30:28 ns_server nginx: 192.168.0.11 - - [16/Feb/2025:01:30:28 +0100] "GET /assets/index-BtOPPq6e.js HTTP/1.1" 200 1019369 "https://192.168.0.1/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"
Feb 16 01:30:28 ns_server nginx: 192.168.0.11 - - [16/Feb/2025:01:30:28 +0100] "GET /assets/index-DxM9mes3.css HTTP/1.1" 200 86250 "https://192.168.0.1/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"
Feb 16 01:30:28 ns_server nginx: 192.168.0.11 - - [16/Feb/2025:01:30:28 +0100] "GET /branding.js HTTP/1.1" 200 244 "https://192.168.0.1/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"
Feb 16 01:30:28 ns_server nginx: 192.168.0.11 - - [16/Feb/2025:01:30:28 +0100] "GET /login_logo.svg HTTP/1.1" 200 9628 "https://192.168.0.1/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"
Feb 16 01:30:28 ns_server nginx: 192.168.0.11 - - [16/Feb/2025:01:30:28 +0100] "GET /assets/StandaloneDashboardView-DTcv6zW7.js HTTP/1.1" 200 17678 "https://192.168.0.1/assets/index-BtOPPq6e.js" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"
Feb 16 01:30:28 ns_server nginx: 192.168.0.11 - - [16/Feb/2025:01:30:28 +0100] "GET /Poppins-Regular.ttf HTTP/1.1" 200 158240 "https://192.168.0.1/assets/index-DxM9mes3.css" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"
Feb 16 01:30:28 ns_server nginx: 192.168.0.11 - - [16/Feb/2025:01:30:28 +0100] "GET /favicon.ico HTTP/1.1" 200 4286 "https://192.168.0.1/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"
Feb 16 01:30:30 ns_server banIP-[4157]: download for feed 'allowlist' failed (rc: 6/log: curl: (6) Could not resolve host: bl.nethesis.it#012curl: (6) Could not resolve host: bl.nethesis.it#012curl: (6) Could not resolve host: bl.nethesis.it#012curl: (6) Could not resolve host: bl.nethesis.it#012curl: (6) Could not resolve host: bl.nethesis.it#012curl: (6) Could not resolve host: bl.nethesis.it)
Feb 16 01:30:30 ns_server banIP-[4157]: start banIP domain lookup
Feb 16 01:30:30 ns_server banIP-[4157]: domain lookup finished in 0m 0s (blocklist, 0 domains, 0 IPs)
Feb 16 01:30:30 ns_server banIP-[4157]: domain lookup finished in 0m 0s (allowlist, 0 domains, 0 IPs)
Feb 16 01:30:30 ns_server banIP-[4157]: start detached banIP log service (/usr/bin/tail)
Feb 16 01:30:35 ns_server nethsecurity-api[4514]: nethsecurity_api 2025/02/16 01:30:35 middleware.go:77: [INFO][AUTH] authentication success for user root from 192.168.0.11
Feb 16 01:30:35 ns_server nethsecurity-api[4514]: nethsecurity_api 2025/02/16 01:30:35 middleware.go:185: [INFO][AUTH] login response success for user root
Feb 16 01:30:35 ns_server nginx: 192.168.0.11 - - [16/Feb/2025:01:30:35 +0100] "POST /api/login HTTP/1.1" 200 256 "https://192.168.0.1/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"
Feb 16 01:31:08 ns_server nginx: 192.168.0.11 - - [16/Feb/2025:01:31:08 +0100] "POST /api/2fa/otp-verify HTTP/1.1" 200 237 "https://192.168.0.1/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"
Feb 16 01:31:08 ns_server nginx: 192.168.0.11 - - [16/Feb/2025:01:31:08 +0100] "GET /logo_light.svg HTTP/1.1" 200 9679 "https://192.168.0.1/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"
Feb 16 01:31:08 ns_server nethsecurity-api[4514]: nethsecurity_api 2025/02/16 01:31:08 middleware.go:169: [INFO][AUTH] authorization success for user root. POST /api/ubus/call {"path":"uci","method":"changes","payload":{}}
Feb 16 01:31:08 ns_server nethsecurity-api[4514]: nethsecurity_api 2025/02/16 01:31:08 middleware.go:169: [INFO][AUTH] authorization success for user root. POST /api/ubus/call {"path":"system","method":"board","payload":{}}
Feb 16 01:31:08 ns_server nginx: 192.168.0.11 - - [16/Feb/2025:01:31:08 +0100] "POST /api/ubus/call HTTP/1.1" 200 281 "https://192.168.0.1/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"
Feb 16 01:31:08 ns_server nginx: 192.168.0.11 - - [16/Feb/2025:01:31:08 +0100] "POST /api/ubus/call HTTP/1.1" 200 88 "https://192.168.0.1/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"

Solution:

→ Deleted the duplicate IP address in the static DHCP leases list.

Everything works again.

SERIOUS ISSUE: by accidentally adding a static DHCP lease with an already defined IP, the entire firewall server broke. Specifically, it broke dnsmasq. You can see it in the logs.

Can this be safeguarded against? Either check for duplicate IPs when the admin tries to add a new static lease and prevent that, or… make DNSMASQ more robust?

Edit: The likelihood of accidentally adding duplicate DNS entries and/or duplicate DHCP static leases is high, since there is no way to tell whether the name or IP you’re typing in is already defined.

It would be great if you could at least sort by IP and sort by name, so you can manually cross-check the entries.

…Or Nethsecurity checks name + IPs when you make new entries and blocks duplicates. A feature suggestion. =)
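To illustrate the sorting idea: IPs need to be sorted numerically, not as strings, or 192.168.0.216 ends up before 192.168.0.9. A small sketch, assuming the entries are available as plain name/IP pairs (the `entries` list and both helper names are made up for the example):

```python
import ipaddress

# Hypothetical entries; in practice these would come from the DNS / lease tables.
entries = [
    {"name": "alpha", "ip": "192.168.0.216"},
    {"name": "bravo", "ip": "192.168.0.9"},
    {"name": "charlie", "ip": "192.168.0.50"},
]


def ip_sort_key(entry):
    """Sort IPv4 before IPv6, then numerically within each family."""
    addr = ipaddress.ip_address(entry["ip"])
    return (addr.version, int(addr))


def sort_by_ip(items):
    return sorted(items, key=ip_sort_key)


def sort_by_name(items):
    return sorted(items, key=lambda e: e["name"].lower())


for e in sort_by_ip(entries):
    print(f'{e["ip"]:<16} {e["name"]}')
```

With both views side by side, two entries that share an IP or a name end up adjacent and are easy to spot by eye.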


Hi again!
It’s weird that this happens, given that a check for duplicates is already there.

Could you give a bit more details on how you added the static leases?


Hi Tommaso,

I am not exactly sure, let me recreate the scenario:

To test Nethsecurity, I had to a) copy all DNS entries b) copy all static leases from my normally running firewall (Arista NGFW) (also running in a KVM VM, which is interesting as I can compare the resource requirements and performance of these different solutions - maybe I’ll write briefly about that some day).

I joined this community to find a way to automate the en masse import of DNS entries and static leases, via script or other solution. There was no ready solution that I found, because it’s not as easy as just adding entries into a config file. DNSMASQ makes it difficult. :wink:

So, I spent some quality time cutting and pasting between two GUIs, in this order: first added all DNS entries, then added all static leases, in the Web GUI.

I had questions while doing this (I’d be happy to know the answer, but I can guess what it is, seeing the error that happened): I did not know if you need to have static leases exclusively in the IP range(s) defined in the DHCP system, or not (context: Arista NGFW does not require this, but pfSense does.) Also, I noticed that by adding a static lease, the DNS entry is created for you as part of the input process for the lease - an interesting implementation. At this point I wondered what happens to all those DNS entries I had previously created, which already match the static leases I was entering (because Arista NGFW does it this way: static leases are just MAC + IPs, you create the DNS entries separately along with all the other DNS entries you defined.)

And what seemed to happen is that for all but this one static lease, the previously entered DNS entries were either overwritten (with the same value) or they just got used as they already existed, without conflict. [That is the question, do they get overwritten or not?]

tl;dr: This one static lease had an error: the existing DNS entry did not match with the hostname + IP I added in the static lease. This created a second DNS entry with the same name but with the other IP that I added via static leases. ← This seems to be the origin of the failure.

I am of course shocked that dnsmasq will break entirely and go into a crash loop if duplicate DNS entries are defined. And if dnsmasq breaks, the router / firewall breaks. I learned my lesson this once, but I don’t know of all the other ways you can break dnsmasq - do we have a list of things to never ever configure in dnsmasq upon pain of death (or broken firewall)?

Looking at the duplicate check you linked to (thanks!):

def add_static_lease(args):
    u = EUci()
    if is_reserved(u, args["macaddr"]):
        return utils.validation_error("mac", "mac_already_reserved", args["macaddr"])

It is only checking for duplicate reserved MAC addresses. This does not prevent adding a static lease with a DNS entry that has already been defined - which will cause a dnsmasq crash loop.

Perhaps it also needs to check whether the IP and the name are reserved.

Hmm, something like…

    if is_reserved(u, args["macaddr"]):
        return utils.validation_error("mac", "mac_already_reserved", args["macaddr"])
    if is_reserved(u, args["ipaddr"]):
        return utils.validation_error("ip", "ip_already_reserved", args["ipaddr"])
    if is_reserved(u, args["hostname"]):
        return utils.validation_error("name", "host_already_reserved", args["hostname"])

Because static leases define all three properties: host, ip, mac, all of them have to be checked if they already exist.

But of course
def is_reserved(u, mac, exclude_lease_id=''):
is not set up to handle anything but macaddr.
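To make the idea concrete, here is a rough sketch of a check that works for any of the three fields. It is not the NethSecurity code: plain dicts stand in for the UCI lease sections, and `find_conflict` and `validate_new_lease` are hypothetical names I made up for the example:

```python
def find_conflict(leases, field, value, exclude_lease_id=""):
    """Return the id of the first lease whose `field` equals `value`, else None.

    `leases` maps a lease id to a dict with macaddr / ipaddr / hostname keys
    (a stand-in for the real UCI data). Comparison ignores case and whitespace.
    """
    wanted = value.strip().lower()
    for lease_id, lease in leases.items():
        if lease_id == exclude_lease_id:
            continue  # when editing a lease, don't compare it with itself
        if lease.get(field, "").strip().lower() == wanted:
            return lease_id
    return None


def validate_new_lease(leases, new_lease, exclude_lease_id=""):
    """Return a list of (field, conflicting lease id) pairs; empty means OK."""
    errors = []
    for field in ("macaddr", "ipaddr", "hostname"):
        clash = find_conflict(leases, field, new_lease[field], exclude_lease_id)
        if clash is not None:
            errors.append((field, clash))
    return errors


# Hypothetical data for illustration.
existing = {
    "lease1": {"macaddr": "AA:BB:CC:00:00:01", "ipaddr": "192.168.0.216", "hostname": "printer"},
}
candidate = {"macaddr": "aa:bb:cc:00:00:09", "ipaddr": "192.168.0.216", "hostname": "laptop"}

print(validate_new_lease(existing, candidate))
```

A real version would presumably also have to look at the separately defined DNS records, since in my case the clash involved an entry created outside the static lease form.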

Another idea: if dnsmasq encounters a duplicate DNS entry, how about ignoring it, logging a warning, and carrying on instead of crashing? =)

Dnsmasq is not at all under our control :smiley:
We just provide configuration through UCI, which then gets compiled into configuration files for it to handle.

It seems you are the first to spot this issue, I’ve never heard of such a thing, even with all the installations live at the moment (and trust me, tickets internally are not few :stuck_out_tongue: )

If in any case the issue arises again, I’ll patch it somehow (or you can contribute to the project if you like :wink: )

At the moment I’m a bit behind schedule with the migration from 23.05 to 24.10, can’t say I have time to spare with this unfortunately. If you don’t have time or want to directly contribute to the project, even opening an issue is an option.

Ty!

Hi Tommaso,

thanks for the reply. Probably this would only ever come up in my exact circumstances, specifically the ordered manual migration of a set of DNS entries and DHCP static leases, without experience with the static lease input mechanism in OpenWRT / NethSecurity.

The issue is resolved for me, I found the error and corrected it. The discussion here is simply about an unhandled exception in dnsmasq and the lack of checking that could prevent it.

But as I said, I am currently (successfully) evaluating Nethsecurity, it is functioning (again) just fine.

So it’s all good! Maybe I’ll have a look at the input checking code some more. But it is not critical. For now, I look forward to a NethSec based on 24.10! :wink:
