NethServer DNS primary and secondary server entries have swapped priorities in dnsmasq?


I just upgraded to NS7 and have an issue with the DNS configuration on the NethServer.
Description of the setup:

  • A single NethServer virtual machine, working mainly as a mail server (with a “modified” smarthost feature), plus SOGo and Nextcloud
  • My network has a separate local DNS server (on a different virtual machine) running BIND (IP 10.0.1.8)
  • My local DNS server has all the local hostnames defined
  • My network has a router at 10.0.1.1 as the default gateway to the Internet; it also forwards DNS requests to my ISP's DNS servers if the local DNS server is down or cannot resolve the name.
  • NS7 DHCP is disabled.
  • NS7 DNS should work as a forwarder to my existing DNS server 10.0.1.8

So I configured the NS7 DNS entries in the “network” section as:
primaryDNS = 10.0.1.8
secondaryDNS = 10.0.1.1

However, I noticed that local names cannot be resolved, because the DNS requests do not seem to be forwarded to my local primary DNS server.

From /var/log/messages when restarting dnsmasq:

systemd: Started DNS caching server..
Dec 28 22:26:26 server1 systemd: Starting DNS caching server....
Dec 28 22:26:26 server1 dnsmasq[2412]: started, version 2.76 cachesize 4000
Dec 28 22:26:26 server1 dnsmasq[2412]: compile time options: IPv6 GNU-getopt DBus no-i18n IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
Dec 28 22:26:26 server1 dnsmasq-tftp[2412]: TFTP root is /var/lib/tftpboot
Dec 28 22:26:26 server1 dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain spamhaus.org
Dec 28 22:26:26 server1 dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain dnswl.org
Dec 28 22:26:26 server1 dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain uribl.com
Dec 28 22:26:26 server1 dnsmasq[2412]: using nameserver 10.0.1.1#53
Dec 28 22:26:26 server1 dnsmasq[2412]: using nameserver 10.0.1.8#53
Dec 28 22:26:26 server1 dnsmasq[2412]: read /etc/hosts - 2 addresses

So the primary DNS entry appears as the second nameserver in the log. I am not sure this is the issue, but the templates generated the following in /etc/dnsmasq.conf:

# Don't read /etc/resolv.conf. Get upstream servers only from the
# command line or the dnsmasq configuration file.
no-resolv

# Specify IP address of upstream servers directly. Setting this flag
# does not suppress reading of /etc/resolv.conf, use "no-resolv" to do
# that.
server=10.0.1.8
server=10.0.1.1


# By  default,  dnsmasq  will  send queries to any of the upstream
# servers it knows about and tries to favour servers that are known
# to  be  up.  Uncommenting this forces dnsmasq to try each query
# with  each  server  strictly  in  the  order  they   appear   in
# /etc/resolv.conf
strict-order


# forward RBL queries to localhost unbound
server=/uribl.com/127.0.0.1#10053
server=/dnswl.org/127.0.0.1#10053
server=/spamhaus.org/127.0.0.1#10053

That is, the nameservers appear here in the opposite order from the message log. Since the “strict-order” option is set in /etc/dnsmasq.conf, this could be important.
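To make the ordering difference easy to check on other machines, here is a small Python sketch that extracts the unconditional upstream servers (plain `server=IP` lines, ignoring `server=/domain/...` rules) from the generated dnsmasq.conf and from the "using nameserver" startup lines, then compares their order. The sample text is copied from scenario 1; `conf_upstreams` and `log_upstreams` are hypothetical helper names, not part of NethServer or dnsmasq.

```python
import re

# Excerpts copied from scenario 1: the generated dnsmasq.conf and the
# "using nameserver" lines dnsmasq logged at startup.
CONF = """\
no-resolv
server=10.0.1.8
server=10.0.1.1
strict-order
server=/uribl.com/127.0.0.1#10053
server=/dnswl.org/127.0.0.1#10053
server=/spamhaus.org/127.0.0.1#10053
"""

LOG = """\
dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain spamhaus.org
dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain dnswl.org
dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain uribl.com
dnsmasq[2412]: using nameserver 10.0.1.1#53
dnsmasq[2412]: using nameserver 10.0.1.8#53
"""


def conf_upstreams(conf: str) -> list[str]:
    """Plain server=IP lines (no /domain/ part), in file order."""
    servers = []
    for line in conf.splitlines():
        line = line.strip()
        if line.startswith("server=") and not line.startswith("server=/"):
            # Drop an optional "#port" suffix
            servers.append(line[len("server="):].split("#")[0])
    return servers


def log_upstreams(log: str) -> list[str]:
    """Upstreams from 'using nameserver IP#port' lines without 'for domain'."""
    pattern = re.compile(r"using nameserver ([0-9.]+)#\d+$")
    found = []
    for line in log.splitlines():
        m = pattern.search(line.strip())
        if m:
            found.append(m.group(1))
    return found


if __name__ == "__main__":
    in_file = conf_upstreams(CONF)
    in_log = log_upstreams(LOG)
    print("dnsmasq.conf order:", in_file)
    print("log order:         ", in_log)
    print("log is reversed file order:", in_log == list(reversed(in_file)))
```

Pasting the scenario 2 excerpts into `CONF` and `LOG` should show the same reversal with the two addresses swapped.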

Output from dig (querying a local host and a remote host):
# dig homeserver2.home.lan www.google.com

; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7_4.1 <<>> homeserver2.home.lan  www.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 4282
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;homeserver2.home.lan.          IN      A

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Dec 28 22:34:44 CET 2017
;; MSG SIZE  rcvd: 38

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55500
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.google.com.                        IN      A

;; ANSWER SECTION:
www.google.com.         166     IN      A       216.58.213.228

;; Query time: 12 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Dec 28 22:34:44 CET 2017
;; MSG SIZE  rcvd: 48

The lookup for the local hostname homeserver2.home.lan clearly fails (this hostname is defined only on the separate local DNS server 10.0.1.8).

Then, a second scenario:
I swap the primary and secondary DNS servers, i.e.:
Primary DNS = 10.0.1.1 (i.e. going directly to the Internet)
Secondary DNS = 10.0.1.8 (i.e. going to my local DNS server)

Output of /var/log/messages during the dnsmasq restart:

Dec 28 22:39:11 server1 systemd: Started DNS caching server..
Dec 28 22:39:11 server1 systemd: Starting DNS caching server....
Dec 28 22:39:11 server1 dnsmasq[2693]: started, version 2.76 cachesize 4000
Dec 28 22:39:11 server1 dnsmasq[2693]: compile time options: IPv6 GNU-getopt DBus no-i18n IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
Dec 28 22:39:11 server1 dnsmasq-tftp[2693]: TFTP root is /var/lib/tftpboot
Dec 28 22:39:11 server1 dnsmasq[2693]: using nameserver 127.0.0.1#10053 for domain spamhaus.org
Dec 28 22:39:11 server1 dnsmasq[2693]: using nameserver 127.0.0.1#10053 for domain dnswl.org
Dec 28 22:39:11 server1 dnsmasq[2693]: using nameserver 127.0.0.1#10053 for domain uribl.com
Dec 28 22:39:11 server1 dnsmasq[2693]: using nameserver 10.0.1.8#53
Dec 28 22:39:11 server1 dnsmasq[2693]: using nameserver 10.0.1.1#53
Dec 28 22:39:11 server1 dnsmasq[2693]: read /etc/hosts - 2 addresses

Now the 10.0.1.8 server appears first among the nameservers in the log.

Extract from /etc/dnsmasq.conf (where, however, 10.0.1.8 is defined as the second server, matching the NS7 admin GUI):

# Specify IP address of upstream servers directly. Setting this flag
# does not suppress reading of /etc/resolv.conf, use "no-resolv" to do
# that.
server=10.0.1.1
server=10.0.1.8

Finally, the DNS lookup results for the second scenario:

# dig  homeserver2.home.lan www.google.com

; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7_4.1 <<>> homeserver2.home.lan  www.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35464
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 2

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;homeserver2.home.lan.          IN      A

;; ANSWER SECTION:
homeserver2.home.lan.   604800  IN      A       10.0.1.63

;; AUTHORITY SECTION:
home.lan.               604800  IN      NS      ns.home.lan.

;; ADDITIONAL SECTION:
ns.home.lan.            604800  IN      A       10.0.1.8

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Dec 28 22:42:35 CET 2017
;; MSG SIZE  rcvd: 98

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31261
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 5

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.google.com.                        IN      A

;; ANSWER SECTION:
www.google.com.         300     IN      A       172.217.19.68

;; AUTHORITY SECTION:
google.com.             44759   IN      NS      ns1.google.com.
google.com.             44759   IN      NS      ns2.google.com.
google.com.             44759   IN      NS      ns4.google.com.
google.com.             44759   IN      NS      ns3.google.com.

;; ADDITIONAL SECTION:
ns1.google.com.         41206   IN      A       216.239.32.10
ns2.google.com.         41206   IN      A       216.239.34.10
ns3.google.com.         41206   IN      A       216.239.36.10
ns4.google.com.         41206   IN      A       216.239.38.10

;; Query time: 29 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Dec 28 22:42:35 CET 2017
;; MSG SIZE  rcvd: 195

This clearly shows that the lookup against the local server now succeeds (the authority section even names ns.home.lan at 10.0.1.8), so the local DNS server is queried first even though it is configured as the secondary DNS server in the NS7 configuration.
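One plausible reading of both scenarios, sketched as a toy Python model: with strict-order, dnsmasq works through its server list in order, and an NXDOMAIN from the first server is already a valid answer, so nothing ever asks the next server. The model assumes that the effective try order is the reverse of the `server=` lines in dnsmasq.conf (as the startup logs and the query statistics later in this thread suggest) and that only a missing reply, SERVFAIL or REFUSED makes it move on. The per-server answers in `ANSWERS` are hypothetical, modelled on the dig output above; this is not dnsmasq's actual code.

```python
from typing import Optional

# Hypothetical answers per upstream, modelled on the dig results above:
# the router (10.0.1.1) forwards to the ISP, which knows nothing about
# home.lan, while the local BIND server (10.0.1.8) is authoritative for it.
ANSWERS = {
    "10.0.1.1": {"homeserver2.home.lan": "NXDOMAIN", "www.google.com": "NOERROR"},
    "10.0.1.8": {"homeserver2.home.lan": "NOERROR", "www.google.com": "NOERROR"},
}


def ask(server: str, name: str) -> Optional[str]:
    """Rcode the server would return, or None if it does not reply."""
    return ANSWERS.get(server, {}).get(name)


def strict_order_lookup(servers: list[str], name: str) -> tuple[str, str]:
    """Try servers in order and stop at the first real answer.

    Simplifying assumption: only no reply, SERVFAIL or REFUSED moves on to
    the next server; NXDOMAIN is a valid answer and ends the lookup.
    """
    for server in servers:
        rcode = ask(server, name)
        if rcode not in (None, "SERVFAIL", "REFUSED"):
            return server, rcode
    return "", "SERVFAIL"


if __name__ == "__main__":
    # Assumed effective order = reverse of the server= lines in dnsmasq.conf.
    scenario_1 = list(reversed(["10.0.1.8", "10.0.1.1"]))  # GUI primary 10.0.1.8
    scenario_2 = list(reversed(["10.0.1.1", "10.0.1.8"]))  # GUI primary 10.0.1.1
    for label, order in (("scenario 1", scenario_1), ("scenario 2", scenario_2)):
        print(label, strict_order_lookup(order, "homeserver2.home.lan"))
```

Under these assumptions, scenario 1 ends at the router with NXDOMAIN and scenario 2 is answered by 10.0.1.8, which matches the two dig results.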

Is this a dnsmasq problem or something else? Does anybody know?
Thanks!


You are right: it seems that dnsmasq uses the last DNS forwarder entry as the primary one.

/etc/dnsmasq.conf:

server=84.200.69.80
server=208.67.222.222

To write some dnsmasq statistics to /var/log/messages, you can run:

kill -USR1 $(pidof dnsmasq)

tail /var/log/messages
...
Dec 28 23:10:55 server dnsmasq[31783]: server 208.67.222.222#53: queries sent 38481, retried or failed 4660
Dec 28 23:10:55 server dnsmasq[31783]: server 84.200.69.80#53: queries sent 4660, retried or failed 1
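To read these statistics at a glance, the per-server lines can be parsed into counters. This Python sketch assumes the "server IP#port: queries sent N, retried or failed M" wording shown above for dnsmasq 2.76; other versions may phrase it differently, and `upstream_stats` is a hypothetical helper name.

```python
import re

# The two statistics lines from the SIGUSR1 dump above.
LOG = """\
Dec 28 23:10:55 server dnsmasq[31783]: server 208.67.222.222#53: queries sent 38481, retried or failed 4660
Dec 28 23:10:55 server dnsmasq[31783]: server 84.200.69.80#53: queries sent 4660, retried or failed 1
"""

STATS = re.compile(
    r"server ([0-9.]+)#(\d+): queries sent (\d+), retried or failed (\d+)")


def upstream_stats(log: str) -> dict[str, tuple[int, int]]:
    """Map 'ip#port' to (queries sent, retried or failed)."""
    stats = {}
    for m in STATS.finditer(log):
        ip, port, sent, failed = m.groups()
        stats[f"{ip}#{port}"] = (int(sent), int(failed))
    return stats


if __name__ == "__main__":
    stats = upstream_stats(LOG)
    total = sum(sent for sent, _ in stats.values())
    # Busiest upstream first
    for server, (sent, failed) in sorted(stats.items(), key=lambda kv: -kv[1][0]):
        print(f"{server:20} sent={sent:6} ({sent / total:.1%})  retried/failed={failed}")
```

In this sample the busiest upstream is 208.67.222.222, the last `server=` line, and 84.200.69.80 received exactly as many queries (4660) as were retried or failed on 208.67.222.222, which fits the idea that the first line in the file is only used as a fallback.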

Thanks for the confirmation.
However, I did not have this behaviour on my old NethServer 6 machine.

I just re-checked on NethServer 6.8, setting
primary DNS to 10.0.1.8
secondary DNS to 10.0.1.1

which results in the following output in /var/log/messages:

Dec 29 06:54:12 server1b esmith::event[97506]: Starting dnsmasq: [  OK  ]#015
Dec 29 06:54:12 server1b esmith::event[97506]: [INFO] dnsmasq restart
Dec 29 06:54:12 server1b esmith::event[97506]: Action: /etc/e-smith/events/actions/adjust-services SUCCESS [1.102166]
Dec 29 06:54:12 server1b esmith::event[97506]: Event: nethserver-hosts-save SUCCESS
Dec 29 06:55:28 server1b /sbin/e-smith/db[97712]: /var/lib/nethserver/db/configuration: OLD dns=configuration|NameServers|10.0.1.1,10.0.1.8
Dec 29 06:55:28 server1b /sbin/e-smith/db[97712]: /var/lib/nethserver/db/configuration: NEW dns=configuration|NameServers|10.0.1.8,10.0.1.1
Dec 29 06:55:28 server1b esmith::event[97715]: Event: nethserver-hosts-save 
Dec 29 06:55:28 server1b esmith::event[97715]: expanding /etc/dnsmasq.conf
Dec 29 06:55:28 server1b esmith::event[97715]: expanding /etc/resolv.conf
Dec 29 06:55:28 server1b esmith::event[97715]: expanding /etc/collectd.d/ping.conf
Dec 29 06:55:28 server1b esmith::event[97715]: Action: /etc/e-smith/events/actions/generic_template_expand SUCCESS [0.171011]
Dec 29 06:55:28 server1b esmith::event[97715]: [INFO] service collectd restart
Dec 29 06:55:28 server1b collectd[97552]: Exiting normally.
Dec 29 06:55:28 server1b collectd[97552]: collectd: Stopping 5 read threads.
Dec 29 06:55:28 server1b collectd[97552]: rrdtool plugin: Shutting down the queue thread. This may take a while.
Dec 29 06:55:28 server1b collectd[97552]: ping plugin: Shutting down thread.
Dec 29 06:55:28 server1b collectd[97552]: collectd: Stopping 5 write threads.
Dec 29 06:55:28 server1b collectdmon[97551]: Info: collectd terminated with exit status 0
Dec 29 06:55:28 server1b collectdmon[97551]: Info: shutting down collectdmon
Dec 29 06:55:28 server1b esmith::event[97715]: Stopping collectd: [  OK  ]#015
Dec 29 06:55:28 server1b esmith::event[97715]: Starting collectd: [  OK  ]#015
Dec 29 06:55:28 server1b esmith::event[97715]: [INFO] collectd restart
Dec 29 06:55:28 server1b collectd[97761]: Initialization complete, entering read-loop.
Dec 29 06:55:28 server1b esmith::event[97715]: [INFO] service dnsmasq restart
Dec 29 06:55:28 server1b dnsmasq[97621]: exiting on receipt of SIGTERM
Dec 29 06:55:29 server1b esmith::event[97715]: Shutting down dnsmasq: [  OK  ]#015
Dec 29 06:55:29 server1b dnsmasq[97830]: started, version 2.48 cachesize 4000
Dec 29 06:55:29 server1b dnsmasq[97830]: compile time options: IPv6 GNU-getopt DBus no-I18N DHCP TFTP "--bind-interfaces with SO_BINDTODEVICE"
Dec 29 06:55:29 server1b dnsmasq-tftp[97830]: TFTP root is /var/lib/tftpboot 
Dec 29 06:55:29 server1b dnsmasq[97830]: using nameserver 127.0.0.1#10053 for domain spamhaus.org
Dec 29 06:55:29 server1b dnsmasq[97830]: using nameserver 127.0.0.1#10053 for domain dnswl.org
Dec 29 06:55:29 server1b dnsmasq[97830]: using nameserver 127.0.0.1#10053 for domain uribl.com
Dec 29 06:55:29 server1b dnsmasq[97830]: using nameserver 10.0.1.1#53
>>>Dec 29 06:55:29 server1b dnsmasq[97830]: using nameserver 10.0.1.8#53<<<
Dec 29 06:55:29 server1b dnsmasq[97830]: read /etc/hosts - 2 addresses
Dec 29 06:55:29 server1b esmith::event[97715]: Starting dnsmasq: [  OK  ]#015
Dec 29 06:55:29 server1b esmith::event[97715]: [INFO] dnsmasq restart
Dec 29 06:55:29 server1b esmith::event[97715]: Action: /etc/e-smith/events/actions/adjust-services SUCCESS [0.722721]
Dec 29 06:55:29 server1b esmith::event[97715]: Event: nethserver-hosts-save SUCCESS

As you can see, the primary DNS server 10.0.1.8 appears as the last entry in the syslog.

So this looks to me like a regression in dnsmasq itself, in how it reads the entries from dnsmasq.conf: the e-smith template expansion always puts the primary DNS server first in dnsmasq.conf (correctly, I would say).

On NS6.8:

# dnsmasq -v
Dnsmasq version 2.48  Copyright (C) 2000-2009 Simon Kelley
Compile time options IPv6 GNU-getopt DBus no-I18N DHCP TFTP "--bind-interfaces with SO_BINDTODEVICE"

This software comes with ABSOLUTELY NO WARRANTY.
Dnsmasq is free software, and you are welcome to redistribute it
under the terms of the GNU General Public License, version 2 or 3.

On NS7:

# dnsmasq -v
Dnsmasq version 2.76  Copyright (c) 2000-2016 Simon Kelley
Compile time options: IPv6 GNU-getopt DBus no-i18n IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify

This software comes with ABSOLUTELY NO WARRANTY.
Dnsmasq is free software, and you are welcome to redistribute it
under the terms of the GNU General Public License, version 2 or 3.

At the very least, this should perhaps be noted in the docs or in the NS7 GUI.


I think I can confirm that dnsmasq behaviour changed in NS7, but I can’t find a reference in the changelog, and a quick inspection of the code didn’t reveal the change either.
Before asking the dnsmasq developer, I think we need to re-evaluate our use of the strict-order option (from here):

That’s one reason why “strict-order” is broken and not recommended.

@jrieder, to solve your problem, try explicitly telling dnsmasq to query your internal DNS server for internal host names.
Add a template-custom fragment for dnsmasq.conf (27internal_dns) containing:
server=/home.lan/10.0.1.8
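The dnsmasq manual documents that a `server=/domain/ip` line sends queries for that domain and its subdomains to the given server, and that more specific domains take precedence over less specific ones. This Python sketch models that routing rule (a simplification, not dnsmasq's code) with the RBL entries from the generated dnsmasq.conf plus the suggested home.lan line; `route`, `DOMAIN_RULES` and `DEFAULT_UPSTREAMS` are hypothetical names.

```python
# Domain-specific rules: the RBL entries from the generated dnsmasq.conf
# plus the suggested custom fragment server=/home.lan/10.0.1.8.
DOMAIN_RULES = {
    "uribl.com": "127.0.0.1#10053",
    "dnswl.org": "127.0.0.1#10053",
    "spamhaus.org": "127.0.0.1#10053",
    "home.lan": "10.0.1.8",
}
# The plain server= lines, used for names that match no rule.
DEFAULT_UPSTREAMS = ["10.0.1.8", "10.0.1.1"]


def route(qname: str) -> list[str]:
    """Upstream(s) a query name would be forwarded to.

    Suffixes are checked from longest to shortest, so the most specific
    matching domain wins; unmatched names fall back to the plain servers.
    """
    labels = qname.rstrip(".").lower().split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])  # e.g. a.b.c, then b.c, then c
        if suffix in DOMAIN_RULES:
            return [DOMAIN_RULES[suffix]]
    return DEFAULT_UPSTREAMS


if __name__ == "__main__":
    for name in ("homeserver2.home.lan", "www.google.com", "zen.spamhaus.org"):
        print(f"{name:22} -> {route(name)}")
```

With such a rule, home.lan lookups no longer depend on strict-order or on the order of the plain `server=` lines at all.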

I have a long-standing feature request for a panel to configure these options.

P.S. Impressive bug report.


Great work on this, @jrieder!
Since this is confirmed, I took the liberty of changing the topic from support to bug.
