Resolving DNS, DHCP, IP issues

OK, some things to check:

  1. That the DNS entries are OK, i.e. your NS instance resolves to 192.168.32.6 and nsdc has a different name, e.g. ad.server.tld, pointing to 192.168.32.7.

  2. That the DNS referencing them is port-forwarded correctly in your router and, if you’re not using NethServer as DNS, that the DNS server is on the same subnet.

  3. Check you can ping 1.1.1.1 from your NethServer (check 8.8.8.8 as well).

  4. Check you can ping google.com (to check that DNS is working and not just the routing).

  5. If you have any other DNS servers, check they have the right IP.

  6. Since you mentioned using 1.1.1.1 I’m assuming you’re also using Cloudflare; if so, check that the DNS record listed there points to your static IP.

  7. Lastly, go to ping.eu, select Port check and test whether port 53 is open on your static IP (obviously this may be irrelevant if you don’t intend your local DNS to be accessible from the WAN).
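Steps 3 and 4 together tell you whether the problem is routing or DNS. Here is a rough Python sketch of that logic, assuming a Linux box with the iputils `ping` command; the function names are made up for illustration and are not part of NethServer.

```python
import socket
import subprocess


def can_ping(host, timeout_s=2):
    """Return True if a single ICMP echo to host is answered (Linux iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def can_resolve(name):
    """Return True if the system resolver returns an address for name."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def diagnose(ip_ok, name_resolves):
    """Classify the result of step 3 (ping an IP) and step 4 (resolve a name)."""
    if not ip_ok:
        # No answer from a public IP: look at routing/firewall before DNS.
        return "routing"
    if not name_resolves:
        # IPs are reachable but names are not: the resolver is the suspect.
        return "dns"
    return "ok"
```

On the server you would call it as `diagnose(can_ping("1.1.1.1"), can_resolve("google.com"))`. Keep in mind that some networks drop ICMP, so a failed ping alone is not proof of a routing fault.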

I’m sure there are more things to check, but on more than one occasion I’ve had issues with DNS and stuffed around for a few hours, only to find I had changed an IP and forgotten to fix it in DNS.

Edit: stupid question, but have you tried pinging 192.168.32.1?

I also noticed this above:
192.168.32.1 0.0.0.0 255.255.255.255
Shouldn’t the subnet mask be 255.255.255.0?
Mine, for instance, is:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         gateway         0.0.0.0         UG    0      0        0 br0
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 br0
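To see what the two masks actually describe, here is a small sketch using Python's standard `ipaddress` module. The helper name `describe_route` is just for illustration. A 255.255.255.255 mask covers exactly one address (a host route), while 255.255.255.0 covers a whole /24; whether a host route is intended in this setup is not something the sketch can decide.

```python
import ipaddress


def describe_route(destination, genmask):
    """Turn a Destination/Genmask pair from the routing table into CIDR form."""
    # strict=False tolerates destinations that have host bits set.
    net = ipaddress.ip_network(f"{destination}/{genmask}", strict=False)
    kind = "host route" if net.prefixlen == net.max_prefixlen else "network route"
    return str(net), kind
```

For example, `describe_route("192.168.32.1", "255.255.255.255")` yields a /32 host route, and `describe_route("10.1.1.0", "255.255.255.0")` yields a /24 network route.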

Thank you, I need a well-structured approach!

Stupid questions first (the stupid questions are on me…): yes, I can ping 192.168.32.1 from NS and from other LAN clients. However, I would like to learn a way to check which device answers my ping. (As I said, requests to the router’s IP are sometimes answered by NS. In that case a successful ping from NS, to itself, is misleading.)

ad 1: DNS

  • I have configured nsdc, and it provides a DNS service. This is announced via DHCP by the router. There’s not much to configure in Cockpit. Base DN = dc=ad,dc=server,dc=gsr; LDAP Server URI=ldaps://nsdc-neth.ad.server.gsr; AD IP=192.168.32.7. I have no idea whether NS uses this DNS to resolve anything but domain authentication. nslookup ad.server.gsr resolves to 192.168.32.7 via 127.0.0.1 from NS and via 192.168.32.7 from another LAN client. Looks good.
  • In Cockpit, I set DNS to 9.9.9.9 and 1.1.1.1, as they provide a good service. I do not use other Cloudflare services.
  • In the Cockpit DNS tab, I have a single entry: router.gsr → 192.168.32.1
  • The hostname of the NS instance is set to neth.server.gsr, CIDR in the network tab is set to 192.168.32.6/24, and nslookup neth.server.gsr resolves to the correct IP via 192.168.32.7 from a LAN client.
  • The router has a very small LAN DNS application. This is meant as a kind of fallback. Relevant entries: neth.server.gsr → 192.168.32.6 and router.gsr → 192.168.32.1
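With the same names defined in several places (router LAN DNS, Cockpit DNS tab, nsdc), a stale entry is easy to miss. The sketch below compares the records by hand; the `SOURCES` dictionary is typed in from the entries listed above, not queried from the services, and `find_mismatches` is a hypothetical helper.

```python
def find_mismatches(sources):
    """Return every name whose address differs between at least two sources."""
    answers = {}
    for source, records in sources.items():
        for name, ip in records.items():
            answers.setdefault(name, {})[source] = ip
    return {
        name: by_source
        for name, by_source in answers.items()
        if len(set(by_source.values())) > 1
    }


# Records as described in this post (entered manually, an assumption of this sketch).
SOURCES = {
    "router LAN DNS": {"neth.server.gsr": "192.168.32.6", "router.gsr": "192.168.32.1"},
    "Cockpit DNS tab": {"router.gsr": "192.168.32.1"},
    "nsdc": {"ad.server.gsr": "192.168.32.7", "neth.server.gsr": "192.168.32.6"},
}
```

With the entries as listed, `find_mismatches(SOURCES)` comes back empty, which matches the "looks good" impression; any name that appears in the result is worth a second look.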

ad 2:
I do not really understand the question about port forwarding. The DNS services are either on the local subnet or on the internet, but there are no requests from the internet to the local DNS services.

ad 3:
Ping from NS to 1.1.1.1 or 8.8.8.8 gets no response, not even a timeout warning. That is one of the issues I am struggling with.

ad 4:
I cannot ping google.com from NS: no response. Same when doing nslookup on google.com.

gotta leave office now…

Hi @sternkrabbe,

May I suggest:

The router configured as pass-through

† The Gateway for the router can be found with traceroute:

# traceroute google.com
traceroute to google.com (142.251.33.174), 30 hops max, 60 byte packets
 1  lo0-0-lns01-tor2.teksavvy.com (206.248.155.132)  16.155 ms  16.145 ms  16.200 ms
 2  ae0-2111-bdr01-tor.teksavvy.com (206.248.149.8)  11.021 ms  11.667 ms ae1-2121-bdr01-tor2.teksavvy.com (206.248.149.16)  11.579 ms
 3  142.250.168.170 (142.250.168.170)  11.652 ms  12.467 ms  13.295 ms
 4  108.170.250.241 (108.170.250.241)  13.012 ms 108.170.250.225 (108.170.250.225)  13.300 ms *
 5  yyz10s17-in-f14.1e100.net (142.251.33.174)  13.162 ms 142.251.68.26 (142.251.68.26)  13.546 ms 74.125.244.145 (74.125.244.145)  14.094 ms
#

From the router console: in this case 206.248.155.132 (line 1) is the router’s gateway.

Thank you Andy,

Michel-André

@michelandre

Hi Michel-André

In your little sketch above, the top Router (192.168.32.1) has two Gateways (!) but NO DNS!

Now, how you would use Google DNS or Cloudflare DNS as a gateway is an intriguing idea, but I think you meant GW: DHCP, and the other two were “DNS”…

My 2 cents
Andy


I found this explanation of the router address 0.0.0.0 on a website, “Linux Route Command Help and Examples” :crazy_face:

192.168.1.0 * 255.255.255.0 U 0 0 0 eth0
This shows us how the system is currently configured. If a packet comes into the system and has a destination in the range 192.168.1.0 through 192.168.1.255, then it is forwarded to the gateway *, which is 0.0.0.0 — a special address which represents an invalid or non-existant destination. So, in this case, our system will not route these packets.

Read that way, the suspicious entry that you noticed would not route requests for the router, and both entries above it would not route local IP packets via br0 and tungsr. I don’t yet know what it means in the end…

… while typing two comments were posted here – I will have a look …

@sternkrabbe

Hi

The IP 0.0.0.0 is often - depending on product / distro - used as placeholder for either

ALL IPs (*)

or

No IPs, or invalid.

Unfortunately, there are no RFCs defining this, so every vendor uses it as they see fit!

Example:
SonicWalls use 0.0.0.0 in the RoadWarrior VPN configuration as placeholder for dynamic IP on the client side (= ANY)…
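For the routing-table case specifically, the Flags column helps. In the output of Linux `route -n`, a gateway of 0.0.0.0 on a line without the G flag is conventionally read as "no gateway, destination is directly reachable on that interface". Here is a small Python sketch of that reading; `next_hop` is an illustrative name, and it expects numeric output (`route -n`), not names like "gateway" or "*".

```python
import ipaddress


def next_hop(gateway, flags):
    """Interpret the Gateway and Flags columns of one `route -n` line."""
    gw = ipaddress.ip_address(gateway)
    # G = "use gateway"; 0.0.0.0 is the unspecified address (no gateway set).
    if "G" in flags and not gw.is_unspecified:
        return f"via {gw}"
    return "on-link"
```

So `next_hop("0.0.0.0", "U")` gives "on-link", while a default route such as `next_hop("192.168.32.1", "UG")` gives "via 192.168.32.1".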

My 2 cents
Andy


Michel-André:

It seems difficult to me to achieve this or a similar configuration. NS and NSDC are running on one bare-metal machine with a single network interface, so I don’t know how to split the blue and green networks. I can set up DHCP only in NS, not in NSDC (?).

Andy:
I wish I could fully understand the routing table… it looks simple, but: what is the order of the rules? top-down? Does router 0.0.0.0 here mean any (like passthrough) or none (like dismiss)? Is there a difference with packet origin (like ovpn, the machine itself, incoming via interface)? What internal routes are considered first (before routing table)?

sorry, need some sleep now

@sternkrabbe

Rules are (almost everywhere) always top to bottom…

Have a good rest!

Andy

In your NethServer, set the DNS on the front page to
192.168.32.1 and 1.1.1.1

I think we have to look somewhere else. Today, the behaviour of the machine changed completely. Assuming I am not sleepwalking, nothing has been changed manually overnight. It is as if a cron job had cleaned things up. The routing table, for example, shows exactly the same entries as before.

What else is involved and how can I troubleshoot it? Are there caches? Can they be read? Refreshed?

The DNS entries made in the server manager show up in the hosts file as remote hosts. Should router.gsr rather be a local host?

#
# 10localhost
#
127.0.0.1       localhost       localhost.localdomain


#
# 20hostname(s)
#
192.168.32.6            neth.server.gsr neth neth.ad.server.gsr



#
# 30hosts_remote
#
192.168.32.1       router.gsr
# remoteBackup
192.168.1.6        dynamic.dns.tld


#
# 40hosts_local
#

Content of /etc/sysconfig/network-scripts/route-br0:

192.168.32.1 dev br0
default via 192.168.32.1 dev br0

Looks fine, but on this page (https://linoxide.com/how-to-flush-routing-table-from-cache/) I found a different syntax for this file (referring to CentOS 7):

GATEWAY0=192.168.1.254
NETMASK0=255.255.255.0
ADDRESS0=192.168.5.0

GATEWAY1=10.10.10.1
NETMASK1= 255.255.255.240
ADDRESS1=10.164.234.132
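As far as I know, both layouts are accepted in route-<interface> files on CentOS 7: the first is the "ip command arguments" style, the second the older ADDRESS/NETMASK/GATEWAY style. To compare them, this Python sketch converts one legacy triple into the ip-style line. `legacy_to_ip_route` is a made-up helper, not a system tool.

```python
import ipaddress


def legacy_to_ip_route(address, netmask, gateway):
    """Convert ADDRESSn/NETMASKn/GATEWAYn values into an `ip route` style line."""
    # strip() tolerates stray spaces such as "NETMASK1= 255.255.255.240";
    # strict=False masks off host bits in the address.
    net = ipaddress.ip_network(f"{address.strip()}/{netmask.strip()}", strict=False)
    return f"{net.with_prefixlen} via {gateway.strip()}"
```

The first example entry becomes `192.168.5.0/24 via 192.168.1.254`. The second address has host bits set, so the network it actually describes is 10.164.234.128/28.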

Can you explain where 192.168.1.6 is on your network?

The way you explained your network, nobody should be able to reach this IP …

Michel-André

Also, if you don’t mind me asking: what is the make/model of your router, and do you have a static IP (if you do, don’t worry about posting it here) or a dynamic one (with a DDNS service)? It might not help much, but it could help to give a complete picture of the network (at least to me).

Michel-André:
You are quite right. I wanted to focus on what could be relevant. This IP, “remotebackup”, is a member of an ovpn subnet, a tunnel client. I use it for backup purposes; “remotebackup” is the name of an ssh remote host configuration. Before, I used to contact this backup server via a dyndns service and ssh. Now, as this host can no longer be reached by dynamic DNS, I created the ovpn tunnel and mapped the former dyndns address to the ovpn subnet address. You can see this subnet in the routing table for interface tunbackovpn.

Shane:
I don’t mind, of course. The router is a DrayTek Vigor 9200Vn, a typical SOHO router with a nice feature set. The router defines a subnet inside the LAN of a coworking space, which itself connects to the internet via a Fritzbox 7540 router. This sounds a bit complicated, but all other clients in my LAN and all other clients in the coworking space’s LAN have no connectivity issues, and my setup worked well until last week (I screwed it up more later on), so I think everything in this direction should be fine. There is no static IP towards the internet, but the network can be reached via a dynamic DNS name: gartenspielraum.no-ip.org (no TLS cert yet).

Andy:
Rule order does not matter in the routing table, as I learned. Usually the default route comes first, and it matches any request, so if order decided, none of the other rules would ever be used. I found this lucid answer on superuser:

The order in the table doesn’t matter; routes with a longer prefix always take priority. If you stop clinging to netmasks and consider the prefix lengths instead (which ip route shows), you have 123.x.x.128/27 and 123.x.x.151/32 , and the latter – more specific – route will take priority over the former (more generic one).
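The longest-prefix rule from that answer can be sketched in a few lines of Python. The `ROUTES` list is an assumption modelled on this network (default route, the /24 LAN, and the /32 entry for the router), deliberately listed with the default route first. Metrics and policy routing are ignored here.

```python
import ipaddress


def select_route(destination, routes):
    """Pick the most specific route (longest prefix) that contains destination."""
    dest = ipaddress.ip_address(destination)
    candidates = [ipaddress.ip_network(r) for r in routes]
    matching = [net for net in candidates if dest in net]
    if not matching:
        return None  # no route at all, not even a default
    return str(max(matching, key=lambda net: net.prefixlen))


# Default route first on purpose: table order must not influence the result.
ROUTES = ["0.0.0.0/0", "192.168.32.0/24", "192.168.32.1/32"]
```

A packet for 192.168.32.1 picks the /32 entry, one for 192.168.32.50 picks the /24, and anything else (say 1.1.1.1) falls through to the default route, regardless of where each line sits in the table.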

Status: Today I can ping remote and local from NS without problems. However, I want to learn and I want to understand. It does not feel good to have a configuration that works somehow and somehow can stop working again.

I will look for a tool to draw a picture of the network. Maybe it’s fun to do :slight_smile:


The network existed before introducing NS in 2019. L2TP VPN to the router is a leftover from the time before, but still in use.

I am still interested in a real answer about the causes of those connectivity problems. I am quite sure it is something below the user network configuration level, like netfilter preprocessing, interface handling … or the firewall panicking.

Still not done; I am trying to solve this chunk by chunk.
So one strange thing on my net is related to ARP: NS sometimes responds to the IP address of the router.

“Sometimes” is not a very helpful word. But let’s have a closer look into two scenarios:

Constants in both scenarios:

  • arp on the router shows no entry for x.1 (itself) and points x.6 to NS
  • arp on NS points to the router’s MAC: router.gsr (192.168.32.1) at rr:rr:rr:rr:rr:rr [ether] on br0

A: connecting a client via Wi-Fi of the router

  • router.gsr or 192.168.32.1 shows the website of the NS instance in the browser on the client
  • arp -a on the client maps x.1 and x.6 to the MAC of NS

B: connecting a client via IPsec VPN of the router

  • router.gsr or x.1 shows the website of the router
  • arp -a on the client maps x.1 to the MAC of the router

Assuming the normal procedure, every device responds to ARP requests only for itself. So NS responds for the wrong IP. In a race condition between NS and the router, sometimes NS wins, sometimes the router.
Or: does the router announce ARP information to the network as well?
Or: it is a MITM attack originating from NS…
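Scenario A can be spotted mechanically: two IPs that map to one MAC in the client's ARP cache. Here is a Python sketch that parses Linux-style `arp -a` output. The sample text and its MAC addresses are invented placeholders, and other operating systems print a different layout.

```python
import re
from collections import defaultdict

# Matches "(ip) <word> <mac>"; the word is "at" in English output and may be
# localized (e.g. "auf"), so any single token is accepted there.
ARP_LINE = re.compile(
    r"\((\d{1,3}(?:\.\d{1,3}){3})\)\s+\S+\s+([0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5})"
)


def ips_sharing_a_mac(arp_output):
    """Return {mac: [ips]} for every MAC that answers for more than one IP."""
    by_mac = defaultdict(set)
    for ip, mac in ARP_LINE.findall(arp_output):
        by_mac[mac.lower()].add(ip)
    return {mac: sorted(ips) for mac, ips in by_mac.items() if len(ips) > 1}


# Hypothetical sample resembling scenario A (placeholder MAC addresses).
SAMPLE = """\
? (192.168.32.1) at 02:00:00:00:00:06 [ether] on wlan0
? (192.168.32.6) at 02:00:00:00:00:06 [ether] on wlan0
? (192.168.32.7) at 02:00:00:00:00:07 [ether] on wlan0
"""
```

Running `ips_sharing_a_mac(SAMPLE)` reports the one MAC that claims both x.1 and x.6. A shared MAC is not always an error (a host may legitimately carry several addresses), but the router's IP showing up behind another machine's MAC is a strong hint.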

O-kay, I think now I got the culprit:

tungsrovpn: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST>  mtu 1500
        inet 192.168.32.1  netmask 255.255.255.0  destination 192.168.32.1
        inet6 fe80::f5e3:ddcd:b771:f71d  prefixlen 64  scopeid 0x20<link>
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  txqueuelen 100  (UNSPEC)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3  bytes 144 (144.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

This tunnel was once set up for testing whether I could share the local subnet with VPN clients. It should not be running, but I wasn’t aware it would cause problems when active. I have changed its settings and deactivated it now. I did not know it would grab x.1 without checking, and you cannot see this in the admin interface.
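A check like the following would have flagged it earlier: compare every address configured on the server with the gateway's address. This is a Python sketch only. The interface list is typed in from the ifconfig output above plus the br0 address mentioned earlier in the thread, and `conflicting_interfaces` is an illustrative name.

```python
import ipaddress


def conflicting_interfaces(interfaces, gateway):
    """Return the names of local interfaces that carry the gateway's own IP."""
    gw = ipaddress.ip_address(gateway)
    return sorted(
        name
        for name, addr in interfaces.items()
        if ipaddress.ip_interface(addr).ip == gw
    )


# Addresses as reported in this thread (entered by hand for the sketch).
INTERFACES = {
    "br0": "192.168.32.6/24",
    "tungsrovpn": "192.168.32.1/24",
}
```

Here `conflicting_interfaces(INTERFACES, "192.168.32.1")` names tungsrovpn, the tunnel that had taken over the router's address.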

I had already posted this information in #11.

I will report if some of my issues are resolved now.