Possible DNS Lookup Issue

NethServer Version: 7.9
Module: Various

Hi,

I am experiencing an odd problem with my Nethserver install and I am not sure where to start with the diagnosis process.

Almost daily now, my Nethserver stops accepting SMTP email and I am unable to connect to Cockpit via my external host name, i.e. nethserver.mydomain.com:9090/nethserver.

I can still connect to the webserver on 80/443 via nethserver.mydomain.com and I am able to connect to Cockpit via the local IP address.

A restart of the nethserver VM resolves the issue but is not an ideal solution.

My feeling is this is some kind of dns lookup related problem but I have no idea if I am right and where to start looking for possible errors.

Where should I start looking?

Thanks

John

@kiemosan

Hi John

A bit more info might help others to help you…

You talk about a VM (all my 30+ NethServers are VMs in Proxmox) but don’t say WHICH hypervisor you’re using. Hypervisors are FAR from equal.

Second: I do not understand this:

This sounds extremely strange. Why do you access your server via its external IP?

I always use either DNS Name or internal IP… BUT ALL DNS Names resolve to internal IPs!
(I have my Internal DNS on NethServer and my OPNsense firewall).

Even if connecting with VPN, the DNS Name of my NethServer resolves internally.
Only when connecting without VPN from external do I use an external IP (Given by External DNS).

Also you don’t say if this VM is on premises - or is it hosted at some Provider / Hoster?

My 2 cents
Andy

Hi Andy,

Hypervisor is ESXi 6.7, on-premise hosted. This is my own personal system/setup at home but I also run ESXi at work on an 8 node cluster :wink:

Nethserver is not my primary router (I use a Google WiFi mesh router system at home) and nethserver does not handle DNS lookups on my internal network, so I can’t operate like that. When I access my nethserver via its DNS host name, the traffic goes via the external IP and is port forwarded back to my nethserver on an additional NIC. This is down to Google WiFi’s insistence on only port forwarding to IPs it has assigned via DHCP. It’s an unusual setup but it does work well for me.

Nethserver will not be aware of any of this networking.

@Andy_Wismer So this is what I am doing

Only when connecting without VPN from external do I use an external IP (Given by External DNS).

And this is my point.

  1. Browsing the main default website i.e. nethserver.mydomain.com works ok.
  2. Trying to access cockpit via nethserver.mydomain.com:9090/nethserver does not work.
  3. Mail does not get accepted via port 25 from my mail provider.

However…

Rebooting nethserver resolves points 2 and 3 above.
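
For the next time it happens, a quick way to pin down which of the ports above are actually unreachable is a plain TCP connect test. This is only a minimal sketch: the helper names `port_open` and `check_services` are made up for illustration, the port list is taken from the points above, and the host name is the placeholder used in this thread. Run from inside the LAN against the external host name it exercises the router's hairpin path; run from a mobile hotspot it exercises the normal port forward.

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name lookups.
        return False

def check_services(host, ports=(80, 443, 9090, 25), timeout=3.0):
    """Map each port to True/False depending on whether it accepted a connection."""
    return {port: port_open(host, port, timeout) for port in ports}

# Example use (host name is the placeholder from this thread):
# for port, ok in check_services("nethserver.mydomain.com").items():
#     print(port, "open" if ok else "unreachable")
```

Running the same check against the local IP and against the external host name at the moment of failure would show whether the services themselves have stopped listening or only the forwarded path is broken.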

Hi

I’ve used VMware extremely intensively in the past, from the first beta before 2000! I moved to Proxmox about 7 years ago and wouldn’t consider moving back to ESXi or any of the other majors.

But I can confirm that NethServer works well on ESXi, if installed correctly with “promiscuous mode” active on ESXi (if only for the NethServer AD).

I also NEVER use NethServer (Except in Hosted environments, where I don’t have other options) as my Router / Firewall, I use OPNsense, at home and for all my clients.

As such, I assume that the issue stems from how your Google WiFi / mesh router handles IPs and port forwarding, including the so-called “hairpin”, where you access an internal server using its external IP.

Specifically, I assume that some stack, heap or other buffer in the router’s networking fills up and it drops connections to the external IP. Or it causes NethServer to fill up some of its own internal stacks or buffers.

This would explain why the reboot of NethServer helps…

Is there any information in the NethServer logs at the time?

And: can the server still be accessed from the outside (e.g. using a mobile phone’s hotspot for testing)?

My 2 cents
Andy

The Google WiFi has never needed a reboot so I don’t think the issue is there. I’ve run this config for several years and it has remained robust until recently. This is a personal setup and the Google WiFi gives me several advantages for managing the wifi at home which I like.

Clearly, Nethserver is having a problem of some kind. Which logs would be best to look at?

Yes but only the webserver default site is accessible. As mentioned above, Cockpit on port 9090 becomes inaccessible and port 25 does not accept any mail.

Does anything change if you reboot the Google Router? (When the issue crops up?) - instead of rebooting the NethServer right away…

Not all effects are seen locally, some effects may “wander”, especially if it’s the router causing it…

In the days of Win2000, the easiest method to kill the server was to set a Windows 9x Hostname to the Domainname of the Windows Server… A good example of a wrong config on a Win9x PC, but the effect was seen on the Server… :slight_smile:

My 2 cents
Andy

I’ve not tried that. I’d have to check the next time it happens. What logs should I be looking at the next time it happens on nethserver also?

@kiemosan a lookup issue should only happen… outside your network. So as long as your public IP matches the one registered in the A record at your DNS provider… it should be fine.

Did you also update the reverse PTR record?

@pike No, I use a domestic dynamic IP internet provider at home and use a dynamic DNS setup consequently.

I have a cname in place to map to my dynamic hostname from my primary domain.

So nethserver.mydomain.com >> myhost.dynamicdns.com >> ISP dynamic IP.

I say it’s a lookup issue simply because I don’t know what is actually happening on Nethserver to cause it :slight_smile:
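
To rule DNS in or out, the CNAME chain described above can be checked directly when the problem occurs. The following is a rough sketch using only the Python standard library; `resolve_chain` and `points_at` are hypothetical helper names, and the host names and IP in the example are placeholders. It uses the resolver of whichever machine runs it, so the result from inside the LAN may differ from what a mail provider sees from outside.

```python
import socket

def resolve_chain(name, resolver=socket.gethostbyname_ex):
    """Resolve name and return (canonical_name, aliases, ipv4_addresses)."""
    canonical, aliases, addresses = resolver(name)
    return canonical, list(aliases), list(addresses)

def points_at(name, expected_ip, resolver=socket.gethostbyname_ex):
    """True if name currently resolves to expected_ip.

    Returns False on a mismatch or when the lookup itself fails.
    """
    try:
        _, _, addresses = resolve_chain(name, resolver)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return False
    return expected_ip in addresses

# Example use (placeholder names; substitute the current public IP):
# print(resolve_chain("nethserver.mydomain.com"))
# print(points_at("nethserver.mydomain.com", "203.0.113.7"))
```

If the name still resolves to the current public IP while port 25 and 9090 are failing, DNS is probably not the cause and the port forward or the services themselves are the next suspects.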

Which device updates myhost.dynamicdns.com?
I know that NethServer has a ddns client among its packages, but I don’t know if NethServer is your router or simply a server on your network…

Another VM on my home LAN has the dynamic DNS client running on it.

Nethserver is just a server for mail on my LAN. Routing handled as described above.

@kiemosan

Hi John
Sorry, I have to leave to visit a client, a Win10 is suddenly having issues, trying to login with a wrong Network User, even though that PC is in the AD domain… :frowning:

My 2 cents
Andy

No worries, the problem isn’t happening right now.

Just let me know what log files to trawl.

Check /var/log/messages - probably the most interesting, as you don’t use a firewall on NethServer.

Maybe also /var/log/mail, to see why mail fails…

My 2 cents
Andy
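
As a starting point for trawling those files, a small filter that keeps only lines containing typical trouble keywords can cut the noise. This is a minimal sketch, not a NethServer tool: the keyword list is a guess, `interesting_lines` and `scan_file` are illustrative names, and the mail log path is an assumption (on CentOS 7 based systems it is commonly /var/log/maillog, so check which file exists).

```python
import re

# Assumed keywords; extend with whatever turns up in the real logs.
DEFAULT_PATTERNS = (r"error", r"fail", r"refused", r"timed? ?out", r"denied")

def interesting_lines(lines, patterns=DEFAULT_PATTERNS):
    """Return the lines that match any of the patterns (case-insensitive)."""
    rx = re.compile("|".join(patterns), re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if rx.search(line)]

def scan_file(path, patterns=DEFAULT_PATTERNS):
    """Apply interesting_lines to a log file, tolerating odd byte sequences."""
    with open(path, errors="replace") as fh:
        return interesting_lines(fh, patterns)

# Example use (paths as mentioned in this thread; reading them needs root):
# for line in scan_file("/var/log/messages"):
#     print(line)
```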

Hi,

I rebooted my WiFi / Firewall and the Nethserver and the problem appeared to go away until yesterday. Rebooting the network again has not solved the issue so I’ve pulled the logs before I reboot again.

journalctl -u cockpit gives the following

Nov 27 20:55:13 nethserver.itwerx.co.uk systemd[1]: Starting Cockpit Web Service...
Nov 27 20:55:18 nethserver.itwerx.co.uk remotectl[1780]: /usr/bin/chcon: can't apply partial context to unlabeled file ‘/etc/cockpit/ws-certs.d/99-nethserver.cert’
Nov 27 20:55:18 nethserver.itwerx.co.uk remotectl[1780]: remotectl: couldn't change SELinux type context 'etc_t' for certificate: /etc/cockpit/ws-certs.d/99-nethserver.cert: Child process exited with code 1
Nov 27 20:55:18 nethserver.itwerx.co.uk systemd[1]: Started Cockpit Web Service.
Nov 27 20:55:19 nethserver.itwerx.co.uk cockpit-ws[1831]: Using certificate: /etc/cockpit/ws-certs.d/99-nethserver.cert
Nov 28 13:30:35 nethserver.itwerx.co.uk systemd[1]: Starting Cockpit Web Service...
Nov 28 13:30:35 nethserver.itwerx.co.uk remotectl[15132]: /usr/bin/chcon: can't apply partial context to unlabeled file ‘/etc/cockpit/ws-certs.d/99-nethserver.cert’
Nov 28 13:30:35 nethserver.itwerx.co.uk remotectl[15132]: remotectl: couldn't change SELinux type context 'etc_t' for certificate: /etc/cockpit/ws-certs.d/99-nethserver.cert: Child process exited with code 1
Nov 28 13:30:35 nethserver.itwerx.co.uk systemd[1]: Started Cockpit Web Service.
Nov 28 13:30:35 nethserver.itwerx.co.uk cockpit-ws[15136]: Using certificate: /etc/cockpit/ws-certs.d/99-nethserver.cert
Nov 28 13:31:36 nethserver.itwerx.co.uk cockpit-session[15588]: pam_ssh_add: Failed adding some keys
Nov 28 13:31:41 nethserver.itwerx.co.uk cockpit-ws[15136]: logged in user session
Nov 28 13:31:42 nethserver.itwerx.co.uk cockpit-ws[15136]: New connection to session from 192.168.1.131
Nov 28 13:34:55 nethserver.itwerx.co.uk cockpit-ws[15136]: New connection to session from 192.168.1.131
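
For longer journal output, a small parser can summarise lines in the format shown above, for example which processes logged and which client addresses reached cockpit-ws. This is only a sketch: the regular expressions are written against the lines pasted here, and `parse_line`, `count_by_process` and `connection_sources` are illustrative names.

```python
import re
from collections import Counter

# Matches lines such as: "Nov 27 20:55:13 host proc[pid]: message"
LINE_RX = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"(?P<proc>[^\[\s:]+)(?:\[(?P<pid>\d+)\])?:\s(?P<msg>.*)$"
)
CONN_RX = re.compile(r"New connection to session from (\S+)")

def parse_line(line):
    """Split one journal line into its fields, or return None if it does not match."""
    m = LINE_RX.match(line.strip())
    return m.groupdict() if m else None

def count_by_process(lines):
    """Count how many lines each process logged."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec:
            counts[rec["proc"]] += 1
    return counts

def connection_sources(lines):
    """Count cockpit-ws session connections per client address."""
    counts = Counter()
    for line in lines:
        m = CONN_RX.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

In the output above, the only session connections come from 192.168.1.131, a LAN address, which fits the observation that Cockpit still answers locally while the external path fails.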

I have attached the mail and messages logs in the linked zip file: Logs

Hope someone can help me understand what’s going on.

Thanks
John