NethServer stops working overnight


(Istvan) #1

NethServer Version: 7.4.1708

Hello,

I have a fresh NethServer install, and during the night the server hangs and stops responding.
I don’t have physical access to the server, so someone has to reboot it for me in the morning.
The server is a new Dell PowerEdge T130 with 8GB RAM and four 1TB drives configured in software RAID1 (boot) and RAID10 (the rest).
The server has the Samba server and router roles (DHCP, DNS, firewall, MultiWAN).
I tried to configure WAN failover and Shorewall crashed. Even now it says “Check firewall rules The firewall is NOT running”. The last time I tried to reboot the server remotely, it hung.
I would like to know where I should start troubleshooting.

Istvan


(Istvan) #2

I disabled the 2nd WAN connection and Shorewall started.


(Stéphane de Labrusse) #3

Check the logs to see whether you can catch something just before the server hangs.
Also check for a hardware cause, such as RAM.


(Istvan) #4

Which log should I check?


(Stéphane de Labrusse) #5

Particularly /var/log/messages, but other logs may also give you a clue. Also check that your log does not contain only events from the boot.
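
One way to look at what was logged just before each hang is to collect the lines that precede every boot in /var/log/messages. The following is only a minimal sketch: the function name `lines_before_boots` is made up for illustration, and it assumes that each boot shows up in the log as a line containing the kernel banner `kernel: Linux version`, which may not hold on every system.

```python
from collections import deque

# Assumption: every boot writes the kernel banner to the log.
BOOT_MARKER = "kernel: Linux version"


def lines_before_boots(lines, context=20):
    """Return, for each boot marker found, the `context` lines logged just before it."""
    recent = deque(maxlen=context)  # rolling window of the most recent lines
    result = []
    for line in lines:
        line = line.rstrip("\n")
        if BOOT_MARKER in line:
            # Everything still in the window was written before this boot.
            result.append(list(recent))
            recent.clear()
        recent.append(line)
    return result
```

To use it on the real log, pass an open file, for example `lines_before_boots(open("/var/log/messages", errors="replace"))`. An empty list for a given boot means the log holds nothing from before that boot, which is the situation described above.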


(Istvan) #6

There are a lot of messages like this before the hang:
Feb 3 00:18:19 server kernel: IPv4: martian source 46.97.27.250 from 46.97.27.249, on dev p3p1
Feb 3 00:18:19 server kernel: ll header: 00000000: ff ff ff ff ff ff 00 6c bc ef 5e 2e 08 06 …l…^…

These are related to the 2nd WAN which I disabled to be able to start Shorewall.
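
To see how many of these messages arrive and from where, the two log lines quoted above can be parsed with a short script. This is a sketch only: the function names and the small EtherType table are illustrative, and the regular expressions assume the exact message layout shown in the excerpt. The second helper reads the `ll header` bytes as a standard Ethernet header (6 bytes destination MAC, 6 bytes source MAC, 2 bytes EtherType).

```python
import re
from collections import Counter

MARTIAN_RE = re.compile(r"martian source (\S+) from (\S+), on dev (\S+)")
LL_HEADER_RE = re.compile(r"ll header: [0-9a-f]+: ((?:[0-9a-f]{2} ){13}[0-9a-f]{2})")

# Only the two EtherTypes most likely to appear here; others are shown as hex.
ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP"}


def count_martians(lines):
    """Count martian-source messages per (interface, sender address)."""
    counts = Counter()
    for line in lines:
        match = MARTIAN_RE.search(line)
        if match:
            _addr, sender, dev = match.groups()
            counts[(dev, sender)] += 1
    return counts


def decode_ll_header(line):
    """Split an 'll header' line into (destination MAC, source MAC, EtherType)."""
    match = LL_HEADER_RE.search(line)
    if not match:
        return None
    octets = match.group(1).split()
    dst_mac = ":".join(octets[0:6])
    src_mac = ":".join(octets[6:12])
    ethertype = int(octets[12] + octets[13], 16)
    return dst_mac, src_mac, ETHERTYPES.get(ethertype, hex(ethertype))
```

Applied to the excerpt above, the header decodes to a broadcast destination (ff:ff:ff:ff:ff:ff) with EtherType 0x0806, so the logged frame is an ARP broadcast seen on p3p1.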


(Istvan) #7

After I disabled the 2nd WAN this problem stopped.


(Istvan) #8

This morning the server was not accessible again. This is my 2nd NethServer and there are way too many stability problems with it. I need to find out whether the stability problems are hardware related or NethServer related. It is not normal for a server to hang every week.

This morning I found this on the screen.


Any suggestions?

Istvan!


(Jeroen Visser) #9

Could you give some more info about the setup? Is this your own WAN IP, for instance? What is p3p1’s role in the network? What is the server’s task?


(Marc) #10

There is an upstream bug open, not sure it is the one affecting your server:
https://bugs.centos.org/view.php?id=13843#c31121


(Filippo Carletti) #11

I think that @dnutan has found the right answer.
@adv, could you please reboot your server using kernel 3.10.0-514.26.2.el7.x86_64 just to confirm that the problem goes away?
This seems to be a regression coming from Red Hat, whom we usually trust.


(Istvan) #12

I was going to open a new topic for this issue, but the forum administrators keep closing them. They say the issues are related to this topic.
The thing is that I’m facing a lot of stability issues with this NethServer installation. In the 15 years I have been using Linux, I have never had stability issues.
I assume this must be a hardware issue, but I can’t prove it yet.

Regarding this issue, I disabled the auditd service, as described at this link:

Now I wait…


(Filippo Carletti) #13

You are using CentOS 7; I don’t think that instructions for CentOS 6 are relevant to your problem.
Does your system match what is reported in the CentOS bugtracker?


(Istvan) #14

Today the server hung again. The kernel version is: 3.10.0-693.17.1.el7.x86_64
Apparently the bug reported here persists in this version: https://bugs.centos.org/view.php?id=13843#c31121
That thread mentions that the bug is fixed in version 3.10.0-820.el7, but I can’t find this version.
Instead, I found that kernel 3 is EOL (https://www.kernel.org/) but it can be upgraded to 4.4 or 4.15 (https://www.howtoforge.com/tutorial/how-to-upgrade-kernel-in-centos-7-server/).
So what do you think? Should I upgrade to 4.4 or 4.15? Or is it better to revert to 3.10.0-514.26.2.el7.x86_64?
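
For checking whether a given kernel release is older than the 3.10.0-820.el7 build mentioned in the bug report, a rough numeric comparison is enough. The sketch below is not a full RPM version comparison: the helper names `kernel_key` and `is_at_least` are invented for illustration, and it only looks at the leading numeric fields of the release string, so it is meaningful mainly within the same 3.10.0 el7 series.

```python
import re


def kernel_key(release):
    """Turn '3.10.0-693.17.1.el7.x86_64' into (3, 10, 0, 693, 17, 1)."""
    numbers = []
    for part in re.split(r"[.-]", release):
        if not part.isdigit():
            break  # stop at the first non-numeric field, e.g. 'el7'
        numbers.append(int(part))
    return tuple(numbers)


def is_at_least(release, reference="3.10.0-820.el7"):
    """True if `release` is numerically equal to or newer than `reference`."""
    return kernel_key(release) >= kernel_key(reference)


if __name__ == "__main__":
    for rel in ("3.10.0-693.17.1.el7.x86_64", "3.10.0-514.26.2.el7.x86_64"):
        print(rel, is_at_least(rel))
```

Both releases named in this thread compare as older than 3.10.0-820.el7, which matches the report that the fix is not yet in the running kernel.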


(Istvan) #15

I just discovered I can’t revert to 3.10.0-514.26.2.el7.x86_64. The available versions are:
0 : CentOS Linux (3.10.0-693.17.1.el7.x86_64) 7 (Core)
1 : CentOS Linux (3.10.0-693.11.6.el7.x86_64) 7 (Core)
2 : CentOS Linux (3.10.0-693.el7.x86_64) 7 (Core)
3 : CentOS Linux (0-rescue-b6a299e6f2a74c57aa53e745d1fd3c69) 7 (Core)

So should I update to kernel v4? Which one, 4.4 or 4.15?
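
The menu listing above can also be checked programmatically to confirm which kernels are actually installed and at which index. This is a small sketch that assumes the listing has exactly the `index : title (release) ...` layout pasted in this post; the function name `parse_menu` is illustrative.

```python
import re

# Matches lines such as: 0 : CentOS Linux (3.10.0-693.17.1.el7.x86_64) 7 (Core)
ENTRY_RE = re.compile(r"^\s*(\d+)\s*:\s*.*?\(([^)]+)\)")


def parse_menu(listing):
    """Map each kernel release in the listing to its boot menu index."""
    entries = {}
    for line in listing.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            entries[match.group(2)] = int(match.group(1))
    return entries


MENU = """\
0 : CentOS Linux (3.10.0-693.17.1.el7.x86_64) 7 (Core)
1 : CentOS Linux (3.10.0-693.11.6.el7.x86_64) 7 (Core)
2 : CentOS Linux (3.10.0-693.el7.x86_64) 7 (Core)
3 : CentOS Linux (0-rescue-b6a299e6f2a74c57aa53e745d1fd3c69) 7 (Core)
"""

if __name__ == "__main__":
    menu = parse_menu(MENU)
    print(menu.get("3.10.0-514.26.2.el7.x86_64"))  # None: this kernel is not installed
```

A lookup for 3.10.0-514.26.2.el7.x86_64 returns nothing, because a fresh 7.4 install only carries 3.10.0-693 builds.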


(Filippo Carletti) #16

3.10.0-820 will come with 7.5 in a few months.
3.10 is not EOL; it’s maintained by Red Hat.
Follow the howto you have found to pick a new kernel from elrepo.
I’d prefer the -lt kernel (which, as of today, is 4.4).
You will lose the nDPI function, but you may rebuild it from sources (I know it works).


(Istvan) #17

After I upgrade to 4.4, can I revert to 3.10.0-820?


(Istvan) #18

I upgraded to 4.4.116-1.el7.elrepo.x86_64. I’m anxious to see if it solves the problem.
Indeed, the DPI function is not available. Do you have a tutorial on how to rebuild it from the sources? I’m planning to use this feature.


(Stefano Zamboni) #19

Upgrading the kernel will leave you on your own, since it is a big departure from upstream’s path.

The CentOS kernel is maintained, so you can’t compare a vanilla 3.10 kernel with the CentOS/RH one…


(Istvan) #20

What would you do in my place? My server is unstable with the latest maintained CentOS kernel.