Server Unresponsive Issue

Hi,

Two mornings in a row now, my NethServer install has locked up (for want of a better explanation), forcing me to reset the VM it runs on.

When I try to access the console, it's unresponsive too. After a reboot it functions perfectly well.

I can't see anything in the messages log that indicates the cause. Where else can I look to find the issue?

Thanks

John

Depending on your virtualization platform, there may be logs there too (e.g. vmware.log). Do you use a virtualization backup tool?

Are there entries in /var/log/messages after the system freeze? Does it happen at a specific time? A cron job could be the trigger…
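One quick way to pin down when the system stopped logging is to look for the longest silence between consecutive timestamps in /var/log/messages. The following is only a minimal sketch: it assumes the usual syslog prefix (`Aug 22 07:14:21 ...`), and because syslog lines carry no year, `ASSUMED_YEAR` is a value you have to supply yourself. The function names are made up for illustration.

```python
from datetime import datetime

ASSUMED_YEAR = 2018  # assumption: syslog lines carry no year, so one must be supplied


def parse_ts(line, year=ASSUMED_YEAR):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp, or return None."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def largest_gap(lines, year=ASSUMED_YEAR):
    """Return (gap, line_before, line_after) for the longest silence, or None."""
    best = None
    prev_ts = prev_line = None
    for line in lines:
        ts = parse_ts(line, year)
        if ts is None:
            continue  # skip lines without a recognisable timestamp
        if prev_ts is not None:
            gap = ts - prev_ts
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best


# Illustrative input in the syslog format; a real run would pass an open file.
sample = [
    "Aug 22 07:14:21 remote kernel: first message",
    "Aug 22 07:14:22 remote kernel: last message before the silence",
    "Aug 22 08:18:15 remote kernel: first message after the reboot",
]
gap, before, after = largest_gap(sample)
print(gap)  # 1:03:53
```

To run it against the real file, pass an open file object instead of the list, e.g. `largest_gap(open("/var/log/messages"))`. The line just before the gap is usually the most interesting one.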

Nothing significant to my eyes.

VMware log extracts:

[Aug 21 08:42:56.389] [ message] [VGAuthService] SAML_Init: Using xmlsec1 for XML signature support
[Aug 21 08:42:56.389] [ message] [VGAuthService] ServiceNetworkListen: Created socket directory ‘/var/run/vmware’
[Aug 21 08:42:56.389] [ message] [VGAuthService] BEGIN SERVICE
[Aug 22 08:18:39.444] [ message] [VGAuthService] VGAuthService ‘build-6082533’ logging at level ‘normal’
[Aug 22 08:18:39.448] [ message] [VGAuthService] Pref_LogAllEntries: 1 preference groups in file ‘/etc/vmware-tools/vgauth.conf’
[Aug 22 08:18:39.449] [ message] [VGAuthService] Group ‘service’

[Aug 21 08:42:56.421] [ message] [vmtoolsd] Plugin ‘vmbackup’ initialized.
[Aug 21 08:42:56.430] [ message] [vix] VixTools_ProcessVixCommand: command 62
[Aug 21 08:42:56.456] [ message] [vix] ToolsDaemonTcloReceiveVixCommand: command 62, additionalError = 17
[Aug 21 08:43:26.481] [ warning] [guestinfo] GuestInfoSendNicInfoXdr: update failed: request "SetGuestInfo 10 ", reply “Invalid guest information type.”.
[Aug 22 08:18:39.906] [ message] [vmsvc] Log caching is enabled with maxCacheEntries=4096.
[Aug 22 08:18:39.908] [ message] [vmsvc] Core dump limit set to -1
[Aug 22 08:18:39.908] [ message] [vmtoolsd] Tools Version: 10.1.10.63510 (build-6082533)
[Aug 22 08:18:40.280] [ message] [vmtoolsd] Plugin ‘hgfsServer’ initialized.
[Aug 22 08:18:40.280] [ message] [vix] QueryVGAuthConfig: vgauth usage is: 1

Messages

Aug 22 07:14:21 remote kernel: ll header: 00000000: ff ff ff ff ff ff 5c 49 7d 61 fe db 08 00 …\I}a…
Aug 22 07:14:21 remote kernel: IPv4: martian source 192.168.1.254 from 192.168.1.231, on dev br0
Aug 22 07:14:21 remote kernel: ll header: 00000000: ff ff ff ff ff ff 00 26 51 75 29 67 08 06 …&Qu)g…
Aug 22 07:14:21 remote kernel: IPv4: martian source 192.168.1.254 from 192.168.1.231, on dev br0
Aug 22 07:14:21 remote kernel: ll header: 00000000: ff ff ff ff ff ff 00 26 51 75 29 67 08 06 …&Qu)g…
Aug 22 07:14:21 remote kernel: IPv4: martian source 255.255.255.255 from 192.168.1.104, on dev br0
Aug 22 07:14:21 remote kernel: ll header: 00000000: ff ff ff ff ff ff 68 c6 3a b6 73 6d 08 00 …h.:.sm…
Aug 22 07:14:22 remote kernel: IPv4: martian source 192.168.1.255 from 192.168.1.5, on dev br0
Aug 22 07:14:22 remote kernel: ll header: 00000000: ff ff ff ff ff ff 00 0c 29 16 65 17 08 00 …).e…
Aug 22 07:14:22 remote kernel: IPv4: martian source 192.168.1.255 from 192.168.1.5, on dev br0
Aug 22 07:14:22 remote kernel: ll header: 00000000: ff ff ff ff ff ff 00 0c 29 16 65 17 08 00 …).e…
Aug 22 07:14:22 remote kernel: IPv4: martian source 192.168.1.254 from 192.168.1.231, on dev br0
Aug 22 07:14:22 remote kernel: ll header: 00000000: ff ff ff ff ff ff 00 26 51 75 29 67 08 06 …&Qu)g…
Aug 22 07:14:22 remote kernel: IPv4: martian source 192.168.1.254 from 192.168.1.231, on dev br0
Aug 22 07:14:22 remote kernel: ll header: 00000000: ff ff ff ff ff ff 00 26 51 75 29 67 08 06 …&Qu)g…
Aug 22 08:18:15 remote journal: Runtime journal is using 8.0M (max allowed 189.5M, trying to leave 284.2M free of 1.8G available → current limit 189.5M).
Aug 22 08:18:15 remote kernel: Initializing cgroup subsys cpuset
Aug 22 08:18:15 remote kernel: Initializing cgroup subsys cpu
Aug 22 08:18:15 remote kernel: Initializing cgroup subsys cpuacct
Aug 22 08:18:15 remote kernel: Linux version 3.10.0-862.11.6.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC) ) #1 SMP Tue Aug 14 21:49:04 UTC 2018
Aug 22 08:18:15 remote kernel: Command line: BOOT_IMAGE=/vmlinuz-3.10.0-862.11.6.el7.x86_64 root=/dev/mapper/VolGroup-lv_root ro crashkernel=auto rd.lvm.lv=VolGroup/lv_root rd.lvm.lv=VolGroup/lv_swap nodmraid rhgb quiet LANG=en_US.UTF-8
Aug 22 08:18:15 remote kernel: Disabled fast string operations
Aug 22 08:18:15 remote kernel: e820: BIOS-provided physical RAM map:
Aug 22 08:18:15 remote kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009f3ff] usable
Aug 22 08:18:15 remote kernel: BIOS-e820: [mem 0x000000000009f400-0x000000000009ffff] reserved
Aug 22 08:18:15 remote kernel: BIOS-e820: [mem 0x00000000000dc000-0x00000000000fffff] reserved

The issue happened just before 8am, according to my monitoring system here at the office.

Cron log

Aug 22 07:14:11 remote run-parts(/etc/cron.daily)[24492]: starting collectd_cleanup
Aug 22 07:14:11 remote run-parts(/etc/cron.daily)[12872]: finished collectd_cleanup
Aug 22 07:14:11 remote run-parts(/etc/cron.daily)[24492]: starting duc-index
Aug 22 08:18:52 remote crond[2413]: (CRON) INFO (RANDOM_DELAY will be scaled with factor 1% if used.)
Aug 22 08:18:52 remote crond[2413]: (CRON) INFO (running with inotify support)
Aug 22 08:19:01 remote CROND[3339]: (sogo) CMD (/usr/sbin/sogo-ealarms-notify > /dev/null 2>&1)
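In the cron log above, duc-index is logged as starting at 07:14:11 and there is no matching "finished" line before the reboot. That does not prove it caused the freeze, but it is the kind of thing worth checking on each occurrence. A rough Python sketch for listing run-parts jobs that started and never finished, assuming the run-parts line format shown above (the function name is made up for illustration):

```python
import re

# Matches e.g. "run-parts(/etc/cron.daily)[24492]: starting duc-index"
RUN_PARTS = re.compile(r"run-parts\(([^)]+)\)\[\d+\]: (starting|finished) (\S+)")


def unfinished_jobs(lines):
    """Return log lines for jobs that were 'starting' with no later 'finished'."""
    open_jobs = {}
    for line in lines:
        match = RUN_PARTS.search(line)
        if not match:
            continue
        directory, state, job = match.groups()
        key = (directory, job)
        if state == "starting":
            open_jobs[key] = line.rstrip()
        else:
            open_jobs.pop(key, None)  # job completed, forget it
    return list(open_jobs.values())


# Lines taken from the cron log in this thread.
sample = [
    "Aug 22 07:14:11 remote run-parts(/etc/cron.daily)[24492]: starting collectd_cleanup",
    "Aug 22 07:14:11 remote run-parts(/etc/cron.daily)[12872]: finished collectd_cleanup",
    "Aug 22 07:14:11 remote run-parts(/etc/cron.daily)[24492]: starting duc-index",
]
for line in unfinished_jobs(sample):
    print(line)
```

If the same job turns up as unfinished on both mornings, temporarily disabling it would be a cheap way to test whether it is the trigger.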

There are no corresponding events or tasks in ESXi for the NethServer VM, other than open-vm-tools going offline at about 7:15am.

Also, VMware backup is not yet implemented. Risky!

Is the console inaccessible, or just slow?

It could be a kernel panic, but you should find something inside the logs.
From the pasted log, I can’t see anything strange.
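To check the whole log rather than an extract, something like the following Python sketch could help. The pattern list is an assumption: a non-exhaustive set of strings the Linux kernel commonly logs around panics, oopses, lockups, hung tasks and OOM kills. A hard freeze may also leave nothing on disk at all, so an empty result does not rule out a panic.

```python
import re

# Assumption: a non-exhaustive set of strings the Linux kernel commonly logs
# around panics, oopses, lockups, hung tasks and OOM kills.
TROUBLE = re.compile(
    r"Kernel panic|Oops|BUG:|Call Trace|soft lockup|"
    r"blocked for more than|Out of memory"
)


def suspicious_lines(lines):
    """Return the log lines that match any of the trouble patterns."""
    return [line.rstrip() for line in lines if TROUBLE.search(line)]


# Lines taken from the messages extract in this thread.
sample = [
    "Aug 22 07:14:22 remote kernel: IPv4: martian source 192.168.1.254 from 192.168.1.231, on dev br0",
    "Aug 22 08:18:15 remote kernel: Initializing cgroup subsys cpuset",
]
print(suspicious_lines(sample))  # []
```

For the real file, `suspicious_lines(open("/var/log/messages"))` would scan everything, including the rotated copies if you pass them in turn.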

I could see the login prompt but could not type anything.