Zabbix keeps failing after update

NethServer Version: 7.9.2009 (final)
Module: zabbix

I had a major problem this week and had to restore my full VM from my PBS, using the backup from July 23rd. Although several problems were solved and Zabbix seemed to work, it does not collect data.

So I did the upgrade following these steps:

# from:
# https://community.nethserver.org/t/zabbix-5-0-lts-has-been-released/15520/136?u=mre

# check versions
rpm -qa |grep zabbix

# Please update with the needed repos enabled. With the new version the repos should be enabled by default:
yum update --enablerepo=zabbix,mrmarkuz

# Don’t forget the usual signal-event nethserver-zabbix-update after package install:
signal-event nethserver-zabbix-update

# check versions
rpm -qa | grep zabbix

zabbix-release-5.0-1.el7.noarch
zabbix-server-pgsql-5.0.36-1.el7.x86_64
nethserver-zabbix-agent-0.0.1-2.ns7.noarch
zabbix-web-5.0.10-1.el7.noarch
nethserver-zabbix-0.0.1-10.ns7.noarch
zabbix-agent2-5.0.36-1.el7.x86_64
zabbix-agent-5.0.36-1.el7.x86_64

# uname -a
 
 Linux ads.avion.lan 3.10.0-1160.95.1.el7.x86_64 #1 SMP Mon Jul 24 13:59:37 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

# uptime
 22:27:17 up 12 min,  1 user,  load average: 0.00, 0.03, 0.05

# rpm -qa |grep zabbix

But now, with Zabbix updated, the service keeps terminating and does not stay active:

* zabbix-server.service - Zabbix Server
   Loaded: loaded (/usr/lib/systemd/system/zabbix-server.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2023-08-06 05:15:25 MST; 6h left
  Process: 2637 ExecStop=/bin/kill -SIGTERM $MAINPID (code=exited, status=1/FAILURE)
  Process: 4792 ExecStart=/usr/sbin/zabbix_server -c $CONFFILE (code=exited, status=0/SUCCESS)
 Main PID: 4794 (zabbix_server)
   CGroup: /system.slice/zabbix-server.service
           |-4794 /usr/sbin/zabbix_server -c /etc/zabbix/zabbix_server.conf
           |-4801 /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.137112 sec, idle 60 sec
           |-4804 /usr/sbin/zabbix_server: alert manager #1 started        
           |-4805 /usr/sbin/zabbix_server: alerter #1 started              
           |-4806 /usr/sbin/zabbix_server: alerter #2 started              
           |-4807 /usr/sbin/zabbix_server: alerter #3 started              
           |-4808 /usr/sbin/zabbix_server: preprocessing manager #1 started
           |-4809 /usr/sbin/zabbix_server: preprocessing worker #1 started 
           |-4811 /usr/sbin/zabbix_server: preprocessing worker #2 started 
           |-4812 /usr/sbin/zabbix_server: preprocessing worker #3 started 
           |-4813 /usr/sbin/zabbix_server: lld manager #1 started          
           |-4814 /usr/sbin/zabbix_server: lld worker #1 started           
           |-4815 /usr/sbin/zabbix_server: lld worker #2 started           
           |-4816 /usr/sbin/zabbix_server: housekeeper [startup idle for 30 minutes
           |-4817 /usr/sbin/zabbix_server: timer #1 [updated 0 hosts, suppressed 0 events in 0.002501 sec, idle 34 sec
           |-4819 /usr/sbin/zabbix_server: http poller #1 [got 0 values in 0.000447 sec, idle 5 sec
           |-4820 /usr/sbin/zabbix_server: discoverer #1 [processed 0 rules in 0.000000 sec, performing discovery
           |-4821 /usr/sbin/zabbix_server: history syncer #1 [processed 0 values, 0 triggers in 0.000012 sec, idle 1 sec
           |-4823 /usr/sbin/zabbix_server: history syncer #2 [processed 0 values, 0 triggers in 0.000009 sec, idle 1 sec
           |-4825 /usr/sbin/zabbix_server: history syncer #3 [processed 0 values, 0 triggers in 0.000013 sec, idle 1 sec
           |-4826 /usr/sbin/zabbix_server: history syncer #4 [processed 0 values, 0 triggers in 0.000011 sec, idle 1 sec
           |-4827 /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.000547 sec, idle 3 sec
           |-4828 /usr/sbin/zabbix_server: proxy poller #1 [exchanged data with 0 proxies in 0.000004 sec, idle 5 sec
           |-4829 /usr/sbin/zabbix_server: self-monitoring [processed data in 0.000007 sec, idle 1 sec
           |-4830 /usr/sbin/zabbix_server: task manager [processed 0 task(s) in 0.000180 sec, idle 5 sec
           |-4831 /usr/sbin/zabbix_server: poller #1 [got 0 values in 0.000005 sec, idle 5 sec
           |-4832 /usr/sbin/zabbix_server: poller #2 [got 0 values in 0.000007 sec, idle 5 sec
           |-4834 /usr/sbin/zabbix_server: poller #3 [got 0 values in 0.000007 sec, idle 5 sec
           |-4836 /usr/sbin/zabbix_server: poller #4 [got 0 values in 0.000003 sec, idle 5 sec
           |-4838 /usr/sbin/zabbix_server: poller #5 [got 0 values in 0.000006 sec, idle 5 sec
           |-4839 /usr/sbin/zabbix_server: unreachable poller #1 [got 0 values in 0.000007 sec, idle 5 sec
           |-4840 /usr/sbin/zabbix_server: trapper #1 [processed data in 0.000305 sec, waiting for connection
           |-4841 /usr/sbin/zabbix_server: trapper #2 [processed data in 0.000275 sec, waiting for connection
           |-4842 /usr/sbin/zabbix_server: trapper #3 [processed data in 0.000738 sec, waiting for connection
           |-4843 /usr/sbin/zabbix_server: trapper #4 [processed data in 0.003556 sec, waiting for connection
           |-4845 /usr/sbin/zabbix_server: trapper #5 [processed data in 0.001070 sec, waiting for connection
           |-4848 /usr/sbin/zabbix_server: icmp pinger #1 [got 0 values in 0.000005 sec, idle 5 sec
           `-4849 /usr/sbin/zabbix_server: alert syncer [queued 0 alerts(s), flushed 0 result(s) in 0.000513 sec, idle 1 sec

Aug 06 05:15:25 ads.avion.lan systemd[1]: Starting Zabbix Server...
Aug 06 05:15:25 ads.avion.lan systemd[1]: Started Zabbix Server.
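In the status output above, the ExecStop step exited with status=1/FAILURE while ExecStart succeeded, so it may help to pull out just the failed steps and then read the logs around them. A minimal sketch, assuming the usual `systemctl status` layout; `failed_steps` is a hypothetical helper name, and the log path in the comments is the packaged default as far as I know, so check LogFile in /etc/zabbix/zabbix_server.conf first:

```shell
#!/bin/sh
# Hypothetical helper: read `systemctl status` text on stdin and print only
# the "Process:" lines whose exit status is non-zero.
failed_steps() {
    grep -E 'Process: .*status=[1-9][0-9]*/' | sed 's/^[[:space:]]*//'
}

# Usage on the server (commented out so the sketch has no side effects):
# systemctl status zabbix-server --no-pager | failed_steps
# journalctl -u zabbix-server -b --no-pager | tail -n 50
# tail -n 50 /var/log/zabbix/zabbix_server.log   # assumed default log path
```

The last lines of the Zabbix server log before each stop are usually the most useful thing to post here.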

By the way, before this update Zabbix was showing the last 5 minutes of data (graph); now it shows absolutely nothing, as if the DB were empty… In another thread, someone commented that he had to wait for (postgresql?) to finish indexing data… Is this the case?

Does anyone have the solution to keep Zabbix up and running?

Regards

Hi

AFAIK this would help:

signal-event nethserver-zabbix-update

If not, maybe we need the amazing @mrmarkuz here, it’s still early on a Sunday… :slight_smile:

My 2 cents
Andy


Thank you @Andy_Wismer

An update:

Almost 24 hours after restoring and updating my NS/AD+Zabbix, Zabbix started to show events:

Could it be that some internal processes in Zabbix are so slow that you have to wait hours for it to recover (PostgreSQL, maybe)?

BTW it still shows:
image

Before the restore I can confirm that there were no more updates for days and some message similar to “Zabbix server is not running” (clarify).

I will wait a few more hours, and tomorrow I will try to restart some of my Proxmox to see if the Zabbix event is logged or maybe disconnect the UPS so that my piNUT server will detect it and report it to Zabbix.

Regards

If you take into account that Zabbix normally writes into the Zabbix DB faster than the cleanup process can remove unneeded entries from it, that would not surprise me!

My 2 cents
Andy


I wonder if I need a scheduled job to restart the Zabbix server every N minutes.
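Instead of a cron job, systemd can restart the unit itself when it exits with an error. A minimal sketch, assuming a drop-in is acceptable on this system; the helper name `write_restart_dropin`, the file name `restart.conf` and the 30 second delay are all my own choices, and the packaged unit may already set Restart= (`systemctl cat zabbix-server` shows what is in effect):

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that restarts a unit on failure.
# $1 = drop-in directory; on a real host this would be
#      /etc/systemd/system/zabbix-server.service.d
write_restart_dropin() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/restart.conf" <<'EOF'
[Service]
Restart=on-failure
RestartSec=30s
EOF
}

# Usage on the server, as root (commented out so the sketch has no side effects):
# write_restart_dropin /etc/systemd/system/zabbix-server.service.d
# systemctl daemon-reload
# systemctl restart zabbix-server
```

This only hides the symptom, so the reason for the exits still needs to be found in the logs.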

The service keeps failing; I haven’t measured how often.
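To put a number on it, the journal can be counted for start messages. A rough sketch, assuming systemd logs the same "Started Zabbix Server." line shown in the status output earlier in this thread; `count_starts` and `start_times` are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helpers: both read journal text on stdin.

# Print how many times systemd reported the Zabbix server as started.
count_starts() {
    grep -c 'Started Zabbix Server\.'
}

# Print the timestamp (first three fields) of each start message.
start_times() {
    grep 'Started Zabbix Server\.' | awk '{print $1, $2, $3}'
}

# Usage on the server (commented out so the sketch has no side effects):
# journalctl -u zabbix-server --since yesterday --no-pager | count_starts
# journalctl -u zabbix-server --since yesterday --no-pager | start_times
```

A count well above one for a single day, with no manual restarts in between, would confirm that the service is being stopped and started repeatedly, and the timestamps show whether it happens at a regular interval.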