Troubleshooting high I/O wait affecting the CPU

pasing · February 6, 2018, 5:59pm

NethServer Version: NethServer release 7.4.1708 (Final)

Hi All,
I have noticed high I/O wait in my Collectd Graph Panel, as you can see in the next image.

top - 18:45:58 up 22 days,  6:35,  1 user,  load average: 3.00, 3.04, 3.05
Tasks: 229 total,   1 running, 228 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.3 sy,  0.0 ni, 49.8 id, 49.5 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  5937944 total,   569632 free,  2990360 used,  2377952 buff/cache
KiB Swap:  6160380 total,  5862224 free,   298156 used.  2416072 avail Mem

How can I identify the process that is generating the I/O wait?

Thanks in advance,
Pasquale

Stefano_Zamboni · February 6, 2018, 6:20pm

Please give us more info about your setup, starting with the hard disks, controller, and RAID setup.

You could try iostat, IIRC.
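
Since the question is which process is behind the I/O wait, one possible starting point is to list the tasks that are currently in uninterruptible sleep (state D), because Linux counts those toward the load average even when nothing is running. The sketch below is only an illustration: it reads /proc/<pid>/stat directly, and the function names are made up for this example.

```python
import os

def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state

def blocked_processes(proc_root="/proc"):
    """List (pid, comm) for processes currently in state 'D'."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError):
            continue  # process exited while we were scanning
        if state == "D":
            blocked.append((pid, comm))
    return blocked

if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

Run it a few times as root. A process that shows up in state D on every run is a reasonable suspect, although a single snapshot proves little on its own.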

pasing · February 6, 2018, 6:39pm

hi Stefano,
this is the output of iostat

[root@gateway ~]# iostat -xm
Linux 3.10.0-693.11.6.el7.x86_64 (gateway.domanin.xxx)    02/06/2018      _x86_64_        (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.94    0.01    0.51    6.51    0.00   92.02

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.01     0.26    0.34    7.25     0.01     0.05    17.06     0.34   45.14    5.31   47.03   3.04   2.31
dm-0              0.00     0.00    0.34    6.89     0.01     0.05    17.74     0.37   51.69    5.42   53.98   3.19   2.31
dm-1              0.00     0.00    0.01    0.08     0.00     0.00     8.04     0.02  245.60    7.81  283.88   0.38   0.00
sdf               0.00     0.00    0.00    0.00     0.00     0.00   354.93     0.00   18.81    2.93   25.03  18.38   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00   393.69     0.00   22.22    3.09   26.57  21.69   0.00

[root@gateway ~]# iostat -N
Linux 3.10.0-693.11.6.el7.x86_64 (gateway.domain.xxx)    02/06/2018      _x86_64_        (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.94    0.01    0.51    6.53    0.00   92.01

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               7.59         8.78        55.98   16918856  107912002
centos_gateway-root     7.23         8.56        55.58   16499735  107139729
centos_gateway-swap     0.09         0.05         0.30      96652     578224
sdf               0.00         0.01         0.27      22217     514976
centos_gateway_iscsi-data     0.00         0.00         0.13       5216     258558
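
Note that iostat without an interval argument reports averages since boot, which would explain why it shows about 6.5% iowait while top shows 49.5%. Running it with an interval (for example iostat -xm 5) gives current figures. The same measurement can be sketched by sampling /proc/stat twice. The function names and the 5 second interval below are arbitrary choices for this example.

```python
import time

def parse_cpu_line(line):
    """Return the aggregate 'cpu' counters from /proc/stat as a list of ints."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(v) for v in fields[1:]]

def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    # Field order: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already included in user/nice, so sum only the first eight.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0

def read_cpu_counters(path="/proc/stat"):
    """Read the first line of /proc/stat, which aggregates all CPUs."""
    with open(path) as fh:
        return parse_cpu_line(fh.readline())

if __name__ == "__main__":
    first = read_cpu_counters()
    time.sleep(5)  # sampling interval, chosen arbitrarily
    second = read_cpu_counters()
    print("iowait over interval: %.1f%%" % iowait_percent(first, second))
```

On a 2 CPU machine, a value close to 50% would be consistent with roughly one CPU's worth of time spent waiting on I/O.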

Stefano_Zamboni · February 6, 2018, 6:43pm

You missed my request: disks, RAID layout, RAM, which kinds of services, and so on.

pasing · February 6, 2018, 7:15pm

I’m sorry I missed it.
I have two disks, one of which is iSCSI. No RAID configuration. Active services are dns, dhcp, mail, spam, ftp, vpn, firewall, isp, webserver, database, antivirus, ldap and backup.

[root@gateway ~]# lsblk
NAME                          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                             8:0    0 465.8G  0 disk
├─sda1                          8:1    0   500M  0 part /boot
└─sda2                          8:2    0 465.3G  0 part
  ├─centos_gateway-root       253:0    0 459.4G  0 lvm  /
  └─centos_gateway-swap       253:1    0   5.9G  0 lvm  [SWAP]
sdf                             8:80   0   500G  0 disk
└─sdf1                          8:81   0   500G  0 part
  └─centos_gateway_iscsi-data 253:2    0   500G  0 lvm  /data
sr0                            11:0    1  1024M  0 rom
[root@gateway ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           5798        2967         474         158        2357        2304
Swap:          6015         291        5724

My only suspicion is the backup module.
In the last few days, I received some notification emails from the system indicating that another backup was in progress.

Can the process linked to the backup module generate high I/O wait on the second CPU?

giacomo · February 12, 2018, 10:11am

That’s good, the lock is working

I think you’ve found the smoking gun
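
To check whether the backup really is the source, one option is to compare per-process I/O counters over a short interval. This is a rough sketch that assumes the kernel exposes /proc/<pid>/io and that it is run as root. The helper names and the 10 second interval are invented for the example.

```python
import os
import time

def parse_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters

def snapshot(proc_root="/proc"):
    """Map pid -> (comm, read_bytes + write_bytes) for all readable processes."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "io")) as fh:
                io = parse_io(fh.read())
            with open(os.path.join(base, "comm")) as fh:
                comm = fh.read().strip()
        except (OSError, ValueError):
            continue  # process exited, or we lack permission
        total = io.get("read_bytes", 0) + io.get("write_bytes", 0)
        result[int(entry)] = (comm, total)
    return result

def top_io(before, after, limit=5):
    """Return the processes with the largest byte delta between two snapshots."""
    deltas = []
    for pid, (comm, total) in after.items():
        if pid in before:
            deltas.append((total - before[pid][1], pid, comm))
    return sorted(deltas, reverse=True)[:limit]

if __name__ == "__main__":
    first = snapshot()
    time.sleep(10)  # interval chosen arbitrarily
    for delta, pid, comm in top_io(first, snapshot()):
        print(delta, pid, comm)
```

Keep in mind that a process blocked on a stalled device may show little or no progress in these counters, so a low number does not clear it.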