Iperf3 made server crash

AveryFreeman · January 20, 2022, 6:25pm

Hi,

Just wanted to let you know that I made NethServer crash while running iperf3 tests.

It was a very cursory test, so I don’t have a lot of details. I first tried to run the iperf3 server end on NethServer but could not connect, I’m guessing because port 5201 was blocked by the firewall.

So I ran NethServer as the client and an Ubuntu 21.10 VM on the LAN as the server. I connected NethServer to Ubuntu for the test and got through about 8-9 iterations before NethServer crashed and was unrecoverable.

I got a bunch of errors on the console:

I grepped /var/log for lines related to the ixgbe adapter:

messages:Jan 20 10:11:51 nethserver kernel: ixgbe 0000:02:00.0: master disable timed out
messages:Jan 20 10:11:51 nethserver kernel: ixgbe 0000:02:00.0 enp2s0f0: detected SFP+: 3
messages:Jan 20 10:11:52 nethserver kernel: ixgbe 0000:02:00.0 enp2s0f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
messages:Jan 20 10:11:52 nethserver kernel: ixgbe 0000:02:00.0 enp2s0f0: Detected Tx Unit Hang #012  Tx Queue             <2>#012  TDH, TDT             <0>, <1>#012  next_to_use          <1>#012  next_to_clean        <0>#012tx_buffer_info[next_to_clean]#012  time_stamp           <100480884>#012  jiffies              <1004808f2>
messages:Jan 20 10:11:52 nethserver kernel: ixgbe 0000:02:00.0 enp2s0f0: Detected Tx Unit Hang #012  Tx Queue             <1>#012  TDH, TDT             <0>, <1>#012  next_to_use          <1>#012  next_to_clean        <0>#012tx_buffer_info[next_to_clean]#012  time_stamp           <10048087e>#012  jiffies              <1004808f2>
messages:Jan 20 10:11:52 nethserver kernel: ixgbe 0000:02:00.0 enp2s0f0: tx hang 183 detected on queue 1, resetting adapter
messages:Jan 20 10:11:52 nethserver kernel: ixgbe 0000:02:00.0 enp2s0f0: initiating reset due to tx timeout
messages:Jan 20 10:11:52 nethserver kernel: ixgbe 0000:02:00.0 enp2s0f0: Reset adapter
messages:Jan 20 10:11:52 nethserver kernel: ixgbe 0000:02:00.0 enp2s0f0: tx hang 184 detected on queue 2, resetting adapter
messages:Jan 20 10:11:52 nethserver kernel: ixgbe 0000:02:00.0 enp2s0f0: RXDCTL.ENABLE for one or more queues not cleared within the polling period
messages:Jan 20 10:11:52 nethserver kernel: ixgbe 0000:02:00.0 enp2s0f0: TXDCTL.ENABLE for one or more queues not cleared within the polling period
messages:Jan 20 10:11:52 nethserver kernel: ixgbe 0000:02:00.0: master disable timed out
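For anyone reading the log above: the `#012` sequences are how syslog writes an embedded newline (octal 012), so each "Detected Tx Unit Hang" entry is really a multi-line kernel message. Below is a minimal Python sketch that unescapes those entries and lists which Tx queues reported a hang. The function names and the shortened sample lines are only for illustration, and the regex assumes the message layout shown in this thread.

```python
import re

# Matches the start of an ixgbe "Tx Unit Hang" message as it appears in the
# log above, with the newline still escaped as #012, and captures the queue.
HANG_RE = re.compile(r"Detected Tx Unit Hang\s*#012\s*Tx Queue\s*<(\d+)>")


def decode_syslog_escapes(line):
    """Turn the escaped newline (#012) back into a real newline."""
    return line.replace("#012", "\n")


def hung_queues(lines):
    """Return the Tx queue numbers that reported a hang, in log order."""
    queues = []
    for line in lines:
        match = HANG_RE.search(line)
        if match:
            queues.append(int(match.group(1)))
    return queues


# Shortened copies of two lines from the log above (illustration only).
SAMPLE = [
    "ixgbe 0000:02:00.0 enp2s0f0: Detected Tx Unit Hang #012  Tx Queue             <2>#012  TDH, TDT             <0>, <1>",
    "ixgbe 0000:02:00.0 enp2s0f0: Detected Tx Unit Hang #012  Tx Queue             <1>#012  TDH, TDT             <0>, <1>",
]

if __name__ == "__main__":
    print(decode_syslog_escapes(SAMPLE[0]))
    print("queues with hangs:", hung_queues(SAMPLE))
```

Feeding it the real lines from /var/log/messages instead of `SAMPLE` would show whether the hangs are spread across queues or concentrated on one.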

I have two LAN adapters on the same subnet, so that may have contributed to the issue. One is 1Gbps and the other is 10Gbps. I wasn’t paying attention and connected to the 1Gbps IP address, but was getting 9.4Gbps (so the traffic must have been going through the 10GbE adapter).

Anyway, if anyone has any questions, let me know.

filippo_carletti · January 21, 2022, 10:02am

The most probable cause of the problem is the amount of network traffic.
Without knowing which Ethernet adapters are installed, it’s hard to diagnose further, but these kinds of problems are usually well known to the vendor of the system and are typically addressed through BIOS and/or driver updates.

Sometimes, as a workaround, I try to disable advanced features of the adapters, like scatter-gather or TSO/GSO.
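As a rough sketch of that workaround, the Python snippet below builds the `ethtool -K` command that turns off scatter-gather, TSO and GSO on one interface. The interface name `enp2s0f0` is taken from the log earlier in the thread. The helper names and the dry-run default are my own assumptions, not something NethServer provides, and whether this actually helps on this hardware is untested.

```python
import subprocess


def offload_commands(iface, features=("sg", "tso", "gso")):
    """Build the ethtool argument list that switches the given offloads off."""
    args = ["ethtool", "-K", iface]
    for feature in features:
        args += [feature, "off"]
    return args


def disable_offloads(iface, apply=False):
    """Return the command as a string; run it only when apply=True (needs root)."""
    cmd = offload_commands(iface)
    if apply:
        subprocess.run(cmd, check=True)
    return " ".join(cmd)


if __name__ == "__main__":
    # Dry run: only prints the command so it can be reviewed first.
    print(disable_offloads("enp2s0f0"))
```

Settings changed with `ethtool -K` do not survive a reboot, so this is easy to back out of. `ethtool -k enp2s0f0` shows the current state of each offload before and after.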