NethServer crash, possible XFS corruption?


(Dave J) #1

NethServer Version: 7.4 plus all updates (as of a couple of weeks ago).
Module: N/A

Found the NethServer server unresponsive earlier today; it had not been looked at for a couple of weeks. After a forced reboot, the server goes into emergency mode.
The error file points to a message “XFS (dm-0): Corruption of in-memory data detected” as the key issue.
Seems the root partition (located in a lvm2 volume group) is corrupt (as are all the partitions in the volume group).

Searching the web turned up various suggested fixes, but none have proved successful.
This seems to be more of a CentOS/XFS problem than a NethServer issue, but any suggestions?

A few things I’ve tried and the results are below. The key checks have been run from both the emergency mode and from a separate Live CD (running LinuxMint).

vgdisplay -v
… the volume group and logical volumes look correct.

vgscan -v
Wiping cache of LVM-capable devices
Wiping internal VG cache
Reading all physical volumes. This may take a while…
Finding all volume groups
Finding volume group "vg01"
Found volume group “vg01” using metadata type lvm2

(Run from emergency mode)
mount /dev/mapper/vg01-root /mnt

Corruption of in-memory data detected. Shutting down filesystem
Please umount the filesystem and rectify the problem(s).

mount: /dev/mapper/vg01-root: can’t read superblock

(Run from Live CD)
mount /dev/mapper/vg01-root /mnt
mount: Structure needs cleaning

xfs_repair /dev/mapper/vg01-root
Phase 1 - find and verify superblock…
bad primary superblock - bad or unsupported version !!!

attempting to find secondary superblock…
…Sorry, could not find valid secondary superblock
Exiting now.
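
For anyone retracing these checks, a read-only pass over the same volume might look like the sketch below. It assumes root privileges and the device path used above. The `run` helper and the `DRY_RUN` variable are made up for illustration: by default the script only prints each command, and it executes them only when `DRY_RUN=0` is set. The tool version is printed as well, in case the two environments ship different xfsprogs releases.

```shell
#!/bin/sh
# Read-only inspection of a suspect XFS volume (sketch).
# DEV defaults to the logical volume from this thread; run() and DRY_RUN
# are illustrative helpers, not part of any XFS or LVM tool.
DEV="${DEV:-/dev/mapper/vg01-root}"

run() {
    # Always print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

inspect_xfs() {
    run xfs_repair -V                         # which xfsprogs release is in use here?
    run blkid "$1"                            # what signature is actually on the device?
    run xfs_db -r -c "sb 0" -c "print" "$1"   # dump the primary superblock, read-only
    run xfs_repair -n "$1"                    # no-modify mode: report problems, change nothing
}

inspect_xfs "$DEV"
```

Running the same pass from both the emergency shell and the Live CD would make the two sets of results directly comparable.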

PS: Server running on hardware. No VM involved!


(Giacomo Sanchietti) #2

I’ve never encountered such a situation, but I’d try with a CentOS 7 CD/DVD.

The XFS version probably differs between CentOS and Mint.

Finally, you could try to force the repair. Take a look here:


(Dave J) #3

Thanks for the thoughts (and good articles on XFS too, particularly the fibrevillage one!)

  1. I tried to force it (using -L). No difference.
  2. I wondered about using Mint. But since CentOS emergency mode (presumably running out of /boot, so the same version) did no better, I had discounted the need to try a CentOS image. Probably worth a go at this stage, though!
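
A note on `-L` for readers following along: it tells `xfs_repair` to zero the log, which can discard the most recent metadata changes, so it is usually treated as a last resort. It only deals with the log, which may be why it made no difference when phase 1 already stops at the superblock. A cautious order of operations could be sketched as below. The `repair_plan` function and the image path are hypothetical, and the function only prints commands for review; it never touches the disk itself.

```shell
#!/bin/sh
# Print a cautious order of operations before forcing a repair (sketch).
# Step 1 images the device so that any later step can be undone,
# step 2 is a no-modify check, step 3 is the forced repair that zeroes the log.
# repair_plan and the image path are illustrative assumptions.
repair_plan() {
    dev="$1"
    img="$2"
    cat <<EOF
dd if=$dev of=$img bs=4M conv=noerror,sync
xfs_repair -n $dev
xfs_repair -L $dev
EOF
}

repair_plan /dev/mapper/vg01-root /mnt/backup/vg01-root.img
```

The image needs to live on a different disk with enough free space for the whole logical volume.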

(Michael Träumner) #4

You could also have a look here:

https://bugzilla.redhat.com/show_bug.cgi?id=1490946

Have you tried

xfs_repair

Edit: Yes you have, my fault.

Did you test your hardware? You should test your memory and hard disks. I use UBCD (Ultimate Boot CD) for it.
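
If booting UBCD is not convenient, the disk side of that check can also be done from any Linux shell with smartmontools. This is a different tool from the one mentioned above, shown only as a sketch: the disk name `/dev/sda` is an assumption, and the `run` helper with its `DRY_RUN` switch is made up for illustration so that the script prints the commands unless `DRY_RUN=0` is set. Memory still needs a separate tester booted from its own media.

```shell
#!/bin/sh
# Disk health check with smartmontools (sketch). DISK is an assumption;
# on an LVM setup, pvs shows which physical devices back each volume group.
DISK="${DISK:-/dev/sda}"

run() {
    # Always print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

disk_checks() {
    run pvs -o pv_name,vg_name      # map volume groups to physical devices
    run smartctl -H "$1"            # overall SMART health verdict
    run smartctl -t long "$1"       # start an extended self-test (runs in the background)
    run smartctl -l selftest "$1"   # self-test log; re-run this once the test has finished
}

disk_checks "$DISK"
```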


(Dave J) #5

Thanks Mike. Yes, I saw that Red Hat report, and the behaviour after reboot is the same. I guess the real question is why it corrupted in the first place.
Hardware checks showed up nothing.
Nothing has worked as far as getting the root FS to a bootable state. Towards the end I even tried some of the ‘out there’ suggestions (more as an experiment by that stage).
The only thing that is/was of note to me is that originally all the partitions on the LVM VG (vg01) reported as having errors. As I rebuilt the machine I had to reformat the vg01-root but I left the other xfs partition alone and just mounted it. Once the newly built machine was up and running, the other partition was readable with no errors. Food for thought.
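
For anyone in the same position, a surviving partition can be checked before it is trusted again. The sketch below relies on `xfs_repair -n`, which changes nothing and exits with status 0 when no corruption is found and 1 when corruption is detected. The `check_lv` function, the `XFS_REPAIR` override and the `vg01-data` name are illustrative assumptions, and the volume must be unmounted when the check runs.

```shell
#!/bin/sh
# Check an unmounted XFS logical volume without modifying it (sketch).
# XFS_REPAIR lets the command be swapped out for testing; it is an
# illustrative knob, not an xfs_repair feature.
XFS_REPAIR="${XFS_REPAIR:-xfs_repair}"

check_lv() {
    # -n = no-modify mode; the exit status carries the verdict.
    "$XFS_REPAIR" -n "$1" >/dev/null 2>&1
    case $? in
        0) echo "$1: clean" ;;
        1) echo "$1: corruption reported" ;;
        *) echo "$1: check did not complete" ;;
    esac
}

# Example (hypothetical LV name, must be unmounted):
# check_lv /dev/mapper/vg01-data
```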
Anyway, thanks for your help. The server is working again! Just a bit more config to do. I’ve only had a couple of serious disk corruptions in my life and both have involved LVM and XFS.


(Michael Träumner) #6

Thanks for your feedback and sorry I can’t help more.
Could you at least mark your answer as the solution, please?


(Dave J) #7

OK. Thanks. Didn’t really think I’d solved it, though, just accepted that it wasn’t solvable in this case. Should I still tick as solved?


(Michael Träumner) #8

No, in this case of course not.
Did it happen again?


(Dave J) #9

No.
Maybe it will happen in 6 months’ time! :slight_smile: