SERIOUS PROBLEM - VM starts but NS7 doesn't!

NethServer Version: latest

I have been using NS7 for years and never had a serious issue; nothing ever stopped NS from actually booting, even when the (host) server crashed, etc.

Now I gracefully rebooted the host (UNRAID - it runs KVM and NS is a VM) and the VM starts, but it fails when it apparently tries to mount the virtual disks inside the (existing) qcow2 image.

What can I do? I have no clue. :frowning:

EDIT: Forgot to point out that I rebooted the host because of a version change.
It possibly also changed KVM version.
But other VMs run fine.


This is from the Proxmox board, but it should get you started.

I had exactly the same issue when I moved my NS7 from bare metal to Proxmox.

Cheers.


Not sure how to actually get started from the prompt shown in the screenshot above. :smiley:
In any case, here are a few more data points:

  • I have a Win VM on the same host. It works fine (so KVM itself doesn’t seem to be messed up).
  • I checked the qcow2 image; it didn’t report any issues.
  • The VM configuration seems untouched, although I am not sure whether something is wrong with this entry:
  <os>
    <type arch='x86_64' machine='pc-q35-7.1'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/6fbb079d-0778-64b7-5c5b-1e3789ec2e6a_VARS-pure-efi.fd</nvram>
  </os>

I am not sure if the issue lies with the VM, but that is outside NS7 support scope (still, if you have any ideas…).

  • I have tried to “see” inside the qcow2 image from within the host, and it does find three partitions inside it: an EFI partition and two LVM partitions (root and cache). So I tried to mount the root partition and I DID manage to mount it (see the command sketch after this list) - yet those are the very partitions that NS7 itself fails to mount!
  • I did try both the NS7 and generic CentOS ISOs to start a recovery process, but they say they don’t detect an existing OS!
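
For reference, this is roughly how that host-side mount can be done with qemu-nbd; the image path and the VG/LV names below are only examples from my setup, adjust them to yours:

# attach the qcow2 image as a network block device on the host
modprobe nbd max_part=8
qemu-nbd --connect=/dev/nbd0 /mnt/user/domains/NethServer/vdisk1.qcow2

# scan for and activate the volume group that lives inside the image
vgscan
vgchange -ay

# list the logical volumes, then mount the root LV read-only
lvs
mkdir -p /mnt/ns7root
mount -o ro /dev/VolGroup/lv_root /mnt/ns7root

# when finished: unmount, deactivate the VG and disconnect the image
umount /mnt/ns7root
vgchange -an VolGroup
qemu-nbd --disconnect /dev/nbd0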

The (virtual) disk seems to be fine. I cannot imagine what outside the scope of the VM (and in the host) could have affected the boot process INSIDE the VM.

I will try to remake a new VM and re-use the existing image, but I don’t expect much.
(esp. since I am not sure how to follow what the link you provided suggests)

Hi Nick,

If you are using LVM, is it possible that the Disk Manager config file kept the old UUID or PVID from the original disk instead of those of the new one?

# cat /etc/lvm/devices/system.devices
compared to:
# vgs

Also:
# dmsetup info /dev/dm-0
compared to
# dmsetup info /dev/dm-1

Michel-André

Question is, can I check those from within the prompt it allows me to use? (see screenshot)

I used NS7’s own setup ISO, so it uses LVM because that is how the setup wizard set it up.
Why would the UUID or PVID change? Does it depend on the host?
(remember NS7 is inside a VM)

Something like that happened to me using VirtualBox on Ubuntu 22.04 when I cloned a VM and didn’t keep the original disk name and hardware UUID.

Same warning lines about
/dev/mapper/VolGroup-lv_root does not exist
and
/dev/mapper/VolGroup-lv_swap does not exist

Michel-André

Boot from a CentOS or NethServer usb/cdrom and select rescue mode:
https://docs.centos.org/en-US/centos/install-guide/Rescue_Mode/

I already did that, see the last bullet above. It says it cannot find my OS install.

I will try again.

EDIT: BTW, the CentOS forum actually discarded my post there on the basis that “NethServer is not CentOS”. WTF


I don’t have any of those commands in the emergency shell.
I cannot go into rescue mode as it doesn’t find an OS (!)


Hi Nick,

Just for a try, can you replace all the UUIDs in /etc/fstab with their /dev/sda1 and /dev/sda2 equivalents, since you didn’t add anything to the LVM?

That way it would eliminate LVM altogether ???

Michel-André

At the prompt, there is a CTRL key combination (can’t remember exactly which). This will restart the boot at the failed point, which will now complete because the relevant disks are mounted.
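
From memory, the sequence in that emergency shell goes roughly like this (a sketch only; the lvm wrapper and the VG name depend on your install):

# the standalone LVM tools are often missing in the dracut emergency shell,
# but the 'lvm' wrapper is usually available
lvm pvscan
lvm vgscan
lvm vgchange -ay

# check that the mapper devices now exist
ls -l /dev/mapper/

# then exit the shell so the boot can continue from the point where it failed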

Cheers.

The only way to see INSIDE my root partition is to mount it on my host system.
(I connect to the qcow2 image, ask it to scan for LVM partitions, it DOES find them, and then I CAN mount the root partition and see the contents at my mount point!… This gives me hope that I can actually save my mailboxes etc.)

Anything else - booting the CentOS ISO and selecting rescue, or the NS ISO and selecting rescue - cannot find my OS at all. Even using super_grub2_disk doesn’t take me far: it ends up at the same point as the screenshot above.

But mounting this on the host at least shows me the LV UUIDs of the partitions. Is this useful to me?

(I will open a separate thread, as I may need to go to a fresh NS7 install; since I CAN see the root partition and get at its contents, maybe I should just “drop” everything over that - but I would hope to fix the EXISTING install if possible.)
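
If it comes to that, I imagine the salvage from the host would look roughly like this (the mount point and the NS7 data paths below are my assumptions, to be checked against the actual tree):

# root LV of the broken VM assumed to be mounted read-only at /mnt/ns7root (see above)
mkdir -p /mnt/user/backup/ns7-salvage

# mailboxes - on a typical NS7 mail server the maildirs live under /var/lib/nethserver/vmail
rsync -aHAX /mnt/ns7root/var/lib/nethserver/vmail/ /mnt/user/backup/ns7-salvage/vmail/

# e-smith configuration databases, handy when re-creating settings on a fresh install
rsync -aHAX /mnt/ns7root/var/lib/nethserver/db/ /mnt/user/backup/ns7-salvage/db/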

IF you make a bit-by-bit copy right now, you may allow yourself some fiddling.

What do you mean?
I already have a copy of the whole disk image just to be safe.

I don’t want to fiddle, I want to have a NethServer7 system that works and has my full configuration intact.

Good, that’s what I meant to say. Good luck.

FIXED!!!

Well I found the issue.
Anything virtio is NOT working!
Definitely an issue with the host?

I changed both the vdisk to SATA and the network to e1000, and it works!
Now to see whose fault this is…
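
For anyone hitting the same thing, the change boils down to something like this in the VM’s libvirt XML (the disk path and bridge name are just my values; UNRAID’s VM editor exposes the same settings):

  <!-- disk: was bus='virtio' with dev='vda', now on the SATA bus -->
  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2'/>
    <source file='/mnt/user/domains/NethServer/vdisk1.qcow2'/>
    <target dev='sda' bus='sata'/>
  </disk>

  <!-- network: was <model type='virtio'/>, now emulated e1000 -->
  <interface type='bridge'>
    <source bridge='br0'/>
    <model type='e1000'/>
  </interface>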



This shows that CentOS7-based systems like NS7 and newer KVM hosts MAY have issues.
So… waiting for NS8.