Problem with bootloader and RAID on NethServer 7

Hello everybody,

My name is Mario and I have been following this forum since I switched from SME Server to NethServer 6.

Now I wanted to install the latest NethServer 7.6 on new hardware, but I am having problems with the bootloader when using RAID 1 on two hard disks.

NethServer 6 (and formerly SME Server), both based on CentOS 6, automatically set up RAID 1 on two disks, and after installation the bootloader was on both disks,
so that if one disk failed you could boot from either of them.

This worked both for small disks with MBR and for larger disks with GPT.

With the new NethServer 7, based on CentOS 7, it seems that the bootloader is stored only on the first of the two disks, so that in case of damage of the second disk the system won’t boot.

The Anaconda installer doesn’t let you put the bootloader on both disks, and running “grub2-install /dev/sdb” after installation didn’t work for me: the system still boots only from /dev/sda.

With two GPT disks Anaconda doesn’t even install; it fails with an unknown error every time.

What am I missing? I’m thankful for any help.

If you proceed with the NethServer installation without modifying anything in the disk section, your server should end up configured with two different RAID 1 arrays.

Please explain it in more detail…

After hours of trying I found some information but no solution.

On MBR disks NethServer creates two different RAID arrays: one RAID 1 for /boot, and another RAID 1 with LVM on top for / and swap.

With “fdisk -l” on the console you can find sda2/sdb1 for the boot array and sda3/sdb2 for the LVM array.

But only sda1 is marked as *BOOT; nothing on sdb is.

I tried “grub2-install /dev/sdb” but nothing changed: still no *BOOT on sdb.
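
A note on the *BOOT marker: as far as I know, the asterisk in “fdisk -l” is the partition table’s active flag, which grub2-install does not set, so its absence on sdb does not by itself prove that the bootloader is missing. A rough way to check is to look for the “GRUB” string that GRUB 2’s boot code carries in the first sector of the disk. This is only a minimal sketch: the helper name has_grub_bootcode is made up for illustration, and /dev/sda and /dev/sdb are simply the disk names used in this thread.

```shell
# Hypothetical helper: succeeds if the first 512 bytes of a disk
# (or disk image) contain the "GRUB" marker string of GRUB's boot code.
has_grub_bootcode() {
    head -c 512 "$1" 2>/dev/null | grep -aq 'GRUB'
}

# Example use as root (device names are assumptions from this thread):
# for disk in /dev/sda /dev/sdb; do
#     if has_grub_bootcode "$disk"; then
#         echo "$disk: GRUB boot code found"
#     else
#         echo "$disk: no GRUB boot code in first sector"
#     fi
# done
```

A positive result only says that boot code is present in the first sector, not that the disk can actually boot on its own.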

The RAID arrays are shown as OK in the Dashboard, and they seem to work when writing data.
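
The array state can also be read on the console from /proc/mdstat, where each software RAID array has a member map such as [UU] (all disks present) or [U_] (one disk missing). The sketch below lists arrays with a missing member. The function name degraded_arrays is invented for this example, and the parsing assumes the usual mdstat layout.

```shell
# Hypothetical helper: reads /proc/mdstat-style text on stdin and prints
# the name of every md array whose member map (e.g. [U_]) shows a
# missing disk.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name }
    '
}

# Example use on the server:
# degraded_arrays < /proc/mdstat
```

No output means that every array found in the input has all of its members.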

But what’s the point of this RAID 1 when booting is possible from only one of the disks?

Simulating a failure by unplugging the cable of one of the two disks results in booting to emergency mode, from which it is not possible to go any further.

In my opinion, when one of the disks fails it must be possible to boot from the other one alone.

Otherwise I can’t see any advantage over using just a single disk.

As mentioned in my first post, the situation was different with NethServer 6 (based on CentOS 6).

When removing a disk in NethServer 6, the RAID was shown as critical in the Dashboard, but the system still booted.

The same happens with GPT disks, even with BIOS boot partitions created on both disks.

Failing (or removing) one of the disks results in booting to emergency mode.

An interesting fact is that NethServer 6 (CentOS 6) didn’t use a separate BIOS boot partition.

Maybe it used the old GRUB or LILO instead, but I’m not enough of a Linux expert to explain further.

But the fact is that NethServer 6 worked differently from the new NethServer 7, and without any problems.

Or maybe I just didn’t understand how to go on when the RAID fails and the system boots into emergency mode.

Until this is clear to me, I’m afraid of putting my data on NethServer 7 without knowing what to do if the RAID fails.

Maybe someone could explain to me what I am doing wrong, or what I misunderstand about the purpose of RAID 1 with the bootloader on only one disk.

I can’t believe that this is a general bug in NethServer 7 or CentOS 7, because in that case many people before me must have faced this problem too.

Greetings, and hopefully waiting for help.

You would have to install the MBR on the second drive in emergency mode with grub2-install /dev/sdb. I was used to hardware RAID and wondered about that too.

Here is a tutorial.

I read that you could manually install an MBR on the other hard disk right after OS installation to achieve what you want, a second drive that boots immediately without interaction:

Let’s try to put things straight. :slight_smile:

Installing NethServer on a system with two hard disks, using the standard or unattended install option, will create two RAID 1 sets, one for /boot and one for an LVM LV for / and swap.
The GRUB boot loader is installed on both disks.

To sum up: NethServer does all that’s needed, no additional commands or steps are required for the user.

But today, if you remove one disk, the system will not boot, due to a bug introduced in RHEL 7.6.
The bug is being worked on; it’s a high-priority bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1451660

The temporary workaround is to add rd.retry=15 to the kernel command line at boot.

If you have replaced a failed disk, you need to install GRUB on it, as @mrmarkuz said.
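
As a rough illustration of that last step, the sketch below runs the GRUB installer once per disk. It assumes the new disk has already been partitioned and re-added to both RAID arrays. The function name install_grub_on and the GRUB_INSTALL override variable are made up for this example; grub2-install is the command already used in this thread, and the device names are assumptions.

```shell
# Hypothetical helper: runs the GRUB installer once per disk given as
# argument. GRUB_INSTALL can be overridden (for example with "echo")
# to preview which disks would be touched.
install_grub_on() {
    installer=${GRUB_INSTALL:-grub2-install}
    for disk in "$@"; do
        "$installer" "$disk" || return 1
    done
}

# Example use as root (device names are assumptions from this thread):
# install_grub_on /dev/sda /dev/sdb
```

Running it with GRUB_INSTALL=echo first is a cheap way to double-check the disk names before anything is written.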

Thanks for your replies.

@mrmarkuz I had already tried “grub2-install /dev/sdb” several times, as described in many forums.

It never gave any error, so I was convinced that it had worked and that the bootloader was present on both disks.

But the system still didn’t boot with one drive removed, so the problem had to be elsewhere.

@filippo_carletti so it’s not me being wrong but a real bug in CentOS 7 that’s driving me crazy.

Could you please explain in a little more detail how to add “rd.retry=15” at boot?

I will try it later this evening when I’m home from work.

If the workaround succeeds it will be sufficient for me for now, and hopefully this gets resolved in the next version, 7.7.

Thanks a lot again.

I tested the workaround this evening.

When the GRUB menu is displayed, press “e” and then add “rd.retry=15” at the end of the “linux16…” line.

Continue with “Ctrl+X”, and NethServer finally boots and works.
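
The edit in the GRUB menu only lasts for that one boot. To keep the option across reboots, one approach that I believe works on CentOS 7 is to append it to the GRUB_CMDLINE_LINUX line in /etc/default/grub and then regenerate the configuration with grub2-mkconfig -o /boot/grub2/grub.cfg. The helper add_kernel_arg below is a made-up name, and the sketch assumes GNU sed and a double-quoted GRUB_CMDLINE_LINUX line.

```shell
# Hypothetical helper: appends a kernel argument to the
# GRUB_CMDLINE_LINUX="..." line of a grub defaults file.
# Does nothing if that line already contains the argument text.
add_kernel_arg() {
    file=$1
    arg=$2
    if grep -q "^GRUB_CMDLINE_LINUX=.*${arg}" "$file"; then
        return 0
    fi
    # Insert the argument just before the closing double quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX=\".*\)\"\$/\1 ${arg}\"/" "$file"
}

# Example use as root (back up the file first):
# add_kernel_arg /etc/default/grub rd.retry=15
# grub2-mkconfig -o /boot/grub2/grub.cfg
```

Once the upstream bug is fixed, the argument can be removed again in the same file.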

For me this workaround is OK for the rare case of a disk failing.

Many thanks for the help, Filippo!

I had the same problem yesterday. The failing drive was apparently the only one that had the bootloader installed.

I tried booting the NethServer install disk and selected rescue mode, mounted the good drive and then ran grub2-install, but to no avail: the /usr/lib/grub folder on the live image was empty!?

I finally stumbled upon supergrub :heart_eyes: which allowed me to boot from the good drive. From there I was able to install GRUB and then run grub2-mkconfig -o /boot/grub2/grub.cfg

Phew!