RAID-1 for full system - GPT - UEFI

NethServer Version: 7.9.2009-x86_64
Module: I don’t know. Installation?

So I have two new hard drives for my new server.

What I want is to have a system so that if one drive fails, the system still boots (RAID-1).

However, the drives are over 2TB so they need GPT. As I understand it, GPT means UEFI plus a 1MB BIOS boot partition, and if I make that partition a RAID device in the CentOS manual installation GUI, it fails the disk check and won’t let me proceed.

If I just put it on as recommended that’s fine, I’m sure it will work… until the drive fails.

I think that by cleverly using mdadm on the command line, either before or after installation, I can probably get this to work (rough sketch of what I mean at the bottom of this post), but before I charge ahead like this, can anyone tell me:

Is there any “proper” way to install nethserver using the GUI onto two identical drives which will give me bootable redundancy against drive failure?

Thanks all!
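PS: For clarity, this is roughly what I mean by “using mdadm”, sketched from memory and completely untested; the partition numbers are just examples, and the system would still have to be installed onto (or copied onto) the array afterwards:

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
mkfs.xfs /dev/md0

(or hand /dev/md0 to LVM or to the installer as a single device)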

Are you wanting to use a software RAID-1?

Or do you have a hardware RAID Controller which can do the RAID for you?

I want Linux to handle it. If I had hardware RAID I could build a volume from the physical disks in the controller’s setup utility and it wouldn’t be an issue…

OK, so this is a Dell PC with RAID 1 and 0 available in the BIOS. Setting that up in the BIOS only changed the problem though; configured this way, the CentOS installer throws an exception as the GUI starts and nothing can be done apart from a reboot.

The bottom of the exception chain includes:

File "/usr/lib/python2.7/site-packages/blivet/devices/md.py", line 108, in __init__
    sysfsPath=sysfsPath)

File "/usr/lib/python2.7/site-packages/blivet/devices/container.py", line 61, in __init__
    super(ContainerDevice, self).__init__(*args, **kwargs)

File "/usr/lib/python2.7/site-packages/blivet/devices/storage.py", line 131, in __init__
    Device.__init__(self, name, parents=parents)

File "/usr/lib/python2.7/site-packages/blivet/devices/device.py", line 83, in __init__
    if not self.isNameValid(name):

File "/usr/lib/python2.7/site-packages/blivet/devices/storage.py", line 839, in isNameValid
    if name.startswith("cciss/"):

AttributeError: 'NoneType' object has no attribute 'startswith'

So armed with this we find old bugs,
https://bugzilla.redhat.com/show_bug.cgi?id=1145783 and
https://bugzilla.redhat.com/show_bug.cgi?id=1441891
and off we go:

(ctrl-alt-F2 for a terminal LOL)

lsblk to confirm devices, then

dd if=/dev/zero of=/dev/sda, count to three, CTRL-C
dd if=/dev/zero of=/dev/sdb, count to three, CTRL-C
reboot, run the installer…

Same problem.

mdadm --zero-superblock /dev/sda
mdadm --zero-superblock /dev/sdb

reboot, run the installer…

No error, but predictably there’s no RAID either and we are back where we started! So back into the BIOS, make the RAID-1 mirror again and… it’s broken again.

So I think that the special 1MB BIOS boot partition that can’t be mirrored is only needed if the BIOS isn’t set up for UEFI properly; but if I enable UEFI in the BIOS, I can’t get the NethServer USB drive to boot.

Please, does anyone have any advice? I feel as though I’ve missed something simple here…

I’ll go back and re-read those old bug reports and see if there are any clues there.

From:

https://docs.centos.org/en-US/centos/install-guide/StorageSpoke-x86/

" Neither the biosboot nor efi partition can reside on an LVM volume. Use standard physical partitions for them."

Booting in pure UEFI mode changes what I need, but doesn’t fix the problem. I’ll instead look for a way to get anaconda to be happy with the Intel firmware RAID.

Couldn’t find one. I will give up on reliable RAID for this system, but I’m pretty surprised at how this has worked out. Please add a reply on here if you think that this problem has a sane solution!

@freakwent

Hello

There IS a simple solution: Either use a real HW-RAID, or let Linux do Software RAID.

Intel’s onboard RAID is almost always fake-raid; it’s your CPU doing the heavy lifting!
The OS is not presented with a “RAID”, but with two “normal” disks. It seems only Windows (Intel’s Windows driver does it!) is presented with a “RAID”; other OSes are not!

HP Proliants, especially the “entry” models, often come with RAID boards that use fake-raid. Because it is a separate board, a lot of people think of it as hardware RAID, which it is not! You set it up in the BIOS, but the OS does NOT see any RAID, same as you had…

My 2 cents
Andy

Thanks @Andy_Wismer for taking the time to reply – I agree on all fronts, but I had hoped that Linux might handle the fakeraid (ataraid).
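(For the record, I believe the quick way to see whether Linux/mdadm recognizes the Intel fake-raid (IMSM) metadata at all would be something like this; the device name is just an example:)

mdadm --detail-platform    # does mdadm see the Intel (IMSM) firmware RAID at all?
mdadm --examine /dev/sda   # is there any RAID metadata on the disk itself?
cat /proc/mdstat           # have any md arrays actually been assembled?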

In any case, I’m still left with a “suboptimal” configuration; everything works, but if sda goes away, the system can’t boot (I assume!) because sda1 is a 1M ‘biosboot’ partition, and this 1M partition doesn’t exist on sdb.

So it’s not ideal, and I reckon I could probably fix it using mdadm, but this is an “upgrade” from an older distro & hardware with smaller drives, and I hate making “improvements” that result in feature loss…
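The kind of manual fix I have in mind, untested and assuming a BIOS/GPT install with sda and sdb the same size, would be to clone the partition layout onto the second disk and put GRUB on it too, roughly:

sgdisk --replicate=/dev/sdb /dev/sda   # copy sda's GPT layout onto sdb (wipes sdb's table!)
sgdisk -G /dev/sdb                     # give sdb its own disk/partition GUIDs
grub2-install /dev/sdb                 # write GRUB into sdb's BIOS boot partition

That only covers the bootloader half; the data partitions would still need to be turned into md mirrors before sdb could really take over.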

Anyway, thanks again, I really appreciate the reply.

PS: I think I used too many “quotation marks”. Cheers!

@freakwent

Interesting fact:

On HP Proliant Microserver Gen8, Proxmox (Debian) will NOT see any RAID, just individual Disks.
This system is using an Intel CPU.

On HP Proliant Microserver Gen10, Proxmox (Debian) will see a RAID.
This system is using an AMD CPU.

I haven’t yet needed to deal with the newer model, the HP Proliant Microserver Gen10 Plus; that one is Intel-based again…

Based on this, I’d say AMD does a better job of hiding the individual disks and presenting a RAID volume to the OS. Intel’s RAID does NOT do this, except when their Windows driver is used!

However, I’d also like to note that I haven’t had any disk outages so far, so I can’t say whether either system actually reboots off the right disk, or whether it boots at all!

My 2 cents
Andy


If it is not a dedicated RAID controller (with dedicated memory) I tend to use Linux RAID; I’ve had some really bad experiences with onboard RAID controllers, and really good ones with Linux! :slight_smile:

Nice to know about the Proliant Microserver Gen10. I will be working with one next month, I think!

RedHat’s irrational aversion to ZFS is a real shame; Proxmox can do exactly what you want, with ZFS goodness on top, but of course it’s Debian-based, not EL-based. It is possible to hack CentOS/Neth to boot from a ZFS mirror, but it’s not really something I’d recommend for production use:
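(Purely as an illustration of the mirror part, not the boot-from-ZFS hack, which is much more involved: creating a two-way ZFS mirror for data looks roughly like this; the pool and device names are just examples.)

zpool create -o ashift=12 tank mirror /dev/sdb /dev/sdc   # two-way mirror, 4K-sector friendly
zpool status tank                                         # both disks should show as ONLINE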

Apologies for the late response, been a little busy.

Interesting to read this thread about the Proliant servers and how Linux sees the hardware RAID.

I feel that kind of justifies my view of having a separate external unit/device to manage the disks and the RAID, which is then presented to the server via a SCSI or Fibre cable.

@bwdjames

Hi James

An HP Proliant >= ML350 comes with a “proper” RAID controller, that is, one with its own “intelligence” and a battery to keep the on-board RAM (cache) powered. These work as expected, and very well too!

What a LOT of people didn’t know, even specialists:

Compaq, who started the Proliant series (and HP bought them for that; it’s the only Compaq “brand name” still in use!), actually went a step further than the others at the time: Dell, IBM and others like Supermicro NEVER had this feature…

They actually specified that whoever made their RAID controllers had to guarantee that the controller could accept disks (and the RAID config!) from an older controller / server.
This meant you could remove 4 SCSI disks from an HP server, move them into a “newer” server, and it would boot, recognize the RAID (asking if that’s OK…) and boot the OS as before… :slight_smile:

I used that with Novell Netware / Windows several times.

Once on the new server, each disk could be replaced and rebuilt one by one, until the server was running on newer disks…

I have no information on whether this is still valid with SATA/SAS-connected RAID drives at HP…

My 2 cents
Andy

Oooohhh… I never knew that there was that feature! They kept real quiet about it.

Not sure why it never became a widely adopted feature; sysadmins would have snapped it up like wildfire.

@bwdjames

It was a really cool feature, available right up to the last SCSI RAID controllers…

I’ve used it myself on servers with 6 disks in RAID (a single RAID5, or RAID1 & RAID5 for system / data disks…)

And the controller, without me actually touching the config, would recognize what RAID setup was used and replicate the config (most likely using the disk signatures written by the RAID). Really cool!

As you could not put a SCSI disk in a SAS bay, I’ve never used it since, and as said, I’m not aware whether they enforced it later on. SCSI was something special… HP also bought Compaq shortly before the move to SAS, so it could have died a silent death then…

As the replacement server was usually a few years newer, it was really cool to just move the disks over and be done with the hardware migration as far as the “server” was concerned. Netware, unlike Windows, had really long-lived versions… The disks still had to be replaced with newer ones, one by one, until remirroring was done, but the RAID controllers handled that really well!

My 2 cents
Andy
