RAID1 broken following Power Outage

Guys,
I’m running:
System version: NethServer release 7.7.1908 (final)
Kernel release: 3.10.0-1062.1.2.el7.x86_64
Everything was fine till this morning, when we had a power outage. Now, I’ve got the server back, but RAID is not well:

From the Dashboard:

md1 CRITICAL
Level RAID1
Devices 1/2 (sda1)

md2 CRITICAL
Level RAID1
Devices 1/2 (sda2)

This is a vanilla NethServer build; I changed nothing from the way the disks were set up by the install process.

Now I can see there is something wrong, but as you’ll see from what I have tried, I know NOTHING about rebuilding RAID:

cat /proc/mdstat outputs the following:

Personalities : [raid1]
md1 : active raid1 sda1[0]
      1046528 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md2 : active raid1 sda2[0]
      974531584 blocks super 1.2 [2/1] [U_]
      bitmap: 7/8 pages [28KB], 65536KB chunk

unused devices: <none>

so it’s U_ instead of UU, which means I’m running on half a RAID array, but I don’t know which disk has been removed, and I don’t know how to re-add it. I tried this:

mdadm /dev/md1 -a /dev/sda2

but got this response:
mdadm: Cannot open /dev/sda2: Device or resource busy

which makes me think I’m playing with the wrong disk, so I backed off from that!
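In hindsight the “busy” error is itself the clue: /dev/sda2 is the *surviving* member of md2, already held open by the array, so it can’t be added back to it. The partitions named on the md lines of /proc/mdstat are the ones still active; the re-add candidates live on the other disk. A rough sketch of reading that (sample mdstat text below stands in for the real file, which of course needs the live system):

```shell
# Sample /proc/mdstat text. An underscore in the [..] status means a
# member is missing; the partitions on the "md" lines (sda1, sda2 here)
# are still ACTIVE -- which is why re-adding them reports "busy".
mdstat='md1 : active raid1 sda1[0]
      1046528 blocks super 1.2 [2/1] [U_]
md2 : active raid1 sda2[0]
      974531584 blocks super 1.2 [2/1] [U_]'

degraded=$(printf '%s\n' "$mdstat" | awk '
  /^md/           { array = $1 }                 # remember current md name
  /\[U_\]|\[_U\]/ { print array ": degraded" }')
echo "$degraded"
```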

I also got this output:
[root@bastion dev]# mdadm -D /dev/md1
/dev/md1:
           Version : 1.2
     Creation Time : Thu Feb 28 21:53:46 2019
        Raid Level : raid1
        Array Size : 1046528 (1022.00 MiB 1071.64 MB)
     Used Dev Size : 1046528 (1022.00 MiB 1071.64 MB)
      Raid Devices : 2
     Total Devices : 1
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sat Oct 19 14:37:38 2019
             State : clean, degraded
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : localhost:1
              UUID : 0438973a:a588a669:aa17a4a9:1347d2c1
            Events : 393

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       -       0        0        1      removed

[root@bastion dev]# mdadm -D /dev/md2
/dev/md2:
           Version : 1.2
     Creation Time : Thu Feb 28 21:53:23 2019
        Raid Level : raid1
        Array Size : 974531584 (929.39 GiB 997.92 GB)
     Used Dev Size : 974531584 (929.39 GiB 997.92 GB)
      Raid Devices : 2
     Total Devices : 1
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sat Oct 19 20:19:49 2019
             State : clean, degraded
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : localhost:2
              UUID : ebd5c302:96e2a3b9:21f695e0:a1f08008
            Events : 362063

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       -       0        0        1      removed

It’s only a “domestic” server, so it’s not the end of the world, and it seems to be running OK on one disk for the time being, but I’d like to learn a bit about this. I’ve run RAID arrays before on “home grown” servers, but I’ve never tinkered with the internals of NethServer. Some other info I dug out:

[root@bastion ~]# lsblk
NAME                   MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                      8:0    0 931.5G  0 disk
├─sda1                   8:1    0     1G  0 part
│ └─md1                  9:1    0  1022M  0 raid1 /boot
└─sda2                   8:2    0 929.5G  0 part
  └─md2                  9:2    0 929.4G  0 raid1
    ├─VolGroup-lv_root 253:0    0 921.5G  0 lvm   /
    └─VolGroup-lv_swap 253:1    0   7.9G  0 lvm   [SWAP]
sdb                      8:16   0 931.5G  0 disk
├─sdb1                   8:17   0     1G  0 part
└─sdb2                   8:18   0 930.5G  0 part
sdc                      8:32   0   1.8T  0 disk
└─sdc1                   8:33   0   1.8T  0 part
sdd                      8:48   1   7.5G  0 disk
sr0                     11:0    1  1024M  0 rom

[root@bastion dev]# ls sd*
sda sda1 sda2 sdb sdb1 sdb2 sdc sdc1 sdd
[root@bastion dev]# ls md*
md1 md2

md:
1 2
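The lsblk output above already answers the question: sda1 and sda2 have md1/md2 (and the LVM volumes) stacked on top of them, while sdb1 and sdb2 hang bare off sdb with nothing underneath — so sdb is the disk that dropped out of both arrays. A small sketch of that comparison, with the member lists hard-coded from the output above:

```shell
# Partitions built as RAID members at install time, versus the members
# still active according to /proc/mdstat (both taken from the output
# above). Whatever is in the first list but not the second is the
# dropped half of a mirror.
raid_parts='sda1 sda2 sdb1 sdb2'
active='sda1 sda2'

check=$(for p in $raid_parts; do
    case " $active " in
        *" $p "*) echo "$p: still in its array" ;;
        *)        echo "$p: dropped, candidate to re-add" ;;
    esac
done)
echo "$check"
```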

I’d be grateful for some guidance…one last thing: the server is fully backed up daily to an external 1T USB drive, so if the worst comes to the worst… :blush:

Thanks

Jim

I think you need to re-add the RAID members correctly:

mdadm /dev/md1 -a /dev/sda1
mdadm /dev/md2 -a /dev/sda2

and check progress with

cat /proc/mdstat


Hi Mrmarkuz
I tried what you suggested, but it didn’t go as expected:

[root@bastion ~]# mdadm /dev/md2 -a /dev/sda2
mdadm: Cannot open /dev/sda2: Device or resource busy
[root@bastion ~]# mdadm /dev/md1 -a /dev/sda1
mdadm: Cannot open /dev/sda1: Device or resource busy
[root@bastion ~]#

I had a look at the link you provided… I’d found that earlier, but I hadn’t followed it, because it told me to do this:

sgdisk -R /dev/sdb /dev/sda
sgdisk -G /dev/sdb

and since this messes with partition tables and the like, I wanted to be sure I understood which disk I should be issuing the commands on. I’m unsure which disk is the sick one… sorry, I need some guidance here.
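For future readers, a note on those sgdisk commands (a hedged sketch, not something to run blindly): they are only needed when the replacement disk is *blank* and has no partition table yet; a simple re-add of an existing member, as in this thread, doesn’t require them. The argument order matters:

```
# sgdisk -R copies the partition table FROM the positional device TO the
# device named after -R -- i.e. the DESTINATION comes first. -G then
# randomizes the copy's GUIDs so the two disks don't share identifiers.
# This overwrites the destination's partition table, so get the order right.
sgdisk -R /dev/sdb /dev/sda   # replicate sda's GPT onto sdb (destroys sdb's table)
sgdisk -G /dev/sdb            # give sdb fresh, unique GUIDs
```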

Thanks for your response, nonetheless

Jim

That’s the most important thing.


Hi Mrmarkuz,
Well, I slept on it, and woke up with the solution: it’s all because I’m stupid! I was trying to rebuild with the wrong drives. Once I went with the right drives, it all seemed to be fine:

[root@bastion ~]# mdadm /dev/md1 -a /dev/sdb1
mdadm: re-added /dev/sdb1
[root@bastion ~]# mdadm /dev/md2 -a /dev/sdb2
mdadm: re-added /dev/sdb2

one volume is back already, the other is rebuilding:

[root@bastion ~]# mdadm --detail /dev/md2
/dev/md2:
           Version : 1.2
     Creation Time : Thu Feb 28 21:53:23 2019
        Raid Level : raid1
        Array Size : 974531584 (929.39 GiB 997.92 GB)
     Used Dev Size : 974531584 (929.39 GiB 997.92 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sun Oct 20 07:07:49 2019
             State : clean, degraded, recovering
    Active Devices : 1
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 1

Consistency Policy : bitmap

    Rebuild Status : 49% complete

              Name : localhost:2
              UUID : ebd5c302:96e2a3b9:21f695e0:a1f08008
            Events : 377989

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      spare rebuilding   /dev/sdb2
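If anyone wants to watch a rebuild like this from a script, the percentage can be scraped out of the detail output. A small sketch, with a sample line standing in for live `mdadm --detail /dev/md2` output:

```shell
# Sample "Rebuild Status" line as printed by mdadm --detail during a
# resync (hard-coded here; on a live system you would pipe the real
# mdadm --detail output through the same sed).
detail='    Rebuild Status : 49% complete'
pct=$(printf '%s\n' "$detail" |
      sed -n 's/.*Rebuild Status : \([0-9][0-9]*\)% complete.*/\1/p')
echo "rebuild at ${pct}%"
```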

So thanks for your help… having to describe the problem properly allowed me to fix it.

Thanks

Jim
