21 Aug 2014, 20:14

Synology NAS (DS1813+) degraded array for md0 and md1 after rebuild

I recently purchased a Synology DS1813+ [1] to replace my troubled Drobo-FS. The migration process was long and arduous, consisting of a handful of rebuilds on both the DS side and the Drobo side as I shuffled data and moved disks.

During my final rebuild on the DS side (which is an SHR-2 array), I experienced a drive failure in Bay 3 (a Seagate 3TB Barracuda), which resulted in a hard lock of the device requiring a reboot. When the DS came back up, the drive was visible to DiskStation Manager (DSM), but it wasn’t part of the array, and no amount of mdadm fiddling would re-add it, so through DSM I requested a rebuild of the array onto that disk.
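
For the record, the fiddling was roughly along these lines. This is just a sketch; the partition name sdc5 is illustrative (on a Synology each disk carries partitions belonging to several md arrays, so each one has to be re-added separately):

~ # mdadm --examine /dev/sdc5          # inspect the partition's RAID superblock
~ # mdadm /dev/md2 --re-add /dev/sdc5  # try to put it back in its old slot
~ # mdadm /dev/md2 --add /dev/sdc5     # failing that, add it as a fresh member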

Unfortunately, part way through the rebuild, that disk failed again and dropped out of the array and out of the OS as well; it wasn’t found anywhere. Moments after that, the disk in Bay 2 (another Seagate Barracuda, 2TB) dropped from the OS.
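
When a disk vanishes like that, the kernel log and the array detail are the quickest confirmation that it really is gone (device and array names here are illustrative again):

~ # dmesg | tail                 # look for the ata/scsi errors as the disk drops
~ # mdadm --detail /dev/md2      # the missing member is listed as removed
~ # ls /dev/sd?                  # in my case the device node disappeared entirely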

At this point I initiated a rebuild with a 4TB Western Digital Red drive that I had configured as a hot standby spare.

26 hours later, the rebuild finished, though my array was still degraded given the two missing drives at that point. I rebooted the DS, it picked up Bay 2 again, and everything was happy. Almost.

DSM reported that the DS was in good condition, but cat /proc/mdstat had something else to say:

~ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md4 : active raid6 sdh7[4] sdd7[0] sdg7[3] sdf7[2]
      1953485568 blocks super 1.2 level 6, 64k chunk, algorithm 2 [4/4] [UUUU]

md3 : active raid6 sdb6[7] sdh6[6] sdc6[1] sdg6[5] sdf6[4] sdd6[2]
      3906971136 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md2 : active raid6 sdb5[8] sdh5[7] sda5[0] sdg5[6] sdf5[5] sdd5[3] sdc5[2]
      4860138240 blocks super 1.2 level 6, 64k chunk, algorithm 2 [7/7] [UUUUUUU]

md1 : active raid1 sdh2[4] sdg2[6] sdf2[5] sdd2[3] sdc2[2] sdb2[1] sda2[0]
      2097088 blocks [8/7] [UUUUUUU_]

md0 : active raid1 sdb1[2] sdh1[1] sda1[0] sdc1[4] sdd1[3] sdf1[5] sdg1[6]
      2490176 blocks [8/7] [UUUUUUU_]

unused devices: <none>
~ #

Yes, it would seem as though my rebuild missed md0 and md1. I found that very curious, because they were part of the rebuild process when I was nervously querying cat /proc/mdstat.

After a day and a half of nervously inspecting partitions, configurations, and mdadm’s output, I discovered that md0 and md1 aren’t really my arrays, in the sense that they don’t hold any of my data. When I queried pvdisplay, they weren’t listed among LVM’s physical volumes, and when I mounted them, they appeared to contain replicas of the OS (which, I suppose, makes sense).
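
The checks that convinced me were simple enough (the mount point below is made up for the example):

~ # pvdisplay | grep "PV Name"      # only the data arrays show up as LVM physical volumes
~ # mkdir /tmp/md0 && mount /dev/md0 /tmp/md0
~ # ls /tmp/md0                     # looks like a copy of the DSM system partition
~ # umount /tmp/md0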

I was able to address the issue by issuing mdadm --grow -n 7 /dev/md[01], which caused those two arrays to “grow” (in this case, shrink) by one device. That happened immediately, and a subsequent cat /proc/mdstat showed full happiness across the board:

~ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md4 : active raid6 sdd7[0] sdg7[3] sdf7[2] sdh7[4]
      1953485568 blocks super 1.2 level 6, 64k chunk, algorithm 2 [4/4] [UUUU]

md3 : active raid6 sdh6[6] sdg6[5] sdf6[4] sdb6[7] sdd6[2] sdc6[1]
      3906971136 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md2 : active raid6 sda5[0] sdg5[6] sdf5[5] sdb5[8] sdd5[3] sdc5[2] sdh5[7]
      4860138240 blocks super 1.2 level 6, 64k chunk, algorithm 2 [7/7] [UUUUUUU]

md1 : active raid1 sda2[0] sdb2[1] sdc2[2] sdd2[3] sdf2[5] sdg2[6] sdh2[4]
      2097088 blocks [7/7] [UUUUUUU]

md0 : active raid1 sda1[0] sdb1[2] sdc1[4] sdd1[3] sdf1[5] sdg1[6] sdh1[1]
      2490176 blocks [7/7] [UUUUUUU]

unused devices: <none>

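For the record, written out one array at a time, that grow (really a shrink of the expected member count from eight down to seven; -n is short for --raid-devices) amounts to:

~ # mdadm --grow -n 7 /dev/md0
~ # mdadm --grow -n 7 /dev/md1

Since only the member count changes and no data has to move in a RAID1, the operation completes instantly.
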
Now, with one bay empty, I just have to wait for my last 4TB WD Red to arrive so it can be configured as a replacement hot spare, and I’ll be in business!

[1]: http://www.synology.com/en-us/products/overview/DS1813+
