06 Apr 2015, 16:13

Synology SHR array wrong size after expanding

I recently replaced a 1TB drive (Seagate Barracuda) in my Synology DS-1813+’s SHR-2 array with a 4TB drive (Western Digital Red). During that process, another drive which was on its last legs (another Seagate Barracuda, 4TB this time) died. I replaced the 4TB Seagate with a 6TB Western Digital Red. After everything had finished rebuilding and expanding, I was left with only a very small increase in the capacity of the volume: for having added 5TB to the array, I was seeing about a 1TB change in capacity. That didn’t seem right to me.

So I asked Reddit. The answer there was “Well, SHR hides the complexity of RAID, bla bla bla.” So I asked Synology support. The answer from them was “Well, the calculator on the site is only for new arrays; what you’ll actually see when expanding is bla bla bla.” Neither of those answers was reasonable to me, so I started digging.

As it turned out, when I looked at cat /proc/mdstat, I saw something similar to this (recreated from memory):

~ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md5 : active raid6 sdf8[0] sda8[5] sdd8[4] sde8[3] sdh8[2] sdg8[1]
      3906585344 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md4 : active raid6 sde7[5] sdd7[6](S) sda7[7](S) sdg7[3] sdf7[2] sdh7[4]
      1953485568 blocks super 1.2 level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md3 : active raid6 sdh6[6] sdd6[9](S) sda6[10](S) sdg6[5] sdf6[4] sdb6[7] sde6[8] sdc6[1]
      5860456704 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md2 : active raid6 sde5[9] sda5[11](S) sdg5[6] sdf5[5] sdb5[8] sdd5[10] sdc5[2] sdh5[7]
      5832165888 blocks super 1.2 level 6, 64k chunk, algorithm 2 [7/7] [UUUUUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md1 : active raid1 sda2[7] sdb2[0] sdc2[1] sdd2[2] sde2[6] sdf2[3] sdg2[4] sdh2[5]
      2097088 blocks [8/8] [UUUUUUUU]

md0 : active raid1 sda1[7] sdb1[1] sdc1[3] sdd1[2] sde1[6] sdf1[4] sdg1[5] sdh1[0]
      2490176 blocks [8/8] [UUUUUUUU]

unused devices: <none>

At first glance, everything looked fine. I ran lsblk, and everything seemed fine there too. I checked mdadm --detail /dev/md[0,1,2,3,4,5], and all of that seemed reasonable. Except, not quite.

The results from mdadm --detail /dev/md[2,3,4] showed that several of the partitions had been added to their arrays as spares, and if you look closely at the cat /proc/mdstat output above, that’s confirmed by the devices in the arrays: some of them have an (S) after them, also indicating a spare. You’ll also notice from that output that bitmaps were enabled, which I had done during a previous rebuild operation.
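
If you’d rather not squint at /proc/mdstat for the (S) markers, mdadm --detail spells out each member’s role; md2 here is just an example taken from the output above:

~ # mdadm --detail /dev/md2 | grep -iE 'raid devices|total devices|spare'

A “Raid Devices” count lower than the “Total Devices” count (or a non-zero spare count) is the same symptom: devices attached to the array but not actually participating in it.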

I believe what happened was that, because I had left bitmaps on, the Synology (actually, mdadm) wasn’t able to successfully execute the mdadm --grow /dev/md[2,3,4] --raid-devices=N (where N is the new number of devices) after it had successfully performed the (for example) mdadm --add /dev/md2 /dev/sda5. Because of that, the devices were only added as spares and not integrated into the arrays, and the subsequent resize2fs command had no additional capacity to resize to.
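
Roughly, I think the sequence DSM attempted went like this (a sketch using md2 and /dev/sda5 from the output above; the exact commands DSM runs internally are my assumption, not something I captured):

~ # mdadm --add /dev/md2 /dev/sda5          # succeeds, but sda5 only joins md2 as a spare
~ # mdadm --grow /dev/md2 --raid-devices=8  # refused while the internal bitmap is present

With the grow step refused, md2 never actually got any bigger, so there was nothing for the later filesystem resize to expand into.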

What I ended up doing was running mdadm --grow /dev/md[2,3,4] --bitmap=none, and then, for each of the md devices, mdadm --grow /dev/mdX --raid-devices=N, where X is the md device and N is the number of devices currently active in the array plus the number marked as spares.
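
Using the (recreated from memory) mdstat output above as a guide, the concrete commands would have looked roughly like this; the device counts are just what that output implies, so take them as illustrative:

~ # mdadm --grow /dev/md2 --bitmap=none     # drop the internal write-intent bitmaps first
~ # mdadm --grow /dev/md3 --bitmap=none
~ # mdadm --grow /dev/md4 --bitmap=none
~ # mdadm --grow /dev/md2 --raid-devices=8  # 7 active + 1 spare
~ # mdadm --grow /dev/md3 --raid-devices=8  # 6 active + 2 spares
~ # mdadm --grow /dev/md4 --raid-devices=6  # 4 active + 2 spares

Each of the --raid-devices grows kicks off a reshape that pulls the spares in as full members, which is what finally gives the arrays their new capacity.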

After each of those commands completed, DSM happily reported that I could expand the space. I wanted to get all of the devices into their arrays before expanding anything, so I finished all of the reshapes first and then expanded the space through DSM. Doing this, I was able to recover nearly 5TB of “lost” capacity on the volume.
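
I didn’t capture exactly what DSM ran for that final expansion, but conceptually (for SHR, where LVM sits on top of the md devices) it amounts to growing the physical volumes, then the logical volume, then the filesystem; the volume group and logical volume names below are hypothetical placeholders:

~ # pvresize /dev/md2                     # repeat for each md device that was grown
~ # lvextend -l +100%FREE /dev/vg1000/lv  # hypothetical VG/LV names; yours may differ
~ # resize2fs /dev/vg1000/lv              # finally, grow the filesystem itself

This is also why the earlier resize2fs had nothing to do: until the md devices actually grew, there was no new space anywhere in that stack to hand to the filesystem.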
