Dangers of Large RAIDs

Posted by Brantley Coile

Choosing the best size for a RAID depends on what you’re going to do with the LUN. If it’s for a VM, smaller is better. The best option for a single-system LUN is a three-disk RAID5. I’ve seen plenty of folks use this to great effect.

When using VMs, larger RAIDs deliver terrible performance thanks to head contention. When a few dozen VMs sit on a single volume, each one ends up spread over many drives, so every drive holds pieces of many VMs. Two busy VMs then force the drive heads to sweep long distances between their blocks, and that movement takes time. Performance tanks.

The best setup is sometimes a single drive for a small number of VMs, maybe even one drive per VM. After all, that’s what would be in the real machine. If you lose the drive, all you have to do is reload from backup. RAID1 can be used to great effect.

Even if a LUN is for a single system, there is good reason for avoiding a large RAID: it takes days to rebuild. To replace a missing disk in a RAID, all the other disks have to be read. With today’s larger disks, this can take quite a long time.

Spinning disks are limited in how fast they can do I/O by rotational speed and read/write head movement. The more drives in the RAID, the more drives have to be read to replace a failed one. If the RAID5 or RAID6 keeps serving requests during the rebuild to avoid a service interruption, that adds even more time. Seek times are on the order of milliseconds; at roughly 8ms a seek, a drive manages only about 125 seeks a second, and that adds up even before counting the time needed to read or write.
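To put rough numbers on that, here’s a back-of-the-envelope sketch. The 12TB capacity, 200MB/s sustained transfer rate, and 8ms average seek time are illustrative assumptions, not figures from any particular drive:

```python
# Back-of-the-envelope rebuild time. The 12TB capacity, ~200MB/s sustained
# transfer rate, and ~8ms average seek are illustrative assumptions.

TB = 10**12

def streaming_rebuild_hours(capacity_bytes, mb_per_sec=200):
    """Best case: the rebuild streams every byte with no competing I/O."""
    return capacity_bytes / (mb_per_sec * 10**6) / 3600

def seeks_per_second(avg_seek_ms=8):
    """How many random seeks one spindle manages, ignoring transfer time."""
    return 1000 / avg_seek_ms

print(f"Streaming rebuild of a 12TB drive: {streaming_rebuild_hours(12 * TB):.0f} hours")
print(f"Seek budget per spindle: {seeks_per_second():.0f} seeks a second")
```

And that best case assumes the array has nothing else to do; once live traffic forces the heads to bounce between user I/O and the rebuild stream, the hours stretch into days.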

Also, very large RAID5s run the risk of finding blocks that have gone bad on other drives when recovering a bad stripe. As the areal density of disks goes up, the odds of sectors quietly going bad increase. The magnetic flux just flips back to its previous state after a while. Then, in the middle of a rebuild, another disk suddenly fails because a silently bad sector can’t be read. Usually this is fixable, but it can be touch and go for a while.
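The same sort of arithmetic shows why wider stripes make that scenario more likely. The error rate below, one unreadable bit per 10^14 read, is a commonly published spec for consumer drives and is an assumption here, not a measured figure:

```python
import math

# Odds of hitting a latent bad sector while rebuilding a RAID5. The error
# rate of 1 unreadable bit per 1e14 read is a commonly published spec for
# consumer drives -- an assumption here, not a measured figure.

URE_RATE = 1e-14   # probability that any given bit fails to read back
TB = 10**12

def p_read_error(bytes_read, ure_rate=URE_RATE):
    """Probability of at least one unrecoverable error across bytes_read."""
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ure_rate))

# A rebuild has to read every surviving drive end to end.
for drives, size_tb in [(3, 4), (8, 8), (16, 12)]:
    surviving_bytes = (drives - 1) * size_tb * TB
    print(f"{drives:2d} x {size_tb}TB RAID5 rebuild: "
          f"~{p_read_error(surviving_bytes):.0%} chance of tripping over a bad sector")
```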

As it turns out, there are no real performance advantages in large RAIDs. Up to a point, a RAID can give better performance because the drives work in parallel, each with a theoretical transfer rate of 250MB/s. If the SRX has two 10GbE ports, its limit is roughly 2GB/s. So there is no performance advantage in using more than eight drives in a RAID. (All the LUNs on the SRX share those ports, so the advantage is actually less.)
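Here’s that ceiling sketched out with the same round numbers, 250MB/s per drive against roughly 2GB/s through the two 10GbE ports:

```python
# Where extra spindles stop paying off, using the round numbers above:
# ~250MB/s per drive and roughly 2GB/s through two 10GbE ports.

DISK_MB_S = 250        # theoretical per-drive transfer rate
NETWORK_MB_S = 2000    # roughly 2GB/s across two 10GbE ports

for drives in (3, 4, 8, 16, 32):
    raid_mb_s = drives * DISK_MB_S
    usable = min(raid_mb_s, NETWORK_MB_S)
    print(f"{drives:2d} drives: spindles can stream {raid_mb_s:5d}MB/s, "
          f"the ports deliver {usable}MB/s")
```

Past eight drives, the ports are the bottleneck, not the spindles.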

I’ve also heard the argument that wider stripes save money because, as the stripe gets wider, the percentage of disks spent on redundancy goes down. A three-disk stripe has an overhead of about 33%, but a one-hundred-disk stripe has 1% overhead. This saves on cost, right?
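The overhead math itself is easy to check. For RAID5, one disk’s worth of each stripe goes to parity, no matter how wide the stripe is:

```python
# Parity overhead of an N-disk RAID5: one disk's worth of capacity goes
# to parity no matter how wide the stripe is.

def raid5_overhead(disks):
    """Fraction of raw capacity consumed by parity in an N-disk RAID5."""
    return 1 / disks

for disks in (3, 5, 10, 100):
    print(f"{disks:3d}-disk stripe: {raid5_overhead(disks):.1%} of raw capacity is parity")
```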

Turns out, this is a false economy. A drawn-out outage thanks to a lengthy rebuild is expensive, especially compared to the low cost of disk space. Even leading-edge 12TB disks are only about $500, and 8TB disks are under $300. Squeezing out a bit more storage to save such a small amount of cash isn’t worth the risk of large RAIDs.

If you’ve missed all the exciting things happening at Coraid, check out this article from The Register.
