What Do The Various RAID Levels Mean?
You may have heard the term RAID used in connection with computer systems but been unsure of what it means or whether you should use it. Simply put, RAID stands for Redundant Array of Independent Disks. This means that you have several hard drives acting as one, with inherent benefits over the same drives operating separately. The key to these benefits is found in the term Redundant.
Why is redundancy important? As complex mechanical devices, all hard drives are prone to failure. The oft-used term "crash" specifically refers to the drive heads, which normally ride on a cushion of air only nanometers above the spinning platter, coming into physical contact with the platter itself. But any number of other physical and electronic breakdowns are possible. Eventually, any storage medium, even a solid-state device, will fail. Redundant drives provide a stopgap that lets you deal with the failure before you lose data.
Counting Drives
To provide redundancy, your array must consist of at least two drives. With three or more drives (and some arrays can support dozens), you have more options. If one drive fails, the system can continue to operate from the remaining drives. Depending on the capabilities of the accompanying hardware and software, you may be alerted to the failure by a flashing light, an audible alarm, or e-mail. Then, again depending on the capabilities of the array, you may be able to replace the failed drive without shutting down the system.
Perhaps the simplest configuration is RAID 1, or disk mirroring. All the data stored on one disk drive is also stored on another. Every write goes to both drives at the same time, and a read can be served by either one. Of course, you can have more than two drives, so long as they come in pairs: you could store data on three drives, with three additional drives as the mirror. The obvious drawback is that it takes twice as many drives, and therefore twice the cost, to store a given amount of data. A set of six 250 GB drives will give you only 750 GB of usable capacity in a RAID 1 configuration. An advantage is that read operations can be faster than with a single drive, because the array controller may be able to read from two locations at once. Write operations are generally about as fast as with a single drive.
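To make the mirroring behavior concrete, here is a minimal Python sketch of a toy RAID 1 array. It is an illustration only, not how any real controller works: the class name MirroredArray and its methods are hypothetical, each drive is modeled as an in-memory dictionary of block numbers to bytes, and a failed drive is simply set to None. Writes go to every surviving drive, while reads rotate among the copies and skip any that have failed.

```python
from __future__ import annotations

from itertools import cycle


class MirroredArray:
    """Toy RAID 1 model (hypothetical): every block is stored on every drive.

    Each drive is an in-memory dict mapping block number -> bytes;
    a failed drive is represented by None.
    """

    def __init__(self, num_drives: int = 2) -> None:
        self.drives: list[dict[int, bytes] | None] = [{} for _ in range(num_drives)]
        self._reader = cycle(range(num_drives))  # round-robin read order

    def write(self, block: int, data: bytes) -> None:
        # Every surviving drive must receive the write, so a mirrored
        # write is no faster than a write to a single drive.
        for drive in self.drives:
            if drive is not None:
                drive[block] = data

    def read(self, block: int) -> bytes:
        # Any surviving copy can satisfy a read; rotating between drives
        # is what lets real controllers service several reads at once.
        for _ in range(len(self.drives)):
            drive = self.drives[next(self._reader)]
            if drive is not None:
                return drive[block]
        raise OSError("every drive in the mirror has failed")

    def fail_drive(self, index: int) -> None:
        # Simulate a dead drive by discarding its contents entirely.
        self.drives[index] = None


array = MirroredArray(num_drives=2)
array.write(0, b"payroll")
array.fail_drive(0)
print(array.read(0))  # b'payroll' (served by the surviving mirror)
```

In this model, losing one drive of a pair costs only the redundancy; the data itself is still returned from the other copy.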
A RAID 5 system stripes data across at least three drives. For each stripe it calculates parity information (the exclusive-OR, or XOR, of the data blocks in that stripe) and writes it to one of the drives, rotating the parity among all the drives from stripe to stripe. (See the sidebar "Calculating Parity.") If any one drive fails, its contents can be reconstructed from the data and parity on the surviving drives. The advantage of RAID 5 over RAID 1 is that there's less overhead for redundancy: only one drive's worth of capacity goes to parity, so six 250 GB drives will yield 1250 GB of usable capacity. The disadvantage is the additional work of calculating the parity, since every write must also update a parity block. Thus, RAID 5 systems tend to be slower at write operations than RAID 1 systems, while they're often faster at read operations.
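The parity idea can be sketched in a few lines of Python. This is a simplified model, not the code any particular controller uses: the function names are hypothetical, parity is computed per stripe as the byte-wise XOR of the data blocks, and the rotating parity position shown is just one simple layout among several that real implementations use. The key property is that XOR-ing all the surviving blocks of a stripe, parity included, reproduces whichever single block is missing.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of one or more equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need one or more blocks of equal size")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def build_stripe(data_blocks: list[bytes], stripe_index: int) -> list[bytes]:
    """Return one RAID 5 stripe: the data blocks plus a parity block.

    The parity position rotates with the stripe index so that no single
    drive holds all the parity (one simple rotation; real layouts vary).
    """
    num_drives = len(data_blocks) + 1
    parity_pos = (num_drives - 1 - stripe_index) % num_drives
    stripe = list(data_blocks)
    stripe.insert(parity_pos, xor_blocks(*data_blocks))
    return stripe


def rebuild_missing(stripe: list[bytes | None]) -> bytes:
    """Recover the single missing block (data or parity) of a stripe."""
    survivors = [b for b in stripe if b is not None]
    if len(survivors) != len(stripe) - 1:
        raise ValueError("RAID 5 can rebuild exactly one missing block per stripe")
    return xor_blocks(*survivors)


# Three data blocks plus one parity block: a four-drive array.
stripe = build_stripe([b"\x01\x02", b"\x0f\x00", b"\xf0\x10"], stripe_index=0)
damaged = stripe.copy()
damaged[1] = None  # simulate losing the drive that held b"\x0f\x00"
print(rebuild_missing(damaged))  # b'\x0f\x00'
```

Because XOR is its own inverse, the same function both computes parity and rebuilds a lost block, which is why a single parity block per stripe is enough to survive any one drive failure.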
Both RAID 1 and RAID 5 guarantee the same basic degree of redundancy: if one drive fails, the system can continue to operate while you replace the failed drive. However, what if two drives fail simultaneously? Or what if one drive fails and, while replacing it, you accidentally pull the wrong drive? The entire array can fail. A number of schemes get around this. One is having a hot spare. If the RAID controller detects that a drive has failed, it automatically rebuilds the failed drive's contents onto a standby drive, using the mirror copy or the parity information. In such an arrangement, provided each rebuild finishes before the next failure, three drives would have to fail before the system collapsed. Again, that takes overhead; in a RAID 5 configuration with one hot spare, six 250 GB drives would provide only 1000 GB of usable capacity. On some systems it's even possible to configure multiple hot spares. More complex configurations combine RAID 1 and RAID 5; for example, you could mirror a RAID 5 array with another RAID 5 array. There are still more RAID levels, such as 3, 4, 6, and 7, as well as various proprietary systems.
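The capacity figures in this section follow from simple arithmetic, which the hypothetical Python helper below spells out. It assumes identical drives, ignores formatting and metadata overhead, treats RAID 1 as mirrored pairs, and sets hot spares aside before computing capacity, since a spare holds no data until it replaces a failed drive.

```python
def usable_capacity_gb(level: int, num_drives: int, drive_gb: int, hot_spares: int = 0) -> int:
    """Usable capacity of an array of identical drives (simplified model)."""
    active = num_drives - hot_spares  # spares hold no data until a rebuild
    if level == 1:
        if active < 2 or active % 2:
            raise ValueError("RAID 1 needs an even number of active drives")
        data_drives = active // 2  # half the drives hold mirror copies
    elif level == 5:
        if active < 3:
            raise ValueError("RAID 5 needs at least three active drives")
        data_drives = active - 1  # one drive's worth of space holds parity
    else:
        raise ValueError(f"RAID {level} is not modeled in this sketch")
    return data_drives * drive_gb


print(usable_capacity_gb(1, 6, 250))                # 750
print(usable_capacity_gb(5, 6, 250))                # 1250
print(usable_capacity_gb(5, 6, 250, hot_spares=1))  # 1000
```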
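There are also configurations that provide improved performance at the expense of redundancy. RAID 0 stripes data across drives with no mirroring or parity information at all. Because the pieces of most files are spread across all the drives, if any one drive fails, the data on the entire array is effectively lost.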
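Striping, and why a single failure is fatal without redundancy, can be sketched the same way. In this hypothetical Python model, data is cut into fixed-size chunks that are dealt round-robin across the drives; the chunk size and the function names are illustrative assumptions, not values from any real product.

```python
from __future__ import annotations


def stripe_data(data: bytes, num_drives: int, chunk_size: int) -> list[list[bytes]]:
    """Deal fixed-size chunks of data round-robin across drives (RAID 0)."""
    drives: list[list[bytes]] = [[] for _ in range(num_drives)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        drives[chunk_index % num_drives].append(data[offset:offset + chunk_size])
    return drives


def unstripe(drives: list[list[bytes] | None]) -> bytes:
    """Reassemble striped data; impossible if any drive is missing."""
    if any(d is None for d in drives):
        # No mirror copy and no parity: the lost chunks exist nowhere else.
        raise OSError("a drive has failed and RAID 0 has nothing to rebuild from")
    num_drives = len(drives)
    total_chunks = sum(len(d) for d in drives)
    # Chunk n lives on drive n % num_drives, at position n // num_drives.
    return b"".join(drives[n % num_drives][n // num_drives] for n in range(total_chunks))


drives = stripe_data(b"ABCDEFGHIJ", num_drives=3, chunk_size=2)
print(drives)            # [[b'AB', b'GH'], [b'CD', b'IJ'], [b'EF']]
print(unstripe(drives))  # b'ABCDEFGHIJ'
```

Replacing drives[1] with None and calling unstripe raises an error: in this example the chunks b'CD' and b'IJ' were stored only on that drive, so even this ten-byte file cannot be reassembled.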
While many mid-range and high-end server-class systems now come with RAID controllers, you can also purchase external RAID systems, from massive cabinets hosting dozens of drives and providing hundreds of terabytes of storage to small, portable units no larger than a toaster. It's also possible to configure your own RAID at the operating-system level using single-channel disk controllers, at the cost of some CPU power. Some systems will tune their configuration for you, analyzing usage patterns and adjusting storage to give you the best performance or capacity while ensuring redundancy. Others will let you alter configurations on the fly, creating multiple partitions to be shared across platforms and adding and removing drives as your needs change.
An Alternative to Backup?
Does a RAID array, whether RAID 1, RAID 5, or some other level, eliminate the need for backups? Certainly not. If data is accidentally deleted from a RAID array, the built-in redundancy won't be able to recover it, because the deletion is applied to every copy and to the parity as well. Nor is a RAID array immune to theft or natural disasters.
With the decline in hardware prices, it may be time to consider the added protection of RAID storage for your valuable data.