RAID can be categorized by function: acceleration, increased capacity, backup, or fault tolerance, and different RAID levels can be configured to meet different needs. These RAID levels can be implemented in hardware (Hardware RAID), in software (Software RAID), or through a hybrid approach called Semi-Hardware RAID that combines aspects of both. The advantages and disadvantages of each approach are as follows:
In practice, its speed is no faster than Software RAID, and it lacks Software RAID's flexibility, which makes it something of a sham; it is therefore also referred to as "FakeRAID," and it is generally advisable to avoid it. Its only advantage is that it depends less on the operating system than Software RAID does, since certain chipsets supporting Semi-Hardware RAID work under both Windows and Linux.
In general, when a regular hard drive detects instability in a certain magnetic area during operation, it "freezes" momentarily and quietly moves the data from the unstable area to a reserved area. Once the transfer is complete, the unstable area is blocked off and the reserved area takes its place, so the total capacity remains unchanged (Western Digital calls this process a "deep recovery cycle"). Through this process the hard drive quietly relocates its data; to the average user it appears as nothing more than a momentary pause and causes minimal disruption.
However, both Hardware RAID and Software RAID treat any hard drive that fails to respond within 8 to 30 seconds during a write as faulty and eject it from the RAID array. While the "deep recovery cycle" doesn't happen frequently, in applications that operate 24/7 it is likely to occur a few times a year. This behavior is not limited to mechanical hard drives; even SSDs (Solid-State Drives), whose NAND flash has a limited write-cycle lifespan, rely on this type of self-repair mechanism.
Taking Western Digital's consumer-grade desktop hard drives (the Blue, Green, and Black series) as an example, the "freeze" during a "deep recovery cycle" can last up to 2 minutes. In a RAID setup, such a drive would already have been ejected from the array because of the long response time. Western Digital's enterprise-grade series (such as the Velociraptor and the RAID Edition Re), however, provide "TLER" (Time-Limited Error Recovery), which caps the "freeze" during error correction at 7 seconds, preventing the drive from being kicked out of the RAID array.[Note 1.0]
Not only Western Digital: other brands of enterprise-grade hard drives, such as Seagate, Toshiba, and Hitachi, have similar mechanisms under different names (ERC, Error Recovery Control, or CCTL, Command Completion Time Limit). When setting up a RAID, it is therefore worth spending some time checking the specifications on the hard drive manufacturer's website. If you are setting up a RAID 5, you should also consider the drive's "URE" (unrecoverable read error) rating and aim to minimize the chance of a RAID 5 unrecoverable error. Don't choose drives on price and capacity alone, or you may end up with unexpected issues.
In general, hard drives suitable for RAID are "enterprise-grade" or "NAS-grade" models, which support "TLER" (Time-Limited Error Recovery) and have lower unrecoverable-read-error rates.
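As a practical aside (not something from the manufacturers' spec sheets), on Linux you can often check whether a drive exposes a configurable error-recovery timeout via smartmontools' SCT Error Recovery Control log; drives without TLER/ERC support typically report the feature as unsupported or disabled:
# smartctl -l scterc /dev/sdb ← Query the SCT ERC (error-recovery) timeouts of "/dev/sdb"
# smartctl -l scterc,70,70 /dev/sdb ← On drives that allow it, set both read and write timeouts to 7.0 seconds (values are in units of 100 ms)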
The primary objectives of RAID 0 are increased capacity and speed, with no backup or fault tolerance functionality.
RAID 1 operates on the same principle as LVM's Mirror Volume. Regardless of the number of hard drives, RAID 1 capacity is determined by the "capacity of the smallest hard drive."
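For example, mirroring a 2 TB drive with a 3 TB drive yields a 2 TB RAID 1; the extra 1 TB on the larger drive simply goes unused.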
The primary objective of RAID 1 is data backup and fault tolerance, making it the most secure option. It offers some acceleration benefits (read speed is determined by the fastest hard drive, while write speed waits for the slowest drive).
The process of rebuilding the entire RAID could take several hours (depending on RAID size and disk speed). During this time, the RAID remains in degraded mode. However, once the rebuilding process is complete, the RAID array is restored to full functionality.
As shown in the diagram, the principle of RAID 5 is to split a data set into chunks spread across the hard drives, with parity information written alongside them to provide the fault-tolerance mechanism. If a hard drive fails, the remaining drives can use the parity information to reconstruct the missing data and continue operating. If a spare disk is available or the faulty drive is replaced, the parity information on the remaining drives is used to rebuild the entire RAID 5 array.
The capacity of RAID 5 is calculated as "(number of hard drives - 1) x capacity of the smallest hard drive."
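For example, a RAID 5 built from four 2 TB drives provides (4 - 1) x 2 TB = 6 TB of usable space; the equivalent of one drive's capacity is consumed by parity.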
RAID 5 offers fault tolerance, acceleration, and backup while requiring relatively few hard drives. It used to be the dream combination, but with the ever-increasing capacity of modern hard drives a new concern has arisen: the unrecoverable read error (URE).
URE, short for Unrecoverable Read Error, is an essential specification of a hard drive. For consumer-grade hard drives, the acceptable rate of read errors is 1 in 10^14 bits (URE = 1/10^14). Enterprise-grade hard drives typically have URE rates of 1 in 10^15 (URE = 1/10^15).
While URE rates of 1/10^14 or 1/10^15 might seem low and negligible on the surface, the issue becomes amplified in a RAID 5 configuration. Nowadays (as of 2014), hard drives often start at 1 TB (1 TB = 10^12 bytes, i.e. 8 x 10^12 bits), so reading a few full drives is already enough to make a single bit error likely. Moreover, rebuilding a RAID 5 involves reading/writing at least three hard drives in full, and any bit read/write error during the rebuild can render RAID 5 unable to recover the data.
Based on calculations, the worst-case rebuild failure rates for a RAID 5 array composed of six 2 TB hard drives (12 TB in total) are as follows:
URE | RAID 5 rebuild failure rate for 6x 2 TB drives (12 TB total)
1/10^14 | 55%
1/10^15 | 10%
1/10^16 | 0%
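As a rough sketch of where figures like these come from (my own back-of-the-envelope model, assuming bit errors are independent and that a rebuild must read the five surviving 2 TB drives, i.e. about 8 x 10^13 bits), the failure probability is approximately 1 - e^(-bits x URE). The small awk script below reproduces numbers of the same order as the table:
# awk 'BEGIN {
    bits = 5 * 2 * 8 * 10^12           # a rebuild reads the 5 surviving 2 TB drives, expressed in bits
    for (e = 14; e <= 16; e++)         # URE rates of 1/10^14, 1/10^15, 1/10^16
        printf "URE 1/10^%d -> about %4.1f%% rebuild failure\n", e, (1 - exp(-bits / 10^e)) * 100
  }'
URE 1/10^14 -> about 55.1% rebuild failure
URE 1/10^15 -> about  7.7% rebuild failure
URE 1/10^16 -> about  0.8% rebuild failure
The 55% figure matches the table almost exactly; the smaller figures come out a little below the quoted 10% and 0%, which appear to be rounded or based on slightly different assumptions.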
Because it writes two independent parity blocks simultaneously, RAID 6 has lower write efficiency than RAID 5 and is more complex to implement. It received little attention in the past, but with hard drive capacities inevitably continuing to grow, and RAID 5's unrecoverable-read-error problem becoming amplified as a result, interest in RAID 6 has resurged.
The capacity of RAID 6 is calculated as "(number of hard drives - 2) x capacity of the smallest hard drive."
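For example, six 2 TB drives in RAID 6 provide (6 - 2) x 2 TB = 8 TB of usable space; two drives' worth of capacity goes to the double parity.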
As long as there are enough hard drives, various RAID levels can in theory be combined into hybrid configurations. With RAID 1 and RAID 0, for instance, you can build either RAID 01 or RAID 10; RAID 01 is possible but offers lower reliability (work through the failure combinations yourself), which is why some hybrid setups are rarely used. The more practical hybrid options are RAID 10, RAID 50, and RAID 60.
However, with the increased functionality of hybrid RAID comes the challenge of choosing the right configuration. A general recommendation is to opt for RAID 5 or RAID 6 if there is a higher emphasis on reading operations. On the other hand, if frequent random write operations are involved, RAID 10 would be a better choice.
mdadm is a versatile command used for creating, managing, and monitoring Software RAID. For detailed usage instructions, you can refer to the man mdadm command. Below, I'll introduce the commonly used modes: create, manage, assemble, monitor, grow, and misc.
mdadm --create /dev/md# --level=(#|NAME) --raid-devices=# DEVICE [--spare-devices=# DEVICE]
  (1)     (2)       (3)         (4)               (5)          (6)
In the above example, the RAID 1 array is composed of the disk "/dev/sdb" and the partition "/dev/sdc2". It's important to note that when creating a RAID array from partitions (like "/dev/sdc2") or entire disks (like "/dev/sdb"), the existing data on those partitions or disks will be lost and overwritten, so back up any important data before creating the array. Additionally, after the RAID array is created, the partition type of the member disks might change to "loop," which can be observed with the parted -l command.
There are two crucial pieces of information recorded in "/etc/mdadm.conf". The first is the list of devices ("DEVICE") that comprise the RAID, and the second is the "UUID" of the RAID. The second item can be generated using the Misc mode with the command mdadm --detail --scan. The following example illustrates creating a RAID 1 and editing the "/etc/mdadm.conf" file:
Example:
# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc --spare-devices=1 /dev/sda6 ← Create a RAID 1 array with the two devices "/dev/sdb" and "/dev/sdc", and designate "/dev/sda6" as a spare disk
mdadm: largest drive (/dev/sda6) exceeds size (3767808K) by more than 1%
Continue creating array? y ← A warning appears if the devices (hard disks or partitions) differ in size or already contain data; press <y> to continue
mdadm: array /dev/md0 started. ← The md device (RAID) has started
# echo 'DEVICE /dev/sdb /dev/sdc /dev/sda6' > /etc/mdadm.conf ← Write the devices of the RAID group (including the spare device) into "/etc/mdadm.conf"
# mdadm --detail --scan >> /etc/mdadm.conf ← Append the UUID of the RAID to "/etc/mdadm.conf"
# cat /etc/mdadm.conf ← Confirm the content of "/etc/mdadm.conf"
DEVICE /dev/sdb /dev/sdc /dev/sda6
ARRAY /dev/md0 level=raid1 num-devices=2 spares=1 UUID=0fcf1a50:057f9442:49ee75e2:9acafdb6
The advanced settings in "/etc/mdadm.conf" include "MAILADDR", which can be used to send an email notification when there is a problem with the RAID array, and "PROGRAM", which can be used to run a specific command when mdadm detects a specific condition. For more information, please refer to the man page for "mdadm.conf" and the monitor mode.
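As a minimal sketch (the mail address and script path below are placeholders, not values from this chapter), these two settings might look like this inside "/etc/mdadm.conf":
MAILADDR admin@example.com ← Where mdadm's monitor mode sends alert mails (placeholder address)
PROGRAM /usr/local/sbin/raid-event.sh ← A hypothetical script run for each detected event; it is passed the event name, the md device, and possibly the affected component device as arguments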
After creating the Software RAID array, you can format and mount the array and edit the "/etc/fstab" file. The following example shows how to do this.
Example:
# mkfs -j /dev/md0 ← Format the md device (with a journal, i.e. ext3)
# mount /dev/md0 /media1 ← Mount it
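To make the mount persistent across reboots, a line like the following could be added to "/etc/fstab" (a sketch assuming the ext3 filesystem and the "/media1" mount point used above):
/dev/md0   /media1   ext3   defaults   0 0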
# cat -n /proc/mdstat ← Monitor RAID status
     1 Personalities : [raid1] ← RAID levels currently in use
     2 md0 : active raid1 sda6[2](S) sdc[1] sdb[0] ← Disks that make up the RAID ("(S)" indicates the spare device)
     3 3767808 blocks [2/2] [UU] ← RAID operational status
     4 [=>...................] resync = 5.1% (194752/3767808) finish=11.5min speed=5154K/sec ← Synchronization percentage and estimated completion time
In the example above, "cat -n" numbers the output lines: line 1 shows the RAID levels currently in use, line 2 lists the disks that make up "md0" (with "(S)" marking the spare device), line 3 shows the operational status, and line 4 shows the synchronization progress and estimated completion time. For even more detail about the array, use the --detail option of misc mode:
# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Wed Aug 21 11:07:44 2013
     Raid Level : raid1
     Array Size : 3767808 (3.59 GiB 3.86 GB)
  Used Dev Size : 3767808 (3.59 GiB 3.86 GB)
   Raid Devices : 2
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Wed Aug 21 11:07:44 2013
          State : active, resyncing
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 10% complete

           UUID : 0fcf1a50:057f9442:49ee75e2:9acafdb6
         Events : 0.3

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       8       32        1      active sync   /dev/sdc
       2       8        6        -      spare   /dev/sda6
# mdadm --manage /dev/md0 --add /dev/sda6 ← Add a spare disk
mdadm: added /dev/sda6
For RAID 1/5/6, if no spare disk was reserved beforehand, adding one immediately after a hard drive fails and the array degrades will cause the RAID to start rebuilding automatically.
The "--manage" option can usually be omitted, so the above example can be simplified to: mdadm /dev/md0 --add /dev/sda6. After adding the spare disk, remember to edit the "DEVICE" section in "/etc/mdadm.conf". Otherwise, the spare disk might not function correctly after a reboot.
The following example shows how to simulate the failure of a hard disk in a RAID 1 array that was created in create mode.
Example:
# mdadm /dev/md0 --fail /dev/sdc ← Simulate a failure of "/dev/sdc"
mdadm: set /dev/sdc faulty in /dev/md0
# mdadm --detail /dev/md0 ← Observe whether the RAID is being rebuilt
. . .
 Rebuild Status : 8% complete ← Rebuild progress
           UUID : 0fcf1a50:057f9442:49ee75e2:9acafdb6
         Events : 0.56
    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       8        6        1      spare rebuilding   /dev/sda6 ← The spare disk is automatically added to rebuild the RAID
       2       8       32        -      faulty spare   /dev/sdc ← "/dev/sdc" is marked as faulty
# cat /proc/mdstat ← Monitor RAID status
Personalities : [raid1]
md0 : active raid1 sda6[2] sdb[0] sdc[1](F) ← "(F)" means faulty (broken)
      3767808 blocks [2/1] [U_] ← One of the two hard drives in the RAID 1 is broken
      [===>................] resync = 18.1% (687424/3767808) finish=4.5min speed=12572K/sec ← Synchronization progress and estimated completion time
# mdadm /dev/md0 --remove /dev/sdc ← Remove the faulty hard disk
mdadm: hot removed /dev/sdc
This test demonstrates the benefit of RAID fault tolerance and of keeping a spare disk: if a hard disk fails in a RAID 1/5/6 array, the spare disk is added automatically. If the hard disk controller supports hot-swap, you can remove and replace the failed hard disk without shutting down the system. If you are not sure, always shut down the system before removing or replacing a hard disk, otherwise the hardware may be damaged.[Note 1.1]
The mdadm /dev/md0 --fail /dev/sdc --remove /dev/sdc command can be used to combine the "--fail" and "--remove" options.
Since the original spare disk will replace the failed hard disk in a RAID array, it is best to add another spare disk as a backup. If the device name of the additional spare disk changes, remember to update the "/etc/mdadm.conf" file.
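For example (a sketch using a hypothetical replacement disk "/dev/sdd"):
# mdadm /dev/md0 --add /dev/sdd ← Add the new disk as a spare (hypothetical device name)
# vi /etc/mdadm.conf ← Then update the DEVICE line to match, e.g.:
DEVICE /dev/sdb /dev/sda6 /dev/sdd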
# umount /dev/md1 ← Unmount the filesystem on the md device
# mdadm --stop /dev/md1 ← Stop the md device
mdadm: stopped /dev/md1
# mdadm --assemble /dev/md1 ← Restart the md device
mdadm: /dev/md1 has been started with 3 drives and 1 spare.
Monitor mode is generally used with the --daemonize (or --daemonise) option so that it keeps monitoring in the background.
The following are the commonly used monitor mode options:
# mdadm --monitor --scan --daemonize --mail=abc@123.com ← Send an email notification when any RAID array has a problem
# mdadm --monitor /dev/md2 --daemonize --mail=123@abc.com ← Send an email notification only when "/dev/md2" has a problem
# mdadm --grow /dev/md0 --size=max ← Modify the size of md0 (here, to the maximum available)
# resize2fs /dev/md0 ← Adjust the filesystem size to match
# mdadm --detail /dev/md0 ← Display detailed information about md0
# mdadm --detail --scan ← Scan all md devices
# mdadm --examine /dev/sdb ← Check if /dev/sdb has RAID metadata
Sometimes, when creating a RAID with mdadm --create md-DEVICE, previous RAID metadata may still exist on a hard disk or partition; in that case you can try mdadm --zero-superblock DEVICE to completely clear the old metadata from the device before retrying.
# umount /dev/md3 ← Unmount first
# mdadm --stop /dev/md3 ← Stop the md device
# mdadm --zero-superblock /dev/sda /dev/sdb /dev/sdc ← Clear the RAID metadata from all disks in the md assembly
# rm /etc/mdadm.conf ← Delete the configuration file
The --scan option reads "/etc/mdadm.conf" and uses its settings as a reference.
For example:
# mdadm --assemble --scan ← Scan "/etc/mdadm.conf" and restart the RAID
# mdadm --detail --scan ← Scan all md devices
# mdadm --create /dev/md0 -v --raid-devices=2 --level=0 /dev/md1 /dev/md2
# mdadm -v --create /dev/md0 --level=raid10 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde ← Create RAID 10
# mdadm --create /dev/md0 --raid-devices=3 --level=5 /dev/sdb1 /dev/sdc1 /dev/sdd1 ← First, create a RAID 5 array
# mdadm --create /dev/md1 --raid-devices=3 --level=5 /dev/sde1 /dev/sdf1 /dev/sdg1 ← Create another RAID 5 array
# mdadm --create /dev/md3 --raid-devices=2 --level=0 /dev/md0 /dev/md1 ← Combine the two RAID 5 arrays into a RAID 0 array, creating RAID 50
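Applying the earlier capacity formulas to this sketch: each three-disk RAID 5 contributes (3 - 1) x the capacity of its smallest partition, and striping the two arrays together with RAID 0 doubles that, so the resulting RAID 50 offers roughly four partitions' worth of usable space out of six.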