Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)

Softpanorama Search

Linux Software RAID

News

Recommended Books

Recommended Links

Reference

Tutorials

LVM

RAID Levels

Linux Multipath IO udev       Humor Etc

Software RAID gives ability to create RAID arrays without having RAID-cable hardware disk controller. It provides the capability of using Raid 0, 1, and 5 (see RAID Levels)

Linux has a rather clumsy way of creating software RAID. It uses a separate driver called md,  which is not integrated into LVM and has a separate administration utility called  mdadm, The latter has different syntax then LVM commands. This is a typical design blunder (the first workable solution was adopted and became entrenched), but we need to live with it.

mdadm can operated on partitions what were created (or converted to) special partition id:  fd.

The example that follows creates a single partition (/dev/sdb1) on the second SCSI drive (/dev/sdb) and marks it as an automatically detectable RAID partition:

# fdisk /dev/sdb
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1116, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-1116, default 1116): 1116

Change the drive type of /dev/sdb1 to Linux Raid Auto (0xFD) so it can be detected automatically at boot time:

Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd

TIPS:

 mdadm provides for a half-dozen operations. Here is the list from the man page:

Two modes, Create and Assemble, are used to configure and activate arrays. Manage mode is used to manipulate devices in an active array. Monitor mode allows administrators to configure event notification and actions for arrays. Build mode is used when working with legacy arrays that use an old version of the md driver.

The most often used mdadm command is create:

mdadm -v --create /dev/md-device  --level=<num> --raid-devices=<num> <device list>

Options are as following:

For example:

mdadm -v --create /dev/raid1  --level=1 --raid-devices=2 /dev/sdb /dev/sdc

Here is a more detailed discussion from InformIT Managing Storage in Red Hat Enterprise Linux 5 Understanding RAID:

RAID-0 and RAID-1 are often combined to gain the advantages of both. They can be combined as RAID-0+1 which means that two volumes are mirrored while each volume is striped internally. The other combination is RAID-1+0 where each disk is mirrored and striping is done across all mirrors. RAID-0+1 may be considered less reliable because if each half looses any one disk, the entire volume fails.

When creating partitions to use for the RAID device, make sure they are of type Linux raid auto. In fdisk, this is partition id fd. After creating the partitions for the RAID device, use the following syntax as the root user to create the RAID device:
mdadm --create /dev/mdX --level=<num> --raid-devices=<num> <device list>  
The progress of the device creation can be monitored with the following command as root:
tail -f /proc/mdstat
mdadm --create /dev/md0 --level=1 --raid-devices=3 /dev/sda5 /dev/sda6 /dev/sda7

The command cat /proc/mdstat should show output similar to:  

Personalities : [raid0] [raid1]
md0 : active raid1 sda7[2] sda6[1] sda5[0]
      10241280 blocks [3/3] [UUU]
      [>....................]  resync =  0.0% (8192/10241280) finish=62.3min
speed=2730K/sec

unused devices: <none>

The RAID device /dev/md0 is created. To add a partition to a RAID device, execute the following as root after creating the partition of type Linux raid auto (fd in fdisk):

mdadm /dev/mdX -a <device list>

To add /dev/sda8 to the /dev/md0 RAID device created in the previous section:

mdadm /dev/md0 -a /dev/sda8

The /dev/sda8 partition is now a spare partition in the RAID array.

Personalities : [raid0] [raid1]
md0 : active raid1 sda8[3](S) sda7[2] sda6[1] sda5[0]
      10241280 blocks [3/3] [UUU]
      [>....................]  resync =  0.6% (66560/10241280) finish=84.0min
speed=2016K/sec

unused devices: <none>

If a partition in the array fails, use the following to remove it from the array and rebuild the array using the spare partition already added:

mdadm /dev/mdX -f <failed device>

For example, to fail /dev/sda5 from /dev/md0 and replace it with the spare (assuming the spare has already been added):

mdadm /dev/md0 -f /dev/sda5

To verify that the device has been failed and that the rebuild has been complete and was successful, monitor the /proc/mdstat file (output shown in Listing 7.9):

tail -f /proc/mdstat

Notice that /dev/sda5 is now failed and that /dev/sda8 has changed from a spare to an active partition in the RAID array.

Failing a Partition and Replacing with a Spare

Personalities : [raid0] [raid1]
md0 : active raid1 sda8[3] sda7[2] sda6[1] sda5[4](F)
      10241280 blocks [3/2] [_UU]
      [>....................]  recovery =  0.2% (30528/10241280) finish=11.1min
speed=15264K/sec

unused devices: <none>

Monitoring RAID Devices

The following commands are useful for monitoring RAID devices:

 

Old News ;-)

Deploying Linux on IBM eServer pSeries Clusters (SG247014)

3.7.2 Using the command line to create RAID

In Example 3-12 we have two disks; a small part of the first is used for / partition and a swap device, and the second disk is empty.

We can create a logical partition on our first disk and mirror it to the partition on the second disk. For better compatibility and performance, we choose to span identical cylinders.

Example 3-12 Starting point software RAID 

# fdisk -l
Disk /dev/sda: 255 heads, 63 sectors, 17849 cylinders
Units = cylinders of 16065 * 512 bytes

 

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1   *         1         1      8001   41  PPC PReP Boot
/dev/sda3            15       537   4200997+  83  Linux
/dev/sda4           538     17848 139050607+   5  Extended
/dev/sda5           538       799   2104483+  82  Linux swap

 

Disk /dev/sdb: 255 heads, 63 sectors, 17849 cylinders
Units = cylinders of 16065 * 512 bytes

 

   Device Boot    Start       End    Blocks   Id  System

First we create a RAID partition on the first disk (we type: fdisk /dev/sda, n, l, Enter, Enter, t, 6, fd).

n - new
l - logical
enter - use default starting cylinder
enter - use default ending cylinder
t - change type
6 - number of partition which type we want change
fd - Type Linux Software RAID autodetect

Example 3-13 Creating a RAID partition on the first disk

fdisk /dev/sda
The number of cylinders for this disk is set to 17849.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

 

Command (m for help): n
Command action
   l   logical (5 or over)
   p   primary partition (1-4)
l
First cylinder (800-17848, default 800): 
Using default value 800
Last cylinder or +size or +sizeM or +sizeK (800-17848, default 17848): 
Using default value 17848

 

Command (m for help): t
Partition number (1-6): 6
Hex code (type L to list codes): fd
Changed system type of partition 6 to fd (Linux raid autodetect)

 

Command (m for help): w
The partition table has been altered!

 

Calling ioctl() to re-read partition table.

 

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.

In Example 3-14, we create a RAID partition on the second disk and use 800 as a start cylinder in order to be analog to the first one. We will lose additional space anyway if we are going to mirror the partitions.

Example 3-14 Creating a RAID partition on the second disk

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 4
First cylinder (1-17849, default 1): 800
Last cylinder or +size or +sizeM or +sizeK (800-17849, default 17849): 
Using default value 17849

 

Command (m for help): t
Partition number (1-4): 4
Hex code (type L to list codes): fd
Changed system type of partition 4 to fd (Linux raid autodetect)

 

Command (m for help): w
The partition table has been altered!

 

Calling ioctl() to re-read partition table.

 

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.

Now we have partition sda6 and sdb4 ready for RAID. In order to define what kind of RAID we want to create, we edit /etc/raidtab as shown in Example 3-15.

Example 3-15 /etc/raidtab file

 

raiddev /dev/md0
        raid-level   raid1
        nr-raid-disks   2
        chunk-size      32
        persistent-superblock 1
        device          /dev/sda6
        raid-disk       0
        device          /dev/sdb4
        raid-disk       1

Now we run the mkraid command in order to create a RAID device defined in /etc/raidtab, as shown in Example 3-16.

Example 3-16 mkraid

#  mkraid /dev/md0
handling MD device /dev/md0
analyzing super-block
disk 0: /dev/sda6, 136946061kB, raid superblock at 136945984kB
disk 1: /dev/sdb3, 136954125kB, raid superblock at 136954048kB

We can watch the status of our RAID device by issuing the cat /proc/mdstat command, as shown in Example 3-17.

Example 3-17 Checking RAID status

# cat /proc/mdstat 
Personalities : [raid1] 
read_ahead 1024 sectors
md0 : active raid1 sdb3[1] sda6[0]
      136945984 blocks [2/2] [UU]
      [>....................]  resync =  1.8% (2538560/136945984) finish=140.7min speed=15917K/sec
unused devices: <none>

 

 

Note: We do not need to wait for reconstruction to finish in order to use the RAID device; the synchronization is done using idle I/O bandwidth. The process is transparent, so we can use the device (place LVM on it, partition and mount) although the disks are not synchronized yet. If one disk fails during the synchronization, we will need our backup tape.

 

Now we can create a volume group and add /dev/md0 to it, as shown in Example 3-18.

Example 3-18 Creating VG on RAID device

# vgcreate raidvg /dev/md0
vgcreate -- INFO: using default physical extent size 4 MB
vgcreate -- INFO: maximum logical volume size is 255.99 Gigabyte
vgcreate -- doing automatic backup of volume group "raidvg"
vgcreate -- volume group "raidvg" successfully created and activated

And a logical volume in this volume group, as shown in Example 3-19.

Example 3-19 Creating LV in raidvg

lvcreate  -L 20G -n mirrordata1 raidvg
lvcreate -- doing automatic backup of "raidvg"
lvcreate -- logical volume "/dev/raidvg/mirrordata1" successfully created

This newly created volume can be formatted and mounted.

 

mdadm A New Tool For Linux Software RAID Management - O'Reilly Media

 

Linux Software RAID - A Belt and a Pair of Suspenders Linux Magazine

This article will present a simple example with two drives. For this article, a CentOS 5.3 distribution was used on the following system:

Using this configuration a simple RAID-1 configuration is created between /dev/sdb and /dev/sdc.

Step 1 - Set the ID of the drives
The first step in the creation of a RAID-1 group is to set the ID of the drives that are to be part of the RAID group. The type is “fd” (Linux raid autodetect) and needs to be set for all partitions and/or drives used in the RAID group. You can check the partition types fairly easy:

fdisk -l /dev/sdb

[Feb 9, 2009] USB Hard Drive in RAID1

January 31, 2008 | www.bgevolution.com

This concept works just as for an internal hard drive. Although, USB drives seem to not remain part of the array after a reboot, therefore to use a USB device in a RAID1 setup, you will have to leave the drive connected, and the computer running. Another tactic is to occasionally sync your USB drive to the array, and shut down the USB drive after synchronization. Either tactic is effective.

You can create a quick script to add the USB partitions to the RAID1.

The first thing to do when synchronizing is to add the partition:

sudo mdadm --add /dev/md0 /dev/sdb1

I have 4 partitions therefore my script contains 4 add commands.

Then grow the arrays to fit the number of devices:

sudo mdadm --grow /dev/md0 --raid-devices=3

After growing the array your USB drive will magically sync USB is substantially slower than SATA or PATA. Anything over 100 Gigabytes will take some time. My 149 Gigabyte /home partition takes about an hour and a half to synchronize. Once its synced I do not experience any apparent difference in system performance.

[April 28, 2008 ] Managing RAID and LVM with Linux

Initial set of a RAID-5 array

I recommend you experiment with setting up and managing RAID and LVM systems before using it on an important filesystem. One way I was able to do it was to take old hard drive and create a bunch of partitions on it (8 or so should be enough) and try combining them into RAID arrays. In my testing I created two RAID-5 arrays each with 3 partitions. You can then manually fail and hot remove the partitions from the array and then add them back to see how the recovery process works. You'll get a warning about the partitions sharing a physical disc but you can ignore that since it's only for experimentation.

In my case I have two systems with RAID arrays, one with two 73G SCSI drives running RAID-1 (mirroring) and my other test system is configured with three 120G IDE drives running RAID-5. In most cases I will refer to my RAID-5 configuration as that will be more typical.

I have an extra IDE controller in my system to allow me to support the use of more than 4 IDE devices which caused a very odd drive assignment. The order doesn't seem to bother the Linux kernel so it doesn't bother me. My basic configuration is as follows:

hda 120G drive
hdb 120G drive
hde 60G boot drive not on RAID array
hdf 120G drive
hdg CD-ROM drive
 
The first step is to create the physical partitions on each drive that will be part of the RAID array. In my case I want to use each 120G drive in the array in it's entirety. All the drives are partitioned identically so for example, this is how hda is partitioned:
Disk /dev/hda: 120.0 GB, 120034123776 bytes
16 heads, 63 sectors/track, 232581 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1      232581   117220792+  fd  Linux raid autodetect
So now with all three drives with a partitioned with id fd Linux raid autodetect you can go ahead and combine the partitions into a RAID array:
# /sbin/mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 \
	/dev/hdb1 /dev/hda1 /dev/hdf1
Wow, that was easy. That created a special device /dev/md0 which can be used instead of a physical partition. You can check on the status of that RAID array with the mdadm command:
# /sbin/mdadm --detail /dev/md0
        Version : 00.90.01
  Creation Time : Wed May 11 20:00:18 2005
     Raid Level : raid5
     Array Size : 234436352 (223.58 GiB 240.06 GB)
    Device Size : 117218176 (111.79 GiB 120.03 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Jun 10 04:13:11 2005
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 36161bdd:a9018a79:60e0757a:e27bb7ca
         Events : 0.10670

    Number   Major   Minor   RaidDevice State
       0       3        1        0      active sync   /dev/hda1
       1       3       65        1      active sync   /dev/hdb1
       2      33       65        2      active sync   /dev/hdf1
The important lines to see are the State line which should say clean otherwise there might be a problem. At the bottom you should make sure that the State column always says active sync which says each device is actively in the array. You could potentially have a spare device that's on-hand should any drive should fail. If you have a spare you'll see it listed as such here.

One thing you'll see above if you're paying attention is the fact that the size of the array is 240G but I have three 120G drives as part of the array. That's because the extra space is used as extra parity data that is needed to survive the failure of one of the drives.

Initial set of LVM on top of RAID

Now that we have /dev/md0 device you can create a Logical Volume on top of it. Why would you want to do that? If I were to build an ext3 filesystem on top of the RAID device and someday wanted to increase it's capacity I wouldn't be able to do that without backing up the data, building a new RAID array and restoring my data. Using LVM allows me to expand (or contract) the size of the filesystem without disturbing the existing data.

Anyway, here are the steps to then add this RAID array to the LVM system. The first command pvcreate will "initialize a disk or partition for use by LVM". The second command vgcreate will then create the Volume Group, in my case I called it lvm-raid:

# pvcreate /dev/md0
# vgcreate lvm-raid /dev/md0
The default value for the physical extent size can be too low for a large RAID array. In those cases you'll need to specify the -s option with a larger than default physical extent size. The default is only 4MB as of the version in Fedora Core 5. The maximum number of physical extents is approximately 65k so take your maximum volume size and divide it by 65k then round it to the next nice round number. For example, to successfully create a 550G RAID let's figure that's approximately 550,000 megabytes and divide by 65,000 which gives you roughly 8.46. Round it up to the next nice round number and use 16M (for 16 megabytes) as the physical extent size and you'll be fine:
# vgcreate -s 16M <volume group name>
Ok, you've created a blank receptacle but now you have to tell how many Physical Extents from the physical device (/dev/md0 in this case) will be allocated to this Volume Group. In my case I wanted all the data from /dev/md0 to be allocated to this Volume Group. If later I wanted to add additional space I would create a new RAID array and add that physical device to this Volume Group.

To find out how many PEs are available to me use the vgdisplay command to find out how many are available and now I can create a Logical Volume using all (or some) of the space in the Volume Group. In my case I call the Logical Volume lvm0.

# vgdisplay lvm-raid
	.
	.
   Free  PE / Size       57235 / 223.57 GB
# lvcreate -l 57235 lvm-raid -n lvm0
In the end you will have a device you can use very much like a plain 'ol partition called /dev/lvm-raid/lvm0. You can now check on the status of the Logical Volume with the lvdisplay command. The device can then be used to to create a filesystem on.
# lvdisplay /dev/lvm-raid/lvm0 
  --- Logical volume ---
  LV Name                /dev/lvm-raid/lvm0
  VG Name                lvm-raid
  LV UUID                FFX673-dGlX-tsEL-6UXl-1hLs-6b3Y-rkO9O2
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                223.57 GB
  Current LE             57235
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:2
# mkfs.ext3 /dev/lvm-raid/lvm0
	.
	.
# mount /dev/lvm-raid/lvm0 /mnt
# df -h /mnt
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/lvm--raid-lvm0
                       224G   93M  224G   1% /mnt

Handling a Drive Failure

As everything eventually does break (some sooner than others) a drive in the array will fail. It is a very good idea to run smartd on all drives in your array (and probably ALL drives period) to be notified of a failure or a pending failure as soon as possible. You can also manually fail a partition, meaning to take it out of the RAID array, with the following command:
# /sbin/mdadm /dev/md0 -f /dev/hdb1
mdadm: set /dev/hdb1 faulty in /dev/md0

Once the system has determined a drive has failed or is otherwise missing (you can shut down and pull out a drive and reboot to similate a drive failure or use the command to manually fail a drive above it will show something like this in mdadm:

# /sbin/mdadm --detail /dev/md0
     Update Time : Wed Jun 15 11:30:59 2005
           State : clean, degraded
  Active Devices : 2
 Working Devices : 2
  Failed Devices : 1
   Spare Devices : 0
	.
	.
     Number   Major   Minor   RaidDevice State
        0       3        1        0      active sync   /dev/hda1
        1       0        0        -      removed
        2      33       65        2      active sync   /dev/hdf1
You'll notice in this case I had /dev/hdb fail. I replaced it with a new drive with the same capacity and was able to add it back to the array. The first step is to partition the new drive just like when first creating the array. Then you can simply add the partition back to the array and watch the status as the data is rebuilt onto the newly replace drive.
# /sbin/mdadm /dev/md0 -a /dev/hdb1
# /sbin/mdadm --detail /dev/md0
     Update Time : Wed Jun 15 12:11:23 2005
           State : clean, degraded, recovering
  Active Devices : 2
 Working Devices : 3
  Failed Devices : 0
   Spare Devices : 1

          Layout : left-symmetric
      Chunk Size : 64K

  Rebuild Status : 2% complete
During the rebuild process the system performance may be somewhat impacted but the data should remain in-tact.

Expanding an Array/Filesytem

I'm told it's now possible to expand the size of a RAID array much as you could on a commercial array such as the NetApp. The link below describes the procedure. I have yet to try it but it looks promising:
Growing a RAID5 array - http://scotgate.org/?p=107

Recommended Links

RAID - Wikipedia, the free encyclopedia

mdadm(8) manage MD devices aka Software Raid - Linux man page

The Software-RAID HOWTO Jakob Østergaard & Emilio Bueso (v1.1, 2004-06-03 )

Ubuntu Fake Raid Howto

Runtime Software RAID Reconstructor Tutorial

RAID0 Implementation Under Linux

Linux Software RAID - A Belt and a Pair of Suspenders Linux Magazine

InformIT Managing Storage in Red Hat Enterprise Linux 5 Understanding RAID:

How To Set Up Software RAID1 On A Running LVM System (Incl. GRUB Configuration) (Fedora 8) HowtoForge - Linux Howtos and Tutorials



Copyright © 1996-2009 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. Submit comments This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License(OPL). Site uses AdSense so you need to be aware of Google privacy policy. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Disclaimer:

Last modified: September 11, 2009