Softpanorama
May the source be with you, but remember the KISS principle ;-)
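RAID-1 volumes (mirrors) in Solaris can be created using two methods. On the command line, use the metainit command, whose syntax is:
metainit mirror_volume -m submirror [read_options] [write_options] [pass_num]
where mirror_volume is the name of the mirror to create, submirror is the existing concat/stripe volume that becomes its first submirror, and the optional read options, write options, and pass number set the mirror's read policy, write policy, and the pass in which the mirror is resynchronized at boot.
As an example, let's create the mirrored volume d10 as a one-way mirror, with the existing volume d11 as its only submirror. The secondary submirror is attached later.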
# /usr/sbin/metainit d10 -m d11
d10: Mirror is setup
After creating the mirror, you need to update the /etc/vfstab file so that the file system is mounted from the volume (such as /dev/md/dsk/d10) instead of the slice (such as /dev/dsk/c#t#d#s#). For the root (/) file system, the metaroot command makes this change for you. Its syntax is:
metaroot device
where device specifies the mirrored volume used for the root file system.
The following example shows that the /etc/vfstab file has been updated by the metaroot command to point to the RAID-1 mirrored metadevice.
# metaroot d10 && grep md /etc/vfstab
/dev/md/dsk/d10  /dev/md/rdsk/d10  /  ufs  1  no  -
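To confirm the change from a script rather than by eye, you can print the device that /etc/vfstab mounts on /. The function below is a minimal sketch: the name vfstab_root_device is hypothetical, and it assumes the standard whitespace-separated vfstab layout with the mount point in the third field.

```shell
# vfstab_root_device FILE: print the "device to mount" field (field 1)
# of the non-comment entry whose mount point (field 3) is "/".
# Hypothetical helper; assumes the standard vfstab field layout.
vfstab_root_device() {
    awk '$1 !~ /^#/ && $3 == "/" { print $1 }' "$1"
}

# Usage on a live system:
#   vfstab_root_device /etc/vfstab
# After "metaroot d10" this should report /dev/md/dsk/d10.
```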
In addition to updating the root (/) entry in the /etc/vfstab file, the metaroot command updates the /etc/system file, adding forceload statements that load the kernel modules supporting logical volumes and a rootdev entry that points the kernel at the root metadevice. For example:
# tail /etc/system
forceload: misc/md_hotspares
forceload: misc/md_sp
forceload: misc/md_stripe
forceload: misc/md_mirror
forceload: drv/pcipsy
forceload: drv/simba
forceload: drv/glm
forceload: drv/sd
rootdev:/pseudo/md@0:0,10,blk
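Both metaroot changes to /etc/system can be checked the same way. In the sketch below (helper names hypothetical), root_md_unit extracts the number from the rootdev line and has_md_mirror_forceload checks for the mirror module. It assumes, as the example above suggests, that the middle number in rootdev:/pseudo/md@0:0,10,blk is the metadevice unit, so 10 corresponds to d10.

```shell
# root_md_unit FILE: print the metadevice unit number from a line such as
#   rootdev:/pseudo/md@0:0,10,blk   (assumed mapping: 10 -> d10)
root_md_unit() {
    sed -n 's|^rootdev:/pseudo/md@0:0,\([0-9][0-9]*\),blk$|\1|p' "$1"
}

# has_md_mirror_forceload FILE: succeed if the mirror module is force-loaded
has_md_mirror_forceload() {
    grep -q '^forceload: misc/md_mirror$' "$1"
}

# Usage:
#   root_md_unit /etc/system
#   has_md_mirror_forceload /etc/system && echo "md_mirror is force-loaded"
```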
You must reboot the system before attaching the secondary submirror, for example with the init command:
# init 6
After the reboot, attach the secondary submirror by using the metattach command:
# metattach d10 d12
d10: submirror d12 is attached
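Note: Attach the secondary submirror with the metattach command. If it is not attached this way (for example, if both submirrors are specified when the mirror is created), no resynchronization occurs. As a result, data could become corrupted, because SVM assumes that both sides of the mirror are identical and can be used interchangeably.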
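The procedure so far can be collected into a dry-run sketch that prints the commands in order instead of running them, which makes it easy to review before touching a production system. The function name is hypothetical, and so are the slice assignments: the primary root slice c0t0d0s0 and the secondary slice c1t2d0s1 are borrowed from the examples on this page. The sketch assumes the state database replicas already exist, and it uses metainit -f for the primary submirror because that slice holds the mounted root file system.

```shell
# root_mirror_plan MIRROR SUB1 SLICE1 SUB2 SLICE2
# Dry run: prints (does not execute) the commands that mirror the root
# file system. Assumes state database replicas already exist (metadb).
root_mirror_plan() {
    mirror=$1 sub1=$2 slice1=$3 sub2=$4 slice2=$5
    # one-slice concat/stripe volumes that will act as submirrors;
    # -f is needed because slice1 holds the mounted root file system
    echo "metainit -f $sub1 1 1 $slice1"
    echo "metainit $sub2 1 1 $slice2"
    # one-way mirror on the primary submirror only
    echo "metainit $mirror -m $sub1"
    # update /etc/vfstab and /etc/system, then reboot
    echo "metaroot $mirror"
    echo "init 6"
    # after the reboot: attaching starts a full resync onto sub2
    echo "metattach $mirror $sub2"
}

root_mirror_plan d10 d11 c0t0d0s0 d12 c1t2d0s1
```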
If you mirror your root (/) file system, you also need to record the path to the alternate boot device so that it can be added to the boot-device PROM variable. In the following example, you determine this path by using the ls -l command on the slice that is being attached as the secondary submirror to the root (/) mirror.
# ls -l /dev/dsk/c1t2d0s1
lrwxrwxrwx 1 root root 46 Feb 28 08:58 /dev/dsk/c1t2d0s1 -> ../../devices/pci@1f,0/pci@1/scsi@4,1/sd@2,0:b
Record the path that follows the /devices directory:
/pci@1f,0/pci@1/scsi@4,1/sd@2,0:b
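Copying the path by hand is error-prone, so here is a small filter (a hypothetical helper, not an SVM command) that strips everything up to the last /devices in the symbolic-link target printed by ls -l.

```shell
# devices_path: read one line of `ls -l /dev/dsk/...` output on stdin and
# print the physical path that follows /devices in the link target.
devices_path() {
    sed 's|^.*-> .*/devices||'
}

# Usage:
#   ls -l /dev/dsk/c1t2d0s1 | devices_path
# For the example above this prints /pci@1f,0/pci@1/scsi@4,1/sd@2,0:b
```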
Note: With the older disk controllers in the Ultra 5 and Ultra 10, the device path in the /devices directory might differ from the one in the OpenBoot PROM. In these cases, follow the OpenBoot PROM entries. If you ignore the difference and use the entry from the /devices directory, you get the error:
can’t open boot device
To get the system to boot automatically from the alternate boot device in the event of a primary root submirror failure, complete the following steps:
ok nvalias backup_root /pci@1f,0/pci@1/scsi@4,1/disk@2,0:b
ok printenv boot-device
boot-device= disk net
ok setenv boot-device disk backup_root net
boot-device= disk backup_root net
ok boot backup_root
If the system boots successfully from backup_root, then in the event of a primary root disk failure it will automatically try to boot from the secondary submirror.
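The OpenBoot values can also be prepared from the running system. The sketch below is hypothetical and rests on one assumption drawn from this page's example: that the OpenBoot PROM names the disk node disk@ where /devices shows sd@ (compare .../sd@2,0:b with .../disk@2,0:b above). Check the result against the PROM's own device tree before relying on it. obp_path rewrites the path, and add_boot_alias inserts the alias right after the first entry of the current boot-device value.

```shell
# obp_path PATH: rewrite a /devices path into the form used in the example
# nvalias command (assumption: sd@ -> disk@; verify at the ok prompt)
obp_path() {
    printf '%s\n' "$1" | sed 's|/sd@|/disk@|'
}

# add_boot_alias CURRENT ALIAS: insert ALIAS after the first boot device,
# so "disk net" with backup_root becomes "disk backup_root net"
add_boot_alias() {
    printf '%s\n' "$1" | awk -v name="$2" '{ $1 = $1 " " name; print }'
}

path=$(obp_path /pci@1f,0/pci@1/scsi@4,1/sd@2,0:b)
echo "ok nvalias backup_root $path"
echo "ok setenv boot-device $(add_boot_alias 'disk net' backup_root)"
```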
To unmirror the root (/) file system, complete the following steps:
1. Run the metastat command on the mirror to verify that submirror 0 is in the Okay state, for example:
# metastat d10
2. Run the metadetach command on the mirror to make a one-way mirror.
# metadetach d10 d12
d10: submirror d12 is detached
3. Because this is a root (/) file system mirror, run the metaroot command again in order to update the /etc/vfstab and /etc/system files.
# metaroot /dev/dsk/c0t0d0s0 && grep c0t0d0s0 /etc/vfstab
4. Reboot the system:
# reboot
5. Run the metaclear command to clear the mirror and submirrors. The -r option recursively deletes the specified metadevices and hot spare pools, along with the metadevices associated with them (here, the remaining submirror d11). For example:
# metaclear -r d10
d10: Mirror is cleared
d11: Concat/Stripe is cleared
# metaclear d12
d12: Concat/Stripe is cleared
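Step 1 matters because once d12 is detached, submirror 0 holds the only copy of the data. The check below is a hypothetical sketch that reads metastat output on stdin. It assumes the usual metastat layout for a mirror, in which each "Submirror N: dNN" line is followed by a "State:" line, and it succeeds only if submirror 0 is Okay.

```shell
# submirror0_okay: read `metastat MIRROR` output on stdin; exit 0 only if
# the first "State:" line after "Submirror 0:" reads "State: Okay".
submirror0_okay() {
    awk '/Submirror 0:/ { want = 1; next }
         want && /State:/ { ok = ($2 == "Okay"); exit }
         END { exit (ok ? 0 : 1) }'
}

# Usage on a live system (guarding step 2):
#   metastat d10 | submirror0_okay && metadetach d10 d12
```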
Andre Molyneux's Weblog
RAID 0 (stripe)
+--------+  +--------+
| Chunk 1|  |Chunk 2 |
+--------+  +--------+
| Chunk 3|  |Chunk 4 |
+--------+  +--------+
| Chunk 5|  |Chunk 6 |
+--------+  +--------+
  Disk A      Disk B
     |           |
     +--Stripe---+
In this case the term RAID is a misnomer, as no redundancy is gained. The address space is 'interlaced' across the disks, improving performance for I/Os larger than the interlace (chunk) size, because a single I/O is spread across multiple disks. In the example above, with an interlace size of 16k, the first 16k of data would reside on Disk A, the second 16k on Disk B, the third 16k on Disk A, and so on. The interlace size must be chosen when the logical device is created and can't be changed later without recreating the logical device from scratch.
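The interlace arithmetic can be made concrete. The sketch below (function names hypothetical) maps a logical byte offset to a disk index (0 for Disk A, 1 for Disk B) and to the byte offset on that disk, for a plain stripe of equal-sized disks.

```shell
# stripe_disk OFFSET INTERLACE NDISKS: index of the disk holding OFFSET
stripe_disk() {
    echo $(( ($1 / $2) % $3 ))
}

# stripe_disk_offset OFFSET INTERLACE NDISKS: byte offset on that disk
# (full rows of chunks before it, plus the position inside its chunk)
stripe_disk_offset() {
    echo $(( ($1 / ($2 * $3)) * $2 + $1 % $2 ))
}

# 16k interlace across two disks, as in the example above
for off in 0 16384 32768 40000; do
    echo "offset $off -> disk $(stripe_disk $off 16384 2)," \
         "disk offset $(stripe_disk_offset $off 16384 2)"
done
```

A 64k I/O starting at offset 0 covers four 16k chunks and therefore touches both disks, which is where the performance gain comes from.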
RAID 1 (mirror)
+--------+       +--------+
| Copy A |  <->  | Copy B |
+--------+       +--------+
  Disk A           Disk B
     |                |
     +-----Mirror-----+
The advantages are that you can lose either disk and the
data will still be accessible, and reads can be alternated between the two
disks to improve performance. The drawbacks are that you've doubled your
storage costs and incurred additional overhead by having to generate two
writes to physical devices for every one that's done to the logical mirror
device.
Concatenation
+--------+
| Disk A |
+--------+
| Disk B |
+--------+
| Disk C |
+--------+
  Concat
A concatenation aggregates several smaller physical devices into one large logical device. Unlike a stripe, the address space isn't interlaced across the underlying devices. This means there's no performance gain from using a concatenated device.
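For contrast, a concatenation maps addresses by walking through the devices in order. In this hypothetical sketch, sizes are in blocks and the function prints the device index and the offset within that device.

```shell
# concat_map OFFSET SIZE...: print "INDEX OFFSET_ON_DEVICE" for a
# concatenation of devices with the given sizes; fail past the end.
concat_map() {
    off=$1; shift
    i=0
    for size in "$@"; do
        if [ "$off" -lt "$size" ]; then
            echo "$i $off"
            return 0
        fi
        # skip this whole device and move on to the next one
        off=$((off - size))
        i=$((i + 1))
    done
    return 1
}

concat_map 150 100 100 100
```

Consecutive offsets stay on the same device until it is full, so sequential I/O does not spread across disks the way it does on a stripe.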
Since RAID0 improves performance, and RAID1 provides redundancy, someone came up with the idea to combine them. Fast and reliable. Two great tastes that taste great together!
When combining these two types of 'logical' devices there's a choice to be made: do you mirror two stripes, or do you stripe across multiple mirrors? There are pros and cons to each approach.
RAID 0+1 (a mirror of stripes)
+------------------------------------+ +------------------------------------+
| +--------+ +--------+ +--------+ | | +--------+ +--------+ +--------+ |
| | Chunk 1| |Chunk 2 | |Chunk 3 | | | | Chunk 1| |Chunk 2 | |Chunk 3 | |
| +--------+ +--------+ +--------+ | | +--------+ +--------+ +--------+ |
| | Chunk 4| |Chunk 5 | |Chunk 6 | | | | Chunk 4| |Chunk 5 | |Chunk 6 | |
| +--------+ +--------+ +--------+ |<--->| +--------+ +--------+ +--------+ |
| | Chunk 7| |Chunk 8 | |Chunk 9 | | | | Chunk 7| |Chunk 8 | |Chunk 9 | |
| +--------+ +--------+ +--------+ | | +--------+ +--------+ +--------+ |
| Disk A Disk B Disk C | | Disk D Disk E Disk F |
+------------------------------------+ +------------------------------------+
Stripe 1 Stripe 2
| |
+-----------------------------------Mirror--------------------------------------+
Advantage: Simple
administrative model. Issue one command to create the first stripe, a
second command to create the second stripe, and a third command to mirror
them. Three commands and you're done, regardless of the number of disks in
the configuration.
Disadvantage: An error on any one of the disks kills redundancy for
all disks. For instance, a failure on Disk B above 'breaks' the Stripe 1
side of the mirror. As a result, should Disk D, E, or F fail as well, the
entire mirror becomes unusable.
RAID 1+0 (a stripe of mirrors)
+-----------------+ +-----------------+ +-----------------+
| Chunk 1 | | Chunk 2 | | Chunk 3 |
+-----------------+ +-----------------+ +-----------------+
| Chunk 4 | | Chunk 5 | | Chunk 6 |
+-----------------+ +-----------------+ +-----------------+
| Chunk 7 | | Chunk 8 | | Chunk 9 |
+-----------------+ +-----------------+ +-----------------+
Disk A <---> Disk B Disk C <---> Disk D Disk E <---> Disk F
Mirror 1 Mirror 2 Mirror 3
| |
+-----------------------------------------------------------+
Stripe
Advantage: A failure on one disk only impacts redundancy for the
chunks of the stripe that are located on that disk. For instance, a
failure on Disk B above only loses redundancy for every third chunk (1, 4,
7, etc.) Redundancy for the other stripe chunks is unaffected, so a second
disk failure could be tolerated as long as the second failure wasn't on
Disk A.
Disadvantage: More complicated from an administrative standpoint.
The administrator needs to issue one creation command per mirror, then a
command to stripe across the mirrors. The six-disk example above would
require four commands to create, while a twelve-disk configuration would
require seven commands.
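The difference in resilience can be checked by brute force. The sketch below models the six-disk layouts from the two diagrams (function names hypothetical): survives_01 for the mirror of stripes (A B C) and (D E F), and survives_10 for the stripe across the mirrors A/B, C/D, and E/F. Each succeeds if the data is still fully available with the given disks failed, and count_survivors tries every pair of failed disks.

```shell
# survives_01 DISK...: mirror of two stripes, (A B C) <-> (D E F).
# Data survives while at least one stripe has no failed disk.
survives_01() {
    s1=ok s2=ok
    for d in "$@"; do
        case $d in
            A|B|C) s1=broken ;;
            D|E|F) s2=broken ;;
        esac
    done
    [ "$s1" = ok ] || [ "$s2" = ok ]
}

# survives_10 DISK...: stripe across mirrors A<->B, C<->D, E<->F.
# Data survives while no mirror has lost both of its disks.
survives_10() {
    for pair in AB CD EF; do
        a=${pair%?} b=${pair#?}
        lost=0
        for d in "$@"; do
            if [ "$d" = "$a" ] || [ "$d" = "$b" ]; then
                lost=$((lost + 1))
            fi
        done
        [ "$lost" -lt 2 ] || return 1
    done
    return 0
}

# count_survivors FUNC: how many of the 15 two-disk failures FUNC survives
count_survivors() {
    fn=$1; n=0
    set -- A B C D E F
    for x in "$@"; do
        shift            # the inner loop only sees disks after x
        for y in "$@"; do
            if "$fn" "$x" "$y"; then n=$((n + 1)); fi
        done
    done
    echo "$n"
}

echo "RAID 0+1 survives $(count_survivors survives_01) of 15 double failures"
echo "RAID 1+0 survives $(count_survivors survives_10) of 15 double failures"
```

Running it reports 6 of 15 for RAID 0+1 and 12 of 15 for RAID 1+0: the mirror of stripes survives a second failure only on the side that is already broken, while the stripe of mirrors fails only when both halves of one mirror are lost.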
SVM specifics
So, does SVM do RAID 0+1 or RAID 1+0? The answer is, "Yes." So it gives you a choice between the two? The answer is "No."
Obviously further explanation is necessary...
In SVM, mirror devices cannot be created from "bare" disks. You are required to create the mirror on top of another type of SVM metadevice, known as a concat/stripe*. SVM combines concatenations and stripes into a single metadevice type, in which one or more stripes are concatenated together. When used to build a mirror these concat/stripe logical devices are known as submirrors. If you want to expand the size of a mirror device you can do so by concatenating additional stripe(s) onto the concat/stripe devices that are serving as submirrors.
So, in SVM, you are always required to set up a stripe (concat/stripe) in order to create a mirror. On the surface this makes it appear that SVM does RAID 0+1. However, once you understand a bit about the SVM mirror code, you'll find RAID 1+0 lurking under the covers.
SVM mirrors are logically divided up into regions. The state of each mirror region is recorded in state database replicas** stored on disk. By individually recording the state of each region in the mirror, SVM can be smart about how it performs a resync. Following a disk failure or an unusual event (e.g. a power failure that occurs after the first side of a mirror has been written to but before the matching write to the second side can be accomplished), SVM can determine which regions are out-of-sync and only synchronize them, not the entire mirror. This is known as an optimized resync.
The optimized resync mechanisms allow SVM to gain the redundancy benefits of RAID 1+0 while keeping the administrative benefits of RAID 0+1. If one of the drives in a concat/stripe device fails, only those mirror regions that correspond to data stored on the failed drive will lose redundancy. The SVM mirror code understands the layout of the concat/stripe submirrors and can therefore determine which resync regions reside on which underlying devices. For all regions of the mirror not affected by the failure, SVM will continue to provide redundancy, so a second disk failure won't necessarily prove fatal.
So, in a nutshell, SVM provides a RAID 0+1 style administrative interface but effectively implements RAID 1+0 functionality. Administrators get the best of each type, the relatively simple administration of RAID 0+1 plus the greater resilience of RAID 1+0 in the case of multiple device failures.
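In practice, the RAID 0+1 style interface means building two concat/stripe volumes and mirroring them. The dry-run sketch below prints those commands for a six-disk case; stripe_cmd is a hypothetical helper, the volume names d20 to d22 and all slice names are placeholders, and the 32k interlace is only an example value. As in the root-mirror procedure earlier on this page, the second submirror is added with metattach so that it gets resynchronized.

```shell
# stripe_cmd NAME INTERLACE SLICE...: print a metainit command that builds
# one stripe (a single row) across all of the given slices.
stripe_cmd() {
    name=$1 interlace=$2
    shift 2
    # metainit NAME <number of stripes> <width> <slices> -i <interlace>
    echo "metainit $name 1 $# $* -i $interlace"
}

# Dry run for a mirror of two three-way stripes (placeholder slices)
stripe_cmd d21 32k c1t0d0s0 c1t1d0s0 c1t2d0s0
stripe_cmd d22 32k c2t0d0s0 c2t1d0s0 c2t2d0s0
echo "metainit d20 -m d21"
echo "metattach d20 d22"
```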
* concat/stripe logical devices (metadevices)
The following example shows a concat/stripe metadevice that's serving as a submirror to a mirror metadevice. Note that the metadevice is a concatenation of three separate stripes:
- Stripe 0 is a 1-way stripe (so not really striped at all) on disk slice c1t11d0s0.
- Stripe 1 is a 1-way stripe on disk slice c1t12d0s0.
- Stripe 2 is a 2-way stripe with an interlace size of 32 blocks on disk slices c1t13d0s1 and c1t14d0s2.
d1: Submirror of d0
    State: Okay
    Size: 78003 blocks (38 MB)
    Stripe 0:
        Device      Start Block  Dbase  State  Reloc  Hot Spare
        c1t11d0s0          0     No     Okay   Yes
    Stripe 1:
        Device      Start Block  Dbase  State  Reloc  Hot Spare
        c1t12d0s0          0     No     Okay   Yes
    Stripe 2: (interlace: 32 blocks)
        Device      Start Block  Dbase  State  Reloc  Hot Spare
        c1t13d0s1          0     No     Okay   Yes
        c1t14d0s2          0     No     Okay   Yes
** State database replicas
SVM stores configuration and state information in a 'state database' in memory. Copies of this state database are stored on disk, where they are referred to as state database replicas. The primary purpose of the state database replicas is to provide non-volatile copies of the state database so that the SVM configuration is persistent across reboots. A secondary purpose of the replicas is to provide a 'scratch pad' to keep track of mirror region states.
Mirroring Disks with Solstice DiskSuite
BigAdmin - Submitted Tech Tip: Boot Disk Mirroring Using Solaris ...
Solaris Volume Manager Administration Guide
[PDF] Comprehensive Data Management Using Solaris™ Volume Manager Software
A Tool for Cold Mirroring of Solaris System Disks
Configuring Boot Disks With Solaris Volume Manager Software (October 2002)
-by Erik Vanden Meersch and Kristien Hens
This article is an update to the April 2002 Sun BluePrints OnLine article,
Configuring Boot Disks With Solstice DiskSuite Software. This article
focuses on the Solaris 9 Operating Environment, Solaris Volume Manager software,
and VERITAS Volume Manager 3.2 software. It describes how to partition and mirror
the system disk, and how to create and maintain a backup system disk. In
addition, this article presents technical arguments for the choices made, and
includes detailed runbooks.
Solstice DiskSuite (SDS) disk mirroring
Sync buffer size optimization. RAID-1 volumes can benefit from increasing the resync buffer size to 1 MB (2048 blocks of 512 bytes). To experiment, use the metasync -r 2048 command. To make the change permanent, add the following line to /etc/system:
set md_mirror:md_resync_bufsz = 2048
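The 1 MB figure is simply 2048 blocks x 512 bytes = 1,048,576 bytes. Because /etc/system is read at boot, a repeated edit should replace an existing setting rather than add a second line. The sketch below (a hypothetical helper; back up /etc/system before trying it) does that for whatever file is passed to it.

```shell
# set_resync_bufsz FILE BLOCKS: set md_mirror:md_resync_bufsz in FILE,
# replacing an existing setting instead of appending a duplicate line.
set_resync_bufsz() {
    file=$1
    line="set md_mirror:md_resync_bufsz = $2"
    if grep -q '^set md_mirror:md_resync_bufsz' "$file"; then
        sed "s|^set md_mirror:md_resync_bufsz.*|$line|" "$file" > "$file.new" &&
            mv "$file.new" "$file"
    else
        echo "$line" >> "$file"
    fi
}

echo "$((2048 * 512)) bytes"   # size of a 2048-block resync buffer
# Usage (as root, after backing up the file):
#   set_resync_bufsz /etc/system 2048
```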
Copyright © 1996-2008 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). Copyright in the original materials belongs to their respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.
Last modified: June 05, 2008