
RAID Levels


RAID (Redundant Array of Independent Disks) is a set of methods (or levels) for storing data on a set of disks to improve performance, reliability, or both.

RAID levels 0, 1 and 5 are most common. RAID is most commonly implemented in hardware controllers. A RAID controller appears to a host computer just as any other storage device would. Behind the hardware RAID controller is a group of disk drives. Depending on which RAID level the controller is configured for, it will store data on the disks in different ways.

The common RAID levels have the following characteristics: RAID 0 (striping) spreads data across drives for maximum performance with no redundancy; RAID 1 (mirroring) duplicates all data on a second drive; RAID 5 (striping with distributed parity) tolerates the loss of any single drive. A fuller comparison appears in the table in the Reference section below.

Software RAID vs Hardware RAID

The RAID levels can also be implemented in the host's software on any collection of individual disks. RAID-0 (striping) and RAID-1 (mirroring) are the simplest to implement in a host driver and there are many "software RAID" implementations.

Recovery procedures are the most difficult aspect of software RAID. Performance of software RAID may be lower than that of hardware RAID for a couple of reasons. Software RAID levels 1 and higher often require more data to be transferred between the host and storage than hardware RAID would need. For example, the host must issue two writes to separate disks to maintain a mirror, whereas the data would be written only once to a hardware RAID device. In addition to the extra I/O needed to maintain parity, the RAID-5 XOR calculations consume extra CPU cycles.
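As a concrete illustration of that parity overhead, here is a minimal sketch of RAID-5-style parity in Python. It is illustrative only: the block contents are made up, and real arrays rotate parity across drives rather than dedicating one.

    def xor_blocks(blocks):
        """XOR equal-length byte blocks together (the RAID-5 parity operation)."""
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                out[i] ^= byte
        return bytes(out)

    data = [b"AAAA", b"BBBB", b"CCCC"]   # blocks on three data drives
    parity = xor_blocks(data)            # extra block the host must compute and write

    # Reconstructing a lost block: XOR the surviving blocks with the parity.
    lost = 1
    survivors = [block for i, block in enumerate(data) if i != lost]
    assert xor_blocks(survivors + [parity]) == data[lost]

Every write to the array implies recomputing and rewriting parity like this, which is exactly the extra I/O and CPU work described above.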

Hardware RAID

The hardware-based system manages the RAID subsystem independently from the host and presents to the host only a single disk per RAID array. This way the host doesn't have to be aware of the RAID subsystem(s).

Software RAID

A special and fairly complex driver is needed to implement a software RAID solution. This makes it more error-prone and less compatible than hardware-based solutions, especially Fibre Channel-based ones, but it is cheaper.

Just like any other application, software-based arrays occupy host system memory, consume CPU cycles and are operating system dependent. By contending with other applications that are running concurrently for host CPU cycles and memory, software-based arrays degrade overall server performance. Also, unlike hardware-based arrays, the performance of a software-based array is directly dependent on server CPU performance and load.

Except for the array functionality, hardware-based RAID schemes have very little in common with software-based implementations. Since the host CPU can execute user applications while the array adapter's processor simultaneously executes the array functions, the result is true hardware multi-tasking. Hardware arrays also do not occupy any host system memory, nor are they operating system dependent.

Hardware arrays are also highly fault tolerant. Since the array logic resides in hardware, no software is required to boot. Some software arrays, by contrast, will fail to boot if the boot drive in the array fails: an array implemented in software is only functional once the array software has been read from disk and is memory-resident. What happens if the server can't load the array software because the disk that contains it has failed? For this reason, software-based implementations commonly require a separate boot drive, which may or may not be included in the array.

Old News ;-)

Andre Molyneux's Weblog

Since RAID0 improves performance, and RAID1 provides redundancy, someone came up with the idea to combine them. Fast and reliable. Two great tastes that taste great together!

When combining these two types of 'logical' devices there's a choice to be made -- do you mirror two stripes (RAID 0+1), or do you stripe across multiple mirrors (RAID 1+0)? There are pros and cons to each approach: mirroring two stripes is administratively simple, but a single disk failure takes down that entire stripe, leaving the array with no redundancy until the whole side is rebuilt. Striping across mirrors means a single disk failure degrades only one mirror pair, so the array can tolerate further failures in other pairs, and only the one failed disk has to be resynced (see the sketch below).
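A quick way to see the difference is to enumerate which two-disk failures each layout survives. The Python sketch below assumes a four-disk array with hypothetical disk numbering; it illustrates the general principle, not SVM or any particular product.

    from itertools import combinations

    def survives_raid10(failed):
        # RAID 1+0: stripe across mirror pairs (0,1) and (2,3).
        # Data survives as long as no mirror pair loses both members.
        return not ({0, 1} <= failed or {2, 3} <= failed)

    def survives_raid01(failed):
        # RAID 0+1: mirror of stripes (0,1) and (2,3).
        # Data survives as long as at least one whole stripe is intact.
        return not (failed & {0, 1}) or not (failed & {2, 3})

    for name, survives in (("RAID 1+0", survives_raid10), ("RAID 0+1", survives_raid01)):
        ok = sum(survives(set(f)) for f in combinations(range(4), 2))
        print(f"{name} survives {ok} of 6 possible two-disk failures")

With these assumptions RAID 1+0 survives 4 of the 6 possible two-disk failures while RAID 0+1 survives only 2, which is the greater resilience of RAID 1+0 that the SVM discussion below comes back to.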

[Nov 02, 2019] Raid-5 is obsolete if you use large drives, such as 2TB or 3TB disks. You should instead use raid-6 (two disks can fail)

Notable quotes:
"... RAID5 can survive a single drive failure. However, once you replace that drive, it has to be initialized. Depending on the controller and other things, this can take anywhere from 5-18 hours. During this time, all drives will be in constant use to re-create the failed drive. It is during this time that people worry that the rebuild would cause another drive near death to die, causing a complete array failure. ..."
"... If during a rebuild one of the remaining disks experiences BER, your rebuild stops and you may have headaches recovering from such a situation, depending on controller design and user interaction. ..."
"... RAID5 + a GOOD backup is something to consider, though. ..."
"... Raid-5 is obsolete if you use large drives , such as 2TB or 3TB disks. You should instead use raid-6 ..."
"... RAID 6 offers more redundancy than RAID 5 (which is absolutely essential, RAID 5 is a walking disaster) at the cost of multiple parity writes per data write. This means the performance will be typically worse (although it's not theoretically much worse, since the parity operations are in parallel). ..."
Oct 03, 2019 | hardforum.com

RAID5 can survive a single drive failure. However, once you replace that drive, it has to be initialized. Depending on the controller and other things, this can take anywhere from 5-18 hours. During this time, all drives will be in constant use to re-create the failed drive. It is during this time that people worry that the rebuild would cause another drive near death to die, causing a complete array failure.

This isn't the only danger. The problem with 2TB disks, especially if they are not 4K-sector disks, is that they have a relatively high BER for their capacity, so the likelihood of a bit error actually occurring and translating into an unreadable sector is something to worry about.

If during a rebuild one of the remaining disks experiences BER, your rebuild stops and you may have headaches recovering from such a situation, depending on controller design and user interaction.

So I would say that with modern high-BER drives you essentially lose one parity disk to the BER issue alone. Not everyone will agree with my analysis, but considering RAID5 with today's high-capacity drives 'safe' is open for debate.
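To put rough numbers on that worry, here is a back-of-the-envelope calculation in Python. The URE rate of one error per 1e14 bits is the commonly quoted consumer-drive figure, and the model assumes a RAID-5 rebuild must read every bit of every surviving drive; both are simplifying assumptions.

    URE_RATE = 1e-14   # unrecoverable read errors per bit, typical consumer-drive spec

    def p_rebuild_hits_ure(surviving_drives, drive_tb):
        """Probability that reading all surviving drives in full hits at least one URE."""
        bits_read = surviving_drives * drive_tb * 1e12 * 8
        return 1 - (1 - URE_RATE) ** bits_read

    # Five-drive RAID5 of 2TB disks: the rebuild reads the four survivors in full.
    print(f"{p_rebuild_hits_ure(4, 2):.0%}")   # roughly 47% under these assumptions

Under these assumptions nearly every second rebuild of such an array would stumble on an unreadable sector, which is why RAID6 is treated as the minimum for large drives later in this thread.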

RAID5 + a GOOD backup is something to consider, though.

  1. So you're saying BER is the error count that 'escapes' the ECC correction? I do not believe that is correct, but I'm open to good arguments or links.

    As I understand it, the BER is what prompts bad sectors: when the number of errors exceeds the ECC's correcting ability, you get an unrecoverable sector (Current Pending Sector in SMART output).

    Also these links are interesting in this context:

    http://blog.econtech.selfip.org/200...s-not-fully-readable-a-lawsuit-in-the-making/

    The short story first: Your consumer level 1TB SATA drive has a 44% chance that it can be completely read without any error. If you run a RAID setup, this is really bad news because it may prevent rebuilding an array in the case of disk failure, making your RAID not so Redundant.
    Not sure on the numbers the article comes up with, though.

    Also this one is interesting:
    http://lefthandnetworks.typepad.com/virtual_view/2008/02/what-does-data.html

    BER simply means that while reading your data from the disk drive you will get an average of one non-recoverable error in so many bits read, as specified by the manufacturer.
    Rebuilding the data on a replacement drive with most RAID algorithms requires that all the other data on the other drives be pristine and error free. If there is a single error in a single sector, then the data for the corresponding sector on the replacement drive cannot be reconstructed, and therefore the RAID rebuild fails and data is lost. The frequency of this disastrous occurrence is derived from the BER. Simple calculations will show that the chance of data loss due to BER is much greater than all other reasons combined.
    These links do suggest that BER works to produce unrecoverable sectors, not to 'escape' them as 'undetected' bad sectors, if I understood you correctly.
  1. parityOCP said:
    That guy's a bit of a scaremonger to be honest. He may have a point with consumer drives, but the article is sensationalised to a certain degree. However, there are still a few outfits that won't go past 500GB/drive in an array (even with enterprise drives), simply to reduce the failure window during a rebuild.
    Why is he a scaremonger? He is correct. Have you read his article? In fact, he has copied his argument from Adam Leventhal(?), who was one of the ZFS developers, I believe.

    Adam's argument goes like this:
    Disks are getting larger all the time; in fact, storage increases exponentially. At the same time, bandwidth is not increasing nearly as fast - we are still at about 100MB/sec even after decades. So bandwidth has increased maybe 20x over those decades, while storage has increased from 10MB to 3TB = 300,000 times.

    The trend is clear. In the future when we have 10TB drives, they will not be much faster than today. This means that repairing a raid with 3TB disks today will take several days, maybe even a week. With 10TB drives, it will take several weeks, maybe a month.

    Repairing a raid stresses the other disks heavily, which means they can break too. Experienced sysadmins report that this happens quite often during a repair. Maybe because those disks come from the same batch, they share the same weakness. Some sysadmins therefore mix disks from different vendors and batches.

    Hence, I would not want to run a raid with 3TB disks and only use raid-5. During those days of rebuild, if just one more disk crashes you have lost all your data.

    Hence, that article is correct, and he is not a scaremonger. Raid-5 is obsolete if you use large drives, such as 2TB or 3TB disks. You should instead use raid-6 (two disks can fail). That is the conclusion of the article: use raid-6 with large disks, forget raid-5. This is true, and not scaremongery.

    In fact, ZFS therefore has something called raidz3 - which means that three disks can fail without problems. To the OT: no, raid-5 is not safe. Neither is raid-6, because neither of them can always repair or even detect corrupted data. There are cases when they don't even notice that you got corrupted bits. See my other thread for more information about this. That is the reason people are switching to ZFS - which always CAN detect and repair those corrupted bits. I suggest you sell your hardware raid card and use ZFS, which requires no special hardware. ZFS just uses JBOD.

    Here are research papers on raid-5, raid-6 and ZFS and corruption:
    http://hardforum.com/showpost.php?p=1036404173&postcount=73

  1. brutalizer said:
    The trend is clear. In the future when we have 10TB drives, they will not be much faster than today. This means that repairing a raid with 3TB disks today will take several days, maybe even a week. With 10TB drives, it will take several weeks, maybe a month.
    While I agree with the general claim that the larger HDDs (1.5, 2, 3TBs) are best used in RAID 6, your claim about rebuild times is way off.

    I think it is not unreasonable to assume that the 10TB drives will be able to read and write at 200 MB/s or more. We already have 2TB drives with 150MB/s sequential speeds, so 200 MB/s is actually a conservative estimate.

    10e12/200e6 = 50000 secs = 13.9 hours. Even if there is 100% overhead (half the throughput), that is less than 28 hours to do the rebuild. It is a long time, but it is nowhere near a month! Try to ground your claims in reality.

    And you have again made the false claim that "ZFS - which always CAN detect and repair those corrupted bits". ZFS can usually detect corrupted bits, and can usually correct them if you have duplication or parity, but nothing can always detect and repair. ZFS is safer than many alternatives, but nothing is perfectly safe. Corruption can and has happened with ZFS, and it will happen again.
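The rebuild-time arithmetic in that rebuttal is easy to sanity-check. The small Python helper below is only a model of the quoted numbers; real rebuild rates also depend on controller behavior and foreground I/O, which the overhead factor crudely stands in for.

    def rebuild_hours(capacity_tb, mb_per_s, overhead=0.0):
        """Hours to read or write a whole drive at a sustained rate, with optional slowdown."""
        seconds = capacity_tb * 1e12 / (mb_per_s * 1e6) * (1 + overhead)
        return seconds / 3600

    print(rebuild_hours(10, 200))               # ~13.9 hours, matching the post
    print(rebuild_hours(10, 200, overhead=1.0)) # ~27.8 hours with 100% overhead

Days-long rebuilds are plausible for busy arrays rebuilt at a fraction of sequential speed, but a month would require sustained throughput far below anything modern drives deliver.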

Is RAID5 safe with Five 2TB Hard Drives? | [H]ard|Forum

https://hardforum.com/threads/is-raid5-safe-with-five-2tb-hard-drives.1560198/


RAID 5 Data Recovery How to Rebuild a Failed RAID 5 - YouTube

RAID 5 vs RAID 10: Recommended RAID For Safety and ...

https://www.cyberciti.biz/tips/raid5-vs-raid-10-safety-performance.html

RAID 6 offers more redundancy than RAID 5 (which is absolutely essential, RAID 5 is a walking disaster) at the cost of multiple parity writes per data write. This means the performance will typically be worse (although it's not theoretically much worse, since the parity operations are done in parallel).

SVM specifics

So, does SVM do RAID 0+1 or RAID 1+0? The answer is, "Yes." So it gives you a choice between the two? The answer is "No."

Obviously further explanation is necessary...

In SVM, mirror devices cannot be created from "bare" disks. You are required to create the mirror on top of another type of SVM metadevice, known as a concat/stripe*. SVM combines concatenations and stripes into a single metadevice type, in which one or more stripes are concatenated together. When used to build a mirror these concat/stripe logical devices are known as submirrors. If you want to expand the size of a mirror device you can do so by concatenating additional stripe(s) onto the concat/stripe devices that are serving as submirrors.

So, in SVM, you are always required to set up a stripe (concat/stripe) in order to create a mirror. On the surface this makes it appear that SVM does RAID 0+1. However, once you understand a bit about the SVM mirror code, you'll find RAID 1+0 lurking under the covers.

SVM mirrors are logically divided up into regions. The state of each mirror region is recorded in state database replicas** stored on disk. By individually recording the state of each region in the mirror, SVM can be smart about how it performs a resync. Following a disk failure or an unusual event (e.g., a power failure after the first side of a mirror has been written but before the matching write to the second side completes), SVM can determine which regions are out of sync and synchronize only them, not the entire mirror. This is known as an optimized resync.

The optimized resync mechanisms allow SVM to gain the redundancy benefits of RAID 1+0 while keeping the administrative benefits of RAID 0+1. If one of the drives in a concat/stripe device fails, only those mirror regions that correspond to data stored on the failed drive will lose redundancy. The SVM mirror code understands the layout of the concat/stripe submirrors and can therefore determine which resync regions reside on which underlying devices. For all regions of the mirror not affected by the failure, SVM will continue to provide redundancy, so a second disk failure won't necessarily prove fatal.

So, in a nutshell, SVM provides a RAID 0+1 style administrative interface but effectively implements RAID 1+0 functionality. Administrators get the best of each type: the relatively simple administration of RAID 0+1 plus the greater resilience of RAID 1+0 in the case of multiple device failures.
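The region-tracking idea behind optimized resync can be sketched in a few lines of Python. This is a toy model with a made-up region size, not SVM code; SVM's real bookkeeping lives in the state database replicas described below.

    REGION = 1024 * 1024   # bytes per resync region; chosen arbitrarily for the sketch

    class Mirror:
        def __init__(self):
            self.dirty = set()   # regions that may differ between the submirrors

        def write(self, offset, length):
            first = offset // REGION
            last = (offset + length - 1) // REGION
            # Mark the regions dirty BEFORE writing, so a crash between the
            # two submirror writes leaves a record of what may be out of sync.
            self.dirty.update(range(first, last + 1))
            # ... write the data to each submirror here ...
            self.dirty.difference_update(range(first, last + 1))

        def regions_to_resync(self):
            # After a crash, only the regions still marked dirty need to be
            # copied from one submirror to the other, not the whole mirror.
            return sorted(self.dirty)

Because each region can also be mapped onto the underlying concat/stripe components, a failure of one disk invalidates only the regions stored on it, which is how SVM keeps redundancy on the rest of the mirror.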


* concat/stripe logical devices (metadevices)

The following example shows a concat/stripe metadevice that's serving as a submirror to a mirror metadevice. Note that the metadevice is a concatenation of three separate stripes:
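A schematic of such a layout might look like this (hypothetical device and slice names, not literal metastat output):

    d0: Mirror
        Submirror: d10

    d10: Concat/Stripe (submirror of d0)
        Stripe 0: c0t0d0s0  c0t1d0s0
        Stripe 1: c0t2d0s0  c0t3d0s0
        Stripe 2: c0t4d0s0  c0t5d0s0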

** State database replicas

SVM stores configuration and state information in a 'state database' in memory. Copies of this state database are stored on disk, where they are referred to as state database replicas. The primary purpose of the state database replicas is to provide non-volatile copies of the state database so that the SVM configuration is persistent across reboots. A secondary purpose of the replicas is to provide a 'scratch pad' to keep track of mirror region states.

Recommended Links

RAID - Wikipedia, the free encyclopedia

How To Set Up Software RAID1 On A Running LVM System (Incl. GRUB Configuration) (Fedora 8) HowtoForge - Linux Howtos and Tutorials

Redundant Arrays of Independent Disks - Computerworld

Sys Admin v12, i06 Introduction to RAID

Reference

Raid Recovery Comparison Chart and Raid Types

RAID 0 (minimum 2 drives)
Description: Data striping without redundancy
Strengths: Highest performance
Weaknesses: No data protection; if one drive fails, all data is lost

RAID 1 (minimum 2 drives)
Description: Disk mirroring
Strengths: Very high performance; very high data protection; very minimal penalty on write performance
Weaknesses: High redundancy cost overhead; because all data is duplicated, twice the storage capacity is required

RAID 2 (not used in LANs)
Description: Bit-level data striping with Hamming-code error correction
Strengths: None today; previously used for error correction (known as Hamming Code) in disk drives before the use of embedded error correction
Weaknesses: No practical use; the same performance can be achieved by RAID 3 at lower cost

RAID 3 (minimum 3 drives)
Description: Byte-level data striping with a dedicated parity drive
Strengths: Excellent performance for large, sequential data requests
Weaknesses: Not well-suited for transaction-oriented network applications; the single parity drive does not support multiple simultaneous read and write requests

RAID 4 (minimum 3 drives; not widely used)
Description: Block-level data striping with a dedicated parity drive
Strengths: Data striping supports multiple simultaneous read requests
Weaknesses: Write requests suffer from the same single-parity-drive bottleneck as RAID 3; RAID 5 offers equal data protection and better performance at the same cost

RAID 5 (minimum 3 drives)
Description: Block-level data striping with distributed parity
Strengths: Best cost/performance for transaction-oriented networks; very high performance and very high data protection; supports multiple simultaneous reads and writes; can also be optimized for large, sequential requests
Weaknesses: Write performance is slower than RAID 0 or RAID 1

RAID 0/1 (minimum 4 drives)
Description: Combination of RAID 0 (data striping) and RAID 1 (mirroring)
Strengths: Highest performance together with data protection (can tolerate multiple drive failures)
Weaknesses: High redundancy cost overhead; because all data is duplicated, twice the storage capacity is required



