
RAID Arrays

© 2007 EMC Corporation. All rights reserved.


RAID?
RAID (redundant array of independent disks) is a
data storage virtualization technology that combines
multiple physical disk drive components into a single
logical unit for the purposes of data redundancy,
performance improvement, or both.



RAID - Redundant Array of Independent Disks

[Figure: Host, RAID Controller, and RAID Array]



• RAID (Redundant Array of Independent Disks)
combines two or more disk drives in an array into a RAID
set, also called a RAID group. The RAID set appears to the host
as a single disk drive. Properly implemented RAID sets
provide:
– Higher data availability
– Improved I/O performance
– Streamlined management of storage devices



RAID Components

[Figure: Host and RAID Controller, with a physical array and logical arrays inside the RAID Array]



• Physical disks inside a RAID array are usually contained
in smaller sub-enclosures. These sub-enclosures, or
physical arrays, hold a fixed number of physical disks
and may also include supporting hardware, such as
power supplies.
• A subset of disks within a RAID array can be grouped to
form logical associations called logical arrays, also
known as RAID sets or RAID groups. The operating
system may see these disk groups as if they were regular
disk volumes. Logical arrays make it easier to manage
a potentially huge number of disks. Several physical disks
can be combined to make large logical volumes.



RAID Levels
• 0: Striped array with no fault tolerance
• 1: Disk mirroring
• 3: Parallel access array with dedicated parity disk
• 4: Striped array with independent disks and a dedicated parity disk
• 5: Striped array with independent disks and distributed parity
• 6: Striped array with independent disks and dual distributed parity
• Combinations of levels (e.g., 1+0 and 0+1)


Data Organization: Strips and Stripes

[Figure: Stripe 1, Stripe 2, and Stripe 3 spanning the disks of a RAID set, each stripe made up of strips]



• RAID sets are made up of disks. Within each disk, there
are groups of contiguously addressed blocks, called
strips. The set of aligned strips that spans all the
disks within the RAID set is called a stripe.
– Strip size (also called stripe depth) is the number of blocks
in a strip. It is the maximum amount of data written to or
read from a single disk in the set before the next disk is accessed
(assuming that the accessed data starts at the beginning of the
strip).
  • All strips in a stripe have the same number of blocks.
  • Decreasing the strip size means that data is broken into smaller pieces
    when spread across the disks.
– Stripe size is the number of data blocks in a stripe.
  • To calculate the stripe size, multiply the strip size by the number of data
    disks.
– Stripe width is the number of data strips in a stripe (or, put
differently, the number of data disks in a stripe).
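
The strip and stripe arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the mapping assumes plain round-robin striping over the data disks with no parity rotation, which may differ from what a given array actually does.

```python
def stripe_size(strip_size_blocks, data_disks):
    """Stripe size in blocks: strip size multiplied by the number of data disks."""
    return strip_size_blocks * data_disks


def locate_block(logical_block, strip_size_blocks, data_disks):
    """Map a logical block to (disk index, stripe index, offset within the strip).

    Assumption: strips are dealt round-robin across the data disks only.
    """
    # Which strip the block falls in, and where inside that strip.
    strip_number, offset = divmod(logical_block, strip_size_blocks)
    # Which stripe the strip belongs to, and which disk holds it.
    stripe_index, disk_index = divmod(strip_number, data_disks)
    return disk_index, stripe_index, offset
```

For example, with a strip size of 128 blocks and 4 data disks, `stripe_size(128, 4)` gives 512 blocks, and logical block 130 lands on the second disk (index 1), in the first stripe, 2 blocks into the strip.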

RAID 0 – Striped Array with no Fault Tolerance

[Figure: Host, RAID Controller, and Blocks 0 to 4 striped across the disks]



• RAID 0 stripes the data across the drives in the array without generating redundant
data.
– Performance - better than JBOD because it uses striping. The I/O rate, called
throughput, can be very high when I/O sizes are small. Large I/Os produce high
bandwidth (data moved per second) with this RAID type. Performance is further
improved when data is striped across multiple controllers with only one drive per
controller.
– Data Protection - no parity or mirroring means that there is no fault tolerance.
Therefore, if a drive fails, it is extremely difficult to recover the data.
• Striping improves performance by distributing data across the disks in the array.
Using multiple independent disks allows multiple reads and writes to take place
concurrently.
– When a large amount of data is written, the first piece is sent to the first drive, the
second piece to the second drive, and so on.
– The pieces are put back together when the data is read.
– Striping can occur at the block (or block multiple) level or the byte level. With
software RAID, the stripe size can be specified from the host at the Logical Volume
Manager level. With hardware RAID, it can be set at the array level, depending on
the vendor.
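
The write and read behaviour described above (pieces dealt to successive drives, then put back together) might look like the following toy sketch. The names `stripe_write` and `stripe_read` are hypothetical, disks are modelled as in-memory lists, and the strip size is given in bytes; none of this reflects a particular controller.

```python
def stripe_write(data, disk_count, strip_size):
    """Split data into strips and deal them round-robin onto the disks."""
    disks = [[] for _ in range(disk_count)]
    for start in range(0, len(data), strip_size):
        strip = data[start:start + strip_size]
        # First strip goes to the first disk, second to the second, and so on.
        disks[(start // strip_size) % disk_count].append(strip)
    return disks


def stripe_read(disks):
    """Reassemble the data by reading one strip per disk, stripe by stripe."""
    pieces = []
    depth = max(len(disk) for disk in disks)
    for row in range(depth):
        for disk in disks:
            if row < len(disk):
                pieces.append(disk[row])
    return b"".join(pieces)
```

Because nothing redundant is stored, losing any one of the lists in `disks` leaves gaps that `stripe_read` cannot fill, which is the lack of fault tolerance noted for RAID 0.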



RAID 1 – Disk Mirroring

[Figure: Host, RAID Controller, and two disks each holding Block 0 and Block 1]



– Benefits - high data availability and high I/O rate (small block size)
– Drawbacks - the total number of disks in the array is 2 times the
number of data (usable) disks. The overhead cost is therefore
100%, and usable storage capacity is 50%
– Performance - improves read performance, but degrades write
performance
– Data Protection - improved fault tolerance over RAID 0
– Disks - at least two disks
– Cost - expensive due to the extra capacity required to duplicate data
– Maintenance - low complexity
– Applications - applications requiring high availability and
non-degraded performance in the event of a drive failure
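
As a rough illustration of why mirroring keeps data available after a drive failure, here is a toy model of a mirrored pair. The class `MirroredPair` and its methods are hypothetical names, and each disk is modelled as a dictionary of block number to data; this is a sketch, not how a real controller is implemented.

```python
class MirroredPair:
    """Toy RAID 1 pair: each disk is a dict of block number -> data."""

    def __init__(self):
        self.disks = [{}, {}]
        self.failed = [False, False]

    def write(self, block, data):
        # Every write is duplicated to each surviving disk.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                disk[block] = data

    def read(self, block):
        # Any surviving disk can satisfy a read.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                return disk[block]
        raise IOError("both mirrors have failed")

    def fail(self, index):
        # Mark one disk of the pair as failed.
        self.failed[index] = True
```

After `fail(0)`, reads are still served from the second disk. The duplicated write in `write` is also where the extra write cost and the 50% usable capacity come from.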



RAID 0+1 – Striping and Mirroring

[Figure: Host, RAID Controller, and Blocks 0 to 3 striped across one set of disks and mirrored to a second set]



• Benefits - medium data availability, high I/O rate (small
block size), and the ability to withstand multiple drive
failures as long as they occur within the same striped set
• Drawbacks - the total number of disks equals two times the
data disks, with overhead cost equaling 100%
• Performance - high I/O rates; writes are slower than
reads because of mirroring
• Data Protection - medium reliability
• Disks - even number of disks (4-disk minimum to allow
striping)
• Cost - very expensive because of the high overhead
• Applications - imaging and general file server



RAID 1+0 – Mirroring and Striping

[Figure: Host, RAID Controller, and Blocks 0 to 3 mirrored in pairs and striped across the pairs]



• Benefits - high data availability, high I/O rate (small block
size), and the ability to withstand multiple drive failures
as long as they occur on different mirrors
• Drawbacks - the total number of disks equals two times the
data disks, with overhead cost equaling 100%
• Data Protection - high reliability
• Disks - even number of disks (4-disk minimum, to allow
striping)
• Cost - very expensive, because of the high overhead
• Performance - high I/O rates achieved using multiple
stripe segments. Writes are slower than reads, because
they are mirrored
• Applications - databases requiring high I/O rates with
random data, and applications requiring maximum data
availability


RAID 0+1 vs. RAID 1+0
• Benefits are identical under normal operations
• Rebuild operations are very different
– RAID 1+0 uses mirrored pairs: if a disk fails, only that one disk is rebuilt from its mirror
– RAID 0+1: if a single drive fails, the entire stripe is faulted and must be rebuilt
• RAID 0+1 is a poorer solution and is less common
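
The difference in failure tolerance can be made concrete by counting which two-disk failures each layout survives. The sketch below assumes a four-disk set with disks numbered 0 to 3, grouped as (0, 1) and (2, 3), and uses the simplified model from the slide in which a striped set with any failed disk is entirely faulted. All function names are hypothetical.

```python
from itertools import combinations


def survives_raid10(failed, mirror_pairs):
    """RAID 1+0 survives while every mirrored pair keeps at least one disk."""
    return all(not (set(pair) <= failed) for pair in mirror_pairs)


def survives_raid01(failed, stripe_sets):
    """RAID 0+1 survives while at least one striped set has no failed disk."""
    return any(set(stripe).isdisjoint(failed) for stripe in stripe_sets)


def count_survivable(survives, groups, disk_count, failures=2):
    """Count the failure combinations of the given size that the layout survives."""
    return sum(
        1
        for combo in combinations(range(disk_count), failures)
        if survives(set(combo), groups)
    )
```

With groups `[(0, 1), (2, 3)]` and four disks, this model has RAID 1+0 surviving 4 of the 6 possible two-disk failures and RAID 0+1 surviving 2 of the 6.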



RAID Redundancy: Parity

[Figure: Host and RAID Controller with four data disks holding blocks 0/4/8, 1/5/9, 2/6/10, and 3/7/11, plus a parity disk holding parity for blocks 0-3, 4-7, and 8-11]

• Parity is a redundancy check that protects the data
without using a full set of duplicate drives.
• If a single disk in the array fails, the other disks hold
enough redundant data to recover the data from the failed
disk.
• Like striping, parity is generally a function of the RAID
controller and is transparent to the host.
• Parity information can be either:
– Stored on a separate, dedicated drive (RAID-3)
– Distributed with the data across all the drives in the
array (RAID-5)



Parity Calculation

Four data disks in the RAID array hold the values 5, 3, 4, and 2. The parity disk holds their sum:
5 + 3 + 4 + 2 = 14

The middle drive (holding 4) fails:
5 + 3 + ? + 2 = 14
? = 14 - 5 - 3 - 2
? = 4
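
The parity arithmetic on this slide can be reproduced in Python. The first pair of functions follows the slide's additive example. The second pair uses bitwise XOR, which is what parity RAID commonly uses in practice because the same operation both generates parity and recovers a lost value. The function names are hypothetical, and each disk is reduced to a single integer for illustration.

```python
from functools import reduce
from operator import xor


def sum_parity(values):
    """Additive parity, as in the slide's 5 + 3 + 4 + 2 = 14 example."""
    return sum(values)


def recover_with_sum(surviving, parity):
    """Recover one missing value by subtracting the survivors from the parity."""
    return parity - sum(surviving)


def xor_parity(values):
    """XOR parity: fold all data values together with bitwise XOR."""
    return reduce(xor, values, 0)


def recover_with_xor(surviving, parity):
    """Recover one missing value by XOR-ing the parity with the survivors."""
    return reduce(xor, surviving, parity)
```

With the slide's values, `recover_with_sum([5, 3, 2], 14)` returns 4, and the XOR variant recovers the same value from `xor_parity([5, 3, 4, 2])`. Either scheme can rebuild only one missing value per parity, which is why a second failure in the same set is fatal for single-parity levels.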

RAID 3 – Parallel Transfer with Dedicated Parity Disk

[Figure: Host and RAID Controller; Blocks 0 to 3 are written to the data disks and the generated parity P0123 to the dedicated parity disk]



RAID 4 – Striping with Dedicated Parity Disk

[Figure: Host and RAID Controller with four data disks holding Blocks 0/4, 1/5, 2/6, and 3/7, plus a dedicated parity disk holding the generated parity P0123 and P4567]



• RAID Level 4 stripes data for high performance and uses
parity for improved fault tolerance. Data is striped across
all but one of the disks in the array. Parity information is
stored on a dedicated disk so that data can be
reconstructed if a drive fails.
– Benefits - the total number of disks is less than in a mirrored solution
(e.g., 1.25 times the data disks for a group of 5), good read
throughput, and reasonable write throughput.
– Drawbacks - the dedicated parity drive can be a bottleneck when
handling small data writes. This RAID level is not well suited to
transaction processing applications. Data is lost if multiple drives fail
within the same RAID 4 group.
– Performance - high data read transfer rate. Poor to medium write
transfer rate. Disk failure has a significant impact on throughput.
– Data Protection - uses parity for improved fault tolerance.
– Striping - usually at the block (or block multiple) level
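
The "1.25 times the data disks for a group of 5" figure comes from simple arithmetic: a group of 5 has 4 data disks and 1 parity disk. A small sketch, with hypothetical function names, compares that with a mirrored set:

```python
def disk_ratio(data_disks, redundancy_disks):
    """Total disks expressed as a multiple of the data disks."""
    return (data_disks + redundancy_disks) / data_disks


def usable_fraction(data_disks, redundancy_disks):
    """Fraction of the raw capacity that holds user data."""
    return data_disks / (data_disks + redundancy_disks)
```

For a RAID 4 group of 5, `disk_ratio(4, 1)` gives 1.25 and `usable_fraction(4, 1)` gives 0.8. Mirroring the same 4 data disks, `disk_ratio(4, 4)` gives 2.0 and `usable_fraction(4, 4)` gives 0.5.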

RAID 5 – Independent Disks with Distributed Parity

[Figure: Host and RAID Controller; Blocks 0 to 7 with the generated parity P0123 and P4567 distributed across the disks]



• RAID 5 does not read and write data to all disks in
parallel like RAID 3. Instead, it performs independent
read and write operations. There is no dedicated parity
drive; data and parity information are distributed across all
drives in the group.
– Benefits - the most versatile RAID level. A transfer rate greater than
that of a single drive, with a high overall I/O rate. Good for parallel
processing (multi-tasking) applications/environments. Cost savings
due to the use of parity over mirroring.
– Drawbacks - slower transfer rate than RAID 3. Small writes are slow,
because they require a read-modify-write (RMW) operation. Performance
degrades in recovery and reconstruction modes, and data is lost
if multiple drives within the same group are lost.
– Performance - high read data transaction rate, medium write data
transaction rate. Low ratio of parity disks to data disks. Good
aggregate transfer rate.
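
Two ideas from this slide can be sketched in code: parity that moves from disk to disk, and the read-modify-write update behind slow small writes. The rotation formula below is just one hypothetical layout chosen for illustration (real arrays use various vendor-specific layouts), and the update assumes XOR parity. The function names are assumptions.

```python
def parity_disk_raid5(stripe_index, disk_count):
    """Pick the disk that holds parity for a stripe (one possible rotation)."""
    return (disk_count - 1 - stripe_index) % disk_count


def rmw_parity(old_parity, old_data, new_data):
    """Read-modify-write: update XOR parity without reading the whole stripe.

    The controller reads the old data and old parity, then writes the new
    data and new parity: two reads and two writes for one small write.
    """
    return old_parity ^ old_data ^ new_data
```

With 5 disks, stripes 0 through 4 place parity on disks 4, 3, 2, 1, and 0, so no single disk absorbs every parity write the way a dedicated parity disk does. For the update, if a stripe holds 5, 3, 4, and 2 (XOR parity 0) and the 4 is overwritten with 7, `rmw_parity(0, 4, 7)` yields 3, the same value obtained by recomputing the XOR of 5, 3, 7, and 2.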



RAID 6 – Dual Parity RAID
• Two disk failures in a RAID set lead to data unavailability
and data loss in single-parity schemes, such as RAID-3,
4, and 5
• An increasing number of drives in an array and increasing
drive capacity lead to a higher probability of two disks
failing in a RAID set
• RAID-6 protects against two disk failures by maintaining
two parities
– Horizontal parity, which is the same as RAID-5 parity
– Diagonal parity, which is calculated by taking diagonal sets of data blocks
from the RAID set members
• Even-Odd and Reed-Solomon are two commonly used
algorithms for calculating parity in RAID-6


RAID Implementations
• Hardware (usually a specialized disk controller card)
– Controls all drives attached to it
– Performs all RAID-related functions, including volume management
– Array(s) appear to the host operating system as a regular disk drive
– Dedicated cache to improve performance
– Generally provides some type of administrative software

• Software
– Generally runs as part of the operating system
– Volume management performed by the server
– Provides more flexibility for hardware, which can reduce the cost
– Performance is dependent on CPU load
– Has limited functionality



Hot Spares

[Figure: RAID Controller]



Hot Swap

[Figure: two panels, each showing a RAID Controller]



Summary
Key points covered in this module:
• What RAID is and the needs it addresses
• The concepts upon which RAID is built
• Some commonly implemented RAID levels



Check Your Knowledge
• What is a RAID array?
• What benefits do RAID arrays provide?
• What methods can be used to provide higher data
availability in a RAID array?
• What is the primary difference between RAID 3 and
RAID 5?
• What is a hot spare?

