RAID Storage System
In 1987, UC Berkeley researchers David Patterson, Garth Gibson, and
Randy Katz warned the world of an impending predicament. The speed of
computer CPUs and performance of RAM were growing exponentially, but
mechanical disk drives were improving only incrementally. As a result,
they stated, “We need innovation to avoid an I/O crisis.”
The authors famously proposed a solution in their paper, “A Case for
Redundant Arrays of Inexpensive Disks (RAID).” They noted that PC
disk drives were starting to match the speed of those supplied for
mainframes and minicomputers, yet PC drives tended to be better in
terms of cost per megabyte. Why? Because of standards such as SCSI,
which enabled suppliers to embed functionality that had once required
custom controllers.
Looking at what was available on the market, Patterson, Gibson, and
Katz concluded that 75 PC disk drives could be lashed together to
provide the capacity of a single mainframe drive, with lower power
consumption, lower total cost, and 12 times the I/O bandwidth. The
snag was Mean Time to Failure (MTTF), clearly much worse for an array of
commodity drives than for a bulletproof, silver-plated mainframe unit.
Their paper therefore suggested the use of extra check disks,
containing redundant information that could be used to recover data
in the event of a disk failure. Once a failed disk was replaced,
either by a human operator or by electronic switching, data would be
reconstructed onto it automatically.
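The reliability snag can be put in rough numbers. What follows is a minimal sketch, assuming drives fail independently at a constant rate, so that an array with no redundancy has an MTTF equal to the single-drive MTTF divided by the number of drives. The 30,000-hour drive rating and the function name are assumptions chosen for illustration, not figures from this article.

```python
def array_mttf_hours(disk_mttf_hours: float, num_disks: int) -> float:
    """MTTF of a non-redundant array, assuming independent, constant-rate failures."""
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    return disk_mttf_hours / num_disks


# Hypothetical single-drive rating (an assumption for illustration only).
DISK_MTTF_HOURS = 30_000

single_drive = array_mttf_hours(DISK_MTTF_HOURS, 1)
array_of_75 = array_mttf_hours(DISK_MTTF_HOURS, 75)

print(f"1 drive:   {single_drive:,.0f} hours")
print(f"75 drives: {array_of_75:,.0f} hours")
```

Under these assumptions the array is expected to lose a drive 75 times as often as a single drive would fail, which is why the check disks matter.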
The rest is history, as just about anybody connected with networking
is already aware. The I/O problem was widely recognized and, within a
couple of years, Intel-based products like the Compaq Systempro (released
in 1989) made RAID an expected ingredient in every midrange and high-end
server.
So why discuss RAID now? Apart from the fact that we've never
published a tutorial about it, the plummeting costs of both disk
drives and the circuitry necessary to support disk arrays make RAID
more relevant than ever before. It's now affordable for low-end
servers and even standalone workstations.
When the first RAID-based products came out, the cost of controllers
and capacious SCSI disk drives meant that servers could easily cost
$35,000. Vendors were embarrassed by the “Inexpensive” in the RAID
acronym and temporarily decided that the “I” stood for “Independent”
instead.
Now, with 20Gbyte drives selling for under $200, it's time to be
proud of RAID's low cost and leave the misnomer behind. The fact is,
there's little that's independent about the drives in a RAID array.
By design, RAID technology hides the characteristics of individual
drives from whatever operating system is run on top of it, presenting
multiple disks as if they were a single, larger drive. It maps logical
disk block addresses to their actual physical counterparts using a variety
of algorithms. The differing organizations of data on disk are known as
RAID levels, each with its own particular advantages and disadvantages.
RAID 0
RAID level 0 could more correctly be called “AID,” because there's no
redundancy about it. Data is merely divided into blocks, each one written
sequentially to the next drive in the array. If there are four drives in
the array, as shown in the figure on page 36, each logical I/O is broken
into four physical operations.
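The block-to-drive mapping described above can be sketched in a few lines. This is an illustrative round-robin layout, assuming one block per drive per stripe; the function name `raid0_locate` is hypothetical, and real controllers may number blocks differently.

```python
from typing import Tuple


def raid0_locate(logical_block: int, num_drives: int) -> Tuple[int, int]:
    """Map a logical block number to (drive index, block offset on that drive)."""
    drive = logical_block % num_drives    # blocks rotate across the drives
    offset = logical_block // num_drives  # each full rotation moves one block deeper
    return drive, offset


NUM_DRIVES = 4
for block in range(8):
    drive, offset = raid0_locate(block, NUM_DRIVES)
    print(f"logical block {block} -> drive {drive}, offset {offset}")
```

A logical I/O spanning blocks 0 through 3 lands on four different drives, so the four physical operations can proceed in parallel.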
The point of RAID 0 is performance. Theoretically, it can deliver n
times the performance of a single drive, where n is the number of
drives in the array. However, tuning the stripe size is important. If it
is too large, many I/O operations will fit in a single stripe and take
place on a single drive. If it is too small, each logical operation will
be broken into too many physical operations, saturating the bus or
controller to which the drives are attached.
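The stripe-size trade-off can be illustrated with a small calculation. In this sketch, "stripe size" is taken to mean the amount written to one drive before moving to the next, and each chunk touched is counted as one physical operation; both are simplifying assumptions, and the 64KB request is a made-up example.

```python
from typing import Tuple


def request_footprint(offset: int, length: int, stripe_size: int,
                      num_drives: int) -> Tuple[int, int]:
    """Return (physical operations, distinct drives) for one logical request.

    All sizes are in bytes, and length must be positive.
    """
    first_chunk = offset // stripe_size
    last_chunk = (offset + length - 1) // stripe_size
    chunks = last_chunk - first_chunk + 1
    # Consecutive chunks sit on consecutive drives, so the number of
    # distinct drives is capped by the size of the array.
    return chunks, min(chunks, num_drives)


REQUEST_BYTES = 64 * 1024  # hypothetical 64KB request starting at offset 0
for stripe_kb in (4, 16, 64, 256):
    ops, drives = request_footprint(0, REQUEST_BYTES, stripe_kb * 1024, num_drives=4)
    print(f"stripe size {stripe_kb:>3}KB: {ops:>2} physical operations on {drives} drive(s)")
```

With a very large stripe the whole request sits on one drive and gains nothing from the array; with a very small one the same request fans out into many physical operations.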
Reviewers of RAID 0 products on workstations have commented that they
offer little advantage with typical applications, such as word processors
and spreadsheets. However, in cases where very large files must be opened
or saved (on video servers, for example), they can be very beneficial.
RAID 1
RAID 1 is the simplest actual redundant array design, employing
mirrored pairs of disk drives. As seen in the figure, it merely
creates a duplicate of the contents of one disk drive onto another.
While that fact makes RAID 1 easy to implement, it also makes it the
most costly (100 percent redundancy) in terms of required disk
overhead.
RAID 1's write performance is slower than that of a solo drive, since all
data must be written twice. However, buffering on a controller usually
hides this fact from the host computer. Reads can be faster, since it's
always possible to retrieve data from whichever drive is available sooner.
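The mirroring behavior can be modeled with a toy class. This is only a sketch: each drive is represented as an in-memory dictionary, and the `busy` flag is a hypothetical stand-in for whatever a real controller uses to decide which drive is available sooner.

```python
from typing import Dict, List


class MirroredPair:
    """Toy RAID 1 pair; each drive is modeled as a dict of block number -> bytes."""

    def __init__(self) -> None:
        self.drives: List[Dict[int, bytes]] = [{}, {}]
        # Hypothetical flag per drive: True while it is servicing another request.
        self.busy: List[bool] = [False, False]

    def write(self, block: int, data: bytes) -> None:
        # Every write is duplicated, so it costs two physical operations.
        for drive in self.drives:
            drive[block] = data

    def read(self, block: int) -> bytes:
        # Either copy is valid, so prefer whichever drive is idle.
        for index, drive in enumerate(self.drives):
            if not self.busy[index]:
                return drive[block]
        return self.drives[0][block]  # both busy: wait on drive 0


pair = MirroredPair()
pair.write(7, b"payroll")
pair.busy[0] = True        # drive 0 is occupied with another request
print(pair.read(7))        # served from drive 1
pair.drives[0].clear()     # simulate losing the contents of drive 0
print(pair.read(7))        # the mirror still holds the data
```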
RAID 2
RAID 2 is a bit-oriented scheme for striping data. Each bit of a data word
is written to a separate disk drive, in sequence. Checksum information is
then computed for each word and written to physically separate
error-correction drives.
Unfortunately, I/O is slow, especially for small files, because each
drive must be accessed for every operation. On the other hand,
controller design is relatively simple and high data-transfer rates
are possible for large files. Disk overhead is typically 40 percent.
While reliable, RAID 2 is seldom considered worth bothering with today.
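The 40 percent overhead figure is consistent with a single-error-correcting Hamming code applied across a group of 10 data drives. The sketch below assumes that code (the text above says only "checksum information") and uses the standard Hamming bound; the function name and the group sizes are illustrative.

```python
def hamming_check_disks(data_disks: int) -> int:
    """Smallest r such that 2**r >= data_disks + r + 1 (single-error correction)."""
    r = 0
    while 2 ** r < data_disks + r + 1:
        r += 1
    return r


for data_disks in (4, 10, 25):
    check = hamming_check_disks(data_disks)
    overhead = 100 * check / data_disks
    print(f"{data_disks:>2} data drives need {check} check drives "
          f"({overhead:.0f} percent overhead)")
```

The overhead shrinks as the group grows, but larger groups also mean more drives tied up in every operation.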
RAID 3
RAID 3 introduces a more efficient way of storing data while still
providing error correction. It still stripes data across drives bit
by bit (or byte by byte). However, error-checking now takes place by
storing parity information (computed via a mathematical function
known as the Exclusive OR, or XOR) on a separate parity drive (see
figure).
Given that parity values are simple to compute and write, RAID 3
arrays can perform swiftly. However, any I/O operation must address
all drives simultaneously. This means that, while RAID 3 delivers
high data-transfer rates, it is best suited to large files such as
video streams.
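XOR parity is easy to demonstrate. The following sketch computes a parity block over three data blocks and then rebuilds one of them after a simulated drive failure; the block contents and the helper name `xor_blocks` are made up for illustration.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data drives, one block each (contents are arbitrary examples).
data = [b"RAID", b"disk", b"XOR!"]
parity = xor_blocks(data)          # what the parity drive would store

# Drive 1 fails: XOR the surviving data blocks with the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

assert rebuilt == data[1]
print("rebuilt block:", rebuilt)
```

The same operation serves for both computing parity and reconstructing a lost block, which is part of why parity is cheap to implement.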
RAID 4
RAID 4 modifies the RAID 3 concept by working with data in terms of
blocks (as does RAID 0), rather than bits or bytes. This reduces
processing overhead and can make for high aggregate data-transfer
rates on reads. For writes, however, there is inevitable contention
for the sole parity drive, making this RAID level relatively
sluggish.
RAID 5
One of the most popular RAID levels, RAID 5 is again block-oriented
and based on the storing of parity information. However, instead of
placing parity data on a single drive, it distributes it across the
entire array (see figure).
Because RAID 5 eliminates the parity-drive bottleneck, it enhances
write performance. And due to the independence of all the drives in
the array, read performance is tops among true RAID levels. Recovery
following a disk failure is relatively slow, but reliable enough. All in
all, RAID 5 achieves an excellent balance between performance, data
protection, and low cost.
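The effect of distributing parity can be seen by counting where parity writes land. The rotation rule below is one common convention, assumed here for illustration; actual products use several different layouts. The function name is hypothetical.

```python
from collections import Counter


def raid5_parity_drive(stripe: int, num_drives: int) -> int:
    """Drive holding parity for a given stripe, rotating from the last drive backward."""
    return (num_drives - 1 - stripe) % num_drives


NUM_DRIVES = 4
NUM_STRIPES = 8

# Show which drive holds parity (P) and which hold data (D) in each stripe.
for stripe in range(4):
    p = raid5_parity_drive(stripe, NUM_DRIVES)
    row = " ".join("P" if d == p else "D" for d in range(NUM_DRIVES))
    print(f"stripe {stripe}: {row}")

# One parity write per stripe: dedicated parity drive versus rotated parity.
raid4_writes = Counter(NUM_DRIVES - 1 for _ in range(NUM_STRIPES))
raid5_writes = Counter(raid5_parity_drive(s, NUM_DRIVES) for s in range(NUM_STRIPES))
print("dedicated parity drive:", dict(raid4_writes))
print("rotated parity:        ", dict(raid5_writes))
```

With a dedicated parity drive every parity write queues on the same spindle, whereas rotation spreads the same load evenly across the array.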
RAID 10 and RAID 53
RAID 10 is also known as RAID 0+1 or 1+0 because it combines the
elements of RAID 0 and RAID 1. It uses two sets of drives that mirror one
another, as in RAID 1. Then, within these sets, data is striped across the
drives (as in RAID 0) in order to speed access.
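The combination can be sketched by reusing the RAID 0 mapping inside each of two mirrored sets. This follows the description above (two sets that mirror one another, each striped internally); the function name and the tuple layout are assumptions for illustration.

```python
from typing import List, Tuple


def mirrored_stripe_locations(logical_block: int,
                              drives_per_set: int) -> List[Tuple[int, int, int]]:
    """Return (set index, drive within set, block offset) for both copies of a block."""
    drive_in_set = logical_block % drives_per_set   # striping, as in RAID 0
    offset = logical_block // drives_per_set
    # The same position is written in both sets, as in RAID 1.
    return [(set_index, drive_in_set, offset) for set_index in (0, 1)]


for block in range(6):
    copies = mirrored_stripe_locations(block, drives_per_set=3)
    print(f"logical block {block} -> {copies}")
```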
RAID 53, which should really be called RAID 30 using the above logic,
combines RAID 0 and RAID 3. Again, it uses a striped array, as with RAID
0, but the segments of this are RAID 3 arrays. High data-transfer rates
and high I/O rates for small requests are both offered, but at a price.
ENHANCING PERFORMANCE
RAID controllers can become a bottleneck, especially with high-speed
interconnects like Fibre Channel, because of the calculations they
must perform. For example, to perform a disk-write operation to a
RAID 5 array, a Read-Modify-Writeback operation must be performed.
First, old data must be read from both a data drive and a parity
drive. Second, that data must be XORed. Third, new data must be
written to the data drive. Fourth, the new data must also be XORed with
the result of the second step, and only then can the new parity finally
be written to the parity drive.
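The four steps above amount to computing new parity = old data XOR old parity XOR new data. Here is a minimal sketch of that update, checked against a full recomputation of parity over the stripe; the block contents and function names are illustrative.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def read_modify_write(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Return the new parity block for a write that replaces one data block."""
    intermediate = xor_bytes(old_data, old_parity)  # removes the old data's contribution
    return xor_bytes(intermediate, new_data)        # adds the new data's contribution


# A stripe of three data blocks plus parity (contents are arbitrary examples).
stripe: List[bytes] = [b"AAAA", b"BBBB", b"CCCC"]
parity = reduce(xor_bytes, stripe)

new_block = b"ZZZZ"
new_parity = read_modify_write(stripe[1], parity, new_block)
stripe[1] = new_block

# The shortcut must agree with recomputing parity over the whole stripe.
assert new_parity == reduce(xor_bytes, stripe)
print("parity updated with two reads and two writes")
```

The shortcut touches only two drives regardless of array width, but it still requires two reads and two writes for every small write.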
One solution to this bottleneck has been to move the responsibility
of calculating XOR data to the disk drives themselves. Seagate, IBM,
and other vendors have released drives that can perform XOR
calculations in parallel with other disks, without the aid of the
RAID controller.
The industry is entering a period of rapid transition in I/O
architectures. InfiniBand's 2001 products will couple I/O directly to host
memory, offering transfer rates of up to 6Gbytes/sec, and RAID products
will evolve to support such throughput. At the same time, the falling cost
of controllers and drives will make RAID arrays ever more commonplace on
the low end. Your next notebook computer may even offer you a choice of
RAID levels.
Jonathan Angel, senior editor, can be reached at jangel@cmp.com
Resources
“A Case for Redundant Arrays of Inexpensive Disks (RAID)” by
Patterson, Gibson, and Katz is archived at
http://sunsite.berkeley.edu/Dienst/UI/2.0/Describe/ncstrl.ucb/CSD-87-391/.
Details on Storage Computer's proprietary RAID 7 can be found at
www.raid7.com/wp_raid7afa.html
The RAID Advisory Board Web site is located at www.raid-advisory.com
A search engine designed specifically to find information about RAID
and other storage-related topics can be found at www.searchstorage.com