Descending into geekdom with a senior IT guy. My teams manage storage, and I architect storage solutions for various projects, so be warned. . .
Mr. Z's First rule: RAID IS NO SUBSTITUTE FOR FULL AND TESTED BACKUPS.
Now, for those who don't want to read further: chances are excellent that RAID 5 will fill the bill for most educational projects. Short form: it survives the loss of any one disk, and the main performance penalty comes on writes. BE CERTAIN you can detect failures and address them promptly, keep spares on hand, and you should be fine until the equipment ages.
RAID 5: Uses a number of disks (typically 3 to 8) to create a "RAID Group" (RG).
Visualize an 8-layer cake. Instead of a sector on one disk, like a slice of one layer, there is a "stripe," a slice cut through all the layers. In that stripe, 7 of the pieces contain data and the 8th contains parity information (the XOR of the 7 data pieces). If you lose one "layer," the information for every stripe can be recreated, as long as all the pieces of the 7 other layers remain intact. The parity piece is calculated during each write and is placed round-robin: on layer 1 for the first stripe, layer 2 for the next stripe, and so on.
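To make the cake analogy concrete, here is a minimal Python sketch of XOR parity with the parity piece rotating round-robin across the disks. The function names (xor_blocks, build_stripe, rebuild_missing) are hypothetical, and real controllers and mdadm use specific rotation layouts that differ in detail; this only shows the principle that any one missing piece is the XOR of all the others.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def build_stripe(data_blocks, stripe_index, n_disks):
    """Lay out one stripe across n_disks: n_disks - 1 data pieces plus one
    parity piece, with parity rotating round-robin (disk 0 for stripe 0,
    disk 1 for stripe 1, ...)."""
    if len(data_blocks) != n_disks - 1:
        raise ValueError("need exactly n_disks - 1 data blocks")
    parity = xor_blocks(data_blocks)
    stripe = list(data_blocks)
    stripe.insert(stripe_index % n_disks, parity)
    return stripe

def rebuild_missing(stripe, missing_disk):
    """Recreate the piece on missing_disk (data or parity) by XOR-ing
    every surviving piece in the stripe."""
    survivors = [piece for i, piece in enumerate(stripe) if i != missing_disk]
    return xor_blocks(survivors)

if __name__ == "__main__":
    n_disks = 8
    data = [bytes([layer] * 4) for layer in range(1, n_disks)]  # 7 data pieces
    for s in range(3):
        stripe = build_stripe(data, s, n_disks)
        lost = 5  # pretend disk 5 died
        assert rebuild_missing(stripe, lost) == stripe[lost]
        print(f"stripe {s}: parity on disk {s % n_disks}, disk {lost} rebuilt OK")
```

Note that rebuilding needs every surviving piece of every stripe, which is exactly why a second loss before the rebuild finishes is fatal.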
Disadvantage? These parity calculations take time and processing power, and a small write also means reading the old data and old parity before writing the new ones. That's why I do not specify this RAID level for highly "write-intensive" applications like a transactional database with 3,000 concurrent users. But I doubt this is a serious issue in most educational uses.
Advantage? It saves money: you need only one disk's worth of overhead to carry the parity information. But don't get carried away. . .
One thing to watch for, as I mentioned already, is making certain you can detect failures promptly. When (NOT "if") a disk fails, you need to discover that and replace the disk promptly. Until then the RG runs in 'degraded mode': whenever the missing disk's data is needed, the system recreates it by reading the matching sectors from every other disk. Once you replace the disk, the rebuild reads every sector of every other disk to recreate its full contents.
IF A SECOND DISK FAULTS BEFORE THE REBUILD COMPLETES, THE RG IS TOAST. That's why you replace failed disks promptly and never put "too many" disks into one RG. I have been consulted after people built RAID 5 RGs with 15 disks and, 4 years later, two failed. They ask "what can we do?" and I tell them to reach for backups. (See my first rule.)
And as disks get larger, rebuilds take longer, which leaves a bigger window for that second failure. Which leads to . . .
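A rough way to see why group size and disk size matter is to treat each surviving disk as failing independently at a constant rate. The sketch below assumes exactly that (an exponential failure model with a made-up MTTF and rebuild speed); aging disks, correlated failures and LSEs all make reality worse, so treat the numbers as illustrative, not a prediction.

```python
import math

def rebuild_hours(disk_tb, rebuild_mb_per_s):
    """Hours to rewrite one whole disk at a sustained rebuild rate.
    Uses decimal units (1 TB = 1,000,000 MB), as drive vendors do."""
    return disk_tb * 1_000_000 / rebuild_mb_per_s / 3600

def p_second_failure(n_disks, window_hours, mttf_hours):
    """Chance that at least one of the n_disks - 1 surviving disks fails
    during the rebuild window, assuming independent exponential failures."""
    survivors = n_disks - 1
    return 1 - math.exp(-survivors * window_hours / mttf_hours)

if __name__ == "__main__":
    MTTF_HOURS = 200_000      # assumed, deliberately pessimistic for aging disks
    REBUILD_MB_PER_S = 50     # assumed rebuild rate on a busy system
    for disk_tb in (1, 2, 4):
        window = rebuild_hours(disk_tb, REBUILD_MB_PER_S)
        for n_disks in (4, 8, 15):
            p = p_second_failure(n_disks, window, MTTF_HOURS)
            print(f"{disk_tb} TB disks, {n_disks}-disk RG: "
                  f"rebuild ~{window:.1f} h, P(second failure) ~{p:.3%}")
```

For small values the probability grows roughly in proportion to both the number of surviving disks and the rebuild time, which is the arithmetic behind "never put too many disks into one RG."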
RAID 6: Like RAID 5, but each stripe carries two independent parity pieces (usually called P and Q), so two disks' worth of capacity in the RG goes to parity.
Advantage: Lose 1 disk and the RG is degraded but still has a full layer of redundancy, so there is no real exposure per se. Lose a second disk during the rebuild and the RG is running with no redundancy left. BUT, it's not a disaster: your data is still intact.
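Disadvantage: Higher cost ("wastes" another disk compared with RAID 5), a somewhat larger write penalty (two parity pieces to update), and not all controllers support RAID 6 yet. On the plus side, it is also resistant to LSEs: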
Latent Sector Error (LSE): a weak or unreadable sector that no one knows about yet, either because that data has not been read recently or because the drive quietly corrected for it. On a RAID 5 system you find it when a DIFFERENT disk fails and you try to rebuild: the system tries to read the weak sector and, "OOPS!" That's why RAID 5 on aging systems can be problematic. Sophisticated enterprise-class storage from EMC or Hitachi performs "disk scrubbing" in the background, constantly looking for weak sectors and moving the data. But lower-end systems typically do not have that option. If your controller supports it, make sure it's turned on.
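RAID 6's second parity is what lets it ride out the nasty case above: one disk already dead and an LSE on another disk in the same stripe, i.e. two missing pieces. Below is a minimal Python sketch of the common P+Q scheme, where P is plain XOR and Q is a weighted sum in the Galois field GF(2^8) (polynomial 0x11d, generator 2). It's an illustration under those assumptions, not any particular controller's implementation, and the helper names are hypothetical.

```python
# GF(2^8) log/antilog tables for the polynomial x^8+x^4+x^3+x^2+1 (0x11d),
# with 2 as the generator g.
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def gf_div(a, b):
    """Divide a by b in GF(2^8); b must be non-zero."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]

def pq_parity(data):
    """P = XOR of all data blocks; Q = XOR over i of g**i * D_i.
    Blocks given as None are treated as all-zero (i.e. missing)."""
    length = len(next(b for b in data if b is not None))
    p, q = bytearray(length), bytearray(length)
    for i, block in enumerate(data):
        if block is None:
            continue
        coeff = EXP[i]          # g**i, valid for up to 255 data disks
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)

def recover_two(data, x, y, p, q):
    """Rebuild lost data blocks x and y (set to None in data) from P and Q."""
    pxy, qxy = pq_parity(data)      # parity of the surviving blocks only
    gx, gy = EXP[x], EXP[y]
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for j in range(len(p)):
        s = p[j] ^ pxy[j]           # = Dx xor Dy
        t = q[j] ^ qxy[j]           # = g^x*Dx xor g^y*Dy
        dx[j] = gf_div(t ^ gf_mul(gy, s), gx ^ gy)
        dy[j] = s ^ dx[j]
    return bytes(dx), bytes(dy)

if __name__ == "__main__":
    data = [bytes((7 * i + j) % 256 for j in range(8)) for i in range(6)]
    p, q = pq_parity(data)
    damaged = list(data)
    damaged[1] = None   # the disk that died
    damaged[4] = None   # the LSE found during the rebuild
    d1, d4 = recover_two(damaged, 1, 4, p, q)
    print("recovered both:", d1 == data[1] and d4 == data[4])
```

With only P (RAID 5), two unknowns per byte can't be solved; Q supplies the second independent equation.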
Next, for the performance geeks out there:
RAID 0 - Not redundant at all. Stripes data across multiple disks, which gives very high I/O capability, but neither calculates nor writes any parity information. Lose any one disk and the whole RG is gone. Good for very little in a business setting, since downtime will occur eventually.
RAID 1 - Mirrors 2 disks. No parity information to calculate, so writes are generally as fast as on a single disk, but with redundancy. But it doubles the disk costs. Used where you need better write performance than RAID 5 / 6 gives.
RAID 1 + 0 or RAID 10 - Mirror multiple pairs of disks to create RGs, then stripe across those RAID 1 RGs to create a RAID 0 super-RG. Redundant due to the mirroring of individual pairs, very fast since there are no parity calculations, and with great potential due to the aggregate I/O bandwidth. Quite safe: a complete failure requires BOTH halves of the same mirrored pair to fail.
RAID 0 + 1 - Stripe then mirror. Don't do it. Unlike RAID 10, if you lose just ONE disk in each RAID 0 RG, you're toast.
RAID 5 + 0 / 50 - Build multiple RAID 5 RGs, then stripe across them. Improved performance over a single RAID 5 RG, and each member RG can still lose one disk.
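To put numbers on the RAID 10 vs. RAID 0 + 1 warning, the sketch below counts, for 8 disks, how many of the possible two-disk failures destroy each layout, alongside the usable capacity of each level. It assumes a hypothetical numbering where RAID 10 mirrors disks (0,1), (2,3), ... and RAID 0 + 1 stripes disks 0-3 and mirrors them onto 4-7; metadata overhead is ignored, and the function names are made up for illustration.

```python
from itertools import combinations

def usable_disks(level, n_disks, group_size=None):
    """Disks' worth of usable capacity from n equal disks (metadata ignored)."""
    if level == "0":
        return n_disks
    if level in ("1", "10", "0+1"):
        return n_disks // 2
    if level == "5":
        return n_disks - 1
    if level == "6":
        return n_disks - 2
    if level == "50":
        # n_disks split into RAID 5 groups of group_size, each losing one disk
        return (n_disks // group_size) * (group_size - 1)
    raise ValueError(f"unknown level {level!r}")

def raid10_survives(failed, n_disks):
    """RAID 10: mirrored pairs (0,1), (2,3), ...; dies only if a whole pair dies."""
    pairs = [(i, i + 1) for i in range(0, n_disks, 2)]
    return not any(a in failed and b in failed for a, b in pairs)

def raid01_survives(failed, n_disks):
    """RAID 0+1: stripe A = first half, stripe B = second half, mirrored.
    One dead disk kills its whole stripe, so one loss per side is fatal."""
    half = n_disks // 2
    side_a_down = any(d < half for d in failed)
    side_b_down = any(d >= half for d in failed)
    return not (side_a_down and side_b_down)

def fatal_failures(survives, n_disks, k=2):
    """Count the k-disk failure combinations that destroy the array."""
    combos = list(combinations(range(n_disks), k))
    fatal = sum(1 for combo in combos if not survives(set(combo), n_disks))
    return fatal, len(combos)

if __name__ == "__main__":
    n = 8
    for name, level, check in [("RAID 10", "10", raid10_survives),
                               ("RAID 0+1", "0+1", raid01_survives)]:
        fatal, total = fatal_failures(check, n)
        print(f"{name}: {usable_disks(level, n)} disks usable, "
              f"{fatal}/{total} two-disk failures are fatal")
```

For 8 disks, only 4 of the 28 possible two-disk failures kill RAID 10 (both halves of one pair), while 16 of 28 kill RAID 0 + 1 (any disk from each side), for the same usable capacity.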
Finally, for people using RAID:
CREATE A HOT SPARE. This is a mechanism for the system to keep one disk aside for emergencies. If a running disk fails, the system will "swap in" the hot spare and begin the rebuild immediately. It reduces the time you are vulnerable.
MONITOR THE SYSTEM. It sucks to have the system swap in the hot spare without you knowing it: you are then running with no spare, and the next failure will catch you out.
Hope that answered your questions.
Thanks, that's very comprehensive. I currently have RAID 6 on my HP SAN; it does indeed run the scrubber utility, and there is a hot spare available. The SAN also sends plenty of emails, too many sometimes :P
So... what you really need is data integrity (i.e. knowing that when you write a block of data to your filesystem, you can read that same data back at any point in the future). There is only one filesystem that can guarantee this... ZFS (and to save me writing shedloads, look here: ZFS - Wikipedia, the free encyclopedia).
Any form of RAID only gives you protection against disk failure and allows you to carry on (albeit at reduced performance) whilst the data is rebuilt onto the hot spare. With ZFS, the rebuild (called a "resilver") only covers the blocks that are actually in use. So, for example, if you have a RAID set made up of 2TB drives (not uncommon nowadays) holding 100GB of data, and one drive fails and the hot spare kicks in, you only rebuild the blocks holding that 100GB and NOT the whole 2TB drive, as conventional RAID forces you to do. This means your "slow down" is considerably shorter.
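The arithmetic behind that claim is simple. The sketch below compares rewriting a whole 2TB disk with rewriting only that disk's share of 100GB of data, assuming data is spread evenly over a hypothetical 4-disk single-parity set and a sustained 80 MB/s. It deliberately ignores that a ZFS resilver walks the block tree and can be more random-I/O bound per byte, so treat it as a best case.

```python
def hours_to_copy(n_bytes, mb_per_s):
    """Hours to write n_bytes at a sustained rate in decimal MB/s."""
    return n_bytes / (mb_per_s * 1_000_000) / 3600

def compare_rebuilds(disk_tb, n_disks, parity_disks, used_gb, mb_per_s):
    """Return (whole-disk rebuild hours, used-blocks-only rebuild hours),
    assuming data is spread evenly across the set's data capacity."""
    disk_bytes = disk_tb * 1e12
    usable_bytes = disk_bytes * (n_disks - parity_disks)
    used_fraction = min(1.0, used_gb * 1e9 / usable_bytes)
    whole_disk = hours_to_copy(disk_bytes, mb_per_s)
    used_only = hours_to_copy(disk_bytes * used_fraction, mb_per_s)
    return whole_disk, used_only

if __name__ == "__main__":
    full, used = compare_rebuilds(disk_tb=2, n_disks=4, parity_disks=1,
                                  used_gb=100, mb_per_s=80)
    print(f"whole 2TB disk: {full:.1f} h; used blocks only: {used * 60:.0f} min")
```

Under those assumptions it comes out at roughly 7 hours versus roughly 7 minutes; the gap shrinks as the pool fills up.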
Regarding the read and write performance issues noted above, ZFS is the only "SSD aware" filesystem. What that means is that ZFS understands SSDs and uses them intelligently for both reads (as a second-level read cache, the L2ARC) and writes (as a separate intent-log device for synchronous writes) to drive greater performance, thus alleviating the need for "tiers of storage" and the management overhead of keeping data in the correct tier.
I am more than happy to explain this in detail off forum if anyone wants to know more.
What systems support ZFS now? It kind of got muddy after Oracle bought Sun.
Hi there. The systems that support ZFS are the S7x20 range (Sun Unified Storage | Flash Optimized Storage | Oracle).
Originally Posted by SYNACK
These use the latest Intel Nehalem and Westmere CPUs and support read and write SSDs (model dependent).
If you need more information, drop me a line and I can put you in touch with your local Oracle storage salesperson.
:) The last time I checked into Sun gear, it took two weeks to find out that they apparently did not consider NZ a worthwhile market and that everything would have to be imported from Australia at horrific prices (more than double UK pricing). I would be interested to hear if this has changed, though, so that I could give Sun/Oracle products another shot in the future.
Originally Posted by Hebdenlad
Hi, not sure about this... Let me see if I can get hold of my Australian opposite number...
Originally Posted by SYNACK
Does ZFS need special (i.e. Sun-only) hardware support to work correctly / with realistic performance? I tried using ZFS on a Linux system running on commodity hardware but got corrupted files. Is this likely to have been the Linux implementation of ZFS, the hardware not being up to the job in some way, or something else?
Originally Posted by Hebdenlad
While we have a RAID expert here... We've just bought a QNAP storage device, a 2U server with 8 hot-swap SATA drive caddies in the front. Taking the case off, it's plain to see that these 8 disks are connected to SATA ports on the motherboard; there's no dedicated hardware RAID card in there. The CPU uses a passive cooler, so it can't be anything too fancy (a recent Atom processor, I think), and the box has 2 GB of RAM installed.
Originally Posted by mister_z
The server runs some kind of Linux-based OS. Several people on this forum have obviously found the performance of these devices just fine, as they were recommended in a couple of recent threads where people were asking about storage servers. The server offers iSCSI, so it must be capable of good enough performance to act as a VM disk image host.
If the above is correct, is it likely that the QNAP server is using Linux's standard mdadm software RAID to run its RAID array, or are they likely to have had to write their own RAID system of some kind? If the performance of mdadm RAID is up to running an iSCSI server, why do people bother buying hardware RAID cards in the first place? Is there likely to be some practical limit to the number of disks mdadm RAID will be able to handle; is 8 maybe the most you should expect to be able to use?
You can run ZFS on anything that supports Solaris/FreeBSD. The Linux implementation isn't great, but the FreeBSD implementation is meant to be quite good if you don't want to run Solaris. If you want a nice, easy-to-use distro with ZFS support built in, have a look at Nexenta. There are a couple of options there: if you have less than 12TB of storage, Nexenta Community Edition comes with the same GUI as the commercial Nexenta product (I believe) and is free to run. If you have more than 12TB, you can use Nexenta Core, which is free for any amount of storage but doesn't have a GUI. If you need a GUI, you could always use napp-it.org :)
Originally Posted by dhicks
Would you recommend a hot spare even when using RAID 6?
Originally Posted by mister_z
I am upgrading an 8-disk RAID 6 array with 4 more disks. Do I just add them all as extra storage, or use one disk as a hot spare (HS)?
Well, if the sole purpose of this appliance is to act as a RAID server, it IS "hardware RAID" after a fashion: there's a piece of hardware whose sole job is to maintain the RAID subsystem. Generally, I reserve the term "software RAID" for RAID that runs on a system performing other work as well. For example, a Sun database server using Veritas Volume Manager (VxVM) to mirror pairs of LUNs. The issue there is the potential for slower performance, as the CPU and I/O channels are shared between database updates and mirroring I/O. It's by no means a foregone conclusion that this will be an issue, just that it can be.
Originally Posted by dhicks
In addition to the possible overhead of software RAID, there's the setup expertise: the "care and feeding" part. I agree that VxVM, mdadm, and the like are not rocket science, but plugging in a card and running a vendor utility is even less so. I'd love a Hitachi VSP for every project I do, but I don't have that much budget. I've also done plenty of pro bono mdadm setups with great success (I recommend webmin to remove much of the grunt work). It depends on the money and the skill level.
And while I have no experience with QNAP, I would not be surprised to see them using FOSS tools like Linux and mdadm. But if I have the right impression, they're devoting all the CPU power on that appliance to running mdadm, NFS, Samba, or some combination of the three. The CPU has no other work. Hence, you can consider it hardware RAID.
In my day job, I have worked with storage engineers from several huge, 3-letter vendors, and been given logins to what were obviously customized Linux-based appliances under the covers. If you're feeling really curious, pull a disk (with the appliance shut down, or at least with a good backup in hand), attach it to a Linux system, and see if mdadm can read the header (e.g. with mdadm --examine on the disk or its partitions).
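If you'd rather peek before reaching for mdadm --examine, the version-1 md superblock starts with the magic number 0xa92b4efc (stored little-endian) followed by a major version of 1. It sits at the very start of the member device for metadata 1.1 and 4KiB in for 1.2, the current default. The minimal Python sketch below checks only those two locations (not 0.90 or 1.0, which live near the end of the device); the function name is hypothetical, and you'd run it read-only, as root, against the member partition.

```python
import struct

MD_SB_MAGIC = 0xA92B4EFC  # md superblock magic, stored little-endian

# Where a version-1 superblock starts, measured from the start of the member.
# (0.90 and 1.0 superblocks live near the END of the device; not checked here.)
V1_OFFSETS = {"1.1": 0, "1.2": 4096}

def find_md_superblock(path):
    """Return the v1.x metadata version found on a device or image, else None."""
    with open(path, "rb") as dev:          # read-only; never writes
        for version, offset in V1_OFFSETS.items():
            dev.seek(offset)
            header = dev.read(8)
            if len(header) < 8:
                continue
            magic, major = struct.unpack("<II", header)
            if magic == MD_SB_MAGIC and major == 1:
                return version
    return None
```

A hit just tells you it's a standard md member; mdadm --examine will then show the array UUID, level and member role.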
Ultimately, RAID cards are easier than mdadm, and they don't need much care and feeding. How many people running mdadm run checkarray or some other tool like that religiously? If you can afford them, those cards have their advantages.
IIRC, mdadm has a limit of 28 devices per array (that applies to the old 0.90 metadata format; the newer 1.x formats are less restrictive). There are ways around that, but I think then the stock kernel gets confused. I have never used more than 8 in a RAID 5; disks do fail. Instead, I've used LVM to stripe or "glue" (concatenate) the individual arrays together. But that's more of a personal choice than anything else.
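For the "stripe or glue" choice, here is a simplified Python model of where a byte of the logical volume ends up. Striping (RAID 0-style) spreads consecutive chunks round-robin over the arrays so they all share the I/O; a linear "glue" volume fills one array before moving to the next. The 64KiB stripe size and the function names are assumptions for illustration, and real LVM allocates in extents with its own metadata, which this ignores.

```python
def striped_location(offset, n_arrays, stripe_bytes=64 * 1024):
    """Map a logical byte offset to (array index, offset within that array)
    for a volume striped round-robin across n_arrays."""
    chunk, within = divmod(offset, stripe_bytes)
    array = chunk % n_arrays
    array_offset = (chunk // n_arrays) * stripe_bytes + within
    return array, array_offset

def linear_location(offset, array_sizes):
    """Same mapping for a linear ("glued") volume: fill array 0, then 1, ..."""
    for array, size in enumerate(array_sizes):
        if offset < size:
            return array, offset
        offset -= size
    raise ValueError("offset is past the end of the volume")

if __name__ == "__main__":
    KiB = 1024
    # Four consecutive 64KiB chunks land on arrays 0, 1, 2, 0 when striped...
    print([striped_location(i * 64 * KiB, 3)[0] for i in range(4)])
    # ...but all on array 0 of a linear volume built from three 1GiB arrays.
    print([linear_location(i * 64 * KiB, [2**30] * 3)[0] for i in range(4)])
```

Striping buys bandwidth; gluing makes it easy to add arrays later. Either way, losing any one underlying array loses the volume, which is one more reason to keep each array modest in size.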
plexer: I have a few 1TB disks sitting outside my system, ready to swap in just in case.
Well, RAID 6 is generally excellent protection on its own. Whether to add a dedicated hot spare depends on the answers to several questions, among them:
Originally Posted by plexer
How good is your monitoring? Will you know promptly when you need to replace a failed disk? A hot spare makes up for less-effective monitoring, giving you more time to actually notice the failure before it becomes a crisis. If you have Nagios or similar watching everything and emailing alerts, this is less important.
How physically accessible is the system? Do you have on-site support regularly available, at least during normal business hours? If it's a "one-person show," does that techie take extended vacations? Are there weather-accessibility issues? For remote sites, how long is the travel time? A hot spare buys more time to actually get to the task of changing the disk. But if none of those is a factor, you may have plenty of time to swap disks manually.
How long does it take to acquire another disk? If the vendor takes 2 weeks to get you another disk, and you don't have a spare on the shelf, you might get nervous in the interim. If you have a shelved spare, or can "share spares" among several teams / locations / schools / organizations, this may not be a concern.
How big is the disk / how busy will the system be / how long does it take to rebuild? A hot spare starts the rebuild as soon as the failure is detected. If rebuild times are short (you can test this any time, as long as you have a good backup first), then starting the rebuild later rather than sooner is not a problem.
Can you stand some small risk of downtime if the unlikely happens? Probably 'yes', but consider it. You could get a run of bad disks. It's rare, but we lost a pile of them in quick succession some years ago. Our SAN systems were slamming in hot spares (we allocate one for each 30 disks in service) at a (comparatively) stunning rate. This was a real corner case, but the hot spares earned their keep: we had zero downtime.
How are your backups? Do you test recovery regularly? I recently ran into an issue where a specialized backup technology from a really big vendor worked perfectly during commissioning tests, and subsequently during annual D/R tests, until we grew the LUN past a certain number of TB. Then it went kerblooey. All the backups appeared to be running perfectly, but we were still vulnerable due to a bug we didn't know about.
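One way to weigh these questions is to add up how long the RG would run with reduced redundancy after a failure, with and without a hot spare. The minimal Python sketch below does that under stated assumptions: every number in the example is hypothetical, and it assumes that buying the disk and getting someone on site happen in parallel, so only the longer of the two counts.

```python
from dataclasses import dataclass

@dataclass
class Site:
    detect_h: float    # hours until a human notices the failure
    procure_h: float   # hours to get a replacement disk (0 if one is shelved)
    travel_h: float    # hours for someone to reach the box
    rebuild_h: float   # hours for the rebuild itself

def exposure_hours(site, hot_spare):
    """Hours the RG runs with reduced redundancy after one disk fails.
    With a hot spare the controller starts rebuilding at once; without one,
    a person must notice, then obtain a disk and get on site (assumed to
    overlap, so only the longer counts), and only then can the rebuild run."""
    if hot_spare:
        return site.rebuild_h
    return site.detect_h + max(site.procure_h, site.travel_h) + site.rebuild_h

if __name__ == "__main__":
    # Hypothetical sites; substitute your own numbers.
    remote_school = Site(detect_h=48, procure_h=14 * 24, travel_h=3, rebuild_h=12)
    staffed_office = Site(detect_h=1, procure_h=0, travel_h=0.5, rebuild_h=12)
    for name, site in [("remote school", remote_school),
                       ("staffed office", staffed_office)]:
        print(f"{name}: {exposure_hours(site, False):.1f} h without a hot spare, "
              f"{exposure_hours(site, True):.1f} h with one")
```

After a hot-spare rebuild you are back to full redundancy, just without a spare until the failed disk is replaced, which is why the monitoring question still matters.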
In the end, only you can decide. RAID 6 already goes a long way toward letting you sleep at night; you may need nothing else. Best of luck.
Or, for real peace of mind if you're REALLY concerned about data loss, simply use ZFS as your filesystem (ZFS - Wikipedia, the free encyclopedia) and get lots of good things for free, like unlimited snapshots, software RAID (RAID-Z, RAID-Z2 and RAID-Z3, i.e. single, double and triple parity, plus mirrors), deduplication, NFS, SMB, iSCSI, etc., etc.
Originally Posted by mister_z
ZFS is free to download and use, and because it is a copy-on-write filesystem that checksums every block, you can avoid lots of nasty things that other filesystems may bring you (silent data corruption, bad blocks, phantom writes, etc.).
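To illustrate why copy-on-write plus checksums catches silent corruption, here is a toy Python model, not ZFS's real on-disk format: every write goes to a fresh address, and the checksum of each block is kept in the pointer held by its parent rather than next to the data. On a mirror, a copy that fails its checksum is detected and rewritten from the good copy. All class and method names are hypothetical.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockPtr:
    addr: int
    checksum: str   # checksum of the child, held by the PARENT, not the child

class ToyPool:
    """Toy copy-on-write store with parent-held checksums on a 2-way mirror.
    Illustrative only; not ZFS's real on-disk format."""

    def __init__(self):
        self.disks = [{}, {}]   # two mirrored "disks": addr -> bytes
        self.next_addr = 0

    def write(self, data):
        """Always allocate a fresh address; old blocks are never overwritten."""
        addr = self.next_addr
        self.next_addr += 1
        for disk in self.disks:
            disk[addr] = data
        return BlockPtr(addr, hashlib.sha256(data).hexdigest())

    def read(self, ptr):
        """Return the first copy whose checksum matches; repair bad copies."""
        good = None
        for disk in self.disks:
            data = disk.get(ptr.addr)
            if data is not None and hashlib.sha256(data).hexdigest() == ptr.checksum:
                good = data
                break
        if good is None:
            raise IOError(f"unrecoverable checksum error at block {ptr.addr}")
        for disk in self.disks:          # self-heal any copy that disagrees
            if disk.get(ptr.addr) != good:
                disk[ptr.addr] = good
        return good

if __name__ == "__main__":
    pool = ToyPool()
    old = pool.write(b"version 1")          # keeping this pointer = a snapshot
    new = pool.write(b"version 2")
    pool.disks[0][new.addr] = b"versi0n 2"  # silent corruption on one side
    print(pool.read(new), pool.read(old))   # both still read back correctly
```

Because the checksum lives in the parent, a block that was silently corrupted, or never written at all (a phantom write), fails verification instead of being returned as good data.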
Just my tuppence worth.