[Beowulf] Re: building a RAID system: A long-delayed follow-on
Gerry Creager n5jxs gerry.creager at tamu.edu
Thu Aug 19 16:21:32 PDT 2004
- Previous message: [Beowulf] PostgreSQL
- Next message: [Beowulf] Re: building a RAID system: A long-delayed follow-on
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I just found this note from last year's discussion. I've some follow-up. If you're not interested, I'll understand. Just hit <delete> and go on...

We implemented a 1.6 TB RAID-5 system using HighPoint Technology controllers and Maxtor 200 GB parallel IDE drives. The performance wasn't what we expected, but some careful examination revealed that, just as chronicled below, the additional overhead, especially a complete 2nd round of buffering, was really slowing performance.

OK, the next manufacturer up the proverbial foodchain was Promise. Got the hardware: better, but far less than stellar performance. Oh, and drivers were several kernel releases behind, and in some cases I considered the kernel updates mandatory for security.

We started looking at 3Ware, but work got in the way of the fun stuff. Also, a collaborator (co-conspirator is more accurate) at another institution had been doing similar work and suggested we look at software RAID. OK. It's quick to configure, we need the box back up, and it can't run any worse than the HighPoint stuff.

Well, I'm still thinking I'd like to go with the 3Ware hardware, but that'll have to wait 'til we build the next 2 TB system... soon, real soon. And if it's slower than s/w RAID, I'll go back to that.

Since we went to the s/w RAID-5 config we've seen 1 failure caused by stupid sysadmin tricks and an inadequate UPS when the campus went down. To confess completely, when RAID didn't come back up cleanly I attributed it to a missing entry in /etc/rc.d/rc.local... and technically, I was right. I did a raidstart and mounted the drive, without even a cursory fsck. My bad. We got a "clean" mount, and went merrily ahead. To add to the confusion, I was doing all this from my laptop, at 70+mph (my wife was driving for most of this) using a Sprint 1xRTT connection, once we got into Minnesota. Iowa doesn't have Sprint coverage we could find, save for a 2-block stretch of Ames.
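
The slip described above (raidstart, then mount, with no fsck in between) is the kind of step a small wrapper can guard against. What follows is a minimal sketch in Python, not what was actually run on this system. The function name bring_up, the device /dev/md0 and the mount point /data are hypothetical, and the sketch assumes the filesystem-specific checker honors fsck's -n option (open read-only, answer "no" to everything) and exits with status 0 only when it finds nothing wrong.

```python
import subprocess


def bring_up(device, mount_point, run=subprocess.call):
    """Start the array, check it read-only, and mount only if the check is clean.

    `run` takes an argv list and returns the exit status; it defaults to
    subprocess.call and can be swapped for a stub when testing.
    Returns one of: "raidstart failed", "needs repair", "mount failed", "mounted".
    """
    # raidstart is the raidtools command named in the message.
    if run(["raidstart", device]) != 0:
        return "raidstart failed"

    # Read-only check: -n makes no changes to the filesystem.
    # Assumption: any non-zero status means "do not mount yet".
    if run(["fsck", "-n", device]) != 0:
        return "needs repair"

    if run(["mount", device, mount_point]) != 0:
        return "mount failed"
    return "mounted"


if __name__ == "__main__":
    # Hypothetical device and mount point; needs root on a real system.
    print(bring_up("/dev/md0", "/data"))
```

With this in place, a dirty array stops at "needs repair" instead of being mounted and written to for three more days.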
About 3 days after "recovering" we started seeing a bunch of disk errors. By now, I was in _rural_ Wisconsin. We didn't have cellphone coverage of any sort at the inlaws, and on a good day, we got 26k dialup... throttling down to 9600 sometimes. I opted to drive into town and suck down coffee where I could get a 1xRTT connection... marginally acceptable.

I took the array offline and started an 'fsck -a' which would run for hours with little to look at to indicate the system was even still responding... and then roll over for "too many errors" and a message to run without the '-a' option. 'fsck -y' was little better. We fought this for the rest of the vacation, whenever I had connectivity, and I never got the disk happy.

Came home, immediately flew to DC and wrote a perl script on the plane to tell fsck in manual mode "yes, dammit" to all the 'do ya wanna fix this?' questions. Got into DC at 8pm, started the script, went to dinner. Came back; the script was still running and the screen was full of the Q&A. Went to bed. Got up, same thing. Went to the first day of meetings, and returned at 9pm. Still running. Another day of meetings, and back to the room. Still running, but it completed while I was changing clothes before going to dinner.

Overall, FSCK on a 1.7 TB machine appears to take about 96+/- hours to run when you've really abused it.

I restarted the box, restarted RAID, remounted, manually started the LDM data collection system, and got on an airplane. By the time I was back in Texas, all the missing data from the 2-day odyssey was replaced and the system was back up to speed.

We're using this system to cache 30 days of all the Level II radar. I'll be doing some radar processing on a little 16-node dual opteron cluster (ob:cluster) to see about running some of the newer processing codes to better render the data. We'll also be extracting some of the data to initialize the MM5 and WRF models, once I figure out how to handle that.

We'll still try 3Ware.
I've got indications it's pretty good, from another guy. However, kudos to the kernel and RAID developers in Linux-land. They done good.

Gerry

pesch at attglobal.net wrote:
> You write:
>
> "The problem with offloading is, that while it made great sense in the
> days of 1 MHz CPUs, it really doesn't make a noticeable difference in the
> load on your typical N GHz processor."
>
> Did you have a maximum data storage size in mind? - or to put it
> differently: at what data size do you see the
> practical limit of SW RAID?
>
> Paul
>
> Jakob Oestergaard wrote:
>
>
>>On Thu, Oct 09, 2003 at 08:50:17PM +0200, Daniel Fernandez wrote:
>>
>>>Hi again,
>>
>>...
>>
>>Others have already answered your other questions, I'll try to take one
>>that went unanswered (as far as I can see).
>>
>>...
>>
>>>But must be noted that HW RAID offers better response time.
>>
>>In a HW RAID setup you *add* an extra layer: the dedicated CPU on the
>>RAID card. Remember, this CPU also runs software - calling it
>>'hardware RAID' in itself is misleading, it could just as well be called
>>'offloaded SW RAID'.
>>
>>The problem with offloading is, that while it made great sense in the
>>days of 1 MHz CPUs, it really doesn't make a noticeable difference in the
>>load on your typical N GHz processor.
>>
>>However, you added a layer with your offloaded-RAID. You added one extra
>>CPU in the 'chain of command' - and an inferior CPU at that. That layer
>>means latency even in the most expensive cards you can imagine (and
>>bottleneck in cheap cards). No matter how you look at it, as long as
>>the RAID code in the kernel is fairly simple and efficient (which it
>>was, last I looked), then the extra layers needed to run the PCI
>>commands thru the CPU and then to the actual IDE/SCSI controller *will*
>>incur latency. And unless you pick a good controller, it may even be
>>your bottleneck.
>>
>>Honestly I don't know how much latency is added - it's been years since
>>I toyed with offload-RAID last ;)
>>
>>I don't mean to be handwaving and spreading FUD - I'm just trying to say
>>that the people who advocate SW RAID here are not necessarily smoking
>>crack - there are very good reasons why SW RAID will outperform HW RAID
>>in many scenarios.
>>
>>
>>>HW raid offers hotswap capability and offload our work instead of
>>>maintaining a SW raid solution ...we'll see ;)
>>
>>That, is probably the best reason I know of for choosing hardware RAID.
>>And depending on who you will have administering your system, it can be
>>a very important difference.
>>
>>There are certainly scenarios where you will be willing to trade a lot
>>of performance for a blinking LED marking the failed disk - I am not
>>kidding.
>>
>>Cheers,
>>
>>--
>>................................................................
>>: jakob at unthought.net  : And I see the elder races,         :
>>:.........................: putrid forms of man                :
>>: Jakob Østergaard        : See him rise and claim the earth,  :
>>: OZ9ABN                  : his downfall is at hand.           :
>>:.........................:............{Konkhra}...............:
>>_______________________________________________
>>Beowulf mailing list, Beowulf at beowulf.org
>>To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578
Pager: 979.228.0173
Office: 903A Eller Bldg, TAMU, College Station, TX 77843
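
The Perl script mentioned in the message (the one that told fsck "yes" to every question) is not shown. Below is a rough sketch of the same idea in Python, under stated assumptions. Interactive fsck expects a terminal, so the child runs under a pseudo-terminal. The prompt pattern (output ending in "<y>? " or "<n>? ", as e2fsck prints it) and the names wants_yes and run_answering_yes are illustrative, not taken from the original script.

```python
import os
import pty
import re
import sys

# Assumption: a question is pending when the output ends like "Fix<y>? ".
PROMPT = re.compile(rb"<[yn]>\?\s*$")


def wants_yes(tail: bytes) -> bool:
    """Return True if the most recent output ends with a yes/no prompt."""
    return PROMPT.search(tail) is not None


def run_answering_yes(argv):
    """Run argv under a pseudo-terminal and answer 'y' to every prompt.

    Returns (exit_code, number_of_answers_sent).
    """
    pid, fd = pty.fork()
    if pid == 0:
        os.execvp(argv[0], argv)      # child: becomes the checker

    tail, answered = b"", 0
    while True:
        try:
            chunk = os.read(fd, 4096)
        except OSError:               # Linux raises EIO once the child has exited
            break
        if not chunk:
            break
        sys.stdout.buffer.write(chunk)   # keep the Q&A visible on screen
        sys.stdout.buffer.flush()
        tail = (tail + chunk)[-256:]     # a prompt may be split across reads
        if wants_yes(tail):
            os.write(fd, b"y")
            answered += 1
            tail = b""

    _, status = os.waitpid(pid, 0)
    # os.waitstatus_to_exitcode needs Python 3.9 or later.
    return os.waitstatus_to_exitcode(status), answered


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: python yes_fsck.py fsck /dev/md0
    code, count = run_answering_yes(sys.argv[1:])
    print(f"\nexit status {code}, answered {count} prompts")
```

Matching only on the trailing prompt keeps the script from typing into the checker while it is still printing progress output.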
