[Beowulf] Storage
Robert G. Brown rgb at phy.duke.edu Wed Oct 6 07:42:46 PDT 2004
- Previous message: [Beowulf] Linux memory leak?
- Next message: [Beowulf] Storage - housing 100TB
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Dear List,

I'm turning to you for some top-quality advice, as I have so often in the past. I'm helping assemble a grant proposal that involves a grid-style cluster with very large-scale storage requirements. Specifically, it needs to be able to scale into the hundreds of TB of "central disk store" (whatever that means :-) in addition to commensurate amounts of tape backup.

The tape backup is relatively straightforward -- there is a 100 TB library available to the project already that will hold 200 TB after an LTO1->LTO2 upgrade, and while tapes aren't exactly cheap, they are vastly cheaper than disk in these quantities.

The disk is a real problem. Raw disk these days is less than $1/GB for SATA in 200-300 GB sizes, a bit more for 400 GB sizes, so a TB of disk per se costs in the ballpark of $1000. However, HOUSING the disk in reliable (dual power, hot swap) enclosures is not cheap, adding RAID is not cheap, and building a scalable arrangement of servers that provides access with some controllable degree of latency and bandwidth is also not cheap.

Management requirements include 3-year onsite service for the primary server array -- same day for critical components, next day at the latest for e.g. disks or power supplies that we can shelve and deal with ourselves in the short run. The solution we adopt will also need to be scalable as far as administration is concerned. We are not interested in "DIY" solutions where we just buy an enclosure, hang it on an over-the-counter server and run MD RAID. That isn't because such a setup is unreliable or unworkable for a departmental or even a cluster RAID in the 1-8 TB range (a couple of servers), but because it isn't at all clear how it will scale to the 10-80 TB range, where 10's of servers would be required.
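As a rough illustration of the cost and scaling arithmetic above, here is a minimal Python sketch that turns the raw-disk figure (about $1/GB) into drive counts, raw-disk cost, and a server count for a given usable capacity. The 250 GB drive size, the RAID 5 groups of 8 drives, and the 4 TB of usable space per server are hypothetical parameters chosen for illustration, not figures from any quoted solution. Enclosures, RAID hardware, servers and service contracts are deliberately left out, since those are exactly the costs being asked about.

```python
import math

# Hypothetical planning parameters (assumptions for illustration only).
COST_PER_GB = 1.00        # raw SATA disk, roughly $1/GB as quoted above
DRIVE_GB = 250            # assumed drive size, within the 200-300 GB range
RAID5_GROUP = 8           # assumed drives per RAID 5 group (1 drive of parity)
USABLE_TB_PER_SERVER = 4  # assumed usable TB that one file server can front


def estimate(usable_tb: float) -> dict:
    """Estimate drives, raw-disk cost and servers for a usable capacity.

    Only raw disk is priced; enclosures, RAID hardware, servers and
    service contracts are excluded.
    """
    data_drives_per_group = RAID5_GROUP - 1
    usable_gb_per_group = data_drives_per_group * DRIVE_GB
    groups = math.ceil(usable_tb * 1000 / usable_gb_per_group)
    drives = groups * RAID5_GROUP
    return {
        "drives": drives,
        "raw_disk_cost": drives * DRIVE_GB * COST_PER_GB,
        "servers": math.ceil(usable_tb / USABLE_TB_PER_SERVER),
    }


for tb in (8, 80, 200):
    print(tb, "TB usable ->", estimate(tb))
```

Under these assumptions, 8 TB usable fits on two servers while 80 TB needs about twenty, which is the jump from "a couple of servers" to "10's of servers" that makes the DIY approach hard to administer.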
Management of the actual spaces thus provided is not trivial. There are certain TB-scale limits in Linux to cope with (likely soon to be resolved if they aren't already in the latest kernels, but still present in many of the working versions of Linux in use), and with an array of partitions and servers to deal with, just being able to index, store and retrieve files generated by the compute component of the grid will be a major issue.

SO, what I want to know is:

a) What are listvolken who have 10+ TB requirements doing to satisfy them?
b) What did their solution(s) cost to set up as a base system (in the case of e.g. a network appliance)?
c) What are the incremental costs (e.g. filled racks)?
d) How does their solution scale, both costwise (partly answered in b and c) and in terms of management and performance?
e) What software tools are required to make their solution work, and are they open source or proprietary?
f) Along the same lines, to what extent is the hardware base of their solution commodity (defined here as having a choice of multiple vendors for a component at a point of standardized attachment such as a Fibre Channel port or SCSI port) or proprietary (defined as: if you buy this solution, THIS part will always need to be purchased from the original vendor at a price "above market" as the solution is scaled up)?

Rules: Vendors reply directly to me only, not the list. I'm in the market for this; most of the list is not. Note also that I've already gotten a decent picture of at least two or three solutions offered by tier 1 cluster vendors or dedicated network storage vendors, although I'm happy to get more.

However, I think that beowulf administrators, engineers, and users should answer on list, as their real-world experiences are likely to be of interest to lots of people and would therefore be of value in the archives. I'm hoping that some of you bioinformatics people have experience here, as well as maybe even people like movie makers.
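One commonly cited example of the TB-scale limits mentioned above is the 2 TiB block-device ceiling that follows from 32-bit sector numbers with 512-byte sectors, which applied to 2.4-series kernels. The minimal Python sketch below assumes that ceiling, computes how many volumes a store of a given size would have to be split into, and keeps a toy catalog recording which volume holds each file, as one way to picture the "index, store and retrieve" problem across many partitions. The `Catalog` class, the `volNNN` naming scheme and the fill-first placement policy are hypothetical illustrations, not features of any particular product or filesystem.

```python
import math

SECTOR_BYTES = 512
MAX_SECTORS = 2 ** 32                          # 32-bit sector numbers
MAX_VOLUME_BYTES = SECTOR_BYTES * MAX_SECTORS  # = 2 TiB per volume


def volumes_needed(total_bytes: int) -> int:
    """Minimum number of volumes if no single volume may exceed 2 TiB."""
    return math.ceil(total_bytes / MAX_VOLUME_BYTES)


class Catalog:
    """Toy file index (hypothetical): records which volume holds each file."""

    def __init__(self, n_volumes: int):
        self.used = [0] * n_volumes  # bytes used on each volume
        self.where = {}              # file name -> volume label

    def store(self, name: str, size: int) -> str:
        # Fill-first placement: the first volume with room for the whole file.
        for i, used in enumerate(self.used):
            if used + size <= MAX_VOLUME_BYTES:
                self.used[i] += size
                self.where[name] = f"vol{i:03d}"
                return self.where[name]
        raise RuntimeError("no volume has room for " + name)

    def locate(self, name: str) -> str:
        return self.where[name]


if __name__ == "__main__":
    n = volumes_needed(100 * 10 ** 12)           # 100 TB (decimal)
    catalog = Catalog(n)
    catalog.store("run0001.dat", 5 * 10 ** 9)    # a hypothetical 5 GB result
    print(n, catalog.locate("run0001.dat"))      # prints: 46 vol000
```

Even at 100 TB this already means dozens of separately managed volumes, which is why a single consistent index matters as much as the raw capacity.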
FWIW, the actual application is likely to be Monte Carlo used to generate huge data sets (per node), cook them down to smaller (but still multi-GB) data sets, and hand them back to the central disk store for aggregation and indexed/retrievable intermediate-term storage, with migration to the tape store on some as yet undetermined criterion for frequency of access and so forth. Other uses will likely emerge, but this is what we know for now. I'd guess that bioinformatics and movie generation (especially the latter) are VERY similar in the actual data flow component and also require multi-TB central stores, and I'm hoping that you have useful information to share.

Thanks in advance,

rgb

-- 
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
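The criterion for migrating data sets from disk to tape is explicitly left undetermined in the post. Purely as an illustration, the Python sketch below assumes one simple candidate policy: only data sets idle for at least a fixed number of days are eligible, and the least recently accessed are migrated first until disk usage falls to a high-water mark. The `DataSet` record, the 90-day idle window and the 80% high-water mark are hypothetical choices, not a proposal from the original post.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    size_bytes: int
    last_access_day: int  # hypothetical: day number of the most recent access


def pick_for_tape(datasets, today, capacity_bytes,
                  high_water=0.80, min_idle_days=90):
    """Return names of data sets to migrate to tape (hypothetical policy).

    Only data sets idle for at least min_idle_days are eligible; the least
    recently accessed go first, stopping once usage is at or below
    high_water * capacity_bytes.
    """
    used = sum(d.size_bytes for d in datasets)
    target = high_water * capacity_bytes
    eligible = sorted(
        (d for d in datasets if today - d.last_access_day >= min_idle_days),
        key=lambda d: d.last_access_day,
    )
    chosen = []
    for d in eligible:
        if used <= target:
            break
        chosen.append(d.name)
        used -= d.size_bytes
    return chosen


if __name__ == "__main__":
    store = [
        DataSet("mc_run_a", 30, last_access_day=0),
        DataSet("mc_run_b", 30, last_access_day=50),
        DataSet("mc_run_c", 30, last_access_day=200),
    ]
    print(pick_for_tape(store, today=200, capacity_bytes=100))
```

Here the store is 90% full against an 80% target, so only the oldest idle data set is moved. A real policy would also need to account for recall cost from tape, which this sketch ignores.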
