[Beowulf] Hadoop
Gerry Creager gerry.creager at tamu.edu
Fri Jan 2 04:10:10 PST 2009

Greg Lindahl wrote:
> On Fri, Dec 26, 2008 at 05:16:04PM -0600, Gerry Creager wrote:
>
>> We've a user who has requested its installation on one of our clusters,
>> a high-throughput system.
>
> You didn't say anything about what they wanted to do. Hadoop is
> designed to store a lot of data, and then enable what we HPC people
> would call nearly-embarrassingly-parallel computation with good
> locality -- it takes shards of mapreduce computation to run on the
> same system as the disk shards being processed.

Ah, but there's the problem. We've divined what they intend... we
think... but they didn't originally tell us. The PI involved is a
relatively new, but reasonably experienced CS prof associated with our
bioinformatics crowd. She and her students intend to sift through plant
genomic data for patterns (we think, based on her known affiliations).
*I* suspect she's interested, as well, because she read about Hadoop and
wants to play.

> This means you'll have to dedicate systems over the long term to store
> the data (much like PVFS), and all of these systems will have to be a
> part of their mapreduce jobs. So if your queue system can run
> whole-cluster jobs easily, no problem.

Can it? Yes. Is that the intent of the cluster? No. The cluster is
configured as a high-throughput system with a gigabit non-blocking
backplane. 8 cores/node, all jobs are scheduled on a per-node basis.
Each node DOES have local disk (this isn't an opportunity to reopen THAT
religious war) so we theoretically could use the Hadoop file system,
save it'd likely break our cluster design. Instead, we're looking at
Hadoop On Demand (http://hadoop.apache.org/core/docs/r0.17.2/hod.html).

> If, instead, they're just looking for a simple way to do
> embarrassingly parallel computations, without lots of persistent data,
> then you can probably point them at something easier and more friendly
> to your queue system.

Yeah, and I've been trying, but someone else promised them it'd be made
available without talking to the guys who have to install and support
it, because it "looks" like valuable computer science.

gerry
--
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843
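
For readers who have not met the model Greg describes, here is a minimal sketch of the idea in plain Python, not Hadoop itself. It assumes a made-up layout in which each node holds a few sequence shards on local disk, runs one map task per shard "on" the node that owns it, and then reduces the partial results. The node names, the shard contents, the k-mer counting task, and the function names `map_shard` and `run_job` are all hypothetical illustrations, not anything from the thread or from the Hadoop API.

```python
from collections import Counter

# Hypothetical layout: node name -> sequence shards stored on that node's local disk.
SHARDS_BY_NODE = {
    "node01": ["ACGTAC", "GTACGT"],
    "node02": ["TTACGA"],
}


def map_shard(sequence, k=3):
    """Map step: count every k-length substring (k-mer) in one shard."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def run_job(shards_by_node, k=3):
    """Run one map task per shard, then reduce the partial counts."""
    partials = []
    for node, shards in shards_by_node.items():
        for shard in shards:
            # In a real deployment this call would execute on `node` itself,
            # so the shard is read from local disk rather than over the network.
            # Here everything runs in one process purely for illustration.
            partials.append(map_shard(shard, k))

    # Reduce step: merge the per-shard counts into a single total.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    for kmer, count in sorted(run_job(SHARDS_BY_NODE).items()):
        print(kmer, count)
```

The sketch also shows why the storage nodes have to take part in every job: the map tasks are only cheap when they run where the shards already live, which is the scheduling constraint discussed above.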
