[Beowulf] Input Sought: "Basic" Lustre FS deployment on GigEther-Fabric Cluster

Wed Mar 28 12:30:30 PDT 2007

Hi all,

I've trawled the archives and searched via Google for a few days now, but have not found much clarity yet, hence this query to the list. If it would be of use, I can summarize the replies back to the list once done.

I'm looking to begin deploying a ~50-node (dual-socket, dual-core Opteron) cluster soon, with a gig-ether interconnect. It is intended to run a fairly specific CPU-intensive MPI model which scales "absurdly well". Based on benchmarks already done on other Linux clusters, I/O performance is far and away not the bottleneck.

Originally I had planned, for simplicity, to use ROCKS with NFS as the filesystem for shared storage. (There is one dedicated "storage node" with 2 x RAID5 bricks attached, to be exported to all nodes in the cluster.)

From what I've read over the past ~month, my understanding is that the Lustre filesystem over GigEther (even with this kind of trivial topology, with all metadata and storage data hosted on the same node) should give significantly better performance than NFS running on the same hardware / topology.

Some digging in this list's archive turned up a bit of debate on this (i.e., Lustre would only outperform NFS when there is a lot of large, streaming-intensive I/O; otherwise it would be worse).
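For context, here is a rough back-of-envelope sketch of the ceiling that either filesystem would face with this topology. It assumes (my assumption, not something I have measured) that the single storage node serves all clients over one gigabit link and that all clients hit it at once; the function name and the efficiency parameter are purely illustrative.

```python
# Back-of-envelope: per-client share of a single storage node's network link.
# All figures are assumptions for illustration, not measurements.

def per_client_mb_s(link_gbit: float, clients: int, efficiency: float = 1.0) -> float:
    """Return the MB/s each client gets if all clients do I/O at once over one link."""
    link_mb_s = link_gbit * 1000 / 8  # 1 Gbit/s = 125 MB/s theoretical peak
    return link_mb_s * efficiency / clients


if __name__ == "__main__":
    for n in (1, 10, 50):
        share = per_client_mb_s(1.0, n)
        print(f"{n:2d} clients: {share:6.1f} MB/s each (theoretical peak)")
```

Whatever the filesystem, the shared link sets the upper bound when every node does I/O simultaneously, so any Lustre-vs-NFS difference would have to show up below that ceiling.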

Additionally, any "real world" comments (ideally from folks with similar "straightforward" Lustre deployments) will be tremendously appreciated. (I'm not really looking for failover / redundancy, nor for storage distributed across many nodes and exported across the whole cluster.)

I'm still poring over the Lustre install guide to get a better handle on "subtle details" (such as how much storage is needed for metadata, the ratio of metadata footprint to OSS storage capacity, etc.), but I'm sure I'll get there soon enough after playing with a test setup for a week or so.

Many thanks for taking the time to read this far.

-Tim Chipman