[Beowulf] Need recommendation for a new 512 core Linux cluster

Joe Landman landman at scalableinformatics.com
Wed Nov 7 16:22:40 PST 2007


Steven Truong wrote:
> Hi, all.  I would like to know for that many cores, what kind of file
> system should we go with?  Currently we have a couple of clusters with
> around 100 cores and NFS seems to be ok but not great.  We definitely
> need to put in place a parallel file system for this new cluster and I
> do not know which one I should go with?  Lustre, GFS, PVFS2 or what
> else?  Could you share your experiences regarding this aspect?

Hi Steven:

   What is the nature of your IO?  That is, are your jobs dominated by 
large sequential reads and writes, or do the nodes read and write 
whenever they like (small, random-ish IO)?  Are your programs already 
set up for parallel IO (MPI-IO), or does a single node handle most of 
the IO requests for your jobs?
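
   To make the MPI-IO case concrete, here is a minimal sketch (not 
from anyone's actual code; the file name and chunk size are 
placeholders) of what "parallel IO" means in practice: every rank 
writes its own slice of a shared file with a collective call, instead 
of funneling all the data through a single IO master node.

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK (1 << 20)   /* 1 MiB written by each rank (placeholder) */

int main(int argc, char **argv)
{
    int rank;
    MPI_File fh;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    buf = malloc(CHUNK);
    memset(buf, 0, CHUNK);            /* each rank's local block */

    MPI_File_open(MPI_COMM_WORLD, "scratch.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* Collective write: rank i lands at offset i*CHUNK, so the IO
       load is spread across all ranks rather than one node doing
       all the writing. */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * CHUNK,
                          buf, CHUNK, MPI_BYTE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}

If your codes look like the other pattern (rank 0 gathers everything 
and writes it out), a parallel filesystem buys you less than you might 
hope until the IO itself is parallelized.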

   We have looked at GFS recently for some of our storage cluster 
offerings, and while inexpensive, it appears to have some bottlenecks 
which render it less than ideal for HPC cluster storage.  There are some 
papers on technologies to improve it:

	http://www.cse.ohio-state.edu/~liangs/paper/liang-icpp2006.pdf

   The general contenders could be Lustre, PVFS2, and a few others.  As 
Lustre was just acquired by Sun, my concern would be continued Linux 
support going forward.

   Again, all of this depends upon your read/write patterns, and what 
you want to do with the storage (is this scratch/temp space, or 
"permanent" storage space?).
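
   One cheap way to get a handle on those patterns before committing 
to a filesystem: run a small streaming-write test like the sketch 
below from several nodes at once against your current NFS scratch (or 
a Lustre/PVFS2 testbed) and watch how the aggregate bandwidth holds 
up.  The path and sizes here are placeholders, and tools like bonnie++ 
or iozone will give you far more detail, but even this much shows you 
what your streaming bandwidth ceiling looks like under concurrent 
clients.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define BLOCK  (1 << 20)     /* 1 MiB per write() (placeholder) */
#define NBLOCK 1024          /* 1 GiB total       (placeholder) */

int main(void)
{
    char *buf = malloc(BLOCK);
    struct timeval t0, t1;
    double secs;
    int fd, i;

    memset(buf, 0xAB, BLOCK);
    fd = open("/scratch/streamtest.dat",   /* placeholder path */
              O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    gettimeofday(&t0, NULL);
    for (i = 0; i < NBLOCK; i++)
        if (write(fd, buf, BLOCK) != BLOCK) { perror("write"); return 1; }
    fsync(fd);                /* make sure the data hits the servers */
    gettimeofday(&t1, NULL);
    close(fd);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%.1f MB/s\n", (double)NBLOCK * BLOCK / (1 << 20) / secs);

    free(buf);
    return 0;
}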

> 
> I also would like to know how many head nodes should I need to manage
> jobs and queues.  And what else should I have to worry about?

   This depends upon usage patterns.  How critical is it that your job 
scheduler stay up?  How many users will be submitting jobs?  Will they 
do so interactively, or via web tools?  Which scheduler do you plan to 
deploy?  Which OS?

512 cores would fit nicely in 32 nodes of quad-socket, quad-core 
machines (32 x 4 x 4 = 512).  Floor space shouldn't be an issue; 
heat and power could be.

> 
> Thank you very much for sharing any experiences.


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


