[Beowulf] large scratch space on cluster

Jörg Saßmannshausen jorg.sassmannshausen at strath.ac.uk
Tue Sep 29 12:15:19 PDT 2009


Hi Joe,

thanks for the prompt reply.

Actually, it is not GAMESS that is causing the problem but Molpro. The reason 
it needs that much space is simply the size of the molecule and the 
method ( CCSD(T) ). Neither favours a fast job with little 
scratch space. I don't think there is much I can do on the program 
side; if I want to run the job, I need more scratch space.

I always thought a RAID0 stripe was the best solution for fast and large 
scratch space; that is why I considered it.
Besides, the data is spread over a large number of small files, so you don't 
read 700 GB in one go. Otherwise it would be an impossible task. ;-)
I did that once before over an NFS share, and it actually worked 
reasonably well... until somebody hit the power switch and did not flip it 
back quickly enough, so the UPS ran out of battery power :-(
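
For what it's worth, here is a quick back-of-the-envelope sketch (in Python, 
with assumed throughput figures rather than measurements from our nodes) of 
how long simply streaming 700 GB of scratch would take at various sustained 
rates:

# Rough estimate only: how long does it take to stream 700 GB of
# scratch data at a given sustained throughput?  The throughput
# figures below are assumptions, not benchmarks of any real node.
scratch_gb = 700

assumed_throughputs_gb_per_s = {
    "single SATA disk (~0.1 GB/s, assumed)": 0.1,
    "2-disk RAID0 stripe (~0.2 GB/s, assumed)": 0.2,
    "fast external RAID box (~1 GB/s, assumed)": 1.0,
}

for label, rate in assumed_throughputs_gb_per_s.items():
    seconds = int(scratch_gb / rate)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    print(f"{label}: {hours}h {minutes}m {secs}s")

Of course, with lots of small files and all the seeking that goes with them, 
the real numbers will come in well below those streaming rates.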

Besides, I have already contacted the $VENDORs ;-)

All the best

Jörg


On Tuesday 29 September 2009 Joe Landman wrote:
> Hearns, John wrote:
> > I was wondering if somebody could help me here a bit.
> > For some of the calculations we are running on our cluster we need a
> > significant amount of disc space. The last calculation crashed as the
> > ~700 GB which I made available were not enough. So, I want to set up a
> > RAID0 on one 8 core node with 2 1.5 TB discs. So far, so good.
> >
> >
> > Sounds like a cluster I might have had something to do with in a past
> > life...
> >
> >
> > 700 gbytes! My advice - look closely at your software and see why it
> > needs this scratch space, and what you can do to cut down on this.
>
> Heh... some of the coupled cluster GAMESS tests we have seen/run have
> used this much or more in scratch space.
>
> Single threaded readers/writers ... you either need a very fast IO
> device, or like John suggested, you need to examine what is getting
> read/written.
>
> 700GB @ 1GB/s takes 700 seconds, roughly 11m40s +/- some.
> 700GB @ 0.1GB/s takes 7000 seconds, roughly 116m40s +/- some (~2 hours).
>
> A RAID0 stripe of two drives off the motherboard will be closer to the
> second than the first ...
>
> > Also, let us know what code this is please.
> > You're right about network transfer of scratch files like that - if at
> > all possible, you should aim to use local scratch space on the nodes.
> > $VENDOR (I think in Warwick!) should be very happy to help you there!
>
> I know those guys! (and they are good).
>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics Inc.
> email: landman at scalableinformatics.com
> web  : http://scalableinformatics.com
>         http://scalableinformatics.com/jackrabbit
> phone: +1 734 786 8423 x121
> fax  : +1 866 888 3112
> cell : +1 734 612 4615



-- 
*************************************************************
Jörg Saßmannshausen
Research Fellow
University of Strathclyde
Department of Pure and Applied Chemistry
295 Cathedral St.
Glasgow
G1 1XL

email: jorg.sassmannshausen at strath.ac.uk
web: http://sassy.formativ.net

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html


