[Beowulf] [tjrc@sanger.ac.uk: Re: [Bioclusters] topbiocluster.org]
David S. dgs at gs.washington.edu
Fri Jun 24 12:10:41 PDT 2005
On Fri, Jun 24, 2005 at 06:18:08PM +0200, Eugen Leitl wrote:
> ----- Forwarded message from Tim Cutts <tjrc at sanger.ac.uk> -----
>
> On 24 Jun 2005, at 4:06 pm, Brodie, Kent wrote:
>
> >
> >I'd be VERY interested to see if anyone has results from using cluster
> >filesystems, for example.....
>
> Cluster filesystems have *drastically* cut our data distribution
> time. We can distribute a new multi-GB genome data set to all the
> machines that use cluster filesystems in a few minutes. The old RLX
> blades, which have to rely on the hierarchy of rsync processes to
> which James referred, trail in a dismal few hours later.
>
> They've also increased performance when running jobs; the machines
> can suck data over the filesystem's GB ethernet faster than the
> individual spindles could supply the data locally.
>
> We've been using cluster filesystems (specifically, GPFS) in
> production since October 2003, for the static datasets; blastables
> and so on. This is going to continue, and we've been so pleased with
> it as a method, that it's going to be extended. The number of nodes
> per cluster filesystem (currently 14) will be expanded, hopefully to
> the entire cluster. Scratch filesystems for the cluster will be
> moved to GPFS or Lustre, rather than NFS, which is where they are
> currently. We're not wedded to GPFS - Lustre looks good too.

I guess that we've had the opposite experience with GPFS. We have
that file system on Linux x86 in an NSD configuration, with two
servers attached to a SAN distributing the file system to about
fifty execution nodes over gigabit ethernet. This cluster runs
bioinformatics applications - lots of BLAST.

Concurrent BLAST jobs can run quite slowly reading the databases
from GPFS. Just yesterday someone ran BLAST across twenty-five nodes
in that fashion, and the individual processes shambled along, barely
using more than 15% of the CPU. Meanwhile the NSD servers were
showing loads of around twenty, and the GPFS file system was
annoyingly unresponsive in interactive use.

MEGABLAST is even worse. The folks around here have given up on
running concurrent MEGABLASTs in GPFS, and instead first stage the
databases they need to local disk on the execution hosts.

A large part of the problem could be the SATA disks in the SAN, but
that's what we have to work with.

We're vaguely casting about for alternatives to GPFS. One study
I've found comparing cluster or parallel file systems

http://www.linuxclustersinstitute.org/Linux-HPC-Revolution/Archive/PDF05/17-Oberg_M.pdf

indicates that the alternatives aren't very much better.

David S.
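
The staging workaround mentioned above (copying the needed databases to local disk on an execution host before the job runs) could look roughly like the following. This is only a minimal Python sketch under stated assumptions, not the procedure actually used on this cluster: the function name stage_database, the directory arguments, and the convention that a database consists of all files named "<db_name>.*" are hypothetical choices made for illustration.

```python
import shutil
from pathlib import Path


def stage_database(shared_dir, local_dir, db_name):
    """Copy the files of one database from shared storage to local disk.

    Hypothetical helper: a "database" is assumed to be every regular file
    in shared_dir whose name starts with db_name followed by a dot.
    Files that already exist locally with the same size and an mtime that
    is not older are skipped, so repeated jobs on the same host do not
    re-read them from the shared file system.
    Returns the list of local paths that were actually copied.
    """
    shared = Path(shared_dir)
    local = Path(local_dir)
    local.mkdir(parents=True, exist_ok=True)

    copied = []
    for src in sorted(shared.glob(db_name + ".*")):
        if not src.is_file():
            continue
        dst = local / src.name
        if dst.exists():
            s, d = src.stat(), dst.stat()
            # Whole-second comparison tolerates coarse timestamp precision.
            if s.st_size == d.st_size and int(d.st_mtime) >= int(s.st_mtime):
                continue
        # copy2 preserves the modification time, which the check above uses.
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

A job wrapper would call stage_database() once per database it needs and then point the search at the local directory instead of the shared one. The trade-off is one sequential read of each database per host instead of many concurrent reads against the shared servers.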
