[scyld-users] Database searches on Scyld

Donald Becker becker at scyld.com
Fri Feb 28 06:47:01 PST 2003


On 28 Feb 2003, Timo Lassmann wrote:

> 	I am working in a bioinformatics group in Stockholm. We had a beowulf
> cluster for a long time and are now thinking of switching to the new
> scyld beowulf version. We mainly use the cluster for database searches
> and in our old setup we had to have local database copies installed on
> all nodes for extra speed. Since in beowulf 2 everything is only
> installed on the master node, would our parallel searches run as fast?

An advantage of the Scyld Beowulf system is that it supports a broad range
of file system configurations.

The demo and turn-key CDs are set up at one end of the range, with
the master operating only from the CD-ROM and compute nodes not mounting
any network or local disk file systems.

The initial installation of the full system has the master using a
file system on the local disk, with the compute nodes booting without a
file system.

After initial installation, almost all sites configure the compute nodes
to use local disks for swap space and to NFS mount /home.  Most sites
configure local disk partitions for data file storage.  Some sites
configure the root partition on local disk, and NFS mount or copy over
all of the standard binaries so that the system looks like a traditional
Beowulf FS.  Other sites configure local storage as a parallel file
system with PVFS or GFS.

All of these configurations keep the advantage of the "diskless
administration" model.  The compute nodes always boot into the safe
no-file-system mode.  They are then configured by a master machine, using
only configuration files on that master.

When local file systems are enabled on the compute node, the master
configuration has the option to:
       always make a fresh file system with 'mkfs'
       make a fresh file system if the existing one fails 'fsck'
       leave the node operational but in 'Error' state if any 'fsck' fails
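
The three options above amount to a small decision table.  As a rough
sketch only (this is not Scyld's configuration syntax or code; the names
FsPolicy and plan_node_fs and the "up" state label are hypothetical and
chosen purely for illustration), the logic could be written as:

```python
from enum import Enum


class FsPolicy(Enum):
    """Hypothetical names for the three per-file-system options."""
    ALWAYS_MKFS = "always"       # always make a fresh file system
    MKFS_ON_FAILURE = "on-fail"  # make a fresh one only if fsck fails
    CHECK_ONLY = "check"         # never reformat; flag the node instead


def plan_node_fs(policy, fsck_ok):
    """Return (run_mkfs, node_state) for one local file system.

    fsck_ok is the outcome of checking the existing file system.
    It is ignored for ALWAYS_MKFS, since the file system is rebuilt anyway.
    """
    if policy is FsPolicy.ALWAYS_MKFS:
        return True, "up"
    if fsck_ok:
        # Existing file system is clean: keep it as-is.
        return False, "up"
    if policy is FsPolicy.MKFS_ON_FAILURE:
        # Check failed: replace the file system with a fresh one.
        return True, "up"
    # CHECK_ONLY with a failed check: node stays reachable but is flagged.
    return False, "Error"
```

For a database-search cluster, the second option is likely the one that
preserves local database copies across reboots, since the first rebuilds
the file system on every boot.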

> And if it will be as fast how difficult is it to install pvm based
> applications over to the new setup? 

It is easier than a traditional PVM installation:
   the cluster system knows the list of available nodes
   the job initiation semantics are cleaner and faster
   the usual 'rsh'/'ssh' permission problems do not arise

-- 
Donald Becker				becker at scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Scyld Beowulf cluster system
Annapolis MD 21403			410-990-9993



