[Beowulf] integrating node disks into a cluster filesystem?


Dmitry Zaletnev dzaletnev at yandex.ru
Fri Sep 25 17:32:28 PDT 2009


Mark,
I have been experimenting with my toy cluster of PS3s, and I'm interested in your ideas.
Each PS3 has two network interfaces: a gigabit LAN (GLAN) NIC and Wi-Fi. The distros that run with my firmware (2.70) are:
Yellow Dog Linux 6.1, PSUBUNTU (Ubuntu 9.04), and Fedora 11.
I also have an Allied Telesyn AT-GS900/8E switch and a D-Link DIR-320 Wi-Fi router with USB.
Besides these systems, there are a Core2Duo E8400 with 8 GB RAM and a 1.5 TB HDD, and a Celeron 1.8 with 1 GB RAM and an 80 GB HDD.
While I wait for my programming partner to implement the ILP64 scheme in his CFD package, the PS3s sit idle,
so they and their 80 GB HDDs are available for any experiments.
I would prefer not to load their GLAN NICs with anything other than MPI, but maybe it's possible to use Wi-Fi instead?
There are only two PS3s, but that should suffice for an experiment.

Dmitry Zaletnev

> > users to cache data-in-progress to scratch space on the nodes. But there's a 
> > definite draw to a single global scratch space that scales automatically with 
> > the cluster itself.
> using node-local storage is fine, but really an orthogonal issue.
> if people are willing to do it, it's great and scales nicely.
> it doesn't really address the question of how to make use of 
> 3-8 TB per node. we suggest that people use node-local /tmp, 
> and like that name because it emphasizes the nature of the space.
> currently we don't sweat the cleanup of /tmp (in fact we merely 
> have the distro-default 10-day tmpwatch).
> > > - obviously want to minimize the interference of remote IO to a node's 
> > > jobs.
> > > for serial jobs, this is almost moot. for loosely-coupled parallel jobs
> > > (whether threaded or cross-node), this is probably non-critical. even for
> > > tight-coupled jobs, perhaps it would be enough to reserve a core for
> > > admin/filesystem overhead.
> > I'd also strongly consider a separate network for filesystem I/O.
> why? I'd like to see some solid numbers on how often jobs are really
> bottlenecked on the interconnect (assuming something reasonable like DDR IB).
> I can certainly imagine it could be so, but how often does it happen?
> is it only for specific kinds of designs (all-to-all users?)
> > > - distributed filesystem (ceph? gluster? please post any experience!) I
> > > know it's possible to run oss+ost services on a lustre client, but not
> > > recommended because of the deadlock issue.
> > I played with PVFS1 a bit back in the day. My impression at the time was
> yeah, I played with it too, but forgot to mention it because it is afaik
> still dependent on all nodes being up. admittedly, most of the alternatives
> also assume all servers are up...


