[Beowulf] best archetecture / tradeoffs
Joe Landman landman at scalableinformatics.com
Sun Aug 28 09:00:04 PDT 2005
Hi Robert:

Robert G. Brown wrote:
> Joe Landman writes:
>
>>>
>>> you need to seriously rethink such jobs, since actually *using* swap
>>> is pretty much a non-fatal error condition these days.
>>
>>
>> I disagree with this classification (specifically the label you
>> applied).  Using swap means IMO that you need to buy more ram for your
>> machines.  There is no excuse to skimp on ram, as it is generally
>> inexpensive (up to a point, just try to buy reasonably priced 4GB
>> sticks of DDR single/dual ranked memory).
>
>
> And there is also the "dancing bear" problem.  In some very large
> problems, the amazing thing is not that it runs particularly well or
> fast, but that you can run it at all.  Some jobs are parallelized IN
> ORDER TO run something too big to fit into physical memory, and some
> task partitionings put the job itself on a single node and use the rest
> to provide some sort of extended memory to that node.

Back when I was at SGI working on benchmarks for customers, one of the
things we would do is to try to get the job going as wide as possible
across the CPUs.  If you have 8MB caches, and you can spread that thing
wide enough, and it gets run entirely out of cache ....  I seem to
remember running a streams benchmark like this once for a customer.

[...]

> So it isn't ALWAYS a non-fatal error condition, but it should always be
> done deliberately, because if an ordinary task swaps you start getting
> that nasty old several order of magnitude slowdown...;-)

Heh... a big red warning modal window could pop up and ask "ARE YOU
REALLY SURE?"

[...]

> I don't know if warewulf is quite there, but they are a damned good try.
> You install any distro you like (if it is far from any beaten path
> expect to do a bit of work).

I am starting to play with the SuSE bits and Warewulf now.  I have been
playing with it on and off for the last year or so.  I like it.  They do
many things right.
Specifically

> It divorces the support of the
> minimal "cluster" core from the choice of OS,

This is goodness, and IMO the right way to do things.  With some
distributions, getting them to support/ship stuff that they should is
worse than pulling teeth.  This has a cascading (negative) impact on
cluster distributions which depend critically upon the upstream linux
distribution.

> from its natural
> update/upgrade process, and so on, and maximally leverages the particular
> tools (e.g. yum) that make managing/selecting packages easy.

I like the modular linux approach, where you have groups of packages
where all the bits are properly dependency resolved relative to each
other, and they are not so locked into a particular set of underlying
bits (e.g. if you look at it from a graph perspective, each module has
very few connections to the core).

The problem with this is that each distribution does things ever so
slightly differently, to the net effect that .src.rpms (or .debs) don't
necessarily always work, so the binary builds don't either, ...  Even
with yum/apt masking this (which at a coarse level is what they are
attempting to do), this is still annoying.

More importantly for us, most of the distros build various important
packages either very poorly, incorrectly, or simply use default
configurations which have little value to people who need to use the
tools.  A great example of this is how most RPM distros build perl
module RPMs.  Lots of the really good options on these things are not
set as the default, and it is *really* hard to go back and fix the RPMs
as they are generated automagically by the CPAN2RPM utilities.

We have found that in many cases the only way to solve this problem is
to ignore the distro supplied packages and build our own.

In the end we could package the module as a huge RPM.  That would be
fine.  But the individual packages that make it up need to be built
right to begin with, and yum/apt doesn't solve that issue.
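[Editorial sketch, not part of the original message.]  The "graph
perspective" on modular package groups mentioned above can be made
concrete with a small example.  The package names, the dependency table,
and both helper functions below are hypothetical and exist only for
illustration; no real distribution's metadata is being described.  The
idea is simply to count how many dependency edges cross from a module
into the core, and to check that everything else resolves inside the
module itself.

```python
from typing import Dict, Set

def core_coupling(deps: Dict[str, Set[str]], module: Set[str], core: Set[str]) -> int:
    """Count dependency edges that go from a package in `module` to a package in `core`."""
    return sum(len(deps.get(pkg, set()) & core) for pkg in module)

def is_self_contained(deps: Dict[str, Set[str]], module: Set[str], core: Set[str]) -> bool:
    """True if every dependency of the module resolves inside the module or the core."""
    allowed = module | core
    return all(deps.get(pkg, set()) <= allowed for pkg in module)

# Hypothetical dependency table: package -> packages it requires.
deps = {
    "mpi-runtime": {"glibc", "mpi-libs"},
    "mpi-libs": {"glibc"},
    "mpi-tools": {"mpi-runtime", "perl"},
}
core = {"glibc", "perl"}
mpi_module = {"mpi-runtime", "mpi-libs", "mpi-tools"}

if __name__ == "__main__":
    print("edges into core:", core_coupling(deps, mpi_module, core))
    print("self-contained: ", is_self_contained(deps, mpi_module, core))
```

A low edge count relative to the number of packages in the module is
what "very few connections to the core" would look like under these
assumptions.  Note that this says nothing about whether the individual
packages were built with sensible options, which is the separate
problem raised above.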
> The
> clusters you end up with are close to what you'd get if you rolled your
> own on top of your own distro (diskless, yet:-) but a whole lot easier
> than roll-your-own-from-scratch.

Roll your own got old for me in 1999.

> Agnostic is good.

I think distribution agnosticism is *required* for good cluster support
going forward.  Sure, some folks may be able to do some good things with
a distro-fixed cluster system, and make installation* easy (* easy on
supported hardware that is, just try to include something not in the
supported list for that distro, say like SATA, or Firewire).

> Automagic agnostic
> is better (though harder -- requires a broad developer/participant
> base).  Tools written/maintained by folks that eat their own dog food is
> best.  warewulf looks like it is on the generally correct track.

My only complaint in the past with warewulf has been no-local boot VNFS
install (e.g. copy the VNFS and minimal bits to the local disk, so the
system can boot without loading things into RAMDISK).  Some customers
have issues with using ram for anything other than local fast memory.  I
think this has been/is being addressed.

I agree though in that warewulf looks very much like the right way.  I
am looking a little at onesis, but I am not sure how well supported that
is/will be.

>
>    rgb

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
