[Beowulf] best architecture / tradeoffs
Robert G. Brown rgb at phy.duke.edu
Sun Aug 28 07:55:20 PDT 2005
Joe Landman writes:
>>
>> you need to seriously rethink such jobs, since actually *using* swap is
>> pretty much a non-fatal error condition these days.
>
> I disagree with this classification (specifically the label you
> applied). Using swap means IMO that you need to buy more ram for your
> machines. There is no excuse to skimp on ram, as it is generally
> inexpensive (up to a point, just try to buy reasonably priced 4GB sticks
> of DDR single/dual ranked memory).

And there is also the "dancing bear" problem. In some very large problems, the amazing thing is not that it runs particularly well or fast, but that you can run it at all. Some jobs are parallelized IN ORDER TO run something too big to fit into physical memory, and some task partitionings put the job itself on a single node and use the rest to provide some sort of extended memory to that node. This is one of the points (IIRC) of the trapeze project at Duke, and is one reason that one might well consider e.g. swapping on a remote ramdisk as a poor-man's way to access a much larger memory space than is currently possible. The richer man's versions involve more efficient ways of moving the memory back and forth over the network.

So it isn't ALWAYS a non-fatal error condition, but it should always be done deliberately, because if an ordinary task swaps you start getting that nasty old several-orders-of-magnitude slowdown...;-)

>> actually, swap over the network *could* make excellent sense, since,
>> for instance, gigabit transfers a page about 200x faster than a disk
>> can seek. (I'm assuming that the "swap server" has a lot of ram ;)

Sure. Ideally, one would swap to a remote ramdisk. In fact, in some cases it makes sense to configure remote nodes so that ALL they are is one big ramdisk to swap on (or to otherwise serve as an extension of memory for a single-threaded task).

> I usually classify this in the "local disk is almost always fastest"
> rule (which some folks disagree with, but never indicate data to the
> contrary).
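A rough check of the "200x" figure quoted above, as a minimal sketch: it compares only the time needed to serialize one page onto a gigabit link with one average disk seek. The 4 KiB page size and the 8 ms seek time are assumptions, not numbers taken from this thread, and per-page network latency and protocol overhead are deliberately ignored.

```python
# Back-of-envelope comparison: one page on a gigabit wire vs. one disk seek.
# All inputs below are ASSUMED illustrative values, not measurements.
PAGE_BYTES = 4096        # assumed x86 page size (4 KiB)
LINK_BITS_PER_S = 1e9    # nominal gigabit wire rate; headers/latency ignored
SEEK_S = 8e-3            # assumed average disk seek time (~8 ms)


def wire_time_s(nbytes: int, bits_per_s: float) -> float:
    """Pure serialization time for nbytes on a link (no latency, no headers)."""
    return nbytes * 8 / bits_per_s


def speedup(page_bytes: int = PAGE_BYTES,
            link_bps: float = LINK_BITS_PER_S,
            seek_s: float = SEEK_S) -> float:
    """How many page transfers fit into the time of one disk seek."""
    return seek_s / wire_time_s(page_bytes, link_bps)


if __name__ == "__main__":
    t = wire_time_s(PAGE_BYTES, LINK_BITS_PER_S)
    print(f"page on the wire: {t * 1e6:.1f} us")
    print(f"seek / transfer:  {speedup():.0f}x")
```

With these assumed inputs the ratio comes out at roughly 244x, in line with "about 200x". Adding per-request software and switch latency (not modelled here) would shrink it, which is part of why the access pattern and the speed of the network matter so much in practice.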
>
> The take-home messages are a) avoid swap if possible, and b) if you
> cannot, swap at the fastest possible speed (e.g. locally).

Agreed, with the caveat that local may or may not be fastest, although it will likely be faster than swapping to a remote DISK, depending on the access pattern required, the speed of the network, etc. And to get the best possible speed, you may not want to use the VM subsystem to extend memory in this way at all. I really don't know its relative efficiency compared to e.g. message passing used to load memory blocks over the net on demand, but I would guess that it is slower, if only because different assumptions are made in the design of the subsystem(s). This is complicated still further by the advent of RDMA NICs, which can bypass a lot of the OS/CPU overhead and, on a good day, overlap the data transfer with execution.

rgb

> From the commercial view of this, most end users just want a simple to
> maintain machine (they view a cluster as a single machine for the most
> part) that runs, with no surprises, and just works. I am not aware of
> any of the canned systems that do this while also meeting the criteria
> that they require in terms of flexibility of distribution choice (some
> people have distribution constraints based upon their purchased software
> support requirements), breadth of hardware support, support for a wide
> array of infrastructure elements...

I don't know if warewulf is quite there, but it is a damned good try. You install any distro you like (if it is far from any beaten path, expect to do a bit of work). You layer on their 3 required packages (rebuilding from source as needed). You customize (part of the work:-) and run a script to build exportable vnfs chroot roots, using whatever methodology makes sense for your KIND of (hopefully package-supporting) distro. Or roll your own script from scratch. The rest of the setup -- dhcp, tftp -- is managed semi-automagically for you.
That part is actually not THAT hard to learn, but it is really useful to have something generate a working configuration for you that first time.

Now, one thing I'm still working on figuring out is just what warewulf will do when confronted by heterogeneous node hardware/infrastructure etc. One doesn't really want e.g. kudzu to redetect hardware on each reboot. That is not really a warewulf issue per se, just one of the many things that have to be resolved when setting up a default node configuration in an actual cluster with particular components. I suspect that at that point automagic fails and one has to start to customize... although doubtless Tim will let us know if this is incorrect (I'm still learning warewulf by playing with it). I also have yet to see if a single-arch server (e.g. i386) can comfortably serve a different arch (e.g. x86_64), since I have both in my home/test/play cluster. I also have some "grumbles" about its marginally inadequate and incomplete documentation and about the lack of a yum repo tree (or of placement of its core packages in an existing extras tree such as livna), but if I ever DO figure it all out and end UP fully embracing it, this may be something I end up contributing back to the project.:-)

Anyway, that's why I >>like<< warewulf as a philosophical approach (at least) over some of the other choices. It divorces the support of the minimal "cluster" core from the choice of OS, from its natural update/upgrade process, and so on, and maximally leverages the particular tools (e.g. yum) that make managing/selecting packages easy. The clusters you end up with are close to what you'd get if you rolled your own on top of your own distro (diskless, yet:-) but a whole lot easier than roll-your-own-from-scratch.

Agnostic is good. Automagic agnostic is better (though harder -- it requires a broad developer/participant base). Tools written/maintained by folks that eat their own dog food are best. warewulf looks like it is on the generally correct track.
rgb

>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics LLC,
> email: landman at scalableinformatics.com
> web  : http://www.scalableinformatics.com
> phone: +1 734 786 8423
> fax  : +1 734 786 8452
> cell : +1 734 612 4615
