Network RAM for Beowulf
Robert G. Brown rgb at phy.duke.edu
Thu Aug 16 05:38:05 PDT 2001
On Thu, 16 Aug 2001, Amber Palekar wrote:

> hi,
> we as a group of four students are *also* thinking
> of implementing Network RAM for a beowulf cluster
> (assuming 100Mbps Ethernet ) whereby each node in the
> cluster will donate some part of their RAM to be used
> by all other nodes.so we will basically be mapping
> this shared RAM to the address space of the current
> node.One of the uses that we're thinking of is for
> Journaling(as in file systems ).We'll be maintaining
> the journals on the Network RAM instead of writing
> them to the local disks.As we are completely new to
> this , it is very difficult for us to determine the
> statistics like :- the overhead in writing to Network
> RAM . Any info or pointers to these stats would be
> highly appreciated .

Check out the Trapeze project at Duke:

  http://www.cs.duke.edu/ari/trapeze/

This is a high end version of the project you are suggesting.

I suspect that you can implement a simpler (but slower) version of this
out of component parts with existing kernels (or almost so). If current
kernels support swap over NFS, for example, you can build a large
ramdisk on each node (and otherwise leave them just enough memory to
function comfortably without swapping) and export them all to a central
node for use as swap. This would effectively extend the size of VM on
the central node, but swapping would occur at network-limited speeds
instead of disk-hardware limited speeds.

And of course you can definitely export a set of ramdisks and NFS mount
them for regular "network bound" file I/O, and might even be able to
glue them together into a big striped filesystem (I don't think md will
do this, but you MIGHT be able to hack it so that it would).

Disk bandwidth, of course, has gotten so much better over the years
that this might not make sense from a raw BW point of view.
Something doing a lot of random accesses to disk, though, where the
performance is dominated by latency, might well benefit, as the
combination of memory latency on the nodes plus network latency plus
NFS latency will still likely be an order of magnitude less than the
seek time on a disk (which requires the physical movement of big chunks
of mass). Order of milliseconds for a disk seek, order of 100
microseconds for the network+memory hit, a rough factor of ten
improvement.

Note that even if you hack the kernel and eliminate all the kludginess
from this approach (NFS swap?), you're still going to be limited by raw
socket latency and will therefore probably not improve on this estimate
by as much as a factor of 2.

Of course this works another order of magnitude better with Myrinet
(used in the Duke project), with latency less than 10 microseconds, and
even the BW compares decently with local disk. This is the primary
motivation for the Trapeze project -- creating very large
network-distributed ramdisk-like storage with a very low level (and
hence efficient) implementation.

The PRIMARY place where this is useful is in projects that require far
more memory and/or faster (lower latency) disk than one can physically
add to a system, as it will almost always be cheaper and better to add
memory to a local system than to build a network of virtual memory IF
you can get to the desired memory regime by adding sticks to your own
box. This is especially true now with memory prices in free fall (512MB
PC2100 DDR only $168 as of this morning on pricewatch.com -- it was
over $600 at the beginning of the summer -- and 512MB PC133 only
>>$30<< ditto). Ditto with disk BW -- high end disk storage units now
can provide quite a lot of BW (with terrible latency of course).

Building a system with (say) 10 GB of network/virtual memory is a lot
more challenging. To start with, this is more than the unhacked kernel
can address, I believe.
So there you'd need to build BOTH the socket-based memory subsystem AND
hack the kernel so that it could somehow address it.

You should definitely look over the Trapeze site to see how they
attempt to finesse this problem via a higher level API (if I understand
their papers). That is, they don't tamper with the existing VM so much
as graft a set of hooks into a special library so that the remote
"memory" can be accessed via a file interface. Or something like that.
You'll need to read about it yourself, and might even want to talk to
the primary researchers, as they'd likely have some very sensible
suggestions and direction to give you.

   rgb

> TIA
> Amber

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
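
The original question asked how to estimate the overhead of writing to Network RAM, and the reply above argues that raw socket latency sets the floor. One rough way to get a number is to time acknowledged page-sized writes over a plain TCP socket. The following is only a minimal sketch under stated assumptions, not part of Trapeze or any existing network RAM implementation: the names `serve_once`, `mean_write_latency` and `speedup_vs_disk`, the 4096-byte page size, the one-byte acknowledgement, and the 1 ms disk seek figure are all hypothetical choices made for illustration. As written it runs both ends on loopback, so it measures only software overhead; to approximate a real cluster, the donor side would have to run on a second node.

```python
import socket
import threading
import time

PAGE_SIZE = 4096  # assumed page size in bytes (not from the original message)


def recv_exact(conn, nbytes):
    """Read exactly nbytes from a socket, or raise if the peer closes early."""
    chunks = []
    remaining = nbytes
    while remaining:
        chunk = conn.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def serve_once(listener, npages):
    """Hypothetical memory donor: accept one client, keep pages in RAM, ack each."""
    conn, _ = listener.accept()
    with conn:
        store = []
        for _ in range(npages):
            store.append(recv_exact(conn, PAGE_SIZE))
            conn.sendall(b"\x01")  # one-byte acknowledgement per page


def mean_write_latency(host="127.0.0.1", npages=200):
    """Return the mean time in seconds for one acknowledged page write."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener, npages))
    server.start()

    page = bytes(PAGE_SIZE)
    with socket.create_connection((host, port)) as client:
        # Disable Nagle's algorithm so small acks are not delayed.
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(npages):
            client.sendall(page)
            recv_exact(client, 1)  # wait for the donor's acknowledgement
        elapsed = time.perf_counter() - start

    server.join()
    listener.close()
    return elapsed / npages


def speedup_vs_disk(latency_s, disk_seek_s=1e-3):
    """Ratio of an assumed disk seek time (default 1 ms) to a measured latency."""
    return disk_seek_s / latency_s


if __name__ == "__main__":
    latency = mean_write_latency()
    print(f"mean acknowledged page write: {latency * 1e6:.1f} microseconds")
    print(f"speedup over an assumed 1 ms seek: {speedup_vs_disk(latency):.1f}x")
```

With the figures quoted in the message, `speedup_vs_disk(100e-6)` gives the "rough factor of ten" for Ethernet plus NFS, and `speedup_vs_disk(10e-6)` gives roughly a hundred for a Myrinet-class latency. Measured loopback numbers will differ from both and should not be read as cluster results.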
