The FNN (Flat Neighborhood Network) paradox
Tim Mattox tmattox at ieee.org
Wed Feb 21 21:50:02 PST 2001
Hello,

The FNN (Flat Neighborhood Network) cost paradox is just a matter of looking in the right design space, and realizing that a FNN is not just a network for lowering costs. As several people have already discussed this, I will try to add a few things I think were missed in the discussion of the costs and benefits of a FNN architecture.

Using a FNN does more than let me build a larger cluster than the maximum size of an affordable switch. It also gives a higher bisection bandwidth than a fat tree with the same size and number of switches. And, remember, it is a "Flat" network, so all nodes are just one switch hop distant from each other, giving you lower latency too. Some of this can be found in our paper from ALS2000 in the Extreme Linux section:

http://www.linuxshowcase.org/2000/2000papers/papers/index.html
"KLAT2's Flat Neighborhood Network"

Basically, the extra cost is having more than one NIC-port pair per node in your cluster, since you need more than one NIC per node to have the full connectivity. However, the extra NIC-port pair(s) per node are not just an added cost. You do get extra usable bandwidth. How much depends on your application's communication patterns, as well as on whether you are doing just "basic routing" or what we call "advanced routing". So far, we have not had time to do the coding to implement advanced routing on a FNN. Both are discussed in the paper, but essentially, with a FNN, between many pairs of nodes, there will be more than one usable 1-hop communications path. Advanced routing would take advantage of those extra paths, to get what might best be described as "destination specific channel bonding", but without the slaved MAC addresses.

Even without advanced routing on KLAT2, we have observed quite good performance on real applications (CFD code), as well as benchmarks.
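
To make the "flat" property and the extra 1-hop paths concrete, here is a minimal Python sketch. It is only an illustration, not our design tools: the six-node wiring table, the node and switch names, and the helper functions are all made up for this example. It assumes three 4-port switches and two NICs per node, checks that every pair of nodes shares at least one switch, and lists the pairs that have more than one shared switch (the pairs advanced routing could exploit).

```python
from itertools import combinations

# Hypothetical toy layout (not KLAT2's): 6 nodes, 2 NICs each,
# 3 switches with 4 ports each. A single 4-port switch could not
# connect all 6 nodes on its own.
wiring = {
    "n0": ["s0", "s1"],
    "n1": ["s0", "s1"],
    "n2": ["s0", "s2"],
    "n3": ["s0", "s2"],
    "n4": ["s1", "s2"],
    "n5": ["s1", "s2"],
}

def shared_switches(wiring, a, b):
    """Return the sorted list of switches that nodes a and b both plug into."""
    return sorted(set(wiring[a]) & set(wiring[b]))

def is_flat(wiring):
    """True if every pair of nodes shares a switch, i.e. is one switch hop apart."""
    return all(shared_switches(wiring, a, b) for a, b in combinations(wiring, 2))

def basic_route(wiring, src, dst):
    """Basic routing (as assumed here): always use the first shared switch."""
    paths = shared_switches(wiring, src, dst)
    return paths[0] if paths else None

def multipath_pairs(wiring):
    """Pairs with more than one shared switch; advanced routing could use them all."""
    return [
        (a, b)
        for a, b in combinations(sorted(wiring), 2)
        if len(shared_switches(wiring, a, b)) > 1
    ]

if __name__ == "__main__":
    print("flat:", is_flat(wiring))
    print("n0 -> n4 via", basic_route(wiring, "n0", "n4"))
    for a, b in multipath_pairs(wiring):
        print(a, b, "share", shared_switches(wiring, a, b))
```

In this toy layout each switch uses exactly four ports, every pair of nodes is one hop apart, and three pairs have two usable paths. Basic routing as sketched here would leave the second path idle for those pairs.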
You can see one of our papers on the CFD work in the SC2000 proceedings here:

http://www.sc2000.org/proceedings/techpapr/indexn.htm#19
"High-Cost CFD on a Low-Cost Cluster"

For most clusters smaller than 64 nodes, a FNN is probably not a great choice, due to its infancy and currently very limited software support. Given time, both will improve. However, if you are building a large cluster, a FNN may be able to do great things for you even now. We hope, in the future, to have all our FNN design tools in the public domain, and then even smaller clusters may benefit... You can get a $54 eight port Compex switch from Buy.com. Now, if only there were reliable NICs for less than $8 each :)

As one of the developers of the FNN architecture, I must apologize to the beowulf community for not yet having our design and implementation tools out and available for you to use. It is one thing to hack together a new machine and get it to work. It is another to make the tools usable by others. Please be patient. Now that I have seen a real discussion of FNNs on the list, I will redouble my efforts to get the software tools releasable.

And here are the obligatory URLs:

http://aggregate.org/FNN/ The home page for FNNs
http://aggregate.org/KLAT2/ The first machine with a FNN

You may also wish to look at the Bunyip machine, which has a similar network, developed independently and at almost exactly the same time as KLAT2.

http://tux.anu.edu.au/Projects/Beowulf/

P.S. - Anyone care to try to make a version of PVM or MPICH that does not assume all nodes have just one universally accessible IP address each? We looked into it, and it seemed to be a bear in there... I ran away :) I was able to hack LAM MPI over a weekend to get it to work, so PVM and MPICH are not even on a back-burner for us right now...

--
Tim Mattox - tmattox at ieee.org - http://home.earthlink.net/~timattox
http://aggregate.org/KAOS/ - http://advogato.org/person/tmattox/
More information about the Beowulf mailing list
