[Beowulf] One network, or two?
Joe Landman landman at scalableinformatics.com
Tue Sep 23 16:01:22 PDT 2008
Prentice Bisbal wrote:
> Alan Ward wrote:
>> Good day.
>>
>> I have been reading the ongoing discussion on network usage with some
>> interest, mainly because in all (admittedly very small, 4 to 8 node)
>> clusters we have set up so far, we have always gone with doubling the
>> network. Nowadays we mostly run a 100 MBit/s "el cheapo" FastEthernet
>> for control, NFS and monitoring, while the faster Gigabit is exclusively
>> for MPI. Applications are CFD, with various levels of granularity.
>>
>> Anybody care to comment?
>>
>> -Alan
>
> My new cluster, which is still in labor, will have InfiniBand for MPI,
> and we have 10 Gb ethernet switches for management/NFS, etc. The nodes
> only have 1 Gb ethernet, so it will be effectively a 1 Gb network.
>
> I'm also curious as to whether the dual networks are overkill, and if
> using a slower network for I/O will cause the system to be slower than
> doing all traffic over IB, since I/O will be slower and cause the nodes
> to wait longer for these ops to finish.

Obviously YMMV, but in tuning systems, and seeing where bottlenecks are,
we look for obvious things in the network design:

1) poorly designed (office quality, usually uplinked/oversubscribed)
networks ... have an interesting effect when you see two 128 port
gigabit switches connected together with a 1 or 2 gigabit, or even 10
GbE link. You see this plateau in MPI scalability. Yes, from a real
customer case. :(

2) I/O: after burning incense to the daemons of low latency, the next
big area we see is (curiously enough) I/O bandwidth. I can't begin to
elaborate on how many times I have seen a big expensive shiny new
cluster with an absolutely terrible I/O design. Usually starting with 1
GbE connected NAS for 32 or more nodes. Well, ok, YMMV, but when you run
lots of Gaussian jobs which hammer on the NFS over this, you are going
to experience pain. And no, this is not where you stick the NetApps.
Yes, from several real customer cases. :( :(

No, you don't need to perform ritual incantations to make the I/O go
faster. You just need good hardware and good design to get the data to
the hardware. 1GbE may be great for EP jobs that occasionally write to
disk. But if your units are going to hammer on disk, you are going to
need to check your I/O design, and make sure it scales.

Yes, we are biased, we strongly believe in good I/O systems.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
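
For a rough feel of the two bottlenecks described in the message, here is a
back-of-the-envelope sketch. It is not from the original post: the function
names are made up for illustration, it assumes the worst case where every
port on one switch talks across the inter-switch link at once, it splits the
NAS link evenly between nodes, and it ignores protocol overhead.

```python
def oversubscription(ports_per_switch, port_gbps, uplink_gbps):
    """Ratio of possible cross-switch demand to uplink capacity.

    Worst-case assumption: all ports on one switch send to the other
    switch at line rate at the same time.
    """
    return ports_per_switch * port_gbps / uplink_gbps


def per_node_share_mb_s(server_gbps, nodes):
    """Even split of one storage server link, in MB/s per node.

    Uses 1 Gb/s = 125 MB/s and ignores NFS/TCP overhead, so real
    numbers will be lower.
    """
    return server_gbps * 125.0 / nodes


if __name__ == "__main__":
    # Two 128-port gigabit switches joined by a 1, 2, or 10 Gb/s link.
    for uplink in (1, 2, 10):
        ratio = oversubscription(128, 1, uplink)
        print(f"{uplink:>2} Gb/s uplink: {ratio:g}:1 oversubscribed")

    # A 1 GbE NAS shared by 32 nodes all doing I/O together.
    share = per_node_share_mb_s(1, 32)
    print(f"1 GbE NAS, 32 nodes: {share:g} MB/s per node at best")
```

Even with the 10 GbE link the cross-switch path is heavily oversubscribed
under this assumption, and a single 1 GbE NAS leaves each of 32 busy nodes
with only a few MB/s.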
