FW: MPI cases and PVM
Velocet math at velocet.ca
Thu Mar 28 08:32:25 PST 2002
On Fri, Mar 22, 2002 at 03:01:13PM -0500, Patrick Geoffray's all...
> Daniel Kidger wrote:
>
> > However the biggest problem with layering on top of TCPIP is the issue of
> > latency.
>
> Right, and CPU load. TCP/IP is very hungry, having a machine on its knees
> just moving data is not very useful for something else.

When I see scaling drop with large numbers of nodes due to TCP/IP latency over LAM/MPI, obviously there's CPU lying around doing nothing. It's not as if that idle CPU is spare capacity for handling IP packets: the CPU is idle because work can't come in fast enough to keep it busy, and that's all because of the lag of TCP/IP. That's the given.

Now what happens if you run another LAM/MPI mesh over those machines that are using less than 100% of the CPU? Would this allow recovery of that lost CPU? Obviously 2 jobs will run slower than 1 job in this situation, but I am wondering if it will be 2.00 times slower, or just, say, 1.6 times slower. Obviously getting twice the work done in 1.6 times the time is a win.

There may also be some interaction between the jobs, because the packets being sent out on the wire cause additional latency, but is it enough to bring us up to 2.00 times slower, or is it less? If the bandwidth required is above some magic percentage of total capacity, obviously the extra latency from trying to put twice as many packets on the wire will be significant. But on 100Mbps ether, with gromacs and the benchmark d.dppc job, the way gromacs interacts with LAM/MPI I saw about the same bandwidth rate for any number of nodes: 11-13Mbps. Does that leave enough room on 100Mbps to run 2 jobs over the same mesh nodes without serious latency due to interaction? (I am not sure what the bandwidth usage over GBE with gromacs running d.dppc is, but I know it's not 10x what it is on 100Mbps.)
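To put rough numbers on the question above, here is a minimal sketch of the arithmetic, not a measurement. The function names are hypothetical, and it assumes the 11-13Mbps figure is the load each job places on a single 100Mbps link. Average utilization says nothing about burstiness, so it can only hint at whether latency from packet interaction will matter.

```python
def co_scheduling_gain(slowdown, jobs=2):
    """Throughput of running `jobs` jobs at once, relative to running them
    one after another on the same nodes.

    slowdown: wall time of the co-scheduled batch divided by the wall time
    of a single job running alone (e.g. 1.6 or 2.0).
    """
    if slowdown <= 0:
        raise ValueError("slowdown must be positive")
    return jobs / slowdown


def link_utilization(per_job_mbps, jobs=2, link_mbps=100.0):
    """Average fraction of link capacity used by `jobs` jobs (assumed additive)."""
    return jobs * per_job_mbps / link_mbps


if __name__ == "__main__":
    for slowdown in (2.0, 1.6):
        gain = co_scheduling_gain(slowdown)
        print(f"{slowdown:.2f}x slower -> {gain:.2f}x the throughput of serial runs")
    for mbps in (11.0, 13.0):
        util = link_utilization(mbps)
        print(f"2 jobs at {mbps:.0f}Mbps each -> {util:.0%} of a 100Mbps link")
```

A gain of 1.0 means co-scheduling buys nothing over running the jobs back to back; anything above 1.0 is recovered CPU.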
I suppose you also lose some performance due to cache thrashing as well as memory bandwidth bottlenecks, but that's dependent on the types of jobs running, I would assume. For some jobs this type of thing could be a win.

Anyone have experience with this?

/kc

> Patrick
>
> ----------------------------------------------------------
> | Patrick Geoffray, Ph.D. patrick at myri.com
> | Myricom, Inc. http://www.myri.com
> | Cell: 865-389-8852 685 Emory Valley Rd (B)
> | Phone: 865-425-0978 Oak Ridge, TN 37830
> ----------------------------------------------------------
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA
