very high bandwidth, low latency manner?
Markus Fischer mfischer at mufasa.informatik.uni-mannheim.de
Wed Apr 17 10:33:43 PDT 2002
On Tue, 16 Apr 2002, Håkon Bugge wrote:

>1) Performance.
>
>Performance transparency is always goal. Nevertheless, sometimes an
>implementation will have a performance bug. The two organizations owning
>the mentioned systems, have both support agreements with Scali. I have
>checked the support requests, but cannot find any request where your
>incidents were reported. We find this fact strange if you truly were aiming
>at achieving good performance. We are happy to look into your application
>and report findings back to this news group.

I don't think we have a performance bug. We have developed a real-world
application using frequent communication and have tested/run it on
multiple systems. We do not intend to modify our algorithms to try to get
better performance on a particular system. If people need help to gain
performance on a particular system, then that platform is not a target
again if I cannot do the tuning by myself, which we did. Not all codes
are PD, which makes the previous point important as well.

>2) Startup time.
>
>You contribute the bad scalability to high startup time and mapping of
>memory. This is an interesting hypothesis; and can easily be verified by

No, I said that with larger numbers of nodes (I would like to talk
about >100, but here I mean more than 16) the scalability is limited (the
amount of time spent in communication increases significantly and speedup
values decrease after a certain number of nodes), and yes, the startup
time also increases (which I thought to be caused by the SCI mechanisms
of exporting/mapping memory).

>using a switch when you start the program, and measure the difference
>between the elapsed time of the application and the time it uses after
>MPI_Init() has been called. However, the startup time measured on 64-nodes,
>two processors per node, where all processes have set up mapping to all
>other processes, is nn second. If this contributes to bad scalability, your
>application has a very short runtime.
I certainly think that scalability has nothing to do with startup time.
And I just checked my earlier posting on this.

>
>3) SCI ring structure
>
>You state that on a multi user, multi-process environment, it is hard to
>get deterministic performance numbers. Indeed, that is true. True sharing
>of resources implies that. Whether the resource is a file-server, a memory
>controller, or a network component, you will probably always be subject to
>performance differences. Also, lack of page coloring will contribute to

I think that when running on a dedicated partition of a cluster, I would
not like to receive a significant impact from other applications because
their communication increases nor would I like to influence my advisor's
application.

>different execution times, even for a sequential program. You further
>indicate that performance numbers reported f. ex. by Pallas PMB benchmark
>only can be used for applying for more VC. I disagree for two reasons;
>first, you imply that venture capitalists are naive (and to some extent
>stupid). That is not my impression, merely the opposite. Secondly, such
>numbers are a good example to verify/deny your hypothesis that the SCI ring
>structure is volatile to traffic generated by other applications. PMB's
>*multi* option is architected to investigate exactly the problem you
>mention; Run f. ex. MPI_Alltoall() on N/2 of the machine. Then measure how
>performance is affected when the other N/2 of the machine is also running
>Alltoall(). This is the reason we are interested in comparative performance
>numbers to SCI based systems. It is to me strange, that no Pallas PMB
>benchmark results ever has been published for a reasonable sized system
>based on alternative interconnect technologies. To quote Lord Kelvin: "If
>you haven't measured it, you don't know what you're talking about".
>
>As a bottom line, I would appreciate that initiatives to compare cluster
>interconnect performance should be appreciated, rather than be scrutinized
>and be phrased as "only usable to apply for more VC".
>

What's the goal then of having marketing statements in a .signature
which cannot be applied in general? There is also PD SCI-MPICH, to which,
from reading papers, the same statement applies.

Markus

>
>H
>At 11:40 AM 4/15/02 +0200, Markus Fischer wrote:
>>Steffen Persvold wrote:
>> >
>> > Now we have price comparisons for the interconnects (SCI,Myrinet and
>> > Quadrics). What about performance ? Does anyone have NAS/PMB numbers for
>> > ~144 node Myrinet/Quadrics clusters (I can provide some numbers from a 132
>> > node Athlon 760MP based SCI cluster, and I guess also a 81 node PIII ServerWorks
>> > HE-SL based cluster).
>>
>>yes, please.
>>
>>I would like to get/see some numbers.
>>I have run tests with SCI for a non linear diffusion algorithm on a 96 node
>>cluster with 32/33 interface. I thought that the poor
>>scalability was due to the older interface, so I switched to
>>a SCI system with 32 nodes and 64/66 interface.
>>
>>Still, the speedup values were behaving like a dog with more than 8 nodes.
>>
>>Especially, the startup time will reach minutes which is probably due to
>>the exporting and mapping of memory.
>>
>>Yes, the MPI library used was Scampi. Thus, I think the
>>(marketing) numbers you provide
>>below are not relevant except for applying for more VC.
>>
>>Even worse, we noticed, that the SCI ring structure has an impact on the
>>communication pattern/performance of other applications.
>>This means we only got the same execution time if other nodes were
>>I idle or did not have communication intensive applications.
>>How will you determine the performance of the algorithm you just invented
>>in such a case ?
>>
>>We then used a 512 node cluster with Myrinet2000. The algorithm scaled
>>very fine up to 512 nodes.
>>
>>Markus
>>
>> >
>> > Regards,
>> > --
>> > Steffen Persvold | Scalable Linux Systems | Try out the world's best
>> > mailto:sp at scali.com | http://www.scali.com | performing MPI implementation:
>> > Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 -
>> > Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency
>> >
>> > _______________________________________________
>> > Beowulf mailing list, Beowulf at beowulf.org
>> > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>>_______________________________________________
>>Beowulf mailing list, Beowulf at beowulf.org
>>To change your subscription (digest mode or unsubscribe) visit
>>http://www.beowulf.org/mailman/listinfo/beowulf
>
>--
>Håkon Bugge; VP Product Development; Scali AS;
>mailto:hob at scali.no; http://www.scali.com; fax: +47 22 62 89 51;
>Voice: +47 22 62 89 50; Cellular (Europe+US): +47 924 84 514;
>Visiting Addr: Olaf Helsets vei 6, Bogerud, N-0621 Oslo, Norway;
>Mail Addr: Scali AS, Postboks 150, Oppsal, N-0619 Oslo, Norway;
>
>
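
The startup-versus-scalability question debated in this message can be checked with simple arithmetic once two timings per run are available: the total elapsed wall-clock time and the time spent after MPI_Init() has returned. The following is a minimal sketch in Python, not anything taken from the posters' codes. The function names are hypothetical, and the sample timings are invented for illustration only; they are not measurements from any system mentioned in this thread.

```python
def startup_time(elapsed_total, elapsed_after_init):
    """Startup cost: total wall-clock time minus the time spent after MPI_Init()."""
    if elapsed_after_init > elapsed_total:
        raise ValueError("time after MPI_Init() cannot exceed total elapsed time")
    return elapsed_total - elapsed_after_init


def speedup(t_ref, t_n):
    """Speedup of a run taking t_n seconds relative to a reference run taking t_ref."""
    return t_ref / t_n


def efficiency(t_ref, n_ref, t_n, n):
    """Parallel efficiency on n nodes relative to a reference run on n_ref nodes."""
    return (t_ref * n_ref) / (t_n * n)


if __name__ == "__main__":
    # Hypothetical timings in seconds (assumption, not real data):
    # nodes -> (total elapsed, elapsed after MPI_Init())
    runs = {8: (130.0, 120.0), 16: (95.0, 80.0), 32: (100.0, 70.0)}

    n_ref = min(runs)
    t_ref = runs[n_ref][1]  # compare compute phases only, excluding startup

    for n in sorted(runs):
        total, after_init = runs[n]
        print(
            f"nodes={n:3d} "
            f"startup={startup_time(total, after_init):6.1f}s "
            f"speedup={speedup(t_ref, after_init):5.2f} "
            f"efficiency={efficiency(t_ref, n_ref, after_init, n):5.2f}"
        )
```

Computing speedup from the post-MPI_Init() time keeps the two effects apart: if efficiency still drops as nodes are added, startup cost is not the explanation, whichever interconnect is in use.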
