[Beowulf] Infiniband and multi-cpu configuration
Craig Tierney Craig.Tierney at noaa.gov
Mon Feb 11 07:10:36 PST 2008
- Previous message: [Beowulf] Infiniband and multi-cpu configuration
- Next message: [Beowulf] getting kubuntu to perform as a cluster os
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Guillaume Michal wrote:
> Hi all,
> We set up our first cluster in our faculty this week. As we are new to cluster computing, there is a lot to learn. We performed some Linpack tests using the OpenMPI benchmark available in the Rocks 4.3 distribution. The system is as follows:
> - Gigabit Ethernet with an HP ProCurve 2800 series switch
> - 1 master node: 500GB SATA HDD, two Intel quad-core E5410 at 2.33GHz, 2GB mem
> - 4 nodes, each having: 80GB SATA HDD, two Intel quad-core E5410 at 2.33GHz, 8GB mem
>
> First, I'm a bit confused by the parameters P and Q in HPL.dat and how to use them properly. I noticed a 4P 2Q test is not equivalent to a 2P 4Q; generally speaking, it does not commute. Why? What exactly are P and Q then: P for the number of processors per node and Q for the number of nodes?
>

Visualize the problem as a big 2D matrix. P and Q represent how the problem is divided. In general, the best case is when the matrix is divided into even squares. If your core count isn't n^2, then P and Q have to be different. From experience, P should always be less than Q. There may be a computational reason for that (i.e., longer strides in memory), but I am not sure.

> Secondly, what is the definition of processor for a quad-core architecture? I suppose a quad core should be counted as 4 processors.

Yes, unless you are using a multithreaded BLAS library. If you are, you should have each node be 1 process.

>
> I launched Linpack using Ns=10000 and various configurations for P and Q. At the moment I got a maximum of 78 Gflops using P=8 Q=4 -> 32 processors.

You want to use as much available memory as possible. I use N=10000 on a single-processor, single-core run with 1GB. You can figure out a good value of N with the following formula:

Ns=sqrt(<Memory in Bytes per core>*<Number of cores>/8)

The 8 represents the size of a double in bytes. For <Memory in Bytes per core>, I try to use the largest number possible, typically about 90% of max.
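
The grid and sizing advice above can be sketched in a few lines of Python. This is only an illustration, not part of HPL: the function names `process_grid` and `problem_size` are made up here, the grid rule simply picks the most nearly square factorization with P <= Q, and the 90% memory fraction is the rule of thumb from the message.

```python
import math


def process_grid(nprocs):
    """Return (P, Q) with P * Q == nprocs and P <= Q, as close to square as possible."""
    if nprocs < 1:
        raise ValueError("nprocs must be positive")
    p = math.isqrt(nprocs)
    # Walk down from floor(sqrt(nprocs)) to the nearest divisor.
    while nprocs % p:
        p -= 1
    return p, nprocs // p


def problem_size(mem_bytes_per_core, ncores, bytes_per_double=8):
    """Ns = sqrt(memory per core * number of cores / 8), rounded down."""
    total_doubles = int(mem_bytes_per_core * ncores) // bytes_per_double
    return math.isqrt(total_doubles)


# 32 MPI processes (4 nodes x 2 sockets x 4 cores) give a 4 x 8 grid.
print(process_grid(32))

# The cluster described above: 8 GB per node over 8 cores is 1 GB per core;
# use about 90% of it (assumed fraction, per the rule of thumb).
print(problem_size(0.9 * 1024**3, 32))
```

For 32 processes this suggests P=4, Q=8 rather than the P=8, Q=4 that was tried.
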
You never want to go into swap during these calculations (or have it crash because you have diskless nodes).

Ex: If you have 2GB per core for 32p, you should use Ns as:

Ns=sqrt(1900*1024*1024*32/8)
Ns=89270

Honestly, this may be overkill. At some point, the working memory set will be large enough that FP performance will be the bottleneck. I would start with smaller numbers (say half) and work your way up to understand what is going on. In any case, using Ns=10000 is way too small.

>
> If I'm right, the peak performance should be Rpeak = 4 cores x 4 floating point ops per cycle x 2.33 GHz x 8 quad cores = 298 Gflops.
> Which would lead to a test running at ~25% Rpeak.
>
> This is very low and I see 3 possible causes for the problem:
> - I miscalculated Rpeak
> - P and Q are not set properly
> - there is a serious bottleneck
>

I think your Rpeak calculation is correct (not sure how many FPs the latest Intel chips can do). If increasing Ns doesn't help, run smaller cases on a per-node basis (using all available memory for each node). If you don't get the exact same answer on every node (or at least within 2%), you have a problem. Figure out what is wrong with the slow nodes. Also, run the test multiple times on the same node and verify consistent performance.

Craig

> Thanks for your advice
>
> Guillaume
>
> --
> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>

--
Craig Tierney (craig.tierney at noaa.gov)
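
The Rpeak arithmetic and the per-node consistency check from the message can be sketched as below. The function names are hypothetical, and comparing every node against the fastest one is an assumption; the message only says per-node results should agree to within about 2%.

```python
def rpeak_gflops(sockets, cores_per_socket, flops_per_cycle, clock_ghz):
    """Theoretical peak: sockets x cores x flops per cycle x clock in GHz."""
    return sockets * cores_per_socket * flops_per_cycle * clock_ghz


def efficiency(rmax_gflops, peak_gflops):
    """Fraction of theoretical peak actually achieved."""
    return rmax_gflops / peak_gflops


def slow_nodes(node_gflops, tolerance=0.02):
    """Return names of nodes more than `tolerance` below the fastest node.

    node_gflops maps a node name to its measured single-node result.
    Using the fastest node as the reference is an assumption of this sketch.
    """
    best = max(node_gflops.values())
    return sorted(name for name, g in node_gflops.items()
                  if g < best * (1 - tolerance))


# Figures from the message: 8 quad-core E5410 sockets, 4 flops per cycle,
# 2.33 GHz, and a measured 78 Gflops.
peak = rpeak_gflops(sockets=8, cores_per_socket=4, flops_per_cycle=4, clock_ghz=2.33)
print(f"Rpeak = {peak:.1f} Gflops, efficiency = {efficiency(78, peak):.0%}")
```

Feeding `slow_nodes` the Gflops from single-node runs, keyed by hostname, returns the nodes worth investigating.
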
