[Beowulf] cluster profiling

Prentice Bisbal prentice at ias.edu
Wed Nov 3 07:49:31 PDT 2010



tomislav_maric at gmx.com wrote:
> Hi everyone,
> 
> I'm running a COTS Beowulf cluster and I'm using it for CFD simulations with the OpenFOAM code. I'm currently writing a profiling application (a bunch of scripts) in Python that will use the Ganglia-python interface to give me some insight into how the machine is loaded during runs. What I'm actually trying to do is profile the parallel runs of the OpenFOAM solvers. 
> 
> The app will increment the mesh density (the coarseness) of the simulation and rerun the simulations with an increasing number of cores. Right now the machine is minuscule: two nodes with quad-core CPUs. The app will store the data (execution time, number of cores) and I will plot diagrams to see at which case size and core count the speedup starts to drift away from the "linear one". 
> 
> Is this a good approach? I know that this will show only tendencies on such an impossibly small number of nodes, but I will expand the machine soon, and the larger node count should then make these tendencies more accurate. When I cross-reference the timing data with the system status data reported by Ganglia, I can draw conclusions like "O.K., the speedup went down because for the larger cases, the decomposition on the max core number was more local, so the system bus must have been burdened, if Ganglia confirms that the network is not being strangled for this case configuration".
> 
> Can anyone here tell me if I am at least heading in the right direction? :) Please, don't say "it depends". 
> 
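
For the speedup plots described above, here is a minimal Python sketch of
the arithmetic: speedup and parallel efficiency relative to the smallest
core count measured. The function name speedup_table and the timings in
the example are made up for illustration; they do not come from OpenFOAM
or Ganglia, and collecting the real wall-clock times is left out.

```python
def speedup_table(timings):
    """Return (cores, speedup, efficiency) rows for one mesh size.

    timings maps core count -> wall-clock seconds (hypothetical input).
    The run with the fewest cores is used as the baseline.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        # Ideal (linear) speedup relative to the baseline is cores / base_cores.
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Made-up timings in seconds, purely to show the output format.
    example = {1: 800.0, 2: 420.0, 4: 230.0, 8: 160.0}
    for cores, s, e in speedup_table(example):
        print(f"{cores:2d} cores  speedup {s:5.2f}  efficiency {e:5.2f}")
```

An efficiency that falls well below 1.0 as cores are added marks the
point where the run leaves the linear regime, which is where looking at
the Ganglia data for the same time window becomes interesting.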

Have you looked at something like Vampir for MPI profiling? Support for
VampirTrace is built into Open MPI, if you compile Open MPI with the
correct options.

The rub is that I think you need to pay for a Vampir GUI to analyze the
data. I've never used it myself, but I saw a demo once, and it looked
pretty powerful.

http://www.vampir.eu/

You might also want to look at Tau, PAPI, and Perfmon2:

http://www.cs.uoregon.edu/research/tau/home.php
http://icl.cs.utk.edu/papi/
http://perfmon2.sourceforge.net/

I set this up for one of my users a couple of years ago. If I remember
correctly, Tau requires PAPI, and PAPI in turn requires the perfmon2
kernel patches, but it's been a couple of years, so I could be wrong.
Reading the docs above should point you in the right direction.

That's probably more than you wanted to know.



-- 
Prentice


