[Beowulf] WRF model on linux cluster: Mpi problem

Michael Will mwill at penguincomputing.com
Thu Jun 30 12:10:38 PDT 2005


Vincent is on target here:

If your application already uses MPI as middleware and assumes
distributed memory, then you should definitely use a Beowulf-style
setup rather than openMosix with its pseudo-shared-memory model.

Look at Rocks 4.0.0 (http://www.rocksclusters.org/Rocks/), which
is free and based on CentOS 4, itself a free rebuild of RHEL 4.
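
As a quick sanity check of the interconnect before blaming WRF itself,
you could run the kind of one-way ping-pong test Vincent suggests below;
the same loop, swept over message sizes, also gives a rough answer to the
bandwidth-benchmark question. A minimal sketch (assuming MPICH and mpicc
are installed on the nodes; the sizes and repetition counts are only
illustrative):

/* pingpong.c - one-way latency and bandwidth between two MPI ranks
 * (illustrative sketch, not a tuned benchmark)
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, nprocs, bytes, i;
    const int reps = 200;              /* round trips per message size */
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    if (nprocs != 2) {
        if (rank == 0) fprintf(stderr, "run with exactly 2 processes\n");
        MPI_Finalize();
        return 1;
    }

    for (bytes = 4; bytes <= 1 << 20; bytes *= 4) {
        char *buf = malloc(bytes);
        double t0, dt;

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < reps; i++) {
            if (rank == 0) {           /* rank 0 sends, waits for the echo */
                MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
            } else {                   /* rank 1 echoes the message back   */
                MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
                MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        dt = MPI_Wtime() - t0;
        if (rank == 0)
            printf("%8d bytes  %10.1f us one-way  %8.1f MB/s\n",
                   bytes,
                   dt / (2.0 * reps) * 1.0e6,          /* one-way latency */
                   (2.0 * reps * bytes) / dt / 1.0e6); /* transfer rate   */
        free(buf);
    }
    MPI_Finalize();
    return 0;
}

Build with something like "mpicc -O2 pingpong.c -o pingpong" and start it
with mpirun -np 2 across two nodes (one process per node, using your usual
machinefile). On a healthy gigabit setup you would expect roughly a few tens
of microseconds one-way for small messages and somewhere near 100 MB/s for
large ones; numbers far off from that point at the NIC, driver, riser or
switch rather than at WRF.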

Michael

Vincent Diepeveen wrote:

>At 02:34 PM 6/30/2005 +0200, Federico Ceccarelli wrote:
>  
>
>>Thanks for your answer, Vincent,
>>
>>My network cards are Intel PRO/1000 (gigabit).
>>
>>Yes, I did a 72 h (real-time) simulation that lasted 20 h on 4 CPUs... same
>>behaviour...
>>
>>I'm thinking about a bandwidth problem...
>>
>>...maybe due to a hardware failure in some network card, or in the switch (a 3Com
>>Baseline Switch 2824).
>>
>>Or the PCI risers for the network cards (I have a 2U chassis, so I
>>cannot mount the network cards directly in the PCI slots)...
>>    
>>
>
>Because gigabit cards have such horrible one-way ping-pong latencies
>compared to the high-end cards (Myrinet, Dolphin, Quadrics and, relatively
>speaking, also InfiniBand), the PCI bus is not your biggest problem here.
>
>The gigabit card's own specifications are so limited that the PCI bus is not
>the problem at all.
>
>There are many tests out there. You should try a one-way
>ping-pong test.
>
>By the way, the reason I don't run openMosix or similar single-system-image
>software is that it has such an ugly effect on latencies, and
>the way it pages shared-memory communication between nodes is really
>slow and bad for this type of software. There is also something called
>OpenSSI, which is being actively developed. It has the same problem.
>
>Vincent
>
>  
>
>>Did you experience problems with PCI risers?
>>
>>Can you suggest a bandwidth benchmark?
>>
>>thanks again...
>>
>>federico
>>
>>On Thu, 30-06-2005 at 12:44 +0200, Vincent Diepeveen
>>wrote:
>>    
>>
>>>Hello Federico,
>>>
>>>Hope you can find contacts among colleagues.
>>>
>>>A few questions.
>>>  a) what kind of interconnect does the cluster have (which network cards,
>>>     and which type)?
>>>  b) if you run a simulation that takes a few hours instead of a few seconds,
>>>     do you get the same speed difference?
>>>
>>>I see the program is pretty big for open-source computational software, about
>>>1.9 MB of Fortran code, so it is a bit time-consuming to figure out for someone
>>>who isn't a meteorological expert.
>>>
>>>E:\wrf>dir *.f* /s /p
>>>..
>>>     Total Files Listed:
>>>             141 File(s)      1,972,938 bytes
>>>
>>>Best regards,
>>>Vincent
>>>
>>>At 06:56 PM 6/29/2005 +0200, federico.ceccarelli wrote:
>>>      
>>>
>>>>Hi!
>>>>
>>>>I would like to get in touch with people running numerical meteorological
>>>>models on a Linux cluster (16 CPUs), distributed memory (1 GB per node),
>>>>diskless nodes, gigabit LAN, MPICH and openMosix.
>>>>
>>>>I'm trying to run the WRF model, but the MPI version parallelized over 4, 8, or 16
>>>>nodes runs slower than the single-node one! It runs correctly, but so slowly...
>>>>
>>>>When I run wrf.exe on a single processor, the CPU time for every timestep is
>>>>about 10 s for my configuration.
>>>>
>>>>When I switch to np=4, 8 or 16, the CPU time for a single step is sometimes
>>>>faster (as it always should be; for example, 3 s on 4 CPUs), but often it is
>>>>slower and slower (60 s and more!). The overall time of the simulation is
>>>>longer than for the single-node run...
>>>>
>>>>Has anyone experienced the same problem?
>>>>
>>>>thanks in advance to everybody...
>>>>
>>>>federico
>>>>
>>>>
>>>>
>>>>Dr. Federico Ceccarelli (PhD)
>>>>-----------------------------
>>>>    TechCom snc
>>>>Via di Sottoripa 1-18
>>>>16124 Genova - Italia
>>>>Tel: +39 010 860 5664
>>>>Fax: +39 010 860 5691
>>>>http://www.techcom.it
>>>>
>>>>_______________________________________________
>>>>Beowulf mailing list, Beowulf at beowulf.org
>>>>To change your subscription (digest mode or unsubscribe) visit
>>>>http://www.beowulf.org/mailman/listinfo/beowulf
>_______________________________________________
>Beowulf mailing list, Beowulf at beowulf.org
>To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>  
>


-- 
Michael Will
Penguin Computing Corp.
Sales Engineer
415-954-2887
415-954-2899 fx
mwill at penguincomputing.com 



