Clarification: [Beowulf] hpl - large problems fail


Paul Johnson redboots at ufl.edu
Thu Mar 10 14:56:16 PST 2005


Guy Coates wrote:

>On Thu, 10 Mar 2005, Paul Johnson wrote:
>
>>All:
>>
>>I have a 4 node cluster (don't snicker :) )
>
>Everyone starts off small.
>
>>and I'm trying to do some
>>benchmarking with HPL.  I want to test 2 of the nodes with 1GB of
>>RAM each.  I calculated the maximum problem size that can fit in 2GB
>>and still allow for memory for the operating system.  That came out to
>>be around 14500x14500.  When I run a test of that size it always fails.
>>The largest problem that I can test and not have it fail on me is
>>12500x12500.
>>What is the reason behind this?  I'm confused about what is going on here.
>>Thanks for any help.
>
>Do you know what actually caused the failure?
>
>If your problem size was too big, and you are really out of memory, you
>should see some messages in the system log saying the out-of-memory-killer
>was activated and HPL was zapped.
>
>If you know your machines were not actually out of memory, then you have
>broken hardware on one of your nodes. Run memtest86+ or memtest86 on your
>nodes (possibly the world's most useful pieces of diagnostic software).
>
>http://www.memtest86.com
>http://www.memtest.org
>
>
>If you haven't seen it, IBM have a redpaper on tuning HPL, which gives
>some good starting parameters, problem-sizing tips and an overview of
>different BLAS libraries you can compile against to get that extra few
>Gflops of performance.
>
>Cheers,
>
>Guy
>
>  
>
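
For reference, the problem-size estimate described in the quoted message can be sketched as follows. This is only a rough sketch: the 80% usable-memory fraction is an assumed rule of thumb, "1GB" is assumed to mean 2**30 bytes per node, and the helper names are made up for illustration. The block size of 64 is the NB value shown in the HPL output.

```python
import math

BYTES_PER_DOUBLE = 8  # HPL stores the matrix in double precision


def max_problem_size(total_ram_bytes, usable_fraction=0.8, block_size=64):
    """Largest N (rounded down to a multiple of block_size) whose
    N x N double-precision matrix fits in the usable share of RAM.

    usable_fraction is an assumed rule of thumb, not an HPL setting.
    """
    usable_bytes = total_ram_bytes * usable_fraction
    n = int(math.sqrt(usable_bytes / BYTES_PER_DOUBLE))
    return n - n % block_size


def matrix_gib(n):
    """Size in GiB of an n x n matrix of doubles."""
    return n * n * BYTES_PER_DOUBLE / 2**30


total_ram = 2 * 2**30  # two nodes, assumed 1 GiB each
print("estimated N:", max_problem_size(total_ram))
for n in (14500, 12500):
    print(f"N = {n}: matrix needs about {matrix_gib(n):.2f} GiB")
```

Under these assumptions the estimate lands close to the 14500 figure, so the size of the matrix alone does not explain the failure.
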
I should have been clearer in my description.  The run doesn't fail at 
the command prompt.  It fails when HPL checks the solution to the 
linear equations: the scaled residuals are too high, so the checks are 
reported as FAILED.  This is part of the data from my HPL.out file:

============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WC12R2L4       14500    64     1     2             388.43          5.233e+00
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        ) =   284363.4669186 ...... FAILED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =   210262.3627204 ...... FAILED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =    41377.6398965 ...... FAILED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . =           0.001692
||A||_oo . . . . . . . . . . . . . . . . . . . =        3708.772315
||A||_1  . . . . . . . . . . . . . . . . . . . =        3695.221759
||x||_oo . . . . . . . . . . . . . . . . . . . =           6.847285
||x||_1  . . . . . . . . . . . . . . . . . . . =       19610.120504
============================================================================
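
To see how far off these checks are, the first two scaled residuals can be recomputed from the norms printed above. This is a minimal sketch, not HPL's own code: it assumes eps is the double-precision unit roundoff 2**-53 (which reproduces the printed values to within the rounding of the norms) and that the pass threshold is 16.0, the value usually found in HPL.dat. The function name is hypothetical.

```python
# Norms copied from the HPL.out excerpt above (rounded as printed).
N = 14500
RESID_INF = 0.001692     # ||Ax-b||_oo
A_NORM_1 = 3695.221759   # ||A||_1
X_NORM_1 = 19610.120504  # ||x||_1

EPS = 2.0 ** -53   # assumption: double-precision unit roundoff
THRESHOLD = 16.0   # assumption: threshold value set in HPL.dat


def scaled_residuals(resid_inf, a_norm_1, x_norm_1, n, eps=EPS):
    """Return the first two scaled residuals as labelled in HPL.out."""
    r1 = resid_inf / (eps * a_norm_1 * n)
    r2 = resid_inf / (eps * a_norm_1 * x_norm_1)
    return r1, r2


r1, r2 = scaled_residuals(RESID_INF, A_NORM_1, X_NORM_1, N)
for label, value in (("check 1", r1), ("check 2", r2)):
    status = "PASSED" if value < THRESHOLD else "FAILED"
    print(f"{label}: {value:.1f} ... {status}")
```

Both values come out several orders of magnitude above the threshold, so the computed solution is numerically wrong rather than marginally inaccurate.
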

Sorry for the confusion,
Paul

-- 
Paul Johnson
Graduate Student - Mechanical Engineering
University of Florida - Gainesville, Fl
http://plaza.ufl.edu/redboots

