<div><br></div><div><br></div><div class="gmail_quote">On Sun, May 17, 2009 at 12:04 PM, Jan Heichler <span dir="ltr"><<a href="mailto:jan.heichler@gmx.net">jan.heichler@gmx.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


<div>


<p>Hello Tiago,</p>

<p><br></p>

<p>On Sunday, May 17, 2009, you wrote:</p><div class="im">

<p><br></p>

<div><table border="0" cellpadding="1" cellspacing="2" style="background-color:#ffffff"><tbody><tr valign="top"><td width="2" style="background-color:#0000ff"><br>

</td><td width="1683">

<p><span>On Sat, May 16, 2009 at 11:56 PM, Rahul Nabar <</span><a href="mailto:rpnabar@gmail.com" target="_blank">rpnabar@gmail.com</a><span>> wrote:</span></p>

<p><br></p>

<p><span>On Sat, May 16, 2009 at 2:34 PM, Tiago Marques <</span><a href="mailto:a28427@ua.pt" target="_blank">a28427@ua.pt</a><span>> wrote:</span></p>

<p><span>> One of the codes, VASP, is very bandwidth limited and loves to run on a</span></p>

<p><span>> number of cores that is a multiple of 3. The 5400s are also very bandwidth</span></p>

<p><span>> limited, in both memory and FSB, so they sometimes don't scale well above 6</span></p>

<p><span>> cores. They are very fast per core, as someone mentioned, when compared to</span></p>

<p><span>> AMD cores.</span></p>

<p><br></p>

<p><span>Thanks Tiago. This is super useful info. VASP is one of our major</span></p>

<p><span>"users" too. Possibly 40% of the cpu-time. Rest is a similar</span></p>

<p><span>computational chemistry code, DACAPO.</span></p>

<p><br></p>

<p><span>It would be interesting to compare my test-run times on our</span></p>

<p><span>AMD-Opterons (Barcelona). Is it possible to share what your benchmark</span></p>

<p><span>job was?</span></p>

<p><br></p>

<p><span>I'll check first with the user who crafted it for me, but it should be no problem to pass it on to you afterwards.</span></p>

<p><br></p>

<p><span> </span></p>

<p><br></p>

<p><span>Since you mention VASP is bandwidth limited do you mean memory</span></p>

<p><span>bandwidth or the interconnect? Maybe this question itself is naive.</span></p>

<p><span>Not sure. What interconnect do you use? We have gigabit ethernet dual</span></p>

<p><span>bonded.</span></p>

<p><br></p>

<p><span>Memory bandwidth, as you can see from the performance gain when going from 1066 MHz to 1600 MHz, even with looser timings IIRC.</span></p>

<p><span>Of course interconnects also play a role, even internal ones, which in the case of the Xeons was a very slow FSB.</span></p>

<p><br></p>

<p><span>I use single GbE because, as far as I could benchmark, I hardly found anything that could use more than one node efficiently, and no one, not even here, could help me with that. It seems I need InfiniBand. </span></p>


<p><span>I only managed a 33% speedup with two nodes when running a really huge job (100k+ atoms) in Gromacs.</span></p>

</td></tr></tbody></table>

</div>

<p><br></p>

</div><p>For VASP you should look at ConnectX or InfiniPath. InfiniHost III scaled badly in the scenarios I saw. That is probably because of the use of collectives. </p><div class="im">

<p></p></div></div></blockquote><div>OK, thanks. I'll look into that for a future upgrade.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div><div class="im">

<p> </p>

<div><table border="0" cellpadding="1" cellspacing="2" style="background-color:#ffffff"><tbody><tr valign="top"><td width="2" style="background-color:#0000ff"><br>

</td><td width="1683">

<p><span> Which brings me to a point that I forgot to mention to you. When considering Intel machines, you can always get a compiler license for $2000, give or take,</span></p>

</td></tr></tbody></table>

</div>

<p><br></p>

</div><p>2000 USD sounds rather expensive. Node-locked licenses are usually cheaper... Look for the package with compilers, MKL and MPI: the Cluster Toolkit. It is definitely worth it (when buying more than just a single machine).</p>


<p></p></div></blockquote><div></div><div>It wasn't. Intel only required that we purchase one license, which can be used to compile software for all nodes. Given the price of the 8 nodes, it was a small percentage of the total. Still, the toolkit does seem to be cheaper: I checked, and the Cluster Toolkit Compiler Edition for Linux costs $1699.</div>

<div>I mostly use only ICC, Ifort and sometimes MKL. Even MKL is no faster than GotoBLAS (GotoBLAS wins, though not by much), and I only use MKL when I need LAPACK routines.</div><div></div><div>Best regards,</div><div></div><div>Tiago Marques</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div><p> </p><font color="#888888">

<p><br></p>

<p>Jan</p>


</font></div></blockquote></div><br><div><br></div>