[Beowulf] CCL:Question regarding Mac G5 performance (fwd from mmccallum at pacific.edu)

Thu May 20 20:41:02 PDT 2004

At 06:01 PM 5/20/2004, Joe Landman wrote:
>On Thu, 2004-05-20 at 20:26, Michael Huntingdon wrote:
> > At 01:23 PM 5/20/2004, Joe Landman wrote:
> > >On Thu, 2004-05-20 at 02:05, Michael Huntingdon wrote:
> > > > I've spent some time sifting through the attached numbers. Though not each
> > >
> > >Any paper that starts out praising spec as a "good" predictive benchmark
> > >is suspect.  Benchmarking is difficult to do right, in large part
> > >because of deceptively simple scoring functions (time, frequency (not of
> > >the CPU, but number of iterations per unit time), ...).
> > Although SPEC was included among others, I didn't see where Martyn Guest
> > "praised" SPEC.
>
>Page 5: "One of the most useful indicators of CPU performance is
>provided by the SPEC ... benchmarks"
>
>Many folks would take issue with the exact utility of these benchmarks.

Again...SPEC was included, among a number of others. Would you happen to 
have a cross section of code relevant to computational chemistry that might 
offer a fair example of platform performance? I'd enjoy a chance to get it 
into the labs and see how Itanium 2 might stack up.

It's hard to overlook the bias in all the assumptions suggested. Those 
funding the project that led to the papers referenced, as well as those who 
took part in the testing/reporting, by way of their careers must have 
followed procedures to ensure the system/os/compiled code represented a 
fair and balanced environment.

We support Pentium, Xeon, Opteron, PA, Alpha and Itanium 2 solutions for 
higher education, so I have no particular ax to grind. I am puzzled, though, 
that Martyn's results meet this type of skepticism, inasmuch as he has been 
well published over many years. I hope you have an opportunity to contact 
him directly.

regards
michael

>At least he points out later in the document (page 6) some of the
>serious flaws in the benchmark.  The problem is that he continues to use
>it as a valid scoring function.  We can argue and debate over this, but
>the numbers are of highly dubious value at best.
>
> >
> > >Further, looking these over, I did not see much of a discussion (though
> > >it is implied by the use of certain compilers) of the effects of things
> > >like SSE2 in the P4, memory alignment, 32/64 bit
> > >compilation/optimization, use of tuned libraries where available...
> > >Given the sheer number of machines tested, it is unlikely that they used
> > >up to date compilers (latest gcc's are better than earlier gcc's for
> > >performance), or recompiled the binary for all the different platforms
> > >to run native.  The ifc results seem to indicate that they used SSE2 on
> > >P4's but probably used plain old 32 bit code on Opterons.
> >
> > I wouldn't begin to speculate; however, would hope Daresbury Laboratory and
> > Martyn Guest were working to advance research, using the best technology
> > available for each platform. I didn't see anything in their mission
> > statement which leads me to think otherwise.
>
>No one is implying that they would do anything less than advance the
>state of knowledge.  It is important to note that little information (I
>may have missed it, so please do point it out if you find it) exists on
>the use of the -m64 gcc compilation for Opterons (gets you a nice
>performance boost in many cases, and in a number of chemistry
>applications I have worked with), or the ACML libraries for high
>performance *GEMM operations on AMD, or the Altivec compilation/math
>libraries, or the SGI performance libraries, ... etc.  That is, as I
>implied, it would be quite difficult for the lab to a) test all the
>machines, b) test all the machines optimally.  In fact, they
>specifically indicate that they could not do so (see page 4) due to time
>constraints.
>
>While the information in here does appear to be useful (and I did not
>state otherwise), it does not constitute an exhaustive investigation of
>machine performance characteristics.  It does appear to compare how well
>some programs ran on limited time loaner machines, donated hardware,
>etc.  Which means the operative issue is to get results quickly and hope
>you can do some fast optimization.
>
>It would be dangerous to draw conclusions beyond the text which the
>authors specifically caution against.
>
> > > > lends itself to hp Itanium 2, there appears to be a very balanced trend.
> > >
> > >... in a specific set of operations relevant for specific classes of
> > >calculation.
> >
> > The tables cover a wide range of benchmarks specific to the interests of
> > those working in computational chemistry. With respect to this, the rx2600
> > (Itanium 2 based) ranked among the top ten (with the exception of table 4
> > where it was ranked #11). Averaged out, the tables reflect an overall
> > rating of 5.86 among the 400 platforms tested. My initial conclusion may
> > have been less than scientific, but I'll stay with it for now.
>
>That's fine.  You of course are entitled to your opinion.
>
>You asked a simple question as to why there is not more discussion of
>this in these and other circles.  Well, other people are entitled to
>their opinions, and it appears the market is indeed deciding between
>competitors.
>
>Aside from this, "benchmarks" are problematic to do right, in a
>completely non-biased manner.  These benchmarks are interesting, but
>there was not enough detail given of the systems for others to try to
>replicate the work.  For example, which OS, specific compiler versions,
>patches were used?  For the non-spec codes, which compilation options
>were used?  For the chips with SIMD capability, was it used (P4,
>Opteron, G5)?  How was memory laid out?  Was any attention paid to
>processor affinity and related scheduling?
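
The questions above (OS, compiler version, compilation options) are the kind of detail a benchmark harness can record alongside each timing so that others can attempt to replicate a run. A minimal sketch in Python follows. It is an assumption of how one might do this, not anything taken from the Daresbury report; the names `compiler_version` and `run_benchmark`, and the `flags` field, are hypothetical.

```python
import json
import platform
import subprocess
import time


def compiler_version(cmd="gcc"):
    """Return the first line of `<cmd> --version`, or None if unavailable."""
    try:
        out = subprocess.run([cmd, "--version"], capture_output=True,
                             text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    lines = out.splitlines()
    return lines[0] if lines else None


def run_benchmark(fn, repeats=3, compiler="gcc", flags=""):
    """Time fn() several times and return timings plus the environment."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "compiler": compiler_version(compiler),
        # Hypothetical field: the options used to build the code under test.
        "flags": flags,
        "repeats": repeats,
        "best_seconds": min(timings),
        "all_seconds": timings,
    }


if __name__ == "__main__":
    # Stand-in workload; a real run would call the application being measured.
    report = run_benchmark(lambda: sum(i * i for i in range(100000)),
                           flags="-O2 -m64")
    print(json.dumps(report, indent=2))
```

Publishing a record like this next to each number does not remove bias, but it does let a reader see whether, say, an Opteron result came from a 32-bit or a 64-bit build.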
>
>Remember that using the ifc/efc compilers with the Itanium chips as well
>as the Pentium chips gives you a significant leg up in performance as
>compared to using the gcc system on similar architectures.  Moreover,
>there is a performance penalty to be paid for not picking the compiler
>options carefully under gcc or ifc.
>
> > >Not everyone in HPC does matrix work, eigenvalue extraction, etc.  Some
> > >of us do things like string/db searching (informatics).  There, the
> > >numbers look quite different.
> >
> > My comments referenced numerically intensive research rather than I/O
> > intensive environments. I'm surprised 8GB of memory is enough to sustain
> > superior performance when searching very large data sets normally
> > associated with bio-informatics.
>
>8GB is enough for some, not enough for others.  Some projects I have
>worked on
>(http://www.sgi.com/newsroom/press_releases/2001/january/msi.html) have
>used a few processors and a little memory.
>
>Again, as indicated by many others (and myself), the only things that
>matter are your (the end user's) tests, with real data.  I am intrigued
>by Martyn's chemistry tests, and when I get a free moment, I will send a
>note about possibly including a more protein oriented set of tests into
>BBS (http://bioinformatics.org/bbs) v2 due out soon (yeah I know I keep
>saying that, but it is soon...)
>
> >
> > Ciao~
> > Michael
> >
> >
> > >[... snip ...]
> > >
>--
>Joseph Landman, Ph.D
>Scalable Informatics LLC,
>email: landman at scalableinformatics.com
>web  : http://scalableinformatics.com
>phone: +1 734 612 4615