[Beowulf] CCL:Opteron or Nocona ? (fwd from m.somers at chem.leidenuniv.nl)

Vincent Diepeveen diep at xs4all.nl
Tue May 10 05:29:59 PDT 2005

At 08:05 PM 5/9/2005 -0400, Mark Hahn wrote:
>> Maybe the license terms changed again, but can please anyone clarify.
>you can't use the intel compiler under the noncom license 
>if you're using it for anything for which you are compensated.
>if you're an opensource developer who is paid to develop, 
>you can't use it.  students can use it on classwork, assuming
>they're not being compensated for going to school ;)
>personally, this seems silly: the compiler exists for the purpose of 
>getting people to use Intel chips, and to do so effectively.
>it should be a loss leader - heck, Intel should pay people to use it!
>support is completely orthogonal to this.

Yes, this is a problem I have with Intel C++: it promotes Intel's own chips
mercilessly, sometimes even violating the standards that describe how
accurate floating point must be.

Now the standards are very lenient: they allow certain datatypes an
accuracy far below what scientists expect from them.

If you have a 64-bit register that easily allows n bits of accuracy, while
the standard only requires at least m bits, where m is a lot smaller than n,
then I find it very bad that the Intel C++ compiler uses m bits by default.

Some of us understand far better than I do why the Itanium2 doesn't have a
hardware divide instruction, but the average scientist doesn't know that. I,
for example, was unaware that the Itanium2 doesn't have a rotate
instruction. That's a pretty crucial instruction for the random generator I
use (Opteron: 3 ns per generated number, Itanium2: 9 ns, K7: 13 ns). Yet if
that means the Itanium2 merely approximates a division with the minimum
precision the standards require, that is not a nice thing to do.

Of course everyone will shout murder and fire now: "surely you know the
floating point standards". No, I do NOT know them.

If I have 64 bits, I assume I get as much accuracy as 64 bits allows, not
that just 53 bits or 43 bits get used.

Now Intel C++ has a special flag on Itanium2 to compile at full precision.
By default that flag is turned off.

IMHO this is misleading scientists. 

They compare an n-bit-accurate result on other CPUs with an m-bit result
(m < n) from the Itanium2. Of course that makes the CPU look better.

We all know of cases where, because of roundoff errors like this, entirely
new theories were invented by scientists to explain a certain behaviour (I
remember a quantum mechanics example), only to be refuted years later when
one clever scientist notices it was just a roundoff error, caused for
whatever reason.

There is no harm in a compiler option that allows faster floating point.
But please, turn it off by default.

In the past I lost a crucial 0.5 point in a chess tournament because of the
many bugs in Intel C++. Though I'm under the impression that today's Intel
C++ is far more bug-free than the old versions, the harm has been done: I
now verify very, very carefully before trusting a compiler.

>my guess is that some Intel PHB got bitched at by a competing SW vendor.
>either that or some MBA got of his/her leash.
>I would very much like to see a good comparison of Intel compilers
>versus the alternatives.  I know good things about Pathscale, for instance,
>and gcc 4.0 and 4.1 seem pretty impressive.

gcc 4.0.0 is a big step forward. The 3.3 and 3.4 series were SO buggy.
4.0.0 compiles my software bug-free here with PGO.

>so: how would you go about comparing a set of compilers?  assuming you 
>want something more general than just running it on your pet code.

You hit a really crucial point here. The only way is to let a few objective
testers test a lot of different commercial software packages.

SPEC is doing its best, but we all know about the huge benchmark cooking
that happens; Intel C++ is the best-known example, but not the biggest
speedup we've seen. I remember a Sun compiler trick speeding up a certain
floating-point program something like 7 times.

SPEC wants the source code and sells it to those who buy a license from
them.

That simply removes 99% of all interesting software from getting tested by
them.

I feel the right solution would be a much faster rotation cycle. SPEC needs
something like 7 years to decide which programs to use for testing. If they
did it every year, that would leave less cooking time.

Right now the different compiler teams have already had, what is it, 2
years or so to start cooking for SPECint2004?

That's way too long, and it takes the fun out of looking at the benchmark
results.

For the high end, what we need is a few programs like a one-way ping-pong
test that measures all processors simultaneously. And even more importantly,
a bandwidth test that measures the average bandwidth each processor can get
when they all run simultaneously.

Right now routers have, if you ask me, a special cache strategy that makes
them fast at predictable point-A-to-B bandwidth streaming, whereas they
completely fall over when you run a program on all processors at the same
time and all processors are busy streaming at the same time.

This can be a pretty simplistic program.

It is just an example; for programmers, raw performance data is very
important, as the network is of course always the weak link of a
cluster/supercomputer.

Right now you have to guess for yourself how much of the performance of a
given type of program is due to compiler cooking and how much is because
the machine is actually faster.

We can forgive manufacturers for publishing measurements that flatter
themselves.

Is NUMALINK4 of SGI faster than NUMALINK3?

I do not know. Have you ever seen one-way ping-pong latencies on NUMAlink4
from processor 0 to processor 511, or even within one partition of 64
processors?
I never have. Yet for programmers it is the most crucial thing to know.
