[Beowulf] CCL:Opteron or Nocona ? (fwd from m.somers@chem.leidenuniv.nl)
Vincent Diepeveen diep at xs4all.nl
Tue May 10 05:29:59 PDT 2005
- Previous message: [Beowulf] Re: CCL:Opteron or Nocona ?
- Next message: [Beowulf] standards for GFLOPS / power consumption measurement?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
At 08:05 PM 5/9/2005 -0400, Mark Hahn wrote:
>> Maybe the license terms changed again, but can please anyone clarify.
>
>you can't use the intel compiler under the noncom license
>if you're using it for anything for which you are compensated.
>
>if you're an opensource developer who is paid to develop,
>you can't use it. students can use it on classwork, assuming
>they're not being compensated for going to school ;)
>
>personally, this seems silly: the compiler exists for the purpose of
>getting people to use Intel chips, and to do so effectively.
>it should be a loss leader - heck, Intel should pay people to use it!
>support is completely orthogonal to this.

Yes, this is a problem I have with Intel C++: it promotes its own chips in a merciless manner, sometimes even violating the standards that describe how accurate floating point must be.

Now the standards are very, very lenient: they allow certain datatypes an accuracy far below what scientists expect from them. If you have a 64-bit register that easily allows n bits of accuracy, and the standards require only at least m bits, where m is a lot smaller than n, then I find it very bad that the Intel C++ compiler uses m bits by default.

Some of us understand far better than I do that the Itanium2 has no divide instruction in hardware, but the average scientist doesn't. I was, for example, unaware that the Itanium2 has no rotate instruction. That is a pretty crucial instruction for the random generator I use (Opteron 3 ns per generated number, Itanium2 9 ns, K7 13 ns).

Yet if that means the Itanium2 only approximates a division to the minimum accuracy the standards require, then it is not very nice to do this by *default*.

Of course everyone will shout murder and fire now: "you do know the floating point standards". No, I do NOT know them. If I have 64 bits, I assume that those 64 bits are as accurate as possible, not that just 53 bits or 43 bits get used.

Now Intel C++ has a special flag on Itanium2 to compile so that it uses the full precision. By default that flag is turned off. IMHO this misleads scientists. They compare an n-bit accurate result from other CPUs with an m-bit result, where m < n, from the Itanium2. Of course that lets the CPU look better.

We all realize that because of roundoff errors like this, entire new theories have been invented by scientists to explain a certain behaviour (I remember a quantum mechanics example), which in the long run are refuted when one clever scientist notices years later that it is just a roundoff error, caused for whatever reason.

There is no harm in offering a compiler option that allows faster floating point. But please, turn it off by default.

Because of the many bugs in Intel C++ in the past, I lost a crucial 0.5 point in a chess tournament. Though I'm under the impression that today's Intel C++ is far more bug-free than the old versions, the harm has been done: I verify a compiler very, very carefully before using it.

>my guess is that some Intel PHB got bitched at by a competing SW vendor.
>either that or some MBA got of his/her leash.
>
>I would very much like to see a good comparison of Intel compilers
>versus the alternatives. I know good things about Pathscale, for instance,
>and gcc 4.0 and 4.1 seem pretty impressive.

gcc 4.0.0 is a big step forward. The 3.3 and 3.4 series were SO buggy. 4.0.0 compiles the software bug-free here with pgo.

>so: how would you go about comparing a set of compilers? assuming you
>want something more general than just running it on your pet code.

You hit a really crucial point here. The only way is to let a few objective testers test a lot of different commercial software. Spec is doing its best, but we all know about the huge cooking that happens there, of which Intel C++ is the best example, though not the biggest speedup we saw. I remember a Sun compiler trick speeding up a certain floating point program by, what was it, 7 times or so?

Spec wants the source code and sells it to those who buy a license from Spec. That simply excludes 99% of all interesting software from being tested by them.

I feel the right solution would be a much faster rotation cycle. Spec needs something like 7 years to decide which programs to use for testing. If they did it every year, that would leave less cooking time. Right now the different compiler teams have already had, what is it, 2 years or so to start cooking for specint2004? That's way too long and takes the fun out of looking at the benchmark results there.

For the high end, what we require is a few programs such as a one-way ping pong test that measures all processors simultaneously and, even more importantly, a bandwidth test that measures the average bandwidth each processor can get simultaneously.

Right now, if you ask me, routers have a special cache strategy to be fast at streaming and predicting bandwidth from point A to point B, whereas they choke completely when you run a program on all processors at once and all processors are busy streaming at the same time.

This can be a pretty simplistic program. It is just an example. For programmers, knowing the raw data performance is very important, as the network is of course always the weak link of a cluster or supercomputer.

Right now you have to guess for yourself how much of the performance of a certain type of program is due to compiler cooking and how much is because the machine is faster.

We can forgive manufacturers for giving themselves good measurements.

Is NUMALINK4 from SGI faster than NUMALINK3? I do not know.

Have you ever seen one-way ping pong latencies on NUMALINK4 from processor 0 to processor 511, or even within one partition of 64 processors?

I have never seen them, whereas for programmers it is the most crucial thing to know.
