[Beowulf] Re: vectors vs. loops

Vincent Diepeveen diep at xs4all.nl
Tue May 3 06:49:23 PDT 2005


At 07:58 AM 5/3/2005 -0400, Joe Landman wrote:
>With all due respect, I think Robert is correct.  The majority of the 
>existing scientific code base is serial code with very limited (if any) 
>vectorizable content, or parallelizable content.  This is in large part 
>due to the way people write them.

You can try to talk about "the majority of software", but IMHO that is
irrelevant. What is relevant is how much system time the software uses.

There is hard statistical evidence, as you can see in the "supercomputer
reports", of where the majority of system time goes.

It's undeniable that a few types of applications eat the majority of the
available system time, and those applications are embarrassingly parallel.

With respect to bio-informatics, it eats just 0.5% of total
supercomputer/supercluster system time.

We are really talking about a tiny fraction. Even then, the few
bio-informatics programs I know of that eat lots of system time are busy
with matrix calculations and invariant calculations, both of which are
embarrassingly parallel (see the sketch below).
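
To give an idea what "embarrassingly parallel" means here, a minimal
sketch (my own toy illustration, not taken from any real bio-informatics
code): in a dense matrix multiply every output row is independent of
every other output row, so the outer loop can be handed out to the
processors with no communication between them.

/* Toy sketch: C = A * B for square n x n matrices stored row-major.
   Output row i depends only on row i of A and all of B, never on
   another output row, so the rows can go to different processors. */
void matmul(long n, const double *a, const double *b, double *c)
{
    long i, j, k;
#pragma omp parallel for private(j, k)  /* rows split over the CPUs */
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++) {
            double sum = 0.0;
            for (k = 0; k < n; k++)
                sum += a[i*n + k] * b[k*n + j];
            c[i*n + j] = sum;
        }
}

On a vector machine the inner loop vectorizes as well; on a cluster the
outer loop parallelizes. Either way there is no data dependency to fight,
which is exactly why this kind of workload dominates the statistics.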

Searching algorithms (my own specialty) are usually very hard to
parallelize; it usually takes years of full-time work to get one
parallelized (a toy sketch below shows why).
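
Here is a rough sketch of plain alpha-beta on a toy game tree (just an
illustration, nothing like Diep's real search): the alpha bound that
child i produces is exactly what prunes child i+1, so you cannot simply
hand the children of a node to different processors; they would search
with weak bounds and redo most of the work the cutoffs normally remove.

#include <stdio.h>

/* Toy sketch: negamax alpha-beta on a complete binary tree of depth 3.
   Node 1 is the root, node n has children 2n and 2n+1, and the leaves
   (nodes 8..15) carry scores from the side to move's point of view. */
static const int leaf[8] = { 3, -5, 6, 9, 1, 2, 0, -1 };

static int alphabeta(int node, int depth, int alpha, int beta)
{
    int child, score;

    if (depth == 0)
        return leaf[node - 8];

    for (child = 2*node; child <= 2*node + 1; child++) {
        score = -alphabeta(child, depth - 1, -beta, -alpha);
        if (score >= beta)
            return beta;    /* cutoff: later children are never searched */
        if (score > alpha)
            alpha = score;  /* this bound is what prunes the NEXT child */
    }
    return alpha;
}

int main(void)
{
    printf("root score: %d\n", alphabeta(1, 3, -1000, 1000));
    return 0;
}

Parallel schemes like the young brothers wait concept exist, but they all
fight this dependency, and that fight is where the years of work go.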

While those programs eat more system time on users' home PCs than all
supercomputers have in combined power, they probably eat the smallest
amount of system time on supercomputers/superclusters directly. There are
actually just two serious chess programs that run on a cluster: my own
Diep, and Hydra, owned by Sheikh Tahnoon Bin Zayed. Both cluster versions
are privately owned programs, not mass software.

Best regards,
Vincent

>Aside from that, simply looking at the field of bioinformatics, I would 
>be hard pressed to find a code capable of using long vectors out of the 
>box (HMMer can use short vectors, and even then a little rewriting is 
>needed).  These codes tend to be trivially/embarrassingly parallel though.
>
>The issue may be one of labeling.  What you interpret as "the majority 
>of scientific codes" may be very different than what I interpret as "the 
>majority ..." and what Robert interprets as "the majority ...". 
>Supercomputing and high performance computing in a more general sense, 
>is not just about numerically intensive codes anymore.  IDC reports that 
>the largest fractions of machines purchased for HPC in recent years have 
>been going for "scientific research" and "life science computing".  The 
>latter is effectively unvectorizable, and the former has a small 
>fraction of overall content that is vectorizable.
>
>The question is whether or not you consider BLASTing 10000 ESTs vs the 
>nt database to be a supercomputing problem.  I do (as do a fair number 
>of others).  Many old-timer linear algebra folks do not.
>
>Joe
>
>Joachim Worringen wrote:
>> Robert G. Brown wrote:
>> 
>>> However, most code doesn't vectorize too well (even, as you say, with
>>> directives), so people would end up getting 25 MFLOPs out of 300 MFLOPs
>>> possible -- faster than a desktop, sure, but using a multimillion dollar
>>> machine to get a factor of MAYBE 10 in speedup compared to (at the time)
>>> $5-10K machines.  In the meantime, I'm sure that there were people who
>>> had code that DID vectorize well pulling their hair because of all those
>>> 100 hour accounts that basically wasted 90% of the resource.
>> 
>> 
>> This general statement is just wrong. Many scientific codes *do* 
>> vectorize well, and in this case, you do not get <10% of peak as you 
>> indicated, but typically 30 to 50% or even more (see e.g. Leonid 
>> Oliker's recent papers on this topic) for the *overall application* (the 
>> vectorized loop alone is close to 100%). This is a significant difference.
>> 
>> Another 'legend' comes up in your statement: a single node of a vector 
>> machine does not cost "multimillion" dollars. The price factor is quite 
>> close to the performance factor for suitable applications.
>> 
>>> My own code doesn't vectorize too well because it isn't heavy on linear
>>> algebra and the loops are over 3 dimensional lattices where
>>> nearest-neighbor sums CANNOT be local in a single memory stream and
>> 
>> 
>> Real vector architectures have very efficient scatter/gather memory 
>> operations, and support indirect addressing efficiently as well.
>> 
>>  Joachim
>> 
>
>-- 
>Joseph Landman, Ph.D
>Founder and CEO
>Scalable Informatics LLC,
>email: landman at scalableinformatics.com
>web  : http://www.scalableinformatics.com
>phone: +1 734 786 8423
>fax  : +1 734 786 8452
>cell : +1 734 612 4615
>
>_______________________________________________
>Beowulf mailing list, Beowulf at beowulf.org
>To change your subscription (digest mode or unsubscribe) visit
>http://www.beowulf.org/mailman/listinfo/beowulf
>
>


