[Beowulf] Re: vectors vs. loops

Wed Apr 27 08:42:50 PDT 2005

> However, most code doesn't vectorize too well (even, as you say, with
> directives), so people would end up getting 25 MFLOPs out of 300 MFLOPs
> possible -- faster than a desktop, sure, but using a multimillion dollar
> machine to get a factor of MAYBE 10 in speedup compared to (at the time)
> $5-10K machines.

What the people who run these centers have told me is that a
supercomputer is worth the cost if you can get a speedup of 30x over
serial. What do others think of this?
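The numbers in the quoted paragraph and the 30x rule of thumb come down to simple arithmetic. Here is a minimal Python sketch. The function names are hypothetical, and the 30x threshold is only the figure the center staff quoted, not any standard:

```python
def fraction_of_peak(sustained_mflops: float, peak_mflops: float) -> float:
    """Fraction of the machine's theoretical peak that the code actually sustains."""
    return sustained_mflops / peak_mflops


def meets_speedup_rule(serial_seconds: float, parallel_seconds: float,
                       threshold: float = 30.0) -> bool:
    """True if the speedup over the serial run reaches the threshold
    (30x is assumed here, per the rule of thumb above)."""
    return serial_seconds / parallel_seconds >= threshold


if __name__ == "__main__":
    # Figures from the quoted post: 25 MFLOPs sustained out of 300 MFLOPs peak,
    # and a speedup of "MAYBE 10" over a $5-10K desktop.
    print(f"{fraction_of_peak(25, 300):.1%}")  # 8.3%
    print(meets_speedup_rule(10.0, 1.0))       # False: 10x falls short of 30x
```

By that yardstick, code running at about 8% of peak on the vector machine would not justify the cost.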

> The moral of this particular story is to NOT try to force code onto a
> vector environment unless it is, really, a vector task.  Indeed, don't
> force code into a PARALLEL environment (e.g. into PVM or MPI) unless it
> is a NON-TRIVIAL parallel task (I spent a lot of time rewriting my code
> as master-slave stuff in PVM, only to finally realize that EP tasks are
> more easily managed by just running the damn jobs independently via e.g.
> a script and accumulating results with other scripts, because writing
> ROBUST PVM (or MPI) code -- code that can survive a casual reboot or
> interruption of any particular node -- is Not Easy.)

:) I needed to do some CHARMM runs this summer. The X1 did not like it
much (neither did I, but when the code makes references to punch
cards and you are trying to run it on a vector super, I think most
would feel that way), so I ended up running it in parallel with a
method similar to yours. Worked great!
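Here is a minimal Python sketch of the "run the jobs independently, then accumulate with another script" approach. Everything in it is hypothetical: `JOB_CODE` is a stand-in for the real serial program (such as a CHARMM run on its own input file), and the `run_job`/`accumulate` helpers and the `result_<seed>.json` naming are illustrative. On a real cluster each job would be started on its own node (via ssh or a batch queue, say) rather than locally. Each job writes its result to a separate file through an atomic rename. A node that dies mid-run therefore leaves only a missing file, and rerunning the driver redoes just the unfinished jobs:

```python
import json
import pathlib
import subprocess
import sys
import tempfile

# Hypothetical stand-in for one independent job: takes a seed, prints a JSON result.
JOB_CODE = (
    "import json, sys; s = int(sys.argv[1]); "
    "print(json.dumps({'seed': s, 'value': s * s}))"
)


def run_job(seed: int, outdir: str) -> pathlib.Path:
    """Run one job to completion and store its output in its own result file."""
    out = pathlib.Path(outdir) / f"result_{seed}.json"
    if out.exists():  # finished before a reboot/interruption: skip it
        return out
    proc = subprocess.run([sys.executable, "-c", JOB_CODE, str(seed)],
                          capture_output=True, text=True, check=True)
    tmp = out.with_suffix(".tmp")
    tmp.write_text(proc.stdout)
    tmp.replace(out)  # atomic rename: never a half-written result file
    return out


def accumulate(outdir: str) -> list:
    """Separate step: gather whatever results exist; missing ones are simply rerun."""
    return [json.loads(p.read_text())
            for p in sorted(pathlib.Path(outdir).glob("result_*.json"))]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as outdir:
        for seed in range(4):  # on a cluster: one job per node
            run_job(seed, outdir)
        print(sum(r["value"] for r in accumulate(outdir)))  # 0 + 1 + 4 + 9 = 14
```

The jobs never talk to each other, so no PVM/MPI code has to survive a node going away.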

> If it IS a vector (or nontrivial parallel, or both) task, then the
> problem almost by definition will EITHER require extensive "computer
> science" level study -- work done with Ian Foster's book, Almasi and
> Gottlieb for parallel and I don't know what for vector as it isn't my
> area of need or expertise and Amazon isn't terribly helpful (most books
> on vector processing deal with obsolete systems or are out of print, it
> seems).

So what we should really be trying to do is match code to the
machine. One of the problems I have run into is that unless one
is at a large center, there are only one or two machines that provide
computing power. Where I am from we have an X1 and a T3E, which is
not much of a choice. There should be a cluster coming up soon,
which will give us the options we need, i.e., vector or cluster.
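One quick check of whether a code is "really a vector task" is whether its loop iterations are independent. The Python sketch below compares an elementwise scaling loop with a first-order recurrence. The function names are hypothetical, and plain Python lists are not vectorized by any hardware, so this only illustrates the dependency structure. Splitting the work into independent pieces, as vector lanes or separate processors would, leaves the first result unchanged but breaks the second:

```python
def scale(a: float, b: list) -> list:
    """out[i] = a * b[i]: every iteration is independent (vectorizes cleanly)."""
    return [a * bi for bi in b]


def recurrence(a: float, b: list) -> list:
    """out[i] = a * out[i-1] + b[i]: each iteration needs the previous result,
    a loop-carried dependency that prevents straightforward vectorization."""
    out, prev = [], 0.0
    for bi in b:
        prev = a * prev + bi
        out.append(prev)
    return out


def split_and_run(fn, a: float, b: list, k: int) -> list:
    """Run fn on b[:k] and b[k:] independently, then concatenate the results."""
    return fn(a, b[:k]) + fn(a, b[k:])


if __name__ == "__main__":
    b = [1.0, 2.0, 3.0, 4.0]
    print(split_and_run(scale, 0.5, b, 2) == scale(0.5, b))            # True
    print(split_and_run(recurrence, 0.5, b, 2) == recurrence(0.5, b))  # False
```

Loops like `scale` suit the vector machine. Loops dominated by recurrences like the second one usually do not, although independent copies of them can still be farmed out across a cluster.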

The manual for the X1 provides some information and examples. Are the
Apple G4 and G5 (AltiVec) the only processors that have real vector
units? I have not really looked at SSE/SSE2, but I remember that they
were not full precision (SSE's packed operations are single precision
only; SSE2 adds packed double precision).
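To make the precision point concrete: SSE's packed floating-point instructions operate on 32-bit single-precision values, while a double carries 64 bits. This minimal Python sketch rounds a value to single precision with the standard `struct` module and measures each format's machine epsilon. The helper names are hypothetical:

```python
import struct
import sys


def to_single(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest IEEE single and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


def machine_epsilon(round_fn) -> float:
    """Smallest power of two eps such that round_fn(1 + eps) != 1."""
    eps = 1.0
    while round_fn(1.0 + eps / 2) != 1.0:
        eps /= 2
    return eps


if __name__ == "__main__":
    single_eps = machine_epsilon(to_single)    # 2**-23, about 1.2e-7 (~7 digits)
    double_eps = machine_epsilon(lambda v: v)  # 2**-52, about 2.2e-16 (~16 digits)
    print(single_eps, double_eps)
    print(double_eps == sys.float_info.epsilon)  # True
    print(to_single(0.1))                        # 0.10000000149011612
```

For codes that need the full 16 or so significant digits, the single-precision SSE path is not a substitute.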

> For me, I just revel in the Computer Age.  A decade ago, people were
> predicting all sorts of problems breaking the GHz barrier.  Today CPUs
> are routinely clocked at 3+ GHz, reaching for 4 and beyond.  A decade

I just picked up a Sempron 3000+ with 1.5GB RAM, a 120GB HD, CD-ROM,
video, and 10/100 plus Intel PRO/1000 NICs for $540 shipped. I was amazed.