[Beowulf] Why one might want a bunch o' processors under your desk.

Vincent Diepeveen diep at xs4all.nl
Mon May 9 13:40:39 PDT 2005

At 05:49 PM 5/6/2005 -0700, Jim Lux wrote:
>Today I was running a lot of antenna models, using a method of moments code 
>called NEC4 (in FORTRAN).
>Just to describe the computational task for context:
>The antenna I am modeling is 9 patches, in a square grid, the middle one of 
>which is excited.
>Basically, it breaks the antenna into a whole bunch of segments (1981 of 
>them in my model), calculates the interactions between them (i.e. if you 
>have a voltage in segment i, what current does that induce in segment j) 
>making a square matrix some 2000x2000.  It then solves for the currents 
>given an excitation, giving you a vector of some 1981 currents.  The sum of 
>the radiated fields is calculated at points covering a hemisphere (8326 of 
>them in my case).
>The process is, as one might imagine, highly compute intensive (i.e. 
>there's not much disk access going on, after the 10 line file describing 
>the geometry has been read in).  The matrix math routines have been (I've 
>been told) highly optimized for the Pentium architecture.
>On a P4 1.7GHz, it uses 64MB of RAM, and takes about 21 seconds to fill the 
>matrix (that's the calculating the interactions part), and about 49 seconds 
>to calculate the currents.  Calculating the actual far field pattern takes 
>about 126 seconds. (Interestingly, I ran it on my new HP Tablet too, which 
>has a Pentium M 1GHz (nominal) processor. The first two steps were about the 
>same speed (slightly faster), but the last was much slower, and the fan was 
>merrily spinning. I suspect it slowed down because it got too hot: the speed 
>was reported as 590 MHz, not 1000 MHz.)
>OK, both of these are Windows (2000 or XP), but run times on comparable 
>Linux systems are about the same (not much OS activity going on in either 
>case).
>In any case, this is a long enough run time (3.5 minutes) that it's not 
>interactive.  It's definitely a "start it and go get coffee down the hall" 
>time span.
>This is for one frequency.  Now, say I wanted to run the model for, say, 
>100 different frequencies (which I do).  We're looking at 350 minutes, or 
>the better part of a day.
>Or, more importantly, I want to assess the effects of small changes in the 
>orientation and position of the elements (what are my construction 
>tolerances?). Maybe a Monte Carlo analysis, changing parameters using some 
>random numbers.  Can I arrange them differently and get more tolerance?
>In any case, this is a problem ripe for parallelizing. One could 
>parallelize the pattern computation (i.e. calculate the matrix once and 
>split the 8000 points among multiple processors).  Or, for the parametric 
>studies, calculate different frequencies (which requires redoing the 
>matrix, since everything is wavelength dependent) on different processors.
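
To put rough numbers on those two options, here is a back-of-the-envelope
sketch in Python. It uses only the timings quoted above (21 s fill, 49 s
solve, 126 s pattern) and assumes ideal scaling with no communication or
startup cost, which a real cluster will not deliver. The function names are
made up for illustration.

```python
import math

# Per-step timings on the P4 1.7GHz, taken from the post (seconds).
FILL_S, SOLVE_S, PATTERN_S = 21, 49, 126


def sweep_minutes(n_freqs, n_procs):
    """Frequency sweep: each processor runs whole, unmodified jobs.

    Jobs are independent, so the sweep takes ceil(n_freqs / n_procs)
    rounds of one full run each. Assumes zero scheduling overhead.
    """
    per_run_s = FILL_S + SOLVE_S + PATTERN_S
    return math.ceil(n_freqs / n_procs) * per_run_s / 60


def single_run_seconds(n_procs):
    """One frequency: only the far-field pattern step is split.

    Fill and solve stay serial; the pattern step is assumed to scale
    perfectly across n_procs (an optimistic assumption).
    """
    return FILL_S + SOLVE_S + PATTERN_S / n_procs


if __name__ == "__main__":
    print(sweep_minutes(100, 1), sweep_minutes(100, 16))
    print(single_run_seconds(1), single_run_seconds(8))
```

Under these assumptions, 16 processors bring the 100-frequency sweep from
roughly 327 minutes to about 23 (the post rounds the 196 s run up to 3.5
minutes, hence its 350). Splitting only the pattern step over 8 processors
takes a single run from 196 s to about 86 s, because the 70 s of fill and
solve stay serial.
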
>It's a design problem ripe for interaction too.  There are a lot of 
>parameters I can change (size and shape of the patches, segmentation, 
>spacing, etc.), so running a "try all possible values of all variables 
>overnight" strategy won't work.  Equally poor would be a "submit massive 
>batch job to the JPL DELL 1024 processor cluster", mostly because the 
>design space probably spans several thousand parameter combinations.  I 
>want to try a few things, then try some more, and use my experience to 
>guide the process, not depend on an optimizing program, for which I'd have 
>to come up with a goal function that is sort of ill-defined.
>Oh yeah, what I REALLY want to do is simulate an antenna with several 
>hundred patches, not just 9.
>What I DON'T want to do is rewrite (or even recompile) the antenna modeling 
>code. It works, it's been validated, it's been optimized (to a certain 
>extent), and besides, my job is to use the code, not to rewrite it for 
>parallel computing.

You know, I can get very sad reading that.

I worked really hard for 1.5 years (for several months, 7 days a week, from
9 AM to 11 PM or even later) to get a hard-to-parallelize algorithm to work
on a 512-processor SGI Origin 3800, without being able to test on the
machine.

If you can get system time on a 1024-processor machine, how many CPU hours
is it for? That means the organisation in question is spending tens of
thousands of dollars of system time on you, and probably even more on the
salaries of the people who look after the machine.

You aren't even prepared to do the hard work to make the program run more
efficiently within the system time given?

>And yes, there are approximations, better modeling codes, etc. 
>available.  But again, I'd like to avoid having to track them down, 
>validate them, and so forth. I want to run my tried and true (but slow) 
>code, faster.
>I suspect that I am not alone.  There are probably hundreds of people who 
>have similar kinds of problems, and would be well served by a desktop or 
>personal supercomputer.
>Flame On!!

If you are not prepared to modify the software, 
then basically I'm missing the point of the problem presented.

Any way to run it more efficiently involves reprogramming the software.

Matrix-type work parallelizes very well.
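
As a small illustration of how the far-field step splits up, the following
Python sketch divides the observation points into near-equal chunks and
evaluates each chunk independently. The field sum is a toy phase sum over a
line of segments, not NEC4's actual kernel, and split_evenly, field_chunk,
and far_field are hypothetical names. It runs serially with the built-in map
by default; the assumption is that a process pool's map (for example
concurrent.futures.ProcessPoolExecutor.map) could be passed in instead.

```python
import cmath
import math


def split_evenly(n_items, n_workers):
    """Return (start, stop) ranges covering 0..n_items.

    Chunk sizes differ by at most one item.
    """
    base, extra = divmod(n_items, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def field_chunk(args):
    """Toy far-field sum for one chunk of observation angles.

    Each point needs only the (already solved) currents, so chunks do
    not depend on each other. This is NOT the NEC4 formula.
    """
    currents, positions, k, angles = args
    return [
        sum(i * cmath.exp(1j * k * x * math.cos(t))
            for i, x in zip(currents, positions))
        for t in angles
    ]


def far_field(currents, positions, k, angles, n_workers, pool_map=map):
    """Evaluate all angles by farming chunks out through pool_map."""
    chunks = [
        (currents, positions, k, angles[a:b])
        for a, b in split_evenly(len(angles), n_workers)
    ]
    out = []
    for part in pool_map(field_chunk, chunks):
        out.extend(part)  # map preserves chunk order
    return out


if __name__ == "__main__":
    # 8326 points over 4 workers, as in the model described above.
    print(split_evenly(8326, 4))
```

The currents vector is computed once and handed to every worker, which is
why this step is the easy one to split; the matrix fill and solve need more
care.
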

>James Lux, P.E.
>Spacecraft Radio Frequency Subsystems Group
>Flight Communications Systems Section
>Jet Propulsion Laboratory, Mail Stop 161-213
>4800 Oak Grove Drive
>Pasadena CA 91109
>tel: (818)354-2075
>fax: (818)393-6875