[Beowulf] Why one might want a bunch o' processors under your desk.

Tue May 10 05:42:39 PDT 2005

At 05:34 PM 5/9/2005 -0700, Jim Lux wrote:
>At 01:40 PM 5/9/2005, Vincent Diepeveen wrote:
>>At 05:49 PM 5/6/2005 -0700, Jim Lux wrote:
>> >Today I was running a lot of antenna models, using a method of moments
>> >code called NEC4 (in FORTRAN).
>> >Just to describe the computational task for context:
>> >
>> >The antenna I am modeling is 9 patches, in a square grid, the middle
>> >one of which is excited.
>> >
>> >
>> >
>> >What I DON'T want to do is rewrite (or even recompile) the antenna
>> >modeling code. It works, it's been validated, it's been optimized (to a
>> >certain extent), and besides, my job is to use the code, not to rewrite
>> >it for parallel computing.
>>
>>You know, i can get very sad reading that.
>>
>>I worked for 1.5 years real hard (i have worked several months, 7 days a
>>week, from 9 AM to 11 PM or even later) to get a hard-to-parallelize
>>algorithm to work on a 512 processor SGI Origin 3800, without being able to
>>test on the machine.
>>
>>If you can get system time on a 1024 processor machine, how many cpu
>>hours is it? That means that the organisation in question is spending
>>tens of thousands of dollars of system time on you, and probably even more
>>on the salaries of the organisations guarding the machine.
>>
>>You aren't even prepared to do hard work to let the program run more
>>efficiently within the system time given?
>>
>> >And yes, there are approximations, better modeling codes, etc.
>> >available.  But again, I'd like to avoid having to track them down,
>> >validate them, and so forth. I want to run my tried and true (but slow)
>> >code, faster.
>> >
>> >I suspect that I am not alone.  There are probably hundreds of people who
>> >have similar kinds of problems, and would be well served by a desktop or
>> >personal supercomputer.
>> >
>> >Flame On!!
>>
>>If you are not prepared to modify the software,
>>then basically i'm missing the point of the problem presented.
>>
>>Any way to run it more efficiently involves re-programming the software.
>>
>>Matrix-type stuff can be parallelized very well.
>
>Actually, this describes the basic problem in the high performance
>computing area very well. The people who have jobs that "need" HPC don't
>have the skills or time or resources to modify their code to use some 
>particular computational resource.
>
>So you have a resource (a very high performance computational system) that 
>goes begging looking for work, because there's some other "non-free" 
>resource needed to effectively use it (that is, skilled software people). I 
>should point out that JPL's 1024 processor Dell Xeon cluster is actually
>heavily used, as are the Cray and the SGI machines, so my comments are of a 
>general nature.
>
>And, yes, the organization IS paying hundreds of thousands of dollars to 
>provide a shared resource, just as it pays for the buildings, the library, 
>and so forth.  And, none of these resources are "free", even if they come 
>as part of the institutional overhead.
>
>But, at some point, you have to decide whether to allocate your resources 
>to developing software, or working on your particular problem, for which 
>the software is merely a tool.  You do a cost benefit analysis: do I spend 
>a work month of time on parallelizing some code, so that the remaining 4 
>months worth of work takes only 2 months? Or, do I just soldier on with the 
>old slow code, and adapt my working style to making overnight runs.
>
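
To put numbers on that trade-off, here is a minimal sketch in Python,
assuming the figures quoted above (one work month of porting, four months
of remaining runs, a 2x speedup). The function and parameter names are
illustrative only and do not come from any tool mentioned in this thread.

```python
def total_months(remaining_work, port_effort=0.0, speedup=1.0):
    """Calendar months to finish: porting effort plus the sped-up remaining work."""
    return port_effort + remaining_work / speedup


def break_even_effort(remaining_work, speedup):
    """Largest porting effort (in months) that still does not lose time overall."""
    return remaining_work * (1.0 - 1.0 / speedup)


# Figures from the paragraph above (assumed to be exact for illustration).
keep_old_code = total_months(4)                          # soldier on: 4 months
parallelize = total_months(4, port_effort=1, speedup=2)  # 1 + 4/2 = 3 months

print(keep_old_code, parallelize)   # 4.0 3.0
print(break_even_effort(4, 2))      # 2.0
```

With these numbers the port pays off by one month, but only as long as it
takes less than two months; any wait for a developer counts against that
budget too.
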
>Then, there's also the situation that even if you DID have the money, you 
>might not have the people resources. It's very difficult to "buy" a few 
>weeks' time of a skilled developer. If they're skilled, they're probably 
>busy and fully subscribed. If I have to wait a month for them to fit me 
>into their schedule, I might as well have been running the old slow code, 
>and be partway to my end point.
>
>And then there's the granularity of purchase problem.  If the 10 skilled
>developers are already fully occupied, my little one work month increment 
>of work would require hiring a whole additional person, which my little 
>research task could not afford.
>
>Add to this the fact that for most codes, it would probably take many many 
>work months to significantly improve and modify them. It's a full time job 
>in itself. And that's assuming that you have sufficient visibility into the 
>code to do it.  What if you're stuck with a tool that is ONLY available as 
>a compiled program (and such things are not particularly 
>uncommon)?  Imagine trying to modify OpenOffice to use Base 9, instead of
>Base 10.  Sure, the source is available, and the actual change might be 
>quite simple, once you knew where to change it.  The problem is that it 
>would probably take you a year to find the 4 or 5 essential routines, and 
>to make sure that everything still worked after you were done.
>
>
>So... the trick is to find a way to make cluster (or super) computing 
>usable in a transparent fashion?  This is one reason why people buy 
>mainframes, after all.  You can run the same old code, faster. It's the 
>original concept that Cray had.  Run your unchanged FORTRAN program, a LOT 
>faster.  It's the original concept behind a system I worked on back in the 
>80s, where the idea was to build an 80286 emulator out of fast ECL, so that
>IBM PC software could be run lots faster.  Not particularly clever, but 
>still, elegant in a kind of perverse way.
>
>If the reconfiguration extends to maybe an hour or two of setting up 
>(because that's essentially what it takes to install a new software 
>package), you'll find that people are willing to do it.  But if it takes 
>weeks and weeks, you'll not get many takers.
>
>It's not laziness, nor a lack of desire, just a lack of appropriate resources.

Honestly, if you ask me, the only reason it happens is because the
government pays the bill and not you.
