<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <br>

    <div class="moz-cite-prefix">On 4/5/2013 9:43 AM, Lux, Jim (337C)

      wrote:<br>

    </div>

    <blockquote cite="mid:CD842119.2F4FD%25james.p.lux@jpl.nasa.gov"

      type="cite">

      <pre wrap="">I would think that the problem is more that you can easily stamp out

another 1000 processors than another 10 software developers.  HPC

developers are the scarce commodity, and just throwing money at it doesn't

solve the problem.</pre>

    </blockquote>

    <br>

      I'm coming from an academic perspective, but to me it seems that

    HPC developers are the scarce commodity not because they're hard to

    find (though there's some truth in that, especially for ones with

    broad experience), but because they're hard to categorize and the

    benefits of a developer are hard to quantify.   A lot of people

    outside this relatively small Beowulf community have an overly

    simplistic 'model' of how computing works -- hardware works or it

    doesn't and if it does, a scientist writes a code and gets results. 

    It's like pressing a button, basically.  Nobody thinks about

    pressing a button <i>well</i>, or pressing it <i>efficiently</i>,

    or even ensuring that whatever happens when it's pressed happens <i>fast</i>. 

    It's just assumed that it happens as fast as it can, provided the

    button is 'working'.<br>

    <br>

      In this view, buying X more nodes makes a lot more sense than

    hiring a developer.  There are never enough cycles to begin with, so

    all the developer does is help make sure the system or code

    'works'.  And if the system isn't working, well, that's what your

    systems people are for.  If the code isn't working?  Well, some

    lucky grad student will be up all night and day trying to fix it. 

    Sometimes for weeks.  Months.  Even years.  And, all this time, the

    'scientific throughput' on the whole is probably up on the system,

    due to the additional nodes, even without those few applications

    that aren't working.<br>

    <br>

      However, once you look at the <i>details</i>, the scenario
    often changes: why, yes, those extra nodes are being used all

    the time, running an atmospheric model 24/7, on a thousand

    processors.  Except, whoops, it's using serialized I/O to your

    underlying parallel file system - so while it's 'working', you could

    be running 4-5x faster with some changes to a library and a few

    changed settings.   Is this something the scientist should know?  We

    can debate that; I'm on the side of 'no, they should focus on their

    science', but there are points to be made on each side.  The fact

    is, though, that they often <i>don't</i> notice it, whereas

    someone whose focus is not on the scientific output but on the

    computational methods and techniques would do so quickly.   And,

    like that, the equation changes - in the parallel universe where you

    <i>hired</i> this person, you've now saved Y node-hours of

    computation, with a fairly small time-slice of your expert.  How

    does this savings compare to the new nodes you'd buy?  That depends

    on the scale of the operation, but we can run some numbers in a

    moment.<br>

    <br>

    <blockquote cite="mid:CD842119.2F4FD%25james.p.lux@jpl.nasa.gov"

      type="cite">

      <pre wrap="">That is why the real challenge for HPC is in developing smart compilers

and tools that make it easier to do HPC, even at the cost of needing more

computational resources.</pre>

    </blockquote>

    <br>

      I'm absolutely in favor of smart(er) compilers and tools, but as
    with any tool, I think an expert will wield them with much greater
    skill, experience and insight than a novice.  Give a chef a knife and some

    vegetables and watch as they're turned into perfectly cut pieces. 

    Give me the same knife and set of vegetables and half would end up

    on the floor, I'd probably need a few band-aids, and we'd likely be

    ordering take-out.   Even 'simple' tools like compiler-enabled
    profiling typically present a user with a lot of information that
    they're uncertain how to sift through at first, whereas an expert
    will know exactly what to look for, how to use it, and how to get
    results quickly.<br>

    <br>

      (It's occurred to me we might be using 'HPC developer' in very
    different ways.  For a dedicated person <i>developing</i> the
    majority of a model, some of this might not apply.  For a
    'specialist' who works with other scientists, letting them focus on
    their science while the specialist focuses on the code, calculations
    and systems, I think it applies pretty well.)<br>

    <br>

    <blockquote cite="mid:CD842119.2F4FD%25james.p.lux@jpl.nasa.gov"

      type="cite">

      <pre wrap="">Just back of the enveloping here..

Let's say a "node" costs about $3k plus about $1500 in operating costs

over 3 years. Make it a round $5000 all told. A developer costs about

$250-300k, fully burdened for a year. So I can make the choice.. Buy

another 150 nodes or pay for the developer to make the processing more

efficient.  Of course, if I buy the nodes, I get the faster computation

today. If I buy the developer, I have to wait some time for my faster

solution.</pre>

    </blockquote>

    <br>

      I haven't looked at hardware prices in a while, but I'd argue for
    slightly higher hardware costs (IB, more storage, etc.), covering
    not just the node but also ports for the switches and extra disks or
    Lustre servers; let's say $6K all around, including operating costs.
    I admittedly don't know much about employee costs, but let's imagine
    a mid-level HPC specialist at a university having a salary of $75K.
    The fringe rate at a few places I just checked hovers around 33-36%
    or so for this type of employee, so ignoring things like office
    space costs, power, incidentals, etc., and just going with the cost
    to the university as 'salary * (1.0 + fringe rate)', we come up with
    roughly $100K.  Let's bump it up to $120K just because.<br>
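    <br>
      As a rough sketch of that arithmetic in Python (the function and
    variable names are purely illustrative, and the inputs are just the
    guesses above, not real budget figures):<br>
<pre>
```python
def burdened_cost(salary, fringe_rate):
    """Yearly cost to the employer: salary * (1.0 + fringe rate)."""
    return salary * (1.0 + fringe_rate)


# Guesses from the text: $75K salary, fringe rate somewhere in 33-36%.
low = burdened_cost(75_000, 0.33)
high = burdened_cost(75_000, 0.36)

# Padded yearly figure, per-node cost, and node lifetime, also from the text.
padded_yearly = 120_000
node_cost = 6_000
years = 4

total_staff_cost = padded_yearly * years
equivalent_nodes = total_staff_cost // node_cost

print(f"burdened cost: ${low:,.0f} to ${high:,.0f} per year")
print(f"{years} years of staff: ${total_staff_cost:,} = {equivalent_nodes} nodes")
```
</pre>
      With those inputs the burdened cost lands between $99,750 and
    $102,000 a year, and four years at the padded $120K is $480K, the
    price of 80 nodes at $6K each.<br>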

    <br>

      For simplicity let's keep the salary static over four years (chosen
    as the life of the nodes), and now we're comparing $480K for an
    employee against 80 nodes, and asking which delivers better usage.
    The extreme

    case of having lots of nodes is, in my opinion, pretty easy.  If I

    have 2000 nodes now, adding 80 gives me a 4% improvement in my

    capabilities, and I'll quite happily challenge anyone who thinks I

    can't improve either their codes, workflow or system by more than

    4%.  What if I have only 200 nodes, though?  Then the extra 80 gives

    me a 40% increase in job throughput.  That <i>sounds</i> pretty

    good, and in reality it might be if your codes are well-behaved,

    production-quality codes that are tuned, your scientists know how to

    modify them for new experiments, and all you really need is 40% more
    cycles rather than expertise.  I've yet to meet a scientific
    department that meets that description, though.  I've seen people
    running N^2 algorithms when N-log-N methods exist, people using NFS
    as local scratch for large temporary files, and people managing
    literally thousands of files <i>by hand</i> for a parameter-sweep MC
    code because scripting isn't something they're familiar with.  In
    cases like these, a decent level of expertise can <i>often</i> yield
    a 2-3x improvement in usage, and <i>sometimes</i> much more, though
    the really high cases (1000x+) are pretty rare.  Still, with a 2-3x
    factor being common, even if the 'average' gain when normalized
    across all your resources is a mere 1.5x, that still beats the 1.4x
    from the extra nodes.  And we're just talking about savings in CPU time, not

    savings in scientist time.<br>
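    <br>
      On the parameter-sweep point: generating those input files takes
    only a few lines of scripting.  A minimal sketch in Python, where
    the file naming scheme, the input format, and the two parameters
    (temperature and seed) are all made up for illustration and don't
    come from any particular MC code:<br>
<pre>
```python
import itertools
import tempfile
from pathlib import Path


def write_sweep_inputs(out_dir, temperatures, seeds):
    """Write one small input file per (temperature, seed) combination."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for temperature, seed in itertools.product(temperatures, seeds):
        # Hypothetical naming scheme: parameters are encoded in the file name.
        path = out_dir / f"run_T{temperature:.2f}_seed{seed:04d}.in"
        path.write_text(f"temperature = {temperature}\nseed = {seed}\n")
        paths.append(path)
    return paths


# Use a throwaway directory so the sketch cleans up after itself.
with tempfile.TemporaryDirectory() as scratch:
    inputs = write_sweep_inputs(scratch, [0.5, 1.0, 1.5, 2.0], range(250))
    print(f"generated {len(inputs)} input files")
```
</pre>
      Four temperatures times 250 seeds is a thousand files, which is
    exactly the sort of thing nobody should be doing by hand.<br>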

    <br>

      Of course, then you have the extreme case of very <i>few</i>

    nodes - if you've only got twenty, and are looking at either 80 more

    or a person, well, it's going to depend heavily on what's being

    done, and I'd typically lean towards the nodes if I had only that

    binary choice.  If I could be creative, though, I'd aim to

    coordinate with multiple departments, buy maybe 40-60 nodes, and

    hire a person.  Besides, that expertise might help you take

    advantage of off-site resources like XSEDE even if you have zero

    nodes locally.<br>
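    <br>
      To put the three cluster sizes above side by side, here's a small
    Python sketch.  It assumes throughput scales linearly with node
    count (which flatters the hardware option) and takes the 1.5x
    'average' specialist gain as a given; both are simplifications, and
    the function names are made up:<br>
<pre>
```python
def throughput_gain_from_nodes(current_nodes, added_nodes):
    """Throughput multiplier from extra nodes, assuming linear scaling."""
    return (current_nodes + added_nodes) / current_nodes


def better_investment(current_nodes, added_nodes, specialist_gain):
    """Name whichever option gives the larger throughput multiplier."""
    nodes_gain = throughput_gain_from_nodes(current_nodes, added_nodes)
    return "specialist" if specialist_gain > nodes_gain else "nodes"


ADDED_NODES = 80        # what $480K buys at $6K per node
SPECIALIST_GAIN = 1.5   # the 'average' gain assumed in the text

for size in (2000, 200, 20):
    gain = throughput_gain_from_nodes(size, ADDED_NODES)
    choice = better_investment(size, ADDED_NODES, SPECIALIST_GAIN)
    print(f"{size:>5} nodes: +{ADDED_NODES} nodes = {gain:.2f}x -> {choice}")
```
</pre>
      At 2000 and 200 nodes the extra hardware gives 1.04x and 1.4x, so
    the specialist comes out ahead; at 20 nodes the hardware gives 5x
    and wins easily, which is why I'd lean towards the nodes there.<br>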

    <br>

    <blockquote cite="mid:CD842119.2F4FD%25james.p.lux@jpl.nasa.gov"

      type="cite">

      <pre wrap="">If I buy the developer, I have to wait some time for my faster solution.

</pre>

    </blockquote>

    <br>

      Yes, if we're talking about developing entirely new methods.  But

    there's a <i>ton</i> of low-hanging fruit that exists in the mix of

    systems tuning, compiler options, <i>basic</i> code changes (not

    anything deep or time-consuming), etc., that takes hours or, at

    most, a few days, and can have massive impacts.  The serial I/O
    -> parallel I/O case way (way, way) above is one example.
    As another, I can't tell you the number of times I've seen
    people doing production runs with '-O0 -g' as their compilation

    flags.  Or, not using tuned BLAS / LAPACK libraries.  Or running an

    OpenMP-enabled code with 1 process per node but having

    OMP_NUM_THREADS set to 1 instead of 8.  Or countless other things.<br>
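    <br>
      Checks like these are cheap enough to script.  A minimal sketch in
    Python, assuming you can get hold of the compile command and the
    job's environment; the function name, the example command, and
    'model.f90' are hypothetical, and it only handles OMP_NUM_THREADS
    given as a single integer:<br>
<pre>
```python
import shlex


def audit_run(compile_command, env, cores_per_node):
    """Return findings about common, cheap-to-fix performance problems."""
    findings = []

    # Production builds compiled with optimization explicitly disabled.
    flags = shlex.split(compile_command)
    if "-O0" in flags:
        findings.append("compiled with -O0: optimization is disabled")

    # One process per node, but fewer OpenMP threads than cores.
    # Assumption: the setting is a single integer, not a nested list.
    setting = env.get("OMP_NUM_THREADS")
    if setting is not None and setting.isdigit():
        threads = int(setting)
        if cores_per_node > threads:
            findings.append(
                f"OMP_NUM_THREADS={threads} on a {cores_per_node}-core node"
            )
    return findings


example = audit_run(
    "gfortran -O0 -g model.f90 -o model",  # hypothetical build line
    {"OMP_NUM_THREADS": "1"},
    cores_per_node=8,
)
for finding in example:
    print("warning:", finding)
```
</pre>
      This flags both problems for the example run and returns an empty
    list for something like '-O2' with eight threads on an 8-core node.
    It's no substitute for someone actually looking at the job, but it
    shows how little effort the first pass takes.<br>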

    <br>

      So that's my lengthy two cents in defense of why it's <i>very

      often</i> favorable to hire HPC specialists over more hardware -

    the gains are certainly much harder to quantify, and clearly more
    variable depending upon the various projects and applications in
    use, but in my experience (and in our environment) the gains in
    terms of making scientists' jobs easier and less time-consuming, as
    well as the savings in CPU hours, more than make it worth it.

    Plus, in those rare moments we're not working, the HPC developers

    can contribute to discussions on the Beowulf list.  Surely that's

    something we can all agree we need more of!  :)<br>

    <br>

      Cheers,<br>

      - Brian<br>

    <br>

    <br>

    (PS.  One thing I cleverly avoided touching upon way back at the top

    is that while a skilled computational person can indeed ensure that

    calculations are more efficient, fast and working well, they can <i>also</i>

    make sure they're working <i>correctly</i>.  That's a can of worms

    that's a different discussion!  Getting results /= getting <i>correct</i>

    results. )<br>

    <br>

    (PPS.  Sorry for the length!)<br>

  </body>

</html>