<p dir="ltr">Thanks, informative :p<br>

I'll consider your advice.</p>

<p dir="ltr">If I read correctly, the answer to the programming question was: yes, a program must be written specifically to run on a cluster. Did I get that right?</p>

<div class="gmail_quote">On 2012-11-4 at 6:11 AM, "Mark Hahn" &lt;<a href="mailto:hahn@mcmaster.ca">hahn@mcmaster.ca</a>&gt; wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I am currently researching the feasibility and process of establishing a<br>

relatively small HPC cluster to speed up the processing of large amounts of<br>

digital images.<br>

</blockquote>

<br>

do you mean that smallness is a goal?  or that you don't have a large budget?<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

After looking at a few HPC computing software solutions listed on the<br>

Wikipedia comparison of cluster software page (<br>

<a href="http://en.wikipedia.org/wiki/Comparison_of_cluster_software" target="_blank">http://en.wikipedia.org/wiki/Comparison_of_cluster_software</a> ) I still have<br>

only a rough understanding of how the whole system works.<br>

</blockquote>

<br>

there are several discrete functionalities:<br>

- shared filesystem (if any)<br>

- scheduling<br>

- intra-job communication (if any; e.g. MPI)<br>

- management/provisioning/monitoring of nodes<br>

<br>

IMO, anyone who claims to have "best practices" in this field is lying.<br>

there are particular components that have certain strengths, but none of them are great, and none is universally appropriate.  (it's also common<br>

to conflate or "integrate" the second and fourth items - for that matter,<br>

monitoring is often separated from provisioning.)<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

1. Do programs you wish to use via HPC platforms need to be written to<br>

support HPC, and further, to support specific middleware using parallel<br>

programming or something like that?<br>

</blockquote>

<br>

"middleware" is generally a term from the enterprise computing environment.<br>

it basically means "get someone else to take responsibility for hard bits",<br>

and is a form of the classic commercial best practice of CYA.  from an HPC<br>

perspective, there's the application and everything else.  if you really<br>

want, you can call the latter "middleware", but doing so is uninformative.<br>

<br>

HPC covers a lot of ground.  usually, people mean jobs that execute in a batch environment (started from a command line or script).  OTOH HPC sometimes<br>

means what you might call "personal supercomputing", where an interactive application runs on a usually-dedicated cluster (shared clusters tend to have scheduling response times that make interactive use problematic.)<br>


(shared clusters also provide the single most important benefit of<br>

clusters: they can interleave bursty demand.  if everyone in your department shares a cluster, it can be larger than any one group could afford alone, so every group will be able to burst to higher capacity.<br>

this is why large, shared clusters are so successful.  and, for that matter,<br>

why cloud services are successful.)<br>

<br>

you can do HPC with very little overhead.  you will generally want a shared<br>

filesystem - potentially just a NAS box or existing server.  you may not<br>

bother with scheduling at all - let users pick which machine to run on,<br>

for instance.  that sounds crazy, but if you're the only one using it, why<br>

bother with a scheduler?  HPC can also be done without intra-job<br>

communication - if your jobs are single-node serial or threaded, for<br>

instance.  and you may not need any sort of management/provisioning,<br>

depending on the stability of your nodes, environment, expected lifetime,<br>

etc.<br>

<br>

in short, slap Linux onto a few boxes, set up ssh keys or host-based<br>

trust, have one or more of them NFS-export some space, and you're cooking.<br>
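
As an illustration of the no-scheduler approach, here is a minimal sketch (Python 3.8+) that splits a list of images into one contiguous shard per node and builds an ssh command for each. It assumes ssh keys are already set up and that every node sees the images at the same NFS path; the hostnames, the /shared/images path and the process_images program are hypothetical placeholders, and the script only prints the commands.<br>

```python
import shlex
from typing import Dict, List


def shard(items: List[str], n: int) -> List[List[str]]:
    """Split items into n contiguous shards whose sizes differ by at most one."""
    if n <= 0:
        raise ValueError("need at least one shard")
    base, extra = divmod(len(items), n)
    shards, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)  # the first `extra` shards get one more item
        shards.append(items[start:start + size])
        start += size
    return shards


def build_commands(hosts: List[str], images: List[str],
                   program: str = "process_images") -> Dict[str, List[str]]:
    """Map each host to an ssh argv that runs `program` on that host's shard.

    `program` is a hypothetical serial executable taking image paths as
    arguments; the paths must be valid on every node (e.g. an NFS mount).
    """
    commands = {}
    for host, files in zip(hosts, shard(images, len(hosts))):
        if not files:
            continue  # more hosts than images: leave this host idle
        # ssh hands the remote command to the remote shell, so quote it once here.
        commands[host] = ["ssh", host, shlex.join([program, *files])]
    return commands


if __name__ == "__main__":
    hosts = ["node1", "node2", "node3"]  # hypothetical hostnames
    images = [f"/shared/images/img{i:04d}.tif" for i in range(10)]  # hypothetical paths
    for host, argv in build_commands(hosts, images).items():
        print(shlex.join(argv))  # print only; nothing is executed
```

To actually launch the jobs you could hand each argv list to subprocess.Popen and wait on all of them; that is still just ssh, with you acting as the scheduler.<br>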

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

OR<br>

Can you run any program on top of the HPC cluster and have its workload<br>

effectively distributed? --> How can this be done?<br>

</blockquote>

<br>

this is a common newbie question.  a naive program (probably serial or perhaps<br>

multithreaded) will see no benefit from a cluster.  clusters are just plain<br>

old machines.  the benefit comes if you want throughput (jobs per unit time, i.e. many independent jobs running at once) or if you specifically program for distributed computation (classically with MPI).<br>

it's common to use InfiniBand to accelerate the latter kind of job (as well as to provide the fastest possible IO.)<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

2. For something like digital image processing, where a huge number of<br>

relatively large images (14MB each) are being processed, will network<br>

</blockquote>

<br>

the main question is how much work a node will be doing per image.<br>

<br>

suppose you had an infinitely fast fileserver and gigabit-connected nodes:<br>

transferring one image would take roughly 110-150ms (14 MB is 112 Mb, i.e. 112ms at full line rate before protocol overhead), so ideally you would spend about the same amount of time processing each image.  but in this case, you should probably ask whether you can simply store the images on the nodes in the first place.  if you haven't thought about where the inputs are and how fast they<br>


can be fetched, then that will probably be your bottleneck.<br>
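
The per-image transfer time is simple arithmetic. A minimal sketch, assuming 14 MB means 14×10^6 bytes and treating protocol overhead as a single efficiency factor (the 0.9 used below is an illustrative guess, not a measurement):<br>

```python
def transfer_seconds(size_bytes: float, link_bits_per_s: float = 1e9,
                     efficiency: float = 1.0) -> float:
    """Seconds to move one file over one link, ignoring latency and server limits.

    `efficiency` is a crude stand-in for protocol overhead (1.0 = full line rate).
    """
    return size_bytes * 8 / (link_bits_per_s * efficiency)


IMAGE_BYTES = 14e6  # one 14 MB image

if __name__ == "__main__":
    print(f"{transfer_seconds(IMAGE_BYTES) * 1000:.0f} ms at full gigabit line rate")  # 112 ms
    print(f"{transfer_seconds(IMAGE_BYTES, efficiency=0.9) * 1000:.0f} ms at 90% efficiency")  # 124 ms
```

If processing one image takes much less time than this, the network (or the fileserver behind it) is the limit, which is why keeping the images on the nodes is worth considering.<br>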

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

speed, or processing power be more of a limiting factor? Or would a gigabit<br>

network suffice?<br>

</blockquote>

<br>

how long does a prospective node take to complete one work unit,<br>

and how long does it take to transfer the files for one?<br>

your speedup will be limited by whatever resource saturates first<br>

(possibly your fileserver.)<br>
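
The "whatever saturates first" rule can be written down as a toy bound. This sketch assumes each node fetches an image and then processes it (no overlap of transfer and compute), and that all nodes share one fileserver with a fixed total bandwidth; the 1 s of compute per image and the ~110 MB/s server figure are hypothetical numbers for illustration only.<br>

```python
def max_images_per_second(nodes: int, compute_s: float, transfer_s: float,
                          server_bytes_per_s: float, image_bytes: float) -> float:
    """Upper bound on cluster throughput: the smaller of the node limit and the server limit."""
    # Each node finishes one image every (transfer + compute) seconds.
    node_limit = nodes / (compute_s + transfer_s)
    # The shared fileserver can ship at most this many images per second in total.
    server_limit = server_bytes_per_s / image_bytes
    return min(node_limit, server_limit)


if __name__ == "__main__":
    image_bytes = 14e6          # 14 MB images
    compute_s = 1.0             # hypothetical processing time per image
    transfer_s = 0.112          # gigabit transfer time at full line rate
    server_bytes_per_s = 110e6  # hypothetical fileserver limited to about one gigabit link
    single = max_images_per_second(1, compute_s, transfer_s, server_bytes_per_s, image_bytes)
    for n in (1, 4, 8, 16, 32):
        rate = max_images_per_second(n, compute_s, transfer_s, server_bytes_per_s, image_bytes)
        print(f"{n:2d} nodes: {rate:5.2f} images/s, speedup {rate / single:4.1f}x")
```

With these made-up numbers the fileserver caps the cluster at about 7.9 images/s, so past roughly 9 nodes extra nodes buy nothing; measuring your real per-image compute and transfer times is what tells you where that knee is.<br>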

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

3. What would you recommend as a relatively easy HPC platform?<br>

</blockquote>

<br>

they are all crap.  you should try not to spend on crap you don't need,<br>

but ultimately it depends on how much expertise you have and/or how much<br>

you value your time.  any idiot can build a cluster from scratch using fundamental open-source components, eventually.  but if said idiot has to learn filesystems, scheduling, provisioning, etc. from scratch, it could<br>


take quite a while.  when you buy, you are buying crap, but it's crap<br>

that may save you some time.<br>

<br>

don't count on commercial support being more than crappy.<br>

<br>

you should probably consider using a cloud service - this is just commercial<br>

outsourcing - more crap, but perhaps of value if, for instance, you don't want to get your hands dirty hosting machines (e.g. Amazon).<br>

<br>

anything commercial in this space tends to be expensive.  the license for a crappy scheduler covering a few hundred nodes, for instance, will cost pretty<br>

close to an FTE-year.  renting a node from a cloud provider for a year costs<br>

about as much as buying a new node each year, etc.<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Again, I hope this is an ok place to ask such a question, if not please<br>

</blockquote>

<br>

this is the place, though there are some fringe sects of HPC that tend to subsist on more and/or different crap (such as clusters running Windows.)<br>

Beowulf tends towards the low-crap end of things (Linux, open packages.)<br>

<br>

regards, mark hahn.<br>

</blockquote></div>