[Beowulf] Digital Image Processing via HPC/Cluster/Beowulf - Basics

Sat Nov 3 15:47:53 PDT 2012

Thanks, informative :p
I'll consider your advice.

If I read correctly, it seems the answer to the question about programming
was: yes, a program must be written to accommodate a cluster. Did I get you
right?
On 2012-11-4 at 6:11 AM, "Mark Hahn" <hahn at mcmaster.ca> wrote:

>> I am currently researching the feasibility and process of establishing a
>> relatively small HPC cluster to speed up the processing of large amounts
>> of digital images.
>>
>
> do you mean that smallness is a goal?  or that you don't have a large
> budget?
>
>> After looking at a few HPC computing software solutions listed on the
>> Wikipedia comparison of cluster software page
>> (http://en.wikipedia.org/wiki/Comparison_of_cluster_software) I still have
>> only a rough understanding of how the whole system works.
>>
>
> there are several discrete functionalities:
> - shared filesystem (if any)
> - scheduling
> - intra-job communication (if any; eg MPI)
> - management/provisioning/monitoring of nodes
>
> IMO, anyone who claims to have "best practices" in this field is lying.
> there are particular components that have certain strengths, but none of
> them are great, and none universally appropriate.  (it's also common
> to conflate or "integrate" the second and fourth items - for that matter,
> monitoring is often separated from provisioning.)
>
>> 1. Do programs you wish to use via HPC platforms need to be written to
>> support HPC, and further, to support specific middleware using parallel
>> programming or something like that?
>>
>
> "middleware" is generally a term from the enterprise computing environment.
> it basically means "get someone else to take responsibility for hard bits",
> and is a form of the classic commercial best practice of CYA.  from an HPC
> perspective, there's the application and everything else.  if you really
> want, you can call the latter "middleware", but doing so is uninformative.
>
> HPC covers a lot of ground.  usually, people mean jobs will execute in a
> batch environment (started from a commandline/script).  OTOH HPC sometimes
> means what you might call "personal supercomputing", where an interactive
> application runs in a usually-dedicated cluster (shared clusters tend to
> have scheduling response times that make interactive use problematic.)
> (shared clusters also give rise to the single most important value of
> clusters: that they can interleave bursty demand.  if everyone in your
> department shares a cluster, it can be larger than any one group can
> afford, and therefore all groups will be able to burst to higher capacity.
> this is why large, shared clusters are so successful.  and, for that
> matter,
> why cloud services are successful.)
>
> you can do HPC with very little overhead.  you will generally want a shared
> filesystem - potentially just a NAS box or existing server.  you may not
> bother with scheduling at all - let users pick which machine to run on,
> for instance.  that sounds crazy, but if you're the only one using it, why
> bother with a scheduler?  HPC can also be done without intra-job
> communication - if your jobs are single-node serial or threaded, for
> instance.  and you may not need any sort of management/provisioning,
> depending on the stability of your nodes, environment, expected lifetime,
> etc.
>
> in short: slap linux onto a few boxes, set up ssh keys or hostbased
> trust, have one or more of them NFS out some space, and you're cooking.
>
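A minimal sketch of the scheduler-less setup described above, assuming passwordless ssh already works and the program and images sit on the NFS-shared directory. The hostnames, the /shared paths and the process_image program are all hypothetical; the helper only pairs jobs with hosts round-robin and builds the ssh command lines.

```python
import itertools
import shlex
import subprocess


def build_ssh_commands(hosts, jobs):
    """Pair each job (a shell command string) with a host, round-robin."""
    return [
        ["ssh", "-o", "BatchMode=yes", host, job]
        for host, job in zip(itertools.cycle(hosts), jobs)
    ]


def run_all(commands):
    """Start every command at once, then wait; returns exit codes in order."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    hosts = ["node01", "node02"]  # hypothetical hostnames
    # Hypothetical program and image paths on the NFS-shared directory.
    jobs = [
        "/shared/bin/process_image " + shlex.quote(f"/shared/images/img{i:04d}.tif")
        for i in range(4)
    ]
    # Only print the commands here; pass them to run_all() to execute them.
    for cmd in build_ssh_commands(hosts, jobs):
        print(" ".join(cmd))
```

Note that run_all() launches everything at once, so with many more jobs than nodes it would oversubscribe the machines. That is roughly the point at which a real scheduler starts to earn its keep.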
>> OR
>> Can you run any program on top of the HPC cluster and have its workload
>> effectively distributed? --> How can this be done?
>>
>
> this is a common newbie question.  a naive program (probably serial or
> perhaps multithreaded) will see no benefit from a cluster.  clusters are
> just plain old machines.  the benefit comes if you want throughput (jobs
> per unit time) or specifically program for distributed computation
> (classically with MPI).
> it's common to use infiniband to accelerate this kind of job (as well as
> provide the fastest possible IO.)
>
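To illustrate the throughput case: the serial program itself stays unchanged, and the only "parallelism" is splitting the input list so that each node runs its own copy on a different share. This is a rough sketch; shard() and serial_program() are hypothetical names, and the uppercase conversion is only a placeholder for real image processing.

```python
def shard(items, n_workers):
    """Split items into n_workers interleaved lists (worker k gets k, k+n, ...)."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return [items[k::n_workers] for k in range(n_workers)]


def serial_program(image_names):
    """Stand-in for an unmodified serial program: handles inputs one by one."""
    return [name.upper() for name in image_names]  # placeholder "processing"


if __name__ == "__main__":
    images = [f"img{i:03d}.tif" for i in range(10)]  # hypothetical file names
    for k, part in enumerate(shard(images, 3)):
        # On a real cluster each shard would go to a different node; every
        # node still runs the same serial program, just on fewer images.
        print(f"worker {k}: {len(serial_program(part))} images")
```

No single image finishes any sooner this way. Only the number of images completed per hour goes up, which is the distinction between throughput and programming for distributed computation.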
>> 2. For something like digital image processing, where a huge number of
>> relatively large images (14MB each) are being processed, will network
>>
>
> the main question is how much work a node will be doing per image.
>
> suppose you had an infinitely fast fileserver and gigabit connected nodes:
> transferring a 14MB image would take about 110-120ms (14MB is 112Mbit, and
> gigabit moves at most 1000Mbit/s), so you would ideally spend at least
> about the same amount of time processing an image.  but in this case, you
> should probably ask whether you can simply store images on the nodes in the
> first place.  if you haven't thought about where the inputs are and how
> fast they can be gotten, then that will probably be your bottleneck.
>
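A back-of-envelope check of the per-image transfer time, using the 14MB image size and gigabit link from the question. The efficiency parameter is an assumption added here to model protocol overhead; at raw line rate the result is roughly 0.11 s per image.

```python
def transfer_seconds(size_bytes, link_bits_per_s, efficiency=1.0):
    """Time to move size_bytes over a link.

    efficiency < 1 is a rough allowance for protocol overhead (an assumption,
    not a measured value).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bits_per_s * efficiency)


IMAGE_BYTES = 14 * 10**6  # 14 MB per image, from the question
GIGABIT = 10**9           # bits per second

if __name__ == "__main__":
    t = transfer_seconds(IMAGE_BYTES, GIGABIT)
    print(f"{t * 1000:.0f} ms per image at line rate")
```

If processing one image takes much longer than this, the network is unlikely to be the limit for a single node; if it takes much less, the node will mostly be waiting on transfers.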
>> speed, or processing power be more of a limiting factor? Or would a
>> gigabit network suffice?
>>
>
> how long does a prospective node take to complete one work unit,
> and how long does it take to transfer the files for one?
> your speedup will be limited by whatever resource saturates first
> (possibly your fileserver.)
>
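The "whatever saturates first" point can be put into a small model. This is only a sketch under stated assumptions: a node fetches an image and then processes it without overlapping the two, and the fileserver is modelled as a single shared link. The function name and the example numbers (1 s of compute per image, a gigabit fileserver link) are hypothetical.

```python
def cluster_images_per_s(n_nodes, compute_s, image_bytes, node_link_bps, server_bps):
    """Estimate cluster throughput, capped by the fileserver's outbound link."""
    image_bits = image_bytes * 8
    # Assumption: on each node, transfer and compute do not overlap.
    per_node_rate = 1.0 / (compute_s + image_bits / node_link_bps)
    # The fileserver cannot hand out images faster than its own link allows.
    server_cap = server_bps / image_bits
    return min(n_nodes * per_node_rate, server_cap)


if __name__ == "__main__":
    for n in (1, 4, 8, 16, 32):
        rate = cluster_images_per_s(
            n_nodes=n,
            compute_s=1.0,             # hypothetical work per image
            image_bytes=14 * 10**6,    # 14 MB, from the question
            node_link_bps=10**9,       # gigabit per node
            server_bps=10**9,          # hypothetical gigabit fileserver link
        )
        print(f"{n:2d} nodes: {rate:.2f} images/s")
```

With these made-up numbers, throughput grows with node count until the fileserver link fills and then stays flat, so adding nodes beyond that point buys nothing.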
>> 3. For a relatively easy HPC platform what would you recommend?
>>
>
> they are all crap.  you should try not to spend on crap you don't need,
> but ultimately it depends on how much expertise you have and/or how much
> you value your time.  any idiot can build a cluster from scratch using
> fundamental open-source components, eventually.  but if said idiot has to
> learn filesystems, scheduling, provisioning, etc from scratch, it could
> take quite a while.  when you buy, you are buying crap, but it's crap
> that may save you some time.
>
> don't count on commercial support being more than crappy.
>
> you should probably consider using a cloud service - this is just
> commercial outsourcing - more crap, but perhaps of value if, for instance,
> you don't want to get your hands dirty hosting machines (amazon), etc.
>
> anything commercial in this space tends to be expensive.  the license to
> cover a crappy scheduler for a few hundred nodes, for instance, will cost
> pretty close to an FTE-year.  renting a node from a cloud provider for a
> year costs about as much as buying a new node each year, etc.
>
>> Again, I hope this is an ok place to ask such a question, if not please
>>
>
> this is the place.  though there are some fringe sects of HPC who tend to
> subsist on more and/or different crap (such as clusters running windows.)
> beowulf tends towards the low-crap end of things (linux, open packages.)
>
> regards, mark hahn.
>