[Beowulf] Digital Image Processing via HPC/Cluster/Beowulf - Basics
Lux, Jim (337C)
james.p.lux at jpl.nasa.gov
Mon Nov 5 14:49:26 PST 2012
What I find interesting (and which is characteristic of this application space) is the sort of bimodal requirement:
1) High performance workstation with bespoke software that the digital artist uses
2) A render farm to grind out the final product. (quite the EP task, in general)
The workflow is similar to the traditional film workflow, where each person gets a piece to work on, and then it's handed off to someone else to composite with the other pieces and build up the whole film. The artist would work in wireframe or with rendered key frames, do the changes, then send it off to be rendered. The next work day, the fully rendered product is complete and viewable.
The other interesting thing is that this problem space has HUGE disk space requirements (although instantaneous bandwidth requirements aren't all that high to stream video). It wasn't unusual in the late '90s to see a workstation with dozens of FireWire drives attached in a big column to hold the raw video. Providing a suitable multi-user server architecture is quite challenging.
Several movies in the 90s made use of what were essentially boxes of disk drives that were flown back and forth every day from one location to another. (the "nothing beats FedEx for raw bandwidth" model... getting tens of Mbps network connectivity to rural Czech Republic or Romania where the location shoot is happening isn't easy.)
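The "FedEx for raw bandwidth" quip is easy to quantify with a back-of-envelope calculation. The disk sizes, drive count, and transit time below are illustrative assumptions, not figures from the post:

```python
# Back-of-envelope: effective bandwidth of couriering a box of disks.
# All numbers here are assumptions for illustration.

disk_capacity_tb = 2      # assumed capacity per drive
num_disks = 10            # a "box of disk drives"
transit_hours = 24        # overnight courier

payload_bits = disk_capacity_tb * num_disks * 1e12 * 8
effective_bps = payload_bits / (transit_hours * 3600)

print(f"Effective bandwidth: {effective_bps / 1e6:.0f} Mbps")
# Effective bandwidth: 1852 Mbps -- comfortably beyond the
# "tens of Mbps" available to a rural location shoot.
```

The latency is terrible (a day), but for bulk transfer of raw footage the throughput of shipped disks dwarfs a thin WAN link.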
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Prentice Bisbal
Sent: Monday, November 05, 2012 12:56 PM
To: beowulf at beowulf.org
Subject: Re: [Beowulf] Digital Image Processing via HPC/Cluster/Beowulf - Basics
This article is from 14 years ago, but it might be relevant to your situation. It describes how Digital Domain used a Linux 'render farm' to do the CGI for Titanic. I haven't read this article in 14 years, so I'm a little fuzzy on the details, but I think you might learn something useful from it.
Manager of Information Technology
Rutgers Discovery Informatics Institute (RDI2)
On 11/03/2012 10:12 AM, CJ O'Reilly wrote:
Thank you very much!
I'll be sure to talk to the software developer about this.
For now this project is moving slowly; still doing research (it's possible that a single powerful computer could get this work done feasibly...)
Perhaps I'll be back around in the future though!
Thanks a bundle:)
On Sat, Nov 3, 2012 at 9:50 PM, Lux, Jim (337C) <james.p.lux at jpl.nasa.gov<mailto:james.p.lux at jpl.nasa.gov>> wrote:
1. Yes and no. The application needs to be "parallel aware", but for some applications that could just mean running multiple instances, one on each node, and farming the work out to them. This is called "embarrassingly parallel" (EP). A good example is rendering animation frames: typically each frame doesn't depend on the frames around it, so you can parcel the work out to the nodes at frame granularity. Other applications are more tightly coupled, where the computation running on node N needs to know something about what's running on node N+1 and node N-1 very frequently. For these, applications use some sort of standardized interprocess communication library (e.g. MPI), or perhaps a library that performs a high-level function (e.g. matrix inversion) and uses the interprocess comm underneath.
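The EP case described above can be sketched in a few lines. Here `render_frame` is a hypothetical stand-in for a real renderer invocation; the point is that frames are independent, so workers never need to talk to each other:

```python
# Minimal sketch of embarrassingly parallel (EP) frame farming.
# render_frame is a hypothetical placeholder, not a real renderer.
from multiprocessing import Pool

def render_frame(frame_no):
    # Each frame depends only on its own inputs -- no inter-worker
    # communication (no MPI) is needed for this workload.
    return frame_no, f"frame_{frame_no:04d}.png"

if __name__ == "__main__":
    # Parcel work out at frame granularity to a pool of workers;
    # on a cluster, a batch scheduler plays the role of Pool.
    with Pool(processes=4) as pool:
        results = pool.map(render_frame, range(100))
    print(len(results))  # 100
```

On a real cluster the `Pool` would be replaced by a queueing system (e.g. a batch scheduler) handing frames to nodes, but the structure, independent tasks with no cross-talk, is the same.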
2. Another "it depends". If the process is EP and each node is processing a different image, then your problem is one of sending and retrieving images, which isn't much different from a conventional file-server model. If multiple processors/nodes are working on the same image, then the interconnect might matter more. It all depends on the communication requirements. Note that even EP applications can get fouled up in network traffic (imagine booting 1000 nodes simultaneously, all wanting to fetch the boot image from one server at once).
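For the 14 MB images in the original question, the gigabit arithmetic is straightforward. The per-image processing time below is an assumption for illustration, not a measured figure:

```python
# Rough check: can one shared gigabit link keep N workers fed
# with 14 MB images? Processing time per image is an assumption.

image_mb = 14
link_gbps = 1.0
transfer_s = image_mb * 8 / (link_gbps * 1000)   # seconds per image on the wire
proc_s = 5.0                                      # assumed compute time per image

# The link saturates once workers collectively request more than
# one image per transfer_s, i.e. beyond proc_s / transfer_s workers.
max_workers = proc_s / transfer_s
print(f"transfer {transfer_s:.3f} s/image, link feeds ~{max_workers:.0f} workers")
# transfer 0.112 s/image, link feeds ~45 workers
```

In other words, if each image takes several seconds of compute, a gigabit network is unlikely to be the bottleneck until the node count gets fairly large; if per-image compute drops well below the transfer time, the network dominates.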
This is the place to ask..
From: CJ O'Reilly <supaiku at gmail.com<mailto:supaiku at gmail.com>>
Date: Wednesday, October 31, 2012 11:31 PM
To: "beowulf at beowulf.org<mailto:beowulf at beowulf.org>" <beowulf at beowulf.org<mailto:beowulf at beowulf.org>>
Subject: [Beowulf] Digital Image Processing via HPC/Cluster/Beowulf - Basics
Hello, I hope that this is a suitable place to ask this; if not, I would equally appreciate some advice on where to look in lieu of answers to my questions:
You may guess that I'm very new to this subject.
I am currently researching the feasibility and process of establishing a relatively small HPC cluster to speed up the processing of large amounts of digital images.
After looking at a few HPC computing software solutions listed on the Wikipedia comparison of cluster software page ( http://en.wikipedia.org/wiki/Comparison_of_cluster_software ) I still have only a rough understanding of how the whole system works.
I have a few questions:
1. Do programs you wish to use via HPC platforms need to be written to support HPC, and further, to support specific middleware using parallel programming or something like that?
Can you run any program on top of the HPC cluster and have its workload effectively distributed? --> How can this be done?
2. For something like digital image processing, where a huge number of relatively large images (14 MB each) are being processed, will network speed or processing power be more of a limiting factor? Or would a gigabit network suffice?
3. For a relatively easy HPC platform what would you recommend?
Again, I hope this is an ok place to ask such a question, if not please help refer me to a more suitable source.
Beowulf mailing list, Beowulf at beowulf.org<mailto:Beowulf at beowulf.org> sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf