quick question

W Bauske wsb at paralleldata.com
Tue Jun 20 13:03:02 PDT 2000


David Lombard wrote:
> 
> Kurt Brust wrote:
> >
> > Hello, I am sure you are busy, so I will not take up much of your time.
> >
> > In regards to clustering, is it possible to set up a Beowulf cluster to
> > help process a log file (text based) over multiple processors to help
> > distribute the load? Right now it's at 1.5 gigs a day and takes 12 hours
> > to process. I am looking to cut that down as much as possible.
> >
> 
> It depends.  That's always the standard answer to a question this vague.
> 
> It depends upon what you mean by "help process a log file".
> 
> What is being logged?
> 
> How is the log file processed today?
> 
> Be specific.

Also, do you control the source code that does the processing?
If not, then the only way to split the work is to split the log
into chunks and run the log processing on each chunk. Then the
question is whether the data is partitionable such that you get
the same analysis when it's split. It should be, since you
already split it on daily boundaries. In general, using 100 Mbit
Ethernet, you can distribute a 1.5 GB log to multiple machines in
a few minutes, so if you're taking 12 hours now, the transfer
cost is a nit. You might consider tuning your log analysis code
before worrying about parallelizing it. 12 hours seems like a
long time to process a log, even at 1.5 GB. Maybe faster
hardware (CPU, disk, network) is a better solution.
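
To make the split-and-ship idea concrete, here is a rough Python sketch,
not a tested tool. It cuts a log into chunks on line boundaries so no
record is split in half, and it does the back-of-envelope transfer
arithmetic: 1.5 GB over 100 Mbit is about two minutes at the theoretical
line rate, closer to three at an assumed 70% efficiency. The function
names, the 70% figure, and the ".000" chunk naming are all assumptions
made up for illustration; how the chunks get to the other machines and
how per-chunk results are merged depends on the analysis and is not shown.

```python
import os


def transfer_seconds(size_bytes, link_mbit=100, efficiency=0.7):
    """Rough time to push size_bytes over a link of link_mbit Mbit/s.

    efficiency is an assumed fudge factor for protocol overhead,
    not a measured value.
    """
    return size_bytes * 8 / (link_mbit * 1_000_000 * efficiency)


def chunk_offsets(path, n_chunks):
    """Return (start, end) byte ranges that split the file on line boundaries."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(size * i // n_chunks)
            f.readline()  # skip ahead to the start of the next full line
            pos = min(f.tell(), size)
            if pos > bounds[-1]:  # drop empty or duplicate ranges
                bounds.append(pos)
    if bounds[-1] < size:
        bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))


def write_chunks(path, n_chunks, out_prefix):
    """Copy each byte range to its own file (hypothetical naming: prefix.000, ...)."""
    names = []
    with open(path, "rb") as src:
        for i, (start, end) in enumerate(chunk_offsets(path, n_chunks)):
            name = f"{out_prefix}.{i:03d}"
            src.seek(start)
            remaining = end - start
            with open(name, "wb") as dst:
                while remaining > 0:
                    block = src.read(min(1 << 20, remaining))  # 1 MiB at a time
                    if not block:
                        break
                    dst.write(block)
                    remaining -= len(block)
            names.append(name)
    return names
```

Something like write_chunks("access.log", 8, "access.part") would leave
eight files to copy out and process independently. This only helps if
the analysis does not need to see records from more than one chunk at
a time, which is the partitioning question above.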

Just some thoughts.


Wes



