Archives


- Beowulf
- Beowulf Announce
- Scyld-users
- Beowulf on Debian

quick question

Many of your questions may have already been answered in earlier discussions or in the FAQ. The search results page will indicate current discussions as well as past list serves, articles, and papers.

Search

Robert G. Brown rgb at phy.duke.edu
Tue Jun 20 14:03:39 PDT 2000


On Tue, 20 Jun 2000, Kragen Sitaker wrote:

> This is not correct.  There are several ways to partition problems in
> general, and log-processing problems in particular, and splitting up
> the input data is only one of them.
> 
> Some examples:  
> 
> - if you're running a pipelinable problem --- separable, sequential
>   stages, each with a relatively high computation-to-data ratio (say, a
>   billion or more instructions for every twelve megabytes, thus a
>   thousand instructions for every twelve bytes or so) --- you can build
>   a pipeline with different stages on different machines.  In an ideal
>   world, you'd be able to migrate pipeline stages between machines to
>   load-balance.
> - if you want to generate ten reports for ten different web sites whose
>   logs are interleaved in the same log file, you can run the log into
>   one guy whose job it is to divvy it up, line by line, among ten
>   machines doing analysis, one for each web site.
> - if you're looking for several different kinds of information in the
>   log file --- again, with a high computation-to-data ratio --- you can
>   send a copy of the log file to several processes, each extracting one
>   of the kinds of information.
> 

All good points.  Another good point is that if the reports are the
result of syslogd output, a sensible /etc/syslog.conf can often achieve
a lot of partitioning for you.  If the reports are the result of a
centralized syslog loghost that receives all the syslog output of (say)
100+ hosts, you might look into "syslog-ng", which basically filters
input as it comes into the loghost and squirrels it away in a nice set
of host/loglevel-specific files according to your specification.

Either of these will result in significantly smaller files to process
and a lot of the processing will already be done.

> Of course, all of this depends on the problem.  My guess is that the
> original querent can, as you suggested, rewrite his log-processing
> script in C instead of Perl and get the performance boost he needs, and
> it will be easier than parallelizing by anything but the simplistic
> split-the-log-into-chunks approach.
> 
> [I'm just guessing that the log-processing code is currently in Perl. :) ]

Agreed and agreed.

    rgb

> -- 
> <kragen at pobox.com>       Kragen Sitaker     <http://www.pobox.com/~kragen/>
> The Internet stock bubble didn't burst on 1999-11-08.  Hurrah!
> <URL:http://www.pobox.com/~kragen/bubble.html>
> The power didn't go out on 2000-01-01 either.  :)
> 
> 
> _______________________________________________
> Beowulf mailing list
> Beowulf at beowulf.org
> http://www.beowulf.org/mailman/listinfo/beowulf
> 

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu







More information about the Beowulf mailing list