quick question
Robert G. Brown rgb at phy.duke.edu
Tue Jun 20 14:03:39 PDT 2000
On Tue, 20 Jun 2000, Kragen Sitaker wrote:
> This is not correct. There are several ways to partition problems in
> general, and log-processing problems in particular, and splitting up
> the input data is only one of them.
>
> Some examples:
>
> - if you're running a pipelinable problem --- separable, sequential
> stages, each with a relatively high computation-to-data ratio (say, a
> billion or more instructions for every twelve megabytes, thus a
> thousand instructions for every twelve bytes or so) --- you can build
> a pipeline with different stages on different machines. In an ideal
> world, you'd be able to migrate pipeline stages between machines to
> load-balance.
> - if you want to generate ten reports for ten different web sites whose
> logs are interleaved in the same log file, you can run the log into
> one guy whose job it is to divvy it up, line by line, among ten
> machines doing analysis, one for each web site.
> - if you're looking for several different kinds of information in the
> log file --- again, with a high computation-to-data ratio --- you can
> send a copy of the log file to several processes, each extracting one
> of the kinds of information.
>
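The "one guy whose job it is to divvy it up" case can be sketched roughly as follows. This is only an illustration, not anyone's actual script: it assumes a hypothetical log format in which the first whitespace-separated field of each line names the web site, and the function names site_of and divvy are made up for the example. On a real cluster each per-site list would be a pipe or socket to the node doing that site's analysis.

```python
from collections import defaultdict


def site_of(line):
    """Return the site a log line belongs to.

    Assumption for this sketch: the site name is the first
    whitespace-separated field of the line.
    """
    return line.split(None, 1)[0]


def divvy(lines):
    """Group interleaved log lines by site, keeping per-site order."""
    per_site = defaultdict(list)
    for line in lines:
        if line.strip():  # ignore blank lines
            per_site[site_of(line)].append(line)
    return dict(per_site)


if __name__ == "__main__":
    sample = [
        "a.example GET /index.html",
        "b.example GET /about.html",
        "a.example GET /logo.png",
    ]
    for site, chunk in divvy(sample).items():
        print(site, len(chunk))
```

The divider does almost no work per line, so it only pays off when the per-site analysis is much more expensive than the split itself.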
All good points. Another is that if the reports are generated from
syslogd output, a sensible /etc/syslog.conf can often do a lot of the
partitioning for you. If they come from a centralized syslog loghost
that receives all the syslog output of (say) 100+ hosts, you might look
into "syslog-ng", which filters input as it arrives at the loghost and
squirrels it away in a nice set of host/loglevel-specific files
according to your specification. Either approach will leave you with
significantly smaller files to process, and a lot of the processing
will already be done.
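As a rough illustration of what "a sensible /etc/syslog.conf" might look like, here is a fragment in the classic selector/action syntax (facility.priority, whitespace, destination). The particular facilities chosen and the file names are assumptions for the example, not a recommendation for any specific site.

```
# /etc/syslog.conf sketch: route each facility to its own file
# (destination paths below are illustrative)
mail.*                                  /var/log/maillog
authpriv.*                              /var/log/secure
kern.*                                  /var/log/kern.log
# everything else at info or above, minus what was split out above
*.info;mail.none;authpriv.none          /var/log/messages
```

With something like this in place, a mail report only has to read the mail file rather than wade through everything.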
> Of course, all of this depends on the problem. My guess is that the
> original querent can, as you suggested, rewrite his log-processing
> script in C instead of Perl and get the performance boost he needs, and
> it will be easier than parallelizing by anything but the simplistic
> split-the-log-into-chunks approach.
>
> [I'm just guessing that the log-processing code is currently in Perl. :) ]
Agreed and agreed.
rgb
> --
> <kragen at pobox.com> Kragen Sitaker <http://www.pobox.com/~kragen/>
> The Internet stock bubble didn't burst on 1999-11-08. Hurrah!
> <URL:http://www.pobox.com/~kragen/bubble.html>
> The power didn't go out on 2000-01-01 either. :)
>
>
> _______________________________________________
> Beowulf mailing list
> Beowulf at beowulf.org
> http://www.beowulf.org/mailman/listinfo/beowulf
>
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
