quick question
Kragen Sitaker kragen at pobox.com
Tue Jun 20 13:37:14 PDT 2000
- Previous message: quick question
- Next message: quick question
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
W Bauske writes:
> Also, do you control the source code that does the processing?
> If not, then the only way to split the work is split the log into
> chunks and run the log processing on each chunk. Then you have
> the question of is the data partitionable such that you get the
> same analysis when it's split.

This is not correct. There are several ways to partition problems in
general, and log-processing problems in particular, and splitting up
the input data is only one of them. Some examples:

- if you're running a pipelinable problem --- separable, sequential
  stages, each with a relatively high computation-to-data ratio (say,
  a billion or more instructions for every twelve megabytes, thus a
  thousand instructions for every twelve bytes or so) --- you can
  build a pipeline with different stages on different machines. In an
  ideal world, you'd be able to migrate pipeline stages between
  machines to load-balance.
- if you want to generate ten reports for ten different web sites
  whose logs are interleaved in the same log file, you can run the log
  into one guy whose job it is to divvy it up, line by line, among ten
  machines doing analysis, one for each web site.
- if you're looking for several different kinds of information in the
  log file --- again, with a high computation-to-data ratio --- you
  can send a copy of the log file to several processes, each
  extracting one of the kinds of information.

Of course, all of this depends on the problem. My guess is that the
original querent can, as you suggested, rewrite his log-processing
script in C instead of Perl and get the performance boost he needs,
and it will be easier than parallelizing by anything but the
simplistic split-the-log-into-chunks approach.

[I'm just guessing that the log-processing code is currently in
Perl. :) ]

-- 
<kragen at pobox.com> Kragen Sitaker <http://www.pobox.com/~kragen/>
The Internet stock bubble didn't burst on 1999-11-08. Hurrah!
<URL:http://www.pobox.com/~kragen/bubble.html>
The power didn't go out on 2000-01-01 either. :)
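
The second example in the message (one process divvying up interleaved
log lines by web site) might look roughly like the sketch below. It is
only an illustration under assumptions that are not in the original
post: the log format (site name as the first whitespace-separated
field), the helper names site_of, divvy and count_hits, and the sample
lines are all hypothetical. The per-site buckets are plain in-memory
lists here; in a real cluster each bucket would presumably be a pipe or
socket to a separate analysis machine.

```python
from collections import defaultdict


def site_of(line):
    """Return the site name; ASSUMED to be the first field of each line."""
    return line.split(None, 1)[0]


def divvy(lines):
    """Route each non-blank log line to a per-site bucket, keeping line order."""
    buckets = defaultdict(list)
    for line in lines:
        if line.strip():
            buckets[site_of(line)].append(line)
    return dict(buckets)


def count_hits(site_lines):
    """Hypothetical stand-in for a real per-site analysis: count the lines."""
    return len(site_lines)


# Made-up sample data in the assumed format.
sample_log = [
    "a.example.com GET /index.html 200",
    "b.example.com GET /about.html 200",
    "a.example.com GET /missing 404",
]

buckets = divvy(sample_log)
reports = {site: count_hits(site_lines) for site, site_lines in buckets.items()}
print(reports)
```

Because every line for a given site lands in the same bucket, each
per-site report sees exactly the lines it would have seen if that
site's log had been kept in its own file, which is what makes this
kind of split safe for per-site analyses.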
More information about the Beowulf mailing list
