[Beowulf] Time measurement
diep at xs4all.nl
Sat Aug 6 08:04:36 PDT 2005
At 08:02 AM 8/5/2005 -0400, Robert G. Brown wrote:
>Vincent Diepeveen writes:
>> Always measure wall clock time of execution. That's IMHO the only thing
>> that really counts and it includes all overhead.
>> Take into account that on most clusters the wall clock time of node A is
>> not the same as the wall clock time of node B.
>> If you start a job at the head node and spawn it from there further to
>> the rest of the cluster, what counts obviously is the time from when you
>> started it, until it finished execution.
>> Because in reality what matters is how quickly you got the job done.
>Unless you are trying to time how fast the computer does actual
>instructions in a particular context.
>Remember, there are lots of reasons to time things. One is certainly to
>time how fast you get a "job" done, where a job is a complex entity with
>all sorts of overhead and inefficiencies. However, there are others,
>such as wanting to know how fast a computer can generate random numbers
>with a particular algorithm inside a generic loop WITHOUT having the
>results affected by the fact that your computer at the time of the test
>was running a cron job or dealing with a broadcast storm generated by
>some ill-managed system down the wire. Or how fast it can do a simple
>divide operation in a given context, again doing one's very best NOT to
>include random delays introduced by an interrupt and/or context switch
>that happens to occur right in the middle of the timing interval. In
>these microbenchmark contexts it is actually a PITA to "prepare" the
>system in such a way as to make interruptions like this unlikely in the
>timing interval(s) and why writing a "good" microbenchmark program is
As soon as you allow scientists to report measurements of their results
without wall clock time, the problems really will pile up higher than Mount Everest.
The stopwatch is what counts!
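A minimal sketch of the stopwatch approach in C, assuming a POSIX system that provides clock_gettime(). The names wall_seconds(), run_job() and time_job() are made up for illustration, and run_job() is only a stand-in for the real work. Both readings are taken from the same clock on the same node, which sidesteps the problem of node A and node B disagreeing about the time:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Wall-clock seconds read from a monotonic clock on the local node. */
double wall_seconds(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc; /* keeps the compiler quiet when NDEBUG disables assert */
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical stand-in for the real job (compute, I/O, communication). */
double run_job(long n)
{
    volatile double sum = 0.0;
    for (long i = 1; i <= n; i++)
        sum += 1.0 / (double)i;
    return sum;
}

/* Start the stopwatch, run the whole job, stop the stopwatch.
 * Returns elapsed wall-clock seconds; the job's result goes to *result. */
double time_job(long n, double *result)
{
    assert(result != NULL);
    double t0 = wall_seconds();
    *result = run_job(n);
    return wall_seconds() - t0;
}
```

Calling time_job() from the process that launches the work and reporting the returned seconds gives the start-to-finish figure. CLOCK_MONOTONIC is chosen here so that a clock adjustment during the run does not distort the interval.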
>Even when timing jobs one has to remember that BECAUSE of the
>uncertainty in the "state" of the computer during the timing interval,
>your recorded times may or may not be terribly accurate predictors of
>actual performance in a different global context or state. It is
>important to do a NUMBER of measurements if possible, ideally with some
>degree of knowledge and control of system state, and at least eyeball
>the statistics of the results.
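A sketch of taking a number of measurements and eyeballing their statistics, again assuming POSIX clock_gettime(). The names collect_samples(), summarize(), sample_work() and struct timing_summary are hypothetical, and sample_work() is a placeholder for whatever is being timed. Minimum, median and maximum are reported because they are cheap to compute and make a disturbed run easy to spot:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

struct timing_summary { double min, median, max; };

/* qsort comparator for doubles, ascending. */
int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Placeholder workload; replace with the code being measured. */
void sample_work(void)
{
    volatile double x = 0.0;
    for (int i = 0; i < 100000; i++)
        x += (double)i * 0.5;
}

/* Run work() nrep times, storing each wall-clock duration (seconds). */
void collect_samples(void (*work)(void), double *samples, size_t nrep)
{
    assert(work != NULL && samples != NULL);
    for (size_t i = 0; i < nrep; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        work();
        clock_gettime(CLOCK_MONOTONIC, &t1);
        samples[i] = (double)(t1.tv_sec - t0.tv_sec)
                   + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
    }
}

/* Sort the samples in place and report min, median and max. */
struct timing_summary summarize(double *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    qsort(samples, n, sizeof samples[0], cmp_double);
    struct timing_summary s;
    s.min = samples[0];
    s.max = samples[n - 1];
    s.median = (n % 2) ? samples[n / 2]
                       : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
    return s;
}
```

A large gap between the median and the maximum suggests that some runs were interrupted by other activity on the system, which is exactly the state dependence described above.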
>Doing this has revealed lots of interesting things (on this list, even)
>over the years such as huge delays "randomly" inserted in TCP streams,
>anomalies in the rate at which processors perform particular
>instructions (for example, multiplying by a power of two in C code is
>often a TERRIBLE predictor for how fast a processor multiplies even in an
>instruction form such as
> a[i] = 2.0*b[i];
>because modern processors optimize such operations and perform them much
>faster than a general multiply such as
> a[i] = 3.14159*b[i];
>) anomalies in the performance of all sorts of network adapters (some of
>which work(ed) fine for short traffic bursts but collapsed on the floor
>screaming if fed a system-saturating stream of small packets).
>Generating an actual histogram of timings is good. Look for outliers
>(if any) -- these are an indication that something highly nonlinear and
>state dependent is going on in your code (or on your system) and is
>often a place to focus optimization energy.
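One possible way to build such a histogram from an array of recorded timings, sketched in C. The function names build_histogram() and print_histogram() and the choice of equal-width bins are assumptions made for illustration. Values outside the chosen range are clamped into the first or last bin so that outliers remain visible instead of being dropped:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Count samples into nbins equal-width bins covering [lo, hi).
 * Samples below lo land in bin 0, samples at or above hi in the last bin. */
void build_histogram(const double *samples, size_t n,
                     double lo, double hi,
                     unsigned *bins, size_t nbins)
{
    assert(samples != NULL && bins != NULL);
    assert(nbins > 0 && hi > lo);

    for (size_t b = 0; b < nbins; b++)
        bins[b] = 0;

    double width = (hi - lo) / (double)nbins;
    for (size_t i = 0; i < n; i++) {
        size_t b;
        if (samples[i] < lo)
            b = 0;
        else if (samples[i] >= hi)
            b = nbins - 1;
        else
            b = (size_t)((samples[i] - lo) / width);
        if (b >= nbins)          /* guard against floating-point rounding */
            b = nbins - 1;
        bins[b]++;
    }
}

/* Print one row per bin: the bin's lower edge, then one '#' per sample. */
void print_histogram(const unsigned *bins, size_t nbins,
                     double lo, double hi)
{
    assert(bins != NULL && nbins > 0);
    double width = (hi - lo) / (double)nbins;
    for (size_t b = 0; b < nbins; b++) {
        printf("%10.3e | ", lo + (double)b * width);
        for (unsigned k = 0; k < bins[b]; k++)
            putchar('#');
        putchar('\n');
    }
}
```

A tight cluster with a few counts piled into the last bin is the typical signature of occasional interrupts or context switches landing inside the timing interval.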
>With all that said, yes, using wall time is a very good thing to do,
>ideally measured with the system CPU timer itself. gettimeofday() in
>the past has had a call resolution of about 2 usec (2000 nanoseconds)
>depending on how it is implemented (I think it is moving or has moved
>towards being implemented on top of the CPU timer where possible). The
>CPU timer can yield a time resolution of 40-70 nanoseconds per timing
>call pair, which of course can be improved with good statistics and/or
>inlining the timing assembler and avoiding subroutine calls.
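The cost of a timing call pair can be estimated on any given machine rather than assumed. The sketch below does this for gettimeofday() and for clock_gettime(); the latter is used here only as a portable stand-in, since reading the CPU cycle counter directly requires architecture-specific assembler. The function names are hypothetical, and the figures it produces depend entirely on the hardware, kernel and C library in use:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Nanoseconds between two CLOCK_MONOTONIC readings. */
double elapsed_ns(const struct timespec *t0, const struct timespec *t1)
{
    return (double)(t1->tv_sec - t0->tv_sec) * 1e9
         + (double)(t1->tv_nsec - t0->tv_nsec);
}

/* Average cost, in ns, of one back-to-back gettimeofday() pair. */
double gettimeofday_pair_ns(long npairs)
{
    assert(npairs > 0);
    struct timeval a, b;
    struct timespec t0, t1;
    volatile long sink = 0;   /* keeps the calls from being optimized away */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < npairs; i++) {
        gettimeofday(&a, NULL);
        gettimeofday(&b, NULL);
        sink += (long)(b.tv_usec - a.tv_usec);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_ns(&t0, &t1) / (double)npairs;
}

/* Average cost, in ns, of one back-to-back clock_gettime() pair. */
double clock_gettime_pair_ns(long npairs)
{
    assert(npairs > 0);
    struct timespec a, b, t0, t1;
    volatile long sink = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < npairs; i++) {
        clock_gettime(CLOCK_MONOTONIC, &a);
        clock_gettime(CLOCK_MONOTONIC, &b);
        sink += (long)(b.tv_nsec - a.tv_nsec);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_ns(&t0, &t1) / (double)npairs;
}
```

Averaging over many pairs, for example a million, is the "good statistics" part: the per-pair figure is what has to be subtracted from, or at least compared against, any short interval being timed.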
>A final good thing to do is to remember profiling. Even "jobs" --
>perhaps especially "jobs" -- benefit from profiling. The times won't be
>terribly accurate because of job instrumentation and so on, but getting
>a good idea of where your job is spending most of its time can be a
>surprising and rewarding thing to do. Surprising because it might not
>be where you think it is; rewarding because once you know where it is
>doing a lot of work you may be able to rearrange it for improved
>> At 08:13 AM 8/1/2005 -0700, ThanhVu H. Nguyen - Gmail wrote:
>>>Hi, just wondering what the standard way of measuring execution time is.
>>>The 2 methods I thought about are:
>>>1) /usr/bin/time prog : this includes all the communication, I/O,
>>>loading overhead, etc.
>>>2) include start_time, end_time code in the program : this won't
>>>include the communication, I/O, loading, etc. overhead.
>>>Which method is usually used? Thanks
>>>ThanhVu H. Nguyen
>>>Beowulf mailing list, Beowulf at beowulf.org
>>>To change your subscription (digest mode or unsubscribe) visit