[Beowulf] Execution time measurements - clarification

David Mathog mathog at caltech.edu
Mon May 23 09:40:13 PDT 2011


> On Fri, May 20, 2011 at 02:26:31PM -0400, Mark Hahn forwarded a message:
> 
> > When I run 2 identical examples of the same batch job
> > simultaneously, execution time of *each* job is
> > LOWER than for single job run!

Disk caching could cause that.  Normally, if the data read in isn't too
big, you see an effect like:

run 1:  30 sec  <-- 50% disk IO / 50% CPU
run 2:  15 sec  <-- ~100% CPU

where the first run loaded the data into the disk cache and the second
run read it from there, saving a lot of real disk IO.  Under some very
peculiar conditions on a multicore system, if runs 1 and 2 are
"simultaneous" they could seesaw back and forth for the "lead", so they
end up taking turns doing the actual disk IO, with the total run time
for each ending up between the times of the two runs above.  Note that
they wouldn't have to be started at exactly the same time for this to
happen, because the job that starts second is going to be reading from
cache, so it will tend to catch up to the job that started first.  Once
they are close, noise in the scheduling could let the second pass the
first.  (If the second never passed the first, this seesaw couldn't
happen, because the second would always be waiting for the first to
pull data in from disk.)

Of course, you also need to be sure that run 1 isn't interfering with
run 2.  They might, for instance, save and retrieve intermediate values
to and from the same filename, in which case they really cannot be run
safely at the same time.  That is, they run faster together, but they
run incorrectly.
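
If that turns out to be the problem, one common fix is to give each
instance its own scratch file rather than a hard-coded shared name.  A
minimal sketch, with a made-up scratch directory and payload:

#!/usr/bin/env python
# Sketch: give each job instance its own scratch file instead of a
# shared hard-coded name, so two copies of the same batch job can't
# clobber each other's intermediate values.  Directory and contents
# here are made up.
import os
import tempfile

SCRATCH_DIR = "/tmp"   # placeholder -- use the cluster's scratch space

# Bad: every instance writes the same file, so simultaneous runs collide.
# shared = os.path.join(SCRATCH_DIR, "intermediate.dat")

# Better: mkstemp() returns a name unique to this process.
fd, scratch = tempfile.mkstemp(prefix="myjob_", suffix=".dat",
                               dir=SCRATCH_DIR)
try:
    with os.fdopen(fd, "w+b") as f:
        f.write(b"intermediate values go here\n")   # stand-in for real data
        f.seek(0)
        data = f.read()
finally:
    os.unlink(scratch)   # clean up the per-run scratch file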

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


