[Beowulf] failure rates
Bruno Rocha Coutinho
coutinho at dcc.ufmg.br
Thu Feb 1 12:09:26 PST 2007
Most fault-tolerance literature assumes that system components have
exponential failure rates.
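For reference, this is what "exponential failure rates" means formally. The time to failure T of a component is exponentially distributed with parameter lambda, which gives a constant hazard (failure) rate and the memoryless property. The symbols T, lambda, F, R and h are introduced here only for illustration:

```latex
F(t) = P(T \le t) = 1 - e^{-\lambda t}, \qquad
R(t) = P(T > t) = e^{-\lambda t}, \qquad
\mathrm{MTTF} = E[T] = \frac{1}{\lambda}

h(t) = \frac{f(t)}{R(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda

P(T > s + t \mid T > s) = \frac{e^{-\lambda (s+t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(T > t)
```

The last line is why independent exponential failures do not "line up": how long a node has already been running tells you nothing about when it will fail next.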
But software sometimes does not have exponential failure rates when the
cause of the failure is related to a timer, an overflow, or a resource
leak. In that case the failure happens after a fixed time, and if all
processes were started together, you end up with all of them failing at
the same time.
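A minimal simulation sketch of the difference, under stated assumptions: a hypothetical cluster of 64 nodes, all started at t = 0, with the same mean time to failure (1000 hours is an arbitrary, illustrative value). In the exponential model each node fails independently; in the hypothetical leak model every node exhausts the leaking resource after exactly the same time. The function names are made up for this example and do not come from any library.

```python
import random


def exponential_failure_times(n_nodes, mttf, rng):
    """Memoryless model: each node fails independently, T ~ Exp(rate = 1/mttf)."""
    return [rng.expovariate(1.0 / mttf) for _ in range(n_nodes)]


def leak_failure_times(n_nodes, mttf):
    """Hypothetical deterministic model: every node, started at t = 0,
    exhausts a leaking resource after exactly `mttf` hours."""
    return [float(mttf)] * n_nodes


def max_failures_in_window(times, window):
    """Largest number of failures falling inside any interval of length `window`."""
    ts = sorted(times)
    best = 0
    left = 0
    for right in range(len(ts)):
        # Advance the left edge until the interval spans at most `window` hours.
        while ts[right] - ts[left] > window:
            left += 1
        best = max(best, right - left + 1)
    return best


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    n_nodes, mttf, window = 64, 1000.0, 1.0  # illustrative values only
    exp_times = exponential_failure_times(n_nodes, mttf, rng)
    leak_times = leak_failure_times(n_nodes, mttf)
    print("exponential:", max_failures_in_window(exp_times, window))
    print("leak:", max_failures_in_window(leak_times, window))
```

With the leak model all 64 nodes fail within the same one-hour window, while the exponential model spreads the same number of failures out over time, so a redundancy scheme that assumes independent failures would be far too optimistic for the leak case.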
I think it is safe to assume exponential failure rates for hardware,
and although most machine crashes today are caused by the OS (not the
hardware), most people assume that OSs are well behaved and do not
suffer from fixed-rate failures.
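One rough way to check the exponential assumption against failure logs is the coefficient of variation (CV) of the times between failures. This is a common heuristic, not a formal goodness-of-fit test: an exponential distribution has CV = 1, while failures that recur at a fixed interval (timer, counter overflow, steady leak) give a CV close to 0. The helper names and the sample timestamps below are hypothetical.

```python
import statistics


def failure_intervals(failure_times):
    """Convert absolute failure timestamps into inter-failure intervals."""
    ts = sorted(failure_times)
    return [b - a for a, b in zip(ts, ts[1:])]


def coefficient_of_variation(intervals):
    """CV = population standard deviation / mean of the inter-failure intervals.

    Roughly 1 suggests exponential (memoryless) failures; near 0 suggests
    failures that repeat at a fixed interval.
    """
    if len(intervals) < 2:
        raise ValueError("need at least two inter-failure intervals")
    mean = statistics.mean(intervals)
    return statistics.pstdev(intervals) / mean


if __name__ == "__main__":
    # Illustrative timestamps only (hours), not measured data:
    # a process that crashes every 100 hours, e.g. from a steady leak.
    periodic = [100.0, 200.0, 300.0, 400.0, 500.0]
    print(coefficient_of_variation(failure_intervals(periodic)))  # 0.0
```

In practice you would feed in timestamps from real node or process logs, and use a proper test (or a Weibull fit) before drawing conclusions from a small sample.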
2007/1/30, enver ever <enverever at hotmail.com>:
I am a PhD student working on mathematical looking to the
I was looking into whether or not it is possible to assume exponential
failure rates for the nodes.
That's the case in these publications:
1- "A Realistic Evaluation of Consistency Algorithms for Replicated
Files," Proceedings of the 21st Annual Symposium on Simulation, Tampa,
Florida, United States, 1988, pp. 121-130, ISBN 0-8186-0845-5.
2- "Availability Modeling and Analysis on High Performance Systems,"
International Conference on Availability, Reliability and Security
(ARES 2006), 20-22 April 2006.
3- "A Failure Predictive and Policy-Based High Availability Strategy for
Linux High Performance Computing Cluster," Chokchai Leangsuksun,
Tirumala Rao, Stephen L. Scott, and Richard Libby, LCI 5th
International Linux Cluster Conference.
I think it can be taken as exponentially distributed, since this is the
approach followed in many multi-server systems.
I would appreciate any comments.
Beowulf mailing list, Beowulf at beowulf.org