Cluster-wide overclocking...
Robert G. Brown
rgb@phy.duke.edu
Thu, 24 Sep 1998 18:48:41 -0400
On Thu, 24 Sep 1998, Shachar Tal wrote:
> That was an example. A brute force attack is considered a computation as
> well.
Hmmm, bad example.
A brute force attack would have to do each tested passwd twice FOR EACH
VALUE to be able to conclude that a given string IS or IS NOT the passwd
at probability roughly 10^-18. The first time might give you the right
result, or might give you nonsense (by hypothesis) one in 10^9 times.
If you got a match from whatever it gave you to a line in your passwd
file, you can as you note easily test it by forward encrypting a second
time, and if the two disagree you could do it a third, etc. So you can
identify false positives with little additional effort (or rather
validate your positives, real or false, at a much higher probability).
However, by hypothesis one out of a billion times you will give it a
correct password and it will (incorrectly) fail to match a line in your
passwd file (a false negative). Oops.
It may be that your problem is one where this is an acceptable tolerance
-- you may not care if you miss somebody with the password "aardvark"
because you are trying to crack into a site and if you don't get one guy
you'll get another, or if you are trying to check your own password file
you don't mind missing a really dumb password like aardvark maybe one in
a million times (considering the number of opportunities you would have
for a false negative in a reasonable passwd file). These are all cases
where the cost of a mistake is low to negligible. If this is correct
for your problem, fine.
There are other cases where it would be a disaster. If you are trying
to certify the passwd file as being "unbreakable" for a client in a
million dollar banking system and some kid running crack turns around
and finds it on a NON overclocked system in ten seconds, it might cost
you millions in a suit, your professional reputation, your job, the
failure of your company...kind of like I listed before.
I'm not really trying to give you a hard time, by the way. I believe
you when you say that you can rapidly validate your errors. I interpret
this to mean that you are looking for needles in a haystack and can
easily tell when you've found a needle, and don't really care if you
miss one needle in a billion or maybe miss finding the needle on any
particular haystack. False negatives (or the equivalent) obviously
don't cost you much, or you have reason to believe they don't occur.
There may well be lots of folks in a similar position (like: All Windows
Users Everywhere;-). You've worked out the bet and it's heads you win a
dollar, tails you lose -- nothing.
I'm just trying to point out that my original observation is correct --
it is ALWAYS a bet to overclock. Sometimes you can hedge the bet.
Sometimes you can afford certain kinds of losses and can go double or
nothing until you win on any kind of loss that you cannot afford.
However, you cannot check any calculation (with presumed optimally
efficient code) against all possible results (and errors) with less
computational result than that which produced the result in the first
place (gotta love Shannon's Theorem:-) -- you can at best up the odds
against failure (to some intermediate point, as even running the full
calculation twice leaves you with a finite probability of error).
If you like, to address this issue properly, you must use Bayesian
analysis and include the cost function. All possible analyses will show
that the probability of errors occurring that cannot be detected without
prior knowledge or without redoing the calculation (whatever it is) are
"high". Prior knowledge (knowing that the result is between 0 and 1 and
getting 30) certainly counts -- it can reduce the probability of an
uncaught failure of any sort. Low cost for a certain class of failures
is also great -- it means that "hundreds of undetected errors will occur
every day but they don't cost me anything so I don't care".
Really, one can actually make this completely mathematical given any
expected rate of failure and a reasonable knowledge of the classes of
failure that can occur, and its all written up in completely
unintelligible form in any number of books on information theory,
failure analysis, engineering statistics, and the like. I'd recommend
that anyone planning to overclock a beowulf (or ANY system) meditate a
bit on Baysian statistics and fault tolerance before concluding that it
is safe. Statistics is one where that is VERY DANGEROUS to make
assumptions based on naive frequency interpretations, especially when
very large numbers are involved (little probabibilities have a way of
growing:-).
rgb
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu