[Beowulf] AMD64 results...
Robert G. Brown rgb at phy.duke.edu
Sun Dec 19 07:43:18 PST 2004
- Previous message: [Beowulf] AMD64 results...
- Next message: [Beowulf] looking for a switch to purchase
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Thu, 16 Dec 2004, Josip Loncaric wrote:

> Robert G. Brown wrote:
> > [...] One can see how having 64 bits would really
> > speed up 64 bit division compared to doing it in software across
> > multiple 32 bit registers...
>
> Correct me if I'm wrong, but doesn't the floating point unit normally
> use an internal iterative process to perform the division? This would
> not involve 32-bit registers...
>
> I'm not so sure about *integer* 64-bit division. Integer division may
> involve multiple 32-bit integer registers.
>
> Good ole' Cray-1 used an iterative process for floating point division
> which worked like this: given a floating point number x, use the first 8
> bits of the mantissa to index into a lookup table containing initial
> guesses, then do a few steps of Newton-Raphson iteration involving only
> multiply-add operations to get the fully converged reciprocal mantissa,
> fix the exponent, thus obtaining 1/x, then multiply y*(1/x) to get y/x.
>
> As I recall, the famous Pentium FDIV bug involved some corner cases in a
> similar iterative process, all of which is internal to the floating
> point unit. Moreover, in addition to following the 32/64-bit IEEE 754
> standard for floating point arithmetic, some implementations (e.g.
> Pentium, Opteron) support x87 legacy internal 80-bit representations of
> floating point numbers, which can really help when accumulating long
> sums and computing square roots, etc. Prof. Kahan has numerous
> arguments in favor of this internal 80-bit representation...

This may well be -- I used to hand code the 8087 back on the IBM PC and thought that the 80 bit internal representation was peachy keen at the time. I haven't tracked precisely how the x87 coprocessor model has evolved (legacy or not) into P6-class processors, though -- the mixing of RISC, CISC, and CISC-interpreted-to-RISC-on-chip left me confused years ago. I was really just making an empirical observation, and struggling to understand it.
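To make the "64-bit division in software across multiple 32-bit registers" point concrete, here is a minimal sketch of unsigned 64-bit division where every operation touches only 32-bit words, emulated in Python with explicit masking. It uses plain restoring (shift-and-subtract) division, one quotient bit per iteration, so 64 passes of multi-word shifts, compares and subtracts. The function names (`udiv64`, `shl1`, `geq`, `sub`) are hypothetical; real 32-bit runtime helpers (e.g. libgcc's `__udivdi3`) use faster methods, so treat this only as an illustration of why the work balloons when a value spans two registers.

```python
MASK32 = 0xFFFFFFFF  # everything below stays within 32-bit "registers"

def shl1(hi, lo, bit_in=0):
    """Shift the 64-bit value (hi:lo) left one bit; return new words and the bit shifted out."""
    out = hi >> 31
    new_hi = ((hi << 1) | (lo >> 31)) & MASK32
    new_lo = ((lo << 1) | bit_in) & MASK32
    return new_hi, new_lo, out

def geq(ahi, alo, bhi, blo):
    """Unsigned 64-bit compare done word by word: high words first, then low."""
    return (ahi, alo) >= (bhi, blo)

def sub(ahi, alo, bhi, blo):
    """64-bit subtract (mod 2**64) with a borrow propagated from the low word."""
    borrow = 1 if alo < blo else 0
    lo = (alo - blo) & MASK32
    hi = (ahi - bhi - borrow) & MASK32
    return hi, lo

def udiv64(n, d):
    """Unsigned 64-bit divmod using only 32-bit word operations (restoring division)."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    nhi, nlo = (n >> 32) & MASK32, n & MASK32
    dhi, dlo = (d >> 32) & MASK32, d & MASK32
    qhi = qlo = rhi = rlo = 0
    for _ in range(64):
        # Move the dividend's top bit into the bottom of the partial remainder.
        nhi, nlo, top = shl1(nhi, nlo)
        rhi, rlo, overflow = shl1(rhi, rlo, top)
        qhi, qlo, _ = shl1(qhi, qlo)
        # A bit shifted out of the remainder means it exceeds 2**64 > d.
        if overflow or geq(rhi, rlo, dhi, dlo):
            rhi, rlo = sub(rhi, rlo, dhi, dlo)
            qlo |= 1
    # Reassemble the word pairs only to return ordinary Python integers.
    return (qhi << 32) | qlo, (rhi << 32) | rlo
```

A native 64-bit divide instruction replaces this whole loop, which is one plausible reason 64-bit integer division benchmarks faster on AMD64.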
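The Cray-1 style division described in the quoted message can be sketched as follows: take the leading 8 bits of the mantissa to pick a seed from a table, refine it with Newton-Raphson steps that use only multiplies and subtracts, fix up the exponent to get 1/x, and finally multiply by y. This is a Python illustration under stated assumptions, not the Cray's actual hardware or microcode: the 128-entry table (one seed per mantissa bucket, set to 1/midpoint) and the choice of three iterations are mine. The seed is good to better than 2**-8, and each Newton step roughly squares the relative error (2**-16, 2**-32, 2**-64), so three steps reach double precision apart from rounding.

```python
import math

# Assumed seed table: math.frexp gives a mantissa m in [0.5, 1), so its
# leading bit is always 1 and int(m * 256) lands in [128, 255]. Each entry
# is 1 / (midpoint of that 1/256-wide bucket).
SEED = [256.0 / (i + 0.5) for i in range(128, 256)]

def reciprocal(x, steps=3):
    """Approximate 1/x from a table seed plus Newton-Raphson refinement."""
    if x == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    m, e = math.frexp(abs(x))          # abs(x) = m * 2**e, 0.5 <= m < 1
    r = SEED[int(m * 256) - 128]       # index by the leading 8 mantissa bits
    for _ in range(steps):
        r = r * (2.0 - m * r)          # Newton step for 1/m: multiply/subtract only
    # 1/x = (1/m) * 2**(-e); restore the sign of x.
    return math.copysign(math.ldexp(r, -e), x)

def divide(y, x):
    """y / x computed as y * (1/x), as in the scheme described above."""
    return y * reciprocal(x)
```

Because the final answer is y times a rounded reciprocal, it can differ from a correctly rounded y/x in the last bit or two, which is the usual price of reciprocal-based division.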
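The claim that a wider internal format "can really help when accumulating long sums" can be illustrated with a tiny case: adding 1e-16 ten times to 1.0. In 64-bit doubles each 1e-16 is below half an ulp of 1.0 and is rounded away, while an x87 80-bit register (64-bit significand) would retain it. Python has no portable 80-bit type, so this sketch assumes `fractions.Fraction` (exact rational arithmetic) as a stand-in for "wider than double", rounding to a double only once at the end; `math.fsum` serves as an independent, accurately rounded reference.

```python
import math
from fractions import Fraction

def naive_sum(values):
    """Accumulate in doubles: every partial sum is rounded to 53 bits."""
    total = 0.0
    for v in values:
        total += v
    return total

def wide_sum(values):
    """Accumulate exactly (stand-in for a wider register), round once at the end."""
    total = Fraction(0)
    for v in values:
        total += Fraction(v)   # Fraction(float) is the exact binary value
    return float(total)

values = [1.0] + [1e-16] * 10

print(naive_sum(values))   # the small terms are lost one by one
print(wide_sum(values))    # the small terms survive in the accumulator
print(math.fsum(values))   # accurately rounded reference
```

Real extended precision only delays this kind of loss rather than eliminating it, but for sums like this the extra 11 significand bits are exactly what keeps the small terms.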
As I pointed out yesterday, transcendental evals seem to be much faster as well, which would certainly be consistent with a resurrection of an efficient internal x87 architecture. If so, I'm all for it -- HPC code (at least MY HPC code:-) tends to have more than just triad-like operations on vectors -- things like the trig functions, exponentials and logs, floating point division. I remember when my Sun 386i could turn in a Savage benchmark that compared pretty well with the otherwise much faster Sun 110 and Sparc 1 because it had a real CISC 80387 and Sun was doing all of its transcendental calls in (RISC) software.

rgb

--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
More information about the Beowulf mailing list
