[Beowulf] Question about amd64 architecture and floating point operations
Vincent Diepeveen diep at xs4all.nl
Fri Nov 24 10:05:59 PST 2006
hi Mark,

Hopefully your 80-bit logic code is not critical to anything. I wouldn't
count on keeping the entire 64-bit mantissa. Context switch and dang, it's
gone.

I guess the question is how important it is to get a lot of digits.

Consider PFRSQRT, which is 3 cycles, whereas a floating point square root
is 35 cycles. I'd go for that SIMD; you can then toy with it in binary and
add results to get quite a lot more significant bits. Perhaps even faster
than in 35 cycles.

Good luck,
Vincent

----- Original Message -----
From: "Mark Hahn" <hahn at physics.mcmaster.ca>
To: "Richard Walsh" <rbw at ahpcrc.org>
Cc: "Beowulf Mailing List" <beowulf at beowulf.org>
Sent: Friday, November 24, 2006 3:35 PM
Subject: Re: [Beowulf] Question about amd64 architecture and floating point operations

>>>> A common confusion ... x86_64 changes nothing about the precision of
>>>> floats or doubles in C or Fortran.
>>>
>>> well, sort of. it was pretty common to find at least some computations
>>> in ia32 using 80b FP, intentionally or not. but iirc in long mode
>>> (colloquially x86_64), you no longer get x87 access.
>>
>> An important internal detail. My "nothing" above was assigned to the
>> program level and the computable epsilons. Your point is that in long
>> mode because you cannot use the x87 FPU there is a potential difference
>> internally--no 80-bit versus possibly some--Oui?
>
> I had the impression that in (pure) 64b mode, one couldn't use the legacy
> x87 instructions. this doesn't seem to be the case, though - but the amd
> doc (6.1.2 of AMD64 prog man v1) says that x87 codes have to be recompiled.
>
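An inline aside on the "computable epsilons" point: a minimal C sketch, not from the original posters, that measures the epsilon of double and long double at the program level and reports how the compiler evaluates intermediates. The function names are made up for illustration. `FLT_EVAL_METHOD` is the C99 macro from `<float.h>`: 0 means expressions are evaluated in their own type, 2 means everything is evaluated as long double (what one would typically expect from x87 code on ia32). The `volatile` stores are there on the assumption that they force each sum to be rounded to its declared type, so extra register precision does not leak into the comparison.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Largest-step halving search: returns the smallest power of two eps
   for which 1 + eps is still distinguishable from 1 in type double. */
double measured_epsilon_double(void) {
    volatile double sum;            /* store forces rounding to double */
    double eps = 1.0;
    for (;;) {
        sum = 1.0 + eps / 2.0;
        if (sum == 1.0)
            return eps;             /* halving once more would vanish */
        eps /= 2.0;
    }
}

/* Same search carried out in long double. */
long double measured_epsilon_long_double(void) {
    volatile long double sum;
    long double eps = 1.0L;
    for (;;) {
        sum = 1.0L + eps / 2.0L;
        if (sum == 1.0L)
            return eps;
        eps /= 2.0L;
    }
}

/* Print what the compiler and target actually provide. */
void print_fp_summary(void) {
    /* The C standard requires long double to be at least as precise. */
    assert(LDBL_MANT_DIG >= DBL_MANT_DIG);
    printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
    printf("double:      %d mantissa bits, measured eps %g\n",
           DBL_MANT_DIG, measured_epsilon_double());
    printf("long double: %d mantissa bits, measured eps %Lg\n",
           LDBL_MANT_DIG, measured_epsilon_long_double());
}
```

Calling `print_fp_summary()` from a `main` built with and without -m32 should show the same double epsilon in both cases, with any difference between the builds confined to `FLT_EVAL_METHOD`.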
> for kicks, I compiled the following function using pathscale under x86_64
> with and without -m32:
>
> double foo(long double a, long double b) {
>     long double c = a * b;
>     return c;
> }
>
> m32:
>    0: 83 c4 ec             add    $0xffffffec,%esp
>    3: db 6c 24 24          fldt   0x24(%esp)
>    7: db 6c 24 18          fldt   0x18(%esp)
>    b: de c9                fmulp  %st,%st(1)
>    d: dd 5c 24 00          fstpl  0x0(%esp)
>   11: 66 0f 12 44 24 00    movlpd 0x0(%esp),%xmm0
>   17: f2 0f 11 44 24 08    movsd  %xmm0,0x8(%esp)
>   1d: dd 44 24 08          fldl   0x8(%esp)
>   21: 83 c4 14             add    $0x14,%esp
>   24: c3                   ret
>
> x86_64:
>    0: 48 83 c4 e8          add    $0xffffffffffffffe8,%rsp
>    4: db 6c 24 20          fldt   0x20(%rsp)
>    8: db 6c 24 30          fldt   0x30(%rsp)
>    c: de c9                fmulp  %st,%st(1)
>    e: dd 5c 24 00          fstpl  0x0(%rsp)
>   12: 66 0f 12 44 24 00    movlpd 0x0(%rsp),%xmm0
>   18: 48 83 c4 18          add    $0x18,%rsp
>   1c: c3                   retq
>
> you can see that 32b mode provides 12B in the stack frame for a 10B
> extended-prec operand, whereas 64b mode aligns mod 16. if the
> source skipped conversion to double, the fstpl/etc goes away and the
> full precision is left on the FP stack-top.
>
> I have to assume the AMD doc's rather cryptic comment is simply reflecting
> the ABI difference, not anything like encoding or allowed instructions.
>
> does anyone have a concise demo of using higher precision - approximating
> sqrt(2) or something? I have found, on the several linuxes I looked at,
> that the x87 control word enabled full 80b precision (it can cause
> automatic rounding to double or even single prec.)
>
>>>> This potential itself is not fully utilized as I believe only 40-bits
>>>> are used (the socket F series may have bumped this up to 48-bits).
>>>
>>> no, that's physical address bits, which are completely unrelated to
>>> virtual address bits and/or addr register width. consider that the last
>>> generations of ia32 could address more than 4GB of ram (had more
>>> than 32b of physical addressability), but any process still only ever
>>> really had a 32b address space.
>>
>> More clarification.
>> Right. 40-bits are used for physical addressing and an additional
>> 8-bits are used to round out the virtual space.
>
> bits 0-11 are offset within a page. then 4x successive 9b chunks index
> into the page-translation tree. the 48th bit is sign-extended up.
> so it's not the full 64b, but well, is that a real/realistic problem?
>
>> I believe socket F extends both of these numbers by 8 to 48-bit physical
>> and 54-bit virtual. I do not think we are using all 64-bits though ...
>> even in socket F ... but you tend to be right very often Mark, so I am
>> hesitating here. ;-)
>
> no, YOU'RE right that the whole 64b is not reachable (virt or phys).
> but then again, it's hard to see why that matters: physical ram is
> basically limited to 8 sockets, 8 dimms each, and ~4GB/dimm (256G, 38b).
> and you won't be able to mmap that 256 TB file in one go, VM-wise.
> does anyone do distributed systems with pointer-swizzling any more?
>
>>>> Results are truncated to 64-bits when stored to memory, but a path
>>>
>>> they can be; they don't have to be.
>>
>> Mmm ... I did not know this. Compiler flags? What are they?
>
> just use "long double". the C standard is probably wishy-washy about this
> (permitting an implementation to use 64b), but "normal" compilers seem to
> preserve the extra bits. compiler switches and the runtime do have some
> effect on this, though. it looks like linux tends to default to enabling
> 80b (a comment in fpu_control.h claims libm requires it.)
>
> we have users who claim to need "quad precision" floats, and who prefer
> certain cpus/compilers because of quad support. I'm not sure they've ever
> actually disassembled the results to see whether they're just getting
> 80b...
>
> regards, mark hahn.
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
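
On Mark's request for a concise higher-precision demo, and his doubt about what "long double" users actually get: the following is one possible sketch in C, not something posted in the thread. It approximates sqrt(2) with Newton's iteration (chosen here instead of sqrt()/sqrtl() so nothing depends on libm), once in double and once in long double, and compares the residuals. All function names are invented for the example. `LDBL_MANT_DIG` gives a hint without disassembling anything: 64 suggests the x87 80-bit format, 113 an IEEE quad format, and 53 means long double is just double.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Newton's iteration x <- (x + 2/x) / 2 for sqrt(2), in double.
   The volatile store rounds x to a real double after every step. */
double newton_sqrt2_double(int iterations) {
    assert(iterations > 0);
    volatile double x = 1.0;
    for (int i = 0; i < iterations; i++)
        x = (x + 2.0 / x) / 2.0;
    return x;
}

/* The same iteration carried out entirely in long double. */
long double newton_sqrt2_long_double(int iterations) {
    assert(iterations > 0);
    long double x = 1.0L;
    for (int i = 0; i < iterations; i++)
        x = (x + 2.0L / x) / 2.0L;
    return x;
}

/* |x*x - 2|, evaluated in long double for both candidates. */
long double residual(long double x) {
    long double r = x * x - 2.0L;
    return r < 0 ? -r : r;
}

/* Print both approximations side by side. */
void print_sqrt2_demo(void) {
    double d = newton_sqrt2_double(8);
    long double ld = newton_sqrt2_long_double(8);
    printf("long double mantissa bits: %d\n", LDBL_MANT_DIG);
    printf("double:      %.21Lg  residual %.3Lg\n",
           (long double)d, residual(d));
    printf("long double: %.21Lg  residual %.3Lg\n", ld, residual(ld));
}
```

Eight iterations from a starting guess of 1 is more than enough for the iteration to settle at either precision. If the two residuals printed by `print_sqrt2_demo()` come out the same, that would suggest the extra bits are not actually being kept on that compiler and target.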
