FYI: superlinear speedups in GROMACS (fwd)
Eugene Leitl Eugene.Leitl at lrz.uni-muenchen.de
Sat Mar 9 01:29:04 PST 2002
- Previous message: FYI: superlinear speedups in GROMACS (fwd)
- Next message: installation help needed ...newbie
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Fri, 8 Mar 2002, W Bauske wrote:

> That statement makes me curious. Do you mean embedded memory on chip or
> what? If it's on chip, how is it any better than cache? If not on chip,
> elaborate please on what you're describing.

This is off-topic, but with on-die memory, even DRAM has cache characteristics, without the overhead. The idea is to put the CPU into your memory, not to bring memory where your CPU is. There would be no off-die memory, ideally.

The CPU would have to be stripped down and modified (e.g. segmented) to profit from symmetries available on the die (e.g., the ability to directly address and manipulate kBit words, and do SIMD on very long registers). You'd have to interconnect the dies with a fast serial bus, running a packet-switched protocol in hardware. Given short distances and small geometries (in extremis on-wafer), you could achieve message latencies similar to the time it currently takes to address a word in memory.

Infineon is doing R&D into embedded memory processors (at least, according to the job offers they used to post before they were slashed by the current hardware slump), but I think currently only IBM attempts to build a high-performance architecture around them (Blue Gene).

However, embedded memory processors are intrinsically unsuitable for good float performance, as the whole CPU plus router would have to fit into silicon resources currently occupied by a single float ALU. But they're very effective for parallel operations on arrays of short to long integers and sequence operations (bioinformatics comes to mind, also cryptography, lattice gas stuff, embarrassingly parallel stuff, simulation, etc).

The big problem with purely embedded processors is that the memory grain size is a few MBytes, for yield reasons. Pure Linux has too much redundancy for this, but of course you could use L4/Fiasco-like nanokernels on such architectures, adding a Linux wrapper where necessary.
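
To make the remark about kBit words and SIMD on very long registers a bit more concrete, here is a rough software sketch of lane-wise integer addition inside one wide word (the usual "SIMD within a register" trick). It only models the idea: a Python integer stands in for the wide register, and the 1024-bit word width, the 16-bit lane width, and the names `pack`, `unpack` and `swar_add` are all assumptions made for illustration, not anything taken from a real embedded-memory design.

```python
# Hypothetical model: a Python int stands in for one very wide on-die register.
WORD_BITS = 1024                 # assumed register width (a "kBit word")
LANE_BITS = 16                   # assumed lane width (short integers)
LANES = WORD_BITS // LANE_BITS   # 64 lanes per word

LANE_MASK = (1 << LANE_BITS) - 1
# Mask selecting the top bit of every lane, and its complement within the word.
HIGH = sum(1 << (i * LANE_BITS + LANE_BITS - 1) for i in range(LANES))
LOW = ((1 << WORD_BITS) - 1) ^ HIGH


def pack(values):
    """Pack up to LANES small integers into one wide word, lane 0 lowest."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a wide word back into its LANES lane values."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def swar_add(a, b):
    """Add all lanes at once, modulo 2**LANE_BITS, with no carry between lanes.

    The low bits of each lane are added with the top bit cleared, so a carry
    can reach at most the lane's own top bit; the top bits are then combined
    separately with XOR.
    """
    return ((a & LOW) + (b & LOW)) ^ ((a ^ b) & HIGH)
```

One wide addition, two ANDs and two XORs replace 64 separate short additions here; whether real hardware of this kind would expose anything similar is, again, speculation.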

> I'd like to see a P4 with a GB or so of memory all on the chip. Would
> make an interesting node for what I do.

Think rather 100 nodes with 32 MBytes each, in a desktop box. The reason this is not being done is that it's a high-risk venture, as there will be very little software for it, unless it behaves as a vanilla cluster (even so, the mainstream hasn't discovered clustering yet).
More information about the Beowulf mailing list
