more embedded memory sightings; AMD Mustang

Eugene Leitl eugene.leitl at
Mon Oct 16 11:02:49 PDT 2000

(((embedded caches migrate into the chipset (alas not yet into CPU); 
   "Mustang" SMP Athlons by start of 2001)))

As you may know by now, our exclusive disclosure of NVIDIA's chipset
plans has since been widely accepted as fact and is now relatively
common knowledge.  Another company known for its graphics chips, ATI,
is also well down the road towards executing its own chipset
ambitions.  Reportedly, ATI is incorporating technology gleaned from
a collaboration with ALi that produced the Aladdin 7 chipset, which
includes integrated ArtX graphics (ArtX is owned by ATI).

The DRAM maker Micron has discussed in the past its own interests in
developing core logic controllers.  Micron's Samurai DDR SDRAM chipset
was even distributed to several evaluators to serve as a proof of
concept project for DDR SDRAM.  Although it was rumored that Samurai
would eventually reach the market, problems licensing Intel's P6 bus
caused Micron to retreat from these ambitions.

Today, however, Micron took the wraps off a much more powerful core
logic controller.  This DDR SDRAM chipset, dubbed Mamba, is designed
for the AMD Athlon and leverages design lessons learned from the
Samurai.  In designing the Samurai, Micron noticed that 40% of its die
was unused white space.  Wasted silicon translates into wasted money,
so Micron searched for a way to more efficiently use the Samurai's
die.  The Idaho-based memory company came up with the idea of
embedding an eight megabyte L3 cache into the chipset!  Christened
"eCache," this L3 cache memory can maintain 9.6 GB/s of sustainable
bandwidth.  By fabricating eCache on the same die as the memory
controller, Micron can reduce latencies by up to 50%!
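
To get a feel for what a latency cut of that size could mean, a rough
average-access-time estimate helps.  The Python sketch below is
illustrative only: the DRAM latency and the hit rate are hypothetical
numbers picked for the example, not Micron figures.  Only the "up to
50%" latency reduction comes from the announcement.

```python
def average_latency_ns(hit_rate, cache_latency_ns, dram_latency_ns):
    """Average latency of requests that reach the chipset.

    hit_rate is the fraction of requests served from the chipset
    cache; the remainder go on to main memory.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_latency_ns + (1.0 - hit_rate) * dram_latency_ns


# Hypothetical inputs, for illustration only.
DRAM_LATENCY_NS = 100.0                    # assumed main-memory latency
ECACHE_LATENCY_NS = DRAM_LATENCY_NS * 0.5  # "up to 50%" lower, per the claim
ASSUMED_HIT_RATE = 0.6                     # assumed; depends on the workload

# A hit rate of zero models a chipset with no embedded cache at all.
baseline = average_latency_ns(0.0, ECACHE_LATENCY_NS, DRAM_LATENCY_NS)
with_ecache = average_latency_ns(
    ASSUMED_HIT_RATE, ECACHE_LATENCY_NS, DRAM_LATENCY_NS
)

print(f"no eCache:   {baseline:.1f} ns")
print(f"with eCache: {with_ecache:.1f} ns")
```

The benefit scales with the hit rate, so how much of the best-case
latency reduction a real system sees depends on how well its working
set fits in the eight megabytes.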

Obviously such a large, fast cache fabricated intimately with the
memory controller can have a profound impact on performance.  Micron
claims up to a 15% increase in real system performance, which might
even be conservative.  Micron also asserted that the added cost for
the eCache is minimal - "virtually free" are the words Micron's Dean
Klein used.  An interesting point about this chipset is that the core
logic controller is fabricated with a 0.18 micron process while the
eCache is implemented at 0.15 microns.

Micron's Mamba might very well turn out to be the standout product of
this year's MPF.

By the way, you might recall Rendition, the once leading-edge graphics
chip company known for its innovative designs.  Rendition was bought
several years ago by Micron, and though the name "Rendition" no longer
exists, the design team is intact under Micron's wing.  Much of the
embedded DRAM design from the Mamba was taken from the experimental
125 million transistor V4400e, a Verite graphics controller derivative
with 12MB of embedded cache.  It's not by accident that the chipset
and graphics controller divisions are working closely together.  With
the trends in chipsets toward integration, it is reasonable to assume
that a Mamba integrated with a Rendition graphics core is also being
explored at Micron.

We have also discovered that one other major DRAM manufacturer is
secretly developing its own core logic controller with embedded DRAM.
Unfortunately we cannot disclose who this company is right now, but we
can give you a hint: it has been involved in litigation with Rambus at
one time or another - good luck sorting through the many candidates.

Symmetric multiprocessing (SMP) systems based on AMD's Athlons will be
a quantum leap ahead for servers based on the x86 architecture.  As
the Athlon architecture is heavily based on the Alpha, its switch-like
bus design borrows strongly from its Alpha lineage as well.  Unlike
the GTL+ bus used by the P6 where bandwidth is shared among the
processors, the Athlon uses a point-to-point design to enable full
bandwidth for each processor.  For a dual-processing Athlon system,
this point-to-point bus can channel 4.2 GB/s.
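
The difference between the two bus styles can be put in numbers with
a small Python sketch.  It assumes a 64-bit (8-byte) data path, which
the article does not state, and uses the 266MHz FSB rate quoted for
the 760 MP.  The shared-bus case reuses the same link speed purely
for comparison; it is not the actual GTL+ figure.

```python
def per_cpu_bandwidth_gbs(link_gbs, n_cpus, shared):
    """Peak bandwidth available to each CPU, in GB/s."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    # On a shared bus the processors split one link between them;
    # with point-to-point links each processor has its own.
    return link_gbs / n_cpus if shared else link_gbs


def aggregate_bandwidth_gbs(link_gbs, n_cpus, shared):
    """Peak bandwidth summed over all CPUs, in GB/s."""
    return per_cpu_bandwidth_gbs(link_gbs, n_cpus, shared) * n_cpus


BUS_WIDTH_BYTES = 8        # assumption: 64-bit data path
TRANSFERS_PER_SEC = 266e6  # 266MHz FSB
LINK_GBS = BUS_WIDTH_BYTES * TRANSFERS_PER_SEC / 1e9

for shared in (True, False):
    label = "shared bus    " if shared else "point-to-point"
    per_cpu = per_cpu_bandwidth_gbs(LINK_GBS, 2, shared)
    total = aggregate_bandwidth_gbs(LINK_GBS, 2, shared)
    print(f"{label}: {per_cpu:.2f} GB/s per CPU, {total:.2f} GB/s total")
```

Under these assumptions the two-processor point-to-point total comes
out at roughly the 4.2 GB/s quoted above, while a shared bus of the
same speed would leave each processor with half a link.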

The AMD 760 MP chipset is a DDR SDRAM solution that can support two
266MHz FSB Athlons.  The chipset has advanced buffering to enable
maximum transaction concurrency.  The 760 MP also uses a sophisticated
cache coherency protocol that AMD named "MOESI."  We will discuss
this protocol in a later article.
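
In the meantime, the letters in MOESI stand for the five states a
cache line can be in: Modified, Owned, Exclusive, Shared and Invalid.
The Python sketch below is a simplified textbook model of the state
changes for one line in one cache.  It is not a description of AMD's
implementation: bus transactions, write-backs and data transfers are
left out, and the event names are made up for the example.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"   # only copy, dirty
    OWNED = "O"      # dirty, but other caches may hold shared copies
    EXCLUSIVE = "E"  # only copy, clean
    SHARED = "S"     # one of possibly several copies
    INVALID = "I"    # no valid copy


class Event(Enum):      # hypothetical event names
    LOCAL_READ = 1      # this CPU reads the line
    LOCAL_WRITE = 2     # this CPU writes the line
    REMOTE_READ = 3     # another CPU's read is snooped
    REMOTE_WRITE = 4    # another CPU's write is snooped


def next_state(state, event, others_have_copy=False):
    """Return the new state of the line in this cache."""
    if event is Event.LOCAL_WRITE:
        # Writing always ends with the only valid, dirty copy.
        return State.MODIFIED
    if event is Event.REMOTE_WRITE:
        # Another CPU takes ownership; our copy is stale.
        return State.INVALID
    if event is Event.LOCAL_READ:
        if state is State.INVALID:
            # A miss: fill as Shared or Exclusive depending on
            # whether any other cache holds the line.
            return State.SHARED if others_have_copy else State.EXCLUSIVE
        return state  # a read hit changes nothing
    # REMOTE_READ: a dirty line becomes Owned and keeps supplying
    # data, a clean exclusive line becomes Shared.
    if state is State.MODIFIED:
        return State.OWNED
    if state is State.EXCLUSIVE:
        return State.SHARED
    return state


print(next_state(State.MODIFIED, Event.REMOTE_READ).value)
```

The Owned state is what sets MOESI apart from the simpler MESI
scheme: a cache holding a modified line can share it with another
processor without first writing it back to main memory.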

The AMD 762 System Controller is the northbridge for the AMD 760 MP
chipset.  Supporting up to four gigabytes of PC2100 DDR SDRAM, this
controller has over 900 pins and will require a six layer motherboard.
AMD's goal with this product was to enable unprecedented x86
multiprocessing power using inexpensive and commonly available
motherboard fabrication techniques.  Although the use of six-layer
motherboards will drive up the costs slightly, this was a necessity to
implement the powerful point-to-point design.  Given its flexibility
and bandwidth, multiprocessing systems based on the AMD 762 look to
set new performance standards in dual SMP x86 configurations while
selling at extremely competitive price points.

On the Mustang front, an AMD representative confirmed for us that this
workstation- and server-oriented chip is on track to ship towards the
end of the year, although products might not reach the channel in
quantity until early 2001.  He also said there is little doubt that
this enhanced Athlon will reach 1.5GHz on the current 0.18 micron
process.  The representative added that although he could tell us
that the Mustang's branch prediction unit has been revamped relative
to the Thunderbird's to improve accuracy, AMD plans to wait until
next month's Comdex to take the wraps off Mustang.


More information about the Beowulf mailing list