[Beowulf] recommendations for cluster upgrades
Mark Hahn hahn at mcmaster.ca
Wed May 13 22:28:03 PDT 2009
>> AMD Barcelona was the first 4 flops per cycle processor from AMD, and it hit
>> the street with some problems right when the list was coming out in end of
>> 2007.
>
> That's interesting. What kind of "problems"?

Barcelona had a bug in its L3 TLB logic. you can read all about it through google; as mentioned, it was mostly 2h07. there were workarounds for this, but they cost a bit of performance. I think I read that amd ultimately called it a timing issue.

bugs of this sort are pretty common, though perhaps usually smaller. both amd and intel provide pretty decent errata documents. typically the bug descriptions are not all that illuminating, but they do specify which steppings have them, and even say whether a fix is planned...

> Do CPU designers mess up and leave bugs on too?

it would be fascinating to hear how the bug escaped pre-release testing. unquestionably, it affected the amd/intel balance of power... I think that if amd had managed to bring out a bugless barcelona in mid-late 07, it would have put a serious crimp in intel's core2 sales. especially if they had managed, early, to get a firm grip on pc2/6400. not to mention 45nm.

> I heard of an old Intel floating point error
> but nothing else. Do later versions of CPUs get these bugfixes?

sure. minor revisions are called "steppings", and they can include fairly significant if incremental improvements.

> It might change my perspective on the risks of going for a "brand new" CPU.

they're hardly ever really brand new. core2 was a huge change for intel, but you can see that it was clearly derived from the PIII->PM->core1 family with some new features and lessons from the P4/netburst. similarly, nehalem cores are pretty similar to current core2 cores (but not dual-die, with smaller L3). the uncore is the big change (mem controller, QPI).

it's pretty hard to second-guess the chip vendors in trying to figure out whether a chip is worth the risk. for instance, Intel's been demoing versions of nehalem since fall 07, so there's been lots of testing. vendors are still too closed-kimono for my taste, but they take it seriously. for instance, there have been issues when the vendor replaces chips on their dime. for the barcelona thing, it was pretty easy for amd to point at low-overhead kernel workarounds to avoid this...
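since errata documents list the affected steppings, one way to act on them is to compare each node's CPU signature against that list. what follows is only a minimal sketch, assuming x86 Linux nodes where /proc/cpuinfo exposes the `vendor_id`, `cpu family`, `model` and `stepping` fields. the helper names, the sample text and the `AFFECTED` signature are all hypothetical placeholders for illustration; real values have to come from the vendor's revision guide.

```python
def parse_cpuinfo(text):
    """Return one dict of fields per logical CPU from /proc/cpuinfo-style text."""
    cpus = []
    # Each logical CPU is a block of "key : value" lines; blocks are
    # separated by a blank line.
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if fields:
            cpus.append(fields)
    return cpus


def cpu_signature(fields):
    """Build a (vendor, family, model, stepping) tuple (x86 field names assumed)."""
    return (
        fields.get("vendor_id", ""),
        int(fields["cpu family"]),
        int(fields["model"]),
        int(fields["stepping"]),
    )


def affected_cpus(text, affected):
    """Return indices of logical CPUs whose signature is in the `affected` set."""
    return [
        index
        for index, fields in enumerate(parse_cpuinfo(text))
        if cpu_signature(fields) in affected
    ]


# Hypothetical sample: two logical CPUs that differ only in stepping.
SAMPLE = (
    "processor\t: 0\nvendor_id\t: AuthenticAMD\ncpu family\t: 16\n"
    "model\t\t: 2\nstepping\t: 2\n"
    "\n"
    "processor\t: 1\nvendor_id\t: AuthenticAMD\ncpu family\t: 16\n"
    "model\t\t: 2\nstepping\t: 3\n"
)

# Placeholder signature, NOT taken from any erratum document.
AFFECTED = {("AuthenticAMD", 16, 2, 2)}

if __name__ == "__main__":
    print(affected_cpus(SAMPLE, AFFECTED))
```

on a real node the text would come from reading /proc/cpuinfo rather than from `SAMPLE`. keeping the affected list as plain data makes it easy to update when a new stepping ships.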
