[Beowulf] [tt] One million ARM chips challenge Intel bumblebee
prentice at ias.edu
Thu Jul 7 10:17:30 PDT 2011
On 07/07/2011 12:31 PM, Lux, Jim (337C) wrote:
>> On 07/07/2011 10:13 AM, Eugen Leitl wrote:
>>> One million ARM chips challenge Intel bumblebee
>> Now say it like Dr. Evil: one MILLION processors.
>> How long is it going to take to wire them all up? And how fast are they
>> going to fail? If each processor has an MTBF of one million hours, that's
>> still one failure per hour, on average.
> But this presents a very interesting design challenge. When you get to this sort of scale, you have to assume that at any time, some of them are going to be dead or dying, just like Google's massively parallel database engines.
> It's all about ultimate scalability. Anybody with moderate competence (certainly anyone on this list) could devise a scheme to use 1000 perfect processors that never fail to do 1000 quanta of work in unit time. It's substantially more challenging to devise a scheme to do 1000 quanta of work in unit time on, say, 1500 processors with a 20% failure rate. Or even in 1.2*unit time.
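To put rough numbers on the 1500-processor example above, here is a minimal sketch. It assumes "20% failure rate" means each processor independently fails with probability 0.2 during the unit of time, which is my reading and not something stated in the thread. The function name prob_enough_survive is made up for illustration. It only counts survivors; it says nothing about the harder part, which is detecting the failures and redistributing the work.

```python
from fractions import Fraction
from math import comb


def prob_enough_survive(total, needed, p_fail):
    """Probability that at least `needed` of `total` processors survive.

    Assumption: each processor fails independently with probability
    `p_fail` (passed as a Fraction) during the unit of time, so the
    number of survivors follows a binomial distribution.
    """
    p_ok = 1 - p_fail
    # Exact rational arithmetic avoids float overflow in comb(total, k).
    return float(sum(
        comb(total, k) * p_ok**k * p_fail**(total - k)
        for k in range(needed, total + 1)
    ))


if __name__ == "__main__":
    fail = Fraction(1, 5)  # "20% failure rate", read as a per-processor probability
    # 1500 processors: 1200 survivors expected, a wide margin over the 1000 needed.
    print(prob_enough_survive(1500, 1000, fail))
    # 1250 processors: exactly 1000 survivors expected, so no margin at all.
    print(prob_enough_survive(1250, 1000, fail))
```

Under that assumption, 1500 processors leave an expected 1200 survivors, so the raw count is comfortably enough. Provisioning only 1250 gives an expected 1000 survivors, which is roughly a coin flip.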
Just to be clear - I wasn't saying this was a bad idea. Scaling up to
this size seems inevitable. I was just imagining the team of admins who
would have to be working non-stop to replace dead processors!
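For a feel of the replacement workload, here is a back-of-the-envelope sketch of the one-failure-per-hour estimate quoted above. It assumes failures are independent and occur at a constant rate of 1/MTBF per processor, which ignores infant mortality and wear-out. The function names are mine, picked for illustration.

```python
from math import exp


def expected_failures(num_processors, mtbf_hours, interval_hours):
    """Expected number of failures across the machine in the interval.

    Assumption: independent processors, each failing at a constant
    rate of 1/mtbf_hours.
    """
    return num_processors * interval_hours / mtbf_hours


def prob_any_failure(num_processors, mtbf_hours, interval_hours):
    """Probability of at least one failure in the interval (Poisson model)."""
    return 1 - exp(-expected_failures(num_processors, mtbf_hours, interval_hours))


if __name__ == "__main__":
    n, mtbf = 1_000_000, 1_000_000  # one million processors, one million hour MTBF
    print(expected_failures(n, mtbf, 1))    # failures per hour
    print(expected_failures(n, mtbf, 24))   # failures per day
    print(prob_any_failure(n, mtbf, 1))     # chance that a given hour sees a failure
```

With those numbers the model gives one expected failure per hour and 24 per day, so "non-stop" is not much of an exaggeration.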
I wonder what the architecture for this system will be like. I imagine
it will be built around small multi-socket blades that are hot-swappable
to handle this.