[Beowulf] A cluster of Arduinos

Wed Jan 11 14:47:00 PST 2012

Jim, your microcontroller cluster is a rather good idea.

Latency didn't keep up with the CPU speeds...

Today's nodes have 12 CPU cores or so, and soon 16, which can execute
(let's take a simple integer example, my chess program and its IPC)
about 24 instructions per cycle.

So nothing SIMD, just simple integer instructions for most of it. Of
course loads, which effectively come from L1, play an overwhelming
role there.

The typical latency of a random memory read from a remote node,
even with the latest networks, is between 0.85 and 1.9 microseconds.
Let's take an optimistic 1 microsecond for an RDMA read.

So in that timeframe you can execute 24k+ instructions.
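
As a rough sketch of that arithmetic: the latency and the instructions per cycle are the figures above, while the 1 GHz clock is an assumption picked as a low bound (which is where the "+" in 24k+ comes from). The function name is made up for illustration.

```python
def instructions_per_remote_read(latency_ns, clock_ghz, ipc):
    """Instructions a node could have executed while one remote read is in flight."""
    cycles_waiting = latency_ns * clock_ghz  # ns * (cycles per ns) = cycles
    return cycles_waiting * ipc


# 1 microsecond RDMA read and 24 instructions per cycle, as quoted above.
# clock_ghz=1.0 is an assumed low bound; a faster clock only raises the result.
lost = instructions_per_remote_read(latency_ns=1000, clock_ghz=1.0, ipc=24)
print(int(lost))  # 24000
```

Plugging in a different latency, clock or IPC gives the same measure for any other node type.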

IPC on the cheapo CPUs is effectively far under 1. Around 0.25 for
most codes.

At that IPC a 70 MHz CPU executes about 1 instruction every 4 cycles.
We are working with rough measures here.

Let's call that 1/4 millisecond.

Even USB 1.1 has latencies to sticks far under 1 millisecond.

So the actual latency of today's clusters, counted in instructions,
is a factor 25k worse than this 'cluster'.

In fact your microcontroller cluster here has latencies that you do
not even have core to core within a single CPU today.

There is still too much software from the 80s and 90s out there,
written by the guys who wrote books about how to parallelize, which
simply doesn't scale at all on modern hardware.

Let me not quote too many names there, as I've done before.

They were just too lazy to throw away their old code and start over
with a new parallel concept that works on today's hardware.

If we involve GPUs now then there is going to be an even bigger
problem: the bandwidth of the network can't keep up with what a
single GPU delivers. Who is to blame for that is quite a complicated
discussion, if anyone has to be blamed at all.

We just need more clever algorithms there.