[Beowulf] distributing storage amongst compute nodes

Mon Oct 22 08:17:37 PDT 2007

>> [...] commodity disks are plenty reliable
>> and are not a significant source of uptime problems.
>
> C|N>K (i.e. coffee piped through nose into keyboard)

sorry!

> That's not quite a general truth. 8^)

I mean that in the experience of my organization,
the mundane maxtor and seagate disks that we get 
with our mostly HP hardware are extremely reliable.
surprisingly so - certainly we were expecting worse,
based on the published studies.

we have ~20 clusters online totalling >8k cores.
most nodes (2-4 cores/node) have 2 sata disks, which 
have had a very low failure rate (probably < 1% afr over 
2-3 years of service).  in addition, we have four 70TB
storage clusters built from arrays of 9+2 raids of 
commodity 250G sata disks, as well as a 200TB cluster
(10+2x500G disks iirc).  failure rate of these disks
has been quite low as well (I'm guessing actually lower
than the in-node disks, even though the storage-cluster 
disks are much more heavily used.)
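
to put the in-node numbers in perspective, here is a rough sketch of how many disks and failures per year they imply.  the inputs are the round figures above; treating ">8k cores" as exactly 8000, assuming every node has the same shape, and using 1% as the afr are simplifications, so the results are ceilings rather than measurements.

```python
# rough count of in-node disks and the failures/year implied by a 1% afr.
# all inputs are round numbers from the post; "probably < 1%" is an upper
# bound, so the failure counts below are ceilings, not measured values.

TOTAL_CORES = 8000      # ">8k cores", taken as 8000 (assumption)
DISKS_PER_NODE = 2      # "most nodes ... have 2 sata disks"
AFR = 0.01              # annualized failure rate, upper bound


def in_node_disks(total_cores, cores_per_node, disks_per_node=DISKS_PER_NODE):
    """Number of in-node disks, assuming every node has the same shape."""
    nodes = total_cores // cores_per_node
    return nodes * disks_per_node


def failures_per_year(disks, afr=AFR):
    """Expected disk failures per year at the given afr."""
    return disks * afr


if __name__ == "__main__":
    for cores_per_node in (2, 4):
        disks = in_node_disks(TOTAL_CORES, cores_per_node)
        print(cores_per_node, "cores/node:", disks, "disks,",
              failures_per_year(disks), "failures/year at most")
```

with 2-4 cores/node that is somewhere between 4000 and 8000 disks, so a 1% afr would mean no more than roughly 40-80 dead disks a year across all clusters.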

here's my handwaving explanation of this: in-node disks are 
hardly used, since they're just the OS, and nodes spend most
of their time running apps.  disks in the storage clusters 
are more heavily used, but even for a large cluster, we 
simply do not generate enough load.  (I'm not embarrassed by 
that - remember cheap disks sustain 50 MB/s these days, so 
if you have a 70 TB Lustre filesystem, you'd have to sustain
more than 10 GB/s to actually keep the disks busy.  in other words,
bigger storage is generally less active...)
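
the arithmetic behind that figure can be sketched as follows.  disk size and per-disk rate are the ones quoted above; decimal units (1 TB = 1000 GB), ignoring parity disks, and assuming every disk streams at its full rate are simplifications, so the result is a lower bound on the load needed.

```python
# aggregate streaming rate needed to keep every disk in a filesystem busy.
# parity disks are ignored, so the disk count (and the rate) is a lower bound.

DISK_MB_PER_S = 50      # "cheap disks sustain 50 MB/s these days"


def min_disks(capacity_tb, disk_gb):
    """Fewest disks that can hold capacity_tb (ceiling division, no parity)."""
    return -(-capacity_tb * 1000 // disk_gb)


def saturating_gb_per_s(disks, mb_per_s=DISK_MB_PER_S):
    """Aggregate GB/s at which all disks stream at their sustained rate."""
    return disks * mb_per_s / 1000


if __name__ == "__main__":
    disks = min_disks(70, 250)                  # 70 TB of 250G disks
    print(disks, saturating_gb_per_s(disks))    # prints: 280 14.0
```

so even counting only data disks, a 70 TB filesystem of 250G disks needs about 14 GB/s of sustained traffic before the disks are the busy part, which is consistent with the "more than 10 GB/s" estimate.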

>> but maybe it makes sense not to fight the tide of disturbingly cheap
>> and dense storage.  even a normal 1U cluster node could often be configured
>> with several TB of local storage.  the question is: how to make use of it?
>
> Some people are running dCache pools on their cluster nodes.

that's cool to know.  how do users like it?  performance comments?

thanks, mark hahn.