[Beowulf] Pretty Big Data

Mon Jan 25 06:41:53 PST 2016

And Christopher Samuel writes:
> The rest of us will carry on as before I suspect...

Using libraries that hide the (sometimes proprietary) API behind sufficient POSIX semantics...  Pretty much what the linked article says.  The "new" architecture is just the old architecture with the fastest components at the relevant data sizes.  The host CPU is faster than the storage CPU (in general), so move FS logic there when possible.  Limit metadata scaling needs by splitting the storage.  eh.
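
To make the "splitting the storage" point concrete, here is a minimal sketch, not anything from the linked article.  It assumes the namespace is partitioned by top-level directory and that each partition gets its own metadata service; the function name shard_for and the shard count are made up for illustration.

```python
import hashlib
from collections import Counter


def shard_for(path: str, num_shards: int) -> int:
    """Pick a metadata shard from the path's top-level directory.

    Hypothetical scheme: everything under one top-level directory
    lands on the same shard, so no single metadata service has to
    scale with the whole namespace.
    """
    top = path.strip("/").split("/", 1)[0]
    digest = hashlib.sha256(top.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Illustrative load: 1000 made-up project directories over 4 shards.
paths = [f"/project{i}/run/output.dat" for i in range(1000)]
load = Counter(shard_for(p, 4) for p in paths)
```

Each shard then sees only its slice of the metadata traffic, at the cost of making renames across top-level directories a cross-shard operation.  That is one of the POSIX semantics that quietly goes away.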

But I suspect the point is that people think they're willing to give up the POSIX semantics they cannot even specify (and often already have given up) to say they're using faster hardware.  Kinda like computational accelerators.  Those started with lighter semantics: single user, no double precision, no atomics, no...

The big, open question that's terribly difficult to address in the research space:  How do you efficiently mix multiple massive storage allocations that need high performance for a three-to-five-year funding period and then archival storage afterward?  I suspect much commercial data has a similar time horizon for immediate usefulness.  Health care data is interestingly different.  CPU allocations are relatively short, so inefficiencies from splitting that usage are relatively short-lived.  Storage lasts longer.