[Beowulf] A petabyte of objects
Ellis H. Wilson III
ellis at cse.psu.edu
Tue Nov 13 21:17:53 PST 2012
On 11/13/12 19:00, Bill Broadley wrote:
> If you need an object store and not a file system I'd consider hadoop.
Eeek! Files of .5MB to 10MB are anathema to Hadoop. As much as I
love Hadoop, there's a tool for every job, and I'm not sure this one
quite fits those file sizes. If you had a decent chunk of larger
files (i.e., > 64MB at the very least, ideally around 1GB per file on
average), Hadoop might work.
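To put the file-size concern in numbers: a petabyte of .5MB to 10MB files
is somewhere between 100 million and 2 billion files. HDFS keeps metadata
for every file and block in the NameNode's memory no matter how small the
file is (and 64MB was the default HDFS block size at the time). The
sketch below just does the division. It assumes decimal units
(1 PB = 10^15 bytes) and a uniform file size, both simplifications; the
64MB and 1GB rows are the larger sizes mentioned above.

```python
# Rough count of files needed to hold 1 PB at a few fixed file sizes.
# Assumption: decimal units (1 PB = 10**15 bytes, 1 MB = 10**6 bytes)
# and every file the same size, which real data won't be.
PB = 10**15
MB = 10**6

def object_count(total_bytes: int, object_bytes: int) -> int:
    """Number of fixed-size objects needed to fill total_bytes."""
    return total_bytes // object_bytes

for size_mb in (0.5, 10, 64, 1000):
    n = object_count(PB, int(size_mb * MB))
    print(f"{size_mb:>6} MB files -> {n:,} files")
```

At .5MB that is 2,000,000,000 files; at 1GB it is 1,000,000. That is
three orders of magnitude less metadata for the NameNode to track.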
The specific use of the file system seems particularly relevant to this
discussion, so if you can pin down more concrete details about how your
storage will actually be used, we'll be in a better position to suggest
something.
IMHO, it's not storing that much data per year that makes this a hard
problem; it's what you want to do with it (and how fast). If you never
want to look at it again, and you're receiving that 1PB at a steady rate
over the course of the year, you'll note that this boils down to around
34MB/s. That's pretty easy for any parallel file system (or really, even
a single slow HDD, provided you moved on to the next drive once the
current one filled up). This becomes interesting if you need to handle
big bursts of writes, big bursts of reads, reads of all (or large
portions) of the data set, etc.
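The ~34MB/s figure is simple division. It comes out to 34 MiB/s if a
petabyte is taken as 2^50 bytes, or about 31.7 MB/s in decimal units.
For contrast, the sketch below also adds a purely hypothetical
requirement, re-reading the whole petabyte within one week, to show how
fast the bandwidth picture changes once large reads enter the mix. The
one-week window is an illustrative assumption, not something from the
original question.

```python
# Back-of-the-envelope bandwidth for 1 PB/year of steady ingest, plus a
# hypothetical full-scan scenario (the one-week window is an assumption).
SECONDS_PER_YEAR = 365 * 24 * 3600
SECONDS_PER_WEEK = 7 * 24 * 3600

def required_rate(total_bytes: float, seconds: float) -> float:
    """Average bytes/second needed to move total_bytes in the given time."""
    return total_bytes / seconds

PB_DECIMAL = 10**15  # 1 PB
PB_BINARY = 2**50    # 1 PiB

# Steady ingest over a year, in decimal MB/s and binary MiB/s.
steady_decimal = required_rate(PB_DECIMAL, SECONDS_PER_YEAR) / 10**6
steady_binary = required_rate(PB_BINARY, SECONDS_PER_YEAR) / 2**20

# Hypothetical: read back the whole year's data in one week (MB/s).
scan_week = required_rate(PB_DECIMAL, SECONDS_PER_WEEK) / 10**6

print(f"steady ingest: {steady_decimal:.1f} MB/s ({steady_binary:.1f} MiB/s)")
print(f"full scan in a week: {scan_week:.0f} MB/s")
```

The steady write rate stays around 32-34MB/s, while the hypothetical
weekly full scan needs roughly 1.65GB/s sustained, about 50 times more.
That gap is why the access pattern matters more than the raw capacity.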
Again, knowing what you need will help us a lot here.