[Beowulf] SSD caching for parallel filesystems

Fri Feb 8 08:20:08 PST 2013

To add another side note: while I was interviewing the Gluster team for my podcast (www.rce-cast.com), one of them mentioned writing a plugin that would first write data locally on the host, and Gluster would then move it to the real disk in the background.  There were constraints on doing this, I assume because there was no locking to guarantee consistency, but for some workloads it might be useful, especially combined with local flash.
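
To make the idea concrete, here is a toy sketch of that write-local-first, flush-in-the-background pattern. It is not Gluster's plugin or API; the class name, the dict standing in for local flash, and the dict standing in for the real disk are all made up for illustration.

```python
import queue
import threading


class WriteBehindCache:
    """Toy write-behind cache (hypothetical, not Gluster code): writes land
    in a fast local store first, and a background thread copies them to the
    slow backing store later."""

    def __init__(self, backing_store):
        self.local = {}               # stands in for local flash
        self.backing = backing_store  # stands in for the "real disk"
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._flush_loop, daemon=True)
        self._worker.start()

    def write(self, key, data):
        # Return as soon as the local copy exists; the backing store lags.
        self.local[key] = data
        self._pending.put(key)

    def read(self, key):
        # Only this host can see unflushed data. Another host reading the
        # backing store directly could see stale data, which is the
        # consistency constraint when there is no locking.
        if key in self.local:
            return self.local[key]
        return self.backing[key]

    def _flush_loop(self):
        while True:
            key = self._pending.get()
            self.backing[key] = self.local[key]
            self._pending.task_done()

    def flush(self):
        # Block until every queued write has reached the backing store.
        self._pending.join()


disk = {}
cache = WriteBehindCache(disk)
cache.write("block0", b"hello")   # fast path: local only
cache.flush()                     # after this, disk holds block0 as well
```

Between write() and flush() the data exists only on the writing host, so this only makes sense for workloads where nobody else needs to read it in that window.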

That episode should be up Feb 24th.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
brockp at umich.edu
(734)936-1985

On Feb 8, 2013, at 11:12 AM, Ellis H. Wilson III wrote:

> On 02/06/2013 04:36 PM, Prentice Bisbal wrote:
>> Beowulfers,
>> 
>> I've been reading a lot about using SSD devices to act as caches for
>> traditional spinning disks or filesystems over a network (SAN, iSCSI,
>> SAS, etc.).  For example, Fusion-io had directCache, which works with
>> any block-based storage device (local or remote) and Dell is selling
>> LSI's CacheCade, which will act as a cache for local disks.
>> 
>> http://www.fusionio.com/data-sheets/directcache/
>> http://www.dell.com/downloads/global/products/pedge/en/perc-h700-cachecade.pdf
>> 
>> Are there any products like this that would work with parallel
>> filesystems, like Lustre or GPFS? Has anyone done any research in this
>> area? Would this even be worthwhile?
> 
> Coming late to this discussion, but I'm currently doing research in this 
> area and have a publication in submission about it.  What are you trying 
> to do specifically with it?  NAND flash, with its particularly nuanced 
> performance behavior, is not right for all applications, but can help if 
> you think through most of your workloads and cleverly architect your 
> system based off of that.
> 
> For instance, there has been some discussion about PCIe vs SATA -- this 
> is a good conversation, but what's left out is that many manufacturers 
> do not actually use native PCIe "inside" the SSD.  Data is piped out from 
> the individual NAND packages in something like a SATA format, and then 
> transcoded to PCIe before leaving the device.  This results in 
> latency and bandwidth degradation, and although a bunch of those devices 
> on Newegg and elsewhere under the PCIe category report 2 or 3 or 4 GB/s, 
> real throughput is closer to 1 GB/s or under.  Latency is still better on 
> these than on SATA-based ones, but if I just wanted bandwidth I'd buy a 
> few cheaper SATA-based ones and strap them together with RAID.
> 
> On a similar note (discussing the nuances of the setup and the 
> components), if your applications are embarrassingly parallel or your 
> network is really slow (1 Gb Ethernet), then client-side caching is 
> definitely the way to go.  But, if there are ever sync points in the 
> applications or you have a higher throughput, lower latency network 
> available to you, going for a storage-side cache will allow for 
> write-back capabilities as well as higher throughput and lower latency 
> promises than a single client-side cache could provide.  Basically, this 
> boils down to: do you want M cheaper devices in each node that don't have 
> to go over the network but that are higher latency and lower bandwidth, 
> or do you want an aggregation of N more expensive devices that can give 
> lower latency and much higher bandwidth, but over the network.
> 
> For an example on how some folks did it with storage-local caches up at 
> LBNL see the following paper:
> 
> Zheng Zhou et al. An Out-of-core Eigensolver on SSD-equipped Clusters,
> in Proc. of Cluster’12
> 
> If anybody knows of any other worthwhile papers that look at this, or other 
> projects that look particularly at client-side SSD caching, I'd love to 
> hear about them.  I think in about five years all new compute-nodes will 
> be built with some kind of non-volatile cache or storage -- it's just 
> too good of a solution, particularly with network bandwidth and latency 
> not scaling as fast as NVM properties.
> 
> Best,
> 
> ellis