[Beowulf] CephFS

Wed Apr 15 06:56:31 PDT 2015

On 15 Apr 2015, at 06:50, Mark Hahn <hahn at mcmaster.ca> wrote:

>> In an environment that needs to adapt to evolving user needs, trading some
>> performance for the flexibility that Ceph offers does not seem like a bad
>> deal.
> 
> it would be appreciated if you could be a bit more specific.  what kind of performance, what kind of flexibility?
> 
> thanks, mark hahn.

Sure! 

To give some background, we have two types of environments that differ in funding granularity and customer base:

1. HPC environment: 
We get a big chunk of funding every few years that has to be invested within a limited time. The need is for fast parallel storage, so we buy big, enterprise-class storage boxes and run Lustre on them. The system and SLA remain fairly static for several years, and growth is fairly predictable.

2. Cloud environment: 
Ongoing streams of small-to-medium funding from various customers. Some of these customers can be sold services, while others need to show an investment to the organization granting their research funding. The required balance of price, performance, resilience and capacity may differ from customer to customer. Growth is unpredictable.

For the first case the Lustre model works fine, but for the second it is somewhat constraining. There we need to be able to grow compute and storage capacity smoothly even when the funding is fine-grained, while keeping the architecture simple. The workload profiles and resiliency requirements of future workloads are also not yet completely clear.

With Ceph we can scale storage in much the same way that we scale compute: we can add more nodes to make it grow in a fairly linear fashion and with fine granularity. We can also adjust resiliency parameters in software instead of having a large part of the resiliency fixed in the hardware design.
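
As a rough back-of-the-envelope illustration of that trade-off, here is a small Python sketch. It assumes the resiliency parameter in question is either a replica count or an erasure-coding k+m profile, and it ignores filesystem overhead, near-full headroom and uneven data distribution. The function names and the node sizes are made up for the example; they are not Ceph APIs or figures from our systems.

```python
def usable_tb_replicated(nodes, tb_per_node, replicas=3):
    """Approximate usable capacity when every object is stored `replicas` times."""
    if nodes < 0 or tb_per_node < 0 or replicas < 1:
        raise ValueError("nodes and tb_per_node must be >= 0, replicas >= 1")
    return nodes * tb_per_node / replicas


def usable_tb_erasure_coded(nodes, tb_per_node, k, m):
    """Approximate usable capacity with k data chunks and m coding chunks."""
    if nodes < 0 or tb_per_node < 0 or k < 1 or m < 0:
        raise ValueError("nodes and tb_per_node must be >= 0, k >= 1, m >= 0")
    return nodes * tb_per_node * k / (k + m)


if __name__ == "__main__":
    # Hypothetical cluster: 30 TB of raw disk per storage node.
    for nodes in (10, 11, 12):
        print(nodes,
              usable_tb_replicated(nodes, 30, replicas=3),
              usable_tb_erasure_coded(nodes, 30, k=4, m=2))
```

Under these assumptions each added node contributes the same increment of usable space, and changing the replica count or the k+m profile changes the capacity/resiliency balance without touching the hardware.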

I don’t see Lustre going away anytime soon, at least in our environments, and we have not yet done any real apples-to-apples performance comparisons. We’re not targeting huge scalability or performance at first; basically, anything that is better than NFS is good enough to start with.

It will also be interesting to see how the resiliency compares. Having experienced multiple generations of expensive “invincible” arrays with issues that baffle us (and often the vendors) time after time, something built on cheaper but more decoupled hardware might turn out to be better.

O-P