[Beowulf] SSDs for HPC?

Lux, Jim (337C) james.p.lux at jpl.nasa.gov
Tue Apr 8 07:12:58 PDT 2014



On 4/7/14 6:48 PM, "Ellis H. Wilson III" <ellis at cse.psu.edu> wrote:

>On 04/07/2014 09:34 PM, Prentice Bisbal wrote:
>>> Was it wear out, or some other failure mode?
>>>
>>> And if wear out, was it because consumer SSDs have lame leveling or
>>> something like that?
>>>
>> Here's how I remember it. You took the capacity of the disk, figured out
>> how much data would have to be written to it to wear it out, and then
>> divided that by the bandwidth of the drive to figure out how long it
>> would take to write that much data to the disk if data was constantly
>> being written to it. I think the answer was on the order of 5-10 years,
>> which is a bit more than the expected lifespan of a cluster, making it a
>> non-issue.
>
>This would be the ideal case, but it requires perfect wear-leveling and a
>write amplification factor of 1.  Unfortunately, those properties rarely
>hold.
>
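That back-of-envelope estimate is easy to sketch. The figures below are
illustrative assumptions (roughly SLC-era endurance, a mid-range write
speed), not specs for any particular drive:

```python
# Back-of-envelope SSD wear-out estimate: total writable bytes divided
# by sustained write bandwidth. All figures are illustrative assumptions.
capacity_bytes = 512 * 10**9    # 512 GB drive (assumed)
pe_cycles = 100_000             # P/E cycles per cell, SLC-era (assumed)
write_bw = 250 * 10**6          # 250 MB/s sustained writes (assumed)
waf = 1.0                       # ideal write amplification factor

total_writable = capacity_bytes * pe_cycles / waf
years = total_writable / write_bw / (365 * 24 * 3600)
print(f"~{years:.1f} years of continuous writing")  # ~6.5 years
```

With these assumed numbers you land in the quoted 5-10 year range; a
write amplification factor above 1 or imperfect leveling shrinks the
result proportionally, which is exactly the caveat about those
properties rarely holding.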
>However, again, in the case of using it as a Hadoop intermediate disk,
>write amp would be a non-issue because you'd be blowing away data after
>runs (make sure to use a scripted trim or something, unless the FS
>auto-trims, which you may not want), and wear-leveling would be less
>important because the data written/read would be large and highly
>sequential.  Wear-leveling would be trivial under those conditions.
>
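A "scripted trim" along those lines might look like the sketch below.
It assumes Linux with util-linux's fstrim available, and the scratch
filesystem location and mountpoint are hypothetical:

```python
# Hypothetical post-run cleanup for a scratch SSD holding Hadoop
# intermediate data: delete the run's files, then TRIM the freed blocks
# so the drive's garbage collector can treat them as empty.
# Assumes Linux + util-linux fstrim; paths below are illustrative.
import shutil
import subprocess

def trim_cmd(mountpoint):
    # fstrim discards unused filesystem blocks on demand, an alternative
    # to mounting with the 'discard' option if auto-trim is unwanted
    return ["fstrim", "-v", mountpoint]

def clean_and_trim(scratch_dir="/scratch/hadoop-tmp", mountpoint="/scratch"):
    shutil.rmtree(scratch_dir, ignore_errors=True)  # blow away run data
    subprocess.run(trim_cmd(mountpoint), check=True)
```

Running this between jobs keeps the drive's pool of pre-erased blocks
full without relying on filesystem auto-trim behavior you may not want.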

Wear leveling would be trivial, yes, if one were designing the
wear-leveling algorithms oneself.

I could easily see a consumer device having a different algorithm from an
enterprise device, either because they just spend more time and money
getting a good algorithm, or because of different underlying assumptions
about write/read patterns.

Even in an enterprise environment, there are some very different write
patterns possible.  A "scratch" device might get written randomly, while a
"logging" device will tend to be written sequentially.  Consider something
like a credit card processing system.  This is going to have a lot of "add
at the end" transaction data.  As opposed to, say, a library catalog, where
books are checked out essentially at random, you update the "check
out/check in" status, and writes are sprinkled randomly throughout the
data.
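The two patterns can be illustrated with an ordinary file standing in
for the device; the record size and counts are arbitrary illustrative
values:

```python
# Toy sketch of the two access patterns: append-only "logging" writes
# vs. random in-place "catalog" updates, using a temp file as the device.
import os
import random
import tempfile

RECORD = 512  # bytes per record (arbitrary)

def append_log(f, n):
    # "credit card" style: every write lands at the current end of file
    for _ in range(n):
        f.write(b"x" * RECORD)

def random_update(f, n, nrecords):
    # "library catalog" style: seek to a random record and overwrite it
    for _ in range(n):
        f.seek(random.randrange(nrecords) * RECORD)
        f.write(b"y" * RECORD)

with tempfile.TemporaryFile() as f:
    append_log(f, 1000)            # file grows strictly sequentially
    random_update(f, 100, 1000)    # writes sprinkled through the data
    f.seek(0, os.SEEK_END)
    print(f.tell())                # 512000: updates overwrite, don't grow
```

On flash, the sequential pattern maps cleanly onto whole erase blocks,
while the random pattern forces the FTL to remap pages and eventually
garbage-collect partially valid blocks, so the two place very different
demands on the wear-leveling algorithm.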


Sadly, much of this will not be particularly well documented, if at all.


More information about the Beowulf mailing list