[Beowulf] SSDs for HPC?
Lux, Jim (337C)
james.p.lux at jpl.nasa.gov
Tue Apr 8 07:12:58 PDT 2014
On 4/7/14 6:48 PM, "Ellis H. Wilson III" <ellis at cse.psu.edu> wrote:
>On 04/07/2014 09:34 PM, Prentice Bisbal wrote:
>>> Was it wear out, or some other failure mode?
>>> And if wear out, was it because consumer SSDs have lame leveling or
>>> something like that?
>> Here's how I remember it. You took the capacity of the disk, figured out
>> how much data would have to be written to it to wear it out, and then
>> divided that by the bandwidth of the drive to figure out how long it
>> would take to write that much data to the disk if data was constantly
>> being written to it. I think the answer was on the order of 5-10 years,
>> which is a bit more than the expected lifespan of a cluster, making it a
>This would be the ideal case, but requires perfect wear-leveling and
>write amplification factor of 1. Unfortunately, those properties rarely
>However, again, in the case of using it as a Hadoop intermediate disk,
>write amp would be a non-issue because you'd be blowing away data after
>runs (make sure to use a scripted trim or something, unless the FS
>auto-trims, which you may not want), and wear-leveling would be less
>important because the data written/read would be large and highly
>sequential. Wear-leveling would be trivial under those conditions.
Wear leveling would be trivial, if one were designing the wear leveling
I could easily see a consumer device having a different algorithm from an
enterprise device, either because they just spend more time and money
getting a good algorithm, or because of different underlying assumptions
about write/read patterns.
Even in an enterprise environment, there are some very different write
patterns possible. A "scratch" device might get written randomly, while a
"logging" device will tend to be written sequentially. Consider something
like a credit card processing system. This is going to have a lot of "add
at the end" transaction data. As opposed to, say, a library catalog where
books are checked out essentially at random, and you update the "check
out/check in" status, and writes are sprinkled randomly throughout the
Sadly, much of this will not be particularly well documented, if at all.
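
For what it's worth, the back-of-the-envelope estimate quoted above (capacity
times endurance, divided by write bandwidth), with the write amplification
and wear-leveling caveats folded in, can be sketched roughly as below. This
is only an illustration: the function name, the parameter names, and the
numbers in the example are my assumptions, not figures from any datasheet,
and "leveling_efficiency" is a made-up knob standing in for however far a
real controller falls short of perfect wear leveling.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def ssd_lifetime_years(capacity_gb, pe_cycles, write_mb_per_s,
                       write_amp=1.0, leveling_efficiency=1.0):
    """Estimate years until wear-out if the drive is written continuously.

    capacity_gb         -- drive capacity in GB (decimal, 1 GB = 1000 MB)
    pe_cycles           -- assumed program/erase cycles each cell survives
    write_mb_per_s      -- sustained host write rate in MB/s
    write_amp           -- write amplification factor (1.0 is the ideal case)
    leveling_efficiency -- hypothetical fraction (0..1] of the total
                           endurance that the wear leveling actually
                           lets you use (1.0 is perfect leveling)
    """
    if write_mb_per_s <= 0 or pe_cycles <= 0 or capacity_gb <= 0:
        raise ValueError("capacity, endurance and write rate must be positive")
    if write_amp < 1.0:
        raise ValueError("write amplification cannot be below 1")
    if not 0.0 < leveling_efficiency <= 1.0:
        raise ValueError("leveling_efficiency must be in (0, 1]")

    # Total data the flash can absorb before wearing out, in MB.
    total_flash_writes_mb = capacity_gb * 1000.0 * pe_cycles * leveling_efficiency
    # Each MB written by the host costs write_amp MB of flash writes.
    flash_mb_per_s = write_mb_per_s * write_amp
    return total_flash_writes_mb / flash_mb_per_s / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Illustrative numbers only: 200 GB drive, 100,000 P/E cycles, 200 MB/s.
    ideal = ssd_lifetime_years(200, 100_000, 200)
    amplified = ssd_lifetime_years(200, 100_000, 200, write_amp=2.0)
    print(f"ideal case:      {ideal:.2f} years")
    print(f"write amp of 2:  {amplified:.2f} years")
```

The answer scales linearly with the endurance figure and inversely with the
write amplification factor, so the result is only as good as those two
numbers, and those are exactly the ones that tend not to be documented.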