[Beowulf] SSD prices - q: how many writes/erases???
Lux, James P
james.p.lux at jpl.nasa.gov
Thu Dec 18 06:39:16 PST 2008
>> If they are cherry picked, then it's not really a turning point
>> in SSDs. It also bothers me that a normal production run can
>> have parts with rewrites at 100K and 1M. Sounds like there
>> are some variabilities that can't be controlled.
> Similar things happen with RAM.
And in CPUs with respect to speed grades, for instance.
It's not inherently a bad thing, as long as the selection process is well
understood by the manufacturer (e.g. they obviously can't do a long-term
life test on them, so they're using some other non-destructive parameter or
process indicator that correlates with life).
>> Don't forget that 100K rewrites are SLC products and MLC's
>> are general 10K. Although the Intel drives state 100K with
>> MLC (and pretty decent performance). I'm not sure of the
> Hmmm ...
> I can't find the Intel P/E cycle, but the Ridata units are 2x10^6 (2E+6).
Is that the underlying device wearout life, or is it the apparent life at
the external interface of the "integrated unit"? For instance, they might
have a wear leveler and some smart EDAC inside an ASIC that provides the
interface, and have just added extra capacity to account for the life.
After all, it's not like at N cycles, the device stops working. It just
starts working "less well" and throwing more errors, and I'll guess (since I
don't have the data here in front of me) that there's a fair amount of
variability, even within a single device.
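To put rough numbers on the "underlying vs. apparent life" question, here is
a minimal sketch. It assumes an idealized wear leveler that spreads erases
perfectly evenly over every physical sector, which no real controller will
quite manage. The function names and the example figures are made up for
illustration; none of them come from a datasheet.

```python
def apparent_cycles(raw_cycles, physical_sectors, logical_sectors):
    """Erase cycles each logical sector appears to survive at the interface.

    Assumes ideal wear leveling: the device's total erase budget
    (raw_cycles * physical_sectors) is shared evenly by the logical sectors.
    """
    return raw_cycles * physical_sectors / logical_sectors


def hot_sector_cycles(raw_cycles, physical_sectors):
    """Rewrites of ONE logical sector before the whole device is worn out.

    Under the same ideal-leveling assumption, hammering a single logical
    address is spread across every physical sector.
    """
    return raw_cycles * physical_sectors


if __name__ == "__main__":
    # Hypothetical part: 100K raw cycles, 10% spare capacity.
    print(apparent_cycles(100_000, physical_sectors=110, logical_sectors=100))
    print(hot_sector_cycles(100_000, physical_sectors=110))
```

The second function is the reason a spec quoted at the interface can look
far better than the underlying cell endurance: rewriting one logical address
over and over tells you about the leveler, not about any one cell.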
Consider the testing needed to exhaustively verify the 2E6 number: 16GB of
512 byte sectors is about 32E6 sectors. They don't give an "erase
time" spec, but let's just say 1 millisecond to make things easy. So, to do
one erase on ALL sectors takes about 32,000 seconds, or roughly 9 hours.
Repeat that 2E6 times and, in a mere 700,000 or so days (about 2000 years),
one could actually verify the erase life.
One can beat a single sector to death in 2,000 seconds, or about half an
hour. But, is a single sector a valid test? Nope. You KNOW the EDAC is going
to get in the way, not to mention that a single sector test doesn't address
the variability across the device. You'd probably want to sample, oh, 100
or 1000 or so sectors of the 32 million, to get a reasonable statistical
estimate. Done one after another, now you're back in the days and weeks of
testing regime (1000 sectors at 2,000 seconds each is over 500 hours, or
about 3 weeks).
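For anyone who wants to rerun this kind of estimate with different inputs,
here is a small sketch. It takes 16 GB as decimal gigabytes and uses the
guessed 1 ms erase time from above; the function name and the dictionary
keys are just illustrative, and sampled sectors are assumed to be tested one
after another rather than in parallel.

```python
def erase_test_times(capacity_bytes=16e9, sector_bytes=512,
                     erase_seconds=1e-3, rated_cycles=2e6,
                     sample_sectors=1000):
    """Back-of-envelope durations for erase-endurance testing."""
    sectors = capacity_bytes / sector_bytes
    one_pass_seconds = sectors * erase_seconds          # erase every sector once
    one_sector_seconds = rated_cycles * erase_seconds   # wear out a single sector
    return {
        "sectors": sectors,
        "one_pass_hours": one_pass_seconds / 3600,
        "full_device_days": one_pass_seconds * rated_cycles / 86400,
        "one_sector_seconds": one_sector_seconds,
        "sample_hours": sample_sectors * one_sector_seconds / 3600,
    }


if __name__ == "__main__":
    for name, value in erase_test_times().items():
        print(f"{name}: {value:,.1f}")
```

Changing erase_seconds or sample_sectors shows how quickly the test time
moves; the 1 ms figure is only a convenient guess, so treat the outputs as
order-of-magnitude numbers.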
You could run accelerated life tests (very common for other electronics),
where you run it hot. BUT... That's where the wear out model vs temperature
becomes important, and I don't know that Flash is sufficiently well
understood for that. Sure, for 2N2222 silicon junction transistors,
accelerated life testing works, or even for a ?ium CPU.. But for a device
where the basic mode of operation is tied to leakage currents and charge