[Beowulf] Pretty Big Data
Lux, Jim (337C)
james.p.lux at jpl.nasa.gov
Mon Jan 25 13:42:29 PST 2016
SSDs have substantially higher latency/access time than DRAM/SRAM (microseconds vs. nanoseconds). It's true that if you're doing sequential or well-structured reads you might be able to get data off them faster, but the same is true of spinning drives.
And if you do *any* writes (or erases, which are even slower), then your SSD/Flash memory is really going to have a big throughput hit.
I think the original thought was that for some set of problems, a brute-force keep-it-all-in-RAM approach is as good as, if not better than, anything more sophisticated, once you start to factor in development/test/etc. costs.
As hardware evolves, though, a lot of the "more sophisticated" will get buried in the hardware. Very few people explicitly manage the wear leveling and error detection and correction in Flash memory, for example. They leave it to some ASIC that does it for you (at a substantial cost in power, as it happens). Just as spinning drives these days provide an abstracted interface, so you're not managing blocks, tracks, and cylinders.
What's interesting is that although people have built them over the years, there doesn't seem to be a persistent demand for "smart file storage" that is hardware based (e.g. something that would do indexed or hashed file data retrieval).
From: Perry E. Metzger [mailto:perry at piermont.com]
Sent: Monday, January 25, 2016 8:21 AM
To: Lux, Jim (337C) <james.p.lux at jpl.nasa.gov>
Cc: Beowulf Mailing List <beowulf at beowulf.org>
Subject: Re: [Beowulf] Pretty Big Data
On Mon, 25 Jan 2016 15:06:44 +0000 "Lux, Jim (337C)"
<james.p.lux at jpl.nasa.gov> wrote:
> Figure that RAM has a read cycle time of 1 ns (DDR3 or DDR4)
But not a *latency* of 1ns.
> High speed disk drives (in the spinning rust sense) in "commodity"
> kinds of speeds seem to be in the 7200 RPM range, which is a 4
> millisecond average latency.
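For reference, the 4 ms figure quoted above is just the time for half a revolution at 7200 RPM. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and it covers rotational latency only, not seek time.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in ms: on average the platter must
    turn half a revolution before the wanted sector reaches the head."""
    ms_per_revolution = 60_000.0 / rpm  # 60,000 ms per minute
    return ms_per_revolution / 2.0


if __name__ == "__main__":
    # 7200 RPM is about 8.33 ms per revolution, so about 4.17 ms on average.
    print(round(avg_rotational_latency_ms(7200), 2))
```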
But everyone uses SSDs at this point for any real apps, generally attached directly to PCIe.
> So, if we're comparing searching a TB of data in RAM vs searching a TB
> of data on a disk, I think the RAM is always going to win if it's a
> sequential search.
Sure, but our original question was a sequential search of RAM vs. an indexed search of mass storage.
The point I was making was that no amount of RAM will protect you from sufficiently stupid data structure choices if the data is big enough and you hit the data structures hard and often enough.
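To put rough numbers on that, here is a small sketch that counts how many keys each approach has to touch. It is an illustration under assumptions, not a benchmark: a sorted Python list stands in for an index, the function names are invented for the example, and it counts accesses rather than measuring time.

```python
def linear_search(keys, target):
    """Sequential scan. Returns (index or -1, number of keys examined)."""
    for i, k in enumerate(keys):
        if k == target:
            return i, i + 1
    return -1, len(keys)


def binary_search(sorted_keys, target):
    """Binary search over sorted keys, standing in for an indexed lookup.
    Returns (index or -1, number of keys probed)."""
    lo, hi, probes = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_keys) and sorted_keys[lo] == target
    return (lo if found else -1), probes


if __name__ == "__main__":
    n = 1_000_000
    keys = list(range(n))  # already sorted
    target = n - 1         # worst case for the sequential scan
    _, scanned = linear_search(keys, target)
    _, probed = binary_search(keys, target)
    print("sequential scan examined:", scanned)
    print("binary search probed:", probed)
```

The scan touches every key in the worst case, while the search touches on the order of log2(n) keys. Each access can therefore be orders of magnitude slower and the indexed lookup can still come out ahead once the data is large enough.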
Perry E. Metzger perry at piermont.com