[Beowulf] Big storage

Thu Sep 6 15:10:41 PDT 2007

(Sorry for the delay in responding; I hit a heavy spell of work.  I will
fold in the other message I found interesting with this one.  This is
not trying to be a commercial either, so I apologize in advance if the
analysis comes off looking like that.)

Loic Tortay wrote:
> According to Joe Landman:
>> Bruce Allen wrote:

[...]

>>> Can you boot from a USB device?  You can have an inexpensive RAID-1 USB
>>> device for the root and OS.
>> FWIW:  Other solutions can boot from USB, Flash, ... and provide the 48
>> drive capability.
>>
> Sure, and so can the X4500. :-)

Great.

> But, with only two RAID controllers there is no way your machine will 
> survive a controller failure when using RAID-5 or RAID-6 unless you're 
> willing to halve the available space (or add the optional extra 
> controllers).

I don't understand this, as you wrote in another message:

> We did not lose any data due to the controller failure.
> 
> The problem occurred a few days before a scheduled downtime, the
> mainboard was replaced during the downtime and the machine rebooted
> just fine.

So it seems that what you describe as a problem for JackRabbit is also a
problem for the x4500, although the pain of replacing a motherboard is
somewhat higher than that of replacing a PCIe card ...

Aside from that, we can have up to 4 RAID controllers (the x4500 has 6
or 8 SATA controllers, with 8 or 6 drives per controller).  Thus I think
this issue may not be one you would want to press hard ...  (the x4500
has the same issue, albeit worse, since we can replace a PCIe card
rather than a motherboard).  In future generations, with dual-ported
SAS, we should be able to have redundant controllers.  This will cost
more though; not sure if it is worth it for the Beowulf crowd.

> 
> The density of the X4500 is also slightly better (48 disks in 4U 
> instead of 5U).

Hmmm... we measure density differently.  Here is how I measure it.

x4500		: 24TB RAW / 4U = 6.0 TB/U RAW
JackRabbit	: 36TB RAW / 5U = 7.2 TB/U RAW

and to up the ante a bit ...

JackRabbit-XL	: 48TB RAW / 5U = 9.6 TB/U RAW

So, defining storage density as bytes per vertical rack unit, I think
JackRabbit is "slightly better", to use your words ...
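
The density figures above are simple division. A minimal sketch of that
arithmetic, using only the raw capacities and chassis heights quoted in
this message (the dictionary layout and function name are illustrative
assumptions, not anything from a vendor tool):

```python
# Raw capacity (TB) and chassis height (rack units) as quoted in the message.
systems = {
    "x4500": (24.0, 4),
    "JackRabbit": (36.0, 5),
    "JackRabbit-XL": (48.0, 5),
}


def tb_per_u(raw_tb, height_u):
    """Raw terabytes per vertical rack unit."""
    return raw_tb / height_u


density = {name: tb_per_u(tb, u) for name, (tb, u) in systems.items()}

for name, d in density.items():
    print(f"{name:14s} {d:.1f} TB/U RAW")
```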

If you are measuring the number of disk slots per rack unit, yes, you
have 25% more disk slots per 41U rack (480 versus 384).  However, I
don't know of anyone measuring storage density that way, and I don't see
1TB drives hitting the x4500 any time soon.  On a per rack basis today
you can get (raw storage):

x4500          (10 per 41U rack):  480 * 0.5  TB = 240 TB/rack
JackRabbit     ( 8 per 41U rack):  384 * 0.75 TB = 288 TB/rack
JackRabbit XL  ( 8 per 41U rack):  384 * 1.0  TB = 384 TB/rack
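
For anyone who wants to redo the per-rack numbers with different drive
sizes, here is a rough sketch of the calculation.  It assumes a 41U rack
filled with as many whole chassis as fit, 48 drive slots per chassis, and
the drive sizes quoted above; the names `RACK_U`, `per_rack`, and
`configs` are made up for illustration.

```python
RACK_U = 41  # rack height assumed in the message


def per_rack(height_u, drives_per_unit, drive_tb, rack_u=RACK_U):
    """Return (chassis per rack, drive slots per rack, raw TB per rack)."""
    units = rack_u // height_u          # only whole chassis fit
    slots = units * drives_per_unit
    return units, slots, slots * drive_tb


# (chassis height in U, drive slots per chassis, TB per drive)
configs = {
    "x4500": (4, 48, 0.5),
    "JackRabbit": (5, 48, 0.75),
    "JackRabbit XL": (5, 48, 1.0),
}

rack = {name: per_rack(*cfg) for name, cfg in configs.items()}

for name, (units, slots, tb) in rack.items():
    print(f"{name:14s} {units:2d} units, {slots} slots, {tb:.0f} TB/rack")
```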

Of course, 2 of those x4500 drives are for the OS, so you really can't
use all 48 slots for storage.  With JackRabbit, we are currently using
80GB of space on the volumes, though in short order we will be using
other technologies for the boot drives.  That includes, I might add,
something Marc Hahn mentioned:  our servers will boot "diskless", so all
48 drives are available for storage.

> As of today we have 112 X4500, 112U are almost 3 racks which is quite 
> a lot due to our floor space constraints.

Ok, I am not trying to convert you.  You like your Sun boxen, and that
is great.

I will do a little math.  BTW:  that's a fairly impressive-sized floor
you have there.  112U of x4500 or 112 x4500s?

Assuming the former, 112U of x4500 ~ 28 x4500 which is about (best case
RAW) 672 TB.

112 of x4500 (your Sun rep must be *really* happy :) ) is 2.7 PB RAW.

Not sure which you mean, though you indicate 3 racks, so I will assume
the 112U of x4500.

Using the list price of $36k USD per x4500, this is, ballpark, about $1M
USD in storage (28 x $36k), neglecting other elements (support, ...)

To get that same RAW capacity with the JackRabbit (36 TB unit), you
would need 19 units (18.667, rounded up).  At about $36k for the 36 TB
unit, you would be looking at $684k USD.  19 units would take up 2 racks
and change (a little of a third rack), for about 2/3rds the (list) price.

Of course, the JackRabbit XL, at 48 TB RAW would require 14 units for
this capacity, which would fit nicely in 2 racks.  And using a current
price of $48k USD for the units, this would come in somewhat less,
around $672k USD.  Using less power, less floor space, and providing
greater storage density.

For that same $1M USD, you could have 21 of the 48 TB JackRabbit XLs,
which would take up 2 racks and change, providing greater than 1 PB RAW.
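
The cost comparison above can be checked with a few lines.  This is only
a sketch of the arithmetic in this message: it assumes the 112U reading
(28 x4500 units), the list prices quoted here, 8 JackRabbit chassis per
41U rack, and it treats "that same $1M" as the x4500 total.  All
variable and function names are illustrative.

```python
import math

# Raw TB per unit and list price (USD) as quoted in the message.
X4500_TB, X4500_PRICE = 24, 36_000
JR_TB, JR_PRICE = 36, 36_000
JRXL_TB, JRXL_PRICE = 48, 48_000
JR_PER_RACK = 8  # 5U chassis in a 41U rack

# 112U of 4U x4500 chassis.
x4500_units = 112 // 4
target_tb = x4500_units * X4500_TB
x4500_cost = x4500_units * X4500_PRICE


def units_needed(capacity_tb, unit_tb):
    """Whole units required to reach at least capacity_tb."""
    return math.ceil(capacity_tb / unit_tb)


jr_units = units_needed(target_tb, JR_TB)
jr_cost = jr_units * JR_PRICE
jrxl_units = units_needed(target_tb, JRXL_TB)
jrxl_cost = jrxl_units * JRXL_PRICE

# How many JackRabbit XL units the x4500 spend would buy instead.
jrxl_for_same_money = x4500_cost // JRXL_PRICE
jrxl_tb_for_same_money = jrxl_for_same_money * JRXL_TB

print(f"x4500        : {x4500_units} units, {target_tb} TB, ${x4500_cost:,}")
print(f"JackRabbit   : {jr_units} units, ${jr_cost:,}, "
      f"{math.ceil(jr_units / JR_PER_RACK)} racks (last one partly filled)")
print(f"JackRabbit XL: {jrxl_units} units, ${jrxl_cost:,}, "
      f"{math.ceil(jrxl_units / JR_PER_RACK)} racks")
print(f"Same spend   : {jrxl_for_same_money} JackRabbit XL units, "
      f"{jrxl_tb_for_same_money} TB RAW")
```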

And for whatever it is worth, you can simply use all 48 drives as JBOD,
giving you the same config (albeit with a faster set of SATA controllers
and overall more scalable internal design).

> Besides, Sun machines have been available for one year and they can 

10 months for JackRabbit ... Is that 2-month difference that valuable?  :)

> deliver and maintain theirs reliably wherever we want (specifically 
> in Lyon, France).

Hmmm...  nothing I can talk about yet, though we are working on business
(and with partners providing support/fulfillment) from Pune India,
through the UK and Europe, through the US, ....  :)

> The main performance figure on your website is quite pointless (sorry) 

Hmmm... how so?

> since there is no mention of the number of files, file(s) size(s), 
> number of threads, block size and amount of bytes written.

Oh... ok, we are in agreement that the reader needs more data.  I am
assuming you didn't read the benchmark report.  You might want to pull
that down.  Explains it in detail.  The picture is a snapshot from the
report.

> I do 1.7 GB/s with random writes on a X4500, so my <whatever> is 
> larger (and pointless as well). ;-)

Well, I have a lot of concerns about using IOzone and others as tests,
and it sounds to me like we are in agreement here.  However, I can't
tell you how many RFPs we see requiring specific performance or higher,
so I feel compelled to toe that line.

What I like are real application tests.  We don't see many (enough) of
them.  I think I have seen one customer benchmark over the last 6 years
that was both real (as in real operating code) and actually stressed an
IO system to any significant degree.

Aside from that, many (most) of the tests we see are cache bound/cache
constrained.  Far too many...  Look at any IOzone result for file sizes
less than RAM size, and you are looking at cache effects (unless you are
forcing commits to disk or using sync/write-through).
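
To make the cache point concrete, here is a small illustrative sketch
(not IOzone, and not taken from the benchmark report) that times the
same write twice: once letting the OS buffer it, and once with the
commit to disk forced inside the timed region.  The function name
`write_throughput` and the sizes are assumptions chosen for the example;
for files much smaller than RAM the unsynced figure will usually say
more about memory than about the disks.

```python
import os
import tempfile
import time


def write_throughput(path, total_bytes, block_size=1 << 20, sync=False):
    """Write total_bytes to path and return MB/s.

    With sync=True the data is flushed and fsync'ed before the timer
    stops, so the commit to stable storage is part of the measurement.
    """
    block = b"\0" * block_size
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(block_size, total_bytes - written)
            f.write(block[:n])
            written += n
        if sync:
            f.flush()              # push Python's buffer to the OS
            os.fsync(f.fileno())   # ask the OS to commit to disk
    elapsed = max(time.perf_counter() - start, 1e-9)
    return written / elapsed / 1e6


if __name__ == "__main__":
    size = 32 << 20  # 32 MiB, far below RAM size on most machines
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "cache_test.bin")
        print(f"buffered: {write_throughput(path, size):.0f} MB/s")
        print(f"fsync'ed: {write_throughput(path, size, sync=True):.0f} MB/s")
```

Note that this writes zero-filled blocks into a temporary directory, so
compressing filesystems or a RAM-backed temp directory will skew the
result; it is meant to show the shape of the effect, not to produce a
reportable number.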

> Loïc.

Regardless of that, I do appreciate your comments with regard to the
tests.  Maybe worth talking about this offline at some point (or if you
will be at SC07).  My major concern with most tests is that they
generate numbers that users (and vendors) simply report without a
detailed and in-depth discussion and analysis.  This is, I believe, your
criticism, and if you look through the benchmark report, you will see
that a substantial fraction of it explains what you see and why you see
it.

Again, thanks for the note!

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615