[Beowulf] Third-party drives not permitted on new Dell servers?
landman at scalableinformatics.com
Mon Feb 15 17:41:08 PST 2010
Rahul Nabar wrote:
> This was the response from Dell, I especially like the analogy:
>> There are a number of benefits for using Dell qualified drives in
>> particular ensuring a ***positive experience*** and protecting
>> ***our data***. While SAS and SATA are industry standards there are
>> differences which occur in implementation. An analogy is that
>> English is spoken in the UK, US and Australia. While the language
>> is generally the same, there are subtle differences in word usage
>> which can lead to confusion. This exists in storage subsystems as
>> well. As these subsystems become more capable, faster and more
>> complex, these differences in implementation can have greater
> I added the emphasis. I am in love with Dell-disks that get me "the
> positive experience". :)
Please indulge my taking a contrarian view based upon the products we
I see significant derision heaped upon these decisions, which are called
"marketing decisions" by Dell and others. It couldn't be possible, in
most commenters' minds, that they might actually have a point ...
... I am not defending Dell's language (I wouldn't use this or allow
this to be used in our outgoing marketing/customer communications).
Let me share an anecdote. I have elided the disk manufacturer's name to
protect the guilty. I will not give hints as to who they are, though
some may be able to guess ... I will not confirm.
We ship units with 2TB (and 1.5TB) drives among others. We burn in and
test these drives. We work very hard to ensure compatibility, and to
make sure that when users get the units, that the things work. We
aren't perfect, and we do occasionally mess up. When we do, we own up
to it and fix it right away. It's a different style of support. The
buck stops with us. Period.
So along comes a drive manufacturer, with some nice looking specs on 2TB
(and some 1.5 and 1 TB) drives. They look great on paper. We get them
into our labs, and play with them, and they seem to run really well.
Occasional hiccup on building RAIDs, but you get that in large batches
So now they are out in the field for months, under various loads. Some
in our DeltaV's, some in our JackRabbits. The units in the DeltaV's
seem to have a ridiculously high failure rate. This is not something we
see in the lab. Even with constant stress, horrific sustained workloads
... they don't fail in our testing. But get these same drives out into
the users' hands ... and whammo.
Slightly different drives in our JackRabbit units, with a variety of
RAID controllers. Same types of issues. Timeouts, RAID fall outs, etc.
This is not something we see in the lab in our testing. We try
emulating their environments, and we can't generate the failures.
Worse, we get the drives back after exchanging them at our cost with new
replacements, only to find out, upon running diagnostics, that the
drives haven't failed according to the test tool. This failing drive
vendor refuses to acknowledge firmware bugs, effectively refuses to
Our other main drive vendor, while not currently with a 2TB drive unit,
doesn't have anything like this manufacturer's failure rate in the field.
When drives die in the field, they really ... really die in the field.
And they do fix their firmware.
So we are now moving off this failing manufacturer (it's a shame as they
used to produce quality parts for RAID several years ago), and we are
evaluating replacements for them. Firmware updates are a critical
aspect of a replacement. If the vendor won't allow for a firmware
update, we won't use them.
So ... this anecdote complete, if someone called me up and said "Joe, I
really want you to build us an siCluster for our storage, and I want you
to use [insert failing manufacturer's name here] drives because we
like them", what do you think my reaction should be? Should it be
"sure, no problem, whatever you want" ... with the subsequent problems
and pain, for which we would be blamed ... or should it be "no, these
drives don't work well ... deep and painful experience at customer sites
shows that they have bugs in their firmware which are problematic for
RAID users ... we are attempting to get them to give us the updated
firmware to help the existing users, but we would not consider shipping
more units with these drives due to their issues."
Is that latter answer, which is the correct answer, a marketing answer?
Yeah, SATA and SAS are standards. Yeah, in theory, they all do work
together. In reality, they really don't, and you have to test.
Everyone does some aspect slightly differently, and usually in software, so
they can fix it if they messed up. If there is a RAID timeout bug due
to head settling timing, yeah, this is fixable. But if the disk
manufacturer doesn't want to fix it ... it's your company's name on the
outside of that box. You are going to take the heat for their problems.
Note: This isn't just SATA/SAS drives, there are a whole mess of things
that *should* work well together, but do not. We had some exciting
times in the recent past with SAS backplanes that refused to work with
SAS RAID cards. We've had some excitement from 10GbE cards, IB cards,
etc. that we shouldn't have had.
I can't and won't sanction their tone to you ... they should have
explained things correctly. Given that PERC are rebadged LSI, yeah, I
know perfectly well a whole mess of drives that *do not* work correctly
So please don't take Dell to task for trying to help you avoid making
what they consider a bad decision on specific components. There could
be a marketing aspect to it, but support is a cost, and they want to
minimize costs. Look at failure rates, and toss the suppliers who have
very high ones.
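
As a rough illustration of "look at failure rates, and toss the suppliers
who have very high ones", here is a minimal sketch. Everything in it is an
assumption made for the example: the vendor labels, the drive-day and
failure counts, and the 5% cutoff are invented, not data from the field.
It counts field replacements as failures, since a drive that drops out of
a RAID in the field costs a replacement even if the vendor's diagnostic
tool later passes it.

```python
from dataclasses import dataclass


@dataclass
class VendorFleet:
    name: str          # hypothetical vendor label
    drive_days: float  # days in service, summed over all drives shipped
    failures: int      # drives replaced in the field


def annualized_failure_rate(fleet: VendorFleet) -> float:
    """Field replacements per drive-year of service."""
    if fleet.drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = fleet.drive_days / 365.0
    return fleet.failures / drive_years


def vendors_to_drop(fleets, max_afr):
    """Names of vendors whose failure rate exceeds the chosen cutoff."""
    return sorted(f.name for f in fleets if annualized_failure_rate(f) > max_afr)


# Invented numbers: 200 drive-years of service for each vendor.
fleets = [
    VendorFleet("vendor_a", drive_days=73_000, failures=30),
    VendorFleet("vendor_b", drive_days=73_000, failures=4),
]

# The 5% cutoff is an arbitrary example threshold, not a recommendation.
print(vendors_to_drop(fleets, max_afr=0.05))
```

Normalizing by drive-years rather than by drive count keeps a vendor whose
drives have only been deployed for a few weeks from looking better than it
is.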
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615