[Beowulf] Third-party drives not permitted on new Dell servers?

Joe Landman landman at scalableinformatics.com
Mon Feb 15 17:41:08 PST 2010


Rahul Nabar wrote:
> This was the response from Dell, I especially like the analogy:
> 
> [snip]
>> There are a number of benefits for using Dell qualified drives in
>> particular ensuring a ***positive experience*** and protecting
>> ***our data***. While SAS and SATA are industry standards there are
>> differences which occur in implementation.  An analogy is that
>> English is spoken in the UK, US and Australia. While the language
>> is generally the same, there are subtle differences in word usage
>> which can lead to confusion. This exists in storage subsystems as
>> well. As these subsystems become more capable, faster and more
>> complex, these differences in implementation can have greater
>> impact.
> [snip]
> 
> I added the emphasis. I am in love with Dell disks that get me "the 
> positive experience". :)

Please indulge my taking a contrarian view based upon the products we 
sell/support/ship.

I see significant derision heaped upon these decisions, which are called 
"marketing decisions" by Dell and others.  It couldn't be possible, in 
most commenters' minds, that they might actually have a point ...

... I am not defending Dell's language (I wouldn't use this or allow 
this to be used in our outgoing marketing/customer communications).

Let me share an anecdote.  I have elided the disk manufacturer's name to 
protect the guilty.  I will not give hints as to who they are, though 
some may be able to guess ... I will not confirm.

We ship units with 2TB (and 1.5TB) drives among others.  We burn in and 
test these drives.  We work very hard to ensure compatibility, and to 
make sure that when users get the units, the things work.  We 
aren't perfect, and we do occasionally mess up.  When we do, we own up 
to it and fix it right away.  It's a different style of support.  The 
buck stops with us.  Period.

So along comes a drive manufacturer, with some nice looking specs on 2TB 
(and some 1.5 and 1 TB) drives.  They look great on paper.  We get them 
into our labs, and play with them, and they seem to run really well. 
Occasional hiccup on building RAIDs, but you get that in large batches 
of drives.

So now they are out in the field for months, under various loads.  Some 
in our DeltaV's, some in our JackRabbits.  The units in the DeltaV's 
seem to have a ridiculously high failure rate.  This is not something we 
see in the lab.  Even with constant stress, horrific sustained workloads 
... they don't fail in our testing.  But get these same drives out into 
the users' hands ... and whammo.

Slightly different drives in our JackRabbit units, with a variety of 
RAID controllers.  Same types of issues.  Timeouts, RAID fall outs, etc.

This is not something we see in the lab in our testing.  We try 
emulating their environments, and we can't generate the failures.

Worse, we get the drives back after exchanging them at our cost with new 
replacements, only to find out, upon running diagnostics, that the 
drives haven't failed according to the test tool.  This failing drive 
vendor refuses to acknowledge firmware bugs, effectively refuses to 
release patches/fixes.

Our other main drive vendor, while not currently with a 2TB drive unit, 
doesn't have anything like this manufacturer's failure rate in the field. 
When drives die in the field, they really ... really die in the field. 
And they do fix their firmware.

So we are now moving off this failing manufacturer (it's a shame as they 
used to produce quality parts for RAID several years ago), and we are 
evaluating replacements for them.  Firmware updates are a critical 
aspect of a replacement.  If the vendor won't allow for a firmware 
update, we won't use them.

So ... this anecdote complete, if someone called me up and said "Joe, I 
really want you to build us an siCluster for our storage, and I want you 
to use [insert failing manufacturer's name here] drives because we 
like them", what do you think my reaction should be?  Should it be 
"sure, no problem, whatever you want" ... with the subsequent problems 
and pain, for which we would be blamed ... or should it be "no, these 
drives don't work well ... deep and painful experience at customer sites 
shows that they have bugs in their firmware which are problematic for 
RAID users ... we are attempting to get them to give us the updated 
firmware to help the existing users, but we would not consider shipping 
more units with these drives due to their issues."

Is that latter answer, which is the correct answer, a marketing answer?

Yeah, SATA and SAS are standards.  Yeah, in theory, they all do work 
together.  In reality, they really don't, and you have to test. 
Everyone does some aspect slightly differently and usually in software, so 
they can fix it if they messed up.  If there is a RAID timeout bug due 
to head settling timing, yeah, this is fixable.  But if the disk 
manufacturer doesn't want to fix it ...  it's your company's name on the 
outside of that box.  You are going to take the heat for their problems.

Note:  This isn't just SATA/SAS drives, there are a whole mess of things 
that *should* work well together, but do not.  We had some exciting 
times in the recent past with SAS backplanes that refused to work with 
SAS RAID cards.  We've had some excitement from 10GbE cards, IB cards, 
etc. that we shouldn't have had.

I can't and won't sanction their tone to you ... they should have 
explained things correctly.  Given that PERC are rebadged LSI, yeah, I 
know perfectly well a whole mess of drives that *do not* work correctly 
with them.

So please don't take Dell to task for trying to help you avoid making 
what they consider a bad decision on specific components.  There could 
be a marketing aspect to it, but support is a cost, and they want to 
minimize costs.  Look at failure rates, and toss the suppliers who have 
very high ones.
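
A minimal sketch of that last step, assuming you keep per-supplier 
counts of drive-years in service and drives pulled from the field.  The 
supplier labels, the numbers, and the 5% threshold are all made up for 
illustration, not real data.  Note that a drive which falls out of a 
RAID in the field counts as a failure here even if the vendor's test 
tool later passes it.

```python
from dataclasses import dataclass


@dataclass
class FieldRecord:
    vendor: str         # hypothetical supplier label
    drive_years: float  # total time in service, summed over all drives
    failures: int       # drives pulled from the field, for any reason


def annualized_failure_rate(rec: FieldRecord) -> float:
    """Failures per drive-year of service."""
    if rec.drive_years <= 0:
        raise ValueError("drive_years must be positive")
    return rec.failures / rec.drive_years


def suppliers_to_drop(records, max_afr):
    """Return the vendors whose field failure rate exceeds max_afr."""
    return sorted(
        r.vendor for r in records if annualized_failure_rate(r) > max_afr
    )


# Illustrative numbers only.
records = [
    FieldRecord("vendor_a", drive_years=200.0, failures=30),
    FieldRecord("vendor_b", drive_years=200.0, failures=4),
]

for r in records:
    print(f"{r.vendor}: {annualized_failure_rate(r):.1%} per drive-year")

print("drop:", suppliers_to_drop(records, max_afr=0.05))
```

The threshold is a judgment call; the point is only that the 
comparison is made on field returns, not on lab results.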



-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


