[Beowulf] RAID question

Jörg Saßmannshausen j.sassmannshausen at ucl.ac.uk
Tue Mar 17 14:57:43 PDT 2015


Hi David,

To me it looks like either a controller or a disc issue.

I have seen these problems before on SCSI discs when the controller had a 
problem. Depending on the manufacturer, it might be a good idea to contact them 
and see whether they have more information. I have had some problems in the 
past with RAID controllers, and the manufacturer was very helpful in 
diagnosing and repairing a failed RAID5, for example.

So it might be a good idea to try them.

All the best from a cold London

Jörg


On Monday 16 March 2015 mathog wrote:
> Thanks for the feedback.
> 
> After copying /boot and /bin from another machine and mucking about with
> grub for far too long (had to edit grub.conf to change virtual disk
> names, and in CentOS's rescue disk it saw the boot disk as hd1, but when
> grub actually started, it saw it as hd0) the system is back on line.
> 
> The logs don't show a root command line that specifically took out those
> directories.  They do show a bunch of scripts being run.  My best guess
> is that one of them did something like this:
> 
>    AVAR=`command that failed and returned an empty string`
>    rm -rf ${AVAR}/b*
> 
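
The failure mode guessed at above can be guarded against in the script
itself. What follows is only a minimal sketch in POSIX shell; the helper name
safe_clean is hypothetical and is not taken from the scripts that ran on that
machine. It refuses to run the glob when the directory variable is empty, is
"/", or does not name an existing directory.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the original scripts):
# remove entries matching "$dir"/b* only when $dir is safe to use.
safe_clean() {
    dir=${1:-}
    # An empty value would turn "$dir"/b* into /b* (i.e. /bin, /boot).
    if [ -z "$dir" ] || [ "$dir" = "/" ] || [ ! -d "$dir" ]; then
        echo "refusing to clean '${dir}'" >&2
        return 1
    fi
    # "--" stops option parsing in case a name starts with "-".
    rm -rf -- "$dir"/b*
}
```

A shorter alternative for a one-off command is the ${AVAR:?} expansion, as in
rm -rf "${AVAR:?}"/b*, which makes the shell report an error instead of
running rm when AVAR is unset or empty.
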
> It seems unlikely that a low level controller failure would have snipped
> out those files/directories without resulting in a file system that was
> seen as corrupt by fsck.
> 
> That said, there is something hardware related going on, since
> /var/log/messages has a lot of these:
> 
> Mar 16 12:37:27 mandolin kernel: sd 7:0:0:0: [sdb]  Sense Key : Recovered Error [current] [descriptor]
> Mar 16 12:37:27 mandolin kernel: Descriptor sense data with sense descriptors (in hex):
> Mar 16 12:37:27 mandolin kernel:        72 01 04 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> Mar 16 12:37:27 mandolin kernel:        00 4f 00 c2 40 50
> Mar 16 12:37:27 mandolin kernel: sd 7:0:0:0: [sdb]  ASC=0x4 ASCQ=0x1d
> 
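
When talking to the manufacturer it may help to have the header of that hex
dump decoded. The sketch below assumes the standard descriptor-format sense
layout (byte 0 response code, low nibble of byte 1 sense key, byte 2 ASC,
byte 3 ASCQ, byte 7 additional length); the function name decode_sense is
hypothetical. It does not interpret what the ASC/ASCQ pair means, which is
best looked up in the vendor or T10 tables.

```shell
#!/bin/sh
# Hypothetical helper: decode the 8-byte header of descriptor-format
# SCSI sense data given as one string of space-separated hex bytes.
decode_sense() {
    # Split the argument on whitespace into $1, $2, ...
    set -- $1
    if [ "$#" -lt 8 ]; then
        echo "need at least 8 bytes" >&2
        return 1
    fi
    # Bits 0-6 of byte 0 hold the response code; 0x72/0x73 mean
    # descriptor format.
    sense_code=$(( 0x$1 & 0x7f ))
    if [ "$sense_code" -ne $(( 0x72 )) ] && [ "$sense_code" -ne $(( 0x73 )) ]; then
        echo "not descriptor-format sense data" >&2
        return 1
    fi
    printf 'response_code=0x%02x\n' "$sense_code"
    printf 'sense_key=0x%x\n' $(( 0x$2 & 0x0f ))
    printf 'asc=0x%02x\n' $(( 0x$3 ))
    printf 'ascq=0x%02x\n' $(( 0x$4 ))
    printf 'additional_length=%d\n' $(( 0x$8 ))
}

decode_sense "72 01 04 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 00 4f 00 c2 40 50"
```

For the bytes logged above this reports sense key 0x1 (the "Recovered Error"
in the kernel line), ASC 0x04, ASCQ 0x1d, and an additional length of 14,
which matches the 22 bytes shown (8 header bytes plus 14).
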
> That group has several other similar Dell servers, and this is the only
> one logging these.  sdb1 holds /boot and sdb2 is where the lvm keeps its
> information.
> 
> Regards,
> 
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech


-- 
*************************************************************
Dr. Jörg Saßmannshausen, MRSC
University College London
Department of Chemistry
Gordon Street
London
WC1H 0AJ 

email: j.sassmannshausen at ucl.ac.uk
web: http://sassy.formativ.net

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html