[Beowulf] Problem with Single RAID disk larger than 2TB and Linux

Wed Oct 3 04:29:23 PDT 2007

Dear Beowulfers,

We ran into a problem with large disks which I suspect is fairly common,
but the usual solutions are not working. IBM and RedHat have not been able
to provide any useful answers, so I am turning to this list for help. (Emulex
is still helping, but I am not sure how far they can go without access to
the hardware.)

Details:

* Linux Cluster for Weather modelling

* IBM Bladecenter blades and an IBM x3655 Opteron head node, FC-attached to
Hitachi Tagmastore SAN storage through an Emulex LightPulse FC HBA
(PCI-Express, dual port)

* RHEL 4 update 5, x86_64, kernel 2.6.9-55 SMP, with the RHEL-provided Emulex
driver (lpfc); lpfcdfc is also installed

* GPT partition created with parted

There is one 2TB LUN, which works fine.

There is also a 3TB LUN on the Hitachi SAN, which is reported as "only"
2199GB (about 2.2TB).

We noticed that, when the emulex driver loads, the following error message
is reported:

            Emulex LightPulse Fibre Channel SCSI driver 8.0.16.32
            Copyright(c) 2003-2007 Emulex.  All rights reserved.
            ACPI: PCI Interrupt 0000:2d:00.0[A] -> GSI 18 (level, low) -> IRQ 185
            PCI: Setting latency timer of device 0000:2d:00.0 to 64
            lpfc 0000:2d:00.0: 0:1305 Link Down Event x2 received Data: x2 x4 x1000
            lpfc 0000:2d:00.0: 0:1305 Link Down Event x2 received Data: x2 x4 x1000
            lpfc 0000:2d:00.0: 0:1303 Link Up Event x3 received Data: x3 x1 x10 x0
            scsi5 : IBM 42C2071 4Gb 2-Port PCIe FC HBA for System x on PCI bus 2d device 00 irq 185 port 0
            Vendor: HITACHI   Model: OPEN-V*3          Rev: 5007
            Type:   Direct-Access                      ANSI SCSI revision: 03
            sdb : very big device. try to use READ CAPACITY(16).
            sdb : READ CAPACITY(16) failed.
            sdb : status=1, message=00, host=0, driver=08
            sdb : use 0xffffffff as device size
            SCSI device sdb: 4294967296 512-byte hdwr sectors (2199023 MB)
            SCSI device sdb: drive cache: write back
            sdb : very big device. try to use READ CAPACITY(16).
            sdb : READ CAPACITY(16) failed.
            sdb : status=1, message=00, host=0, driver=08
            sdb : use 0xffffffff as device size
            SCSI device sdb: 4294967296 512-byte hdwr sectors (2199023 MB)
            SCSI device sdb: drive cache: write back

The problem is the failed READ CAPACITY(16) command, after which the kernel
falls back to 0xffffffff as the device size, but we are unable to find
the source of this error.
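
To make the numbers in the log concrete, here is a small sketch of the arithmetic. It assumes the 512-byte sector size printed by the kernel and treats the LUN sizes as decimal terabytes (an assumption; the SAN may define them differently). The function names are mine, for illustration only. READ CAPACITY(10) reports the last block address in a 32-bit field, so a fallback to 0xffffffff caps the visible size at 2^32 sectors.

```python
SECTOR_BYTES = 512        # sector size printed in the kernel log
MAX_LBA_10 = 0xFFFFFFFF   # largest last-LBA value READ CAPACITY(10) can return


def fallback_size_bytes(sector_bytes=SECTOR_BYTES):
    """Size the kernel assumes after falling back to last LBA 0xffffffff."""
    return (MAX_LBA_10 + 1) * sector_bytes


def needs_read_capacity_16(lun_bytes, sector_bytes=SECTOR_BYTES):
    """True if the LUN's last LBA cannot be reported unambiguously in 32 bits."""
    last_lba = lun_bytes // sector_bytes - 1
    return last_lba >= MAX_LBA_10


if __name__ == "__main__":
    size = fallback_size_bytes()
    print(size // SECTOR_BYTES, "sectors,", size // 10**6, "MB")
    # LUN sizes below are assumed to be decimal TB (hypothetical exact values).
    print("2TB LUN needs READ CAPACITY(16):", needs_read_capacity_16(2 * 10**12))
    print("3TB LUN needs READ CAPACITY(16):", needs_read_capacity_16(3 * 10**12))
```

Under these assumptions the fallback size matches the 4294967296 sectors (2199023 MB) in the log, the 2TB LUN fits within the 32-bit range, and the 3TB LUN does not, which is consistent with only the larger LUN being affected.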

We conducted several experiments without success:

- Tried compiling the latest driver from Emulex (8.0.16.32) - same error
- Tried Knoppix (2.6.19), Gentoo LiveCD (2.6.19), and CentOS 4.4 - same
error
- Tried to boot Belenix (Solaris 32 bit live); it failed to boot completely
(may be an unrelated issue)

We have a temporary workaround in place: we created 3x1TB disks and used LVM
to create a striped 3TB volume with an ext3 FS. This works fine.
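
As a conceptual illustration of why the workaround sidesteps the limit, the sketch below maps a byte offset in a striped volume onto one of three underlying disks, so each disk only has to expose about a third of the total. This is plain RAID0-style arithmetic, not LVM's actual on-disk layout (which also has metadata and extents), and the 64 KiB stripe size and the function name are assumptions.

```python
def locate(offset, n_devices=3, stripe_bytes=64 * 1024):
    """Map a logical byte offset to (device index, byte offset on that device).

    Assumes simple round-robin striping; n_devices and stripe_bytes are
    illustrative values, not read from any real LVM configuration.
    """
    stripe_no, within = divmod(offset, stripe_bytes)   # which stripe, and where in it
    row, device = divmod(stripe_no, n_devices)         # stripes rotate across devices
    return device, row * stripe_bytes + within


if __name__ == "__main__":
    for off in (0, 64 * 1024, 2 * 64 * 1024, 3 * 64 * 1024 + 5):
        print(off, "->", locate(off))
```

Consecutive stripes land on disks 0, 1, 2 and then wrap back to disk 0 one stripe further in, so no single disk needs an address range beyond what the 1TB LUNs already report correctly.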

RedHat claims that ext3 and RHEL4 support disks up to 8TB and 16TB
respectively (since RHEL4u2).

I would like to know if anyone on the list has any pointers that can help us
solve the issue.

Regards
Anand Vaidya