[Beowulf] PCI configuration space errors ? (was Nvidia K20 + Supermicro mobo)

Adam DeConinck ajdecon at ajdecon.org
Tue Jul 23 13:31:19 PDT 2013


Hi Mikhail,

Passing on thoughts from a couple of colleagues at NVIDIA:

> BARs are set up by the SBIOS.  It looks like the mapping isn't allowing enough room for our big BARs (the first BAR eats the bridge window, then boom).  I defer to Mark's wisdom.
>
> [Initially it looked to me like they were trying to do Xen passthrough....]

> You can check the BARs with:
>
> [mmonger@localhost ~]$ lspci -vvvv -d "10de:*" | grep Region
>        Region 0: Memory at b4000000 (32-bit, non-prefetchable) [size=16M]
>        Region 1: Memory at a8000000 (64-bit, prefetchable) [size=128M]
>        Region 3: Memory at b0000000 (64-bit, prefetchable) [size=32M]
>        Region 5: I/O ports at 3000 [size=128]
>        Region 0: Memory at b3000000 (32-bit, non-prefetchable) [size=16M]
>        Region 1: Memory at 98000000 (64-bit, prefetchable) [size=128M]
>        Region 3: Memory at a0000000 (64-bit, prefetchable) [size=32M]
>        Region 5: I/O ports at 2000 [size=128]
>
> You should see an address for all 4 regions (BARs) per GPU.
> If you see "<unassigned>" that's bad.
>
> If the BARs are all assigned, you also need to be sure the upstream bridge has a matching assignment.
>
> Xen and ESX have special requirements, so if they are doing passthrough let me know.

IIRC you're not doing any virtualization, so it's worth running the
lspci check to see whether all the BARs are assigned.
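
If you'd rather not eyeball the output, here is a minimal Python sketch of
that check. It assumes you have saved the full `lspci -vvvv` output as text;
the function name and the sample text (including its device header line) are
made up for illustration, and only the "<unassigned>" marker comes from the
note above.

```python
import re

# Matches indented lines such as "\tRegion 1: Memory at ... (64-bit, prefetchable)"
REGION_RE = re.compile(r"^\s+Region (\d+): (.*)$")


def find_unassigned_regions(lspci_text):
    """Return (device, region_number, description) for each unassigned BAR."""
    problems = []
    device = None
    for line in lspci_text.splitlines():
        # In lspci output a device header starts in column 0;
        # its detail lines (including Region lines) are indented.
        if line and not line[0].isspace():
            device = line.split()[0]
            continue
        match = REGION_RE.match(line)
        if match and "<unassigned>" in match.group(2):
            problems.append((device, int(match.group(1)), match.group(2)))
    return problems


# Hypothetical sample, NOT output from the machine discussed in this thread.
SAMPLE = """\
01:00.0 3D controller: example device (hypothetical header line)
\tRegion 0: Memory at b4000000 (32-bit, non-prefetchable) [size=16M]
\tRegion 1: Memory at <unassigned> (64-bit, prefetchable)
\tRegion 3: Memory at <unassigned> (64-bit, prefetchable)
\tRegion 5: I/O ports at 3000 [size=128]
"""

if __name__ == "__main__":
    for dev, num, desc in find_unassigned_regions(SAMPLE):
        print(f"{dev}: BAR {num} is unassigned ({desc})")
```

If the input was already piped through `grep Region`, the header lines are
gone and the device field simply comes back as None.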

Thanks,
Adam

On Mon, Jul 22, 2013 at 9:14 AM, Mikhail Kuzminsky <mikky_m at mail.ru> wrote:
> Let me set the GPUs aside for a moment. I don't know what sets up the BARs for PCI-E devices: the BIOS or the Linux kernel (OpenSUSE 12.3, kernel 3.7.10-1.1 in my case). Below is part of /var/log/messages; note that at this point in the kernel boot no Nvidia GPU driver has been loaded for the device at PCI 01:00.0.
>
> -----------------------from /var/log/messages------
> 2013-07-21T02:28:58.348552+04:00 c6ws4 kernel: [    0.432261] ACPI: ACPI bus type pnp unregistered
> 2013-07-21T02:28:58.348554+04:00 c6ws4 kernel: [    0.438011] pci 0000:00:01.0: BAR 15: can't assign mem pref (size 0x18000000)
> 2013-07-21T02:28:58.348555+04:00 c6ws4 kernel: [    0.438015] pci 0000:00:01.0: BAR 14: assigned [mem 0xd1000000-0xd1ffffff]
> 2013-07-21T02:28:58.348555+04:00 c6ws4 kernel: [    0.438018] pci 0000:01:00.0: BAR 1: can't assign mem pref (size 0x10000000)
> 2013-07-21T02:28:58.348556+04:00 c6ws4 kernel: [    0.438020] pci 0000:01:00.0: BAR 3: can't assign mem pref (size 0x2000000)
> 2013-07-21T02:28:58.348557+04:00 c6ws4 kernel: [    0.438023] pci 0000:01:00.0: BAR 0: assigned [mem 0xd1000000-0xd1ffffff]
> 2013-07-21T02:28:58.348558+04:00 c6ws4 kernel: [    0.438026] pci 0000:01:00.0: BAR 6: can't assign mem pref (size 0x80000)
> 2013-07-21T02:28:58.348558+04:00 c6ws4 kernel: [    0.438028] pci 0000:00:01.0: PCI bridge to [bus 01]
> 2013-07-21T02:28:58.348559+04:00 c6ws4 kernel: [    0.438031] pci 0000:00:01.0:   bridge window [mem 0xd1000000-0xd1ffffff]
> 2013-07-21T02:28:58.348561+04:00 c6ws4 kernel: [    0.438035] pci 0000:00:1c.0: PCI bridge to [bus 02]
> ---------------------------------------------------------
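
To put the sizes in that excerpt side by side, here is a rough Python sketch
that pulls out the "can't assign" requests and the bridge window and converts
them to MiB. The log lines are copied from the excerpt above with the syslog
prefix trimmed; the regular expressions and function names are my own
assumptions about the message format, not anything from the kernel sources.

```python
import re

# Kernel lines copied from the log excerpt above (syslog prefix trimmed).
LOG = """\
[    0.438011] pci 0000:00:01.0: BAR 15: can't assign mem pref (size 0x18000000)
[    0.438015] pci 0000:00:01.0: BAR 14: assigned [mem 0xd1000000-0xd1ffffff]
[    0.438018] pci 0000:01:00.0: BAR 1: can't assign mem pref (size 0x10000000)
[    0.438020] pci 0000:01:00.0: BAR 3: can't assign mem pref (size 0x2000000)
[    0.438023] pci 0000:01:00.0: BAR 0: assigned [mem 0xd1000000-0xd1ffffff]
[    0.438026] pci 0000:01:00.0: BAR 6: can't assign mem pref (size 0x80000)
[    0.438031] pci 0000:00:01.0:   bridge window [mem 0xd1000000-0xd1ffffff]
"""

FAILED_RE = re.compile(
    r"pci (\S+): BAR (\d+): can't assign (.+?) \(size (0x[0-9a-f]+)\)"
)
WINDOW_RE = re.compile(r"bridge window \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\]")


def mib(num_bytes):
    """Convert a byte count to MiB."""
    return num_bytes / (1024 * 1024)


def summarize(log_text):
    """Return (failed requests, bridge window sizes in bytes)."""
    failed = [
        (dev, int(bar), kind, int(size, 16))
        for dev, bar, kind, size in FAILED_RE.findall(log_text)
    ]
    # The window range is inclusive, hence the +1.
    windows = [
        int(end, 16) - int(start, 16) + 1
        for start, end in WINDOW_RE.findall(log_text)
    ]
    return failed, windows


if __name__ == "__main__":
    failed, windows = summarize(LOG)
    for dev, bar, kind, size in failed:
        print(f"{dev} BAR {bar} ({kind}): wanted {mib(size):g} MiB")
    for size in windows:
        print(f"bridge window: {mib(size):g} MiB")
```

By this arithmetic the K20c's prefetchable requests are 256 MiB and 32 MiB
and the bridge's is 384 MiB, while the only bridge window that got assigned
is 16 MiB, which is just enough for BAR 0.
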
>
> Of course, there are many more than 2 PCI devices in the system (based on a Supermicro X9SCA-F with the latest BIOS, v.2.0b), but such BAR error messages appear for only 2 of them: the PCI bridge (00:01.0, the Xeon E3-1230 PCI-E port) and the Nvidia/PNY K20c at 01:00.0.
>
> Does this indicate a BIOS problem, or is it a result of the nvidia driver not being loaded?
>
> The BAR error messages above appear regardless of the BIOS/PCI settings: a) whether 4G decoding is enabled or disabled, and b) whether PCI-E Gen 2 mode is forced (instead of Gen 3) or not.
>
> Mikhail Kuzminsky
> Computer Assistance to Chemical Research Center
> Zelinsky Institute of Organic Chemistry
> Moscow
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
