[Beowulf] mpi slow pairs
hakon.bugge at gmail.com
Sun Aug 31 00:11:23 PDT 2014
On 29 Aug 2014, at 22:20, Michael Di Domenico wrote:
> i believe all the pairs do pass through a spine.
IB is destination routed. Without knowing your topology and/or traffic pattern, I expect the pairs connected to the same leaf not to go through the spine. If that is the case, you are vulnerable to the parking-lot problem.
> i'm not familiar
> with the "parking-lot problem", i'll google it, but suspect a
> bazillion hits will come back
You will. But if there are no faulty components in your system, the behaviour of your system is a function of the system itself and the workload given to it. The parking-lot problem might be an explanation of less than perfect behaviour. I am not saying it is, though.
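For readers who do not know the term, here is a minimal sketch of the parking-lot effect. The model is an assumption for illustration and is not taken from this thread: a chain of switches, each with one saturated local sender, all sending to one destination, and each switch splitting its outgoing link 50/50 between its local sender and the traffic arriving from the switch behind it. The function name parking_lot_shares is hypothetical.

```python
from fractions import Fraction


def parking_lot_shares(num_sources):
    """Bandwidth share of each sender, nearest to the destination first.

    Assumed model (illustrative only): switches in a chain, each doing
    fair 50/50 arbitration between its local sender and the upstream link.
    """
    if num_sources < 1:
        raise ValueError("need at least one sender")

    shares = []
    remaining = Fraction(1)  # capacity of the link into the destination
    for hop in range(num_sources):
        if hop == num_sources - 1:
            # The farthest switch has no upstream traffic, so its local
            # sender takes whatever capacity was passed down to it.
            shares.append(remaining)
        else:
            # Half goes to the local sender, half to everything upstream.
            remaining /= 2
            shares.append(remaining)
    return shares


if __name__ == "__main__":
    for hop, share in enumerate(parking_lot_shares(4)):
        print(f"sender {hop} hops from the destination switch: {share}")
```

In this model every arbiter is locally fair, yet senders farther from the destination get a smaller share of the final link, so some pairs look slow even though no component is faulty.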
>> Sent from my HTC
>> ----- Reply message -----
>> From: "Michael Di Domenico" <mdidomenico4 at gmail.com>
>> To: "Beowulf Mailing List" <Beowulf at beowulf.org>
>> Subject: [Beowulf] mpi slow pairs
>> Date: Fri, Aug 29, 2014 18:09
>> On Fri, Aug 29, 2014 at 11:38 AM, John Hearns <John.Hearns at viglen.co.uk>
>>>> Also have you run ibdiagnet to see if anything is flagged up?
>>> i've run a multitude of ib diags on the machines, but nothing is popping
>>> out as wrong. what's weird is that it's only certain pairings of machines,
>>> not any one machine in general.
>>> Would that then be a problem in one of the blades or a part of the switch?
>> not sure yet, i think one of the spine modules in the switch is silently
>> failing to send traffic at full speed, but i've not been able to
>> "prove" this yet.
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf