[Beowulf] How to debug slow compute node?
meatheadmerlin at gmail.com
Sat Aug 12 00:35:46 PDT 2017
This may be a long shot, especially in a server room where everything else
is working as expected.
It may be the case that there is nothing wrong with the machine itself,
but rather with the level of power supplied to the machine.
I have seen incorrectly supplied power levels cause unpredictable
behaviour in a machine.
But, as I said, it is a long shot.
On Fri, Aug 11, 2017 at 11:35 PM, Chris Samuel <samuel at unimelb.edu.au> wrote:
> On Friday, 11 August 2017 12:39:07 AM AEST Faraz Hussain wrote:
> > I thought it may have to do with CPU scaling, i.e. when the kernel
> > changes the CPU speed depending on the workload. But we do not have
> > that enabled on these machines.
> Just to add to the excellent suggestions from others: have you compared
> UEFI settings & versions across these nodes to ensure they're identical?
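One way to make that comparison less manual is to collect a few settings on each node and flag whichever node disagrees with the majority. The following is only a sketch: the sysfs paths are ones commonly present on Linux but depend on the kernel and drivers in use, and the function names (collect_settings, odd_ones_out) and the node names in the usage note are made up for illustration. Firmware options that are not exposed through sysfs would still need the vendor's own tooling.

```python
from collections import Counter
from pathlib import Path

# Assumed sysfs locations; any that are missing on a node are recorded as None.
SETTING_FILES = {
    "bios_version": "sys/class/dmi/id/bios_version",
    "bios_date": "sys/class/dmi/id/bios_date",
    "cpufreq_governor": "sys/devices/system/cpu/cpu0/cpufreq/scaling_governor",
    "cpuidle_driver": "sys/devices/system/cpu/cpuidle/current_driver",
}


def collect_settings(root="/"):
    """Read each setting file under root; unreadable files map to None."""
    settings = {}
    for name, rel in SETTING_FILES.items():
        try:
            settings[name] = (Path(root) / rel).read_text().strip()
        except OSError:
            settings[name] = None
    return settings


def odd_ones_out(per_node):
    """Return {setting: {node: value}} for nodes that differ from the majority."""
    report = {}
    names = {name for settings in per_node.values() for name in settings}
    for name in sorted(names):
        values = {node: s.get(name) for node, s in per_node.items()}
        # The most common value is treated as the reference configuration.
        majority, _count = Counter(values.values()).most_common(1)[0]
        outliers = {node: v for node, v in values.items() if v != majority}
        if outliers:
            report[name] = outliers
    return report
```

Running collect_settings() on every node (over ssh or through the batch system) and feeding the results to odd_ones_out() as a dict such as {"node01": {...}, "node02": {...}} should leave only the settings on which the slow node differs from its peers.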
> Also remember that the kernel can enable C states that hurt performance
> even if they are disabled in the BIOS/UEFI. This was painfully apparent
> on our first Sandy Bridge cluster, which almost failed the performance
> part of acceptance testing until the cause was found.
> Now we boot all nodes with this in the kernel cmdline:
> intel_idle.max_cstate=0 processor.max_cstate=1 intel_pstate=disable
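To confirm that a given node actually booted with those parameters, the running kernel's command line can be read from /proc/cmdline and compared against them. This is a minimal sketch, assuming a simple whitespace-separated command line (quoted values containing spaces are not handled); the function names are hypothetical, and the expected values are just the three quoted above.

```python
# Settings quoted above; adjust to whatever your site boots with.
EXPECTED = {
    "intel_idle.max_cstate": "0",
    "processor.max_cstate": "1",
    "intel_pstate": "disable",
}


def parse_cmdline(text):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_or_different(cmdline_text, expected=EXPECTED):
    """Return {name: (expected, actual)} for every setting that does not match."""
    actual = parse_cmdline(cmdline_text)
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


def check_node(path="/proc/cmdline"):
    """Check the running kernel's command line on the local node."""
    with open(path) as fh:
        return missing_or_different(fh.read())
```

On a node, check_node() returns an empty dict when all three parameters are present with the expected values; otherwise each entry shows the expected value next to what was found (None if the parameter is absent).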
> Best of luck!
> Christopher Samuel Senior Systems Administrator
> Melbourne Bioinformatics - The University of Melbourne
> Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545