[Beowulf] How to debug slow compute node?

Chris Samuel samuel at unimelb.edu.au
Fri Aug 11 20:35:57 PDT 2017


On Friday, 11 August 2017 12:39:07 AM AEST Faraz Hussain wrote:

> I thought it may have to do with cpu scaling, i.e when the kernel
> changes the cpu speed depending on the workload. But we do not have
> that enabled on these machines.

Just to add to the excellent suggestions from others: have you compared BIOS/
UEFI settings & versions across these nodes to ensure they're identical?
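
As a rough sketch of the version half of that comparison: on Linux with DMI
support each node reports its firmware details under /sys/class/dmi/id. The
helper names below are hypothetical, and gathering the per-node values (over
ssh, pdsh or similar) is assumed to happen elsewhere. This does not cover
individual BIOS/UEFI settings, which normally need the vendor's own tooling.

```python
from collections import Counter
from pathlib import Path


def local_bios_info(dmi_dir="/sys/class/dmi/id"):
    """Return the BIOS vendor/version/date this node reports via sysfs.

    Fields that are not present are returned as None.
    """
    info = {}
    for field in ("bios_vendor", "bios_version", "bios_date"):
        path = Path(dmi_dir) / field
        info[field] = path.read_text().strip() if path.exists() else None
    return info


def odd_nodes_out(per_node):
    """Given {node_name: value}, return the nodes whose value differs
    from the most common value across the cluster."""
    if not per_node:
        return {}
    majority, _count = Counter(per_node.values()).most_common(1)[0]
    return {node: value for node, value in per_node.items() if value != majority}
```

Feeding odd_nodes_out() a mapping of node name to bios_version gives a quick
list of nodes worth a closer look.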

Also remember that the kernel can enable C-states that hurt performance even
if they are disabled in the BIOS/UEFI. This was painfully apparent on our
first Sandy Bridge cluster, which almost failed the performance part of
acceptance testing until we tracked the problem down.

Now we boot all nodes with this in the kernel cmdline:

intel_idle.max_cstate=0 processor.max_cstate=1 intel_pstate=disable
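
To confirm that a node really booted with those parameters, one option is to
inspect /proc/cmdline. The following is a minimal sketch, not a definitive
tool: the function names are made up for illustration, and the parser splits
on whitespace only, so it assumes no quoted values containing spaces.

```python
from pathlib import Path

# The parameters quoted above, as {name: expected value}.
EXPECTED = {
    "intel_idle.max_cstate": "0",
    "processor.max_cstate": "1",
    "intel_pstate": "disable",
}


def parse_cmdline(text):
    """Split a kernel command line into {param: value}.

    Bare flags such as "ro" or "quiet" map to None.
    """
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_or_wrong(text, expected=EXPECTED):
    """Return {param: actual value} for every expected parameter that is
    absent (None) or set to a different value."""
    params = parse_cmdline(text)
    return {k: params.get(k) for k, v in expected.items() if params.get(k) != v}


def check_this_node(path="/proc/cmdline"):
    """Check the running kernel's command line on the local node."""
    return missing_or_wrong(Path(path).read_text())
```

An empty result from check_this_node() means all three parameters are set as
expected on that node.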

Best of luck!
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545


