[Beowulf] [External] anyone have modern interconnect metrics?

Douglas Eadline deadline at eadline.org
Wed Jan 24 19:39:35 UTC 2024


--snip--

> Core counts are getting too high to be useful in HPC. High core-count
> processors sound great until you realize that all those cores are now
> competing for the same memory bandwidth and network bandwidth, neither
> of which increases with core count.
>
> Last April we were evaluating test systems from different vendors for a
> cluster purchase. One of our test users does a lot of CFD simulations
> that are very sensitive to memory bandwidth. While he was getting a 50%
> speed-up on AMD compared to Intel (which makes sense, since the AMDs
> require 12 DIMM slots to be filled instead of Intel's 8), he asked us
> to consider servers with FEWER cores. Even on the AMDs, he saturated
> the memory bandwidth before scaling to all the cores, so his
> performance plateaued. Buying cheaper processors with lower core counts
> was better for him, since the savings would let us buy additional
> nodes, which would benefit him more.
>

So it does depend on the application <ducks>
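
If you want to see that plateau for yourself, a STREAM-style triad
swept over thread counts shows it quickly. A rough sketch (mine, not
STREAM itself; assumes gcc with OpenMP and ~1.5 GB of free RAM):

/* triad.c -- STREAM-style triad to expose the bandwidth plateau.
 * Build: gcc -O2 -fopenmp triad.c -o triad
 * Run:   OMP_NUM_THREADS=<n> ./triad   for n = 1, 2, 4, ...
 */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (64L * 1024 * 1024)   /* 512 MB per array: well past cache */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    #pragma omp parallel for            /* first-touch page placement */
    for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];       /* triad: three memory streams */
    double t = omp_get_wtime() - t0;

    printf("%d threads: %.1f GB/s\n", omp_get_max_threads(),
           3.0 * N * sizeof(double) / t / 1e9);
    free(a); free(b); free(c);
    return 0;
}

Plot GB/s against thread count; on most two-socket boxes the curve
flattens well before you reach all the cores.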

Also, server processors are mainly designed for cloud use (their
biggest customers), which means large numbers of cores. Besides
memory BW there is also clock speed. Clock speeds are governed by
thermals, which depend on how busy the cores are. For bursty
webby/cloud loads this works okay; you can hit "turbo speeds".
But load the system with one HPC process per core and you are
now running at base frequency with some crappy memory BW.
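
You can watch the clocks do this on any Linux box: spin every core
and sample the reported frequency. Another rough sketch of mine
(Linux-specific; /proc/cpuinfo readings can lag on some kernels):

/* clockcheck.c -- spin all cores, sample the reported clocks.
 * Build: gcc -O2 -fopenmp clockcheck.c -o clockcheck
 */
#include <stdio.h>
#include <string.h>
#include <omp.h>

static void print_mhz(const char *label)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    char line[256];
    if (!f) return;
    printf("--- %s ---\n", label);
    while (fgets(line, sizeof line, f))
        if (strncmp(line, "cpu MHz", 7) == 0)
            fputs(line, stdout);
    fclose(f);
}

int main(void)
{
    print_mhz("idle");
    #pragma omp parallel
    {
        volatile double x = 1.0;
        double t0 = omp_get_wtime();
        /* thread 0 stops early and samples while the rest still spin */
        double dur = (omp_get_thread_num() == 0) ? 3.0 : 6.0;
        while (omp_get_wtime() - t0 < dur)
            x = x * 1.0000001 + 1e-9;
        if (omp_get_thread_num() == 0)
            print_mhz("all cores busy");
    }
    return 0;
}

Compare the two samples: one busy core will report turbo, but with
every core loaded the numbers sit at or near base clock.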




>>
>> Alternatively, are there other places to ask? Reddit or something less
>> "greybeard"?
>
> I've been very disappointed with the "expertise" on the HPC-related
> subreddits. Last time I lurked there, it seemed very amateurish/DIY
> oriented. For example, someone wanted to buy all the individual
> components and assemble their own nodes for an entire cluster at
> their job. Can you imagine? Most of the replies encouraged them to
> do so....
>
>   You might want to join the HPCSYSPROS Slack channel and ask there.
> HPCSYSPROS is an ACM SIG for HPC system admins that runs workshops
> every year at SC. Click on the "Get Involved" link on this page:
>
> https://sighpc-syspros.org/
>
> --
> Prentice
>
>
> On 1/16/24 5:19 PM, Mark Hahn wrote:
>> Hi all,
>> Just wondering if any of you have numbers (or experience) with
>> modern high-speed COTS Ethernet.
>>
>> Latency mainly, but perhaps also message rate.  Also ease of use
>> with open-source products like OpenMPI, maybe Lustre?
>> Flexibility in configuring clusters in the >= 1k node range?
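
For a first-order latency number, a plain MPI ping-pong gets you most
of the way before you reach for the OSU micro-benchmarks (osu_latency,
and osu_mbw_mr for message rate). A rough sketch:

/* pingpong.c -- half-round-trip latency for small messages.
 * Build: mpicc -O2 pingpong.c -o pingpong
 * Run:   mpirun -np 2 --map-by node ./pingpong
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    const int warmup = 100, iters = 10000;
    char buf[8] = {0};                  /* 8-byte message: latency regime */
    int rank;
    double t0 = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < warmup + iters; i++) {
        if (i == warmup) t0 = MPI_Wtime();   /* time only the steady state */
        if (rank == 0) {
            MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    if (rank == 0)
        printf("half RTT: %.2f us\n",
               (MPI_Wtime() - t0) / iters / 2.0 * 1e6);

    MPI_Finalize();
    return 0;
}

The --map-by node flag (Open MPI) keeps the two ranks on separate
nodes, so you measure the wire rather than shared memory; compare
against the ~2 us Slingshot half-RTT in the paper you cite below.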
>>
>> We have a good idea of what to expect from Infiniband offerings,
>> and are familiar with scalable network topologies.
>> But vendors seem to think that high-end Ethernet (100-400 Gb) is
>> competitive...
>>
>> For instance, here's an excellent study of HPE/Cray Slingshot (non-COTS):
>> https://arxiv.org/pdf/2008.08886.pdf
>> (half RTT around 2 us, but this paper has great stuff about
>> congestion, etc.)
>>
>> Yes, someone is sure to say "don't try characterizing all that stuff -
>> it's your application's performance that matters!"  Alas, we're a
>> generic "any kind of research computing" organization, so there are
>> thousands of apps across all possible domains.
>>
>> Another interesting topic is that nodes are becoming many-core - any
>> thoughts?
>>
>> Alternatively, are there other places to ask? Reddit or something less
>> "greybeard"?
>>
>> thanks, mark hahn
>> McMaster U / SharcNET / ComputeOntario / DRI Alliance Canada
>>
>> PS: the snarky name "NVidiband" just occurred to me; too soon?
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


-- 
Doug


