[Beowulf] Not all cores are created equal
niftyompi at niftyegg.com
Tue Dec 30 20:41:22 PST 2008
On Mon, Dec 29, 2008 at 4:27 PM, Chris Samuel <csamuel at vpac.org> wrote:
> ----- "Nifty Tom Mitchell" <niftyompi at niftyegg.com> wrote:
>> On Wed, Dec 24, 2008 at 09:03:38PM +1100, Chris Samuel wrote:
>> > I contemplated doing this on our Barcelona cluster, but
>> > sacrificing 1 core in 8 was a bit too much of a high price
>> > to pay. But people with higher core counts per node might
>> > find it attractive.
>> This seems like it would be a benchmarking decision based on
>> application load and 'implied IO+OS' loading, as well as the ability
>> to localize the IO+OS activity to the sacrificed CPU core.
> I'll leave that to sites that have a benchmarkable and
> characterisable workload. :-) We've got over 600 random
> users running random code (some very random indeed)
> that covers all categories from self-written, through open
> source to commercial apps.
>  - including a commercial code that segfaults in one
> particular program in libmsxml.so - yes, that appears to
> be a 3rd party implementation of the M$ XML library on Linux.
> When reported they claimed it was because we were running
> CentOS5 not RHEL4. Can't reproduce on RHEL4 because it
> crashes *before* that point on that distro. Gah.
> Christopher Samuel - (03) 9925 4751 - Systems Manager
Benchmarking with a long list of random applications is problematic.
One additional hard-to-benchmark aspect of a big cluster is "between-job"
legacy I/O: after a process exits, pending data I/O can slow the startup
of the next process.
It may be simpler to sample the system state, watching for iowait and
any other activity measure you can track with a light hand. Statistical
analysis and charting tools are available.... Sample-oriented benchmarks
on cluster workloads are not common, but I suspect they can tell us a lot.
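As a rough illustration of the sampling idea (my sketch, not something from
the original thread), a small Python script can read the aggregate iowait
counter from Linux's /proc/stat at intervals with minimal overhead; the
field layout assumed here is the standard Linux one.

```python
#!/usr/bin/env python3
# Light-handed sampler sketch: measure the fraction of CPU time spent
# in iowait over a short interval, using Linux's /proc/stat.
import time

def read_cpu_times():
    """Return the aggregate 'cpu' line of /proc/stat as a list of ints."""
    with open("/proc/stat") as f:
        fields = f.readline().split()
    # fields: ['cpu', user, nice, system, idle, iowait, irq, softirq, ...]
    return [int(x) for x in fields[1:]]

def iowait_fraction(interval=1.0):
    """Fraction of total CPU jiffies spent in iowait over one interval."""
    before = read_cpu_times()
    time.sleep(interval)
    after = read_cpu_times()
    delta = [b - a for a, b in zip(before, after)]
    total = sum(delta)
    return delta[4] / total if total else 0.0  # index 4 is iowait

if __name__ == "__main__":
    # Take a handful of samples; feed the output to any charting tool.
    for _ in range(5):
        print(f"iowait: {iowait_fraction(1.0):.1%}")
```

Run periodically (e.g. from cron or a lightweight daemon) and logged per
node, samples like this can be correlated with job start and end times to
spot the between-job I/O effect described above.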
T o m M i t c h e l l