BEOWULF cluster hangs

Josip Loncaric josip at
Thu Sep 26 13:04:13 PDT 2002

Regarding VM/SMP/IDE issues in 2.4 kernels:

We still see some VM problems on SMP machines with memory-intensive jobs, even
in the latest Red Hat kernel 2.4.18-10smp.  Single-CPU machines running
2.4.18-10 are generally stable (but see below).  Also, support for the
ServerWorks chipset in 2.4 kernels is worse than in 2.2 kernels, resulting in
IDE performance degradation (no UDMA) and outright crashes when the kernel
detects that the OSB4 is in an "impossible state".


P.S.  "Optimistic memory allocation" in 2.4 kernels can misbehave.  A user
application typically gets no indication of memory shortage when it asks for
memory, but when it tries to use the allocated memory, the application (or
another process) can get terminated without any warning by the kernel's
out-of-memory (OOM) killer.  Given this design, I would not want to rely on
any applications staying up under heavy memory demand.  Moreover, while this
at least seems to work as designed on uniprocessor machines, our experience is
that when swap is enabled on SMP machines, even the OOM killer often cannot
prevent system crashes during OOM conditions (the machine crashes trying to
find a free memory page).
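The lazy commitment behind this behavior can be observed safely on a current
Linux system.  The sketch below maps a block of anonymous memory (the request
succeeds at once), then writes one byte per page and compares the process's
resident set size before and after.  Physical pages only appear on first
write, which is the point at which an overcommitted system can run out.  It
also reads the vm.overcommit_memory setting (on current kernels, 0 = heuristic,
1 = always overcommit, 2 = strict accounting).  Assumptions: a modern Linux
kernel with /proc/self/statm, not the 2.4.18 kernels discussed above; the
64 MiB size and the helper names (resident_bytes, overcommit_mode, demo) are
hypothetical and chosen only for illustration.  The sketch deliberately does
not try to trigger the OOM killer.

```python
import mmap
import os

PAGE = os.sysconf("SC_PAGE_SIZE")


def resident_bytes():
    """Resident set size of this process in bytes (Linux /proc/self/statm)."""
    with open("/proc/self/statm") as f:
        # Second field of statm is the number of resident pages.
        return int(f.read().split()[1]) * PAGE


def overcommit_mode():
    """Return vm.overcommit_memory as an int, or None if it cannot be read."""
    try:
        with open("/proc/sys/vm/overcommit_memory") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def demo(size=64 * 1024 * 1024):
    """Map `size` bytes of anonymous memory, then touch every page.

    Returns (RSS growth right after mapping, RSS growth after touching).
    Hypothetical helper for illustration only.
    """
    base = resident_bytes()
    buf = mmap.mmap(-1, size)        # the request "succeeds" immediately
    after_map = resident_bytes() - base
    for offset in range(0, size, PAGE):
        buf[offset] = 1              # first write faults in a physical page
    after_touch = resident_bytes() - base
    buf.close()
    return after_map, after_touch


if __name__ == "__main__":
    print("vm.overcommit_memory =", overcommit_mode())
    mapped, touched = demo()
    print(f"RSS growth after mmap:  {mapped // 1024} KiB")
    print(f"RSS growth after touch: {touched // 1024} KiB")
```

On a typical system the first figure stays near zero while the second is
close to the full 64 MiB, i.e. the memory is only really claimed when it is
used, long after the allocation call has reported success.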

