Request on advice on which kernel? 2.2 or 2.4?
Martin Siegert siegert at sfu.ca
Wed Oct 3 15:15:57 PDT 2001
On Wed, Oct 03, 2001 at 09:57:44AM -0400, Donald Becker wrote:
> On Wed, 3 Oct 2001, Michelle Kuttel wrote:
>
> > I would like to request some opinions/advice on which kernel is best for
> > my Beowulf cluster. We have a cluster of 16 Dual processor PentiumIII-866
> > MHz work nodes (head node AMD athlon 1Ghz CPU, single processor). It has
> > been running for a few months now (computational chemistry CHARMM code
> > principally).
> > I have installed both 2.2.14-5 kernel (with Loncaric's
> > tcpfix kernel patch)
>
> We use and recommend this TCP patch. Josip did excellent work.
>
> > and the 2.2.4 kernel at different times.
>
> The biggest advantage of 2.4 kernel is the SMP improvements to the
> network stack. You'll see less benefit with your single processor
> nodes, with most of the benefit on four processor nodes.

This brings up another issue: the APIC code (bugs?) in the 2.4 series of kernels. I encounter the following problem: with 2.4 kernels (I have tried almost every version, from RedHat's 2.4.3-12 smp kernel through 2.4.5 - 2.4.10, including various ac versions) and the LAM MPI distribution, some MPI programs hang almost every time. These are mostly parallel FFT jobs (from the fftw library) that use global communication patterns (MPI_Alltoall). I am using dual Athlon 1.2GHz nodes, each with 4 3com NICs, three of which are channel bonded.

I make the following observations:

- The program hangs while executing a r = read(sock, buf, nbytes) statement over and over again. Typically r=56 or r=696 with nbytes=116765796; i.e., nbytes shrinks from 116765796 by only 56 or 696 bytes per call, so for practical purposes the program hangs.
- When using mpich, the program does not hang.
- When using the 2.2.19 smp kernel, the program does not hang.
- Using the append="noapic" setting in /etc/lilo.conf with a 2.4.x kernel reduces the failure rate, but the program still hangs with a probability that is unacceptable for a production environment.
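To put the first observation into numbers, here is a minimal C sketch, assuming a conventional "read until the whole message has arrived" loop of the kind a socket-based MPI transport might use. The post does not show LAM's actual code; read_fully and reads_needed are hypothetical names. A short read (r < nbytes) is not an error on a socket: the loop just advances the buffer and asks for the remainder. The catch is the call count: if every call returns only 56 bytes, draining a 116765796-byte request takes 2085104 read() calls, and even at 696 bytes per call it takes 167767.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: number of read() calls needed to receive
 * `total` bytes if every call returns exactly `per_call` bytes
 * (ceiling division). */
static unsigned long reads_needed(unsigned long total, unsigned long per_call)
{
    assert(per_call > 0);
    return (total + per_call - 1) / per_call;
}

/* Hypothetical helper: keep calling read() until nbytes have arrived,
 * the peer closes the connection (read returns 0), or a real error
 * occurs. *calls counts every read() invocation, including retries.
 * Returns the number of bytes actually read, or -1 on error. */
static ssize_t read_fully(int fd, void *buf, size_t nbytes, long *calls)
{
    char *p = buf;
    size_t left = nbytes;

    *calls = 0;
    while (left > 0) {
        ssize_t r = read(fd, p, left);
        (*calls)++;
        if (r < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* genuine error */
        }
        if (r == 0)
            break;              /* EOF: peer closed the connection */
        p += r;                 /* short read: advance and ask for the rest */
        left -= (size_t)r;
    }
    return (ssize_t)(nbytes - left);
}
```

The loop is correct no matter how small each chunk is, which is exactly why a stream of 56-byte reads does not show up as an error: the job simply stops making visible progress.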
More information about the Beowulf mailing list
