HELP! linux cluster with LAM-MPI

khocha at icu.ac.kr khocha at icu.ac.kr
Fri Feb 9 03:35:21 PST 2001


Dear All.

I am a graduate student at the Information and Communications University in Korea.
In our lab, we built a diskless cluster using Intel L440GX+ boards.

Our system runs Linux kernel 2.2.13 and LAM-MPI 6.3.2.
During testing, however, the system failed unexpectedly.

Our MPI test program communicates only twice, so it is essentially 'EP' (embarrassingly parallel) style:
(1) it distributes the data at the beginning, and (2) it collects the result data at the end.
It uses very little memory but performs many loop iterations.
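
The structure described above can be sketched roughly as follows. This is an illustrative single-process stand-in written in Python, not the actual test program and not LAM-MPI code: the function names (scatter, compute, run), the workload, and the modulus are all hypothetical, and a real MPI program would replace the two marked steps with actual message passing.

```python
# Illustrative stand-in for the distribute / compute / collect pattern.
# All names and the workload are hypothetical; no MPI calls are made.

MODULUS = 1_000_003  # arbitrary value that keeps the numbers small


def scatter(data, n_ranks):
    """Split data into n_ranks nearly equal contiguous chunks."""
    base, extra = divmod(len(data), n_ranks)
    chunks, start = [], 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def compute(chunk, iterations):
    """CPU-bound loop: small memory footprint, no communication."""
    total = 0
    for _ in range(iterations):
        for x in chunk:
            total = (total + x * x) % MODULUS
    return total


def run(data, n_ranks, iterations):
    chunks = scatter(data, n_ranks)                       # communication 1: distribute
    partials = [compute(c, iterations) for c in chunks]   # independent work per rank
    return sum(partials) % MODULUS                        # communication 2: collect
```

Raising `iterations` lengthens only the middle step, which matches the observation that the failure depends on the amount of looping rather than on the amount of communication.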

With a small number of iterations it works well, but when we increase the number of loop iterations
to solve harder problems, a node displays the following error message and then
goes down.

======================================================================================
[root@node11 root]# Unable to handle kernel paging request at virtual address e670e602
current->tss.cr3 = 07591000, %cr3 = 07591000
*pde = 00000000
Oops: 0002
CPU:    1
EIP:    0010:[]
EFLAGS: 00010246
eax: 00000000   ebx: c7593fb4   ecx: 00000286   edx: 00000000
esi: 00000000   edi: c7592000   ebp: c7593fbc   esp: c7593fa0
ds: 0018   es: 0018   ss: 0018
Process vital (pid: 424, process nr: 20, stackpage=c7593000)
Stack: bffffe14 00000032 00000005 00000000 c7592000 00000000 1dcd6500 bffffd3c 
       c0109fb8 bffffd34 00000000 40107bec 00000000 bffffe14 bffffd3c 000000a2 
       c010002b 0000002b 000000a2 400a9f51 00000023 00000206 bffffd14 0000002b 
Call Trace: [] [] 
Code: 00 b0 02 e6 70 e6 80 e4 71 e6 80 88 c1 31 d2 88 ca 89 54 24 
======================================================================================
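
For reading the "Oops:" line in the dump above, the small helper below may be useful. It assumes the usual i386 page-fault error code layout (bit 0: protection fault versus page not present, bit 1: write versus read, bit 2: user versus kernel mode). The function name decode_oops_error_code is hypothetical, and the helper only interprets the code; it does not diagnose the cause of the fault.

```python
def decode_oops_error_code(code_hex):
    """Decode the i386 page-fault error code printed after 'Oops:'.

    Assumed layout: bit 0 set = protection fault, clear = page not present;
    bit 1 set = write access, clear = read access;
    bit 2 set = fault in user mode, clear = fault in kernel mode.
    """
    code = int(code_hex, 16)
    return {
        "cause": "protection fault" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "kernel",
    }


if __name__ == "__main__":
    # Value taken from the dump above.
    print(decode_oops_error_code("0002"))
```

Under that assumption, "Oops: 0002" reads as a write to a page that is not present, made while running in kernel mode.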

Please give us any hints on how to solve this problem.

P.S. Our system consists of:
-------------------------------
24 cluster nodes: Intel L440GX+ boards, dual Pentium III 550 MHz, no local disk (each node uses the server's RAID),
Compaq ProLiant 1600 server (dual Pentium III 600 MHz),
Serial hub (Comtrol RocketPort),
Fast Ethernet hub (3Com),
108 GB RAID



Your quick reply will be highly appreciated.
Best Regards.








More information about the Beowulf mailing list