HELP! linux cluster with LAM-MPI
khocha at icu.ac.kr
Fri Feb 9 03:35:21 PST 2001
- Previous message: Problem with bproc and pthread
- Next message: HELP! linux cluster with LAM-MPI
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Dear All,

I'm a graduate student at the Information and Communications University in Korea. In our lab, we built a diskless cluster using Intel L440GX+ boards. The system runs Linux kernel 2.2.13 and LAM-MPI 6.3.2.

During testing, the system ran into unexpected trouble. The MPI test program performs only two communications (that is, it is 'EP' style):

1. distribute the data (at the beginning)
2. collect the result data (at the end)

It uses only a little memory but has many loop operations. With a small number of iterations it works well, but when we increase the number of loop iterations to solve more difficult problems, a node displays the following error message and then goes down.

======================================================================================
[root at node11 root]# Unable to handle kernel paging request at virtual address e6 70e602
current->tss.cr3 = 07591000, %cr3 = 07591000
*pde = 00000000
Oops: 0002
CPU:    1
EIP:    0010:[]
EFLAGS: 00010246
eax: 00000000   ebx: c7593fb4   ecx: 00000286   edx: 00000000
esi: 00000000   edi: c7592000   ebp: c7593fbc   esp: c7593fa0
ds: 0018   es: 0018   ss: 0018
Process vital (pid: 424, process nr: 20, stackpage=c7593000)
Stack: bffffe14 00000032 00000005 00000000 c7592000 00000000 1dcd6500 bffffd3c
       c0109fb8 bffffd34 00000000 40107bec 00000000 bffffe14 bffffd3c 000000a2
       c010002b 0000002b 000000a2 400a9f51 00000023 00000206 bffffd14 0000002b
Call Trace: [] []
Code: 00 b0 02 e6 70 e6 80 e4 71 e6 80 88 c1 31 d2 88 ca 89 54 24
======================================================================================

Please give us a hint on how to solve this problem.

P.S. Our system consists of:
-------------------------------
- L440GX+ nodes (dual Pentium III 550MHz; 24 cluster nodes; each node has no disk and uses the server's RAID)
- Compaq Proliant 1600 server (dual Pentium III 600MHz)
- Serial hub (Comtrol Rocketport)
- Fast Ethernet hub (3Com)
- 108 GB RAID

Your quick reply will be highly appreciated.

Best Regards.
More information about the Beowulf mailing list
