mpich device=ch_p4 -comm=shared, P4_GLOBMEMSIZE segfault

Cabaniols, Sebastien Sebastien.Cabaniols at compaq.com
Tue Feb 20 01:52:36 PST 2001


Hi Beowulf people,

I have several 4-CPU Linux Alpha machines (ES40) and I want to run
some mpich jobs on them. I only have Fast Ethernet at the moment,
so I need the ch_p4 device to communicate between boxes and -comm=shared
to communicate efficiently through shared memory within a box.


When I launch my job on a single machine, mpich complains about
the amount of memory it can allocate in shared memory:

p1_2061: (6.257812) xx_shmalloc: returning NULL; requested 2609648 bytes
p1_2061: (6.257812) p4_shmalloc returning NULL; request = 2609648 bytes
You can increase the amount of memory by setting the environment variable
P4_GLOBMEMSIZE (in bytes)

p1_2061:  p4_error: alloc_p4_msg failed: 0
p0_2060:  p4_error: interrupt SIGINT: 2
p2_2062:  p4_error: interrupt SIGINT: 2
p3_2063:  p4_error: interrupt SIGINT: 2


The message says to increase the limit with the P4_GLOBMEMSIZE environment
variable, set for all the processes involved (so I put it in my .bashrc).
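
For illustration, the .bashrc entry would look something like the sketch
below. The 16 MB figure is an assumption, picked only because it is
comfortably above the 2609648 bytes in the error message; it is not the
value from the failing run.

```shell
# Hypothetical value: 16 MB, expressed in bytes as the p4 message asks.
# Chosen for illustration only; it is larger than the 2609648-byte request.
P4_GLOBMEMSIZE=$((16 * 1024 * 1024))
export P4_GLOBMEMSIZE

# Print the value so it can be checked in the shell that launches the job.
echo "P4_GLOBMEMSIZE=$P4_GLOBMEMSIZE"
```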

But then my jobs fail to start and die with a segmentation fault:

p2_1214:  p4_error: interrupt SIGSEGV: 11
p0_1212:  p4_error: interrupt SIGINT: 2
p3_1215:  p4_error: interrupt SIGINT: 2
p1_1213:  p4_error: interrupt SIGINT: 2

I have tried changing the shared memory limits of my system and checking
them with ipcs -ml:


ipcs -ml

------ Shared Memory Limits --------
max number of segments = 128
max seg size (kbytes) = 1048576
max total shared memory (kbytes) = 16777216
min seg size (bytes) = 1


So I should be able to allocate 1 Gbyte of shared memory in a single
segment, while only about 2.6 Mbytes are being requested.
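
The comparison can be spelled out with a little shell arithmetic. This is
only a sketch using the two numbers quoted above (the max seg size from
ipcs -ml and the request size from the xx_shmalloc message); the variable
names are made up for the example.

```shell
# Values copied from the ipcs -ml output and the xx_shmalloc message.
max_seg_kbytes=1048576      # "max seg size (kbytes)"
requested_bytes=2609648     # "requested 2609648 bytes"

# ipcs reports the limit in kbytes, so convert it to bytes first.
max_seg_bytes=$((max_seg_kbytes * 1024))

echo "max segment size: $max_seg_bytes bytes"
echo "requested:        $requested_bytes bytes"

if [ "$requested_bytes" -le "$max_seg_bytes" ]; then
    echo "request fits within the per-segment limit"
else
    echo "request exceeds the per-segment limit"
fi
```

Since the request is far below the per-segment limit, the kernel limits
do not look like the thing rejecting the allocation.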

Do you have any ideas?

Thanks in advance


Sebastien Cabaniols







