[Beowulf] Problems with a JS21 - Ah, the networking...

Ivan Paganini ispmarin at gmail.com
Mon Oct 1 06:05:43 PDT 2007


Just an update: over several tries, the strace stops at different
points: the one specified in the other email, and here:
_______________________________________________
munmap(0x40176000, 4096)                = 0
time([1191243868])                      = 1191243868
open("/etc/hosts", O_RDONLY)            = 4
fcntl64(4, F_GETFD)                     = 0
fcntl64(4, F_SETFD, FD_CLOEXEC)         = 0
fstat64(4, {st_mode=S_IFREG|0644, st_size=10247, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x40176000
read(4, "#\n# hosts         This file desc"..., 4096) = 4096
read(4, "yriBlade077\n192.168.30.178  myri"..., 4096) = 4096
read(4, " blade067 blade067.lcca.usp.br\n1"..., 4096) = 2055
read(4, "", 4096)                       = 0
close(4)                                = 0
munmap(0x40176000, 4096)                = 0
clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x40046f68) = 31382
clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x40046f68) = 31383
brk(0x102ab000)                         = 0x102ab000
clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x40046f68) = 31384
waitpid(-1,
_______________________________________________

Thanks.
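
Editorial note: the trace above ends in an unfinished waitpid(-1, ... right after three clone() calls, which suggests the parent is simply blocked waiting for one of its children rather than stuck reading /etc/hosts. The sketch below is one way to summarize such a trace; the function name and the sample text are illustrative assumptions, not part of the original thread. If the children are the ones hanging, re-running with strace -f (follow forks) would show their system calls as well.

```python
import re

def summarize_strace(text):
    """Return (child_pids, last_syscall_name, last_call_unfinished)."""
    # Child PIDs are the return values of clone(). [^)]* also spans
    # newlines, so calls wrapped by a mail client still match.
    child_pids = [int(pid) for pid in
                  re.findall(r"\bclone\([^)]*\)\s*=\s*(\d+)", text)]
    # A syscall starts with an identifier followed by "(" at line start.
    starts = list(re.finditer(r"^(\w+)\(", text, flags=re.M))
    if not starts:
        return child_pids, None, False
    last = starts[-1]
    # A finished call ends in ") = <result>"; if that is missing, the
    # process was still inside the call when the trace was cut off.
    unfinished = re.search(r"\)\s*=\s", text[last.start():]) is None
    return child_pids, last.group(1), unfinished

# Hypothetical excerpt modelled on the trace in this message.
sample = """close(4)                                = 0
clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x40046f68) = 31382
brk(0x102ab000)                         = 0x102ab000
clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x40046f68) = 31384
waitpid(-1,
"""
print(summarize_strace(sample))
```

The returned PIDs are the processes worth attaching to next, since the parent itself is only waiting.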

2007/10/1, Ivan Paganini <ispmarin at gmail.com>:
> Hello Chris, everybody:
>
> I am not using jumbo frames. I am now considering that option, but
> first I want to be sure that there is no other problem, just to
> limit the number of variables at hand. Thanks for your help, though.
>
> I ran strace on the hung process, and this is the output:
> ______________________________________________
>
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40176000
> read(4, "#\n# hosts         This file desc"..., 4096) = 4096
> read(4, "yriBlade077\n192.168.30.178  myri"..., 4096) = 4096
> read(4, " blade067 blade067.lcca.usp.br\n1"..., 4096) = 2055
> read(4, "", 4096)                       = 0
> close(4)                                = 0
> munmap(0x40176000, 4096)                = 0
> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x40046f68) = 25994
> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x40046f68) = 25995
> brk(0x102ab000)                         = 0x102ab000
> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x40046f68) = 25996
> waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
> --- SIGWINCH (Window changed) @ 0 (0) ---
> waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
> --- SIGWINCH (Window changed) @ 0 (0) ---
> waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
> --- SIGWINCH (Window changed) @ 0 (0) ---
> waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
> --- SIGWINCH (Window changed) @ 0 (0) ---
> waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
> --- SIGWINCH (Window changed) @ 0 (0) ---
> waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
> --- SIGWINCH (Window changed) @ 0 (0) ---
> waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
> --- SIGWINCH (Window changed) @ 0 (0) ---
> waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
> --- SIGWINCH (Window changed) @ 0 (0) ---
> waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
> --- SIGWINCH (Window changed) @ 0 (0) ---
> waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
> --- SIGWINCH (Window changed) @ 0 (0) ---
> waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
> --- SIGWINCH (Window changed) @ 0 (0) ---
> waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
> --- SIGWINCH (Window changed) @ 0 (0) ---
> waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
> --- SIGWINCH (Window changed) @ 0 (0) ---
> waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
> --- SIGWINCH (Window changed) @ 0 (0) ---
> waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
> --- SIGWINCH (Window changed) @ 0 (0) ---
> waitpid(-1,
>
> ______________________________________________
> and that is all. I am now trying to get a better understanding of
> what is happening.
>
> Thank you.
>
> Ivan
>
>
> 2007/9/29, Chris Samuel <csamuel at vpac.org>:
> > On Sat, 29 Sep 2007, Ivan Paganini wrote:
> >
> > > I sniffed the network on the store nodes' interface, and I got lots
> > > of TCP lost fragments, previous lost fragments, ACK lost fragments
> > > and TCP window size full.
> >
> > Some suggestions would be to check that all network interfaces are
> > negotiating gigabit back to the switch, and that if you are using
> > jumbo frames then all interfaces are indeed using jumbo frames.
> >
> > A useful way to verify two-way jumbo frame connectivity is the
> > ping command; running:
> >
> > ping -c 1 -M do -s 8900 $hostname
> >
> > should tell you whether or not it is working.
> >
> > Best of luck!
> > Chris
> > --
> > Christopher Samuel - (03) 9925 4751 - Systems Manager
> >  The Victorian Partnership for Advanced Computing
> >  P.O. Box 201, Carlton South, VIC 3053, Australia
> > VPAC is a not-for-profit Registered Research Agency
> >
>
>
> --
> -----------------------------------------------------------
> Ivan S. P. Marin
> ----------------------------------------------------------
>


-- 
-----------------------------------------------------------
Ivan S. P. Marin
----------------------------------------------------------
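
Editorial note: the ping check Chris quotes works because -M do forbids fragmentation, so an 8900-byte payload can only get through if every hop accepts frames larger than the standard 1500-byte MTU. The sketch below shows the arithmetic and wraps the same command for use across several nodes. It assumes Linux iputils ping, IPv4 without IP options, and a jumbo MTU of 9000 (a common value that the thread itself does not state); the function names and the host name in the usage note are hypothetical.

```python
import subprocess

IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header

def max_ping_payload(mtu):
    """Largest ping -s value that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER

def jumbo_ping_cmd(host, payload=8900):
    """Build the command from the thread: one probe, fragmentation prohibited."""
    return ["ping", "-c", "1", "-M", "do", "-s", str(payload), host]

def path_accepts(host, payload=8900, timeout=15):
    """Return True if a single unfragmented ping of this size gets a reply."""
    try:
        result = subprocess.run(jumbo_ping_cmd(host, payload),
                                capture_output=True, text=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        # No answer within the timeout is treated as a failed check.
        return False
    return result.returncode == 0

print(max_ping_payload(9000))  # 8972
print(max_ping_payload(1500))  # 1472
```

Calling path_accepts("blade067") from each node in turn, and then in the reverse direction, would cover the two-way check; a payload of 1472 should succeed on any working standard-MTU link, which makes it a useful baseline before testing 8900.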


