[Beowulf] Problems with a JS21 - Ah, the networking...


Ivan Paganini ispmarin at gmail.com
Mon Oct 1 05:34:46 PDT 2007


Hello Chris, everybody:

I am not using jumbo frames. I am considering that option now, but
first I wanted to be sure that there is no other problem, just to
limit the number of variables at hand. Thanks for your help.

I ran strace on the hung process, and the output is this:
______________________________________________

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40176000
read(4, "#\n# hosts         This file desc"..., 4096) = 4096
read(4, "yriBlade077\n192.168.30.178  myri"..., 4096) = 4096
read(4, " blade067 blade067.lcca.usp.br\n1"..., 4096) = 2055
read(4, "", 4096)                       = 0
close(4)                                = 0
munmap(0x40176000, 4096)                = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x40046f68) = 25994
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x40046f68) = 25995
brk(0x102ab000)                         = 0x102ab000
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x40046f68) = 25996
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGWINCH (Window changed) @ 0 (0) ---
waitpid(-1,

______________________________________________
and nothing more. I'm now trying to get a better understanding of what
is happening.
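
The trace shows the parent creating three children with clone() and
then blocking in waitpid(-1, ...), which waits for any child; the
ERESTARTSYS lines are only the call being restarted after each
SIGWINCH. So one possible next step is to attach strace to the
children themselves. A minimal sketch, assuming the trace has been
saved as text with each call on one line (the helper name child_pids
and the sample variable are made up for illustration):

```python
import re

# Matches a completed clone() call and captures the returned child PID.
CLONE_RE = re.compile(r"^clone\(.*\)\s*=\s*(\d+)\s*$")


def child_pids(strace_text):
    """Return the PIDs returned by successful clone() calls, in order."""
    pids = []
    for line in strace_text.splitlines():
        m = CLONE_RE.match(line.strip())
        if m:
            pids.append(int(m.group(1)))
    return pids


# Lines copied from the trace above, with the wrapped clone() lines joined.
sample = """\
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x40046f68) = 25994
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x40046f68) = 25995
brk(0x102ab000)                         = 0x102ab000
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x40046f68) = 25996
waitpid(-1, 0xffffdbc8, 0)              = ? ERESTARTSYS (To be restarted)
"""

# Print one attach command per child so each can be traced separately.
for pid in child_pids(sample):
    print("strace -p", pid)
```

Whether any of those children is actually the one that is stuck is
not something the parent's trace can show; this only narrows down
where to look next.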

Thank you.

Ivan


2007/9/29, Chris Samuel <csamuel at vpac.org>:
> On Sat, 29 Sep 2007, Ivan Paganini wrote:
>
> > I sniffed the network in the store nodes interface, and I got lots
> > of TCP lost fragment, previous lost fragments, ack lost fragments
> > and TCP window size full.
>
> Some suggestions would be to check that all network interfaces are
> negotiating gigabit back to the switch, and that if you are using
> jumbo frames then all interfaces are indeed using jumbo frames.
>
> A useful check to verify 2 way jumbo frames connectivity is by using
> the ping command, doing:
>
> ping -c 1 -M do -s 8900 $hostname
>
> should tell you whether or not it is working.
>
> Best of luck!
> Chris
> --
> Christopher Samuel - (03) 9925 4751 - Systems Manager
>  The Victorian Partnership for Advanced Computing
>  P.O. Box 201, Carlton South, VIC 3053, Australia
> VPAC is a not-for-profit Registered Research Agency
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
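
For reference, the 8900 in the ping command quoted above is the ICMP
payload size, and -M do forbids fragmentation, so the ping can only
succeed if every hop accepts a packet of payload + 28 bytes (20 for
the IPv4 header, 8 for the ICMP header). A small sketch of that
arithmetic, assuming the usual MTUs of 1500 and 9000 (nobody in this
thread has stated the MTU in use); the function names are
hypothetical, and blade067 is just a host name taken from the hosts
file in the trace:

```python
IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER - ICMP_HEADER


def jumbo_ping_cmd(host, payload=8900):
    """Build the ping command from the quoted message as an argument list."""
    return ["ping", "-c", "1", "-M", "do", "-s", str(payload), host]


# Assumed MTUs: 1500 (standard Ethernet) and 9000 (a common jumbo setting).
for mtu in (1500, 9000):
    fits = 8900 <= max_ping_payload(mtu)
    print("MTU", mtu, "max payload", max_ping_payload(mtu), "8900 fits:", fits)

print(" ".join(jumbo_ping_cmd("blade067")))
```

With a 1500-byte MTU anywhere on the path the 8900-byte ping cannot
go through unfragmented, which is what makes it a usable test for
end-to-end jumbo frame support.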


-- 
-----------------------------------------------------------
Ivan S. P. Marin
----------------------------------------------------------


