Cluster and RAID 5 Array bottleneck.( I believe)

Mon Mar 19 13:44:40 PST 2001

Leonardo,

Hi. Check for a duplex mismatch first. It's a good idea to hard-configure the
speed and duplex on both the server and the switch, especially with Cisco
products. If your switch allows it, you might also want to fiddle with port
priorities, but that is mostly fine-tuning. Are your port counters reporting
anything?
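
To put numbers on the server side of that question, here is a rough sketch
that reads the negotiated speed, the duplex setting and a few error counters
for one interface. It assumes a Linux kernel that exposes these under
/sys/class/net; the function name and "eth0" are placeholders of mine, not
anything from your setup.

```python
from pathlib import Path

# Counters that tend to climb when speed/duplex settings disagree.
COUNTERS = ("collisions", "rx_errors", "tx_errors",
            "rx_crc_errors", "rx_frame_errors")


def read_link_state(iface, sysfs_root="/sys/class/net"):
    """Return speed, duplex and error counters for one interface.

    Assumes the Linux sysfs layout; missing or unreadable files give None.
    """
    base = Path(sysfs_root) / iface

    def read(relative):
        try:
            return (base / relative).read_text().strip()
        except OSError:
            return None

    state = {"speed": read("speed"), "duplex": read("duplex")}
    for name in COUNTERS:
        value = read("statistics/" + name)
        state[name] = int(value) if value and value.isdigit() else None
    return state


if __name__ == "__main__":
    # "eth0" is a placeholder; use the interface that faces the switch.
    for key, value in read_link_state("eth0").items():
        print(key, value)
```

Collisions on a link that is supposed to be full duplex, or CRC and frame
errors that keep growing between two runs of the script, would point toward
a mismatch.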

There are some other issues that come to mind, but every install and
configuration is unique, so I will paint with a broad brush and hope I hit
something.
Two things come to mind when funneling down by a factor of 10 from gig-e to
fast-e: the switch buffers get hammered, and the TCP window size starts to
matter. Try opening up the TCP window size and map your performance against
it; you may find a better number.
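
As a rough guide to which window sizes are worth trying: a single TCP stream
cannot move more than one window per round trip, so the window needed to fill
a link is about bandwidth times round-trip time. The sketch below only does
that arithmetic. The 1 ms round-trip time is a placeholder assumption of mine,
so measure yours with ping first.

```python
def window_bytes_needed(link_bits_per_sec, rtt_seconds):
    """Bandwidth-delay product: bytes in flight needed to keep the link full."""
    return link_bits_per_sec * rtt_seconds / 8


def max_throughput_bits(window_bytes, rtt_seconds):
    """Upper bound for one TCP stream: one full window per round trip."""
    return window_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    rtt = 0.001  # assumed 1 ms round trip; replace with a measured value
    for name, rate in (("fast-e", 100e6), ("gig-e", 1e9)):
        needed = round(window_bytes_needed(rate, rtt))
        print(name, "needs about", needed, "bytes of window")
    # Ceiling for a 65536-byte window at the same assumed round-trip time
    ceiling = max_throughput_bits(65536, rtt) / 1e6
    print("a 65536-byte window tops out near", ceiling, "Mb/s")
```

Plotting measured throughput against a handful of window sizes around the
computed value is one way to do the mapping I mentioned.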

You may be able to get a good feel for the problem by isolating it.
Connect a client directly to the array box with a crossover cable (assuming
it has a fast-e port somewhere) and run your test again. Then maybe connect
the array box to the switch via fast-e and test again. If everything seems
swell until the switch is in the mix, try borrowing a different switch. As
far as system configuration goes, I'll leave that to the list gods. I hope I
provided some value.
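
When comparing those isolation runs, it helps to turn each timing into a
rate. Here is a small sketch that uses the 175 megabyte copy in about 45
seconds from your post as the example input. The function names are mine, and
I am taking a megabyte as 10^6 bytes and fast-e at its nominal 100 Mb/s.

```python
FAST_ETHERNET_BITS_PER_SEC = 100e6  # nominal fast-e line rate


def throughput_mbit(megabytes, seconds):
    """Average rate in megabits per second (1 MB taken as 10**6 bytes)."""
    return megabytes * 8 / seconds


def link_utilization(megabytes, seconds,
                     link_bits_per_sec=FAST_ETHERNET_BITS_PER_SEC):
    """Fraction of the nominal link rate actually achieved."""
    return throughput_mbit(megabytes, seconds) * 1e6 / link_bits_per_sec


if __name__ == "__main__":
    rate = throughput_mbit(175, 45)
    share = 100 * link_utilization(175, 45)
    print("175 MB in 45 s is about %.1f Mb/s" % rate)
    print("that is about %.0f%% of a fast-e link" % share)
```

If the 31 you see on the switch is in megabits, the single copy works out to
roughly the same figure, so rerunning it over the crossover cable should show
quickly whether the number moves once the switch is out of the path.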

Bill

----- Original Message -----
From: "Leonardo Magallon" <leo.magallon at grantgeo.com>
To: "Beowulf List" <beowulf at beowulf.org>
Sent: Thursday, March 15, 2001 9:07 AM
Subject: Cluster and RAID 5 Array bottleneck.( I believe)

> Hi all,
>
>
>    We finally finished upgrading our beowulf from 48 to 108 processors
> and also added a 523GB RAID-5 system to provide a mounting point for all
> of our "drones".  We went with standard metal shelves that cost about $40
> installed.  Our setup has one machine with the attached RAID Array to it
> via a 39160 Adaptec Card ( 160Mb/s transfer rate) at which we launch
> jobs.  We export /home and /array ( the disk array mount point) from this
> computer to all the other machines.  They then use /home to execute the
> app and /array to read and write over nfs to the array.
>   This computer with the array attached to it talks over a syskonnect
> gig-e card going directly to a port on a switch which then interconnects
> to others.  The "drones" are connected via Intel Ether Express cards
> running Fast Ethernet to the switches.
>    Our problem is that apparently this setup is not performing well and
> we seem to have a bottleneck either at the Array or at the network level.
> In regards to the network level I have changed the numbers nfs uses to
> pass blocks of info in this way:
>
> echo 262144 > /proc/sys/net/core/rmem_default
> echo 262144 > /proc/sys/net/core/rmem_max
> /etc/rc.d/init.d/nfs restart
> echo 65536 > /proc/sys/net/core/rmem_default
> echo 65536 > /proc/sys/net/core/rmem_max
>
> Our mounts are set to use 8192 as read and write block size also.
>
> When we start our job here, the switch passes no more than 31mb/s at any
> moment.
>
> A colleague of mine is saying that the problem is at the network level
> and I am thinking that it is at the Array level because the lights on the
> array just keep steadily on and the switch is not even at 25% utilization
> and attaching a console to the array is mainly for setting up drives and
> not for monitoring.
>
> My colleague also copied 175Megabytes over nfs from one computer to
> another and the transfers took close to 45 seconds.
>
>
> Any comments or suggestions welcomed,
>
> Leo.