Cluster and RAID-5 array bottleneck (I believe)
leo.magallon at grantgeo.com
Thu Mar 15 09:07:43 PST 2001
We finally finished upgrading our Beowulf from 48 to 108 processors and also
added a 523 GB RAID-5 array to provide a mount point for all of our
"drones". We went with standard metal shelves that cost about $40 installed.
Our setup has one machine, from which we launch jobs, with the RAID array
attached via an Adaptec 39160 card (160 MB/s transfer rate). We export /home
and /array (the disk array's mount point) from this computer to all the other
machines. They then use /home to execute the application and /array to read
and write to the array over NFS.
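For reference, the exports described above would look something like this in
/etc/exports on the head node (the "drone*" pattern and options here are
illustrative placeholders, not our actual file):

```shell
# Hypothetical /etc/exports on the head/array node.
# "drone*" stands in for the real client hostnames or a subnet.
/home   drone*(rw)
/array  drone*(rw)
```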
The computer with the array attached talks over a SysKonnect gig-e card going
directly to a port on a switch, which then interconnects to the others. The
"drones" are connected to the switch via Intel EtherExpress cards running
Fast Ethernet.
Our problem is that apparently this setup is not performing well, and we seem
to have a bottleneck either at the array or at the network level. On the
network side, I have increased the socket buffer sizes that NFS uses to pass
blocks of data:
echo 262144 > /proc/sys/net/core/rmem_default
echo 262144 > /proc/sys/net/core/rmem_max
echo 65536 > /proc/sys/net/core/wmem_default
echo 65536 > /proc/sys/net/core/wmem_max
Our mounts are set to use 8192 as the read and write block size as well.
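For concreteness, a drone's mount with those block sizes would look roughly
like this ("master" and the paths are placeholder names, not our actual
configuration):

```shell
# Hypothetical /etc/fstab line on a drone; "master" is a placeholder hostname.
# master:/array  /array  nfs  rsize=8192,wsize=8192,hard,intr  0  0

# Equivalent manual mount, handy for experimenting with options:
# mount -t nfs -o rsize=8192,wsize=8192 master:/array /array
```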
When we start our job here, the switch passes no more than 31 Mb/s at any moment.
A colleague of mine says the problem is at the network level, but I think it
is at the array level: the lights on the array just stay steadily on while
the switch is not even at 25% utilization, and attaching a console to the
array is mainly useful for setting up drives, not for monitoring.
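One way to separate the two suspects would be to benchmark the array locally
on the machine it is attached to, with no NFS in the path, along these lines
(the scratch path below is a placeholder; on the real array the file should
be made larger than RAM so the buffer cache doesn't hide the disks):

```shell
# Point TESTFILE at the array (e.g. /array/raid_test.bin) to measure the
# RAID itself; a neutral temp path is used here so the commands run anywhere.
TESTFILE=${TMPDIR:-/tmp}/raid_test.bin
# Sequential write, then sequential read; time each pass to get MB/s.
time dd if=/dev/zero of="$TESTFILE" bs=1024k count=64
time dd if="$TESTFILE" of=/dev/null bs=1024k
rm -f "$TESTFILE"
```

If the local numbers are far above what the drones see over NFS, the array
itself is probably not the bottleneck.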
My colleague also copied 175 MB over NFS from one computer to another, and
the transfer took close to 45 seconds.
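For what it's worth, that copy works out to roughly 3.9 MB/s, which is about
31 Mb/s — suspiciously close to the peak we see on the switch:

```shell
# 175 MB transferred in 45 s, expressed in MB/s and Mb/s.
awk 'BEGIN { mb = 175; secs = 45
             printf "%.1f MB/s = %.1f Mb/s\n", mb/secs, mb*8/secs }'
```

This prints "3.9 MB/s = 31.1 Mb/s", well under Fast Ethernet's nominal
100 Mb/s wire speed.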
Any comments or suggestions welcomed,