[Beowulf] cluster storage design
Alvin Oga alvin at ns.Linux-Consulting.com
Wed Mar 23 16:36:22 PST 2005
- Previous message: [Beowulf] cluster storage design
- Next message: [Beowulf] Measurement of TCP traffic between nodes
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Wed, Mar 23, 2005 at 09:41:46AM -0600, Brian Henerey wrote:
>
> I have a 32 node cluster with 1 master and 1 data storage server with 1.5
> TB's of storage. The master used to have storage:/home mounted on /home via
> NFS. I moved the 1.5TB RAID array of storage so it was directly on the
> master. This decreased the time it took for our program to run by a factor
> of 4.

yes .. that is a good thing

> I read somewhere that mounting the data to the master via NFS was a
> bad idea for performance, but am not sure what the best alternative is. I
> don't want to have to move data on/off the master each time I run a job
> because this will slow it down as more people are using it.

for users, you have 2 choices ??
  /home on one big "home server"
  or automagically sync users' loginID and pwd from node to node
  ( little more work.. but not as bad as it sounds )

  if the "home server" dies ... everybody is dead

  if each node is standalone .. there are no issues with the "master" dying

for running jobs .... an automated queue is good ...
  users don't necessarily dictate which nodes to run the jobs on,
  but a good queuer will allow users to specify preferences

for "/data" where all nodes share a common big 100TB data farm ..
  - you have NFS or SANs or ??

  - getting good nic cards and good switches helps a lot

  - change your NFS parameters to send 16K or 32K bytes at a time
    instead of 512K

  - dual or quad channel bonding should help with throughput too..

  - a TB sized "/data" shouldn't be noticeably slow across the nodes

  - /data should be on the machine where the apps use it the most

  - since /data is probably shared across multiple nodes, it might
    be worth it ( definitely worth it ) to buy another 4 or 8 disks
    and use them as backups of /data on other nodes
      - you now have 3 "master nodes" with local /data
      - you will have to rsync and rdiff your changes from node to node
      - 1 TB of disks is about $600 nowadays ( 4x $150 each )

  - structuring your /data into /data/xxx, /data/yyy and /data/zzz
    lets each node keep its own data local, so the disk i/o is done
    on that node as opposed to across the slow ethernet

> I know there are probably many solutions but I'm curious what the people on
> this list do. It seems to me that SAN's are very expensive compared to just
> building servers with 4 x 500GB hard drives. I've considered just launching
> my lam-mpi jobs from whatever storage server has the appropriate data on it,
> but this doesn't seem ideal.

for me ... lots of redundant IDE disks is way way better/faster than san/nas

> How does performance compare from having the data local on the master via
> running it off a PVFS?

c ya
alvin
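
The /data suggestions in this message (splitting /data into subtrees that live on the nodes using them, rsyncing copies to other nodes, and setting the NFS transfer size) can be sketched roughly in Python. This is only an illustration under stated assumptions: the node names, the PRIMARY and BACKUPS tables, and both helper functions are hypothetical and not from the original post. The only real names are the NFS mount options rsize and wsize and the rsync flags -a and --delete. The script only builds the option string and the command lines; it does not run anything.

```python
# Hypothetical layout (illustrative names, not from the post):
# which node holds the primary copy of each /data subtree,
# and which nodes keep rsync'd backup copies of it.
PRIMARY = {"/data/xxx": "node01", "/data/yyy": "node02", "/data/zzz": "node03"}
BACKUPS = {"/data/xxx": ["node02"], "/data/yyy": ["node03"], "/data/zzz": ["node01"]}


def nfs_mount_options(block_kb):
    """Return an NFS option string with read/write sizes given in KB."""
    size = block_kb * 1024  # rsize/wsize are expressed in bytes
    return f"rsize={size},wsize={size}"


def rsync_commands(primary, backups):
    """Build one rsync command per (subtree, backup node) pair.

    Each command is meant to be run on the node that holds the
    primary copy of that subtree.
    """
    commands = []
    for path in sorted(primary):
        for target in backups.get(path, []):
            if target == primary[path]:
                continue  # a node does not back up to itself
            # -a preserves permissions and timestamps;
            # --delete removes files on the backup that are gone from the primary.
            commands.append(
                ["rsync", "-a", "--delete", f"{path}/", f"{target}:{path}/"]
            )
    return commands


if __name__ == "__main__":
    print(nfs_mount_options(32))
    for cmd in rsync_commands(PRIMARY, BACKUPS):
        print(" ".join(cmd))
```

Note that --delete makes the backup an exact mirror, so an accidental deletion on the primary is propagated on the next sync; dropping that flag, or keeping rdiff-style increments as the message suggests, is a safer choice if the copies are meant as real backups.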
