I've got 8 linux boxes, what now
Greg Lindahl lindahl at conservativecomputer.com
Fri Dec 7 08:59:01 PST 2001
On Thu, Dec 06, 2001 at 04:40:43PM -0800, Chris Majewski wrote:

> We're a computer science department investigating, very tentatively,
> the possibility of installing a linux cluster as our next
> general-purpose compute server. To date we've been using things like
> expensive multiprocessor SUN machines.

Chris,

You need to think about your user interface. Here are 3 possibilities:

1) When your user logs in, they are dumped on 1 of N Linux boxes. All
their processes run on that box. You can use LVS (Linux Virtual Server)
to do this. NFS mount their directories. If a user starts a
long-running job, they have to remember which box they started it on if
they want to kill it. And there's no guarantee that load averages will
remain similar, although LVS can stop sending users to a box with a
higher load average. Fortunately, most of your users don't start
long-running jobs, so for most people, it just works.

2) As (1), but users can also use a special scheme for long-running
jobs, a batch queue. Use Condor for the batch queue. Condor allows
people to find out where their jobs are, and it will be able to migrate
some long-running jobs to different boxes to balance the load. Since
only a subset of your users have long-running jobs, most people don't
have to learn about Condor.

3) As (1), but use MOSIX. MOSIX can automagically migrate long-running
jobs to a different system, but "ps" still shows the job on the "home"
system. This is more transparent to the users, but now the job dies if
either system crashes, so it's less reliable than Condor.

With any of the 3, you still need to work out a way of administering
the systems to keep them synchronized. TurboLinux has a cluster admin
system that helps you keep system disks synchronized. You can also use
"rsync" or cfengine, which are traditional Linux sysadmin tools.

Scyld Beowulf doesn't really address this situation. However, it could,
with a modest amount of work. Don, do you have any comments about this?
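
To make the load-aware dispatch in option (1) concrete, here is a minimal sketch in Python. It is not how LVS is implemented or configured; it only illustrates the idea of sending a new login to the least-loaded box and skipping boxes above a load threshold. The function name pick_login_host, the host names, and the 4.0 cutoff are all hypothetical, chosen for illustration.

```python
from typing import Mapping, Optional


def pick_login_host(load_by_host: Mapping[str, float],
                    max_load: float = 4.0) -> Optional[str]:
    """Return the least-loaded host below max_load, or None if none qualify.

    load_by_host maps a host name to its current load average. Both the
    mapping and the max_load cutoff are illustrative assumptions.
    """
    # Drop boxes that are already too busy to accept new logins.
    eligible = {host: load for host, load in load_by_host.items()
                if load < max_load}
    if not eligible:
        return None
    # Lowest load wins; ties are broken by host name so the result is stable.
    return min(eligible, key=lambda host: (eligible[host], host))


if __name__ == "__main__":
    # Hypothetical load averages for three boxes.
    loads = {"node1": 0.7, "node2": 3.1, "node3": 5.2}
    print(pick_login_host(loads))
```

With the sample loads, node3 is skipped because it is over the cutoff and node1 is chosen as the least loaded. As the message notes, this only balances logins at the moment they arrive; it does nothing for a long-running job that later drives one box's load up.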

Since it's many users running many jobs, including interactive ones,
it's not really the area that "Beowulf clusters" traditionally address.
I wish people would work on this, though, as I'd love to have a
prepackaged solution I could sell in this area.

greg
More information about the Beowulf mailing list
