[Beowulf] While the knives are out... Wulf Keepers
ajt at rri.sari.ac.uk
Mon Aug 21 10:43:11 PDT 2006
Mike Davis wrote:
> For the most part, I think that if a cluster is run correctly, it is an
> appliance for the scientists. Their job is to produce research, mine is
> to manage clusters and smp machines.
I think problems can occur when you enforce such a strict demarcation
between your role and the role of the scientists you support:
If communication between you and the scientists breaks down and you do
not understand what they want to do then you cannot support their work
effectively. The bottom line for me is that it is the objective of the
organisation as a whole to produce research, and your role within the
organisation is to facilitate the work of the scientists who do it.
> A problem that sometimes crops up is that these days, everyone thinks
> that they can manage a cluster (or large smp for that matter), because
> they have a Linux box or maybe 4-1p nodes at their house. Sometimes
> it's a real issue getting these people to understand that managing a
> machine for 1 person and managing it for 5, 50, or 500 are entirely different.
I'm a scientist with a Linux box at my house: I also built and manage a
small (64-1p node) openMosix Beowulf cluster for bioinformatics work at
RRI and for the bioinformatics/mathematical work of our sister
organisation BioSS. I don't think I'm exceptional in doing this, but I
do think that having a Linux box at home has been very useful to me in
gaining the experience I needed to manage our Beowulf cluster.
Not 'everyone' like me is as stupid or naive as you imply. I have the
support of an excellent IT department and an electronics workshop who
talk to me and understand very well what I want to do with the Beowulf.
We have about 400 user accounts, which are registered and managed by IT
centrally. I just enable NIS. The IT department also manage the central
filers where precious data files are stored. I manage 3.2 TB of local
RAID on the Beowulf. In my opinion this type of cooperation is a lot
more effective than strict job demarcation...
> For example, on Friday, one of our applications analysts wanted to
> upgrade a piece of software on one of the clusters. He didn't know what
> it would affect (libraries, other installed software, users already
> using that software). After a bit of investigation it turned out that
> the PI in question could use the version already installed (which is
> about 6 months old).
It seems to me that this would be straightforward to find out if you use
a package management system like apt or rpm, which keeps track of what's
installed and what the dependencies are. However, I also think that it's
quite right that you should know more about this than him. In an ideal
world, you should both make the decision about what to do on a rational
basis. I doubt that he asked you to do it for no reason at all.
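As a rough illustration of the dependency tracking mentioned above, here
is a minimal Python sketch of the question "what would an upgrade
affect?". It is not how apt or rpm work internally: the dependency table
and the package names in it are hypothetical, and a real system would
query the package database instead.

```python
from collections import deque

# Hypothetical dependency table: package -> packages it depends on.
# A real cluster would obtain this from the apt or rpm database.
DEPENDS = {
    "blast": ["libc", "zlib"],
    "hmmer": ["libc"],
    "pipeline": ["blast", "hmmer"],
    "zlib": ["libc"],
    "libc": [],
}


def affected_by_upgrade(package, depends):
    """Return every package that depends, directly or indirectly, on `package`."""
    # Invert the table so each package maps to the packages that need it.
    reverse = {name: set() for name in depends}
    for name, deps in depends.items():
        for dep in deps:
            reverse.setdefault(dep, set()).add(name)

    # Breadth-first walk over the reverse dependencies.
    seen = set()
    queue = deque([package])
    while queue:
        current = queue.popleft()
        for dependant in reverse.get(current, ()):
            if dependant not in seen:
                seen.add(dependant)
                queue.append(dependant)
    return sorted(seen)


if __name__ == "__main__":
    print(affected_by_upgrade("zlib", DEPENDS))
```

With this made-up table, upgrading "zlib" touches "blast" and, through
it, "pipeline", which is the kind of answer both parties need before
deciding whether to upgrade.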
> I guess that I'm rather "old school" but upgrades have to be for a
> reason other than there's a new version. Maybe they are needed for
> features, or security, or stability. But IMO, they are seldom needed
> because they are new.
Most of the problems I've come across like this arise from a lack of
communication. I believe it's quite important for you to know why he
wanted to do the upgrade, and for you to inform him about any problems
or conflicts of interest that would result from the upgrade. Presumably,
that is exactly what you did. My only complaint here is the impression
you give that scientists like me want to upgrade software just for the
sake of doing it. Please ask yourself: why did the upstream maintainers
release a new version? Was it just for the sake of upgrading it?
I keep our software up-to-date because I want to ensure that all known
bug fixes and security updates are applied. I don't do it just because
they are new. I rely on the package repository maintainers to decide
when software should be upgraded, but I also 'pin' critical packages
that I know are required to be held at a particular revision locally for
some reason. I do advocate upgrading unless there is a reason *not* to
do it. You seem to recommend the opposite: not upgrading unless there
*is* a reason to do it. I wonder which strategy results in less work?
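To make the 'pinning' mentioned above concrete, here is a small Python
sketch that writes a stanza in the apt preferences format (Package / Pin
/ Pin-Priority). The package name and version are hypothetical
placeholders, the helper name is made up for illustration, and the
default priority of 1001 is an assumption chosen because a priority
above 1000 tells apt to keep the pinned version.

```python
def pin_stanza(package, version, priority=1001):
    """Build one apt preferences stanza that holds `package` at `version`.

    The three field names follow the apt preferences file format; the
    helper itself is only an illustration, not part of any apt tooling.
    """
    return (
        f"Package: {package}\n"
        f"Pin: version {version}\n"
        f"Pin-Priority: {priority}\n"
    )


if __name__ == "__main__":
    # "example-pkg" and "1.2.3*" are hypothetical placeholders.
    print(pin_stanza("example-pkg", "1.2.3*"))
```

The printed stanza could be added to the apt preferences file, so that
everything else follows the repository while the pinned package stays
where it is.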
Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk
Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt
Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751
Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687