[Beowulf] RedHat Satellite Server as a cluster management tool.

Robert G. Brown rgb at phy.duke.edu
Thu Oct 14 12:14:51 PDT 2004


On Thu, 14 Oct 2004, Michael T. Halligan wrote:

> 
> Robert,
> 
> So have you actually used the satellite server? My biggest problem with
> using RHN has been the strong lack of deployments it's had.. A lot of

We looked at it pretty seriously at Duke -- RH is a short walk away,
we've had a long and productive relationship with them, and they were
offering us a "deal" on their RHN supported product.

The problem (and the likely reason for the strong lack of deployment) is
the cost scaling and minimum buy-in.  Frankly, even if they gave it away
for free, the server requirements are kind of crazy given the number of
machines we run on campus (and the fact that we manage them now, quite
successfully, on a shoestring and a mix of Centos and FC2).

I think that RHN's major advantage to consumers is top-down network
management in corporate environments where the costs of this sort of
management tool are swallowed in the greater TCO issues of running a
major data center (and where the local sysadmins are likely to be "red
hat certified systems engineers" who've gone through their training and
are roughly as hapless, down at the roots, as their MCSE counterparts
tend to be).  That is, they know how to use RH's GUI tools, but they
really don't UNDERSTAND that much about the systems they manage.

For a corporation that just wants it to work, and that considers the
$100K spent on this and that to make it work with the human resources
at hand to be petty cash, that's fine.  In the University/Research
world, resources tend to be tight, expertise levels are relatively high,
and there is even opportunity cost labor in the expert labor pools that
can be diverted to learning how to do something really cheaply AND
really well.

That's why you have Debian clusters, ROCKS clusters, RH/FC/Centos
clusters, SuSE clusters, Mandrake clusters, Warewulf clusters,
Clustermatic clusters -- all largely "homebrew" at the administrative
level (although the cluster-specific projects can get pretty fancy
wrapping up the brew;-) -- that avoid using a) anything that you have to
pay for if possible; b) anything that you have to pay a LOT for period;
c) anything that doesn't scale.  Yum requires more investment of effort
at the beginning to learn it, as it is a real, command line sysadmin
tool and yes, you'll need to read the documentation (some of which I
wrote:-), work with it, play with it, figure out how to make it jump
through hoops, and ultimately realize that it is REALLY powerful.
Designed by sysadmins, for sysadmins.  Designed and maintained by people
who use it every day in large scale deployments in resource starved
institutions.  That sort of thing.
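
For instance, pointing a node at your repository and keeping it current
is a handful of lines.  A sketch only -- the URLs are made up,
substitute your own mirror:

    # repo stanzas appended to the node's /etc/yum.conf
    [base]
    name=FC2 base
    baseurl=http://install.example.edu/fc2/i386/os/

    [updates]
    name=FC2 updates
    baseurl=http://install.example.edu/fc2/i386/updates/

    # then, by hand, from cron, or via your cluster shell of choice:
    yum -y update        # pull pending updates, dependencies resolved
    yum -y install lam   # add a package plus everything it requires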

Like all GUI tools vs command line tools, there is the usual "learn to
use it in a day, pay for using it forever" tradeoff that plagues the
user of any windowing interface that actually has to manipulate large
numbers of files and complex relationships (GUIs are all about
simplicity, but not everything is "simple").

So Duke has at least for the moment tabled the RHN issue until there is
a clear and burning need for it that justifies the cost, including the
cost of diverting our human resources AWAY from using a tool that
manifestly scales better once it is mastered.

> people just naturally assume redhat is bad (hell, I even do. I use debian
> for all of my personal and corporate servers).. But very few who
> automatically take that stance have actually worked with the products
> enough to give empirical evidence as to why.

I love RH.  I used to pay them money for their OS distro every major
release voluntarily, until they went hypercommercial.  Now I use FC2 and
may migrate even further away.

RH's pricing model is purely corporate.  I just don't think they've
grokked either the university or the personal or the HPC cluster market,
or maybe they have and just don't care.

   rgb

> 
> It took a while to gather enthusiasm enough to evaluate it, and a couple
> of months of solid testing before I could recommend it.  I've built about
> 1/2 dozen similar deployment/management tools at this point, each one
> built for a customer (hence the reason for building six instead of
> just improving upon the same one).
> 
> Imaging is one thing, and yeah kickstart is easy, no objections to that..
> RHN just makes it a lot easier to deal with kickstart. It also gives you
> a rather useful, if more enterprise-focused, management system that lets
> you manage (software|config) channels and server groups, with a good way
> to combine groups via unions & intersections.  I'm finding it especially
> nice at one site at which 1/2 of their servers are used for testing and
> 1/2 for their production environment.  Pushing new patches, scripts,
> commands, files to select sets of systems requires very little effort.
> 
> RedHat's configuration management system is actually really nice. They've
> put a simple (but extensible) macro system into it, which allows you to
> keep one configuration file for all of the servers in a given class when
> only a few things change, with system-specific variables parsed out when
> servers pull configs from the gold server.. Sure, you can do this with
> cfengine or pikt, but uploading a config file to a webform is a lot
> simpler than setting up cfengine/pikt and implementing it (I know this
> from a lot of experience).
> 
> One of the shortcomings of a yum/pxe/kickstart environment (with which
> I'm rather familiar, currently managing 6 customers with similar
> environments) is that there's no "already there" configuration/versioning
> management system.  That was one of the key selling points of redhat:
> the fact that I can do at-will repurposing/reprovisioning (like turning
> a 100 server environment from a 30/70 db/app server split into a 70/30
> split in 5 minutes, without kickstarting and with zero manual
> interaction)..

Sure, I agree.  Although it needn't take THAT long to do with yum and
the cluster shell of your choice.  Versioning is currently done fairly
casually, or outside yum itself.  There is no point and click package
selector (although any editor works just fine).  However, the hundreds
of dollars a seat plus thousands for servers would buy an awful lot of
FTE minutes, and clusters typically DON'T change often, or much, once
their prototypes are completed and debugged.
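
To illustrate the "needn't take THAT long" claim -- a sketch only, with
made-up package set names and node list -- repurposing a set of nodes
is one loop with ssh or the cluster shell of your choice:

    # flip the listed nodes from db duty to app duty, in parallel
    for n in $(cat repurpose-these.txt); do
      ssh root@$n 'yum -y remove mydb-server && \
                   yum -y install myapp-server' &
    done
    wait   # every node converges in about one yum transaction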

> In the end, it's probably just an apples/oranges comparison.. in a science
> lab/school cluster environment, it's probably more valuable to use a
> more manual process because grad students are cheap, and interns are
> free.. :) In a corporate world, the $28k I'd spend for a 100 server
> environment to save a sysadmin's worth of time pays for itself tenfold
> in terms of environment consistency..

For servers, especially heterogeneous servers, it might be worth it.  If
by "servers" you mean identical nodes (server or otherwise) I'd say this
is a waste of money.  In HPC it is the latter.  In a lot of corporate
environments, it is a mix of mostly the latter and some of the former.
But I totally agree, the tool is designed for that kind of environment
-- structurally complex and deep pocketed.

> Either way, I'm not trying to evangelize, just relate my own experiences, 
> and try to find the best solution for a given problem. What tools out
> there are good for this type of a situation, then? Thanks for the refs to
> warewulf, I'm checking it out now.

No problem.  I'm just a cost-benefit fanatic.  You have to work to
convince me that spending on the order of 20% of the nodes you might be
able to buy in a compute cluster on RHN will get more work done per
total dollar invested, in most HPC cluster environments, compared to
any of a number of GPL free alternatives (many of which have further
benefits to their use anyway).

   rgb

> > On Wed, 13 Oct 2004, Michael T. Halligan wrote:
> >
> >> Has anybody used (or tried to use) the RHN system as an HPC management
> >> tool? I've implemented this successfully in a 100 host environment for
> >> a customer of mine, and am in the process of re-architecting an
> >> infrastructure with about 150 nodes.. That's about as far as I've
> >> gotten with it. Once I get past the cost, the poor documentation, and
> >> "OK" support, I'm finding that it's actually a great (though slightly
> >> immature) piece of software for the enterprise.  The ease of keeping
> >> an infrastructure in sync, and the lowered workload for sysadmins
> >
> > <nuke warning="alert">
> >
> > I can only say "why bother".  Everything it does can be done easier,
> > faster, and better with PXE/kickstart for the base install followed by
> > yum for fine tuning the install, updates and maintenance (all totally
> > automagical).  Yum is in RHEL, is fully GPL, is well documented, has a
> > mailing list providing the active support of LOTS of users as well as
> > the developers/maintainers, and is free as in air.  Oh, and it works
> > EQUALLY well with Centos, SuSE, Fedora Core 2, and other RPM-based
> > distros, and is in wide use in clusters (and LANs) across the country.
> >
> > With PXE/kickstart/yum, you just build and test a kickstart file for the
> > basic node install (necessary in any event), bootstrap the install over
> > the net via PXE, and then forget the node altogether.  yum automagically
> > handles updates, and can also manage things like distributed installs
> > and locking a node to a common specified set of packages.  It manages
> > all dependencies for you so that things work properly.
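> >
> > The "forget the node altogether" trick is a couple of lines of %post
> > in the kickstart file.  A sketch (the URL is made up, adjust to
> > taste):
> >
> >     # tail of the node kickstart file
> >     %post
> >     # point the node at the campus repository...
> >     echo '[updates]' >> /etc/yum.conf
> >     echo 'name=campus updates' >> /etc/yum.conf
> >     echo 'baseurl=http://install.example.edu/fc2/i386/updates/' \
> >         >> /etc/yum.conf
> >     # ...and turn on the nightly yum cron job that ships with yum
> >     chkconfig yum on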
> >
> > It takes me ten minutes to install ten nodes, mostly because I like to
> > watch the install start before moving on to handle the rare install that
> > is interrupted for some reason (e.g. a faulty network connection).  You
> > can do a lot more than this, much faster, if you control the boot
> > strictly from PXE so that you don't need to interact with the node's
> > console at all.  How much better than that can you do?
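> >
> > ("Control the boot strictly from PXE" is itself tiny -- on the order
> > of five lines of pxelinux configuration.  A sketch, with a made-up
> > kickstart URL:
> >
> >     # /tftpboot/pxelinux.cfg/default
> >     default ks
> >     prompt 0
> >     label ks
> >       kernel vmlinuz
> >       append initrd=initrd.img ks=http://install.example.edu/ks/node.cfg
> >
> > and the node reinstalls itself, hands off, the next time it boots.)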
> >
> > Alternatively, there are things like warewulf and scyld where even
> > commercial solutions probably won't work out to be much more (if any
> > more) expensive.  Especially when you add in the cost of those two
> > "beefy boxes acting as RHN servers".  What a waste!  We use a single
> > repository to manage installs and updates for our entire campus (close
> > to 1000 systems just in clusters, plus that many more in LANs and on
> > personal desktops).  And the server isn't terribly beefy -- it is
> > actually a castoff desktop being pressed into extended service, although
> > we finally have plans to put a REAL server in pretty soon.
> >
> > I mean, what kind of load does a cluster node generally PLACE on a
> > repository server after the original install?  Try "none" and you'd be
> > really close to the truth -- an average of a single package a week
> > updated is probably too high an estimate, and that consumes (let's see)
> > something like 1 network-second of capacity between server and node a
> > week with plain old 100BT.
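> >
> > (Check the arithmetic: a typical updated RPM is order of a megabyte,
> > and 100BT moves roughly ten megabytes a second, so one package per
> > node per week is about a tenth of a second of wire time per node.
> > Even if I'm off by a factor of ten, the server loafs.)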
> >
> > There are solutions that are designed to be scalable and easy to
> > understand and maintain, and then there are solutions designed to be
> > top-down manageable with a nifty GUI (and to sell a lot of totally
> > unneeded resources at the same time).  Guess which one RHN falls under.
> > </nuke>
> >
> >   Flamingly yours (not at you, but at RHN)
> >
> >       rgb
> >
> >>
> >> At 100 nodes, the pricing seems to be about $274/year per node including
> >> licensing, entitlements, and the software cost of an RHN server (add
> >> another $5k-$7k for a pair of beefy boxes to act as the RHN server..
> >> though as far as I can tell, redhat's specs on the RHN server are far
> >> exaggerated.. I could get by with $2500 worth of servers on that end
> >> for the environments I've deployed on).  So, in the end, it's $28k/year
> >> for an enterprise of 100 servers; in one environment that has meant
> >> being able to shrink the next year's staffing needs by two people, and
> >> in another by one person, so it pays for itself..
> >>
> >> We have a 512-node render farm project we're bidding on for a new
> >> customer, and I'm wondering how those in the beowulf community who have
> >> used RHN satellite server perceive it. So far we're considering LFS and
> >> Enfusion, which are both more HPC oriented, but I'm really enjoying RHN
> >> as a management system.
> >>
> >> ----------------
> >> BitPusher, LLC
> >> http://www.bitpusher.com/
> >> 1.888.9PUSHER
> >> (415) 724.7998 - Mobile
> >>
> >>
> >> _______________________________________________
> >> Beowulf mailing list, Beowulf at beowulf.org
> >> To change your subscription (digest mode or unsubscribe) visit
> >> http://www.beowulf.org/mailman/listinfo/beowulf
> >>
> >
> > --
> > Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> > Duke University Dept. of Physics, Box 90305
> > Durham, N.C. 27708-0305
> > Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
> 
> 
> -------------------
> BitPusher, LLC
> http://www.bitpusher.com/
> 1.888.9PUSHER
> (415) 724.7998 - Mobile
> 

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu





