rpm_utils-0.6 release

Thu Aug 22 10:38:15 PDT 2002

On Thu, 22 Aug 2002, Nicholas Henke wrote:

Dear Nicholas (and listvolken):

You might want to look at

  http://www.dulug.duke.edu/yum/

yum stands for Yellow Dog Updater, Modified and is derived from yup, the
yellow dog updater.  It also runs on top of python and anaconda.  From
your list of tools below, you're definitely doing redundant work, but
yum has more features and looks a bit easier to modify.  From its man
page:

yum(8)                                                     yum(8)

NAME
       yum - Yellowdog Updater Modified

SYNOPSIS
       yum [options] [command] [package ...]

DESCRIPTION
       yum  is an interactive, automated update program which can
       be used for maintaining systems using rpm

       command is one of:
        * install package1 [package2] [...]
        * update [package1] [package2] [...]
        * upgrade
        * remove [package1] [package2] [...]
        * groupinstall group1 [group2] [...]
        * groupupdate group1 [group2] [...]
        * list [...]
        * grouplist [...]
        * clean [packages | headers | old-headers | all]

       Unless the --help or -h option is given, one of the  above
       commands must be present.

...

There is a fairly active yum development list with quite a few
participants, and the primary developer (Seth Vidal) is extremely open
to new suggestions and always happy to have new contributors and
participants.

Features of yum that are appealing:

  Fully automated dependency resolution, to the extent possible (yes, one
can easily get into dependency loops with rpms that require humans to
curse and untangle by hand, but yum will deal smoothly with all unbroken
dependencies and at least help with some of the broken ones).

  The yum tools split the headers out of the repository RPMs.  yum
itself then downloads and caches the headers (only) from your repository
ONCE.  This makes dependency resolution pretty much as fast as it can
be.  When it starts up, it checks all header files to ensure that it has
the latest copies (a few seconds), downloads any it is missing (a few
seconds more) and then resolves dependencies and so forth for an update
or install.

  It can install from multiple repositories automatically.

  It supports package groups (!) (the same ones that anaconda uses).

  It has list options to let you know what is installed and what is
available on your repository.

  Other options and features are being discussed and added in the
development version.  The production version is quite stable and
unbroken.

There are two or three aspects of yum that make it truly lovely.  Once
configured, to install or update most packages you just enter (e.g.)

    yum update pvm

at a command line.  To maintain a system completely synchronized with
your repository, just

    yum update

in a nightly cron.  To do a RUNNING UPGRADE from (e.g.) 7.1 to 7.3, one
can actually enter

    yum upgrade

(followed when it is finished with a reboot) and have a decent chance of
it actually succeeding!  You are also extremely unlikely to break
anything if it doesn't.  Note that the ability to upgrade a running
system is VERY slick.  Note also that if your system is in an
inconsistent state of dependencies OR has packages installed that are
obsoleted (not in the new repository) OR is changing from e.g. lilo to
grub, you may have to do a certain amount of preparatory work to get the
upgrade feature to work, but yum will not proceed with an inconsistent
upgrade -- it reports any problems it encounters and waits for you to
fix them and proceeds only when it is safe to do so.
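
For the nightly cron case mentioned above, here is a rough sketch of a
job body that logs each run.  It assumes your yum accepts -y (answer yes
to its prompts) and exits nonzero on failure; check the man page for
your version.  The function name, YUM_CMD, LOGFILE and the log path are
my own inventions for illustration, not part of yum:

```shell
#!/bin/sh
# Hypothetical nightly job body (e.g. for a script under /etc/cron.daily).
# YUM_CMD and LOGFILE are illustrative knobs, NOT yum features.
nightly_sync() {
    logfile=${LOGFILE:-/var/log/yum-nightly.log}
    {
        echo "=== nightly sync started: $(date) ==="
        # Left unquoted on purpose so "yum -y update" splits into words.
        ${YUM_CMD:-yum -y update}
        status=$?
        echo "=== nightly sync finished with exit status $status ==="
    } >> "$logfile" 2>&1
    # Hand the update's exit status back to cron (or whoever called us).
    return "$status"
}
```

A script that defines this and then calls nightly_sync on its last line
leaves a dated record of every run, so a failed update shows up in the
log instead of vanishing silently.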

At this point I've done a number of running upgrades from 7.1 and 7.2 to
7.3 -- very lovely.  Note that new rpm's do vary in how they handle
upgrades -- some will generate e.g. a file like /etc/whatever.rpmnew and
leave /etc/whatever alone; others will install /etc/whatever and save
/etc/whatever.rpmsave -- so one CAN have a tiny bit of work to do by
hand to recover a configuration, but in nearly all cases (and definitely
all the really important ones) your installed configuration data is
perfectly preserved.
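
A quick way to find those leftovers after an upgrade is to search for
the two suffixes.  A minimal sketch (the function name
list_rpm_leftovers is made up for illustration, and /etc is only a
default starting point, since a package can keep config files
elsewhere):

```shell
#!/bin/sh
# Hypothetical helper: list *.rpmnew and *.rpmsave files under a directory
# (default /etc) so they can be compared by hand with the live config files.
list_rpm_leftovers() {
    find "${1:-/etc}" -type f \( -name '*.rpmnew' -o -name '*.rpmsave' \) \
        2>/dev/null | sort
}
```

Running list_rpm_leftovers with no argument after an upgrade gives the
short list of files worth a diff against their counterparts.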

Another thing that will likely be a future possibility (it really is
possible now, but hasn't been implemented) is a two-step kickstart or
network install:  Run kickstart (or do a hand install) to install a base
system with the bare minimum of support for yum to run; complete the
installation with yum groupupdates and yum updates.  Anyone who has
dealt with installing over a slow/erratic network (e.g. DSL link, modem
link, busy network) has probably done an install that got (say) 5/6 of
the way through and then died.  Grrr.  Have to start over.  On a DSL
link, that can be another 2-5 hours of waiting and eating most of the
line bandwidth.  With a two-step install, the basic system takes only a
few minutes, and yum is RESTARTABLE!  You can just keep rerunning the
update script (run out of a post.sh) until it finishes, and it will not
repeat any work it has already done.
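
The "repeat until it finishes" part could be sketched as a small retry
wrapper.  This is only an illustration: retry_until_done is a
hypothetical helper, and it assumes the command it wraps (e.g. yum)
exits nonzero when a download or install step fails:

```shell
#!/bin/sh
# Hypothetical helper: rerun a command until it exits 0 or attempts run out.
# Usage: retry_until_done MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until_done() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Success: stop retrying.
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed: $*" >&2
        attempt=$((attempt + 1))
        # Pause before the next try, but not after the final failure.
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

In a post-install script one might then write something like

    retry_until_done 20 60 yum -y update

(the counts and the -y flag are assumptions to adjust for your setup).
Because yum keeps what it has already downloaded, each retry only has
to fetch what is still missing.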

Another trick I use at home (DSL bottlenecked to the primary repository)
to good effect is to put the yum rpm and header cache into a shared NFS
directory and mount it across all the hosts I am maintaining on my home
network.  If one doesn't deliberately clean up the rpm cache (with a yum
option) yum simply updates those rpms and headers that might have
changed ONCE over the slow link, and then does the update on all the
other hosts in the LAN out of the now up-to-date cache!  Sure, I could
also do this by rsync'ing a local repository to the main repository, but
the NFS trick is actually simpler and automagically downloads and keeps
sync'd only the rpm's I use on at least one host.  Much more efficient.

Finally, IIRC there is somebody making noises about fronting yum with a
simple GUI.  I don't know if this would really be all that advantageous
for experienced administrators, but it might help novices get started
and will certainly make it very easy for user/administrators with single
systems to shop for packages from their local yum repository.

On campus, it looks like this toolset will let EVERY SYSTEM ON CAMPUS
PUBLIC AND PRIVATE be kept sync'd to a common repository.  One human
being can (and in fact does, at this time) de facto manage software for
several thousand hosts in an environment distributed across departments,
schools, dormitories, laboratories.  Nodes, desktops, departmental
systems, private/student systems, clients, servers -- all maintained
automatically from one repository.  When most security problems are
discovered and patched, the entire campus is immune to the exploit after
the nightly yum update, including the systems of users who will NEVER
know that the problem even existed or how to check for the exploit if
they did.

So anyway, it would be lovely to have more people contributing to this
toolset, especially good python programmers who are already familiar
with how all this works.

BTW (apropos the Winblows cluster discussion that I largely missed while
on vacation:-) it is tools like yum that clearly show the TCO advantages
of linux relative to WinXX for ANY sort of LAN or cluster or even single
installation.  With linux, one can now:

   a) Install, for free and in real-time, directly over the network,
from many public sites.  It is also absolutely trivial (well, to a
competent systems administrator:-) to set up a mirror (or a mirror with
modifications and local additions) to act as an installation server.
With yum, one can even install from one (e.g. public) repository while
grabbing certain packages from several other (e.g. private)
repositories.

   b) Keep your system(s) slaved to your installation repositories,
automatically updating as often as you wish, keeping your entire
installed software base as secure as it is possible to be even if some
of it is managed by amateurs.

   c) Approach the theoretical limits of administrative scaling
efficiency.  One administrator can (and does, at Duke) take care of the
base linux software installation(s) for the entire campus, thousands of
machines.  The primary factors that limit the number of systems our
linux folks can manage (on desks, in clusters, or both at the same time)
are (in order of impact):

   * hardware reliability (fixing downed hardware)

   * user support (answering questions, silly or otherwise, and
     unsticking the stuck)

   * server management (inevitably, for any OS -- managing mail,
     accounts, shared filespace, security)

   * competence of the administrators (which could always be the first
     one in terms of impact for a very incompetent manager:-).

   * software management.

Note well that in a Windows environment, one will likely spend more time
on managing LICENSING ISSUES ALONE than one does managing software, per
system, in any comparable linux shop.

If there were any point, I could go on systematically brutalizing WinXX
(including NT, 2k or XP) in any TCO comparison with linux, but there
isn't any.  Anybody who asserts that HPC people are ignorant of TCO
issues on the >>beowulf<< list (recalling that beowulfs were INVENTED to
lower the total cost of ownership and operation of parallel HPC
resources) is clearly so clueless that there is no point.

   rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu