[Beowulf] hpl size problems

Robert G. Brown rgb at phy.duke.edu
Tue Oct 4 06:26:29 PDT 2005


Chris Samuel writes:

> On Sun, 2 Oct 2005 05:38 am, Robert G. Brown wrote:
> 
>> This is where I think things NEED to head.  RPM was and is in some ways
>> a lovely thing.  However, I personally think it's way short on the
>> metadata front, and woefully difficult to extend.
> 
> Would Debian's .deb format be better ?
> 
> They created it in 95 to be a more extensible version of whatever dpkg used 
> before..

I don't THINK that the issue is one of format per se.  All packaging
formats consist of a container with an installable tree, the ability to
run scripts pre and post (un)install, and a variety of metadata
associated with the packaged material -- dependencies, provides,
information/synopsis, copyright/license information, in addition to
package name and revision and build/release info.  The "problems" I
allude to are in both the metadata and the associated db for storing it,
not in the containerizing of the data in the physical file format.  One
can make a "package" out of a simple tarball (as slackware has long
done) instead of a CPIO archive and it would work just as well.  

I happen to think that the metadata in the package itself should be in
xml format as a "meta" design choice for a variety of reasons, but even
this isn't a necessary thing, only desirable -- it just forces you to
structure your data in a strictly hierarchical and extensible manner,
which I personally think is a good thing that enables much future
development flexibility so that the metadata never AGAIN can be said to
be inadequate longer than it takes to define and insert a new tag
hierarchy in the appropriate place in the tag tree.
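
To make the "define and insert a new tag hierarchy" point concrete, here
is a minimal sketch using Python's standard xml.etree.ElementTree.  The
tag names (package, requires, dep, cluster, role, diskless) and the
values are invented for illustration; they are not taken from rpm, deb,
or any real packaging schema.  The thing to notice is that a reader
written before the new tags existed keeps working unchanged.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata layout; these tags are NOT an rpm or deb schema.
METADATA = """
<package name="pvm" version="1.0" release="1">
  <summary>example entry</summary>
  <requires>
    <dep name="glibc"/>
  </requires>
</package>
"""

def required_names(root):
    """An 'old' reader that only knows about <requires>/<dep>."""
    return [dep.get("name") for dep in root.findall("./requires/dep")]

root = ET.fromstring(METADATA)
before = required_names(root)

# Extend the tree with a new (invented) tag hierarchy.
cluster = ET.SubElement(root, "cluster")
ET.SubElement(cluster, "role").text = "compute-node"
ET.SubElement(cluster, "diskless").text = "yes"

# The old reader ignores tags it does not know; a new reader can use them.
after = required_names(root)
role = root.findtext("./cluster/role")
print(before, after, role)
```

A flat key/value header could be extended too, of course; the tree just
makes it obvious where new information belongs.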

Now, there is a group working on this, and they're neither ignorant nor
ill-intentioned.  In fact, they're pretty serious coders, as you might
expect with such a complex and powerful toolset.  It's just that package
management is a difficult problem and conservative development is
ABSOLUTELY NECESSARY to avoid breaking things.  Remember what a pain it
was in the rpm 4.x transition, where many rpms no longer built unless you
worked on them pretty extensively. It is further complicated by the fact
that there has recently been a (long necessary) fork between "community"
RPM package management and "Redhat Package Management" where the two are
in the future not necessarily going to be mutually compatible.  This
helps address concerns voiced in this thread that Redhat will unfairly
dominate the packaging system and be able to (whether or not they
actually do) manipulate it to their advantage.  You can check out (e.g.)

  https://lists.dulug.duke.edu/mailman/listinfo/rpm-devel

It's a low traffic list and suitable for anybody who is (as they say:-)
133t and seriously into fixing some of the problems with RPMs.  As in,
perhaps 1/3 to 1/2 the list traffic appears to be diff patches:-).  We
host it I believe because Seth and Friends are pretty heavily involved
with this whole process -- co-developing yum and rpms themselves,
working on the metadata issue.  I >>believe<< that there is some very
nice stuff in the pipes here, including much dark magic that yum will be
able to perform in future releases (at least Seth mutters charms and the
like under his breath at me when I ask him about this sort of stuff, and
now has a position in the University that is pretty much "make yum
better while maintaining the repos that underlie all campus linux
installations", a task that I think he relishes).

Given that he (with the usual kudos to the original yup developers,
Michael Stenner, Icon Ryabitsev, and several other yum co-developers,
but still largely Seth Himself:-) did yum WHILE also doing the physics
department network management including all the niggly little user
support and printers and system fixing and so on AND maintaining the
campus linux repos AND helping to support linux use in Arts and Sciences
in general, I expect that he'll finally have a Lot More Time to devote
to projects, and since AFAICT he never sleeps (he tends to answer ME at
4 a.m. as readily as at 4 p.m. -- maybe even more readily:-) it is
possible that yum will really move.

This is an important point.  There has been considerable discussion that
I've listened in on from time to time about whether rpms really ARE
inadequate in any fundamental sense, or if the problems are really just
in the upper level tools that turn the basic rpm library calls into
userspace or rootspace functionality.  Yum is an important case in
point.  Who would have imagined, before the yup project, that it was
possible to get so MUCH out of rpms, but now we all don't even think
about it -- missing pvm on a system?  "yum -y install \*pvm\*" and a
minute later you aren't.  Looking for a file or package?  yum provides,
yum list.  Interested in just browsing everything available to BE
installed?  yum info.  And rpms ARE changing.  Look at the comps lists,
now in xml.  Also, there is effort to actually communicate with
debian developers and keepers of the holy deb -- there may even be
motion of sorts to a truly universal packaging specification (since as I
said, ALL packaging systems CAN do what needs to be done, it is just a
matter of merging specific advantages of any of the subspecies into a
new beast that has it all and does it all, efficiently).
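
A note on the \*pvm\* in the command above: the backslashes are only
there to keep the shell from expanding the stars, and what the tool sees
is an ordinary glob applied to package names.  The sketch below shows
that kind of matching with Python's standard fnmatch module.  The
package list and the match_packages() helper are made up for the
example, and this is not a description of how yum implements it
internally.

```python
from fnmatch import fnmatchcase

# Invented list of available package names, for illustration only.
AVAILABLE = ["atlas", "lam", "pvm", "pvm-gui", "xpvm"]

def match_packages(pattern, names):
    """Return the names matching a shell-style glob (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]

matches = match_packages("*pvm*", AVAILABLE)
print(matches)
```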

"Real" tools like yum define the problems to be solved much better so
that the developers don't have to head out into the wild cracked yonder
and break a VERY VERY FUNDAMENTAL tool trying to extend it in
ill-directed ways.  Since yum/createrepo pulls all the header metadata
from the rpms it services, it focusses a lot of attention on just what
is IN that metadata and just how it CAN be efficiently and portably
represented.  This in turn can help support the kind of registry that
Mark has alluded to, perhaps even the ability to "layer" rpm db's the
same way that yum layers rpm repos.  The one tool defines usage, which
provides a natural guidance track for the evolution of the underlying
library (as well as several possible hacks that will let you do what you
want along the way WITHOUT altering the underlying library).  
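
As a rough picture of what "layering" a package db might mean, here is a
sketch using Python's collections.ChainMap: lookups fall through from a
per-node layer to a shared layer, and writes land only in the per-node
layer.  This is purely an analogy under my own assumptions.  The names
shared_db and node_db and all the versions are invented, and nothing
here reflects how the rpm database is actually stored.

```python
from collections import ChainMap

# Invented data: package name -> installed version.
shared_db = {"glibc": "1.0", "perl": "1.0", "pvm": "1.0"}  # common base layer
node_db = {"pvm": "1.1"}                                   # per-node overrides

# Lookups consult node_db first, then fall back to shared_db.
layered = ChainMap(node_db, shared_db)

# Writes go to the first mapping only, so the shared layer is untouched.
layered["atlas"] = "1.0"

print(layered["pvm"], layered["perl"], sorted(layered))
```

The same fall-through idea is what makes a read-only shared root plus a
small writable per-node piece attractive for diskless nodes.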

There are usually many ways of doing nearly anything -- the trick is in
recognizing the one that is the least crack-ridden and that efficiently
solves the problem and that is under your control.  Sometimes, rarely,
all three happen to occur in concert and things bop right along and
miracles happen.  

Finally, one MUST NOT FORGET that rpms tend to be FUBAR not because of
any particular weakness in rpm (the design) per se or in rpmbuild, but
out of egregious user error.  rpms with circular dependency loops, nasty
obsoletes behavior, wider dependency chains than strictly necessary,
overly specific package requirements (perl version EQUAL to
blankety-blank-blank oh my holy lord -- as if you're never permitted to
update perl if you wish to run this particular generic-perl application,
library version EQUAL to whatever ditto), installation locations that
are not compliant with the FHS or rpm rules and customs (rpm's that
install man pages in /usr/man or /usr/local/man, for example, where the
latter is "permitted" by the FHS but deprecated for rpm-based installs
in general as packaging something de facto makes it "part of the
distribution" by extension and hence installable in /usr).  
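
Two of the abuses listed above, dependency loops and EQUAL version pins,
are mechanical enough to check for.  The sketch below does so over a
small invented table of requirements: a depth-first search reports one
loop if any exists, and a simple filter lists every "=" requirement.
The record format, the package names and the versions are assumptions
made for the example; this does not read real rpm headers.

```python
# Hypothetical records: name -> list of (required_name, operator, version).
# An operator of None means "any version will do".
PACKAGES = {
    "app":    [("libfoo", ">=", "1.2"), ("perl", "=", "5.8.5")],
    "libfoo": [("libbar", None, None)],
    "libbar": [("libfoo", None, None)],   # closes a loop with libfoo
    "perl":   [],
}

def exact_pins(packages):
    """Return (package, requirement, version) for every '=' requirement."""
    return [(pkg, req, ver)
            for pkg, reqs in sorted(packages.items())
            for req, op, ver in reqs
            if op == "="]

def find_cycle(packages):
    """Depth-first search; return one dependency loop as a list, or None."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the stack, finished
    state = {name: WHITE for name in packages}
    path = []

    def visit(name):
        state[name] = GREY
        path.append(name)
        for req, _op, _ver in packages.get(name, []):
            # Requirements on unknown packages are treated as finished.
            if state.get(req, BLACK) == GREY:
                return path[path.index(req):] + [req]
            if state.get(req, BLACK) == WHITE:
                found = visit(req)
                if found:
                    return found
        path.pop()
        state[name] = BLACK
        return None

    for name in sorted(packages):
        if state[name] == WHITE:
            found = visit(name)
            if found:
                return found
    return None

print(find_cycle(PACKAGES))
print(exact_pins(PACKAGES))
```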

There is no abuse under the sun that developers are not capable of,
especially developers that e.g. work on a debian system and provide a
spec file of sorts from some template they borrowed somewhere or
generated by a tool that will "work" to make an rpm from their sources
on at least some rpm-based distros at least some of the time. Not to
pick on debian developers -- you see the same thing in rpms built by
people who work on rpm-based systems, by people who should know better,
by people who DO know better but don't care enough to do it "right"
because it would involve porting their code from where it built just
fine back when you could still do it with rpm directly, not using
rpmbuild, and why should they change?  

This latter category includes a whole lot of commercial software and
(unfortunately!) HPC tools.  How in the world can you build a commercial
package that builds only under a specific compiler and with specific
flags set on a specific distribution with specific libraries?  You call
that robust, best practice coding?  I call it s**t. Bloody amateur s**t.
Written (probably) by physics graduate students who got a C- in
programming 101 (the only course they ever took on how to program) and
who wouldn't know an algorithm (other than one from Numerical Recipes)
if they tripped over it.  And believe me, it's out there.  I personally
know of a huge web-based application developed at one of our neighboring
institutions here in the triangle that was written by a physicist as
perl-cgi scripts on a Mac -- I kid you not -- to manage a massive
database of information that the tool interactively presented that he
kept as one big flatfile.  Something like 60K lines of code (I don't
remember exactly), not one comment.

The Southern courtroom defense for murder.  "He needed killin'". Sounds
pretty good, doesn't it?  Maybe you can think of one or two targets...?

SO, my friend, who KNOWS what wonders rpms could work if People Just
Followed The Rules, were Competent Programmers, and Maintained Their
Code.  I honestly think that if they did these things, Life Would Be
Good and rpms would be perfectly adequate (as indeed they are, for
nearly anything).  Or a whole lot better.

So I, like many others, groan when rpm's and yum and repos don't quite
work the way I'd like them to, when I encounter (rarely) an application
that isn't rpm packaged and I want to install it around, but I also
think that the problem is MORE THAN HALF with the software developers,
with only some fairly narrowly circumscribed areas where rpm itself may
or may not prove to be inadequate.  Mark's example of ATLAS is a very
useful, near archetypical, case in point.  It replaces PART of some
libraries, there are tools that can use it if you've got it, there are
libraries that will use it if you've got it and install it the right
way, but rpm can easily get confused about just what it provides and
whether or not it is safe to install without breaking a dependency of
something that uses the basic e.g. lapack/blas packages.  Even this,
though, probably depends more on the effort one spends building the rpm
itself and less on rpm per se.  
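
The ATLAS situation can be sketched as a question about capabilities:
after swapping packages, is everything that some installed package
requires still provided by something?  The records below are invented
(including the capability names libblas and liblapack and the "solver"
package), and the two helper functions are my own illustration, not an
rpm interface.  The example shows a replacement that covers only PART of
what the old packages provided, which is exactly the case that needs
care in the spec file.

```python
# Invented records: capabilities each package provides and requires.
INSTALLED = {
    "blas":   {"provides": {"libblas"},   "requires": set()},
    "lapack": {"provides": {"liblapack"}, "requires": {"libblas"}},
    "solver": {"provides": set(),         "requires": {"libblas", "liblapack"}},
}

# A hypothetical atlas package that only claims to provide the blas part.
ATLAS_PARTIAL = {"provides": {"libblas"}, "requires": set()}

def unmet(packages):
    """Map each package to the sorted capabilities nobody provides for it."""
    provided = set()
    for rec in packages.values():
        provided |= rec["provides"]
    return {name: sorted(rec["requires"] - provided)
            for name, rec in sorted(packages.items())
            if rec["requires"] - provided}

def replace(packages, old_names, new_name, new_rec):
    """Return a copy with old_names removed and new_name added."""
    result = {n: r for n, r in packages.items() if n not in old_names}
    result[new_name] = new_rec
    return result

safe = unmet(replace(INSTALLED, {"blas"}, "atlas", ATLAS_PARTIAL))
broken = unmet(replace(INSTALLED, {"blas", "lapack"}, "atlas", ATLAS_PARTIAL))
print(safe, broken)
```

Replacing only blas leaves nothing unmet, while replacing both blas and
lapack leaves solver without liblapack, so what the package declares in
its provides matters as much as what it actually installs.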

The problem of building and maintaining diskless systems is another case
in point -- rpm's db management isn't QUITE flexible enough to do all
that needs to be done there, and although there are hacks one can use
(usually) to make things work, they aren't as natural as one might like.
How much of this is rpm's fault, and how much the fault of the
underlying db software I cannot say -- it seems like there are several
ways one might try to fix it.  The rpm-devel list is there waiting for
anybody with time hanging heavy who wants to give it a shot -- rpm 5.x
is probably taking orders for features and suggestions right now...
preferably, I'm sure, from people who can singlehandedly take the
existing sources and Make Them Happen and then help to Keep Them
Running.  As usual.

   rgb