[Beowulf] OS for 64 bit AMD
Robert G. Brown rgb at phy.duke.edu
Mon Apr 4 11:06:52 PDT 2005
On Sun, 3 Apr 2005, Joe Landman wrote:

> It is *long term* behavioural, driver, and interface stability.
> Changing an ABI midway through (4k stacks) is *not* behavioral
> stability. You have no real reason to expect a code to work correctly
> when you alter one of the critical underlying structures that it relies
> upon. Many drivers rested on 8k kernel stacks, it was in the ABI as a
> (defacto) standard. RHEL3 did not (properly so) change its underlying
> kernel structures in such a way to render some portions of the system
> unworkable. RHEL4 is not likely to change its underlying kernel
> structures in such a way to render some portions of the system
> unworkable. FC-x is likely to (and has) changed its underlying kernel
> structures.

I don't understand this assertion at all. Are we talking about the linux kernel here? The one that Linus Torvalds and friends are working on? I would think that whether or not the kernel stacks change size, and how drivers insert into the kernel, and oh so much more are utterly out of the hands of BOTH RH and the FC people (who are, after all, largely the same people).

If/when the kernel changes stack size and other things (for example, the layout of /proc) as it has in the past, we will all have to live with it and yes things will break. The only way RH could avoid it is to either run a twinned kernel configuration with the change backed off or to freeze on a legacy kernel, both of which are such stellarly bad ideas that I hope there is no need to even discuss them.

So sure, RHEL 3.x may be "frozen" with 8k kernel stacks as long as they choose to use and support the last kernel release with that set as the default or where RH is willing to back off a shift, but RHEL in general is going to break this kind of thing from release/upgrade to release/upgrade. Again, one DEFINITION of major release upgrade is that it is one that breaks binary compatibility of at least something major (e.g. kernel, libc, x).
Minor releases break or add features in userspace, updates fix bugs.

So the only thing you're discussing is WHEN FC does it vs WHEN RHEL does it. RH will do it more slowly, for sure, but you'll be just as pissed off about it when it does. In either case, your biggest problem isn't with the distro itself (which IS, as Mark has repeatedly pointed out, beta tested and a real release) -- it is with packages or programs that are NOT in the distribution and were NOT beta tested -- they de facto go through a whole alpha/beta/pilot/production cycle from a point that STARTS in many cases when the new distribution is first released.

For just one example -- how many people would notice it (or DID notice it) the minute that the kernel starts outputting numbers in /proc that were ulls instead of uls? I notice it because stuff I write breaks like all shit. Most people wouldn't/didn't notice it because they got the change at the same time they got a fixed procps that managed it. Only people like me running software that was coded on the older no longer valid assumptions see it break. Then it is usually a straightforward task to fix it in an open source world where things are well documented. Even in the closed source world companies like NVIDIA tend to be at least REASONABLY responsive and eventually fix it if their marketplace demands that it be fixed.

So be aware that things like this DO change fairly regularly in ALL distributions, commercial or non. Most of the time the changes are hidden, especially for folks that use the properly beta-tested software that comes "with" the distribution. Sometimes they are not, and obviously can strongly affect people whose source is e.g. 32 bit source but they are suddenly running on a 64 bit platform.

The real question is: Feature or Bug? I tend to think of dynamical evolution as an unquestionable feature. Who here really WISHES we were still running the 0.9x kernel? SLS linux? X11R5?
Raise those hands high, folks, can't make them out from here. Hmmmm, not a whole lot of you.

Everything after that is a question of rate and degree. FC is "fast" but has a full development cycle and (most important) has engaged developers with a well-defined, yum/repository based mechanism for distributing updates from the toplevel repositories to end-user computers in no more than 1-2 days (faster than that in an emergency). RHEL is "slow" and has pretty much the same update mechanism (or up2date, used by an increasingly vanishingly small segment of hapless humanity given that it scales, um, "poorly" and costs the moon). SuSE is different, Debian different again. Viva la choice.

> So you have pre-release testing as "beta-testing" but you deny that
> "proving ground" is beta-testing? Seems to be same side of a coin here.
> Having a normal release management does not a production quality
> system make. It is most definitely one of the requirements for such a
> system, but it does not, in and of itself, make the OS a production
> class OS. A reasonable definition of production class OS will likely
> incorporate inherent stability of the underlying structures of the
> system, and a guarantee that they will not change for some fixed
> interval. Production specifically implies a repetitive behavior,
> specifically for HPC, a cycle shop. If the next incompatible change in
> FC-x renders your IB drivers unworkable for your cluster, does that in
> fact make the OS that you have installed on the system production ready
> or not? If you have to continuously chase hacks/patches/etc to keep
> your system operational after every upgrade, does that make your system
> production ready?
>

Here is where you keep getting hung up on this whole beta thing. Look, beta testing (as previously noted) refers to a specific phase of a commercial-grade software development cycle. It is fair to apply the term to "Fedora Core X" as a whole, as it goes through such a cycle.
It is crazy to assert that Fedora Core X is a "beta" product because e.g. "Matlab" (to pick on one commercial package) may not run on it the day it comes out the door. In actual fact, matlab might run on it, or might not. FC does not guarantee binary compatibility across major release numbers. Neither does RH. They can't. The very definition of a major release is one that shifts at least some ABIs.

In actual fact, the matlab people technically need to undergo a whole product cycle of their own including alpha and beta testing ON FC or RHEL's new release to port to it if necessary and certify the result. It is entirely possible that RH has a relationship with many vendors and includes them IN their beta cycle so that those vendors can complete their own port and betas in time to update their product at the same time as the new release. So perhaps RHEL 4 comes out Monday, and by Monday evening customers can upgrade matlab to run under RHEL 4 for free or for an additional fee. If a LOT of their customers use FC, though, they are also likely to do that port and testing as rapidly as they can manage it.

SO, please differentiate between:

* the kernel -- in a space by itself outside ALL distributions. All you can choose here is how soon you want your next major release to also be a major kernel release, but when 2.8 is released in the fullness of time, all distros will eventually use it even if it breaks the hell out of every driver in current existence. Note well that complaining about "breaking drivers" is really complaining about the kernel and that Mark's point about closed source binary insertion modules is really well taken. This has nothing to do with FC per se, only with their decision to track the kernel fairly rapidly.
The real issue here is that Nvidia and others should clearly keep up with the current linux kernels and not release a product and let it just sit static forever, or they should release their code so that it can be built into and beta tested WITH the kernel.

* the major compilers -- also in a space outside of all distributions, also a major driver of incompatibility. A conservative approach to gcc would have features like SSE still unsupported, which is a bad thing for HPC systems. Remember the problems that ensued when the kernel required a different variant of gcc to build than the production gcc in many distros.

* the major libraries -- libc, libm, and a slew of other core dynamic libraries ARE the ABI for the "distribution". The only requirement for a stable distribution (that I'm aware of) is that the entire distribution be built self-consistently from the kernel and compilers through the primary/major libraries down through the applications. A binary built for e.g. RH 7.2 is supposed to run on 7.1 or 7.3 but is absolutely not guaranteed to run on 6.2 or 9. Nobody would be horribly surprised if a binary built on 7.2 in fact fails on 7.1 or 7.3, though. This is what "rpmbuild --rebuild" is for...and why open source portable, rebuildable packaging of standards compliant sources is a really really good thing. It is also one of the things many vendors utterly fail to cope with -- since their code is often built according to proprietary shop standards, it ends up being reviewed only by a limited set of inbred eyes and ends up non-portable, unmaintainable crap. Since it costs money (in the eyes of the board) to actually invest in the development process, underfunded crap at that.
The board would LOVE it if they could pay just once to have the sources developed and then fire the whole development team and sell the product forever, as that is the way THEY view intellectual property -- as a commodity they can purchase and exploit for a profit and wealth, not as a participatory exercise from which they happen to earn a well-deserved living. So why should we be surprised by vendors that are still trying to sell software that "only runs on RH 7.3" libraries? They'd actually have to assemble a team of competent programmers to redevelop their product again to get it to work on anything more recent instead of just making money...

* X -- Again, X is on a separate development cycle outside of most distributions (as are many other packages, but X is a low-level sine qua non to many, many applications and hence qualifies as a "distribution" of sorts in its own right). In few places are the costs of both sides of the rapid release coin as visible as with X. On the one hand, rapid updates break applications and require admins to learn new configuration tools and are more likely to have bugs including serious ones. OTOH everybody gets pissed off if the brand new bleeding edge video card they got with their high-end visualization or gaming workstation doesn't work perfectly with linux.

To add insult to injury, you're getting all irritated at FC-X for a change made in the KERNEL that broke an X DRIVER (Nvidia) that is deliberately engineered to live OUTSIDE the X, library, and kernel cycle. What do you expect? Sooner or later, the kernel, a key library, X itself were bound to change. When that happened Nvidia's driver was BOUND to break. How could it not break? Surely you don't expect Linus Torvalds to freeze the kernel development cycle just so Nvidia never has to actually WORK on its proprietary driver and can just keep making money on the basis of its original investment?

* applications. There are two parts of application space.
The part that is "inside" the distribution, and add-ons.

The part that is inside the distribution is the part that is beta tested as part and parcel of the whole shebang -- kernel, compiler, libraries, X and applications (GUI and otherwise). There are a lot of moving parts, dependencies, dynamic libraries, and complex interactions. Incredibly, when any new distribution is released (after beta testing!), it is a matter of WEEKS before nearly all of this works on nearly all systems the distro is installed on. This is an effin' miracle and a testament to the incredible strength and robustness of the open source development cycle. The "gamma testing" in linux is ongoing, but it is also very, very efficient and rapid because everybody has the sources, and tens of thousands of competent eyes look at every emerging problem.

The add-on part is NOT beta tested by the distribution developers, obviously. How could it be? Why is this a problem? How is this an AVOIDABLE problem? It is a simple reality of modern software that it is complex and typically has a complicated dependency tree on many libraries all with slowly varying ABIs. The safest way to get software to run perfectly on any new distribution is to rebuild it (if possible) and test/port/patch it as needed until it both rebuilds and functions properly. Binary compatibility is mostly an illusion, and will become INCREASINGLY illusory as the systems become still more complex and intertwined in the future. However, because of rebuildable, standards compliant packaging and sources, in MOST cases rebuilding is a matter of entering a simple command or two, and in cases where this isn't true it is a clear signal that the product badly needs a major rewrite.

Perhaps what this discussion should really morph to is one of "standard linux" -- the Linux Standard Base -- a low level ABI for all major linux libraries to assure binary compatibility across all flavors of linux.
Naturally, there is www.linuxbase.org and a lot of committed people. Equally naturally, commercial linux vendors are in no rush to implement it as far as I can see, and it is by no means clear that the goals of the project CAN or SHOULD BE accomplished. Standardization can equal stagnation, and there is an alternative.

The alternative is that represented by gentoo but really possible within all packaged linuces. Non-binary packaging that autobuilds on your system. Shrink "linux" to little more than an LSB-standard core plus an enhancement of any of the packaging schema that permits source packages with complicated dependences to be automatically retrieved from a repository via e.g. yum and built and installed as a part of the system installation process.

However, that's not something that I'm pushing -- just noting to emphasize that there are a number of software distributions competing here, and the one that is suffering most is the one that relies on the distribution of generic static binary images of software. The fundamental problem is that this is an outmoded paradigm and one that is likely to disappear altogether in the next few years. This has nothing to do with choice of distribution except that some distributions cater less strongly towards the desire of those software companies that rely on this scheme to make money without an ongoing maintenance and development effort, with clear tradeoffs.

> > the existence of commercial products which specify RH-whatever vX.Y
> > does not magically turn FC into a beta-test. if you redefine words
> > that way, you might as well call all of SunOS a beta for Solaris.
>
> Er... you are the only one who indicated this, so if you want to argue
> this, I would suggest you contact the person who generated this idea
> (that commercial products dependent upon RH make FC a beta test) who can
> be found at hahn _at_ physics _dot_ mcmaster _dot_ ca.
>
> I said "My customers care about running on distributions (whoops, there
> we go with that word again) on which their apps are supported. I am not
> aware of active support for FC-x for applications from commercial
> program providers. If I am incorrect about this, please let me know
> (seriously, as FC-3++ looks to be pretty good)." Prior to this I said
> "It is by Redhat's definition, a rolling beta (proving ground)." The
> two are specifically independent ideas. I know of few commercially
> supported applications that will accept support calls from FC-x running
> users.
>
> Note: Debian has very little in the way of commercial support (none
> from the distributer). It is most definitely not a beta. You can use
> the beta version in unstable. This is analogous to Fedora.
>
> What makes FC a beta is that Redhat specifically is note that, and is
> using Fedora as a "proving ground" (c.f.
> http://dictionary.reference.com/search?q=proving+ground ) as in "It is
> also a proving ground for new technology that may eventually make its
> way into Red Hat products." (from http://fedora.redhat.com/ ) From the
> reference.com site "prov·ing ground (prvng) n. A place for testing new
> devices, weapons, or theories." Would you call a system that is defined
> by its maker to be a proving ground to be a production environment (e.g.
> stable, unchanging) ?

As I repeatedly say (as self-appointed referee) here is where you guys are REALLY fighting -- about nothing. FC is not a beta for RH any more than Debian is. RH is a collection of packages. So is FC. So is Debian. So is linux itself. Those packages include things as diverse as kernels, compilers, libraries, x, and applications of all sorts. MOST of those packages are NOT maintained by RH per se, and the ones that are are often "co-maintained" by RH and SuSE and Gnu and the kernel team and xorg and a host of contributing programmers from all over the world.
ALL of them are a "beta" for RH, and for FC, and for Debian, and for SuSE, by this sloppy a definition.

I reiterate -- the fundamental problem is that alpha/beta testing refer to specific phases present in the development cycle for most commercial-grade (monolithically supported) software. They do not fit comfortably into the open source development process where there often isn't a single well-defined "team" with "responsibilities". In lots and lots of cases, the "team" is a single person who wrote and accepts full responsibility for a product, and an associated "list" of active users that serve as some mix of co-developer (as they contribute working codons back to the project memetic source code set), alpha tester as they implement new snapshots, beta tester as they implement new snapshots (oops, failed to see much difference there) and user as the new snapshots they implemented prove to be stable and are put into production. Then there is a whole NEW cycle when the product is distributed outside of that group, and there may be several such cycles in parallel as the product goes into several distributions.

"FC-X" is most definitely NOT pre-RHEL in anything like the sense that rawhide was pre RH. Sure, packages there migrate eventually into RHEL -- how could they not? So do packages that are actively developed under Debian. In fact, a lot of those packages "get into" FC-x as well. Other packages (e.g. PVM or LAM) might be packaged by groups that have nothing to do with any linux distro.

Also, FC goes through multiple cycles of development between RHEL upgrades. How specifically is FC-1 contributing to RHEL4? FC-2? FC-3? All three releases have overlapped with the RHEL 3 timeframe -- so how is a package that appears and is distributed, stable, for an entire FC release, then re-released and distributed, stable, for an entire FC release, then re-released somehow "being tested for RHEL" in two of those three releases? It just isn't.
FC is a release in and of itself, and has a lot of energy going into it for its own sake. Sure, it is important to RH as a "proving ground for new software" because RH has to be (or chooses to be) conservative to the edge of insanity within the EL distribution. Remember, now that they are selling it for so much money, customer tolerance to "gamma" releases in the best of open source traditions is doubtless greatly reduced.

Note well that RHEL is frozen WAY out to the point of insanity. I can't get packages I'm actively working on under FC-2 to BUILD under Centos 3.x because -- guess what -- they froze the GSL along with everything else. This is doubtless very comforting to somebody that built something to work with just that GSL snapshot, but that snapshot was broken in many ways and is missing all sorts of new features.

So here's an out for the two of you. Call FC-X a "gamma release", not of RHEL (it isn't) but of itself. That's ok since gamma release is a joke anyway, but to the extent that it means anything it is probably accurate because ALL linux distributions are "gamma" releases. Accept the fact that FC-X is beta tested and quality assured before release, and that the testing and assurance are likely lower than they are for RH because they cost money and RH is trying to minimize investment here, so it probably (to be fair) DOES rely more on the gamma phase for end-stage debugging, which so far, for a community-based linux, seems to work just gangbusters well.

Leave the entire issue of commercial software and library compatibility alone, as it is something you don't NEED to agree on. It is what it is. Mark's environment uses FC (as does mine) because we rely very, very little on commercial code and are comfortable telling consumers of our resource that their commercial products need to run on top of FC or don't bother. We both have Centos as an alternative or can always pay for RHEL or SuSE if we need more.
You have particular customers with particular needs that are orthogonal (in some cases) to FC -- that's fine too! I think we'd all agree that this is FUNDAMENTALLY not FC's problem -- it's an open source/closed source issue, and at its ROOT is probably due to inadequate investment in the software development process in the owning corporation and poor methodology, with some obvious exceptions -- but that doesn't make it less validly a problem with your customers. It does make it wrong to give as knee-jerk advice "never install FC on clusters" as that's just plain silly, even as it makes it perfectly reasonable to say "don't install FC on your cluster if you want to run product X, as it may have binary/library compatibility issues".

> > the customer needs to evaluate how fragile a commercial product is:
> > how well it conforms to the ABI. NVidia is a great example of
> > an attractive product which is inherently fragile since NVidia
> > chooses to hide trade secrets in a binary-only, kernel-mode driver
> > which (by definition and example) depends on undefined behavior.
> > VMWare is another good (flawed) example.
>
> Hmmm. I hear this argument time and again from people about the closed
> source nature of nVidia's drivers. nVidia does not (as far as I know)
> own all the intellectual property in their driver, and they do not have
> the right to give that IP away via GPL or any other mechanism. The
> fundamental flaw in the arguments against the nVidia driver are an
> inherent presumtion that nVidia is hiding trade secrets in order to make
> its life better and get end user lock-in. The behavior it (the driver)
> depends upon has been built into the kernel, and when that behavior
> suddenly changed, nVidia wasnt the only driver affected. Many open
> source drivers were impacted. Are you going to argue that this makes
> them (the open source drivers) inherently fragile? This is a natural
> extension and simple application of your argument.
> This is a weak argument at best, and some of its fundamental premises
> are fatally flawed. If nVidia owned all the IP in everything they
> released, and chose simply to release binary only drivers, that would
> be a completely different case. Unfortunately, a fair amount of the IP
> in OpenGL and other related standards is owned by companies that have
> no interest in open source other than demolishing it. SGI sold off most
> of its IP in OpenGL to some other outfit.

Hmmm, I'm skeptical about this specific argument. Cynical might be a better word. After all, there exist nvidia drivers in the open source world -- see "nv" in xorg. They simply don't work as well. I find it very, very difficult to believe that -- with access to the internals -- nvidia couldn't write an open source driver that worked as well as their "proprietary" driver.

Of course I'm also a radical person who doesn't believe that there can or should be "IP" in a software driver. Their product is a hardware device; it has an ABI, whether or not they choose to publish it. It isn't easy for me to see how that ABI is IP. If they published the ABI with full documentation, open source people could write as good a driver as they could without their help.

One encounters similar things for everything from palm pilots to NICs. There is always some argument for a hardware vendor protecting their "trade secrets" in their drivers, where the real trade secret may be that they're using some standard chipset with a tiny bit of nonstandard glue and their board is really a piece of crap. In linux, it is evolution in action -- unsupported boards aren't purchased by linux users, and that is finally adding up to something pretty significant in the marketplace. Nvidia is something of an exception because they are very popular with gamers and visualization labs and because there is SUCH a big difference between nvidia's driver and the nv driver. However, this may not last.
Note also that the kernel has NEVER guaranteed that drivers built for one revision will remain valid for all revisions. How could it? Will drivers that aren't "part" of the kernel break when the kernel changes something major (the ones that are "part" of the kernel are again beta tested in situ and tend not to break -- as much -- so we're obviously talking about add-ons)? Sure, how could they not? Either participate in the kernel development process and work all these kinks out snapshot to snapshot or accept that you'll have to port/debug every time the kernel changes enough to break your driver.

Just don't complain. There's nothing to complain about. It's like complaining that those consarned rattletraps with four wheels and a stinkin' engine and horn scare the horses and should be banned from the roads.

> > "supported configuration" is nothing more or less than a way to
> > "download" support costs to the platform vendor (PV). it's a lever,
> > acting on the customer as a pivot, to force the PV to avoid changes
> > of any sort, since its impossible to tell what internals the proprietary
> > product depends on.
>
> Uh.... I think we disagree again. A supported configuration is
> something that a customer, an end user, a developer should have a
> reasonable and fighting chance of having it work right. This means that
> the internals that are exposed to developers will no change (including
> driver developers). This means that end users and customers have a
> reasonable expectation that their configuration on the supported list
> should work, and the onus is on the platform vendor (nice to see you
> switched to the definition of platform that I was using BTW) to make it
> work without breaking other stuff.

Oh, but that's easy. Just lock into Red Hat 5.2 and tell customers that any hardware older than a Pentium is out of bounds. What, you meant that you wanted CONTEMPORARY hardware configurations to work?
But DON'T want bleeding edge kernels (most likely to have the requisite drivers), modern libraries (most likely to be posix compliant, most likely to have useful features), modern compilers (most likely to have e.g. SSE support and other bells and whistles) and modern applications to help you run, configure, and otherwise support the systems in question? Don't want MUCH, do you, but your cake and its consumption all at the same time.

FC-x is far more likely than RHEL to have:

* contemporary kernel, good 64 bit support, large list of supported devices.
* up to date compilers
* up to date libraries (e.g. GSL given above)
* lots of nifty -- and new -- applications, and relatively bug-free and feature-rich versions of older ones e.g. Open Office that might have been under active development when RHEL was "frozen".

So I don't think you mean this. I think what you mean is that you wish that VENDORS rewrote and updated their SOFTWARE PRODUCTS so they would REBUILD on FC-x with less pain than months of work porting and debugging and testing. Since they won't do this and are instead insisting that YOU use the ancient kernel and libc where their last build still worked, you wish that somehow that ancient kernel could use newer drivers for modern networks and busses, that libxml had been rebuilt for the older libc (presuming that it COULD be rebuilt for it, by no means a given), that xorg had bothered to port their latest set of devices and applications back to that ancient OS release so that your customer could still use their nifty new monitor and graphics card.

I'm not trying to be cruel here -- I'm just pointing out that there are some fairly fundamental conflicts here that cannot easily be resolved by YOU or YOUR CUSTOMER.
The only way to fix this problem properly is (perhaps) the LSB, and even then it would only work if the SOFTWARE PROVIDER learned how to write portable and cheaply rebuildable software and made a significant and revolving investment in development and maintenance of same. In the meantime you literally cannot have what you want -- you can only choose the place where you make the compromises that let you make things work well enough to get by.

> > drastistically
> > similarly, SOP in the Fibrechannel world is to provide only negative
> > definitions of support (nothing but HP disks in HP SANs.) this can be
> > seen as a flaw in standard-defining, since Ethernet provides a fairly
> > decent counterexample where interoperability is the norm because
> > products need to conform, not "qualify".
>
> A standard is only useful if people pay attention to it, and
> engineer/design/build to it. Standards are very useful to developers,
> in that if they code in a particular manner that adheres to the
> standard, they have a fighting chance of developing something that will
> work. If the standard suddenly changes on them, and their stuff breaks,
> who do they turn to? If the target is moving, how much time/effort will
> they expend to chase it?

Sigh. Somehow there are (how many? lots!) mountains of linux packages -- hundreds and hundreds -- that are written so that they will work. Not only work, but keep working distribution release to distribution release, often with nothing more than that aforementioned "rpmbuild --rebuild". It isn't an issue with "creeping standards". The most common problem is "failure to code to standards", followed by "failure to invest in maintaining the code".

Most real standards have consortia associated with them. Some are defined by IEEE docs (a process that I abjure, because it is both elitist and non-open). I prefer the RFC-defined standards as exemplary of the open standard development process.
In such a process there is really never any reason for a vendor or developer to be caught by surprise. Surprises are more likely to occur when vendors are heavily involved in writing the "standard" and yank it around for customer lock-in and commercial advantage. M$ being past masters at the game, but they are far from the only ones.

> In some cases (development tools) it makes sense to chase some specific
> moving targets (though it costs time/effort and therefore real money).
> In other cases it makes sense to wait for stable releases where things
> will not change, so your customers/end users can get your stuff and make
> it work, because you have a fighting chance at making it work.
>
> Greg's company (and the folks at the Portland Group) have to chase these
> targets... many of their customers are there (I'd bet that a small
> fraction of their collective total customer base are using the
> development tools to generate commercial code, most are using the tools
> for their research/development tasks).

These are a notable exception to many of the things I say above, but then, no compiler company survives for long without a sustained investment in the world's MOST serious programmers who know MORE about product development cycles and hardware features and interoperability and maintenance and all that than nearly anybody.

However, I know of numerous projects that are developed (one can WATCH them being developed) and then most of the development staff goes away with only a small skeleton hanging out to do debugging and solving those "gamma release" problems. This lets the company make a lot of money without having to pay a full development team of pesky programmers. It also means that if they wrote a Windows version of the application and you want a linux version you're SOL. If you want a new feature you're SOL.
If you want new hardware to be supported, well, their two remaining programmers will get around to new hardware in roughly 13 months as maintenance issues and putting out fires run down. In fact, if you just want them to fix a damn bug get in line -- those same two guys are booked up fixing bugs that have already been reported for the next nine weeks, but they'll try really hard to do yours then.

I'm not criticizing this -- it may be that this is the only way for them to maintain any sort of profitability at all. Or maybe, just maybe, the VC guys want to maximize profit for long enough to sell their stock and/or make their investment x 20 back some other way, and don't give a goddamn if the company still exists two years from now. I've seen both things openly expressed in board rooms, with the latter usually NOT said but perfectly obvious. Rape, pillage and burn, right? Take no prisoners.

> Yeah, there are significant interoperability problems in things like SAN
> and what-not-else. These are unfortunate. This is part of the reason
> why I try to avoid such things (I don't like vendors locking me in, and
> I know my customers don't like being locked in, so I don't waste my
> companies time trying to figure out how to do this). Don't assume that
> a companies / end users misapplication of a standard, hijacking of a
> standard, or abuse of a standard somehow makes all standards bad. They
> are not. Standards are sometimes the only lever you have in a
> commercial closed source context... demanding that a company adhere to
> what it claims to sell is sometimes a necessary path. Interoperability
> means that when people interpret the standards, that all parties agree
> on the definitions, and that they guarantee that their products will in
> fact conform to the standard, and that there will be tests of the
> standard compliance, and out of compliant systems will be adjusted to be
> in-compliance, and that interoperability with other standards will be
> guaranteed.
> This is why IDE, SCSI, and Ethernet work so well. This is
> why some others do not. IB is likely to work quite well going forward.
> This is why the SAMBA folks are chasing a moving target, as the CIFS
> "standard" is a moving one (just go ahead and update that XP with a
> SAMBA server around .... grrrrr).
>
> I like and use FC-x, we run FC-2 and FC-3 on various machines (AMD64, my
> laptop as part of a triple boot, and x86). I make sure our software
> runs on this, we compile and test on FC as well as for others
> (RH/Centos, SuSE, looking at Ubuntu/Debian) . I am happy that our
> binary packages seem to work nicely across multiple distributions
> (though we usually bring the source along to be sure), and our large
> systems are built from source, so they should work (as long as the
> underlying technology works). Our software works at a high level, and
> depends upon lower level bits. I don't see the effect of the OS changes
> as much as the tool/hardware vendors do, though every now and then
> something breaks a driver. But, and this is the critical point for us,
> if our software breaks at our customers site, we own the fixes, it is
> our job to make them happen. More importantly, if something breaks in
> the chain of software (whether we own it or not), we try to help, as it
> is critical to make sure that failure modes are understood, and problems
> are resolved. We have been and will be helping our customers resolve
> problems with third party software, commercial and otherwise. If our
> target platform were moving, so that the C compiler structures were
> changing, and we had to rebuild time and time again with each OS update,
> I would wait until we saw this settle out. Otherwise we are spinning
> our wheels, as each change is more work, and in the end, it should
> converge to a final state. It is the final state that is worth
> targetting (for us, for others such as PathScale, they have to follow
> what their customers use).
It sounds like you too take the software end seriously. This is just what you need to do to maintain a viable and interoperable product. Good job.

> The issue in FC-x is that it is open to internals changing. I think
> this is a good thing. It is doing what it was intended to do, and I
> like seeing the directions I need to worry about going forward. I will
> not likely deploy this as an OS for a cluster customer without the
> customer understanding exactly what they are getting, and making sure
> they understand what is needed to support this. If they really want a
> cheap RH, they can get Centos/Tao. If they want internal structural
> stability, and support from commercial vendors for their commercial
> codes, they will have to run something that the commercial vendors will
> support. PathScale and possibly the Portland group (and I am going to
> guess Etnus and a few others) do or will likely support it. LSTC, MSC,
> Accelrys, Tripos, Oracle, ... will likely not (though it will probably
> run fine with no issues).

This is all very reasonable, and Mark would probably even agree. Run FC if it works and your software base permits it, use Centos or buy whatever you like if it doesn't. Don't expect apples to become oranges, remember that tanstaafl, and make the best compromises you can to get things to work (the important thing).

And DON'T WORRY about what is a "beta" or what isn't. It isn't relevant. FC is without question more dynamic. It is without question already through a real beta before release. It is without question more "daring" as it evolves more quickly and will break things more often. It will also GENERALLY be a lot more functional, as breaking non-commercial things is confined to a relatively short phase right after release, and with yum, fixes are RAPIDLY deployed.

   rgb

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
