[Beowulf] Which distro for the cluster?
Robert G. Brown rgb at phy.duke.edu
Wed Jan 3 06:51:44 PST 2007
On Wed, 3 Jan 2007, Leif Nixon wrote:

> "Robert G. Brown" <rgb at phy.duke.edu> writes:
>
>> Also, plenty of folks on this list have done just fine running "frozen"
>> linux distros "as is" for years on cluster nodes. If they aren't broke,
>> and live behind a firewall so security fixes aren't terribly important,
>> why fix them?
>
> Because your users will get their passwords stolen.
>
> If your cluster is accessible remotely, that firewall doesn't really
> help you very much. The attacker can simply login as a legitimate user
> and proceed to walk through your wide-open local security holes.

So:

a) Our cluster wasn't remotely accessible. In fact, it was on a 192.168 network and in order to even touch it, one had to login to an up to date, carefully defended desktop workstation login server in the department.

b) If an attacker has compromised a user account on one of these workstations, IMO the security battle is already largely lost. They have a choice of things to attack or further evil they can try to wreak. Attacking the cluster is one of them, and as discussed if the cluster is doing real parallel code it is likely to be quite vulnerable regardless of whether or not its software is up to date because network security is more or less orthogonal to fine-grained code network performance.

Still, a cluster is paradoxically one of the best monitored parts of a network. Although it would make a gangbusters DoS platform, network traffic on the cluster, cpu consumption on the cluster, user access to the cluster are all relatively carefully monitored.
The cluster installation is likely to be different enough and "odd" enough to make standard rootkit encapsulations fail for anyone but the legendary Ubercracker (who can always do whatever they want anyway, right?;-) In an organization that tightly monitors everything all the time on general security principles (first line of defense, really, as one can NEVER be sure all exploitable holes are closed even with a yum-updated, stable, currently supported distro and human eyes are better at picking up anomalies in system operation than any automated tool) I think it is pretty likely that any attempt to take over a cluster and use it for diabolical ends would be almost instantly detected.

BTW, the cluster's servers were not (and I would not advise that servers ever be) running the old distro -- we use a castle keep security model where servers have extremely limited access, are the most tightly monitored, and are kept aggressively up to date on a fully supported distro like Centos. The idea is to give humans more time to detect intruders that have successfully compromised an account at the workstation LAN level and squash them like the nasty little dung beetles that they are.

FWIW, our department is entirely linux at the server level, and almost entirely linux at the workstation level. A very few experimental groups and individuals run either Windows boxes (usually to be able to use some particular software package) or Macs (because they are, umm, "that kind of user":-). I'm guessing that the ratio is something like 4:1 linux to Win at the workstation level (Macs down there in the noise) and maybe 10:1 linux to win if you include cluster nodes, whatever OS they might be running.

Since Seth introduced yup on top of RH (maybe 7-8 years ago? How time flies...), and then proceeded to write yum to replace yup for RPM distros in general, we haven't had a single successful promotion to root in the department.
Nothing done locally can prevent some grad student's password from being trapped as they login from some compromised win-based system in their hometown over fall break, but the very few of these that have occurred have been quickly detected and quickly squashed without further compromise. In that same interval, we had a WinXX system compromised and turned into a pile of festering warez rot something like twice a year. Pretty amazing given that they are kept up to date as best as possible and they make up only 10-20% of our total system count.

> But you know this already.

Oh yeah;-) And we didn't do this "willingly" and aren't that likely to repeat it ourselves. We had some pretty specific reasons to freeze the node distro -- the cluster nodes in question were the damnable Tyan dual Athlon systems that were an incredible PITA to stabilize in the first place (they had multiple firmware bugs and load-based stability issues under the best of circumstances). Once we FINALLY got them set up with a functional kernel and library set so that they wouldn't crash, we were extremely loath to mess with it.

So we basically froze it and locked down the nodes so they weren't easily accessible except from inside the department, and then monitored them with xmlsysd and wulfstat in addition to the usual syslog-ng and friends admin tools. Odd usage patterns (that is, almost any sort of running binary that wasn't a well-known numerical task associated with one of the groups, or logins by anyone who wasn't a known user) would have been noticed by any of a half-dozen people, one of whom was me, almost immediately. The kernel was "barely stable" as it was and couldn't easily have been replaced with a hacker kernel (to e.g. erase /proc trace) without a VERY high probability that the hacker kernel would crash the system and reveal the hacker on the first try.
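To make the "odd usage pattern" test concrete, here is a minimal sketch of that check in Python. It is not how xmlsysd or wulfstat work internally; the Proc record, the flag_anomalies function and the sample user and task names are all hypothetical, and the process list is assumed to come from whatever monitoring tool is already collecting it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    """One running process as reported by some monitoring tool (hypothetical record)."""
    pid: int
    user: str
    command: str


def flag_anomalies(procs, known_users, known_tasks):
    """Return (process, reason) pairs that a human should look at.

    A process is flagged if its owner is not a known cluster user, or if
    the owner is known but the binary is not one of the expected tasks.
    """
    flagged = []
    for p in procs:
        if p.user not in known_users:
            flagged.append((p, "unknown user"))
        elif p.command not in known_tasks:
            flagged.append((p, "unexpected binary"))
    return flagged


# Hypothetical snapshot of one node.
snapshot = [
    Proc(1201, "alice", "montecarlo"),   # known user, known numerical task
    Proc(1377, "alice", "ircbot"),       # known user, odd binary
    Proc(1402, "mallory", "montecarlo"), # nobody we know
]
flagged = flag_anomalies(
    snapshot,
    known_users={"alice", "bob"},
    known_tasks={"montecarlo", "lattice_qcd"},
)
for proc, reason in flagged:
    print(f"pid {proc.pid} ({proc.user}: {proc.command}): {reason}")
```

A whitelist like this only works because a compute node runs so few distinct things; on a general-purpose workstation the same test would drown in false positives.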
xmlsysd reads all sorts of stuff from all over /proc and was custom code that I was working on and periodically updating, even while Seth was working on yum and updating THAT. Somebody would have had to literally custom craft some very advanced C code to stay hidden on the cluster and even then would have been revealed by e.g. an update of xmlsysd unless they were a bit beyond even Ubercracker status.

In general, though, it is very good advice to stay with an updated OS. My real point was that WITH yum and a bit of prototyping once every 12-24 months, it is really pretty easy to ride the FC wave on MANY clusters, where the tradeoff is better support for new hardware and more advanced/newer libraries against any library issues that one may or may not encounter depending on just what the cluster is doing. Freezing FC (or anything else) long past its support boundary is obviously less desirable. However, it is also often unnecessary.

On clusters that add new hardware, usually bleeding edge, every four to six months as research groups hit grant year boundaries and buy their next bolus of nodes, FC really does make sense as Centos probably won't "work" on those nodes in some important way and you'll be stuck backporting kernels or worse on top of your key libraries e.g. the GSL. Just upgrade FC regularly across the cluster, probably on an "every other release" schedule like the one we use.
On clusters (or sub-clusters) with a 3 year replacement cycle, Centos or other stable equivalent is a no-brainer -- as long as it installs on your nodes in the first place (recall my previous comment about the "stars needing to be right" to install RHEL/Centos -- the latest release has to support the hardware you're buying) you're good to go indefinitely, with the warm fuzzy knowledge that your nodes will update from a "supported" repo most of their 3+ year lifetime, although for the bulk of that time the distro will de-facto be frozen except for whatever YOU choose to backport and maintain.

And really, there isn't much stopping folks from adopting a range of "mixed" strategies -- running FC-whatever on new nodes for a year or whatever as needed in order to support their hardware or use new libraries, then reinstalling them with Centos/RHEL (which is basically FC-even-current-at-release-time frozen and supported or so it seems recently anyway) as Centos support catches up with the hardware by syncing with an FC-current on a new release.

Nowadays, with PXE/Kickstart/Yum (or Debian equivalents, or the OS of your choice with warewulf, or...) reinstalling OR upgrading a cluster node is such a non-event in terms of sysadmin time and effort that it can pretty much be done at will. Except for pathological cases (like the Tyans) we're talking at most a few days of sysadmin time to set up a prototyping node or four, flash over to the new distro via a discrete node reboot (unattended automated reinstall or a new node diskless image), and let selected users whack on it for a week or two. If it proves invisibly stable and satisfactory -- the rule rather than the exception -- crank it on up across the cluster. Even if it "fails" on some untested pathway after you do this, it costs you at most a reboot (again to a reinstall/replacement of a node image) to put things back as they were while you fix things.
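As a rough illustration of why the prototype-then-roll-out step is cheap, the sketch below generates per-node pxelinux configuration text that points a node at a kickstart file on its next network boot. It assumes a stock pxelinux setup, where the boot loader looks for a file named 01- followed by the node's MAC address, and the ks= boot option used by kickstart installers of this era. The MAC address, kernel and initrd paths, label and URL are all made up for the example; it is not a description of our actual setup.

```python
def pxelinux_config_name(mac: str) -> str:
    """Per-node config file name: '01-' (Ethernet) plus the MAC, lowercase, dash-separated."""
    return "01-" + mac.lower().replace(":", "-")


def pxelinux_entry(label: str, kernel: str, initrd: str, ks_url: str) -> str:
    """Text of a one-entry pxelinux config that boots straight into a kickstart install."""
    return "\n".join([
        f"default {label}",
        f"label {label}",
        f"  kernel {kernel}",
        f"  append initrd={initrd} ks={ks_url}",
        "",
    ])


# Hypothetical prototype node and install tree.
name = pxelinux_config_name("00:1A:2B:3C:4D:5E")
entry = pxelinux_entry(
    label="fc6-proto",
    kernel="fc6/vmlinuz",
    initrd="fc6/initrd.img",
    ks_url="http://install.example.org/ks/node-fc6.cfg",
)
print(f"pxelinux.cfg/{name}")
print(entry)
```

Rolling a node back is then a matter of restoring the previous file for that MAC and rebooting, which is the "at most a reboot" cost described above.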
The worst thing that such a strategy might require is a rebuild of user applications for both distros, but with shared libraries, in my own admittedly anecdotal experience, this "usually" isn't needed going from older to newer (that is, an older Centos built binary will "probably" still work on a current FC node, although this obviously depends on the precise libraries it uses and how rapidly they are changing). It's a bit harder to take binaries from newer to older, especially in packaged form. There you almost certainly need an rpmbuild --rebuild and a bit of luck.

Truthfully, cluster installation and administration has never been simpler.

   rgb

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
