[Beowulf] Which distro for the cluster?
Joe Landman landman at scalableinformatics.com
Sun Jan 7 07:06:17 PST 2007
Andrew M.A. Cater wrote:
> On Wed, Jan 03, 2007 at 09:51:44AM -0500, Robert G. Brown wrote:
>> On Wed, 3 Jan 2007, Leif Nixon wrote:
>>
>>> "Robert G. Brown" <rgb at phy.duke.edu> writes:
>>>
>> b) If an attacker has compromised a user account on one of these
>> workstations, IMO the security battle is already largely lost.  They

s/largely/completely/g

At least for this user, if they have single factor passwordless login set up between workstation and cluster. Of course if they are using a malware-ridden, keylogger hosting machine, they have ... uh ... somewhat worse things to deal with than just their accounts on the cluster being open to attack.

The solution to this is simple. Never let this happen. Which means, don't use a system which is significantly vulnerable to malware or keylogger insertion. It is left as an exercise to the reader to figure out which platforms are more vulnerable.

>> have a choice of things to attack or further evil they can try to wreak.
>> Attacking the cluster is one of them, and as discussed if the cluster is
>> doing real parallel code it is likely to be quite vulnerable regardless
>> of whether or not its software is up to date because network security is
>> more or less orthogonal to fine-grained code network performance.
>>
>
> Amen, brother :)
>
>> BTW, the cluster's servers were not (and I would not advise that servers
>> ever be) running the old distro -- we use a castle keep security model
>> where servers have extremely limited access, are the most tightly
>> monitored, and are kept aggressively up to date on a fully supported
>> distro like Centos.  The idea is to give humans more time to detect
>> intruders that have successfully compromised an account at the
>> workstation LAN level and squash them like the nasty little dung beetles
>> that they are.

Yup. Even better is never letting the users log in to admin machines. Provide machines for them to log into, submit and run jobs from. Just not the admin nodes.

[...]
>> In general, though, it is very good advice to stay with an updated OS.

... on threat-facing systems, yes, I agree. For what I call production cycle shops, those places which have to churn out processing 24x7x365, you want as little "upgrading" as possible, and it has to be tested/functional with everything. Ask your favorite CIO if they would consider upgrading their most critical systems nightly.

It all boils down to a CBA (as everything does). Upgrading carries risk, no matter who does it, and how carefully things are packaged. The CBA equation should look something like this:

    value_of_upgrade = positive_benefits_of_upgrade - potential_risks_of_upgrade

And if the value_of_upgrade is not strongly positive, you probably should not do it if you are supplying a service to a user base. Sure, you can do it on your own personal cluster. I appreciate that people on this list do this for their systems. Regardless of this, you need to be of the (somewhat paranoid) mindset when looking at an upgrade, and the potential for loss of time/data/...

A (not so great) example would be someone packaging up a recent 2.6.19 kernel with that oh-so-nice ext3-vm interaction which gave us compromised files. It hit mmap based files from what I could see. All you need is an end user with a corner case that happens to tickle the trigger and whammo. You are now spending time fixing their problem (which requires downgrading/upgrading).

You have a perfectly valid reason to upgrade threat facing nodes. Keep them as minimal and as up-to-date as possible. The non-threat facing nodes, this makes far less sense.

If you are doing single factor authentication, and have enabled passwordless access within the cluster: ssh keys or certificates or ssh-agent based, once a machine that holds these has been compromised, the game is over.
Multi-factor authentication for launching cluster runs is still a challenge, as queuing systems may schedule jobs to start at 3am local time, and no one wants to wait around for job start to enter additional factors.

You want to test any upgrade, and only upgrade what needs upgrading. Just like other aspects of security 101, threat facing nodes need to be running as little (important) stuff as possible, and need as limited access as you can give them. Upgrades can and do carry their own bugs and security holes, and you really don't want to be chasing those as well.

>> My real point was that WITH yum and a bit of prototyping once every
>> 12-24 months, it is really pretty easy to ride the FC wave on MANY
>> clusters, where the tradeoff is better support for new hardware and more
>> advanced/newer libraries against any library issues that one may or may
>> not encounter depending on just what the cluster is doing.  Freezing FC
>> (or anything else) long past its support boundary is obviously less
>> desirable.  However, it is also often unnecessary.
>>
>
> Fedora Legacy just closed its doors - if you take a couple of months
> to get your Uebercluster up and running, you're 1/3 of the way through
> your FC cycle :( It doesn't square. Fedora looks set to lose its way
> again for Fedora 7 as they merge Fedora Core and Extras and grow to

Hmmm. Fedora is the testing framework for RHEL. We know this. I like 6, it looks to be a fine test distro, and has lots of nice things in it. Works on lots of hardware. If I were building a cluster on it, I would not upgrade the compute nodes. Once they are set, unless there is a good reason to upgrade (newer packages that do not add needed or missing features are not a valid reason IMO), I would leave the compute nodes alone. Probably the head node as well. The login nodes are a different story. Upgrade them (security patches) as quickly as possible.
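For anyone who wants the upgrade rule from this message in executable form, here is a rough Python sketch. It is only an illustration: the role name "login", the numeric benefit/risk scores, and the `margin` threshold (standing in for "strongly positive") are assumptions made for the example, not anything measured or prescribed above.

```python
from dataclasses import dataclass

# Assumed role names. The message only distinguishes threat-facing
# (login) nodes from compute/head nodes.
THREAT_FACING_ROLES = {"login"}


@dataclass
class Upgrade:
    name: str
    benefit: float               # estimated positive benefit (arbitrary units)
    risk: float                  # estimated cost of breakage (same units)
    security_fix: bool = False   # does it patch a security hole?
    tested: bool = False         # has it been tested on a prototype node?


def value_of_upgrade(u: Upgrade) -> float:
    """value_of_upgrade = positive_benefits_of_upgrade - potential_risks_of_upgrade"""
    return u.benefit - u.risk


def should_apply(u: Upgrade, role: str, margin: float = 1.0) -> bool:
    """Decide whether to roll an upgrade out to a node with the given role.

    `margin` is a hypothetical threshold for "strongly positive".
    """
    if not u.tested:
        # Test any upgrade before it goes anywhere.
        return False
    if role in THREAT_FACING_ROLES and u.security_fix:
        # Threat-facing nodes take security patches as quickly as possible.
        return True
    # Everything else has to clear the cost-benefit bar.
    return value_of_upgrade(u) > margin


if __name__ == "__main__":
    ssh_patch = Upgrade("openssh security patch", benefit=1.0, risk=0.5,
                        security_fix=True, tested=True)
    new_kernel = Upgrade("new kernel", benefit=2.0, risk=5.0, tested=True)
    for upgrade in (ssh_patch, new_kernel):
        for role in ("login", "compute"):
            print(upgrade.name, role, should_apply(upgrade, role))
```

The scores are whatever your own shop estimates them to be. The point is only that compute and head nodes must clear the cost-benefit bar, while tested security patches go to login nodes regardless.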
> n-000 packages again - the fast upgrade cycle, lack of maintainers and
> lack of structure do not bode well. They're apparently moving to a 13 month
> upgrade cycle - so your Fedora odd releases could well be three years apart.
> The answer is to take a stable distribution, install the minimum and work
> with it OR build your own custom infrastructure as far as I can see.
> Neither Red Hat nor Novell are cluster-aware in any detail - they'll
> support their install and base programs but don't have the depth of
> expertise to go further :( Both are happy to sell licenses to the unwary.

At the end of the day, if you are going to build a RHEL cluster, use Centos/Scientific Linux unless you absolutely wish to pay RH for security patches. With SuSE, use OpenSuSE. If you are going to settle on Fedora, pick a distro, and remember that it will be out of support in a year, which shouldn't matter to the compute/head node once they are up.

>> On clusters that add new hardware, usually bleeding edge, every four to
>> six months as research groups hit grant year boundaries and buy their
>> next bolus of nodes, FC really does make sense as Centos probably won't
>> "work" on those nodes in some important way and you'll be stuck
>> backporting kernels or worse on top of your key libraries e.g. the GSL.
>> Just upgrade FC regularly across the cluster, probably on an "every
>> other release" schedule like the one we use.
>>
>
> Chances are that anything Red Hat Enterprise based just won't work. New
> hardware is always hard.

Heh. Try to point this out to a purchasing agent on an RFP which demands a) newest possible hardware and b) RHEL 4 support. You get to pick one or the other, not both. Which one do you want? Hint: "b" is far less valuable.

The other (not-so-funny) aspect of this is when we deliver new hardware with an OS load that supports the newer hardware and someone wants to pull it back to the "corporate standard".
In doing so, they give up stability, performance, and often file system support. Or in the case of our JackRabbit unit, when we deliver 30TB of 5U system and we get the "ext3 is almost as good as xfs" line. Uh.... er.... no. Those who really insist upon this must only want 16TB units with no possibility to ever grow beyond this (we have a design cooked up to show how to do a 1 PB in 4 racks as a single file system, or better, an HA 1 PB in 9 racks as a single file system). 16TB is great for some folks, but it is a fundamental ext3 limit. You need the untried-in-the-real-world ext4 to break that limit. Or xfs and jfs.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615
