[Beowulf] Which distro for the cluster?

Mon Jan 8 07:20:34 PST 2007

I am posting before coffee (PBC) so if I ramble more than usual, my
apologies.

Leif Nixon wrote:
> Joe Landman <landman at scalableinformatics.com> writes:

>>>>   b) If an attacker has compromised a user account on one of these
>>>> workstations, IMO the security battle is already largely lost.  They
>> s/largely/completely/g
>>
>> At least for this user, if they have single factor passwordless login
>> set up between workstation and cluster.
> 
> Of course. But you want to contain the intrusion to that single user,
> as far as possible.

I think there are two different issues.  First: security is meant to be
an access control and throttle/choke point.  Second: how you view your
cluster.  Is it "one-big-machine" in some sense (not necessarily Scyld,
but with a security model such that if you are on the access node you
are on the machine), or is it really a collection of individual machines
each with their own administrative domain?  One of these models works
really well for "cluster" use.

> If your security hinges on no user passwords ever
> being stolen, you can very easily wind up in a situation that
> traditionally is said to involve creeks, but not paddles. 

Your security model should mirror your intended usage model as indicated
above.  If you are using a cluster, security is the front door.  If you
are using something else, the security model may be different.  Since we
are into analogies, why not look at it like the front of a very
exclusive club.  If you get in, you are in.  If you want, you can even
implement different security room to room, which very quickly causes
your club members to leave as it gets hard to move room to room.

Security is in part about containment.  Containment is not necessarily
putting a lock on every door, and a different required key or three to
unlock.

More importantly, security is about minimizing the maximum damage an
attack can do.  A different lock on every door may stop the casual
attacker.  But just as you have large binders of stolen passwords (the
authorities might wish to ask you how you got them :( ), I have some not
so nice log files of years of hackers, some script kiddies and some
very good ones, beating on everything but the front door.

Put another way, I've been mimicking a few others for the better part of
a decade, saying security is a process, not a product.  Making a process
hard doesn't necessarily make it secure.  What matters is making sure
that when the process breaks down, and it will, the damage resulting
from that breakage is as low as you can make it.

> I have two
> thick binders sitting on my desk, containing stolen passwords from an
> impressive range of commercial, academic and military institutions. 
> 
>>>> In general, though, it is very good advice to stay with an updated OS.
>> ... on threat-facing systems, yes, I agree.
>>
>> For what I call production cycle shops, those places which have to churn
>> out processing 24x7x365, you want as little "upgrading" as possible, and
>> it has to be tested/functional with everything.  Ask your favorite CIO
>> if they would consider upgrading their most critical systems nightly.
> 
> I see this in hospitals a lot.

I see this in every single production cycle shop we have been in.  Not
just FDA-regulated.  So much so that they have a process that involves
building a second (or Nth) test machine, called a sandbox, specifically
to test things until they believe them to work before deploying them.

Back to this in a second.

> Some healthcare systems can't be
> patched without reapplying for FDA approval, which is of course a
> hideously complicated process. So hospitals wind up running software
> which you can push over with a feather. Theoretically, they should be
> running on an isolated network ("It's no problem, we have
> firewalls!!!"), but it only takes a single mistake: somebody plugs in
> an infected laptop, or somebody misconfigures a VLAN. Our local
> hospital has fallen over due to worm infestations a couple of times.

The analogy fails to hold up.  Zero-day viruses and malware on fully
patched Windows systems burn through the desktop/laptop populations of
many sites.  What is terrifying to me is that my government still
mandates/allows the use of systems which are easily compromised in its
most sensitive inner reaches.  Specifically in the military and related
areas.  I don't know details, only heard faint mutterings online, but
something like this appears to have knocked some portion of government
computers in a highly sensitive area offline for several days very recently.

As indicated before, security is not a product (e.g. an updated patch),
it is a process (minimizing the maximum damage).  If you act otherwise,
the zero-day viruses and malware are going to wreak havoc.  Likewise, if
you believe (mistakenly) that your systems are secure because you use
multifactor access control with long random passwords and secure ID
cards, you won't pay attention to some ... misfeatures that are being
exercised by people of nefarious intent.

If all I ever do is send random garbage to port 22 after doing the
handshaking, and eventually blow ssh out of the water, it really doesn't
matter if you have multifactor authentication running.  I would be in as
the user running the daemon.  Hence privilege separation.  Change the
code so that if there is a break-in, the maximum damage that can be done
is done as the sshd_daemon user.  Since the attacker is no longer the
root user, and is isolated in their own group, the damage they can do is
contained.  Minimizing the maximum damage.
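
A minimal sketch of the idea, in Python rather than the C that sshd is
written in: the risky parsing of untrusted bytes happens in a forked
child that gives up root before touching the data, so a break-in there
lands in an unprivileged account.  The uid/gid 65534 ("nobody" on many
Linux systems), parse_untrusted, and handle_connection are hypothetical
stand-ins for illustration, not OpenSSH's code.  POSIX only.

```python
import os

# Assumption: 65534 is the unprivileged "nobody" uid/gid, as on many Linux systems.
UNPRIV_UID = 65534
UNPRIV_GID = 65534


def drop_privileges(uid=UNPRIV_UID, gid=UNPRIV_GID):
    """Give up root for good, if we have it. Returns True if privileges were dropped."""
    if os.getuid() != 0:
        return False          # already unprivileged, nothing to drop
    os.setgroups([])          # clear supplementary groups first
    os.setgid(gid)            # then the group, while we are still allowed to
    os.setuid(uid)            # finally the user; root cannot be regained
    return True


def parse_untrusted(data):
    """Hypothetical stand-in for the fragile code that reads network input."""
    if not data.startswith(b"HELLO "):
        raise ValueError("malformed greeting")
    return len(data) - len(b"HELLO ")


def handle_connection(data):
    """Parse untrusted bytes in a throwaway child; return its result or None."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:                              # child: the expendable part
        status = 1
        try:
            os.close(read_fd)
            drop_privileges()
            os.write(write_fd, str(parse_untrusted(data)).encode())
            status = 0
        except Exception:
            pass                              # any failure becomes exit code 1
        finally:
            os._exit(status)                  # never fall back into the parent's code
    os.close(write_fd)                        # parent: the monitor
    with os.fdopen(read_fd, "rb") as pipe:
        payload = pipe.read()
    _, wait_status = os.waitpid(pid, 0)
    if os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0:
        return int(payload)
    return None
```

With this split, handle_connection(b"HELLO world") returns 5, while
garbage input only takes down the child: the monitor sees None and
carries on.  If the child cannot drop privileges it exits with a failure
rather than parsing as root.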

> 
>> It all boils down to a CBA (as everything does).  Upgrading carries
>> risk, no matter who does it, and how carefully things are packaged.  The
>> CBA equation should look something like this:
>>
>> 	value_of_upgrade = positive_benefits_of_upgrade -
>> 			   potential_risks_of_upgrade
> 
> With the security benefits being really hard to quantify. 

Not really.  If you have a huge gaping hole that needs patching (OpenSSL
off-by-one or weakness), the benefits are easy to quantify.  Again, it
is a process.  You test the upgrade, and if it breaks nothing else, you
do it.  In fact, this suggests (usually) doing upgrades in smaller
incremental bits
rather than large complex bits.  A huge bolus of patches and fixes often
has a few new (mis)features (I could name a company here, but they know
who they are) which are unfortunately potentially exploitable.

To keep risks low, make as few changes as possible.  To keep benefits
high, update important threat facing things.  To keep risks lower, do
not introduce more changes than absolutely needed.  Patches should not
include new (mis)features.
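
The cost-benefit equation quoted above can be turned into a small
ranking exercise.  This is a sketch only: the Upgrade record, the
example entries, and every score in it are made-up numbers for
illustration, and in practice the hard part is estimating them.

```python
from dataclasses import dataclass


@dataclass
class Upgrade:
    name: str
    benefit: int   # hypothetical score for what the change fixes
    risk: int      # hypothetical score for what it might break


def value_of_upgrade(u):
    """value_of_upgrade = positive_benefits_of_upgrade - potential_risks_of_upgrade"""
    return u.benefit - u.risk


def worth_applying(candidates):
    """Names of upgrades with positive value, most valuable first."""
    keep = [u for u in candidates if value_of_upgrade(u) > 0]
    keep.sort(key=value_of_upgrade, reverse=True)
    return [u.name for u in keep]


# Made-up examples: a small targeted fix versus a large bundle of changes.
candidates = [
    Upgrade("ssl-fix-on-login-node", benefit=9, risk=2),
    Upgrade("ssl-fix-on-compute-node", benefit=3, risk=2),
    Upgrade("full-distro-upgrade-on-compute-node", benefit=4, risk=8),
]
print(worth_applying(candidates))
```

With these invented scores the targeted fix on the threat-facing node
ranks first, the same fix on an interior node barely clears zero, and
the large bundle of changes comes out negative.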

>> You have a perfectly valid reason to upgrade threat facing nodes.  Keep
>> them as minimal and as up-to-date as possible.  The non-threat facing
>> nodes, this makes far less sense.  If you are doing single factor
>> authentication, and have enabled passwordless access within the cluster:
>>  ssh keys or certificates or ssh-agent based, once a machine that holds
>> these has been compromised, the game is over.
> 
> I don't get this. What's the point of having a "secure" frontend if
> the systems behind it are insecure? OK, there's one big point -
> hopefully you can buy some time - but other than that? 

It's the model of how you use the machine.  If you lock all the doors
tight with impenetrable seals, and the attacker goes through the weaker
windows, those impenetrable seals haven't done much for you.

The idea is you minimize the exposed footprint of the machine to threat
facing access.  This is why lots of the secure sites are disabling USB
ports on the motherboards (but mistakenly then running systems which can
install keyloggers and other malware ... ).  If the USB does not
electrically work, it is not a possible attack vector.

You can always take the approach of compartmentalization; locking
*everything* down.  Put those impenetrable seals up.  Have one port
exposed.  Allow no back channels whatsoever.  No shared storage.  No
single factor authentication.  This is not a cluster computing model
that I have heard of.  Would break too many things.  Yeah yeah, grid
this and that.

> The goal is to be able to contain user level intrusions. If you can do

I disagree.  I think the goal is to minimize the maximum damage.  I do
not think it is possible to completely contain a smart and resourceful
attacker with multiple attack vectors.  I know lots of security folks
who used to think that their firewalls could, and then watched said
resourceful hackers go through them.

> this, the game *isn't* over even if you have an intrusion spreading to
> a cluster machine. A user level intrusion isn't too hard to deal with,
> but a cluster-wide root intrusion... isn't much fun. Sure, you can
> probably reinstall the entire cluster in an hour. To a vulnerable
> state. Hooray.

Again, I disagree.  I do not believe patching is a magic solution.  A
well designed security model that, in the event that the assumptions of
the model break down (say all the doors and ports suddenly, magically
spring open, because the attacker muttered the appropriate phrase into
the wire), still limits the damage that can be done, might be an
approach worth considering.
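
One way to read "minimize the maximum damage" is as a worst-case
comparison between designs.  The sketch below assumes you can put a
rough damage number on each failure scenario; the two designs, the
scenarios, and all the numbers are invented for illustration.

```python
# Hypothetical damage estimates (arbitrary units) per design and failure scenario.
damage = {
    "patch-everything-no-containment": {
        "stolen user password": 3,
        "zero-day in exposed daemon": 10,   # daemon runs as root
    },
    "minimal-front-end-with-containment": {
        "stolen user password": 3,
        "zero-day in exposed daemon": 4,    # attacker lands in an isolated account
    },
}


def worst_case(scenarios):
    """Largest damage across all scenarios for one design."""
    return max(scenarios.values())


def minimax_design(table):
    """Design whose worst case is smallest."""
    return min(table, key=lambda name: worst_case(table[name]))


print(minimax_design(damage))
```

The comparison deliberately ignores how likely each scenario is.  It
only asks what happens when the assumptions fail, which is the question
the model is supposed to answer.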

Again, I watch (in horror) as military organizations, with some really
nice rules and procedures behind them designed to contain and control
these bits, proceed to use systems that are known to be insecure by
design.  If your system can be keylogged, it should never ever be on a
network, anywhere.  Or change your system so that keylogging is
harder/impossible.  Security is a process.  Like never downloading
important information to a laptop only to let it be stolen / lost later
on.  The current fad is encrypting the disk, and this might prevent some
attacks, or slow the rate of information release.  Or not.

The point is that if the maximum damage an attacker can do is contained
or minimized, then you can gather valuable threat information from their
attack.  Part of the rationale for honeypots is putting systems without
anything important out there, in order to observe attacks, find
vulnerabilities, and learn how to defend against them.  This is done by
not co-locating the honeypot on a useful system, and by containing what
is in there and what it has access to.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615