[Beowulf] Why I want a microsoft cluster...

Jim Lux James.P.Lux at jpl.nasa.gov
Wed Nov 23 14:40:02 PST 2005


At 01:30 PM 11/23/2005, Joe Landman wrote:


>Jim Lux wrote:
>>So here we go with some devil's advocacy...
>
>:)
>
>>  From the user viewpoint, in a largish shop, but with a single user in mind.
>>The scenario is that I want to run some sort of analysis tool that is
>>computationally intensive enough to require more crunch than I can get 
>>with a single desktop.  Applications that spring to mind are various
>
>[...]
>
>fine so far
>
>>So, whatever I do, my output is eventually going to wind up pasted or 
>>copied into some MS product, AND, it has to be "clean" enough that when 
>>the admin for the manager 3 levels above me tries to resize the images 
>>that I've cut and pasted, it doesn't choke (that means using WMF or EMF 
>>for graphics, for instance).
>
>Hmmm... I typically try to use a more standard image format (png, jpg, 
>tiff, ...).  I have found these work better than others in my apps. Most 
>everything in the windows world happily deals with those.  I have had lots 
>of problems with WMF import.  Don't know why.


The formats you mention are bitmap images.  The problem is with rescaling. 
Say you have a nifty graph generated from something like Matlab.  If you 
rescale the bitmap image, the text sometimes becomes unreadable, 
especially if there's more than one rescale involved and the rescaling 
actually resampled the image (say, to save file space).  If, on the 
other hand, you have the image in a vector format, then when you rescale, the 
rendering application (ppt, acrobat, word) can render the text as text at 
whatever scale it needs.

This is particularly pernicious for documents that get viewed with a 
projector, and then get zoomed to look at the details.
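
To make the bitmap-vs-vector point concrete, here's a minimal sketch, with 
matplotlib standing in for "something like Matlab" (the file names, and the 
mention of an external SVG-to-EMF converter such as Inkscape, are just 
illustrative assumptions):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.set_xlabel("time (s)")
ax.set_ylabel("amplitude")

# Raster: the axis labels are baked into pixels, so every rescale in ppt or
# Word resamples them, and small text turns to mush.
fig.savefig("result.png", dpi=100)

# Vector: text and lines stay as drawing commands, so the importing
# application can re-render them cleanly at any zoom level.  (SVG here;
# getting to EMF would take an external converter such as Inkscape.)
fig.savefig("result.svg")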


>>What does this sort of environment mean?  It means that a strategy where 
>>I run my analysis tool on a Linux box and then try to export the data 
>>back to my Windows box for doing the reports is a royal pain.  It's
>
>Hmmm.... If this is a royal pain then (and please don't take this wrong) 
>your linux/windows system is set up wrong.  To "export"/"import" I open a 
>file on our SAMBA share, that happens to house the home directories from 
>the cluster.   If you have to do anything else then something somewhere is 
>broken, and yes, it is a pain.

Yes, assuming your netops folks allow you to have a "public" sharepoint 
that is visible from your desktop machine, something that requires a fair 
amount of institutional negotiation in an MS-centric shop.  Plenty of FUD 
will be raised about whether that's secure, etc.


>We set up the transparent access for our customers, and I had assumed that 
>all cluster vendors did.  Maybe I am wrong.

Many "locked down" shops want a fair amount of control over the software 
that's hanging on their internal network, and if your particular cluster 
distro isn't one they "trust", then it's a problem.

Clearly, if a vendor is providing a turnkey system to someone, then 
they've taken on the responsibility (and expense) of complying with all the 
rules for the customer.  That can be a major project in itself (it can also 
be easy... it depends a lot on the customer).

>>worse than sneakernet.  Sure, I can SSH into the Linux box from my 
>>windows box, and even fire up a Xserver on the Windows box, but things 
>>like cutting and pasting just don't work very seamlessly, and it seems 
>>that Linux application creators consider generating Windows compatible 
>>file formats anathema (leaving aside the file format aspects..) because 
>>they might be considered "pandering to the dark side".  Folks.. 
>>uncompressed TIFF images don't hack it as an interchange medium.
>
>What?  16MB TIFF for a 50k JPEG?  I usually use JPEG or png.  If I need to 
>use a windows format I do.  Linux can read/write all of them (and 
>interconvert) using the right tools, so this is rarely an issue.

But not vector formats as easily.  And, yes, you can interconvert on the 
Linux side to get something useful, but you've also lost a lot of the 
"seamless user interaction" and it starts to look more like a batch-oriented 
sequence of operations: I set up my cluster job, run it, get the output 
file, then run this script to convert it to a format that's Windows 
pastable, then copy the file to my Windows box, then do "Insert File..." 
in ppt.  A lot more steps than "click on graph, Ctrl-C (or Ctrl-Ins), 
click on ppt slide, Shift-Ins".
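
Spelled out, the Linux-side half of that dance ends up as a little script 
along these lines (the paths, the share name, and the choice of pstoedit and 
smbclient are all hypothetical, just to show how many steps are involved):

import subprocess

job_output = "/home/jlux/runs/antenna_gain.eps"   # produced by the cluster job
converted  = "/home/jlux/runs/antenna_gain.emf"   # something ppt will resize cleanly

# Step 1: convert the plot to a Windows-friendly vector format
# (pstoedit with its emf driver, assuming it's installed).
subprocess.run(["pstoedit", "-f", "emf", job_output, converted], check=True)

# Step 2: push the file somewhere the Windows desktop can see it, e.g. a
# SAMBA share (share name made up; authentication details omitted).
subprocess.run(
    ["smbclient", "//deskpc/jlux", "-c", f"put {converted} antenna_gain.emf"],
    check=True,
)

# ...and after all that it's still "Insert -> Picture -> From File" in ppt,
# instead of a plain copy and paste.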

In the Linux cluster but Windows desktop scenario, you're running on two 
different machines, with two different user interfaces, and you have a 
cognitive "context switch" every time you go back and forth.  (Just the 
"single click" in KDE vs. "double click" in Windows distinction is bad 
enough... how many times have I opened multiple copies of the editor on the 
headnode?)




>As for "pandering", well, that may be an option for some, but my customers 
>and users need to get work done.  The issue is how do you get it done in 
>the least amount of time and effort, and the greatest impact.

And my contention is that in some circumstances, that might be with a 
cluster using Windows rather than Linux.


>>And no, Open Office is not fully interoperable with MS Office.  There's 
>>always little hiccups with things that you really, really need.
>
>charts (cough cough)
>
>
>>(hmm.. equation editor?  footnotes?  change tracking? Outline mode?)  The
>
>Actually all of these work really well in 2.0.  Its the charts that drive 
>me batty.
>
>>typical scenario is that you're one of half a dozen folks working on a 
>>document, and you all pass it back and forth and make changes, and for 
>>all practical purposes, we ALL have to be using the same tools (even 
>>going back and forth between Mac and PC is problematic.. Those "big red 
>>X" things that appear in your ppt slides).
>
>Hmmm.  Been doing this without problem for a while for ppts going to 
>certain groups that like good ppts.  For the most part we haven't run into 
>problems apart from font mapping, and I am admittedly not interested in 
>fixing it at this moment.



>>So, whatever applications I'm using on my cluster have to seamlessly 
>>integrate with the tools the "rest of the business world" are using, 
>>whether I like or not.
>
>Also true, quite true of my customers.  They are looking for minimum time 
>to insight (hey look, its a marketing phrase).  Fussing around with 
>transport and conversion of data is not an option.  So they don't have to.

Presumably, though, your customers are Linux-centric?  That is, they are 
doing their analysis and reporting in the Linux environment, not the 
Windows environment?



>They can mount their home directories on the cluster as yet another drive 
>letter.  They can and do use it as yet another resource.
>
>>Now, let's consider another practical detail..  I've got my cluster 
>>running, and I'm cracking through my work.  Something breaks (maybe a PC 
>>rolls over and dies).  I call the help desk.  The vast majority of 
>>problems are something simple (whether the cluster is Linux or Windows).



>That's why there are support experts out there (ahem) selling their services.


Indeed... and an essential resource they are.  However, if you're a naive 
buyer of cluster computing, you might think that it's easier to just buy 
that MS cluster and have it supported by the in-house MS support 
folks.  Fear/Uncertainty/Doubt are powerful, powerful forces when it comes 
to support.  There's a big psychological difference between having your 
boss call the boss of the support division (keeping it within the company) 
and calling your outside support vendor.

Practically speaking, the actual support might be identical, but what 
matters is the thought: "I can get one of those PC support guys over here in 
an hour to fix my dead node" without spending a fortune (because you've 
already spent the money for them to be always there for all those thousands 
of desktops).






>>The odds of getting someone to fix my broken cluster, today or tomorrow, 
>>are much higher if it's Windows based, just because there's more folks 
>>around who are capable of doing it. If that 1 Linux cluster weenie 
>>happens to be on vacation, I'm dead... the odds of all 10 Windows cluster 
>>weenies being on vacation simultaneously is much lower.
>
>Hmmmm.  Again, thats why there are external experts who do this stuff. As 
>for the 50-100, I think the number is closer to 20-50 desktops per 
>admin.  I have seen 4000 node clusters supported by 2 people full 
>time.    I am not going to comment on the other aspects of this.

Typical desktop support costs are around $150/month for a large shop 
(paying for the help desk and the roving techs).  That's $1,800/yr, call it 
roughly $2K/yr per seat.  Figuring a support person costs $200K/yr fully 
loaded, that works out to about 100 systems per person.  But, as you say, it 
could easily be a factor of 5 either way, and I think a very good case can 
be made that the boxes/body ratio for a cluster (which, by definition, is 
all identical boxes configured the same) might be higher.
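
A back-of-the-envelope check on that, using nothing beyond the rough figures 
already quoted above:

per_seat_per_year = 150 * 12           # ~$1,800/yr per desktop; call it $2K
loaded_admin_cost = 200000             # one support person, $/yr
print(loaded_admin_cost / per_seat_per_year)   # ~111, i.e. order of 100 boxes/body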

OTOH, a lot of big corporate desktop management systems do the "You can 
have system configuration A, B, or C, and no other" thing, so they are 
pretty homogeneous too.



>>Now let's talk security.  My speculative IT organization supports 10,000 
>>windows desktops, and has fairly systematic and rigorous ways to deal 
>>with the patches that come out once a month, as well as hotfixes for 
>>vulnerabilities that get discovered.  My Windows based cluster isn't 
>>going to seem scary to the IT security folks.. it's just another 100 
>>computers and represents an infinitesimal increase in the overall 
>>workload and a small increase in the complexity of their workload. The 
>>incremental cost to bring my cluster into the corporate fold, from a 
>>security standpoint, is small.
>
>Uh.... I am going to disagree with you.  If you saw the firedrills these 
>folks go through when the patch bolus hits... it breaks standard desktops 
>and servers, and they need to watch everything very carefully.  I know of 
>at least one fortune 500 IT organization that lives in fear and 
>trepidation of the next patch bolus.   They are serviced by some other 
>fortune 500 organization for most of their IT stuff.  I will not comment 
>on the quality of that support.

Yes, huge firedrills on Patch Tuesday... BUT the MS cluster won't incur 
*additional* hassles, because it's just the same as everything else.  (Well, 
it might... the applications are different.)

>Look carefully at what the CTC has to go through with their systems.  If 
>you are running 1000 copies of Norton on your disks, with each one loaded 
>up with a personal fire wall, anti spyware and virus ... do you really 
>have a cluster?  I don't think so.

Sure it is... why wouldn't it be a cluster?  You've picked up some 
inefficiencies by having all those antivirus programs, etc. running where 
they're not really needed, but you're still doing cluster work.  Mostly, 
all that dreck fills up disk space and doesn't hugely affect computational 
performance, except once a night when SAV phones home for the latest virus 
pattern files, etc.


>>Say I wanted to install a Linux cluster.  Ooops.. they're not quite as 
>>familiar with that.
>
>And this is why there is a market for this expertise.

But then, you have two sources of expense:  The outside expertise PLUS the 
inside people who have to deal with them.  Yes, the inside expense is less 
than it would have been before, but it might well be that the total cost is 
higher.


>>They don't have all the patch rollout stuff, they don't have a patch 
>>validation methodology, etc.  Sure, there's all kinds of patch management 
>>stuff for Linux, in a bewildering variety of options, but now we've got 
>>to have a Linux security expert, in addition
>
>... all of this outsourceable for a tiny fraction of what they pay for 
>their required in house windows staff ...

Is it really that much cheaper?  I suspect not.  Large Windows shops aren't 
all that inefficient... they can't be.  It's not going to be, say, 10 times 
more expensive to manage Windows boxes than Linux boxes in an 
apples-to-apples comparison.  And there's huge commercial pressure to come up 
with ways to reduce the management costs of Windows boxes (hence products 
like PatchLink).

And once you add in the overhead to make the in-house compliance folks feel 
warm and fuzzy, outsourcing might not be as cost-effective.


>>to the cadre of MS security folks we already have. You mean your cluster 
>>uses a different distro than the other iconoclastic Linux desktop users 
>>have? You recompiled the kernel to get the latest whizbang high 
>>performance network support?
>
>Hmmmm.....
>
>>With MS, the choice is easy.. use what you're already using for the rest 
>>of the company (SMS probably).  Kernel or distro compatibility isn't an 
>>issue.. you use what you're given and suck up the inefficiencies and live 
>>with it.  If it's a performance dog, you go make the pitch to buy more nodes.
>
>Actually it is a huge issue.  Some codes are just not supported under some 
>patch levels (SP2) because it breaks it.  So you have a choice of running 
>your mission critical code or running at the accepted support level.

But this is essentially the same choice... either your software works on 
your configuration (be it Linux distro or Windows build) or it doesn't.  In 
the Linux case, you've at least got the option of spending serious time 
convincing the powers that be that you can make it work and still be in 
conformance with the institutional computing rules.  In the Windows case, 
you're just plain out of luck.

Of course, in the Windows world, most shops have dealt with the "how to 
support multiple SP levels of Windows" problem, and of late this problem is 
much, much reduced from the days of Win95, Win98, WinME, and NT 4.0.  I 
haven't had anything break with Win2K or WinXP patches in a long time, with 
the exception of a weird interaction between the OpenGL drivers in Matlab 
R13 and Win2K.

As far as "MS only supports 2 versions back" goes... that comes with the 
territory in Windows applications.  If you're selling applications for 
Windows-based clusters, you'd have to factor that into your support 
strategy, same as if you're selling applications for any other flavor of 
Windows.


>>So, all in all, there's a real case to be made for a Windows based 
>>cluster, even if the raw performance takes a big hit.  In terms of 
>>"getting the work done" for a fixed dollar allocation, you might be 
>>better off buying more nodes to make up for the performance than paying 
>>for all the  extra stuff that corporate IT is going to require.
>
>There is a case, but I think it is different than you argued.  I have been 
>thinking this through for a while.  A large cluster is an appliance, as is 
>a small cluster, as is a router.  It needs to be managed as one, and 
>appear to drop onto the net as an appliance.  All interaction from the 
>windows centric folk need to be as an appliance.  This is the windows model.

I think so... and you've got to "look" appropriately "trustworthy" to the 
Windows-centric world, just like that network-connected HP printer down the 
hall.  What you can't do is "look like a computer", because that will scare 
them.




>It is very hard to get a windows person to change to linux, unless they 
>need to.  So you lower the barriers by making this thing appear as nice 
>and friendly as possible.  Wrap it with some nice web tools  for job 
>submission (we have some being deployed at a number of customer 
>sites).  Make data transfer drag and drop via explorer (which we 
>do).  Make all aspects of this be as easy as possible.
>
>It can be done, just requires effort and focus upon integration.
>
>There are lots of cluster hardware vendors out there who will happily sell 
>linux hardware (or windows for a premium).  There are few who actually 
>know how to make this all work seamlessly.   If you call it a cluster 
>appliance and build it as such, and make it appear to be just a little web 
>appliance with some disk which happens to calculate for people, well, the 
>resistance level is very low.


Indeed... I agree... the problem is that someone has to pay for building 
that "web appliance" and I'm not entirely sure that the market for clusters 
is big enough to allow that (substantial) expense to be spread thin enough.

My Linksys WRT54G has Linux inside, as does my Moxi digital cable box, but 
the vast majority of users of either are not aware of it.  They're also 
being produced in million-unit quantities, so the considerable work required 
to "hide" Linux (or, for some other devices, WinCE) and make it truly 
"appliance-like" is small in a per-unit sense.

I think there IS a market for an appliance with some applications tailored 
to use it.  Say you have someone doing a lot of work with NASTRAN or HFSS 
(both are computationally intensive FEM programs).  You could give them 
their familiar Windows executable that just happens to feed work off to a 
network-attached computational engine.  I've contemplated doing something 
like this with a program called 4NEC2, which wraps a nice Windows front end 
around a compiled FORTRAN backend program (NEC, the Numerical 
Electromagnetics Code).  NEC runs just fine on Linux (heck, there are even 
cluster versions of it) and the "interface" is just text files: 80-column 
card images in, 132-character printer output back.
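
The shim for that could be tiny.  A rough sketch of the idea (the hostname, 
file names, the assumption of plain ssh/scp access, and the "nec2 job.nec 
job.out" invocation are all made up for illustration, not how 4NEC2 actually 
does it):

import subprocess

REMOTE = "cluster-head.example.com"

def run_nec_remotely(input_deck, output_listing):
    # Ship the 80-column card-image input deck to the compute node.
    subprocess.run(["scp", input_deck, f"{REMOTE}:job.nec"], check=True)

    # Run the FORTRAN NEC engine there; assume it reads job.nec, writes job.out.
    subprocess.run(["ssh", REMOTE, "nec2 job.nec job.out"], check=True)

    # Bring the printer-format listing back for the Windows GUI to display.
    subprocess.run(["scp", f"{REMOTE}:job.out", output_listing], check=True)

# The front end never needs to know the run didn't happen locally:
# run_nec_remotely("antenna.nec", "antenna.out")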


Jim 




