[Beowulf] SGI to offer Windows on clusters

Fri Apr 13 05:19:34 PDT 2007

rgb,

I was thinking more along the lines of a cluster - not a grid. This makes the
problem a bit easier, but as always the devil is in the details.

My thought was to have some sort of file system in the cluster (pick your
poison) with the ability to mount it on the user's desktop (NFS, CIFS, or
whatever). Then the user would create their input files and put them on the
cluster storage (for the Windows users they just drag the directory to a new
drive). Then they enter a URL in the browser, log in, tell the browser which
application they are using, where the files are located, how many processors
they want, and off the job goes. They could also use the same webpage to
track the jobs - how long each has been running, etc. You could even put
diagnostics in there - if the nodes swap then tell the user to try more processors
next time to speed up the job. You can also report back statistics if you want
to - how long the job took, how much IO was used, how much disk space was
used, etc. This might not tell the novice user too much, but it might tell the
admin something if there is a problem or it might tell the user or admin that in the
future they might need a system with more IO, or a higher performing
interconnect.
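
As a rough illustration of the workflow above, here is a minimal Python sketch
of the data such a web page might collect and the hints it might report back.
Everything in it is an assumption made for illustration: the names JobRequest,
JobReport, validate and advise, the fields, and the idea that the scheduler can
report whether nodes swapped are all hypothetical and not tied to any
particular scheduler or web framework.

```python
from dataclasses import dataclass


@dataclass
class JobRequest:
    """What the user enters in the (hypothetical) web form."""
    application: str   # which installed application to run
    input_dir: str     # directory on the shared cluster file system
    processors: int    # how many processors the user asked for


@dataclass
class JobReport:
    """Statistics collected after the job ends (hypothetical fields)."""
    wall_seconds: float   # how long the job took
    io_bytes: int         # how much IO was used
    disk_bytes: int       # how much disk space was used
    swapped: bool         # True if any node swapped during the run


def validate(req, installed, max_procs):
    """Return a list of problems; an empty list means the job can be queued."""
    problems = []
    if req.application not in installed:
        problems.append(f"unknown application: {req.application}")
    if not req.input_dir.startswith("/"):
        problems.append("input directory must be an absolute path")
    if not 1 <= req.processors <= max_procs:
        problems.append(f"processors must be between 1 and {max_procs}")
    return problems


def advise(req, rep):
    """Turn raw statistics into short messages for the user or admin."""
    hints = [
        f"job ran for {rep.wall_seconds:.0f} s on {req.processors} processors"
    ]
    if rep.swapped:
        # The diagnostic suggested in the text: swapping means too little
        # memory per node, so spreading the job out may speed it up.
        hints.append("nodes swapped; try more processors next time")
    return hints
```

A web front end would call validate() before handing the job to whatever batch
scheduler sits behind it, and advise() once the job has finished; how the
statistics are actually gathered depends on the scheduler and is left out here.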

The reason that I like the browser idea is that virtually every potential cluster
user has used a browser before. In most cases they have even used webmail
or some sort of web site that requires logging in and entering data into a field.
It's also very portable - it's web based.

Maybe it's just me, but I find this idea much better than just hitting a button
that says "Run". This at least forces the user to think a bit about what they are
doing instead of playing "Monkey hit the Button." There are examples of times
when people should have taken a few moments to think about what they were
doing instead of blindly just doing it. Plus, now it's not dependent on the ISVs
putting a "Run" button in their GUIs (I'm amazed at the reluctance of ISVs to
change virtually anything in their code base - but that's another rant :)  ).

Perhaps I'm not thinking as broadly as Rich. But I see a web-based solution as a
better idea than asking (or forcing) ISVs to put some new code in their applications
to run on a cluster (BTW - in my experience some customers use the GUIs that
come with ISV codes and some don't).

Jeff

> On Thu, 12 Apr 2007, laytonjb at charter.net wrote:
> 
> > I really think the web interface is the way to go. This way you can submit jobs from
> > any machine that has a browser (Linux, Windows, Mac, etc.).
> 
> Isn't that what gridware basically does already?  Doesn't SGE provide a
> web interface to resources (a cluster) it controls?  Isn't that a
> significant part of the point of the Globus project?  The ATLAS grid
> (IIRC) uses a grid interface, more or less, to provide just this sort of
> layer isolation between cluster/grid resource and the user.
> 
> There are problems with this, of course.  It is wonderful if the
> grid/cluster already has a canned package installed, so that what the
> user "submits" is a parametric dataset that tells the cluster how and
> what to run with that package.  BLAST etc. work fine this way and there
> exist cluster/grids architected just this way for this purpose, from
> what I can tell via google.  If you want to run YOUR source code on the
> cluster from a different OS, however, well, that's a problem isn't it?
> 
> I think that there have been efforts to resolve even this -- ways of
> submitting your source (or a functional binary) with build (or run)
> instructions in a "package" -- I vaguely remember that ATLAS attempts to
> implement a little horror called "pacman" for this purpose.  I leave to
> your imaginations the awesome mess of dealing with library requirements,
> build incompatibilities, mistaken assumptions, and worse across
> architectures especially ones likely for people who write in MS C++ (or
> a C downshift thereof) and expect the source to "just run" when
> recompiled on a linux box.
> 
> Practically speaking, for source code based applications if the user has
> a linux box (or even a canned vmware linux development environment they
> can run as a windows appliance -- and there are many of them prebuilt
> and available for free so this is no longer that crazy a solution on a
> moderately powerful windows workstation) and sufficient linux
> expertise to work through builds thereupon, they can develop binaries or
> build packages that they can submit to a cluster via a web interface
> that hides all cluster detail.  If not, then not.
> 
> Joe of course is building specific purpose clusters for many of his
> clients and hence can successfully implement either canned software
> solutions OR can manage the porting, building, preinstallation of the
> client's software so that they can use it via a web-appliance interface.
> Basically they purchase his expertise to do the code migration -- which
> is again fine if the source is mature and unlikely to need a lot of
> real-time tweaking and if they mostly want an appliance with which to
> process a very large data space or parametric space a little at a time
> (so "jobs" are parametric descriptions used to start up a task).
> 
> There are various other details associated with gridware and cluster
> usage of this sort that make the idea "good" or "bad" per application.
> If the application is bottlenecked by data access -- it processes huge
> files, basically -- one can spend a lot of time loading data onto the
> cluster via e.g. the web interface compared to a little time running the
> application on the data, something that can perhaps be done more
> smoothly and faster with a native shared disk implementation instead of
> double hits on native disk on both ends plus a (probably slow) network
> transfer.  Accessing other resources -- GUI access to the program being
> run, for example -- similarly depends strongly on having the right hooks
> on both ends.
> 
>     rgb
> 
> >
> > Jeff
> >
> >> Here is a proactive suggestion for keeping open source
> >> ahead of Microsoft CCS:
> >> 1. I think CCS will appeal to small shops with no prior cluster
> >>     and no admin capability beyond a part time windows person.
> >> 2. such customers are the volume seats for a range of desktop
> >>     CAD/CAE tools.
> >> 3. Such ISVs will see potential of license growth, and will
> >>     likely choose to tie-in to the Microsoft message of ease-of-use.
> >>     A big feature here, in my view, is the one-button-job-launch.
> >>
> >> This means, for Linux to have a position as the backend
> >> compute cluster, we must have this one button job launch
> >> capability.  A Windows library must be available to
> >> the ISV, to provide a job submission API  to the batch
> >> scheduler.  With such a feature, the ISVs can be
> >> persuaded to incorporate it.
> >>
> >> Ideally the job submission API is a kind of standard, so
> >> the ISV does not see duplicate work versus the batch scheduler
> >> used.
> >>
> >> So,
> >> a) we need a job submission API, and
> >> b) we need the Windows library added to Linux batch schedulers.
> >>     (I'm not saying the scheduler runs on Windows, we just need
> >>     the submission/retrieve portion).
> >>
> >> Does such exist already?
> >>
> >> Thanks, Rich
> >> Rich Altmaier, SGI
> >> _______________________________________________
> >> Beowulf mailing list, Beowulf at beowulf.org
> >> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> >
> >
> 
> -- 
> Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
> 
>