[Beowulf] openMosix ending

Robert G. Brown rgb at phy.duke.edu
Mon Jul 16 13:12:45 PDT 2007


On Mon, 16 Jul 2007, Geoff Galitz wrote:

> The lack of kernel-supported checkpointing capabilities in the linux
> kernel is something that has baffled me for a while.  I wonder if it
> was ever submitted and then rejected?  It seems a natural fit for many
> organizations.  Are there hardware limitations in the x86 world?

Probably an efficiency issue.  Originally the linux kernel was lean and
mean, and thereby fast on much slower hardware.  Over time, I think that
the speed of the hardware has increased along Moore's Law at a rate that
has far exceeded the growth in kernel complexity, to the point where the
overhead of checkpointing would now be down in the noise.  But back in
the 2.0 or 2.2 kernels for sure (which lacked symmetric interrupt
processing, had trouble with spinlocks, struggled with various chipsets,
and were missing or late with various key hardware drivers) it would
have been a major resource consumer.  128 MB was a LOT of memory on the
machines that ran 2.0; 1.x kernels ran on anything from 4 MB on up.  My
first stable linux desktop at home was "fat" at 8 MB because the kernel
was just getting to where it wouldn't quite install happily on a 4 MB
system, and I managed to bump it to 16, where it ran like a dream.

Now the issue is probably timing.  I suspect that we're about to undergo
a bit of a paradigm-shift revolution over the next three years, one that
has already begun.  Investing effort into certain areas of kernel
development until the shape of things becomes clear could be a waste of
time.

> Modern x86 virtualization is great for the most part, but moving around
> entire VM images rather than a group of threads seems a little... kludgy.
> Mind you, I'm a big proponent of virtualization due to the positives it
> provides.

It is totally kludgy, actually, and it makes doing things like backing
up hot, running databases (even on VMs) a total pain, even though it is
a LOVELY idea in a lot of ways.  So there is money in this -- DBs are
business, not HPC, and ultimately VMs need to be able to snapshot a
running system very quickly, coordinated with a DB lock-and-flush, so
that people can e.g. back up running DBs hourly without interrupting
services for more than a second an hour or thereabouts.  Some volume
managers (e.g. LVM) provide a way of doing the snapshot quickly, but one
tends to have to stop, lock, flush, snap, and resume by hand in some
synchronized way, which doesn't work for all applications.

The below is highly speculative.  Be warned.  But I think that we will
soon see:

   a) Hardware support descending into the BIOS level.  Flash memory is
now down to $20/GB at full retail, probably half that wholesale or even
less to a chip/motherboard mfr.  In two more years, we can assume that
4-8 GB flash will be this cheap.  Installing hardware drivers at the OS
level SHOULD be totally obsolete, and I believe that shortly it will be
obsolete.  If HAL moves down into the BIOS so that all/most devices on a
computer become virtual devices in the sense that they present a uniform
interface to the KERNEL, kernels will start looking like they do now
inside vmware VMs.  That is, really boring.  They'll see virtual network
devices (all of which use the same driver), virtual memory allocations,
virtual disk devices.  The real drivers will install into the BIOS and
should stop being OS specific at all -- they'll operate on a BIOS-loaded
microkernel that basically does nothing but HAL and device drivers.

   b) Consequently, we should start seeing machines that are basically
running VMware or Xen in the microkernel, and all operating systems will
basically run virtual.  This may cost a small performance hit at first,
but it will make up for it ten times over in ease of use and reduced
TCO, and computers have cycles to spare.  Burning 1% of a CPU for the
convenience of having an absolutely uniform environment presented to the
kernel is totally worth it (just as vmware is worth it to get it in
software, at a slightly higher cost in real dollars and efficiency,
already).

   c) This will (I think) ENABLE real cross-system thread migration,
checkpointing, and much more.  That was the point of my original post.
Currently kernels tend to be massively complex, largely because of the
hardware.  The scheduler is relatively simple.  Managing tasks (pids and
associated structs) hasn't changed all that much for years.  Memory and
CPU utilization have been efficient and fairly trouble free since
somewhere in 2.4.  Hardware has ALWAYS been the bugaboo -- vendors just
love to write proprietary drivers that require proprietary insertions
into the kernel with all sorts of associated state information and other
crap just to get things to where one can "read" or "write" to the device
safely and reasonably efficiently.

It has gotten even worse recently with vendors EITHER selling devices
with little internal logic so that the entire "driver" resides on the
computer (it is really a rather complex program, not just a hardware
driver, e.g. lexmark cheapie printers) OR selling devices without any
firmware so that the first thing a driver has to do is unpack and
download the firmware (e.g.  numerous wireless network cards).

VMs eliminate almost all of this.  One can clone a VM, move the clone to
another system entirely, and as far as the VM is concerned the hardware
hasn't changed, the memory allocation hasn't changed, the CPU hasn't
changed (in any way except possibly the irrelevant system clock).  So
finally it is WORTH it to write the code that will manage the virtual
file handle migration for specific virtual devices with known
properties.  Does it have to work with everybody's network device?  No,
only with the AMD PCnet-PCI II (Am79C970A), because that's what it looks like on
every system no matter what the real hardware.  And so on.
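
A toy sketch in Python of why this matters (the class and field names
are invented for illustration and are not any real hypervisor's API):
when every guest sees the same emulated NIC, checkpointing or migrating
"the network device" reduces to serializing one small, well-known
bundle of state.

    # virtnic.py -- toy illustration of checkpointing a uniform
    # virtual NIC.  Nothing here is a real hypervisor interface.
    import pickle

    class PCnetState(object):
        """Complete guest-visible state of the one emulated NIC."""
        def __init__(self, mac, csr, rx_ring, tx_ring):
            self.mac = mac          # guest-visible MAC address
            self.csr = csr          # control/status register contents
            self.rx_ring = rx_ring  # receive descriptor ring state
            self.tx_ring = tx_ring  # transmit descriptor ring state

    def checkpoint_nic(state, path):
        # ONE serializer suffices on every host, because the guest
        # never sees the real hardware underneath the emulated device.
        f = open(path, "wb")
        pickle.dump(state, f)
        f.close()

    def restore_nic(path):
        # Restoring on a different physical machine is safe: the
        # guest's driver still talks to the same virtual device.
        f = open(path, "rb")
        state = pickle.load(f)
        f.close()
        return state

Write the equivalent against box A's real e1000 and the saved state is
untransportable to box B's tg3; write it once against the emulated
device and migration becomes a straightforward save and restore.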

So I'm optimistic that whether or not HAL moves south, so to speak, VM
explosion will STILL take the guesswork out of task migration systems
and enable some pretty marvelous new "stuff".  It's amusing to note (on
THIS list, at least:-) that this revolution seems set to take place
right at the time that Microsoft has shot itself in the foot -- with an
80 mm cannon.  Instead of entering this new world on the back of XP
(where even I, Microsoft-basher extraordinaire, must admit that XP is by
far the least broken of the many versions of Windows ever), which had the
CAPABILITY of being cleaned up and made efficient in a VM universe,
they've introduced Vista.  Which is nightmarish by any standards.  On my
nice new dual core 2 GB laptop, Vista Home runs like -- what?  What is a
suitable metaphor for a system that can't even keep up with a moving
mouse?  Even after turning off "transparency" and other non-productive
fluff.

That takes talent.  I wouldn't have BELIEVED that somebody could slow
down a simple user interface on 3+ GHz of aggregate processing power
with 2 GB of memory to buffer pretty much everything.  Even my
microsoft-loving kids are trying to figure out how to install XP back
onto the dual core Athlon 64 X2 because it cannot actually PLAY something
like World of Warcraft -- it has a hard time with five-or-six-year-old
Diablo II!  The utterly obsolete single core 1.1 GHz Intel Celery it
replaced (on the other hand) could play games just fine under linux and
windows emulation, let alone under windows native.

Rumor has it that MS is running scared of virtualization -- they want to
own it or kill it.  Therefore they've altered licenses (as I understand
it) so it is a license violation to run Vista Home as a VM.  You have to
pony up at least for the Pro version.  Naturally, they want you to use
THEIR virtual machine manager.

Unfortunately, what does it have left to run a VM with?  If Vista is too
slow to use single-user as the toplevel (host) OS, where will it get the
cycles to run a VM containing anything at all?  They've gone ANTI-lean just
when lean would be a real advantage on server and desktop.  They are
counting on uber-core hardware getting them back to "acceptable" levels
of performance before the jaded and Microsoft-tolerant mass market
revolts, but I just don't see it.

Ultimately, one is trapped by one's program designs.  Ten or twelve
years ago, Sun discovered this when they released Solaris, rapidly
code-named Slow-aris by the SunOS-loving workstation enthusiasts of the
day.  It was all pretty and everything, and slow as molasses, with a
broken scheduler and worse.  They pulled this one RIGHT when linux was
in the ascendant, with the P6 able to give Sun's 4x-more-expensive
UltraSPARCs a run for their money even WITHOUT the handicap of
Slow-aris.  Ten years later, Sun has completely lost the mass market,
the workstation market, and they are heavily squeezed even in the server
room.

Microsoft elected years ago to go with a heavily integrated OS and GUI,
emulating the design of the Mac.  The Mac design proved utterly
unscalable as processors got ever faster, multitasking ever more
important, stability and ease of management ever more critical in home
and office and server room alike.  Macs now run Unix and are
surprisingly lean and functional in spite of maintaining a nice UI.

Microsoft missed that one.  They are still running ever more arcane
extensions of their heavily integrated OS/UI, with their Registry of
Evil, their built-in-nobody-knows-why crap, their lack of a sane non-GUI
interface.  They are drowning in their own design flaws.  XP was a step
ahead -- at least it works (or worked, as they clearly consider it
"history").  Vista has passed over a nonlinear performance hump into
catastrophe.  I suspect that to emerge from it, they'll have to start
thinking about rewriting Windows from scratch, altering the fundamental
design.  If they've really reached a scaling limit of the current
design, it will be a corporate disaster unparalleled in modern times.
Falling back to XP (however embarrassing, or more likely relabelling XP
"Vista II") would at most delay the inevitable.

We'll see...

     rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
