[Beowulf] High Performance for Large Database

Robert G. Brown rgb at phy.duke.edu
Wed Oct 27 08:08:04 PDT 2004


On Wed, 27 Oct 2004 hanzl at noel.feld.cvut.cz wrote:

(a bunch of stuff, leading to...)

> The only difference I can see is the application (which is not a CFD or
> galactic evolution or similar). From the point of view of
> interconnects, OS types, parallel libraries used, RAM, processors,
> cluster management etc. I see no reason why databases and beowulf
> could not overlap.

And I would agree, and even pointed out that there WERE areas of logical
overlap.  The problems being solved and bottlenecks involved are in many
cases the same.  However, by convention "beowulf" clusters per se and
MOST of the energy of this list is devoted to HPC -- numerical
computations and applications.  It is undeniable that numerical
applications exist that interact with data stores of many different
forms, including I'm sure databases.  Databases are also used to manage
clusters.  Grids, in particular, tend to integrate tightly with
databases to be able to manage distributed storage resources that aren't
necessarily viewable or accessible as "mounts" of an ordinary
filesystem.  When one has thousands of (potential) users and millions of
inodes spread out across hundreds of disks with a complicated set of
relationships regarding access to the HPC-generated data, a DB is needed
just to permit search and retrieval of your OWN results, let alone
somebody else's.

Nevertheless, databases per se are not numerical HPC, and a cluster
built to do SQL transactions on a collective shared database is not
properly called a "beowulf" cluster or even a more general HPC cluster
or grid.  That is one reason that they ARE rarely discussed on this
list.  In fact, most of the discussion that has occurred and that is in
the archives concerns why database server clusters aren't really HPC or
beowulf clusters, not how one might build a cluster-based database.  The
latter is more the purview of:

  http://www.linux-ha.org/

and its associated list; at least they address the data reliability,
failover, data store design, logical volume management aspects of shared
DB access.

Even this list doesn't actually address "cluster implementation of a
database server program" though, because that is actually a very narrow
topic. So narrow that it is arguably confined to particular database
servers, one at a time.

To put it another way, writing a SQL database server is a highly
nontrivial task, and good open source servers are rare.  Mysql, for
example, is common and open source (if semi-commercial), but for all
that there are absolute rants on the web disputing that mysql is a
high-quality, scalable DB.  (I'm not religiously involved in this
debate, BTW, so flames->/dev/null or find the referents with google and
flame them instead; I'm just pointing out that the debate exists;-).

Writing a PARALLEL SQL database server is even MORE nontrivial, and
while yes, some reasons for this are shared by the HPC community, the
bulk of them are related directly to locking and the file system and to
SQL itself.  Indeed, most are humble variants of the time-honored
problem of how to deal with race conditions and the like when I'm trying
to write to a particular record with a SQL statement on node A at the
same time you're trying to read from the record on node B, or worse yet,
when you're trying to write at the same time.  Most of the solutions to
this problem lead to godawful and rapid corruption of the record itself
if not the entire database.  Robust solutions tend to (necessarily)
serialize access, which defeats all trivial parallelizations.
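To make that concrete, here is a toy sketch in Python (purely illustrative -- no real DB engine is built this way) of why the robust solutions serialize: the read-modify-write cycle on a shared record has to sit inside a lock, so N concurrent writers make essentially no parallel progress on that record:

```python
import threading

# Toy record store: the read-modify-write cycle on a shared record must
# happen under a lock, or interleaved updates would silently be lost.
# The lock keeps the record correct -- and serializes every writer,
# which is exactly the "defeats trivial parallelization" problem.
record = {"balance": 0}
lock = threading.Lock()

def writer(n_updates):
    for _ in range(n_updates):
        with lock:                       # serialized critical section
            record["balance"] += 1

threads = [threading.Thread(target=writer, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert record["balance"] == 40_000       # no lost updates
```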

NONtrivial parallelizations are things like distributing the execution
of actual SQL search statements across a cluster.  Whether there is any
point in this depends strongly on the design of the database store
itself; if it is a single point of presence to the entire cluster, there
is an obvious serial bottleneck there that AGAIN defeats most
straightforward parallelizations (depending a bit on just how long a
node has to crunch a block of the DB compared to the time required to
read it in from the server).  It also depends strongly on how the DB
itself is organized, as the very concept of "block of the DB" may be
meaningless.
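For what it's worth, here is a toy scatter-gather sketch (again Python, again illustrative only) of the "distribute the search across blocks" idea. Note that it only works because I've pre-partitioned the table into blocks that can be scanned independently -- which is exactly the property a real relational store may not have:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy scatter-gather: the "table" is pre-partitioned into blocks that
# can be scanned independently, one block per "node"; each node
# evaluates the WHERE clause on its own block and the coordinator
# merges the partial result sets.  No predicate spans blocks.
table = [{"id": i, "value": i * i} for i in range(100)]
blocks = [table[i::4] for i in range(4)]          # striped partitioning

def scan(block):                                  # runs "on a node"
    return [row for row in block if row["value"] > 5000]

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(scan, blocks))

result = sorted(row["id"] for part in partials for row in part)
# result is [71, 72, ..., 99] (71*71 = 5041 is the first value > 5000)
```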

In fact, to make a really efficient parallel DB program, I believe that
you have to integrate a design from the datastore on up to avoid
serializing bottlenecks.  The actual DB has to be stored in a way that
can be read in units that can be independently processed.  It has to be
organized in such a way that the hashing and
information-theoretic-efficient parsing of the blocked information can
proceed efficiently on the nodes (not easy when there is record linkage
in a relational DB -- maybe not POSSIBLE in general in a relational DB).
The distributed tasks have to be rewritten from scratch by Very Smart
Humans to use parallelizable algorithms (that integrate with the
underlying file store and with the underlying DB organization).  These
algorithms are likely to be so specialized as to be patentable (and I'll
bet that e.g. Oracle owns a slew of patents on this very thing).
Finally the specter of locking looms over everything, threatening all of
your work unless you can arrange for record modification not to
serialize everything.  For read only access, life is probably livable if
not good.  RW access to a large relational DB to be distributed
across N nodes -- just plain ouch...
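One standard trick for the "read in units that can be independently processed" part is hash partitioning: a stable hash of the record key picks the owning node, so single-record operations touch exactly one node. A toy sketch (cross-partition queries, i.e. relational joins, are exactly where this scheme stops being embarrassingly parallel):

```python
import zlib

# Hash partitioning: a stable hash of the record key picks the owning
# node, so a single-record read or write touches exactly one shard and
# needs no global lock.  Joins across shards break this nice property.
N_NODES = 4
shards = [dict() for _ in range(N_NODES)]

def owner(key: str) -> int:
    # crc32 rather than hash() so every node computes the same mapping
    return zlib.crc32(key.encode()) % N_NODES

def put(key, value):
    shards[owner(key)][key] = value               # single-shard write

def get(key):
    return shards[owner(key)][key]                # single-shard read

put("alice", 1)
put("bob", 2)
assert get("alice") == 1 and get("bob") == 2
```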

So yes, it is fun to kick around on this list in the abstract BECAUSE
lots of these are also problems in parallel applications that work with
data (in a DB per se or not) but in direct reference to the question,
no, this list isn't going to provide direct guidance on how to
parallelize mysql or oracle or sybase or postgres or peoplesoft because
EACH of these has to engineer an efficient parallel solution all the way
from the file store to the user interface and API, at least if one wants
to get reliable/predictable and beneficial scalability.

There may, however, be people on the list that have messed with
parallelized versions of at least some of these DBs.  There has
certainly been list discussion on parallizing postgres before (e.g.

   http://beowulf.org/pipermail/beowulf/1998-October/002070.html

Which is alas no longer accessible in the archives at this address,
although google still turns it and a number of other hits up; perhaps it
is a part of what was lost when beowulf.org crashed a short while ago.
Unfortunately, I failed to capture the list archives in my last mirror
of this site.  And Doug Eadline probably can say a few words about
the parallelization of mysql (which has ALSO been discussed on the list
back in 1999 and is ALSO missing from the archives).

Both mysql and postgres appear to have parallel implementations with at
least some scalability; see:

  http://www.illusionary.com/snort_db.html

Mysql's is an actual cluster implementation:

  http://www.mysql.com/products/cluster/

(note that bit about "Designed for 99.999% Availability" -- high
availability, not HPC).

A thread on mysql and postgres clustering on slashdot:

  http://developers.slashdot.org/comments.pl?sid=62549&cid=5843509

(a search is complicated by the fact that postgres refers to relational
database structures on disk as "clusters" and has actual commands to
create them etc.).

A Postgres-based clustering project (of sorts) lives here:

  http://gborg.postgresql.org/project/erserver/projdisplay.php

There is a sourceforge project trying to implement some sort of
lowest-common-denominator embarrassingly parallel cluster DB solution
that can be layered "on top of" existing SQL DBs (as I make it out;
read it for yourself):

  http://ha-jdbc.sourceforge.net/

Really, google is your friend in this.  In a nutshell, it IS possible to
find support for cluster-type access to at least mysql and postgres in
the open source community, in at least two ways (native and as an add-on
layer in each case).  Add-on layer clustering provides a better-than-nothing
solution to the serial bottleneck problem, but it will not scale
well for all kinds of access and has the usual problems with the design
of the data store itself.  I can't comment on the native implementations
beyond observing that mysql looks like it is in production while
postgres looks like it is still very much under development, and that
both of them are "replication" models that likely won't scale well at
all for write access (they likely handle locking at the granularity of
the entire DB, although I >>do not know<< and don't plan to look it
up:-).
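A replication model in miniature (a toy sketch, not how any of the products above is actually implemented): every replica holds the full DB, so reads can be spread across replicas, but each write has to be applied to all of them -- adding nodes scales reads, not writes:

```python
# Toy replication: every replica holds the full DB, so reads can be
# load-balanced across replicas, but each write must be applied to
# every replica -- the write path gains nothing from adding nodes.
replicas = [dict() for _ in range(3)]

def write(key, value):
    for rep in replicas:          # write amplification: O(n_replicas)
        rep[key] = value

def read(key, node):              # any replica can answer a read
    return replicas[node % len(replicas)][key]

write("x", 42)
assert all(read("x", n) == 42 for n in range(3))
```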

Hope this helps somebody.  If nothing else, it is likely worthwhile to
reinsert a discussion on this into the archives because of recent
developments and because previous discussions have gone away.

   rgb

> 
> Best Regards
> 
> Vaclav Hanzl
> 
> 

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




