[Beowulf] High Performance for Large Database
Robert G. Brown rgb at phy.duke.edu
Tue Oct 26 14:21:57 PDT 2004
On Tue, 26 Oct 2004, Joshua Marsh wrote:

> Hi all,
>
> I'm currently working on a project that will require fast access to
> data stored in a PostgreSQL database server. I've been told that a
> Beowulf cluster may help increase performance. Since I'm not very
> familiar with Beowulf clusters, I was hoping that you might have some
> advice or information on whether a cluster would increase performance
> for a PostgreSQL database. The major tables accessed are around
> 150-200 million records. On a standalone server, it can take several
> minutes to perform a simple select query.
>
> It seems like once we start pricing for servers with 16+ processors
> and 64+ GB of RAM, the prices skyrocket. If I can achieve high
> performance with a cluster, using 15-20 dual processor machines, that
> would be great.

This sort of cluster isn't a "beowulf" cluster; rather it is a variant
of a high availability cluster. It's Extreme Linux, just not beowulf.
The beowulf design (and focus of this list) is "high performance
computing" clusters, aka supercomputing clusters.

With that said, there may be some resources out there that can help you,
and listening in on this list and learning how HPC clusters work will
certainly help you with other kinds, as the issues are in many cases
similar.

The first/best place to look is the September issue of Cluster World
Magazine (www.clusterworld.com/issues.html). Its cover focus is on
"Database Clusters". My copy is at Duke (and I'm at home:-) so although
I'm pretty sure it covers mysql used in a cluster environment, I cannot
recall if it discusses alternatives such as oracle or postgres. Other
CWM issues will also be pertinent, regardless.

One major issue associated with any kind of file access is assembling a
large, shared file store that avoids the file and communications
bottlenecks that are as much an issue in HPC as they are in HA.
A series of articles just begun by Jeff Layton deals with SANs and
massive scalable storage in general -- he's only done a couple of
articles so far, so if there are still September/October issues around
you'd be in great shape. CWM also abounds with ads for large and
scalable and blindingly fast storage solutions.

We just had an extensive discussion on this very list on storage (I
kicked it off as we have a big proposal out that had a very large
storage component and I needed to learn -- fast!). The recent list
archives should show you the thread.

Finally, there are some companies out there that make their bread and
butter by assembling custom clusters to accomplish very specific tasks
at a cost (as you note) far less than the cost of a big multiprocessor
machine, even though they make a healthy (and well earned) profit on the
deal. Some of them have employees or owners on this list -- if any of
them can help you I expect they'll talk to you offline.

That's about all the help I personally can offer; I haven't built a
large database cluster and have only listened halfheartedly when they
were discussed on list in the past (although there have been previous
discussions you can also google for in the list archives, I think).

The problem is a fairly complex one -- not just various file latency and
bandwidth issues (these are likely the "easy part") but the issue of
sharing the underlying DB, which brings up locking. It is one thing to
provide lots of nodes read-only access to a DB on a SAN engineered for
fast, cached, read-only access; it is another to provide all the nodes
with read AND write access, as writing requires a lock, and a lock
effectively serializes access. This and related problems are serious
obstacles to speeding up databases through parallelism. I vaguely recall
that big companies like Oracle have dumped pretty serious money into
this kind of thing looking for solutions that scale well.
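
To put a rough number on "a lock effectively serializes access", here is
a minimal sketch based on Amdahl's law, which bounds the speedup of a
job when some fraction of it cannot run in parallel. The function name
and the serialized fractions are illustrative assumptions only; they are
not measurements of PostgreSQL or of any real cluster, and the model
ignores network and disk contention entirely.

```python
def speedup_bound(nodes: int, serial_fraction: float) -> float:
    """Amdahl's-law upper bound on speedup.

    nodes           -- number of machines working on the job
    serial_fraction -- share of the work done while holding a lock
                       (hypothetical value, 0.0 = pure read-only,
                       1.0 = every operation waits on the lock)
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    # The serialized part takes the same time no matter how many nodes
    # there are; only the remainder is divided among the nodes.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes)


if __name__ == "__main__":
    # Compare a read-only workload with two made-up write-heavy ones.
    for fraction in (0.0, 0.05, 0.25):
        for nodes in (1, 4, 16, 20):
            bound = speedup_bound(nodes, fraction)
            print(f"serialized={fraction:.2f} nodes={nodes:2d} "
                  f"speedup<={bound:.2f}")
```

Under this idealization a purely read-only workload can scale with the
node count, while a workload that spends a quarter of its time behind a
lock cannot get past a factor of four however many nodes are added.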
Maybe somebody else on list knows more than I do, though, and maybe
they'll tell all of us!

   rgb

>
> Thanks for any help you may have!
>
> -Josh
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
