[Beowulf] High Performance for Large Database
Kumaran Rajaram kums at mpi.mpi-softtech.com
Mon Nov 15 08:53:30 PST 2004
Imho, the I/O workload in databases is dominated by random, small-block requests. To cater to such an I/O pattern, nodes hosting databases tend to have large caches (RAM) and are SMP-based. The database software implements proprietary storage/access policies for high performance. In this sense, databases need a block-device interface from the storage system more than a file-system interface. A file-system interface should also work, although performance-wise you add an extra layer and are restricted by the file system's storage policies. The advantage is that the file system aggregates the storage and provides a single namespace, making the data easier to manage and back up.

In terms of block devices, a SAN provides the low latency, high bandwidth, and high availability that suit a database environment. For moderate prices, an iSCSI SAN may be used instead of an FC SAN. A SAN also makes management of block devices easier. The only caveat is that the maximum size of a block device is 2TB in the 2.4 kernel; the 2.6 kernel extends this to 16TB.

PVFS/Lustre are currently tuned for HPC-style applications, which are dominated by large, contiguous I/O requests, and the file-system striping policies help provide higher bandwidth. However, for small requests, striping may not prove beneficial. Also, most file systems use TCP/IP, so network-layer latency can affect database performance. The MPI-IO interface may be used to optimize non-contiguous, smaller requests through its datatype and file-view features. Newer PVFS/Lustre versions offer native implementations for low-latency interconnects like Myrinet, IB, or Quadrics; however, the stability of the file system needs to be studied.

Consistency, integrity, and availability of data cannot be compromised in databases. Current PVFS/Lustre versions stripe files across their I/O nodes in a RAID-0 pattern.
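The 2TB and 16TB block-device limits mentioned above follow from simple arithmetic, assuming (this is an assumption, not stated in the original message) that the 2.4 limit comes from a 32-bit count of 512-byte sectors and the 2.6 limit, on 32-bit hosts, from a 32-bit page-cache index with 4 KiB pages. A minimal Python sketch:

```python
# Sketch: where the 2.4 (2 TB) and 2.6 (16 TB) block-device limits come from,
# ASSUMING a 32-bit index of 512-byte sectors (2.4) and a 32-bit page-cache
# index of 4 KiB pages (2.6 on 32-bit hosts).

SECTOR_SIZE = 512   # bytes per sector (assumed)
PAGE_SIZE = 4096    # bytes per page-cache page (assumed)
INDEX_BITS = 32     # width of the index/counter (assumed)
TIB = 2 ** 40       # one binary terabyte


def max_device_bytes(unit_size: int, index_bits: int = INDEX_BITS) -> int:
    """Largest size addressable with `index_bits` bits of index, each unit `unit_size` bytes."""
    return (2 ** index_bits) * unit_size


limit_24 = max_device_bytes(SECTOR_SIZE)
limit_26 = max_device_bytes(PAGE_SIZE)
print(f"2.4-style limit: {limit_24 // TIB} TiB")
print(f"2.6-style limit: {limit_26 // TIB} TiB")
```

Under these assumptions the "TB" figures are binary terabytes (2 TiB and 16 TiB), and the factor of 8 between them is simply the ratio of page size to sector size.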
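Why striping helps large scans but not small random requests, and why a single failed I/O node matters under RAID-0, can be sketched with a simple round-robin layout. The stripe unit, node count, and function names below are hypothetical illustrations, not actual PVFS/Lustre defaults or APIs:

```python
from __future__ import annotations

# Hypothetical layout parameters; real PVFS/Lustre stripe sizes and counts
# are configurable and may differ from these.
STRIPE_UNIT = 64 * 1024   # bytes per stripe unit
NUM_IO_NODES = 4          # I/O nodes the file is striped over


def nodes_touched(offset: int, length: int,
                  stripe_unit: int = STRIPE_UNIT,
                  num_nodes: int = NUM_IO_NODES) -> set[int]:
    """I/O nodes that must serve bytes [offset, offset + length) under round-robin RAID-0."""
    first_unit = offset // stripe_unit
    last_unit = (offset + length - 1) // stripe_unit
    # Any num_nodes consecutive units already cover every node, so stop there.
    last_needed = min(last_unit, first_unit + num_nodes - 1)
    return {unit % num_nodes for unit in range(first_unit, last_needed + 1)}


def units_on_node(node: int, file_size: int,
                  stripe_unit: int = STRIPE_UNIT,
                  num_nodes: int = NUM_IO_NODES) -> int:
    """Stripe units of a file stored only on `node` (no redundancy across nodes)."""
    total_units = -(-file_size // stripe_unit)  # ceiling division
    return sum(1 for unit in range(total_units) if unit % num_nodes == node)


# A single 8 KiB database page vs. a 1 MiB sequential read.
page_nodes = nodes_touched(offset=10 * 8192, length=8192)
scan_nodes = nodes_touched(offset=0, length=1024 * 1024)
print("8 KiB page served by nodes:", sorted(page_nodes))
print("1 MiB scan served by nodes:", sorted(scan_nodes))

# If node 2 fails, its share of a 1 MiB file is unreachable until it returns.
file_size = 1024 * 1024
lost = units_on_node(2, file_size)
print(f"units unavailable: {lost} of {-(-file_size // STRIPE_UNIT)}")
```

With these parameters the small page is served by one node (so striping adds no parallelism), the scan engages all four, and losing one node makes a quarter of the file's stripe units unavailable, since RAID-0 keeps no copy elsewhere.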
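The MPI-IO point about non-contiguous, smaller requests can be made concrete with a small model. This is plain Python, not the MPI API: the hypothetical `vector_extents` mimics the byte layout a vector datatype (count, blocklength, stride) describes when used as a file view starting at a displacement, and `read_sieved` illustrates data sieving, one optimization that MPI-IO implementations such as ROMIO can apply to noncontiguous reads (one large contiguous read, then copying out the requested pieces). All names and layout values are illustrative assumptions.

```python
import io


def vector_extents(disp, count, blocklen, stride, elem_size):
    """(offset, length) pairs for `count` blocks of `blocklen` elements whose
    starts are `stride` elements apart, beginning at byte `disp`."""
    return [(disp + i * stride * elem_size, blocklen * elem_size)
            for i in range(count)]


def read_naive(f, extents):
    """One seek+read per extent: many small requests."""
    out = []
    for off, length in extents:
        f.seek(off)
        out.append(f.read(length))
    return out, len(extents)


def read_sieved(f, extents):
    """Data sieving: read the span covering all extents once, then slice."""
    start = min(off for off, _ in extents)
    end = max(off + length for off, length in extents)
    f.seek(start)
    buf = f.read(end - start)
    pieces = [buf[off - start: off - start + length] for off, length in extents]
    return pieces, 1


data = bytes(range(256)) * 4   # 1 KiB in-memory stand-in for a file
f = io.BytesIO(data)

# Blocks of two 4-byte elements, every 8 elements, starting at byte 16.
ext = vector_extents(disp=16, count=4, blocklen=2, stride=8, elem_size=4)
naive, naive_requests = read_naive(f, ext)
sieved, sieved_requests = read_sieved(f, ext)
assert naive == sieved
print(ext, naive_requests, sieved_requests)
```

Here sieving issues 1 request instead of 4 but transfers 104 bytes to deliver 32. That trade pays off when per-request latency (for example over TCP/IP) dominates, and is less attractive when the gaps between requested pieces are large.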
Going down another level, hardware or software RAID 1/5 can be performed at the disk level, resulting in the file system providing RAID 10/50. However, the failure of a single I/O node might lead to temporary loss (of data in cache) or unavailability of file data until the node is revived. RAID 1/5 across I/O nodes is planned for future versions. Price, performance, availability, manageability, and consistency of file data need to be weighed when architecting the database solution.

Regards,
-Kums

On Mon, 15 Nov 2004, Laurence Liew wrote:

> Hi
>
> The current version of GFS have a 64 node limit.. something to do with
> maximum number of connections thru a SAN switch.
>
> I believe the limit could be removed in RHEL v4.
>
> BTW, GFS was built for enterprise and not specifically for HPC... the
> use of SAN (all nodes need to be connected to a single SAN storage)..
> may be a bottleneck...
>
> I would still prefer the model of PVFS1/2 and Lustre where the data is
> distributed amongst the compute nodes
>
> I suspect GFS could prove useful however for enterprise clusters say 32
> - 128 nodes where the number of IO nodes (GFS nodes with exported NFS)
> can be small (less than 8 nodes)... it could work well
>
> Cheers!
> Laurence
>
> Chris Samuel wrote:
> > On Wed, 10 Nov 2004 12:08 pm, Laurence Liew wrote:
> >
> >>You may wish to try GFS (open sourced by Red Hat after buying
> >>Sistina)... it may give better performance.
> >
> > Anyone here using the GPL'd version of GFS on large clusters ?
> >
> > Be really interested to hear how folks find that..
>
> --
> Laurence Liew, CTO                 Email: laurence at scalablesystems.com
> Scalable Systems Pte Ltd           Web  : http://www.scalablesystems.com
> (Reg. No: 200310328D)
> 7 Bedok South Road                 Tel  : 65 6827 3953
> Singapore 469272                   Fax  : 65 6827 3922
