[Beowulf] High Performance for Large Database

Laurence Liew laurence at scalablesystems.com
Tue Nov 16 18:26:37 PST 2004


Hi.

When GFS was commercial and only available from Sistina, putting a GFS + 
SAN solution together for a Beowulf customer made the storage portion 
around 40% of the overall cost of a 16-node cluster.

That was when we investigated GFS over GNBD - but the performance left 
much to be desired...

Anyway - today, with Lustre and PVFS (PVFS2?), I think those are much 
more suitable filesystems for HPC-type workloads.
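
To make the contrast with a single SAN backend concrete, here is a toy 
Python sketch of the striping idea behind PVFS and Lustre: a file is cut 
into fixed-size stripes that are placed round-robin across several IO 
servers, so clients can pull different stripes from different servers in 
parallel. The ionode0..ionode3 directories are just hypothetical 
stand-ins for IO nodes - real PVFS/Lustre have their own metadata 
servers, wire protocols and on-disk formats, so treat this as the 
placement idea only.

import os

STRIPE_SIZE = 64 * 1024          # 64 KiB stripes (real systems use larger)
IO_NODES = ["ionode0", "ionode1", "ionode2", "ionode3"]  # stand-ins for IO servers

def stripe_file(path):
    """Cut 'path' into stripes and place them round-robin across IO_NODES."""
    for node in IO_NODES:
        os.makedirs(node, exist_ok=True)
    name = os.path.basename(path)
    count = 0
    with open(path, "rb") as src:
        while True:
            chunk = src.read(STRIPE_SIZE)
            if not chunk:
                break
            node = IO_NODES[count % len(IO_NODES)]   # round-robin placement
            with open(os.path.join(node, f"{name}.{count}"), "wb") as dst:
                dst.write(chunk)
            count += 1
    return count                                     # number of stripes written

def read_striped(name, stripe_count):
    """Reassemble the file by reading its stripes back in order."""
    data = bytearray()
    for i in range(stripe_count):
        node = IO_NODES[i % len(IO_NODES)]
        with open(os.path.join(node, f"{name}.{i}"), "rb") as src:
            data += src.read()
    return bytes(data)

if __name__ == "__main__":
    with open("demo.dat", "wb") as f:
        f.write(os.urandom(300 * 1024))              # 300 KiB test file
    n = stripe_file("demo.dat")
    assert read_striped("demo.dat", n) == open("demo.dat", "rb").read()
    print(f"striped demo.dat into {n} stripes across {len(IO_NODES)} IO-node dirs")

The point is simply that aggregate bandwidth grows with the number of IO 
nodes instead of being capped by a single SAN head.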

BTW: over in Singapore and the countries around here, a 64-node cluster 
is considered large... so a SAN solution *IS* a significant cost for 
most academic customers.

Cheers!
laurence

Craig Tierney wrote:
> On Tue, 2004-11-16 at 02:01, Laurence Liew wrote:
> 
>>Hi,
>>
>> From what I understand, it has to do with "locking" on the SAN devices 
>>by the GFS drivers.
>>
>>Yes, you are right... most implementations will have separate IO and 
>>compute nodes... in fact that is the recommended way. What I meant in 
>>my earlier statement was that I prefer data to be distributed amongst 
>>nodes - IO nodes - rather than have it centralised in a single SAN 
>>backend.
> 
> 
> Do you have an issue with a single storage unit, or actually using
> a SAN?  You could connect, dare I say "cluster", smaller FC based
> storage units together.  You will get much better price/performance
> than going with larger storage units.  This solution would work
> for shared filesystems like GFS, CXFS, or StorNext.  You could
> connect the same units directly to IO nodes for distributed filesystems
> like Lustre, PVFS1/2, or Ibrix.
> 
> Craig
> 
> 
> 
>>GFS + NFS is painful and slow, as you have experienced... hopefully 
>>RHEL v4 will bring better performance and new GFS features to address 
>>HPC (unlikely, but just hoping).
>>
>>Laurence
>>
>>Craig Tierney wrote:
>>
>>>On Mon, 2004-11-15 at 06:26, Laurence Liew wrote:
>>>
>>>
>>>>Hi
>>>>
>>>>The current version of GFS has a 64-node limit... something to do with 
>>>>the maximum number of connections through a SAN switch.
>>>
>>>
>>>I would suspect the problem is that GFS doesn't scale past
>>>64 nodes.  There is no inherent limitation in Linux on the
>>>size of a SAN (well, if there is, it is much larger than 64 nodes).
>>>Other shared filesystems, like StorNext and CXFS, are limited 
>>>to 128 nodes for scalability reasons.
>>>
>>>
>>>
>>>>I believe the limit could be removed in RHEL v4.
>>>>
>>>>BTW, GFS was built for the enterprise and not specifically for HPC... 
>>>>the use of a SAN (all nodes need to be connected to a single SAN 
>>>>storage backend) may be a bottleneck...
>>>>
>>>>I would still prefer the model of PVFS1/2 and Lustre where the data is 
>>>>distributed amongst the compute nodes
>>>
>>>
>>>You can do this, but does anyone do it?  I suspect that most
>>>implementations are set up so that the servers are not on the compute
>>>nodes.  This provides more consistent performance across
>>>the cluster.  Also, are you going to install redundant storage in
>>>all of your compute nodes so that you can build a FS across the
>>>compute nodes?  Unless the FS is for scratch only, I don't want
>>>to have to explain to the users why the system keeps losing their data. 
>>>Even if you use some raid1 or raid5 ATA controllers in a 
>>>few storage servers, you are going to be able to build a faster and
>>>more fault-tolerant system than just using disks in the compute nodes.
>>>
>>>
>>>
>>>>I suspect GFS could prove useful, however, for enterprise clusters of 
>>>>say 32 to 128 nodes, where the number of IO nodes (GFS nodes exporting 
>>>>NFS) can be small (fewer than 8 nodes)... it could work well.
>>>
>>>
>>>I had some experience with an NFS-exported GFS system about 12
>>>months ago and it wasn't very pleasant.  I could feel the latency
>>>in the meta-data operations when accessing the front ends of the
>>>cluster interactively.  It didn't surprise me, because other experiences
>>>I have had with shared filesystems have been similar.
>>>
>>>Craig
>>>
>>>
>>>
>>>>Cheers!
>>>>Laurence
>>>>
>>>>Chris Samuel wrote:
>>>>
>>>>
>>>>>On Wed, 10 Nov 2004 12:08 pm, Laurence Liew wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>You may wish to try GFS (open sourced by Red Hat after buying
>>>>>>Sistina)... it may give better performance.
>>>>>
>>>>>
>>>>>Is anyone here using the GPL'd version of GFS on large clusters?
>>>>>
>>>>>I'd be really interested to hear how folks find it...
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
> 
> 
> 
> 

-- 
Laurence Liew, CTO		Email: laurence at scalablesystems.com
Scalable Systems Pte Ltd	Web  : http://www.scalablesystems.com
(Reg. No: 200310328D)
7 Bedok South Road		Tel  : 65 6827 3953
Singapore 469272		Fax  : 65 6827 3922



