[Beowulf] how large can we go with 1GB Ethernet? / Re: how large of an installation have people used NFS, with?

psc pscadmin at avalon.umaryland.edu
Thu Sep 10 05:28:55 PDT 2009


Thank you all for the answers.  Would you please share some good brands
of those 200+ port 1 Gb Ethernet switches? I think I'll leave our
current clusters alone, but I will design the new cluster for about 500
to 1000 nodes. I don't think we will go much above that, since for big
jobs our scientists use outside resources. We do all our calculations
and analysis on the nodes and send only the final product to the
frontend. We also don't run jobs across nodes, so I don't need to get
too creative with the network, beyond making sure that I can expand the
cluster without the switches being a limitation (our current
situation).

thank you again!


Henning Fehrmann wrote:
> Hi
>
> On Wed, Sep 09, 2009 at 03:23:30PM -0400, psc wrote:
>   
>> I wonder what the biggest sensible cluster based on a 1 Gb Ethernet
>> network would be.
>>     
>
> Hmmm, may I cheat and use a 10Gb core switch?
>
> If you set up a cluster with a few thousand nodes, you have to ask
> yourself whether this network should be non-blocking or not.
>
> For a non-blocking network you need the right core-switch technology.
> Unfortunately, there are not many vendors out there that provide
> non-blocking Ethernet-based core switches, but I am aware of at least
> two. One provides, or will provide, 144 10 Gb Ethernet ports. Another
> one sells switches with more than 1000 1 Gb ports.
> You could buy edge switches with 4 10 Gb uplinks and 48 1 Gb ports. If
> you use only 40 of the 48 1 Gb ports on each edge switch (matching the
> 4 x 10 Gb = 40 Gb of uplink bandwidth), then 144 / 4 = 36 edge
> switches give you 1440 non-blocking 1 Gb ports.
>
> It might also be possible to cross-connect two of these core switches
> with the help of some smaller switches, so that one ends up with 288
> 10 Gb ports and, in principle, could connect 2880 nodes in a
> non-blocking way, but we have not yet had the opportunity to test this
> successfully. One of the problems is that the switch's internal hash
> table cannot store that many MAC addresses. One probably needs to
> change the MAC addresses of the nodes to avoid overflowing the hash
> tables; an overflow might cause ARP storms.
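>
> For example, on Linux a node's MAC address can be changed with
> iproute2; the interface name and the particular locally administered
> address below are only illustrative:
>
>   # give the node a locally administered (02:...) MAC address,
>   # e.g. derived from its node number
>   ip link set dev eth0 down
>   ip link set dev eth0 address 02:00:00:00:04:d2
>   ip link set dev eth0 up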
>
> Once this works, one runs into some smaller problems. One of them is
> the ARP cache of the nodes: it should be adjusted to hold as many MAC
> addresses as there are nodes in the cluster.
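>
> On Linux this means raising the neighbour table thresholds via sysctl;
> a minimal sketch for a cluster of a couple of thousand nodes (the
> exact values are only illustrative) could look like this:
>
>   # /etc/sysctl.conf -- ARP/neighbour cache sizing
>   net.ipv4.neigh.default.gc_thresh1 = 2048   # GC not run below this
>   net.ipv4.neigh.default.gc_thresh2 = 4096   # soft maximum
>   net.ipv4.neigh.default.gc_thresh3 = 8192   # hard maximum
>
> followed by "sysctl -p" (or a reboot) on every node.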
>
>
>   
>> And especially, how would you connect those 1 Gb switches together?
>> Right now we have (on one of our four clusters) two 48-port gigabit
>> switches connected together with 6 patch cables, and I just ran out
>> of ports for expansion. I wonder where to go from here, as we already
>> have four clusters and it would be great to stop adding clusters and
>> start expanding them beyond the number of ports on the switches. NFS
>> and 1 Gb Ethernet work great for us and we want to stick with them,
>> but we would love to find a way to overcome the current "switch
>> limitation".
>>     
>
> With NFS you can nicely test the setup: use one NFS server, let all
> the nodes write different files to it, and see what happens.
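>
> A minimal sketch of such a test, assuming pdsh is installed and the
> server's export is mounted on /mnt/nfstest on every node (hostnames,
> paths and file sizes are only illustrative):
>
>   # every node writes its own 1 GB file to the shared NFS mount
>   pdsh -w node[001-500] \
>     'dd if=/dev/zero of=/mnt/nfstest/$(hostname).dat bs=1M count=1024'
>
> Then watch the server's load, the network counters, and how long the
> slowest nodes take to finish.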
>
> Cheers,
> Henning
>   



