[Beowulf] Fabric design consideration

Smith, Brian brs at admin.usf.edu
Thu Jul 30 08:18:50 PDT 2009


Hi, All,

I've been re-evaluating our existing InfiniBand fabric design for our HPC systems, since I've been tasked with determining how we will add more systems as more and more researchers opt to add capacity to our central system.  We've already used up all available ports on our 144 port SilverStorm 9120 chassis and need to expand capacity.  One option that we've been floating around -- that I'm not particularly fond of, btw -- is to purchase a second chassis and link the two together over 24 ports, two per spine.  While a good deal of our workload would be ok with 5:1 blocking and 6 hops (3 across each chassis), I've determined that, for the money, we're definitely not getting the best solution.
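
As a rough sanity check on the two-chassis option, the 5:1 figure can be reproduced from port counts alone. This is only a sketch: the function name is made up for illustration, and it assumes every link runs at the same speed and that blocking is simply node-facing ports divided by inter-chassis links.

```python
def back_to_back_chassis(ports_per_chassis: int, interlinks: int) -> dict:
    """Port budget for two identical chassis joined by `interlinks` cables.

    Assumption: each cable uses one port on each chassis, and blocking is
    counted purely by ports (equal link speed everywhere).
    """
    if not 0 < interlinks < ports_per_chassis:
        raise ValueError("interlinks must be between 1 and ports_per_chassis - 1")
    node_ports = ports_per_chassis - interlinks
    return {
        "node_ports_per_chassis": node_ports,
        "total_node_ports": 2 * node_ports,
        "blocking": node_ports / interlinks,
    }


option = back_to_back_chassis(ports_per_chassis=144, interlinks=24)
print(f"{option['node_ports_per_chassis']} node ports per chassis, "
      f"{option['total_node_ports']} in total, "
      f"{option['blocking']:g}:1 blocking between chassis")
```

With 144 ports and 24 inter-chassis links, each chassis keeps 120 ports for nodes, which is where the 5:1 ratio comes from.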

The plan that I've put together involves using the SilverStorm as the core in a spine-leaf design.  We'll go ahead and purchase a batch of 24 port QDR switches, two for each rack, to connect our 156 existing nodes (with up to 50 additional on the way).  Each leaf will have 6 links back to the spine, leaving 18 ports for nodes, for 3:1 blocking and 5 hops (2 for the leaves, 3 for the spine).  This will allow us to scale the fabric out to 432 total nodes (24 leaves with 18 nodes each) before having to purchase another spine switch.  At that point, half of the six uplinks will go to the first spine and half to the second.  In theory, it looks like we can scale this design -- with future plans to migrate to a 288 port chassis -- to quite a large number of nodes.  Also, just to address this up front, we have a very generic workload, with a mix of md, ab initio, cfd, fem, blast, rf, etc.
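
The scaling numbers for the spine-leaf plan can be checked the same way. The sketch below is a back-of-the-envelope model, not a fabric planner: the class and method names are hypothetical, it counts ports only (assuming equal link speed on every link), and it assumes each leaf spreads its uplinks evenly across all spines.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class LeafSpine:
    """Port-count model of a two-tier fabric (hypothetical helper)."""
    leaf_ports: int = 24        # ports on each leaf switch
    uplinks_per_leaf: int = 6   # leaf ports cabled to the spine tier
    spine_ports: int = 144      # ports on each spine chassis
    spines: int = 1             # number of spine chassis

    def __post_init__(self):
        if not 0 < self.uplinks_per_leaf < self.leaf_ports:
            raise ValueError("uplinks_per_leaf must leave room for nodes")
        if self.uplinks_per_leaf % self.spines:
            raise ValueError("uplinks must divide evenly across spines")

    @property
    def nodes_per_leaf(self) -> int:
        return self.leaf_ports - self.uplinks_per_leaf

    @property
    def blocking(self) -> float:
        return self.nodes_per_leaf / self.uplinks_per_leaf

    @property
    def max_leaves(self) -> int:
        # Every uplink consumes exactly one spine port somewhere.
        return (self.spine_ports * self.spines) // self.uplinks_per_leaf

    @property
    def max_nodes(self) -> int:
        return self.max_leaves * self.nodes_per_leaf

    def leaves_needed(self, nodes: int) -> int:
        return ceil(nodes / self.nodes_per_leaf)


one_spine = LeafSpine()
print(f"{one_spine.blocking:g}:1 blocking, "
      f"up to {one_spine.max_leaves} leaves / {one_spine.max_nodes} nodes")
for nodes in (156, 156 + 50):
    leaves = one_spine.leaves_needed(nodes)
    print(f"{nodes} nodes -> {leaves} leaves, "
          f"{leaves * one_spine.uplinks_per_leaf} spine ports used")

two_spines = LeafSpine(spines=2)
print(f"two spines: up to {two_spines.max_nodes} nodes")
```

Under these assumptions, one 144 port spine gives 3:1 blocking and tops out at 24 leaves, or 432 nodes, matching the figures above. Note that `leaves_needed` is a lower bound based on port count only; buying two leaves per rack may call for more switches than that.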

If the good folks on this list would be kind enough to give me your input regarding these options, or possibly propose a third (or fourth) option, I'd very much appreciate it.

Thanks in advance,

Brian Smith
Sr. HPC Systems Administrator
IT Research Computing, University of South Florida
4202 E. Fowler Ave. ENB308 
Office Phone: +1 813 974-1467
Organization URL: http://rc.usf.edu




