[Beowulf] recommendations for a good ethernet switch for connecting ~300 compute nodes
Joshua Baker-LePain jlb17 at duke.edu
Wed Sep 2 20:54:17 PDT 2009
On Wed, 2 Sep 2009 at 10:29pm, Rahul Nabar wrote

> That brings me to another important question. Any hints on speccing
> the head-node? Especially the kind of storage I put in on the head
> node. I need around 1 Terabyte of storage. In the past I've used
> RAID5+SAS in the server. Mostly for running jobs that access their I/O
> via files stored centrally.
>
> For muscle I was thinking of a Nehalem E5520 with 16 GB RAM. Should I
> boost the RAM up? Or any other comments. It is tricky to spec the
> central node.
>
> Or is it more advisable to go for storage-box external to the server
> for NFS-stores and then figure out a fast way of connecting it to the
> server. Fiber perhaps?

Speccing storage for a 300 node cluster is a non-trivial task and is heavily dependent on your expected access patterns. Unless you anticipate vanishingly little concurrent access, you'll be very hard pressed to service a cluster that large with a basic Linux NFS server. About a year ago I had ~300 nodes pointed at a NetApp FAS3020 with 84 spindles of 10K RPM FC-AL disks. A single user could *easily* flatten the NetApp (read: 100% CPU and multi-second/minute latencies for everybody else) without even using the whole cluster.

Whatever you end up with for storage, you'll need to be vigilant regarding user education. Jobs should store as much in-process data as they can on the nodes (assuming you're not running diskless nodes) and large jobs should stagger their access to the central storage as best they can.

-- 
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF
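
For illustration only, here is a minimal Python sketch of the two habits suggested above: keeping in-process data on node-local disk and staggering access to central storage. It is not taken from any particular cluster setup. The function names (`stagger_delay`, `run_with_local_scratch`), the `work` callback, and the idea of deriving the delay from a task index are all assumptions made for the example; how a job learns its task index depends on the scheduler in use.

```python
import random
import shutil
import tempfile
import time
from pathlib import Path


def stagger_delay(task_index, num_tasks, window_seconds, jitter_seconds=0.0, rng=None):
    """Return how long a task should wait before touching central storage.

    Tasks are spread evenly across `window_seconds`; optional random jitter
    keeps tasks in neighbouring slots from lining up exactly.
    """
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    if not 0 <= task_index < num_tasks:
        raise ValueError("task_index out of range")
    rng = rng or random.Random()
    base = window_seconds * task_index / num_tasks
    return base + rng.uniform(0.0, jitter_seconds)


def run_with_local_scratch(central_input, central_output_dir, work,
                           delay=0.0, scratch_root=None):
    """Copy the input to node-local scratch, run `work` there, copy the result back.

    `work(local_input, scratch_dir)` is a hypothetical callback that does the
    real computation and returns the path of its output file inside scratch.
    """
    central_input = Path(central_input)
    central_output_dir = Path(central_output_dir)
    time.sleep(delay)  # staggered start before the first read from central storage
    with tempfile.TemporaryDirectory(dir=scratch_root) as scratch:
        scratch = Path(scratch)
        local_input = scratch / central_input.name
        shutil.copy2(central_input, local_input)   # single read from central storage
        local_output = Path(work(local_input, scratch))  # intermediate I/O stays local
        central_output_dir.mkdir(parents=True, exist_ok=True)
        final = central_output_dir / local_output.name
        shutil.copy2(local_output, final)          # single write back
    return final
```

A job script might call `stagger_delay(task_index, num_tasks, 300, jitter_seconds=5)` and pass the result as `delay`, so that a large array job spreads its initial reads over roughly five minutes instead of hitting the file server all at once. The window length is a guess to be tuned against what the storage can actually sustain.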
