[Beowulf] HPC in the cloud question

Hutcheson, Mike Mike_Hutcheson at baylor.edu
Thu May 7 15:28:11 PDT 2015


Hi.  We are working on refreshing the centralized HPC cluster resources
that our university researchers use.  I have been asked by our
administration to look into HPC in the cloud offerings as an alternative
to purchasing or running a cluster on-site.

We currently run a 173-node, CentOS-based cluster with ~120TB (soon to
increase to 300+TB) in our datacenter.  It's a standard cluster
configuration:  IB network, distributed file system (BeeGFS, which I
really like), Torque/Maui batch.  Our users run a varied workload, from
fine-grained, MPI-based parallel apps scaling to 100s of cores to
coarse-grained, high-throughput jobs (we're a CMS Tier-3 site) with high
I/O requirements.

Whatever we transition to, whether it be a new in-house cluster or
something "out there", I want to minimize the amount of change or learning
curve our users would have to experience.  They should be able to focus on
their research and not have to spend a lot of their time learning a new
system or trying to spin one up each time they have a job to run.

If you have worked with HPC in the cloud, either as an admin or as
someone who has used cloud resources for research computing, I would
appreciate hearing about your experience.

Even if you haven't used the cloud for HPC, please feel free to
share your thoughts or concerns on the matter.

Sort of along those same lines, what are your thoughts about leasing a
cluster and running it on-site?

Thanks for your time,

Mike Hutcheson
Assistant Director of Academic and Research Computing Services
Baylor University



