[Beowulf] Project Planning: Storage, Network, and Redundancy Considerations

Brian D. Ropers-Huilman brian.ropers.huilman at gmail.com
Mon Mar 19 15:10:19 PDT 2007


On 3/19/07, Brian R. Smith <brs at usf.edu> wrote:
> Brian,
>
> (Threads like this can get confusing, which Brian? :)
>
> Brian D. Ropers-Huilman wrote:
> > I would keep your /home and /scratch/global separate.
>
> I've thought about this and it makes sense on a couple of levels.

You mention backups and single point of failure. Both are important.
We do not back up anything in /scratch; as a matter of fact, our
scratch file systems are both scrubbed. All you need to back up are
the codes and run metadata, which should all be stored in /home or in
/archive. FYI, I do _not_ yet have a true /archive solution, but this
is where I'm moving. When I have it, I will _not_ back it up. Rather,
I will pay to have a bunch of cheap disks and RAID to keep the data
available.
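
To make the scrubbing idea concrete, here is a minimal sketch of an
age-based scrub. Everything specific in it is an assumption made for
illustration, not a description of any site's actual policy: the
function names, the use of modification time as the staleness test,
and the 30-day cutoff in the usage note are all hypothetical.

```python
import os
import time


def find_stale_files(root, max_age_days, now=None):
    """Yield regular-file paths under root not modified in max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so a symlink is judged by its own mtime,
                # not by the mtime of whatever it points to.
                if os.lstat(path).st_mtime < cutoff:
                    yield path
            except OSError:
                continue  # file disappeared while we were walking


def scrub(root, max_age_days, dry_run=True, now=None):
    """Return stale paths under root; delete them unless dry_run is set."""
    scrubbed = []
    for path in find_stale_files(root, max_age_days, now):
        if not dry_run:
            try:
                os.remove(path)
            except OSError:
                continue  # could not remove; leave it out of the report
        scrubbed.append(path)
    return scrubbed
```

Calling scrub("/scratch/global", 30) only reports what would be
removed; passing dry_run=False actually deletes. A scrubber like this
is only safe when nothing irreplaceable lives in scratch, which is the
reason for keeping codes and run metadata in /home or /archive.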

Also, when (not if) /scratch/global goes down, users will still want
to log in, most likely to pull their code so it can be submitted on
another machine. :)

As to your comment that "You only have one disk to administer and all
of your efforts for fault tolerance, monitoring, and maintenance can
be focused on that device": while this is true, you're either making
compromises on what you offer your users or you're paying too much for
your /home space. Obviously that trade-off depends on your situation.

Finally, I did not comment on your mention of Panasas. I evaluated
them against other possible solutions in my previous job as Director
of HPC @ LSU and selected them for our /scratch/global solution. I
would highly recommend them. They do cost more, but you get value
(performance, ease of maintenance and administration, reliability,
support, and the like) for that cost. Just FYI.

-- 
Brian D. Ropers-Huilman


