[Beowulf] Putting /home on Lustre or GPFS
prentice.bisbal at rutgers.edu
Wed Dec 24 07:39:46 PST 2014
On 12/23/2014 01:35 PM, Michael Di Domenico wrote:
> I've always shied away from gpfs/lustre on /home and favoured netapps
> for one simple reason: snapshots. i can't tell you how many times
> people have "accidentally" deleted a file.
We used a NetApp at my last employer for everything (/home, /usr/local,
etc.) and everything used it (desktops, servers, the cluster), and the
snapshot feature was priceless. Many users liked being able to restore a
file that they had accidentally deleted themselves. For this reason, my
inclination is towards GPFS instead of Lustre, since GPFS supports
snapshotting (and a few other useful features that Lustre doesn't).
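For anyone who hasn't used it: the recovery is self-service, because the filer exposes read-only snapshots under a hidden .snapshot directory inside the volume. A rough sketch (the paths, snapshot name, and file are made up for illustration, and the layout is simulated in a temp directory so it runs anywhere):

```shell
# Simulate a NetApp-style home directory with an hourly snapshot.
# On a real filer, .snapshot is hidden and read-only, and snapshot
# names (hourly.0, nightly.1, ...) depend on the snapshot schedule.
HOME_DIR=$(mktemp -d)                      # stand-in for /home/alice
mkdir -p "$HOME_DIR/.snapshot/hourly.0"
echo 'draft' > "$HOME_DIR/.snapshot/hourly.0/thesis.tex"

# The recovery itself: the user just copies the file back out of
# the snapshot -- no admin ticket, no tape restore.
cp "$HOME_DIR/.snapshot/hourly.0/thesis.tex" "$HOME_DIR/thesis.tex"
```

If memory serves, GPFS exposes its snapshots in a similar self-service way (under a .snapshots directory), which is part of why it appeals to me here.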
> but yes, the "user education" about running jobs from /home usually
> happens at least once a year when someone new starts. we tend to
> publicly shame that person, and they don't seem to do it anymore.
> you never want to be "that guy" that slowed the whole system down... :)
> On Tue, Dec 23, 2014 at 12:12 PM, Prentice Bisbal
> <prentice.bisbal at rutgers.edu> wrote:
>> I have limited experience managing parallel filesystems like GPFS or Lustre.
>> I was discussing putting /home and /usr/local for my cluster on a GPFS or
>> Lustre filesystem, in addition to using it just for /scratch. I've never
>> done this before, but it doesn't seem like all that bad an idea. My logic
>> for this is the following:
>> 1. Users often try to run programs from /home, which leads to errors, no
>> matter how many times I tell them not to. This would make the system
>> more user-friendly, and I could use quotas/policies to 'steer' them
>> towards other filesystems if needed.
>> 2. Having one storage system to manage is much better than 3.
>> 3. Profit?
>> Anyway, another person in the conversation felt that this would be bad,
>> because if someone was running a job that would hammer the filesystem, it
>> would make the filesystem unresponsive, and keep other people from logging
>> in and doing work. I'm not buying this concern for the following reasons:
>> If a job can hammer your parallel filesystem so that the login nodes become
>> unresponsive, you've got bigger problems, because that means other jobs
>> can't run on the cluster, and the job hitting the filesystem hard has
>> probably slowed down to a crawl, too.
>> I know there are some concerns with the stability of parallel filesystems,
>> so if someone wants to comment on the dangers of that, too, I'm all ears. I
>> think that the relative instability of parallel filesystems compared to NFS
>> would be the biggest concern, not performance.
>> Prentice Bisbal
>> Manager of Information Technology
>> Rutgers Discovery Informatics Institute (RDI2)
>> Rutgers University
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf