[Beowulf] Putting /home on Lustre or GPFS
prentice.bisbal at rutgers.edu
Wed Dec 24 07:39:46 PST 2014
On 12/23/2014 01:35 PM, Michael Di Domenico wrote:
> I've always shied away from gpfs/lustre on /home and favoured netapps
> for one simple reason: snapshots. i can't tell you how many times
> people have "accidentally" deleted a file.
We used a NetApp at my last employer for everything (/home, /usr/local,
etc.) and everything used it (desktops, servers, the cluster), and the
snapshot feature was priceless. Many users liked being able to restore a
file that they had accidentally deleted themselves. For this reason, my
inclination is towards GPFS instead of Lustre, since GPFS supports
snapshotting (and a few other useful features that Lustre doesn't).
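For anyone who hasn't used it: the recovery is self-service, because the filer exposes read-only snapshots under a hidden .snapshot directory inside the volume. A rough sketch (the paths, snapshot name, and file are made up for illustration, and the layout is simulated in a temp directory so it runs anywhere):

```shell
# Simulate a NetApp-style home directory with an hourly snapshot.
# On a real filer, .snapshot is hidden and read-only, and snapshot
# names (hourly.0, nightly.1, ...) depend on the snapshot schedule.
HOME_DIR=$(mktemp -d)                      # stand-in for /home/alice
mkdir -p "$HOME_DIR/.snapshot/hourly.0"
echo 'draft' > "$HOME_DIR/.snapshot/hourly.0/thesis.tex"

# The recovery itself: the user just copies the file back out of
# the snapshot -- no admin ticket, no tape restore.
cp "$HOME_DIR/.snapshot/hourly.0/thesis.tex" "$HOME_DIR/thesis.tex"
```

If memory serves, GPFS exposes its snapshots in a similar self-service way (under a .snapshots directory), which is part of why it appeals to me here.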
> but yes, the "user education" about running jobs from /home usually
> happens at least once a year when someone new starts. we tend to
> publicly shame that person, and they don't seem to do it anymore.
> you never want to be "that guy" that slowed the whole system down... :)
> On Tue, Dec 23, 2014 at 12:12 PM, Prentice Bisbal
> <prentice.bisbal at rutgers.edu> wrote:
>> I have limited experience managing parallel filesystems like GPFS or Lustre.
>> I was discussing putting /home and /usr/local for my cluster on a GPFS or
>> Lustre filesystem, in addition to using it just for /scratch. I've never
>> done this before, but it doesn't seem like all that bad an idea. My logic
>> for this is the following:
>> 1. Users often try to run programs from /home, which leads to errors, no
>> matter how many times I tell them not to. This would make the system
>> more user-friendly, and I could use quotas/policies to 'steer' them
>> towards other filesystems if needed.
>> 2. Having one storage system to manage is much better than 3.
>> 3. Profit?
>> Anyway, another person in the conversation felt that this would be bad,
>> because if someone was running a job that would hammer the filesystem, it
>> would make the filesystem unresponsive, and keep other people from logging
>> in and doing work. I'm not buying this concern for the following reasons:
>> If a job can hammer your parallel filesystem so that the login nodes become
>> unresponsive, you've got bigger problems, because that means other jobs
>> can't run on the cluster, and the job hitting the filesystem hard has
>> probably slowed down to a crawl, too.
>> I know there are some concerns with the stability of parallel filesystems,
>> so if someone wants to comment on the dangers of that, too, I'm all ears. I
>> think that the relative instability of parallel filesystems compared to NFS
>> would be the biggest concern, not performance.
>> Prentice Bisbal
>> Manager of Information Technology
>> Rutgers Discovery Informatics Institute (RDI2)
>> Rutgers University
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf