[Beowulf] Putting /home on Lustre or GPFS

Jason Chong jchong at scinet.utoronto.ca
Tue Dec 23 10:46:16 PST 2014


On Tue, Dec 23, 2014 at 01:35:30PM -0500, Michael Di Domenico wrote:
> I've always shied away from gpfs/lustre on /home and favoured netapps
> for one simple reason.  snapshots.  i can't tell you how many times
> people have "accidentally" deleted a file.

I actually do run GPFS as /home, mostly for client scalability reasons
(we have 4000 clients mounting /home on the cluster).  However, we only
mount it read-only on the compute nodes, so users who try to run out of
/home typically see an I/O failure and ask what is happening; the
read-only mount is what keeps jobs from running from home.
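
For what it's worth, here is a minimal sketch of the kind of check a node
health script could do to confirm /home really is read-only before letting
jobs start.  This assumes a Linux node where mounts show up in /proc/mounts;
the mount point and exit codes are just illustrative, not our production
script:

    #!/usr/bin/env python3
    # Sketch of a node health check: verify /home is mounted read-only.
    # Assumes a Linux node where mounts appear in /proc/mounts; the mount
    # point and exit codes are illustrative only.
    import sys

    MOUNT_POINT = "/home"

    def mount_options(mount_point):
        """Return the mount options for mount_point, or None if not mounted."""
        with open("/proc/mounts") as f:
            for line in f:
                fields = line.split()
                # /proc/mounts fields: device mountpoint fstype options freq passno
                if len(fields) >= 4 and fields[1] == mount_point:
                    return fields[3].split(",")
        return None

    opts = mount_options(MOUNT_POINT)
    if opts is None:
        print(f"{MOUNT_POINT} is not mounted")
        sys.exit(1)
    if "ro" not in opts:
        print(f"{MOUNT_POINT} is mounted read-write; expected read-only")
        sys.exit(1)
    print(f"{MOUNT_POINT} is read-only as expected")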

> 
> but yes, the "user education" about running jobs from /home usually
> happens at least once a year when someone new starts.  we tend to
> publicly shame that person and they don't seem to do it anymore

Definitely needs "user education".  I have seen a single user kill the
parallel filesystem that way, and everyone else gets frustrated.  In some
cases we had to limit the number of jobs a user can run to minimize the
impact, since they still would not listen and just wanted to get their code
running and finish whatever they were doing without considering the
consequences.
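
As a rough illustration of the monitoring side of that (assuming a
Slurm-style scheduler here, which is just for the example; adjust for
whatever batch system you run, and the limit of 50 is made up), something
like this counts running jobs per user and flags the heavy hitters:

    #!/usr/bin/env python3
    # Rough sketch: flag users running more jobs than some policy limit.
    # Assumes Slurm with 'squeue' on PATH; JOB_LIMIT is a made-up number.
    import collections
    import subprocess

    JOB_LIMIT = 50  # hypothetical per-user cap

    # -h: no header, -t R: running jobs only, -o %u: print the owning user
    out = subprocess.run(
        ["squeue", "-h", "-t", "R", "-o", "%u"],
        capture_output=True, text=True, check=True,
    ).stdout

    counts = collections.Counter(out.split())
    for user, njobs in counts.most_common():
        if njobs > JOB_LIMIT:
            print(f"{user} has {njobs} running jobs (limit {JOB_LIMIT})")

The actual enforcement of course lives in the scheduler's own limits; the
sketch is only about spotting who is over the line.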

Jason

> 
> you never want to be "that guy" that slowed the whole system down... :)
> 
> 
> 
> On Tue, Dec 23, 2014 at 12:12 PM, Prentice Bisbal
> <prentice.bisbal at rutgers.edu> wrote:
> > Beowulfers,
> >
> > I have limited experience managing parallel filesytems like GPFS or Lustre.
> > I was discussing putting /home and /usr/local for my cluster on a GPFS or
> > Lustre filesystem, in addition to using it just for /scratch. I've never
> > done this before, but it doesn't seem like all that bad an idea. My logic
> > for this is the following:
> >
> > 1. Users often try to run programs from /home, which leads to errors, no
> > matter how many times I tell them not to do that. This would make the system
> > more user-friendly. I could use quotas/policies to 'steer' them to other
> > filesystems if needed.
> >
> > 2. Having one storage system to manage is much better than 3.
> >
> > 3. Profit?
> >
> > Anyway, another person in the conversation felt that this would be bad,
> > because if someone was running a job that would hammer the fileystem, it
> > would make the filesystem unresponsive, and keep other people from logging
> > in and doing work. I'm not buying this concern for the following reasons:
> >
> > If a job can hammer your parallel filesystem so that the login nodes become
> > unresponsive, you've got bigger problems, because that means other jobs
> > can't run on the cluster, and the job hitting the filesystem hard has
> > probably slowed down to a crawl, too.
> >
> > I know there are some concerns with the stability of parallel filesystems,
> > so if someone wants to comment on the dangers of that, too, I'm all ears. I
> > think that the relative instability of parallel filesystems compared to NFS
> > would be the biggest concern, not performance.
> >
> > --
> > Prentice Bisbal
> > Manager of Information Technology
> > Rutgers Discovery Informatics Institute (RDI2)
> > Rutgers University
> > http://rdi2.rutgers.edu
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit
> > http://www.beowulf.org/mailman/listinfo/beowulf

-- 
Jason Chong                                          Phone: 416-978-4157
Systems Administrator and Web App Developer      http://www.scinethpc.ca
Compute/Calcul Canada                        http://www.computecanada.ca

