[Beowulf] HPC workflows

Wed Nov 28 04:09:47 PST 2018

On Wed, 28 Nov 2018 at 11:33, Bogdan Costescu <bcostescu at gmail.com> wrote:

> On Mon, Nov 26, 2018 at 4:27 PM John Hearns via Beowulf <
> beowulf at beowulf.org> wrote:
>
>> I have come across this question in a few locations. Being specific, I am
>> a fan of the Julia language. On the Julia forum a respected developer
>> recently asked what the options were for keeping code developed on a laptop
>> in sync with code being deployed on an HPC system.
>>
>
>> I think out loud that many HPC codes depend crucially on a $HOME directory
>> being present on the compute nodes as the codes look for dot files etc. in
>> $HOME. I guess this can be dealt with by fake $HOMEs which again sync back
>> to the Repo.
>>
>
> I don't follow you here... $HOME, dot files, repo, syncing back? And why
> "Repo" with capital letter, is it supposed to be a name or something
> special?
>

I think John is talking here about putting whole HOME directories under
version control while being mindful of dot files such as .bashrc and
others, which can be application or system specific. The first thing that
comes to mind is to use branches for different cluster systems. However,
this also touches on backup (another important topic, since HOME
dirs are not necessarily backed up). There could be a working solution
that makes use of recursive repos and git lfs support, but pruning old
history could still be desirable. Git would minimize the amount of storage
because it is hash based, so identical content is stored only once. While
this could make it possible to replicate your environment "wherever you
go", a/ you would drag a lot of history around and b/ a significantly
different mindset is required to manage the whole thing. A typical HPC
user may know git clone but is generally not a git adept. Developers are
different and, who knows, John, maybe someone will pick up your idea.
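
To make the "branch per cluster" idea a bit more concrete, here is a minimal
sketch in Python. It assumes HOME is tracked through a git directory kept
beside the work tree (git's real --git-dir and --work-tree options). The
hostname patterns, the branch names and the ".dotfiles" directory name are
all made up for illustration, and the command is only built, not executed.

```python
import fnmatch
import socket
from pathlib import Path

# Hypothetical mapping from hostname patterns to branch names; adjust per site.
CLUSTER_BRANCHES = {
    "*.cluster-a.example.org": "cluster-a",
    "*.cluster-b.example.org": "cluster-b",
}
DEFAULT_BRANCH = "laptop"  # hypothetical fallback branch


def branch_for_host(hostname, mapping=CLUSTER_BRANCHES, default=DEFAULT_BRANCH):
    """Return the branch whose pattern matches hostname, else the default."""
    for pattern, branch in mapping.items():
        if fnmatch.fnmatch(hostname, pattern):
            return branch
    return default


def checkout_command(home, branch, git_dir_name=".dotfiles"):
    """Build (but do not run) the git command that checks out `branch` into home."""
    home = Path(home)
    return [
        "git",
        f"--git-dir={home / git_dir_name}",
        f"--work-tree={home}",
        "checkout",
        branch,
    ]


if __name__ == "__main__":
    # Print the command for the current machine so it can be reviewed first.
    host = socket.getfqdn()
    print(" ".join(checkout_command(Path.home(), branch_for_host(host))))
```

Printing the command instead of running it is deliberate: a checkout into a
live HOME can overwrite files, so a dry run seems the safer default.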

Is gitfs at all popular?

> In my HPC universe, people actually not only need code, but also data -
> usually LOTS of data. Replicating the code (for scripting languages) or the
> binaries (for compiled stuff) would be trivial, replicating the data would
> not. Also pulling the data in or pushing it out (e.g. to/from AWS) on the
> fly whenever the instance is brought up would be slow and costly. And by
> the way this is in no way a new idea - queueing systems have long had
> the concept of "pre" and "post" job stages, which could be used to
> pull in code and/or data to the node(s) on which the job would be running
> and clean up afterwards.
>
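
The "pre" and "post" stages described above can be sketched generically,
independent of any particular queueing system. The following Python context
manager is only an illustration under assumed names (staged, the "in" and
"out" subdirectories, scratch_root): it copies inputs to a scratch area,
hands the work directory to the job, copies results back, and cleans up.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def staged(input_dir, output_dir, scratch_root=None):
    """Copy inputs to scratch, yield the work dir, then copy results back."""
    work = Path(tempfile.mkdtemp(prefix="job-", dir=scratch_root))
    try:
        # "pre" stage: pull the inputs onto the node
        shutil.copytree(input_dir, work / "in")
        (work / "out").mkdir()
        yield work
        # "post" stage: push results back (skipped if the job raised)
        shutil.copytree(work / "out", output_dir, dirs_exist_ok=True)
    finally:
        # clean up the scratch area whether or not the job succeeded
        shutil.rmtree(work, ignore_errors=True)


if __name__ == "__main__":
    # Tiny self-contained demo using temporary directories as stand-ins
    # for shared storage.
    src = Path(tempfile.mkdtemp())
    dst = Path(tempfile.mkdtemp()) / "results"
    (src / "input.txt").write_text("hello")
    with staged(src, dst) as work:
        data = (work / "in" / "input.txt").read_text()
        (work / "out" / "result.txt").write_text(data.upper())
    print((dst / "result.txt").read_text())
```

Note that this does nothing about the real cost, which is moving LOTS of
data; it only shows where the copy-in and copy-out steps would sit.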