[Beowulf] teaching linux/hpc?
prentice.bisbal at rutgers.edu
Thu Aug 8 07:45:28 PDT 2013
On 07/29/2013 06:52 PM, Michael Di Domenico wrote:
> if you were going to teach a unix/shell scripting/perl/c
> programming/etc which might reach hpc at its crescendo, how would you
> answer these questions?
> Can you tell us what the audience would be?
> Specifically what business/industry would utilize these?
> I was asked these questions and found myself unsure of how to answer.
> I know how to answer them if I were talking to someone in/near the
> computing industry. But what kind of answer would you give someone
> who has very little knowledge outside of the windows/office world?
I was going to answer this question right away, and then got distracted
by other tasks around here. I will eventually be responsible for
teaching something similar here, so here's how I would do it. To give
you some context, my audience would be scientists new to supercomputing
at a public university where most research is grant-funded.
In my typical fashion, this will be long and verbose, so make yourself
comfortable. You might want to get a cup of coffee before reading any
further. My thoughts about this are based on two principles: 1. assume
that your audience knows nothing about the topic, and 2. you have to
walk before you can run. Here's how I plan to teach this:
1. Start by stating a problem that these computer skills will solve, or
show them the benefits of having these skills. Try to personalize these
problems or benefits to suit your audience. The point is to convince
your audience that what you're teaching them has value to them. It
answers the questions "Why am I here?" and "Why should I care about
this?" This should motivate them to keep paying attention, and it gives
them some context for the rest of the material. For my audience this
step is simple: using HPC, and learning better computer skills in
general, will allow them to be more productive and do more research, or
more complex research, both of which should lead to more grant awards.
Those awards will allow them to continue their research or, in the best
case, expand it.
For the natural sciences, I would also say that computers are like any
other lab instrument they use. All chemists are expected to be able to
use NMR spectrometers and GC/mass spec, so why do so few know how to run
simulations on a computer? That's like a carpenter not knowing how to
use modern power tools. A great example of this mindset is that D. E.
Shaw Research refers to Anton, its special-purpose molecular dynamics
supercomputer, as an 'electronic microscope'.
2. Before jumping into the details of how computers work and how to use
HPC, explain what HPC is in really simple terms that ANYONE could
understand. I start with the example of gravity between two objects, and
how you could simulate the motion of those two objects towards each
other. Anyone who has made it past 7th grade science should understand
this. Then expand it to include the solar system.
Next, explain how you can break up a job to run in parallel. Again,
keep it simple. Planetary motion might be tough to illustrate this
simply, so I use this example: try to sum up a really, really long list
of numbers. On one computer this will take a long time. With two
computers, it should take about half as long: each computer adds up half
of the numbers, and when both are done, they share their partial sums to
come up with the total. The same idea extends to 4, 8, 16 computers, and
so on.
I've used these two examples to explain what I do to a lot of
non-science/non-tech people, and I've never had anyone NOT be able to
understand them.
3. Now that you've convinced them why they should care and what HPC is,
don't jump into HPC. Remember, you have to walk before you can run, and
Linux is a very alien place for the layperson who grew up on
Windows/Mac. Start with the basic concepts of Linux (everything is a
file, no drive letters, the filesystem hierarchy, piping a bunch of
simple commands together, etc.), then move on to the shell and basic
commands. Don't get into shell scripting yet; that will come later.
4. Now get them to think like programmers, but do it without using any
programming language. Yes, that's right. Use thought experiments to teach
them how to think like programmers. I think variables are an easy
concept, so I like to start with conditional statements and error
checking, and then loops. First, I challenge them to explain, in detail,
how to do a simple, everyday task, like picking a glass up off one
table and putting it down on another. Then ask them to think about
error checking at each step. How do you detect those errors? When an
error is encountered, how do you react? Again, all in natural language.
Once the lights go on at this point, I'd introduce shell scripting.
5. Start programming with simple shell scripting tasks, like moving a
file from one directory to another (analogous to moving a glass from one
table to another). Let them write something basic, with all values hard
coded: no command-line arguments, no conditionals, etc. Ideally, it
should be as simple as this:
mv /dir1/filename /dir2/filename
Then start introducing what-ifs, and add conditionals to check for
errors at each stage: Does the file exist? Can I read it? Can I write to
the destination directory? The next step would be to get them to print
meaningful messages when errors are encountered and when the task
succeeds (a good segue into exit values).
Then you can move on to arguments, like the name of the file to move
and the destination directory.
Then have them move 10 files, all with different names, etc.
6. Since HPC is your goal, I'd go to compiled languages next. Start with
the differences between an interpreted language, like a shell script, and
a compiled language, and then explain the transition from source code to
assembly to machine language, and then libraries, etc. Again, stick to
general concepts, without discussing a specific language.
7. Next, introduce your compiled language of choice, again with really
simple examples: Hello world, conditionals/loops, the move-a-glass
example from above, including arguments, etc.
8. Since this is HPC we're talking about, I'd start talking about
optimization next. Don't confuse them with parallelism just yet. Talk
about the cost of different operations, etc. Maybe come up with some
simple programs with great optimization opportunities, and work through
all those optimizations together as a class.
9. Finally, I'd start introducing parallelism using MPI. I think the
simple example of adding up a long list of numbers is a good way to
start. You could even have rank 1 just send its result to rank 0 using
mpi_send and mpi_recv, and then introduce collective operations. Again,
start really, really simple, and slowly build upon earlier lessons.
10. Where you go after that is up to you.
As you can see, this is a lot to cover. This is something I would cover
over the course of a semester with weekly multi-hour classes. You didn't
mention how much time you have, but I'm assuming it's a reasonably large
amount of time, given the scope in your original statement. I'm not
even sure I could cover this in the detail outlined above in a 14-week
course. I guess I'll find out one day.
I hope you're still awake, and I hope that helps you. You should also
check out the Software Carpentry website
(http://software-carpentry.org/) for more ideas on this topic.