[Beowulf] teaching linux/hpc?

Prentice Bisbal prentice.bisbal at rutgers.edu
Thu Aug 8 07:45:28 PDT 2013


On 07/29/2013 06:52 PM, Michael Di Domenico wrote:
> if you were going to teach a unix/shell scripting/perl/c
> programming/etc which might reach hpc at it's crescendo, how would you
> answer these questions?
>
> Can you tell us what the audience would be?
> Specifically what business/industry would utilize these?
>
> I was asked these questions and found myself unsure of how to answer.
> I know how to answer them if i were talking to someone in/near the
> computing industry.  But what kind of answer would you give someone
> who has very little knowledge outside of the windows/office world?
>

Michael,

I was going to answer this question right away, and then got distracted 
by other tasks around here. I will eventually be responsible for
teaching something similar here. Here's how I would do this. My audience
would be scientists new to supercomputing at a public university where 
most research is grant-funded, to give you some context.

In my typical fashion, this will be long and verbose, so make yourself 
comfortable. You might want to get a cup of coffee before reading any 
further. My thoughts about this are based on two things: 1. Assume that 
your audience knows nothing about the topic, and 2. You have to walk 
before you can run. Here's how I plan on teaching this:


1. Start by stating a problem that having these computer skills will 
solve, or show them benefits of having these computer skills. Try to 
personalize these problems or benefits to suit your audience. The point
of this is to convince your audience that what you're teaching them has
value to them. It answers the questions "Why am I here?" and "Why should
I care about this?" This should motivate them to continue to pay attention,
and gives them some context to the rest of the material. For my audience 
this step is simple: Using HPC, and learning better computer skills in 
general, will allow them to be more productive and do more research, or 
do more complex research, both of which should lead to more grant 
awards, which will allow them to continue to do their research, or in 
the best case, expand their research.

For the natural sciences, I would also say that computers are like any 
other lab instrument they would use. All chemists are expected to be 
able to use NMR spectrometers and GC/Mass spec, so why do so few know
how to run simulations on a computer? That's like a carpenter not
knowing how to use modern power tools. A great example of this is that DE
Shaw refers to Anton as an 'electronic microscope'.

2. Before jumping into the details of how computers work, and how to use 
HPC, explain to them what HPC is in really simple terms ANYONE could
understand. I start with the example of gravity between two objects, and 
how you could simulate the motion of those two objects towards each 
other.  Anyone who has made it past 7th grade science should understand 
this. Then expand it to include the solar system.

Next, explain how you can break up a job to run in parallel. Again,
keep it simple. Planetary motion might be tough to illustrate this
simply, so I use this example: Try to sum up a really, really long list
of numbers. On one computer this will take a long time. If you have two
computers, it should take half as long. You can have each computer add up
1/2 of the numbers, and then when done, the two computers share their 
partial sums to come up with the total. And then you can do that for 4, 
8, 16, etc.
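
The two-computer version of this example can be sketched on a single
machine, with two background processes standing in for the two
computers. This is only an illustration under assumptions: the function
names sum_lines and parallel_sum are made up for this sketch, and the
input is assumed to be a file with one integer per line.

```shell
#!/bin/bash
# Sketch: sum a list of numbers by splitting it into two halves,
# summing each half in its own process, then combining the partial sums.
# Function names are hypothetical; input is one integer per line.

# sum_lines FILE: print the sum of the numbers in FILE.
sum_lines() {
    awk '{ total += $1 } END { print total + 0 }' "$1"
}

# parallel_sum FILE: split FILE in two, sum both halves concurrently,
# then add the two partial sums together.
parallel_sum() {
    local input=$1 tmpdir total_lines half
    tmpdir=$(mktemp -d)
    total_lines=$(wc -l < "$input")
    half=$(( (total_lines + 1) / 2 ))

    # Give each "computer" its half of the list.
    head -n "$half" "$input" > "$tmpdir/part1"
    tail -n +"$(( half + 1 ))" "$input" > "$tmpdir/part2"

    # Each worker runs in the background and writes its partial sum.
    sum_lines "$tmpdir/part1" > "$tmpdir/sum1" &
    sum_lines "$tmpdir/part2" > "$tmpdir/sum2" &
    wait    # block until both workers have finished

    # Share the partial sums to come up with the total.
    echo $(( $(cat "$tmpdir/sum1") + $(cat "$tmpdir/sum2") ))
    rm -rf "$tmpdir"
}
```

For example, after "seq 1 100 > numbers.txt", running
"parallel_sum numbers.txt" prints 5050. Going to 4, 8, or 16 workers is
the same idea with more pieces.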

I've used these two examples to explain what I do to a lot of 
non-science/non-tech people, and I've never had anyone NOT be able to 
understand it.

3. Now that you've convinced them why they should care and what HPC is, 
don't jump into HPC. Remember, you have to walk before you can run, and
Linux is a very alien place for the layperson who grew up on
Windows/Mac. Start with the basic concepts of Linux (everything is a
file, no drive letters, the filesystem hierarchy, piping a bunch of
simple commands together, etc.), then get into the shell and basic
commands. But don't get into shell scripting. That will come later.
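
As one possible illustration of piping simple commands together, here
is a sketch that counts how often each word appears in some text. The
function name word_counts is made up for this example; the commands
themselves (tr, sort, uniq) are standard.

```shell
#!/bin/bash
# word_counts: read text on standard input and print each distinct word
# with the number of times it appears, most frequent first.
# (Hypothetical helper name; each stage does one small job.)
word_counts() {
    tr -s '[:space:]' '\n' |   # put every word on its own line
        sort |                 # bring identical words together
        uniq -c |              # count each run of identical words
        sort -rn               # largest counts first
}
```

It could be used as "word_counts < notes.txt", where notes.txt is any
text file. No single command here knows anything about counting words;
the pipeline as a whole does.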

4. Now get them to think like programmers, but do it without using any
programming language. Yes, that's right. Use thought examples to teach
them how to think like a programmer. I think variables are an easy
concept, so I like to start with conditional statements and error 
checking, and then loops. First, I challenge them to explain, in detail, 
how to do a simple, everyday task, like picking a glass up off of one 
table, and putting it down on another. Then ask them to think about 
error checking at each step. How do you detect those errors? Then when
those errors are encountered, how do you react? Again, natural language. 
Once the lights go on at this point, I'd introduce shell scripting.

5. Start programming with simple shell scripting tasks, like moving a 
file from one directory to another (analogous to moving a glass from one 
table to another). Let them write something basic, with all values hard
coded, no command-line arguments, no conditionals, etc. Ideally, it
should be as simple as this:

#!/bin/bash

mv /dir1/filename /dir2/filename

Then start introducing what-ifs and start introducing conditionals to 
check for errors at each stage: Does the file exist? Can I read it? Can
I write to the destination directory? The next step would be to get
them to print out meaningful messages when errors are encountered, and 
when the task succeeds (good segue into exit values here).
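
A sketch of where the script might end up after those what-ifs is
below. The function name checked_move and the particular return codes
are assumptions made for this illustration, not a fixed convention.

```shell
#!/bin/bash
# checked_move SRC DEST_DIR: move SRC into DEST_DIR, checking each step.
# Returns 0 on success and a different non-zero value for each failure.
# (Hypothetical function name and return codes.)
checked_move() {
    local src=$1 dest_dir=$2

    # Does the file exist?
    if [ ! -f "$src" ]; then
        echo "Error: $src does not exist or is not a regular file" >&2
        return 1
    fi
    # Can I read it?
    if [ ! -r "$src" ]; then
        echo "Error: cannot read $src" >&2
        return 2
    fi
    # Can I write to the destination directory?
    if [ ! -d "$dest_dir" ] || [ ! -w "$dest_dir" ]; then
        echo "Error: cannot write to directory $dest_dir" >&2
        return 3
    fi
    # Did the move itself work?
    if ! mv "$src" "$dest_dir/"; then
        echo "Error: mv failed for $src" >&2
        return 4
    fi
    echo "Moved $src to $dest_dir"
}
```

With the hard-coded values from the first script, the call would be
"checked_move /dir1/filename /dir2". Running "echo $?" right afterwards
shows the exit value, which is where that segue comes in.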

Then you can move into including arguments, like the filename to move,
and the destination directory.

Then have them move 10 files, all with different names, etc.
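
The last two steps (arguments, then many files) might be sketched like
this. The function name move_all and its argument order are assumptions
for this example only.

```shell
#!/bin/bash
# move_all DEST_DIR FILE...: move every FILE into DEST_DIR.
# Prints a message per file and returns non-zero if any move failed.
# (Hypothetical function name and argument order.)
move_all() {
    if [ "$#" -lt 2 ]; then
        echo "Usage: move_all DEST_DIR FILE..." >&2
        return 2
    fi

    local dest_dir=$1 failures=0 file
    shift    # the remaining arguments are the files to move

    for file in "$@"; do
        if mv -- "$file" "$dest_dir/"; then
            echo "Moved $file"
        else
            echo "Failed to move $file" >&2
            failures=$(( failures + 1 ))
        fi
    done

    # Succeed only if every file was moved.
    [ "$failures" -eq 0 ]
}
```

Moving 10 differently named files is then one call with 10 filenames
after the destination directory. The loop keeps going after a failure
so that one bad filename does not stop the other nine.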

6. Since HPC is your goal, I'd go to compiled languages next. Start with 
the differences between an interpreted language like a shell script and 
a compiled language, and then explain the transition from source code to
assembly to machine language, and then libraries, etc. Again, general
concepts, without discussing a specific language.

7. Next start to introduce your compiled language of choice, again with 
really simple examples: Hello world, conditionals/loops, the move-a-glass
example from above, including arguments, etc.

8. Since this is HPC we're talking about, I'd start talking about 
optimization next. Don't confuse them with parallelism just yet. Talk 
about the cost of different operations, etc. Maybe come up with some 
simple programs with great optimization opportunities, and work through
all those optimizations together as a class.

9. Finally, I'd start introducing parallelism using MPI. I think the 
simple adding up a long string of numbers example is a good way to 
start. You could even have rank 1 just send its result to rank 0 using
MPI_Send and MPI_Recv, and then introduce collective operations. Again,
start really, really simple, and slowly build upon earlier lessons.

10. Where you go after that is up to you.

As you can see, this is a lot to cover. This is something I would cover 
over the course of a semester with weekly multihour classes. You didn't 
mention how much time you have, but I'm assuming it's a reasonably large
amount of time, given the scope in your original statement. I'm not
even sure I could cover this in the detail outlined above in a 14-week 
course. I guess I'll find out one day.

I hope you're still awake, and I hope that helps you. You should also 
check out the Software Carpentry website 
(http://software-carpentry.org/) for more ideas on this topic.

Prentice
