[Beowulf] NASTRAN on cluster

Tue Apr 12 06:58:27 PDT 2005

Mark Hahn wrote:

>>Nastran doesn't really want to run more than one job (MPI rank) per
>>node.
> 
> I bet that isn't true on dual-opterons.

NASTRAN loves memory bandwidth and hates sharing it.  A properly built 
dual Opteron, with node interleaving turned off in the BIOS, does pretty well.

>>The distro can/will have a significant impact on allocatable memory.
>>Nastran uses brk(2) to allocate memory, so the TASK_UNMAPPED_BASE is
>>significant.
> 
> can nastran run on amd64?  it might even run nicely as a ia32 process

Yes.  The AMD64 version was just released.

> on amd64.  just for curiosity:

It does.
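On the brk(2) point quoted above: brk grows the heap upward from the end of the program's data segment, and it can only grow until it hits the next mapping. In the legacy i386 layout with the default 3G/1G split, shared libraries and mmap regions start at TASK_UNMAPPED_BASE (0x40000000, i.e. 1 GB), so a brk-based allocator gets well under 1 GB of heap no matter how much RAM is installed; a distro or kernel that moves that base changes the number. A minimal sketch for measuring the headroom from /proc/<pid>/maps, assuming the usual Linux maps format with the heap labeled [heap]; brk_headroom is a hypothetical name:

```python
def brk_headroom(maps_text):
    """Bytes between the end of [heap] and the next mapping above it.

    maps_text is the contents of /proc/<pid>/maps (Linux format).
    Returns None if there is no [heap] line or nothing is mapped above it.
    """
    regions = []
    for line in maps_text.splitlines():
        # address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if not fields:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) == 6 else ""
        regions.append((start, end, path))
    regions.sort()
    for i, (start, end, path) in enumerate(regions):
        if path == "[heap]":
            # brk can only extend the heap up to the start of the next region
            return regions[i + 1][0] - end if i + 1 < len(regions) else None
    return None
```

On a running job, brk_headroom(open("/proc/<pid>/maps").read()) gives the bytes left before the heap runs into whatever is mapped above it.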

[...]

>>I can't comment on SATA, but PATA disks are a really bad choice, as they
>>require too much effort from the CPU to drive them--SCSI is MUCH
>>preferred in that case.
> 
> this is one of the longest-lived fallacies I've ever personally experienced.
> it was true 10+ years ago when PIO was the norm for ATA disks.  busmastering
> has been the norm for PATA for a long while.

Actually, under very intensive I/O load, cheap/crappy IDE controllers 
flood the CPU with interrupts; good-quality ones do not.  David (when 
he was at MSC) and the rest of the team working on this stuff used 
machines that had ... issues ... with their IDE interfaces.  The SCSI 
interfaces they used were top notch, while most vendors tend to toss in 
IDE as an afterthought, with very little attention paid (then or now) 
to building good controllers.  That leads to analyses and statements 
like this one, which are sometimes the product of bias and sometimes 
not (in this case, they were not).

Basically, I am saying he is right about the effect (a poor PATA 
experience with NASTRAN on their test systems), though the cause is not 
that PATA itself is crappy, but that PATA chip sets are usually quite 
crappy and flood the CPU with interrupts.  Then again, it is arguable 
that the net effect of a crappy PATA chip set is a crappy PATA 
experience, which would tend to reinforce this viewpoint.
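One way to see whether a controller is flooding the CPU is to sample /proc/interrupts twice under load and compare per-IRQ rates. A rough sketch, assuming the Linux /proc/interrupts layout (a header row of CPU columns, then one line per IRQ with per-CPU counts followed by controller and device names); the function names are hypothetical:

```python
import time


def parse_interrupts(text):
    """Map each IRQ label in /proc/interrupts text to its total count over all CPUs."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        total = 0
        for field in rest.split()[:ncpu]:
            if not field.isdigit():
                break  # reached the controller/device names
            total += int(field)
        totals[label.strip()] = total
    return totals


def interrupt_rates(before, after, seconds):
    """Interrupts per second for each IRQ between two snapshots."""
    b, a = parse_interrupts(before), parse_interrupts(after)
    return {irq: (a[irq] - b.get(irq, 0)) / seconds for irq in a}


def sample_interrupt_rates(seconds=5.0, path="/proc/interrupts"):
    """Take two live snapshots `seconds` apart (Linux only)."""
    with open(path) as f:
        before = f.read()
    time.sleep(seconds)
    with open(path) as f:
        after = f.read()
    return interrupt_rates(before, after, seconds)
```

Running sample_interrupt_rates() during a heavy scratch-I/O phase and comparing the disk controller's IRQ rate across machines gives a rough idea of which chip sets are the noisy ones.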

Note also: I believe that when they tested this, they went in with an 
open mind and real test cases.  As debugging IO systems wasn't their 
purpose in life, they probed only as far as they needed to and drew 
their conclusions.

> 
>>As for CPU v. I/O.  The factors are (in no order):
>>
>>fp performance
>>memory b/w
>>disk b/w
>>memory size
>>
>>Which of the above dominates the analysis depends on the analysis.
> 
> for the reported symptoms (poor scaling when using the second processor), 
> the first doesn't fit.  memory bw certainly does, and is Intel's main 
> weak spot right now.  then again, disk bw and memory size could also fit

Note where David works now.

> the symptoms (since they're also resources shared on a dual), but would be 
> diagnosable by other means (namely, both would result in low %CPU utilization; 
> the latter (thrashing) would be pretty obvious from swap traffic/utilization.)

I have a ... sneaking suspicion ... that David knows what he is talking 
about w.r.t. NASTRAN.
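Mark's diagnostic can be sketched by differencing two snapshots of /proc/stat (CPU time split across user, system, idle, iowait) and /proc/vmstat (the pswpin/pswpout swap page counters). A high iowait share points at disk, steady swap traffic points at memory size, and high user time with poor scaling leaves memory bandwidth as the suspect in his argument. A minimal sketch, assuming a 2.6-style kernel that reports iowait on the aggregate cpu line; the function names are hypothetical:

```python
# First seven columns of the aggregate "cpu" line in /proc/stat (2.6 kernels).
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def cpu_fractions(stat_before, stat_after):
    """Fraction of CPU time per state between two /proc/stat snapshots.

    Later columns (steal, guest, ...) are ignored by zip() if present.
    """
    def cpu_line(text):
        for line in text.splitlines():
            if line.startswith("cpu "):  # aggregate line, not cpu0/cpu1
                values = [int(v) for v in line.split()[1:]]
                return dict(zip(CPU_FIELDS, values))
        raise ValueError("no aggregate cpu line")

    b, a = cpu_line(stat_before), cpu_line(stat_after)
    deltas = {k: a[k] - b[k] for k in a if k in b}
    total = sum(deltas.values())
    return {k: v / total for k, v in deltas.items()} if total else {}


def swap_pages(vmstat_before, vmstat_after):
    """(pages swapped in, pages swapped out) between two /proc/vmstat snapshots."""
    def counters(text):
        return {k: int(v) for k, v in
                (line.split() for line in text.splitlines() if line.strip())}

    b, a = counters(vmstat_before), counters(vmstat_after)
    return a["pswpin"] - b["pswpin"], a["pswpout"] - b["pswpout"]
```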

> 
> like I said, I bet the problem is memory bandwidth.  mainly because I just don't
> see programs waiting on disk that much anymore - it does happen, but large 
> writes these days will stream at 50+ MB/s, and reads are often cached.

No, Mark.  MSC.NASTRAN belongs to the class of programs generally 
called IO monsters.  For good-sized problems they can and will consume 
every last MB/s you can throw at them on the IO channel.  You most 
definitely do not want a single spindle as your scratch space.  At SGI, 
years ago, we were trying to provide sustained GB/s file system 
performance for NASTRAN.
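A crude first check of whether a scratch area can keep up is sequential write throughput with an fsync at the end, so the page cache cannot hide the disk. A minimal sketch, assuming a POSIX system; the function name, path, and sizes are illustrative, and a meaningful run should use files much larger than the drive and controller caches:

```python
import os
import time


def sequential_write_mib_s(path, total_mib=256, block_kib=1024):
    """Write total_mib of zeros in block_kib chunks, fsync, and return MiB/s.

    The file is removed afterwards. The final fsync keeps the page cache
    from absorbing the writes, but drive/controller caches can still
    inflate the result for small runs.
    """
    block = b"\0" * (block_kib * 1024)
    nblocks = max(1, (total_mib * 1024) // block_kib)
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(nblocks):
            view = memoryview(block)
            while view:  # os.write may write fewer bytes than requested
                n = os.write(fd, view)
                view = view[n:]
            written += len(block)
        os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return written / (1024 * 1024) / elapsed
```

Pointing it at a file on the intended scratch file system, with a size well beyond the caches, gives a number to compare against the MB/s the solver phases actually want.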

With NASTRAN, you want the widest non-blocking IO channel you can get 
(widest in terms of MB/s, not physical bus width).  To get this, you 
need to look at striping data across multiple disks and controllers.  
You really do not want RAID5, RAID3, or anything else that computes 
parity getting in the way of moving data.
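The difference between plain striping and parity RAID can be sketched in a few lines. With RAID0-style striping a logical block maps to exactly one disk, so a write is one disk operation; with RAID5 a small write must also update parity, and the usual read-modify-write costs two reads and two writes (new parity = old parity XOR old data XOR new data). A toy model, not any real md or controller implementation; the names and stripe sizes are hypothetical:

```python
def raid0_map(lba, ndisks, stripe_blocks):
    """Map a logical block to (disk, block offset on that disk) for plain striping."""
    stripe = lba // stripe_blocks          # which stripe unit the block falls in
    disk = stripe % ndisks                 # stripe units rotate across the disks
    offset = (stripe // ndisks) * stripe_blocks + lba % stripe_blocks
    return disk, offset


def xor_bytes(*chunks):
    """Bytewise XOR of equal-length chunks (how RAID parity is computed)."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)


def raid5_small_write(old_data, new_data, old_parity):
    """Read-modify-write parity update: 2 reads (old data, old parity) + 2 writes."""
    new_parity = xor_bytes(old_parity, old_data, new_data)
    return new_data, new_parity


# Disk operations needed for one small write, to contrast the two layouts.
IOS_PER_SMALL_WRITE = {"raid0": 1, "raid5": 4}
```

Large full-stripe writes let RAID5 skip the reads, but the parity computation still sits in the data path, which is the overhead a scratch volume for an IO-bound solver is better off without.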

NASTRAN does pound on memory bandwidth (large linear algebra systems) 
and on the FPU.  I think they have been using the Intel compilers for 
the IA32 code, so they get some advantage on Intel CPUs (and the fast 
SSE paths get disabled in software on Opteron).  From what I 
understand, the AMD64 code was built with other compilers.

Here NASTRAN should benefit significantly, in terms of memory 
bandwidth, from a non-segmented memory system (the 64-bit AMD64 address 
space does not use a segment+offset addressing scheme).
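David's point that the dominant factor depends on the analysis can be made concrete with a crude bottleneck estimate: divide the work each resource must do by that resource's sustained rate, and the largest time wins. This assumes the phases overlap perfectly, which real runs do not, and the numbers below are made up for illustration, not NASTRAN measurements:

```python
def dominant_resource(work, rates):
    """Return (resource, seconds) for the slowest resource.

    work  -- amount each resource must handle (flops, bytes moved, ...)
    rates -- sustained rate of each resource, in the same units per second
    Assumes perfect overlap, so run time is bounded below by the maximum.
    """
    times = {name: work[name] / rates[name] for name in work}
    name = max(times, key=times.get)
    return name, times[name]


# Hypothetical job: 2e12 flops, 4e11 bytes of memory traffic, 2e11 bytes of
# scratch I/O, on a node sustaining 4 GFLOP/s, 3 GB/s memory and 100 MB/s disk.
work = {"fp": 2e12, "memory": 4e11, "disk": 2e11}
rates = {"fp": 4e9, "memory": 3e9, "disk": 1e8}
print(dominant_resource(work, rates))  # ('disk', 2000.0)
```

Raising the disk rate tenfold (say, by striping) drops that term to 200 s and makes the FPU term (500 s) the bottleneck, which is why the answer shifts from one analysis to the next.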

> I should mention that if HT is enabled on these duals, the problem could be 
> poor HT support in your kernels.  (HT creates two virtual processors for 
> each physical one.  if the scheduler treats HT-virtual processors as real,
> you will get very poor speedup.  this would also be diagnosable by simply
> running 'top' during a test.)

HT can help a few codes, but it will not generally help IO-bound codes.  
I have seen it win only in a very restricted subset of computationally 
intensive codes, and these are usually home-grown.
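Besides watching top, a quick way to tell whether HT siblings are showing up as extra processors is to compare the number of logical processors with the number of distinct (physical id, core id) pairs in /proc/cpuinfo. A sketch, assuming the kernel reports those fields; older kernels may report only physical id and siblings, in which case this sketch will not detect HT. The function name is hypothetical:

```python
def cpu_topology(cpuinfo_text):
    """Return (logical processors, distinct physical cores) from /proc/cpuinfo text.

    Cores are identified by (physical id, core id); if either field is
    missing, the processor number is used instead, which treats that
    logical CPU as its own core.
    """
    logical = 0
    cores = set()
    for block in cpuinfo_text.strip().split("\n\n"):  # one block per processor
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        cpu = fields["processor"]
        cores.add((fields.get("physical id", cpu), fields.get("core id", cpu)))
    return logical, len(cores)
```

If the first number is larger than the second, HT siblings are visible to the scheduler, and running one MPI rank per logical processor will put two ranks on one physical core.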

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615