[Beowulf] LSI Megaraid stalls system on very high IO?

mathog mathog at caltech.edu
Wed May 13 15:57:01 PDT 2015


On 31-Jul-2014 10:55, mathog wrote:
> On 31-Jul-2014 10:36, Joe Landman wrote:
>> On 7/31/14, 12:37 PM, mathog wrote:
>>> Any pointers on why a system might appear to "stall" on very high IO 
>>> through an LSI megaraid adapter?  (dm_raid45, on RHEL 5.10.)
>> What IO scheduler are you using?
>> 
>>     cat /sys/block/sd*/queue/scheduler
> 
> % cat /sys/block/sd*/queue/scheduler

I just ran into the same thing on another similar (but larger) Dell.  A 
job was run that spun off 20 subprocesses, each of which tried to read a 
different 11 GB file.  Each process opens its file, determines its size 
with seeks, allocates a big block of memory to hold the whole thing, and 
then reads the data into that block sequentially.  top showed almost no 
CPU time on these processes.  A minute or two in, the machine locked up 
tight for about 20 minutes.  Interestingly, when it once again started 
answering keystrokes, "top" showed each of these processes with 22 GB of 
virtual memory (see below).

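In outline each subprocess does something like this (a minimal sketch, 
not the actual code; the names are made up, and it assumes a 64-bit 
build so that ftell() can report an 11 GB offset):

   /* Sketch of the per-process read phase: open, size with seeks,
      malloc one block for the whole file, read it in sequentially. */
   #include <stdio.h>
   #include <stdlib.h>

   static char *slurp(const char *path, long *len)
   {
       char *buf;
       FILE *fp = fopen(path, "rb");
       if (fp == NULL) return NULL;

       fseek(fp, 0L, SEEK_END);           /* determine size with seeks */
       *len = ftell(fp);
       rewind(fp);

       buf = malloc((size_t)*len);        /* block holding whole file  */
       if (buf != NULL)
           fread(buf, 1, (size_t)*len, fp);   /* sequential read       */

       fclose(fp);
       return buf;
   }

With the buffer allocated up front like that, VIRT reflects the full 
file size as soon as malloc returns, while RES grows only as the read 
touches each page.
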
This machine has all of its disks packed into two volumes controlled by 
the megaraid adapter: one is a RAID and the other is just a small disk 
partition for swap.  Neither volume appears to have any scheduler 
enabled:

cat /sys/block/dm*/queue/scheduler
none
none

There is also sda, which seems to be the same external disk that gave 
me conniptions a while back in another thread, moved over to this 
machine (for no apparent reason).  That disk does have a scheduler, 
which is cfq.

How is it possible to mount a volume, RAID or no, with no scheduler???  
The fstab doesn't provide any clues:

/dev/mapper/vg_sitar-lv_root  /   ext4    defaults        1 1
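
As far as I know the dm nodes never show an elevator of their own: 
device-mapper volumes are bio-based, and the I/O scheduler belongs to 
the underlying physical devices, which sysfs lists in each dm device's 
slaves/ directory.  Something along these lines shows what actually 
sits under a dm volume (a quick sketch; "dm-0" is just an example name):

   /* Sketch: list the devices backing a device-mapper node.  The dm
      node has no scheduler itself; check queue/scheduler on the parent
      disk of whatever turns up here.  "dm-0" is only an example. */
   #include <stdio.h>
   #include <dirent.h>

   int main(void)
   {
       const char *dm = "dm-0";
       char path[256];
       DIR *d;
       struct dirent *e;

       snprintf(path, sizeof path, "/sys/block/%s/slaves", dm);
       d = opendir(path);
       if (d == NULL) { perror(path); return 1; }

       while ((e = readdir(d)) != NULL)
           if (e->d_name[0] != '.')
               printf("%s is backed by %s\n", dm, e->d_name);

       closedir(d);
       return 0;
   }

If the slaves turn out to be partitions on the megaraid virtual disk, 
then the cfq/deadline/noop setting for the RAID lives under that disk's 
own /sys/block entry rather than under the dm entries.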

The script that locked things up was run again with a limit of 15 
subprocesses.  This did not lock up (yet), and it shows 30-100% CPU on 
each; the CPU usage jumps around in some complex manner.  The strangest 
thing is that as they read in data, top shows VIRT of 11G and RES of 
something less than that.  When a process gets to the end of its input 
file, that is, when RES hits 11G, it closes the input file, opens the 
output file (which is the same file in this case), and then calls 
qsort.  Literally, it is just:

   fclose(fp);
   fp=fopen(out,"wb");
   qsort((void *)buffer, len_file/gbl_reclen, gbl_reclen, compare_records);

When that happens, virtual jumps from 11G to 22G (instantaneously) and 
CPU usage goes to 100%.  I can see why the CPU would be at 100% while 
qsort runs, but I cannot imagine why virtual should double, unless the 
kernel somehow charges the file cache for the output file to the 
subprocess.  Note that it did not do that while the same file was being 
read.
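
One suspect, if I remember the glibc internals correctly, is qsort 
itself: glibc implements qsort as a merge sort that mallocs a scratch 
buffer the same size as the array, falling back to an in-place 
quicksort only when that buffer would be too large a fraction of 
physical RAM.  On a 529 GB machine an 11 GB array stays on the merge 
sort path, so each process would need an extra 11 GB of virtual for 
the duration of the sort.  A small test along these lines should show 
it (a sketch; the 1 GB array is just to keep the run short):

   /* Sketch: sort a large array and look at VmPeak afterwards.  If
      qsort grabbed a scratch buffer the size of the array, VmPeak
      will come out at roughly twice VmSize. */
   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>

   static int cmp(const void *a, const void *b)
   {
       int x = *(const int *)a, y = *(const int *)b;
       return (x > y) - (x < y);
   }

   static void show_vm(const char *tag)
   {
       char line[256];
       FILE *fp = fopen("/proc/self/status", "r");
       if (fp == NULL) return;
       while (fgets(line, sizeof line, fp))
           if (!strncmp(line, "VmPeak:", 7) || !strncmp(line, "VmSize:", 7))
               printf("%s %s", tag, line);
       fclose(fp);
   }

   int main(void)
   {
       size_t i, n = (size_t)1 << 28;       /* 2^28 ints = 1 GB      */
       int *buf = malloc(n * sizeof *buf);
       if (buf == NULL) return 1;

       for (i = 0; i < n; i++)              /* touch every page      */
           buf[i] = (int)(n - i);

       show_vm("before:");
       qsort(buf, n, sizeof *buf, cmp);
       show_vm("after: ");                  /* VmPeak = high-water mark */

       free(buf);
       return 0;
   }

If VmPeak does come out near twice the array size, the mysterious 
extra 11 GB per process is just qsort's temporary buffer rather than 
file cache being charged to the process, which would also explain why 
it only shows up once the sort starts.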

Also, can somebody explain the point of having 4 GB of swap on a 529 GB 
RAM machine?  It is a CentOS box; perhaps that is just the way that OS 
sets things up?

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

