Block Sizes

Thu Oct 31 12:09:27 PST 2002

On Thu, Oct 31, 2002 at 01:48:29PM -0500, Robert G. Brown wrote:
> On Thu, 31 Oct 2002, Leandro Tavares Carneiro wrote:
> 
> > We can't use ramdisks because the amount of data that needs to be loaded is
> > huge, something like 100Gb of information needed to process the data. These
> > are tables with a lot of information; they are loaded on demand, and
> > which tables are loaded depends on the data being processed.
> > Right now we are trying to run this on an SGI machine, but the application
> > will also run on our clusters, with a different kind of parallelism.
> > The development team is looking for a way to write a routine in C that reads
> > with different block sizes, as "dd" does, but they are Fortran specialists
> > and are looking for help...
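
For the routine asked about above, here is a minimal sketch in C of reading a
file with a caller-chosen block size, roughly what dd does with bs=N. The
function name read_in_blocks is made up for illustration, and the loop only
counts bytes where the real code would process each chunk. It assumes a POSIX
system; on a 32-bit machine, compile with -D_FILE_OFFSET_BITS=64 so files
larger than 2 GB can be opened.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the whole file at `path` with read() calls of `block_size` bytes.
 * Returns the total number of bytes read, or -1 on error.
 * `read_in_blocks` is a hypothetical name, not an existing API. */
long long read_in_blocks(const char *path, size_t block_size)
{
    assert(block_size > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char *buf = malloc(block_size);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    long long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, block_size)) > 0) {
        /* buf[0 .. n-1] holds the next chunk; process it here. */
        total += n;
    }

    free(buf);
    close(fd);
    return (n < 0) ? -1 : total;   /* n < 0 means read() failed */
}
```

Timing calls such as read_in_blocks(path, 4096) against
read_in_blocks(path, 1 << 20) on the real tables is probably the quickest way
to see which block size the disks prefer.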

In addition to the RAID 5 disk arrays I suggested earlier, the best you can
do is set up a multithreaded IPC subsystem that moves the data from disk to
memory by direct DMA transfer. One way to do this is to use sendfile()
(see man 2 sendfile). This requires a recent version of glibc, because
sendfile64() support for large files was implemented only recently.
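
A rough sketch of the sendfile() call, assuming Linux. Note that sendfile()
copies between two descriptors inside the kernel, so the destination here is
another descriptor (for instance a socket or pipe leading to the consumer),
not a user buffer. The names send_file_to_fd and CHUNK_BYTES are made up for
illustration. On kernels before 2.6.33 the output descriptor must be a
socket; later kernels accept any file.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t: sendfile() becomes sendfile64() */
#include <assert.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_BYTES (1 << 20)  /* hypothetical per-call cap; tune for the hardware */

/* Push the whole file at `in_path` to `out_fd` without copying it through
 * user space. Returns the number of bytes sent, or -1 on error.
 * `send_file_to_fd` is a hypothetical name. */
long long send_file_to_fd(const char *in_path, int out_fd)
{
    assert(in_path != NULL && out_fd >= 0);

    int in_fd = open(in_path, O_RDONLY);
    if (in_fd < 0)
        return -1;

    struct stat st;
    if (fstat(in_fd, &st) < 0) {
        close(in_fd);
        return -1;
    }

    off_t offset = 0;              /* sendfile() advances this for us */
    while (offset < st.st_size) {
        off_t left = st.st_size - offset;
        size_t count = left > CHUNK_BYTES ? (size_t)CHUNK_BYTES : (size_t)left;
        ssize_t n = sendfile(out_fd, in_fd, &offset, count);
        if (n < 0) {
            close(in_fd);
            return -1;
        }
        if (n == 0)                /* file shrank underneath us */
            break;
    }

    close(in_fd);
    return (long long)offset;
}
```

Sending in capped chunks rather than in one call leaves a natural place to
add a throttle between iterations.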

This gets tricky if you don't have enough RAM to hold the data. In that case
the thread needs a throttle mechanism. You might look into having the thread
use mmap(): it could read a word from each page frame in a look-ahead
fashion, staying just ahead of the application as it rolls through the data.
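
To make the mmap() idea concrete, here is a minimal sketch of the work the
look-ahead thread would do on each pass. The names map_file_readonly and
prefetch_window are made up, thread creation and synchronization are left
out, and the sketch touches one byte per page where a word would do equally
well. The window argument is the throttle: it caps how far ahead of the
application the pages are faulted in. On a 32-bit machine a 100Gb file cannot
be mapped in one piece, so the mapping itself would have to be done a window
at a time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Map a whole file read-only. Returns the mapping (length in *len_out),
 * or NULL on error or for an empty file. Hypothetical helper. */
const char *map_file_readonly(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                     /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}

/* Touch one byte in every page overlapping [from, from + window), clipped
 * to the mapping, so the kernel faults those pages in before the
 * application reaches them. Returns the number of pages touched. */
size_t prefetch_window(const char *map, size_t map_len,
                       size_t from, size_t window)
{
    assert(map != NULL);

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t end = from + window;
    if (end > map_len || end < from)   /* clip to the mapping, guard overflow */
        end = map_len;
    if (from >= end)
        return 0;

    size_t touched = 0;
    for (size_t off = from - from % page; off < end; off += page) {
        /* volatile keeps the compiler from dropping the unused load */
        (void)*(const volatile char *)(map + off);
        touched++;
    }
    return touched;
}
```

The helper thread would call prefetch_window() in a loop, with from set a
little past the offset the application last reported, and sleep whenever it
gets a full window ahead.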

Obviously this would require some optimization and performance tuning. But
it appears to be workable.

cheers,
Karen
-- 
 Karen Shaeffer
 Neuralscape; Santa Cruz, Ca. 95060
 shaeffer at neuralscape.com  http://www.neuralscape.com