shaeffer at neuralscape.com
Thu Oct 31 12:09:27 PST 2002
On Thu, Oct 31, 2002 at 01:48:29PM -0500, Robert G. Brown wrote:
> On Thu, 31 Oct 2002, Leandro Tavares Carneiro wrote:
> > We can't use ramdisks because the amount of data that needs to be loaded is
> > huge, about 100 GB of information needed to process the data. These
> > are tables with a lot of information; they are loaded on demand, and
> > which tables are loaded depends on the data being processed.
> > Right now we are trying to run this on an SGI machine, but the application
> > will also run on our clusters, with a different kind of parallelism.
> > The development team is looking for a way to write a routine in C that reads
> > with different block sizes, as "dd" does, but they are Fortran specialists
> > and are looking for help...
In addition to the RAID 5 disk arrays I suggested earlier, the best you can
do is set up a multithreaded IPC subsystem that moves data from disk to
memory by direct DMA transfer. One way to do this is to use
sendfile() (see man 2 sendfile). This requires a recent version of glibc,
because sendfile64() support for large files was implemented only recently.
Now, this is tricky if you don't have enough RAM. In that case the thread
needs a throttle mechanism. You might look into having the thread use
mmap(): it could read a word from each page frame in a look-ahead fashion
as the application rolls through the data.
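
The mmap() look-ahead idea could be sketched roughly as follows. This is
only an illustration, not tested on your workload: the function names
map_file_readonly() and prefault_window() are made up for this example, and
the "window" argument stands in for the throttle (it limits how far ahead of
the application the helper thread is allowed to fault pages in). It touches
one byte per page rather than a word, which has the same effect.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only.
 * Returns NULL on failure; on success stores the length in *len_out. */
const unsigned char *map_file_readonly(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}

/* Hypothetical helper: touch one byte in every page that overlaps
 * [start, start + window), clipped to the mapping, so the kernel pages
 * the data in before the application reaches it.
 * Returns the number of pages touched. */
size_t prefault_window(const unsigned char *base, size_t map_len,
                       size_t start, size_t window)
{
    assert(base != NULL);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (start >= map_len)
        return 0;

    size_t end = (window > map_len - start) ? map_len : start + window;
    size_t touched = 0;
    for (size_t off = start - start % page; off < end; off += page) {
        /* volatile read so the compiler cannot drop the access */
        (void)*(const volatile unsigned char *)(base + off);
        touched++;
    }
    return touched;
}
```

The look-ahead thread would call prefault_window() repeatedly with "start"
set a little ahead of wherever the application currently is, and sleep when
it gets too far ahead. How large the window should be depends on how much
RAM you can spare, so that part is left to tuning.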
Obviously this would require some optimization and performance tuning, but
it appears to be workable.
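
As for the dd-style routine the developers asked about, a plain read() loop
with a caller-chosen block size may be enough to start experimenting with.
The sketch below is a minimal one under my own assumptions: read_in_blocks()
and the block_fn callback type are names invented for this example, and the
callback is just a placeholder for whatever the application does with each
block.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical callback: receives each block as it is read. */
typedef void (*block_fn)(const unsigned char *buf, size_t n, void *ctx);

/* Read the file at "path" in chunks of at most block_size bytes, much as
 * "dd bs=..." would, passing each chunk to fn (which may be NULL).
 * Returns the total number of bytes read, or -1 on error. */
long long read_in_blocks(const char *path, size_t block_size,
                         block_fn fn, void *ctx)
{
    assert(path != NULL && block_size > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    unsigned char *buf = malloc(block_size);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    long long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, block_size);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            total = -1;
            break;
        }
        if (n == 0)
            break;                  /* end of file */
        if (fn != NULL)
            fn(buf, (size_t)n, ctx);
        total += n;
    }

    free(buf);
    close(fd);
    return total;
}
```

Calling it with several block sizes (say 4 KB, 64 KB, 1 MB) and timing each
run should show fairly quickly which size suits your disks.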
Neuralscape; Santa Cruz, Ca. 95060
shaeffer at neuralscape.com http://www.neuralscape.com