Cluster programming...

Karl Bellve Karl.Bellve at umassmed.edu
Fri Jan 24 11:54:08 PST 2003


I want to thank everyone for the suggestions.

First, one problem was that some nodes did not have the fftw libraries
in their ld.so.conf, which caused them to fail and not output a result.
I corrected that.

Now I get two different behaviors when I run my application:

1) Locking activated. Every node except the master node writes properly
to the file. If I take the master node out of the possible choices,
everything works fine. I am not sure what is up with the master node. It
should be exactly like the others, except that it has two Ethernet cards.
Once in a while, I see another node fail to write out.

2) Locking not activated, just seek and write. The master node can now
write properly, but now I get random drops from other nodes, and not
always from the same node. Some runs show no drops.
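
For reference, the "lock, seek, write" pattern from case 1 can be sketched as below. This is only an illustration, not the application's actual code: it assumes each result is a fixed-size record written at its own offset, and the names write_record and RECORD_SIZE are hypothetical. It uses POSIX byte-range locks through Python's fcntl.lockf. Over NFS, such locks only work if the lock services are running on both the client and the server, and client-side caching can still interfere, so treat it as a starting point for testing rather than a fix.

```python
import fcntl
import os

RECORD_SIZE = 16  # hypothetical fixed size of one result, in bytes


def write_record(path, index, payload):
    """Write one fixed-size record into slot `index` under a byte-range lock."""
    if len(payload) != RECORD_SIZE:
        raise ValueError("payload must be exactly RECORD_SIZE bytes")
    offset = index * RECORD_SIZE
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Lock only this record's byte range; blocks until the lock is granted.
        fcntl.lockf(fd, fcntl.LOCK_EX, RECORD_SIZE, offset, os.SEEK_SET)
        try:
            os.pwrite(fd, payload, offset)  # positioned write: seek and write in one call
            os.fsync(fd)                    # flush the data before releasing the lock
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, RECORD_SIZE, offset, os.SEEK_SET)
    finally:
        os.close(fd)
```

Dropping the two lockf calls gives the plain "seek and write" variant from case 2.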

I might go with your option of cat-ing the files together at the end.
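
A minimal sketch of that approach, assuming each node writes its own file under a common prefix (the node.* naming follows the quoted suggestion; the function name concatenate is hypothetical). Files are joined in sorted name order, so zero-padded suffixes such as node.007 are assumed if the order matters, and the destination name must not match the pattern.

```python
import glob
import shutil


def concatenate(pattern, destination):
    """Append every file matching `pattern` to `destination`, in sorted name order."""
    parts = sorted(glob.glob(pattern))
    with open(destination, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # stream the copy; no whole-file read
    return parts
```

Called as concatenate("node.*", "bigfile") once all jobs have finished, this does the same job as the shell one-liner and needs no locking, because no two nodes ever write to the same file.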

I decided against using a lock file. Although I send out jobs
sequentially, they won't finish sequentially, which would delay some
nodes from getting another job if they finish early.



Jakob Oestergaard wrote:

>On Wed, Jan 22, 2003 at 10:52:31AM -0500, Karl Bellve wrote:
>  
>
>>I am running into a little problem about multiple writes to a single 
>>file via NFS.
>>    
>>
>
>Ok, first of all that sounds like a bad idea to begin with.
>
>Why not have each node write its own file, and run a "cat node.* >
>bigfile" afterwards?
>
>Quadratisch, praktisch, gut (square, practical, good)  ;)
>  
>

-- 
Cheers,



Karl Bellve, Ph.D.                   ICQ # 13956200
Biomedical Imaging Group             TLCA# 7938 		
University of Massachusetts
Email: Karl.Bellve at umassmed.edu
Phone: (508) 856-6514
Fax:   (508) 856-1840
PGP Public key: finger kdb at molmed.umassmed.edu






More information about the Beowulf mailing list