[Beowulf] Questions about a large job
Leandro Tavares Carneiro leandro at ep.petrobras.com.br
Tue Apr 18 07:28:46 PDT 2006
- Previous message: [Beowulf] request for comments for NFS v.3 vs NFS v.4
- Next message: [Beowulf] Questions about a large job
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi,

This weekend I tried to run HPL on our largest cluster, 1172 dual Opteron nodes. The network is Gigabit Ethernet, as our applications don't need and don't use much interprocess communication. I have 1148 dual nodes available (2296 CPUs) and configured HPL.dat to run on all of them. I have already tested the parameters, so I know they are good for this cluster.

I compiled HPL with PathScale using the ACML math library. The MPI used was LAM/MPI. I ran some tests with 10 nodes and they ran well. But when I tried to run with 2296 CPUs, the job would not start. Various errors occurred, a different one on each try.

The installed Torque version is 2.0.0p8, and it works fine with other large jobs, with 1000 CPUs. I must admit I have never tried to run a job of this size.

I know I may have made some mistake, but what I would like to know about is timeouts. The processes take a long time to start, and then don't start. When the job does start running (I can tell because HPL.out is created), it dies.

Do you guys have jobs larger than that running OK with Torque and LAM/MPI? Is there something I can do to speed up the start of the job?

I know I have lost track of the list, but any help would be great!

Thanks a lot.

--
Leandro Tavares Carneiro
Petrobras TI/TI-E&P/STEP
Suporte Tecnico de E&P
Av Chile, 65 sala 1501 EDISE - Rio de Janeiro / RJ
Tel: (0xx21) 3224-1427
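
For readers trying to reproduce a run of this size: the number of MPI processes has to match the P x Q process grid given on the Ps and Qs lines of HPL.dat. The message does not say which grid was used, so the following is only a minimal sketch, assuming one MPI process per CPU, that lists the grids available for 2296 processes. The function name `process_grids` is hypothetical and not part of HPL.

```python
def process_grids(nprocs):
    """Return all (P, Q) pairs with P * Q == nprocs and P <= Q."""
    grids = []
    p = 1
    while p * p <= nprocs:
        if nprocs % p == 0:
            grids.append((p, nprocs // p))
        p += 1
    return grids


# Figures taken from the message: 1148 available dual-CPU nodes.
nodes = 1148
cpus_per_node = 2
nprocs = nodes * cpus_per_node  # assumption: one MPI process per CPU

grids = process_grids(nprocs)
squarest = grids[-1]  # last entry has the largest P, i.e. closest to square

for p, q in grids:
    print(f"P={p:4d}  Q={q:4d}")
print("closest to square:", squarest)
```

Which of these grids performs best on a Gigabit Ethernet cluster is something to measure rather than assume; the sketch only enumerates the valid choices.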
