Issues with 2466 based cluster
Abhishek SINHA aby_sinha at yahoo.com
Fri Oct 18 17:58:54 PDT 2002
Hello list members,

I am monitoring an 8-node dual AMD Athlon cluster based on the Tyan 2466 with 4 GB RAM. I would add to Prof Brown's experience that it was hard to make them work at the first go, but once we got them stabilised we have seen good performance and a lot of stability from these systems.

Now the problem! One of our systems failed. The screen showed that the kernel had panicked. I tried rebooting the system; sometimes it would boot and sometimes it would not. One time when the system was up, I checked the message log and could find no errors listed. When we put a load on the system, it would panic again. Of course, putting a load on the system causes the CPU to run hotter.

Since I could not find any errors in the log and was getting no beeps when booting, I decided to pull the cover off and take a look. This was when I noticed that one CPU fan was turning slowly and making a lot of noise. It was more of a humming sound than a grinding or weird noise. At this point I contacted the vendor and asked for a replacement fan.

While waiting for the new fan, I decided to remove the failing fan and oil it. Oiling seemed to fix the fan, at least temporarily. In fact, the fan I oiled turned better than the other CPU fan (it spun 3 or 4 seconds longer than the other CPU fan when the power was turned off)! So I put the cover back and powered up the system. The system booted fine. When I put a load on the system (the same one that crashed the application), the job failed with strange errors. Sometimes the system panicked. I was beginning to think one of the CPUs was damaged.

When I received the replacement CPU fans from the vendor, I opened the cabinet to replace the failing fan and noticed that the other CPU fan was barely turning! The one I had oiled was spinning away. I replaced both fans and they are working correctly. The system is running fine. I stressed it today and it passed with flying colors.
Now, we are using the fans that come with AMD. They are really big, with nice blue fans on the top. They are the same fans the system came with, and the same fans that I replaced. I am now wondering about the cause and effect of this. Does it really make sense to replace all the fans? I know AMDs are known to run hot, and if we don't do anything I know we are going to burn out a processor. I need to understand whether I can do anything about it. I looked at some of the other systems and their fans seem to run slowly, at least when I watch them continuously :)

While I wait for all of your expert opinions on the issue, I will go and oil the fans... They call it system administration :)

Thanks
Abby
More information about the Beowulf mailing list
