[Beowulf] lost in parallel computing


CHEN, XIAOMING CHEN25 at engr.sc.edu
Wed Dec 7 12:53:16 PST 2005


Dear all,

I have been doing scientific parallel computing for 3 to 4 years, but as
a remote user I have never really dealt with parallel computer
management. Things work out if the remote computers I am working on are
managed well. However, when they are not in good hands, they will go on
'strike' for a long time. This is what I am experiencing now. One remote
cluster was recently relocated and lost its Myrinet interconnect. A new cluster
purchased from Dell hasn't been working since it was installed 3 months
ago. Another one behaves strangely. For example, it sometimes writes
data twice into a file in a random order, and a user cannot kill his
process without terminating the X Window session (i.e., exiting). I
guess nobody will step up to solve these problems during the holiday
season. But it seems such problems will continue to exist and evolve as
computer technologies themselves evolve. I am wondering whether it is
possible to build an inexpensive but robust parallel execution
environment. If a parallel computer is so difficult to maintain, how can
we persuade people to invest money in parallel computers?


This is my first post to the list. Please let me know if I have not
followed the rules. I appreciate your responses.

Xiaoming Chen
University of South Carolina



