[Beowulf] HPC fault tolerance using virtualization
Mike Davis jmdavis1 at vcu.edu
Tue Jun 16 12:06:39 PDT 2009
John Hearns wrote:
> 2009/6/16 Egan Ford <egan at sense.net>
>
> > I have no idea the state of VMs on IB. That can be an issue with
> > MPI. Believe it or not, but most HPC sites do not use MPI. They
> > are all batch systems where storage I/O is the bottleneck.
>
> Burn the Witch! Burn the Witch!
>
> Any HPC installation, if you want to show it off to alumni, august
> committees from grant awarding bodies etc. and not get sand kicked in
> your face from the big boys in the Top 500 NEEDS an expensive
> infrastructure of various MPI libraries. Big, big switches with lots
> of flashing lights. Highly paid, pampered systems admins who must be
> treated like expensive racehorses, and not exercised too much every
> day. They need cool beers on tap and luxurious offices to relax in
> while they prepare to do that vital half hours work per day which
> keeps your Supercomputer flashing away and making noises.

[rant]
I realize that this is humor, but one must remember just how sensitive system admins can be before making such statements. I would like to refer you to the BOFH (Bastard Operator from Hell) or, as I like to call it, the SysAdmin's guide to interpersonal relationships. Remember what these people do and, more importantly, what they can do.

On a serious note, who else gets out of bed at 3 am because an automated system indicates an issue with an HPC research cluster, or because the Computing Center calls to say that fresh water has been cut off and the building is warming, or because the water pumps (dual for redundancy but sharing one controller, now that's engineering) have failed, or because machine room power is dirty since half of the battery bank has shorted and the other half can't supply all of the needed clean power, etc., etc.?

In my experience, sysadmins don't want beer or luxurious offices. They want the tools that they need, proper managerial support, and respect.
[/rant]

--
Mike Davis                      Technical Director
(804) 828-3885                  Center for High Performance Computing
jmdavis1 at vcu.edu             Virginia Commonwealth University

"Never tell people how to do things. Tell them what to do and they will surprise you with their ingenuity." George S. Patton
