[Beowulf] hpl size problems
Donald Becker becker at scyld.com
Wed Sep 28 10:42:23 PDT 2005
- Previous message: [Beowulf] iptaled
- Next message: [Beowulf] hpl size problems
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Wed, 28 Sep 2005, Robert G. Brown wrote:

> laytonjb at charter.net writes:
> >> > At most you waste a few seconds and a few
> >> > hundred megabytes of disk by leaving them in,
> >> >
> >> Latter yes (more like GB), former no. Trimming the fat from the SuSE
> >> cluster install got it from over an hour down to about 8 minutes with
> >> everything, per node.

This is more typical: a distribution that comes on many CD-Rs isn't going
to be easily stripped down to something that can be loaded in a few
seconds. A stripped-down install will take on the order of 5 minutes.

> >> I think I hit the point of diminishing returns. I don't mind waiting up
> >> to about 10 minutes for a reload, beyond that, I mind.

This really isn't scalable: even 5 minutes per machine has a big impact on
how you consider operating a cluster of dozens or hundreds of machines.

> > Another thing to consider is that having all of that extra
> > stuff on the nodes leads to a huge security tracking
> > headache.
...
> > put back into production. By not having all of the cruft on
> > the nodes, our security headache could have been reduced.

Consider taking that idea to the logical conclusion: by eliminating
everything but the user applications on the nodes you can eliminate not
just the appearance of a security problem, you can eliminate the
opportunity.

There is a good reason for updating vulnerable daemons and services even
if they are not currently enabled. What if they are turned on later --
"gee, I'll just turn on the web server so that this new admin tool works
through the firewall".

> I agree, guys, I agree. My point wasn't that trimming cluster
> configurations relative to workstation or server configurations is a bad
> thing -- it is not, and indeed one would wish that eventually e.g.
> FC, RHEL, Centos, Caosity etc will all have a canned "cluster configuration"
> in their installers to join server and workstation, or that somebody
> will put up a website with a generic "cluster node" kickstart fragment
> containing a "reasonable" set of included groups and packages for people
> to use as a baseline that leaves most of the crap out.

We went down this path years ago. It doesn't take long to find the
problem with stripping down full installations to make minimal compute
node installs: your guess at the minimal package set isn't correct.

You might not think that you need the X Window system on compute nodes.
But your MPI implementation likely requires the X libraries, and perhaps a
few interpreters, and the related libraries, and some extra configuration
tools for those, and...

Yes, there are a number of labor-intensive ways to rebuild and repackage
to break these dependencies. But now you have a unique installation that
is a pain to update. There is no synergy here -- workstation-oriented
packages don't have the same motivations that compute cluster or server
people have.

> a) In most cases the crap doesn't/won't affect performance of
> CPU/memory/disk bound HPC tasks.

Except for additional cruft automatically installed and started. Your
compute nodes might not ever need 'xfs' (the X font server, not the file
system), but it will be started anyway.

> either a cluster operating system (e.g. scyld) or a really sparse and
> tuned install (e.g. sparse and tuned warewulf or a similarly sparse and
> tuned kickstart or...)

> b) There are a lot of things that can be USEFUL on general purpose
> cluster nodes. I always put editors on them, for example, and
> programming tools and compilers, because every now and then I'm logged
> into one and want to work on code.

You are starting out with the idea that you will be logging into every
node. Once you make that assumption, you need the whole set of support
tools.
Even something as simple as an editor that matches your primary
environment implies a whole set of additional support. ("Of course I
expect indent support for Prolog and syntax validation for APL!")

(Admittedly it's easy to mitigate this: put all of your admin/user
interaction tools on a network file system that only needs to be mounted
when logged in. But there are better solutions.)

> anyway). Similarly I want to be able to read man pages (while I'm coding
> for certain) so I put them on. They drag TeX along which I don't mind
> because I use it anyway for a lot of things and maybe will be doing a
> build in an application directory that has a tex-based manual or paper
> in it and it bugs me when a build fails because of missing resources.
> So do I put geyes on? Of course not.

Uhmmm, but my start-up depends on 3D-Xeyes and the Klingon fonts!
(..thanks for making my point.)

> Network daemons are an OBVIOUS EXCEPTION to this -- network services
> should ALWAYS be carefully considered, even on plain old LAN
> workstations, because of both security and performance.

A cluster is a single machine. It should run a single set of network
services. Pure compute nodes of the cluster need not duplicate services.
So you should start with no daemons (and no configuration files) rather
than stripping down and turning off (and writing bunches of ad hoc
configuration file generation scripts).

> Things like
> ipchains or ipfilters tend to be "expensive" overhead on all TCP/UDP
> connections, and overhead in parallel computations is anathema.

A different topic. It's one we can't win. The structure for ipchains and
ipfilters costs no matter what you do. Disabling them doesn't revert the
code to the simple, fast case. It just makes it impossible to use the
feature.

It's much the same as passing all output through
   "| grep -v $emptyvar | ..."
You can make it semantically do nothing by not actually having matching
rules, but you still have the overhead.
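The pipeline analogy can be made concrete with a small user-space sketch. This is only an illustration of the general point (a hook with no matching rules still costs something per item), not a model of how ipchains or netfilter is implemented. The names `deliver`, `make_hooked_deliver` and `hooked_deliver` are hypothetical and chosen for this example; the timings depend on the machine and interpreter and say nothing about real kernel overhead.

```python
import timeit


def deliver(packet):
    """Fast path: hand the packet straight to the consumer."""
    return packet


def make_hooked_deliver(rules):
    """Build a delivery function that consults a rule chain first.

    Hypothetical stand-in for a filtering hook; not how netfilter works.
    """
    def hooked(packet):
        for rule in rules:          # walked for every packet, even when empty
            if rule(packet):
                return None         # a matching rule drops the packet
        return deliver(packet)
    return hooked


# An empty rule chain: semantically a no-op, like a filter stage that
# has nothing to match.
hooked_deliver = make_hooked_deliver([])

packets = [bytes([i % 256]) * 64 for i in range(10_000)]

# Both paths produce identical results...
assert [hooked_deliver(p) for p in packets] == packets

# ...but the hooked path still pays for the extra call and the loop setup.
direct_s = timeit.timeit(lambda: [deliver(p) for p in packets], number=20)
hooked_s = timeit.timeit(lambda: [hooked_deliver(p) for p in packets], number=20)
print(f"direct: {direct_s:.4f}s  hooked (no rules): {hooked_s:.4f}s")
```

The only way to get back to the cost of `deliver` alone is to call it directly, that is, to not have the hook in the path at all.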
> So I agree, I agree -- thin is good, thin is good. Just avoid needless
> anorexia in the name of being thin -- thin to the point where it saps
> your nodes' strength.

You've got the wrong perspective: you don't build a thin compute node from
a fat body. You develop a system that dynamically provisions only the
needed elements for the applications actually run. That takes more than
a single mechanism to do correctly, but you end up with a design that has
many advantages. (Sub-second provisioning, automatic update consistency,
no version skew, high security, simplicity...)

-- 
Donald Becker                           becker at scyld.com
Scyld Software                          Scyld Beowulf cluster systems
914 Bay Ridge Road, Suite 220           www.scyld.com
Annapolis MD 21403                      410-990-9993
