disadvantages of linux cluster - admin
alvin at Maggie.Linux-Consulting.com
Wed Nov 6 15:56:13 PST 2002
hi ya robert

On Wed, 6 Nov 2002, Robert G. Brown wrote:

> On Tue, 5 Nov 2002 alvin at Maggie.Linux-Consulting.com wrote:
>
> > - one person should be able to maintain 100-200 servers ... within an 8hr
> >   period .... ( to hit reset if needed and boot it properly ) ...
> >   -- think google .. they have 10,000 machines... and at 5,000 servers,
> >   they still had 5-10 people maintaining 5,000 servers
>
> The limiting element (we've found) in either LAN or cluster is not
> software scaling at all.  We have OS installation down to a few minutes
> of work, and once installed tools like yum automate most maintenance.

I'd say ... one needs to have an "admin policy" put into place
that all users and root-admins must obey and follow

> It is hardware, humans, and changes.

hardware problem is sorta fixable ...
    - as you say... don't buy cheap hw to save a buck-or-two
      ( it's NOT worth it )
    - buy stuff you're comfy with, and won't be awakened at 3AM
      when the hw fails
    - hardware lifespan is 3 months.. after that, your favorite
      motherboard is out-of-stock or discontinued..

humans ... ( trainable and teachable and avoidable issues )
    - nobody has root passwd unless they are the ones receiving the
      3AM phone calls to come fix the machines "now"
    - give users an html web page to clicky-t-click to their heart's
      content to run the jobs or "command lines" to do the same
    - harden the server from the user standpoint
        - remove /usr/sbin and /usr/local/sbin from user access
        - no user has root passwd
        - remove passwd command, remove tar, remove make/gcc...

changes... ( most expensive if it breaks - depending on 3rd party sw )
    - after the server is built and patched and hardened..
    - you don't need to apply any new changes unless it fixes some kind
      of functionality that is needed and/or a security
      vulnerability/exploitability

> Hardware breaks and the
> probability of failure is proportional to the number of systems.
> Humans
> have problems with this package or application or that printer and those
> problems also scale with the number of systems.

if they have problems... they do NOT get to fix it.. and have a redundant
backup methodology and systems to do the same job ...

if done right ... i think things scale well... :-)

> Even if your
> OS/software setup is "perfect", you cannot avoid it costing minutes to
> hours of systems person time every time a system breaks, every time a
> human you're supporting needs help.  You also cannot avoid constantly

if one human needs help... it's likely others will need the same help...
    - send um to the internal "help docs"

> working on the future -- preparing for the next major revision upgrade,
> installing new hardware, building a new tool to save yourself time on
> some specific task that isn't yet scalably automated.
>
> This teaches us how to minimize administrative expense.
>
>   * Buy high quality hardware with on-site service contracts (expensive

if you need "on-site contracts" ... buy other hardware instead ...
as it implies that the hw will die ...

if you need onsite contracts ... they are expensive ... and your systems
will be down till they show up ... you cannot wait for them ???

lots of inhouse people probably would like to fiddle with the
dead/dying/flaky box for the same expensive time the outside service
contractors are paid ??? ( who actually get $10/hr - $20/hr to go
from site-to-site ... while the contract incident costs are
$150/hr - $300/incident and go upwards )

> up front but cheap later on) OR be prepared to deal with the higher rate
> of failure and increase in local labor cost.  Note that either strategy
> might be cost-benefit optimal depending on the number of systems in
> question and your local human resources and how well, quickly, and
> cheaply your vendor can provide replacement parts.
> To achieve the
> highest number of systems per admin person, though, you'll definitely
> need to go with the high quality hardware option.

cheaper to buy 2 systems.... than it is to buy support contracts..
( keep lots of "spare parts" floating around )
    - you need to back things up anyway.... those are your spare parts too
      and emergency replacements

>   * Shoot your users.  G'wan, admit it, you've thought about it.

give um GUIs to use .. :-)

> They
> just clutter up the computing landscape.  Well, OK, so we can't do that
> <sigh>.  So user support costs are relatively difficult to control,
> especially since it is a well known fact that all the things one might
> think of to reduce user administrative costs (providing extensive online
> documentation, providing user training sessions, providing individual
> and personalized tutorial sessions) are metaphorically equivalent to
> pissing into a category 5 hurricane.
>
>   * Don't upgrade.  Don't change.  Don't customize.

unfortunately... customization is always needed ???

customization is how to keep the costs down to almost zero ??
and self automating ... no matter which distro is used or compute
platform

> It is a well-known
> fact that one could get as much work done with the original slackware or
> RH 5.2 -- or even DOS -- as one can today with RH 8.0 (scaled for CPU
> speed, of course).  A further advantage of never changing is that
> eventually even the dullest of users figures out pretty much everything
> that can be done with the snapshot you've stuck with for the last five
> years.
>
> So Google can manage with a relatively few admin humans because they
> probably hide hardware expenses behind a fancy service contract (so that

they do NOT have service contracts...
    - if the pc dies for any reason... it goes offline and stays
      offline ... they have about 5,000 PCs that are just sitting there
      occupying space .. and doing nothing...
      it's powered off

it's cheaper for them to just buy a new P4-2G machine for $500 than
to go figure out what broke and why ... on an old server that
is obsolete ?? ( some being Pentium-class pcs )

> they REALLY have another ten full time Dell maintenance folks who do
> nothing but pull and fix systems all day long)

last i heard .. few months ago .. they buy generic PCs ... parts only ...
( think they might be farming out the generic physical pc-assembly part )
( of using this mb w/ that cpu and xxx memory w/ specific disk drive )
    - it needs to be able to do network boot/install
    - it needs to be able to run pxe

    - they build and install in something like 5 minutes
      and convert to a front end www machine or a "search index box"
      or user workstation or ??

fun stuff ... to try to keep machines up and running .. while
keeping the noise level from users at a minimum ...

c ya
alvin

> and because they don't
> have any users.  Well, they have LOTS of users but they're all far away
> and can't come into their offices ranting and don't expect their hands
> to be held while learning a simple command like ls with no more than a
> few dozen command line options.  And I'm sure that THEY never change a
> thing they don't have to, and dread the day they have to.
>
> More realistically, we're finding that in an active LAN/cluster
> environment, two full time admins are a bit stretched when the total
> number of LAN seats plus cluster nodes reach up towards 400-500, over
> 200 apiece, with all of the above (HHC) being the limiting factors.  One
> reason the Tyans we opted for for our last round of cluster nodes have
> been a problem is the anomalously high costs of installing them (see
> ongoing discussion of their quirky BIOS) and their relatively high rate
> of hardware failure.  We're now considering going back to Intel Xeon
> duals and are evaluating a loaner -- they are a bit more expensive but
> if they reduce human costs they'll be worth it.
>
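
The numbers traded in this thread can be put side by side with a rough back-of-the-envelope sketch. The machine and admin counts, the $500 replacement box and the $150/hr contract rate are the figures quoted in the message above; the 5% per-year failure rate and all function names are hypothetical, chosen only for illustration. It assumes failures are independent and identical across machines, which is the simplest reading of "probability of failure is proportional to the number of systems".

```python
"""Back-of-the-envelope numbers for the admin-scaling discussion."""


def machines_per_admin(machines: int, admins: int) -> float:
    """Average number of machines each admin is responsible for."""
    if admins <= 0:
        raise ValueError("need at least one admin")
    return machines / admins


def expected_failures(machines: int, failure_rate: float) -> float:
    """Expected failures per period for independent, identical machines.

    failure_rate is the per-machine probability of failing in the period,
    so the expectation grows linearly with the number of machines.
    """
    return machines * failure_rate


def breakeven_repair_hours(replacement_cost: float, hourly_rate: float) -> float:
    """Hours of paid diagnosis after which a replacement box is cheaper."""
    if hourly_rate <= 0:
        raise ValueError("hourly rate must be positive")
    return replacement_cost / hourly_rate


if __name__ == "__main__":
    # Ratios quoted in the thread.
    print("5,000 servers, 10 admins:", machines_per_admin(5000, 10))
    print("5,000 servers, 5 admins: ", machines_per_admin(5000, 5))
    print("500 seats+nodes, 2 admins:", machines_per_admin(500, 2))

    # ASSUMED: 5% of machines fail per year (hypothetical, not from the thread).
    assumed_rate = 0.05
    for n in (100, 500, 5000):
        print(n, "machines:", expected_failures(n, assumed_rate), "failures/yr")

    # $500 replacement box vs. $150/hr contract labor (both from the thread).
    print("breakeven hours:", round(breakeven_repair_hours(500, 150), 2))
```

Under these assumptions, a box that takes more than a few hours of contract time to diagnose costs more to repair than to replace, which is the "cheaper to buy 2 systems" argument in numeric form. Downtime while waiting for the contractor is not modeled here.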
More information about the Beowulf mailing list
