[Beowulf] [OT] HPC and University IT - forum/mailing list?
Mark Hahn hahn at physics.mcmaster.ca
Tue Aug 15 14:50:03 PDT 2006
> - integration of a cluster into a larger University IT infrastructure
> (storage, authentication, policies, et. al.)

just say no. we consciously avoid taking any integration steps that would involve the host institution trusting us or vice versa. well, not quite - we managed to live for ~5 years with our machines dual-homed, network-wise (albeit only ssh access.) but most of the (now 16) institutions are now beginning to worry (though we've _not_ had any security breaches). IMO, we'll continue to avoid any intimate integration, trust-wise, and stick to ssh, possibly augmented with a http-based user portal. the goal of loose coupling is _not_ inherently in conflict with good design (usability, robustness, manageability, cost-effectiveness.)

> - funding models (central funds, grant based) both for equipment and
> personnel.

yeah, our primary funding is pure-hardware, and we have to do some funny book-keeping to get people money out of it. (especially people other than "direct infrastructure support" - for instance, we have parallel programming consultants, who are not sysadmins, and who can potentially participate in research. the main funding agency will support sysadmins, but not these.)

> - centralized research IT versus local departmental/school support.

IMO, centralization breeds contempt ;) the institution where I sit does have central IT, but it's mostly limited to providing network/phones and groups that do the "business" side of things like grades, money, registration. pretty much everything research-related (incl a prof's desktop) is handled outside the central IT. most depts have some local experts, but a lot of the expertise is in a contract consulting group which reports to the VP-research. this seems to lead to a more responsive organization (though that could easily be a matter of people, history, innate niceness of Canadians, etc ;)

> - education and training

we run classes/seminars/symposia/etc several times a year at each site.
mostly, it doesn't seem too much to expect people to pick up the basics with little help (use an ssh client, choose a text editor, here's how to compile, and submit, where to store your files.) it's not all that clear how HPC fits into curricula, undergrad or higher. we have relatively little involvement by CS-ish people, and lots from in-domain expert researchers. we're just starting to figure out how to put some for-credit HPC training into the computational streams of some depts (physics, etc).

> - deployment issues (who pays F&M?)

facilities and maintenance? it's fuzzy for us. I have a compressor in a liebert reporting low suction right now, and I'm not sure who's going to fix it. technically, the hosting university owns the equipment, and signed a letter stating that they'd provide infrastructural support. our compute hardware is bought with 3yr NBD onsite support. is chiller maintenance different from server maintenance?

> - sustainability and growth

donno. our first round of hardware was installed in 2001 and was based on ~400p of alphas. some smallish refreshes happened, including 1-200p clusters, but the main refresh was installed in 1q06 (~7k cpus, mostly opterons.) (and all the new stuff was instantly full, of course). but I wouldn't claim a 75% annual growth rate was sustainable, and I'm not sure how I'd plan for the next round.

since I'm a technologist, my ideal would be a constant sampling of good-looking parts, and when the time is right, move quickly to buy a substantial facility. for instance, 2005 was a very bad year for most of our users, since we actually had _fewer_ cycles available due to renovations. pipelining the acquisitions would have been a lot better. not to mention that you really want to respond to products, not be driven entirely by funding cycles. there's a sweet-spot for buying any product, after the initial wrinkles are clarified, and when its unique properties are still unique.
what we're learning about Core2 right now, for instance, is quite fascinating, and should influence anyone buying a cluster in the next 6-9 months. after that, perhaps AMD K8L will need to be considered. interconnect-wise, InfiniPath seems to be still quite a wise choice, but perhaps the 10G market will finally do something...

that said, it's entirely possible to sustain a "rolling cluster": start with one generation, and incrementally move it forward. this is easiest if you have standard parts (plain old ethernet, IPMI, PXE, x86, 110/220 auto-sensing PS, 1U). people will still use old hardware, if you make it available and easy.
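
as a rough sanity check on the "75% annual growth rate" figure in the sustainability answer, here is a minimal sketch of the compound-growth arithmetic. it assumes that "~400p" and "~7k cpus" can be compared directly as processor counts and that 2001 to 1q06 counts as five years; the function name is made up for illustration, and the smaller interim refreshes are ignored.

```python
def annual_growth_rate(start_count: float, end_count: float, years: float) -> float:
    """Compound annual growth rate between two counts over a span of years."""
    if start_count <= 0 or end_count <= 0 or years <= 0:
        raise ValueError("counts and years must be positive")
    # Solve start_count * (1 + rate) ** years == end_count for rate.
    return (end_count / start_count) ** (1.0 / years) - 1.0


# Assumed inputs taken from the post: ~400 processors in 2001, ~7000 in 2006.
rate = annual_growth_rate(400, 7000, 5)
print(f"approximate annual growth: {rate:.0%}")
```

under those assumptions the result comes out in the high 70s percent, which is in line with the figure quoted in the post.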
