FW: Node cloning
Schilling, Richard RSchilling at affiliatedhealth.org
Fri Apr 6 11:50:52 PDT 2001
- Previous message: Does Linux support Multiple threads?
- Next message: Scyld & linpack
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Did you all hear that the English actually did this first? The hostname of the first node cloned on their server farm was "dolly", but it keeps crashing due to bad code transmitted in the cloning process . . .

Sorry . . . it's Friday and I couldn't resist.

Richard Schilling
Web Integration Programmer/Webmaster
phone: 360.856.7129
fax: 360.856.7166
URL: http://www.affiliatedhealth.org

Affiliated Health Services
Information Systems
1971 Highway 20
Mount Vernon, WA USA

-----Original Message-----
From: Richard C Ferri [mailto:rcferri at us.ibm.com]
Sent: Thursday, April 05, 2001 6:41 PM
To: Robert G. Brown
Cc: Giovanni Scalmani; beowulf at beowulf.org
Subject: Re: Node cloning

Since cloning continues to be a fertile topic, I'll jump right in... if you're not interested in node installation or cloning, skip this note...

I feel that the node installation problem using the NFS root/tftp/pxe boot approach has been solved by LUI (oss.software.ibm.com/lui) and others. I don't see why anyone would need to roll their own solution. When one defines a node to LUI, LUI creates a custom remote NFS root and updates dhcpd.conf or /etc/bootptab with an entry for the node. One chooses a set of resources to install on the node, and creates a disk partition table. Resources are lists of RPMs or tar files, custom kernels, individual files (/etc/hosts, /etc/resolv.conf, /etc/shadow would be good examples).

You either pxe boot the node, or boot from diskette using etherboot technology. The node boots, gets a custom boot kernel over the network via tftp, and transfers control. The kernel mounts the remote root, and reads the list of allocated resources. Based on the resources, the node partitions the hard drive, creates filesystems, installs RPMs or tar files, copies any specified files, installs a custom kernel, and so on. The software configures the eth0 device based on the IP info for that particular node, assigns a default route and runs lilo to make the node ready to boot.
If you allow rsh, LUI will also remove the /etc/bootptab entry, and optionally reboot the node. It keeps a log of all activity during install.

The goal of the LUI project is to install any distro on any architecture (ia-32, itanium, PowerPC and alpha). So far RedHat and ia-32 are supported; Suse and PowerPC are in test but not ready for prime time. It's an open source project, and open to contributors. Since LUI is resource based, and resources are reusable, it's perfect for heterogeneous clusters, clusters where nodes have different requirements.

Many people have said that the NFS/tftp/pxe solution doesn't scale and should be abandoned. Well, users have installed 80-way clusters using LUI, and while that's not huge, it's not dog meat either.

Simple cloning, basically copying an image from one golden node to another, changing some rudimentary info along the way, is performed today by SystemImager, based on rsync technology. rsync is superior to simple copy in that you can easily exclude files or directories (/var for example), and it can be used for maintenance as well. rsync does intelligent copying for maintenance -- it copies only files that are different on the source and target systems, and copies only the parts of the file that have changed. SystemImager and rsync are good solutions when the nodes in your cluster are basically the same, except for IP info and disk size.

Then there's kickstart. Well, it's ok if you do RedHat.

I think the real burning issue is not how to install nodes, but *whether* to install nodes or embrace the beowulf 2 technology from SCYLD. I think SCYLD is close to becoming the linux beowulf appliance, a turnkey commodity supercomputer. It will be interesting to see how many new clusters adopt traditional beowulf solutions, and how many adopt beowulf 2...

the view from here, Rich

Richard Ferri
IBM Linux Technology Center
rcferri at us.ibm.com
845.433.7920

"Robert G. Brown" <rgb at phy.duke.edu>@beowulf.org on 04/05/2001 06:47:46 PM
Sent by: beowulf-admin at beowulf.org
To: Giovanni Scalmani <Giovanni at lsdm.dichi.unina.it>
cc: <beowulf at beowulf.org>
Subject: Re: Node cloning

On Thu, 5 Apr 2001, Giovanni Scalmani wrote:

>
> Hi!
>
> On Thu, 5 Apr 2001, Oscar Roberto López Bonilla wrote:
>
> > And then use the command (this will take long, so you can do it overnight)
> > cp /dev/hda /dev/hdb ; cp /dev/hda /dev/hdc ; cp /dev/hda /dev/hdd
>
> I also did this way for my cluster, BUT I've experienced instability
> for some nodes (3/4 over 20). My guess was that "cp /dev/hda /dev/hdb"
> copied also the bad-blocks list of hda onto hdb and this looks wrong
> to me. So I partitioned and made the filesystems on each node and then
> cloned the content of each filesystem. Those nodes are now stable.
>
> A question to the 'cp gurus' out there: is my guess correct about
> the bad blocks list?

One of many possible problems, actually. This approach to cloning makes me shudder -- things like the devices in /dev generally have to be built, not copied, and there are issues with the boot blocks and bad block lists and the bad blocks themselves on both target and host. Raw devices are dangerous things to use as if they were flatfiles.

Tarpipes (with tar configured the same way it would be for a backup|restore, but writing to stdout and reading from stdin) are a much safer way to proceed. Or dump/restore pipes on systems that have it -- either one is equivalent to making a backup and restoring it onto the target disk.

One reason I gave up cloning (after investing many months writing a first generation cloning tool for nodes (which booted a diskless configuration, formatted a local disk, and cloned itself onto the local disk) and starting a second generation GUI-driven one) was that just cloning isn't enough.
There is all sorts of stuff that needs to be done to the clones to give them a unique identity (even something as simple as their own ssh keys), one needs to rerun lilo, and it requires that you keep one "pristine" host to use as the master to clone or you have the very host configuration creep you set out to avoid. Either way you end up inevitably having to upgrade all the nodes or install security or functionality updates. These days there are just better ways (in my opinion) to proceed if your goal is simple installation and easy upgrade/update and low maintenance.

Cloning is also very nearly an irreversible decision -- if you adopt clone methods it can get increasingly difficult to maintain your cluster without ALSO developing tools and methods that could just as easily have been used to install and clean things up post-install.

Even so, if you are going to clone, I think that the diskless->local clone is a very good way to proceed, because it facilitates reinstallation and emergency operation of a node even if a hard disk crashes (you can run it diskless while getting a replacement). It does require either a floppy drive ($15) or a PXE chip, but this is a pretty trivial investment per node.

rgb

--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20010406/e48cffb5/attachment.html
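
For anyone who wants to try the tarpipe that rgb recommends over `cp /dev/hda /dev/hdb`, here is a minimal sketch in shell. It is an illustration only, not anything posted in the thread: the function name `clone_tree` and the mount points in the usage note are made up, and it assumes the target disk has already been partitioned, given a filesystem and mounted (the step Giovanni describes doing by hand).

```shell
#!/bin/sh
# Sketch of a "tarpipe": copy the contents of one mounted filesystem into
# another through tar, instead of copying the raw block device.
# clone_tree is a hypothetical helper name, not part of any tool in the thread.
clone_tree() {
    src=$1
    dst=$2
    # Refuse to run unless both arguments are existing directories.
    [ -d "$src" ] && [ -d "$dst" ] || {
        echo "usage: clone_tree SRC_DIR DST_DIR" >&2
        return 1
    }
    # Writer: archive everything under SRC to stdout.
    # Reader: unpack from stdin into DST, preserving permissions (-p).
    (cd "$src" && tar -cf - .) | (cd "$dst" && tar -xpf -)
}
```

Usage would look something like `clone_tree /mnt/golden /mnt/target`, with both paths being assumed mount points. Because only files are copied, the target keeps its own partition table, boot block and bad-block list. The per-node steps rgb mentions (unique ssh keys, rerunning lilo) still have to be done afterwards.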
