[Beowulf] Difference between a Grid and a Cluster???
Joe Landman landman at scalableinformatics.com
Thu Sep 22 05:05:23 PDT 2005
In a nutshell, a grid defines a virtualized cloud of processing/data motion across one or more domains of control and authentication/authorization, while a cluster provides a virtualized cloud of processing/data motion across a single domain of control and authentication/authorization. Clusters are also often more tightly coupled than grids, via low latency networks or high performance fabrics. Then there is the relative hype and the marketing/branding ...

Robert G. Brown wrote:
> Mark Hahn writes:
> [...]
> To be really fair, one should note that tools have existed to manage
> moderate cluster heterogeneity for single applications since the very
> earliest days of PVM. The very first presentation I ever saw on PVM in
> 1992 showed slides of computations parallelized over a cluster that
> included a Cray, a pile of DEC workstations, and a pile of Sun
> workstations. PVM's aimk and arch-specific binary path layout was

aimk is IMO evil. Not PVM's in particular, but aimk in general. The one I like to point to is the one in Grid Engine. It is very hard to adapt to new environments.

When you run on multiple heterogeneous platforms and you are dealing with floating point codes, you need to be very careful with a number of things, including rounding modes, precision, the sensitivity of the algorithm to roundoff error accumulating at different rates, and the fact that PC floating point units work with 80-bit registers, while RISC and Cray machines use 32 and 64 bits respectively for single precision and 64 and 128 bits for doubles. It could be done, but if you wanted reliable/reasonable answers, you had to be aware of these issues and make sure your code was designed appropriately.

[...]
> Some of the gridware packages do exactly this -- you don't distribute
> binaries, you distribute tarball (or other) packages and a set of rules
> to build and THEN run your application. I don't think that any of these
> use rpms, although they should -- a well designed src rpm is a nearly

RPM is not a panacea. It has lots of problems in general.
The idea is good, but the implementation ranges from moderately OK to absolutely horrendous, depending upon what you want to do with it. If you view RPM as a fancy container for a package, albeit one that is slightly brain damaged, you are least likely to be bitten by some of its more interesting features. What features? Go look at the RedHat kernels circa 2.4 for all the work-arounds they needed to deal with its shortcomings.

I keep hearing how terrible tarballs and zip files are for package distribution. But you know, unlike RPMs, they work the same everywhere. Sure, they don't have versioning and a file/package registry for easy installation/removal. That is an easily fixable problem IMO. Sure, they don't have scripting of the install. Again, this is easily fixable (jar files and par files are examples that come to mind for software distribution from Java and Perl respectively).

> Most grids are likely not THAT hardware heterogeneous so that only a
> handful (e.g. i386, x86_64) of binaries need to be maintained. Because
> of binary compatibility, these grid applications give up at most certain
> optimizations when run on imperfectly matched platforms, e.g. i386 on an
> Opteron. That leaves plenty of room for very beneficial scaling as far
> as the cycle consumer is concerned, even if it is less than hardware
> optimal. It also permits the grid organization to trade off the human
> costs of managing multiple binary images against the efficiency costs of
> running a generic version even where it isn't optimal.

Generally, if an application wrapper is in a container, say a .jar file (not advocating Java, but just bear with me), which runs on the execution target and copies the relevant binary into a temporary binary directory, then you don't need anything installed on the grid execution host apart from a queuing system connection.
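To make the wrapper idea concrete, here is a minimal sketch in Python, standing in for the .jar container (the message itself says Java is only an example). It assumes a hypothetical bundle layout with one prebuilt binary per architecture under `bin/<arch>/`; the function names, the alias table, and the i386 fallback policy are illustrative assumptions, not part of any real grid toolkit.

```python
import os
import platform
import shutil
import stat
import subprocess
import tempfile

# Hypothetical alias table: maps names that platform.machine() may report
# onto the architecture directories the bundle was built for.
ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "i386": "i386",
    "i486": "i386",
    "i586": "i386",
    "i686": "i386",
}


def select_binary(bundle_dir, program, machine=None):
    """Return the path of the per-architecture binary for this host.

    On x86_64 hosts, falls back to the generic i386 build if no native
    build was shipped, giving up some optimizations but still running.
    """
    machine = (machine or platform.machine()).lower()
    arch = ARCH_ALIASES.get(machine)
    if arch is None:
        raise RuntimeError(f"no build for architecture {machine!r}")
    candidates = [arch] + (["i386"] if arch == "x86_64" else [])
    for candidate in candidates:
        path = os.path.join(bundle_dir, "bin", candidate, program)
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(f"{program} not built for {arch}")


def stage_and_run(bundle_dir, program, args=()):
    """Copy the selected binary into a scratch directory and run it there.

    Nothing needs to be preinstalled on the execution host; the scratch
    directory is deleted when the job finishes.
    """
    src = select_binary(bundle_dir, program)
    with tempfile.TemporaryDirectory(prefix="gridjob-") as scratch:
        dst = os.path.join(scratch, program)
        shutil.copy2(src, dst)
        os.chmod(dst, os.stat(dst).st_mode | stat.S_IXUSR)
        return subprocess.run([dst, *args], check=True)
```

The only thing the execution host has to provide is a Python interpreter (in this sketch) and whatever launches the job. `tempfile` honors the `TMPDIR` environment variable, which matters if the default scratch filesystem is mounted `noexec`.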
> Basically, it isn't that hard to manage binaries for x86_64 and i386 --
> I have to do this in our own cluster, let alone a grid. Nor is it that
> bad (performance-wise) if you have to run i386 on x86_64.

We have seen up to a factor of 2 difference on chemistry codes. If your run takes 2 weeks (a number of our customers' runs take longer than 2 weeks), it matters. If your run takes 2 minutes, it probably doesn't matter unless you need to do 10000 runs. It is not hard to manage binaries in general with a little thought and design. It is not good to deliberately run a high performance computational resource at a lower speed unless the cost/pain of getting the better binaries is too large, or getting them is simply impossible (e.g. some of the vendor code out there is still compiled against RedHat 7.1 on i386, which makes supporting it ... exciting ... and not in a good way).

> For most of
> the (embarrassingly parallel) jobs that use a grid in the first place,
> the point is the massive numbers of CPUs with near perfect scaling, not
> how much you eke out of each CPU.

Grids are used not just for embarrassingly parallel jobs. They are also used to implement large distributed pipeline computing systems (in bio, for example). These systems have throughput rates governed in large part by the performance per node. Running on a cluster would be ideal in many cases, as you will have that nice high bandwidth network fabric to help move data about (gigabit is good, IB and others are better for this). Rapidly emerging from the pipeline/grid world for bio computing is something we have been saying all along: apart from authentication, scheduling, etc., the major pain is data motion. There, CPU speed/type doesn't matter all that much. The problem is trying to force fit a steer through a straw.
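The claim that pipeline throughput is set by per-node performance and, above all, by data motion can be illustrated with a back-of-the-envelope model. This is a minimal sketch under simplifying assumptions that are not from the original message: a linear pipeline in steady state, each stage limited either by its compute time or by the time to move its input over the link feeding it, and independent pipes whose throughputs simply add. The stage names and figures in the usage section are hypothetical, chosen for illustration rather than measured.

```python
from dataclasses import dataclass, replace


@dataclass
class Stage:
    name: str
    compute_s: float         # seconds of CPU time per work item on one node
    input_bytes: float       # bytes that must arrive before the item is processed
    link_bytes_per_s: float  # effective bandwidth of the link into this stage


def stage_rate(stage):
    """Items/s a stage can sustain, assuming transfer overlaps compute."""
    transfer_s = stage.input_bytes / stage.link_bytes_per_s
    return 1.0 / max(stage.compute_s, transfer_s)


def pipeline_throughput(stages, pipes=1):
    """Steady-state items/s: the slowest stage caps each pipe; pipes add up."""
    bottleneck = min(stages, key=stage_rate)
    return pipes * stage_rate(bottleneck), bottleneck.name


if __name__ == "__main__":
    GIGE = 125e6  # nominal gigabit in bytes/s, ignoring protocol overhead
    stages = [
        Stage("align", compute_s=0.5, input_bytes=2e6, link_bytes_per_s=GIGE),
        Stage("annotate", compute_s=0.2, input_bytes=500e6, link_bytes_per_s=GIGE),
    ]
    print(pipeline_throughput(stages))           # (0.25, 'annotate'): data bound
    print(pipeline_throughput(stages, pipes=8))  # (2.0, 'annotate'): many straws
    fast = [replace(s, link_bytes_per_s=10 * GIGE) for s in stages]
    print(pipeline_throughput(fast))             # (2.0, 'align'): now CPU bound
```

Using `max` rather than a sum assumes each stage can receive the next item while computing on the current one; without that overlap the two times would add. The `pipes=8` case also assumes the pipes do not share links; on a shared fabric the aggregate would be capped by the fabric instead. With a link ten times faster, the bottleneck moves from data motion to compute, which is where CPU performance starts to matter again.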
There are other problems associated with moving data through these pipelines as well, but what matters in these systems is throughput (not the number of embarrassingly parallel jobs per unit time, but how many threads and how much data you can push through per unit time). To use the steer and straw analogy, you can build a huge pipeline by aggregating many straws. Just don't ask the steer how he likes having parts of him pushed through many straws. The pipeline for these folks is the computer (no, not the network). Databases factor into this mix, as do other things. The computations are rarely floating point intensive. Individual computation performance does matter, as pipelines do have transmission rates at least partially impacted by CPU performance. In some cases, long pipelines with significant computing tasks are CPU bound and can take days/weeks. These are prime cases for acceleration by leveraging better CPU technology.

>> in that way of thinking, grids make a lot of sense as a
>> shrink-wrap-app farm.
>
> Sure. Or farms for any application where building a binary for the 2-3
> distinct architectures takes five minutes per and you plan to run them
> for months on hundreds of CPUs. Retuning and optimizing per
> architecture being strictly optional -- do it if the return for doing so
> outweighs the cost. Or if you have slave -- I mean "graduate student"
> -- labor with nothing better to do:-)

Heh... I remember doing empirical fits to energy levels and band structures and other bits of computing as an integral part of the computing path for my first serious computing assignment in grad school. I seem to remember thinking it could be automated, and starting to work on the Fortran code to do so. Perl was quite new then, not quite at version 3.

Pipelines are set up and torn down with abandon.
They are virtualized, so you never know which bit of processing you are going to do next, where your data will come from, or where it is going until you get your marching orders. It is quite different from Monte Carlo. It is not embarrassingly parallel per node, but per pipe, and a pipe may use one through hundreds (thousands) of nodes.

Most parallelization on clusters is the wide type: you distribute your run over as many nodes as practical for good performance. Parallelization on grids can either be trivial, à la Monte Carlo, or pipeline based. Pipeline-based parallelism gets work done by building the longest practical pipeline path, keeping as much work done per unit time as possible (while keeping the communication costs down). Call this deep type parallelism. On some tasks, pipelines are very good for getting lots of work done. For other tasks they are not so good. There is an analogy with current CPU pipelines if you wish to make it.

Joe

> rgb
>> regards, mark hahn.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
