[Beowulf] Parallel memory
Richard Walsh rbw at ahpcrc.org
Wed Oct 19 07:26:40 PDT 2005
Henderson, T Todd @ IS wrote:

> Are there any drivers, tools, etc. that can make the memory space on a
> cluster look shared, something like pvfs but for memory? I'm sure
> there would be a speed hit, but in this instance, speed isn't the
> problem as much as memory. We have a code that uses a huge amount of
> memory and the memory usage is proportional to the cube of the problem
> size, but the time for it to run isn't too much of an issue. I've
> been asked to parallelize the code using mpi which is going to be a
> major effort. However, I thought that if there was any way, even if
> inefficient speed-wise, to create a virtual parallel memory system it
> would be better than it using swap space and save a bunch of coding time.
>
> Also, are there any tools to help implement mpi in an older code?
>
> Any thoughts?
>
> Thanks,
> Todd

Todd,

I have not read through all the replies, but options to investigate include the Unified Parallel C (UPC) and Co-Array Fortran (CAF) programming models. These are parallel programming extensions to standard C and F90. They prefer hardware that supports instruction-level remote memory access, like that of the Cray X1 and SGI Altix (so-called partitioned global address space, or PGAS, memory), where they outperform MPI in both bandwidth and latency measures. There are also source-to-source compilers available for clusters that compile to RDMA abstraction layers/libraries. Most of the common cluster interconnects have such a library, so you can grab a version of the Berkeley UPC compiler for your particular interconnect/cluster.
In UPC, global arrays are declared as shared:

    shared double mass[32][THREADS];

and distributed across the abstracted global address space according to a blocking factor that you can control. References are implicitly mixed between data that happens to be local and the rest that is remote. Performance is managed by minimizing remote references through that blocking factor, by casting shared pointers to local pointers, and by affinity-mapped indexing in looping structures. The performance issues that I am sure others are mentioning do not disappear, but the coding is more implicitly parallel and more idiomatic than message passing, and it can be introduced incrementally, even into MPI code.

In CAF, you declare a co-array that creates a copy on each image/processor:

    real, dimension(1:100)[*] :: mydata

Then any reference to a non-local piece of the co-array, like:

    mass = mydata(59)[4]

implies a remote reference (this assignment grabs the 59th element from the 4th image on each image executing this code). Local references just drop the []s. Again, the performance issues related to remote references do not disappear, but they are abstracted away in the particular RDMA library for the interconnect of the cluster.

The performance target for these models is to equal or slightly improve on MPI on cluster systems while simplifying the conversion of serial code to parallel. There are two competing abstraction libraries, GASNet and ARMCI. Google these and the two extensions. If you are interested, I can send you my course notes for both extensions. There are several good references as well, and a recently published book on UPC.

These are not replacements for MPI, especially in the short run, but they do create the impression of a single memory space regardless of the real remoteness of each partition.

Regards,

Richard Walsh
AHPCRC

--
Richard B. Walsh
Project Manager
Network Computing Services, Inc.
Army High Performance Computing Research Center (AHPCRC)
rbw at ahpcrc.org | 612.337.3467
