[Beowulf] I/O bound simulation
Joe Landman landman at scalableinformatics.com
Fri Nov 30 15:33:40 PST 2007
Hi Mark:

Mark Kozikowski wrote:

[...]

> When I approach the higher fidelity levels, the simulation starts
> to choke on the quantity of data being processed.
>
> It appears that the system is failing on I/O. Transferring large
> amounts of time critical data between process elements.

Could you describe what you mean by "choke" and what you mean by "failing on I/O"? This would help enormously. Also, could you tell us what uname -a reports, as well as what type of network you are using, and the NIC and switch type for laughs? (Intel, Broadcom, SMC, ...)

> I am running on a mostly standard Red Hat distro, no special
> compiling or running architectures are in place.

Is this something you built from source? Using MPI?

>
> Do any of you have suggestions as to how I might start
> getting control of this I/O problem?

First is problem identification, which you may have gotten a good start on. It would help to know what I indicated above. Also, it might be worth it if you grab a copy of dstat (http://dag.wieers.com/rpm/packages/dstat/) and atop (http://dag.wieers.com/rpm/packages/atop/) and install them.

Dstat is your friend (though it does make mistakes on aggregate IO calculations, it is useful for figuring out other relevant information). Atop is your friend on your file server node.

Run atop on the head node, and dstat on the compute nodes while running your job. Try to capture some of this output ... simple cut and paste is fine. If you can show a "choked" versus "non-choked" run, this would help immensely in diagnostics.

Once we are sure where the point of pain is, the next step would be planning how to remediate it.

Joe

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615
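As a rough illustration of the "choked" versus "non-choked" comparison suggested in the message, here is a minimal Python sketch. It assumes the captured samples have been saved as simple CSV text with a header row and one sample per line. The column names (`time`, `net_recv`, `disk_writ`) and the sample values are hypothetical placeholders, not the actual output format of dstat or atop, and the helper names `summarize` and `compare` are made up for this example.

```python
import csv
import io
from statistics import mean


def summarize(csv_text):
    """Return {column: (mean, peak)} for every numeric column in csv_text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    summary = {}
    for column in rows[0].keys():
        try:
            values = [float(row[column]) for row in rows]
        except ValueError:
            continue  # skip non-numeric columns such as timestamps
        summary[column] = (mean(values), max(values))
    return summary


def compare(good_csv, bad_csv):
    """Return {column: ratio of mean in the bad run to mean in the good run}."""
    good = summarize(good_csv)
    bad = summarize(bad_csv)
    return {
        column: bad[column][0] / good[column][0]
        for column in good
        if column in bad and good[column][0] != 0
    }


# Hypothetical samples: made-up numbers, made-up column names.
GOOD_RUN = "time,net_recv,disk_writ\n15:33:40,100,50\n15:33:41,120,50\n"
BAD_RUN = "time,net_recv,disk_writ\n15:40:10,900,50\n15:40:11,1080,50\n"

if __name__ == "__main__":
    for column, ratio in compare(GOOD_RUN, BAD_RUN).items():
        print(f"{column}: bad/good mean ratio = {ratio:.2f}")
```

A column whose ratio sits far from 1 while the others stay flat is a reasonable first candidate for the point of pain, though averages can hide short bursts, so the peak values returned by `summarize` are worth checking as well.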
