[Beowulf] [jak@uiuc.edu: Re: [APPL:Xgrid] [Xgrid] Re: megaFlops per Dollar? real world requirements]
Eugen Leitl eugen at leitl.org
Sun May 15 04:28:03 PDT 2005
- Previous message: [Beowulf] CFP: HPC-Asia 2005
- Next message: [Beowulf] [jak@uiuc.edu: Re: [APPL:Xgrid] [Xgrid] Re: megaFlops per Dollar? real world requirements]
----- Forwarded message from "Jay A. Kreibich" <jak at uiuc.edu> -----

From: "Jay A. Kreibich" <jak at uiuc.edu>
Date: Sat, 14 May 2005 01:08:51 -0500
To: xgrid-users at lists.apple.com
Subject: Re: [APPL:Xgrid] [Xgrid] Re: megaFlops per Dollar? real world requirements
User-Agent: Mutt/1.4.2.1i
Reply-To: jak at uiuc.edu

On Thu, May 12, 2005 at 01:45:45PM -0500, Jay A. Kreibich scratched on the wall:

> IPoFW performance is very very low. Expect 100Mb Ethernet (yes,
> that's "one hundred") to provide better performance than 400Mb FW.
> There was a big discussion about this many months ago that led to
> Apple removing any references to IPoFW from their Xserve and cluster
> web pages. The utilization difference is that big.

It appears that there are members on this list who disagree with me and would rather cuss at me in private than have an intelligent, rational discussion with the whole group. Since they chose harsh language over running a few simple bandwidth tests, I ran the tests myself (numbers below), and will direct a few comments at the group as a whole. Maybe others can contribute some meaningful comments. If you disagree with me, at least do it in public.

> While the raw bandwidth numbers for FireWire are higher, the FireWire
> MAC layer is designed around block transfers from a disk, tape, or
> similar device.

First off, let's be sure we're all on the same page. The original question was about the use of Xgrid over FireWire based networks. Since Xgrid runs on top of BEEP over TCP/IP, the question really boils down to one of performance of IP over FireWire, i.e., IPoFW. It is important to understand that this is not an encapsulation of an Ethernet stream on the FireWire link, or some other more traditional networking technology, but actually running FireWire as the Layer-2 transport for IP. RFC-2734 explains how this is done.
<http://www.rfc-editor.org/rfc/rfc2734.txt>

The problem with IPoFW is that FireWire is designed as an infrastructure interconnect, not a networking system. It has a lot more in common with systems like SCSI, HiPPI, and Fibre Channel than it does with systems like Ethernet. Since every major networking technology of the last 30 years has been frame/packet or cell based (and even cell is getting more and more rare), it shouldn't be a big shock that most traditional networking protocols (e.g. IP) are designed and tuned with these types of physical transport layers in mind. While FireWire is much better at large bulk transfers, it is not so hot at moving lots of very small data segments around, such as individual IP packets.

In many ways, it is like the difference between a fleet of large trucks and a train of piggy-back flat cars. Both are capable of transporting the same basic unit of data, but each is designed around a different set of requirements. Each has its strengths and weaknesses, depending on what you are trying to do. If you're trying to move data en masse from a disk (or video camera) to a host system, the train model will serve you much better. The connection setup is expensive, but the per-unit costs are low assuming a great number of units. If, on the other hand, you're trying to download data from the web, the truck model is a better deal. The per-unit costs are a bit higher, but the system remains fairly efficient with lower numbers of units since the connection setup cost is much lower.

So if you hook two machines together with a FireWire cable, put one of those machines into "target disk" mode, and start to copy files back and forth, I would expect you to get really good performance. In fact, despite the fact that GigE has over twice the bandwidth of FireWire 400 (GigE = 1000Mbps, FW400 = 400Mbps), I would expect the FireWire to outperform any network based file protocol, like NFS or AFP, running over GigE, in operations such as a copy.
This is exactly the type of operation that FireWire is designed to do, so it is no shock that it does it extremely efficiently. When used in something like target disk mode, it is also operating at a very low level in the kernel (on the host side), with a great deal of hardware assistance. NFS or AFP, on the other hand, are layered on top of the entire networking stack (on both the "disk" side and the "host" side) and have to deal with a great number of abstractions. Also, because of the hardware design (largely having to do with the size of the packets/frames) it is difficult for most hardware to fully utilize a GigE connection, so the full 1000Mb can't be used (at all; this limit isn't specific to file protocols). So it isn't a big shock that a network file protocol doesn't work very efficiently and that the slower transport can do a better job: it is designed to do a better job, and you aren't using the technologies in the same way. A more valid comparison might be between FW and iSCSI over Ethernet so that the two transport technologies are at least working at the same level (and even then, I would still expect FW to win, although not by as much).

This is, however, a two-way street. If we return to the question of IPoFW, where you are moving IP packets rather than disk blocks, it should be no shock that a transport technology specifically designed to move network packets can outperform one that was designed around block copies. Ethernet is a very light-weight protocol (which is both good and bad, like the trucks) and deals with frame based network data extremely well. Even if we assume that FireWire can run with a high efficiency, it would be normal to expect GigE to outperform it, just because it has 2.5x the bandwidth. But because you're asking FireWire to do something it isn't all that great at, the numbers are much worse.

So here's what I did. I hooked my TiBook to my dual 1.25 QuickSilver.
On each I created a new Network Location with just the FireWire physical interface, and assigned each one an address on the 10 net (10.0.0.0/8). There were no other active interfaces. I then ran a series of 60 second tests using "iperf" from NLANR, forcing a bi-directional data stream over the IPoFW-400 link. I used the TCP tests, because this is the only way to have the system measure bandwidth directly. This adds overhead to the transaction and reduces the results (which are indicated as payload bytes only), but since I ran the test the same way in all cases, that shouldn't make a huge difference.

Anyways, with the bi-directional test, I was able to get roughly 90Mbps (yes, "ninety megabits per second") upstream, and 30Mbps downstream using the IPoFW-400 link. It seems there were a lot of contention issues when data was pushed both ways at the same time, and one side seemed to always gain the upper hand. That's not a very good thing for a network to do, and points to self-generated congestion issues.

If I only pushed data in one direction, I could get it up to about 125Mbps. I'll grant you that's better than 100baseTX, but I'm not sure I consider half-duplex speeds all that interesting. As was clear from the other test, when you add data going the other way, performance drops considerably.

Just to be sure I was doing the test correctly, I ran the same tests with a point-to-point Ethernet cable between the machines. Both machines have GigE, so it ran nicely around 230Mbps in both directions. That may sound a bit low, but the TiBook is an older machine and the processor isn't super fast. In fact, running 460Mbps of data through the TCP stack isn't too bad for an 800MHz machine (that's one payload byte per 14 CPU cycles, which is pretty darn good!) that isn't running jumbo frames. Speed aside, it is also important to point out that the up-stream and down-stream numbers were EXACTLY the same.
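The cycles-per-byte figure can be checked with a few lines of arithmetic. This is only a sketch: the function and constant names are illustrative and not from the original tests, and the inputs are the 800MHz clock and the 230Mbps per direction quoted above.

```python
def cycles_per_payload_byte(clock_hz: float, payload_bits_per_sec: float) -> float:
    """CPU cycles available per payload byte moved through the TCP stack."""
    payload_bytes_per_sec = payload_bits_per_sec / 8
    return clock_hz / payload_bytes_per_sec

# Figures quoted in the message: 800MHz TiBook, 230Mbps in each direction.
TIBOOK_CLOCK_HZ = 800e6
AGGREGATE_BPS = 2 * 230e6  # both directions combined, 460Mbps of payload

cpb = cycles_per_payload_byte(TIBOOK_CLOCK_HZ, AGGREGATE_BPS)
print(f"{cpb:.1f} cycles per payload byte")  # prints "13.9 cycles per payload byte"
```

The result rounds to the "one payload byte per 14 CPU cycles" quoted in the text.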
The network seemed to have no contention issues, and both sides were able to run at the maximum speed the end-hosts could sustain.

Just for kicks, I manually set both sides to 100Mb/full-duplex and ran the test. The numbers worked out to about 92Mbps, both ways. A bit lower than you might expect, but given the known overhead of TCP it isn't too bad. Again, both sides were able to sustain the same rates. It is also worth noting that the CPU loads on the systems seemed to be considerably less for this test than the FireWire test, even though the amount of data being moved was slightly higher.

I also ran a few UDP tests. In this case, you force iperf to transmit at a specific rate. If the system or network is unable to keep up, packets are simply dropped. In a uni-directional test the IPoFW-400 link could absorb 130Mbps well enough, and was able to provide that kind of data rate. When pushed to 200Mbps, the actual transmitted data dropped to an astounding *20*Mbps or less. It seems that if a FireWire link gets the least bit congested, it totally freaks out and all performance hits the floor. This isn't a big surprise given the upstream/downstream difference in the other tests. These types of operating characteristics are extremely undesirable for a network transport protocol.

This wasn't a serious or rigorous test, but it should provide some "back of the envelope" numbers to think about. I encourage others to run similar tests using various network profiling tools if you wish to get better numbers.

So call it BS if you want, but if we're talking about moving IP packets around, I stand by the statement that one should "Expect 100Mb Ethernet to provide better performance than 400Mb FW."
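To put the reported figures side by side, here is a small sketch that expresses each TCP result as a share of the link's nominal rate and as an upstream/downstream ratio. The rates are copied from the tests described above. The table layout and the helper names `utilization` and `asymmetry` are illustrative assumptions, not anything iperf itself reports.

```python
# Payload rates in Mbps, copied from the tests described above.
# Nothing here is newly measured; the layout and names are illustrative.
TCP_RESULTS = [
    # (link, test, nominal Mbps, payload Mbps per direction)
    ("FW400 IPoFW", "bi-directional", 400, (90, 30)),
    ("FW400 IPoFW", "uni-directional", 400, (125,)),
    ("GigE", "bi-directional", 1000, (230, 230)),
    ("100Mb Ethernet", "bi-directional", 100, (92, 92)),
]

def utilization(rate_mbps, nominal_mbps):
    """Fraction of the nominal link rate delivered as payload."""
    return rate_mbps / nominal_mbps

def asymmetry(rates):
    """Ratio of the faster direction to the slower one (1.0 = symmetric)."""
    return max(rates) / min(rates)

for link, test, nominal, rates in TCP_RESULTS:
    shares = ", ".join(f"{utilization(r, nominal):.1%}" for r in rates)
    print(f"{link:15s} {test:16s} {shares:14s} asymmetry {asymmetry(rates):.1f}")

# UDP on IPoFW-400: offered 200Mbps, delivered 20Mbps or less.
udp_delivered_fraction = 20 / 200
print(f"UDP at 200Mbps offered: at most {udp_delivered_fraction:.0%} delivered")
```

Read the GigE row with care: as noted above, that figure was limited by what the end hosts could sustain, not by the link. The asymmetry column is the more telling one, since only the FireWire link splits unevenly between directions.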
I'll admit the raw numbers are close, and in the case of a nice smooth uni-directional data stream, the FW400 link actually out-performed what a 100Mb link could deliver, but the huge performance degradation caused by congestion gives me serious pause for a more generalized traffic pattern. Regardless, it definitely isn't anything near GigE speeds.

There are also more practical limits to the use of a FireWire network vs Ethernet. For starters, from what I understand of FireWire "hubs", they are usually repeater based, and not switch based, at least in the terms of a more traditional Ethernet network. So while the bandwidth numbers are close for a single point-to-point link, I would expect the FireWire numbers to drop off drastically when you start to link five or six machines together.

There is also the issue of port density. You can get 24 port non-blocking GigE switches for a few thousand bucks. I'm not even sure if a 24 port FireWire hub exists. If you start to link multiple smaller hubs together (even with a switch style data isolation) your cluster's bi-section bandwidth sucks, and your performance is going to suffer. Beyond that, FireWire networks are limited to only 63 devices, although I would expect that to not be a serious limitation for most clusters.

In short, while running something over FireWire is possible, I see very little motivation to do so, especially with the low-cost availability of high-performance Ethernet interfaces and switches.

  -j

-- 
Jay A. Kreibich              | CommTech, Emrg Net Tech Svcs
jak at uiuc.edu              | Campus IT & Edu Svcs
<http://www.uiuc.edu/~jak>   | University of Illinois at U/C
----- End forwarded message -----
