Processor contention(?) and network bandwidth on AMD
Robert G. Brown rgb at phy.duke.edu
Mon Apr 29 14:07:24 PDT 2002
On Mon, 29 Apr 2002, Joshua Baker-LePain wrote:

> > > unloaded: 11486.6 KB/real sec
> > > 2 matlab simulations: 10637.8 KB/real sec
> > > 2 matlab simulations and 2 SETI@homes (nice -19): 6645.4 KB/real sec
> >
> > SETI@home is obviously in the "so don't do that" category. I expect your
> > matlab was decelerated by a similar amount.
>
> Sure, but it was just an example of a niced background load, which
> "shouldn't" interfere with anything. It certainly shouldn't crash
> bandwidth like that.

Joshua,

Actually, running a heavy background load can (as you have observed)
significantly affect network times, especially if it is the receiver
that is loaded. As to whether or not it "should", I cannot say (kind of
a value judgement there :-), but one can try to understand it.

There are deliberate tradeoffs made in the tuning of the kernel, and for
better or worse the Linux tradeoffs optimize "user response time" at the
expense of a variety of things that might improve throughput on a purely
computational load, throughput on the network, or pretty much anything
else. Sometimes one can retune -- Josip Loncaric's TCP patch is one such
retuning -- but one can also envision changing timeslice granularity and
other things to optimize one thing at the expense of others.

Generally such a retuning is a Bad Idea. Right now the kernel is pretty
damn good, overall, and all components are delicately balanced. As
Mark's previous reply made clear, some naive retunings would just lock
up the system (or really make performance go to hell) as important
components starve.

It isn't too hard to see why loading the receiver might decrease the
efficiency of the network. Imagine the network component of the kernel
from the point of view of the stream receiver (not the transmitter). It
never knows when the next packet/message will come through.
The kernel does its best to do OTHER work in the gaps between packets by
installing top half and bottom half handlers and the like (so it does no
more work than absolutely necessary when the asynchronous interrupt is
first received, postponing what it can until later) to provide the
illusion of seamless access to the CPU and other resources for running
processes. One side effect of this is that there are times when the
delivery of packets is delayed so that a background application can
complete a timeslice it was given "in between" packets when the system
was momentarily idle.

In practice, when the system is BUSY it de facto delays delivery of
packets it has already buffered, by fractions of the many timeslices it
hands to competing tasks whenever the network process is momentarily
idle (blocked, waiting for the next packet). If it didn't do this, a
high speed packet stream could (for example) starve running processes
for CPU by forcing them to wait for the whole stream to complete.

Processing TCP packets (not to mention the interrupts and context
switches themselves) is a nontrivial load on the CPU in its own right,
so much so that people try NOT to run high-performance network
connections for fine-grained code over TCP if they can avoid it. The
network stack ends up contending for CPU with everything else that is
running, and it makes no sense to retune things so that this is never
true, as the cure will likely be worse than the disease for most usage
patterns.

Curiously, transmitting works more efficiently than receiving, probably
because the transmitter is in charge of the scheduling. In very crude
terms the transmitter is never interrupted or delayed by other
processes -- it just gets its timeslice, executes a send or stream of
sends, eventually blocks (moving up in priority while blocked) or
finishes its timeslice, and then moves on. No delays to speak of.
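
For a sense of scale, the throughput figures quoted at the top of this
message can be turned into relative losses. The following is only a
minimal sketch of that arithmetic; the function and constant names are
made up for illustration and are not part of any tool mentioned in the
thread.

```python
def relative_drop(baseline, loaded):
    """Fractional throughput loss of `loaded` relative to `baseline`."""
    return (baseline - loaded) / baseline


# Figures quoted in the thread, in KB/real sec.
UNLOADED = 11486.6
MATLAB_ONLY = 10637.8
MATLAB_AND_SETI = 6645.4

if __name__ == "__main__":
    cases = [
        ("2 matlab", MATLAB_ONLY),
        ("2 matlab + 2 SETI@home", MATLAB_AND_SETI),
    ]
    for label, rate in cases:
        pct = 100 * relative_drop(UNLOADED, rate)
        print(f"{label}: {pct:.1f}% below unloaded")
```

That works out to roughly 7% below the unloaded rate with the two matlab
jobs alone, and roughly 42% below it once the niced SETI@home jobs are
added.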

Try this: Run your netpipe transmitter on an unloaded host, a host at
load 1, and a host at load 2. Run your netpipe receiver on an unloaded
host, a host at load 1, and one at load 2. Fill in the matrix -- load 0
to load 0, load 0 to load 1, etc.

I found (in similar tests done years ago) that a TRANSMITTER could be
loaded to 2 (per cpu) with only a small degradation of throughput, but
loading a RECEIVER would drop throughput dramatically, by as much as
50%.

   rgb

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
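
The transmitter/receiver load matrix suggested in the message can be
laid out as a small table to fill in by hand. This is just a
bookkeeping sketch: it does not run netpipe itself, and the helper
names (`empty_matrix`, `format_matrix`) are hypothetical.

```python
from itertools import product

LOADS = (0, 1, 2)  # background load on each host, as in the suggested test


def empty_matrix(loads=LOADS):
    """Return {(tx_load, rx_load): None} for every transmitter/receiver pairing."""
    return {pair: None for pair in product(loads, repeat=2)}


def format_matrix(results, loads=LOADS):
    """Render results as plain text: rows = transmitter load, columns = receiver load."""
    header = "tx\\rx " + "".join(f"{rx:>10}" for rx in loads)
    rows = [header]
    for tx in loads:
        cells = []
        for rx in loads:
            value = results.get((tx, rx))
            # Unmeasured cells are shown as "?" until a throughput is recorded.
            cells.append(f"{'?':>10}" if value is None else f"{value:>10.1f}")
        rows.append(f"{tx:<6}" + "".join(cells))
    return "\n".join(rows)


if __name__ == "__main__":
    matrix = empty_matrix()
    # Record each measured throughput under its (tx_load, rx_load) key,
    # e.g. matrix[(0, 1)] = <your measured value>, then print the table.
    print(format_matrix(matrix))
```

Keeping transmitter load on the rows and receiver load on the columns
makes it easy to see whether throughput falls off along a row (receiver
loaded) faster than down a column (transmitter loaded).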
More information about the Beowulf mailing list
