[Beowulf] Re: dealing with lots of sockets
Perry E. Metzger perry at piermont.com
Wed Jul 2 18:06:55 PDT 2008
"Robert G. Brown" <rgb at phy.duke.edu> writes:
> I'm not quite sure what you mean by "vast numbers of teeny high
> latency requests" so I'm not sure if we really are disagreeing or
> agreeing in different words.

I mostly have worried about such schemes in the case of, say, 10,000
people connecting to a web server, sending an 80 byte request, and
getting back a few k several hundred ms later. (I've also dealt a bit
with transaction systems with more stringent throughput requirements,
but rarely with things that require an ack really, really fast.)

That said, I'm pretty sure event systems win over threads if you're
coordinating pretty much anything...

>> Sure, but it is way inefficient. Every single process you fork means
>> another data segment, another stack segment, which means lots of
>> memory. Every process you fork also means that concurrency is achieved
>> only by context switching, which means loads of expense on changing
>> MMU state and more. Even thread switching is orders of magnitude worse
>> than a procedure call. Invoking an event is essentially just a
>> procedure call, so that wins big time.
>
> Sure, but for a lot of applications, one doesn't have a single server
> with umpty zillion connections

Well, often one doesn't build things that way, but that's sort of a
choice, isn't it. Your machine has only one or two or eight
processors, and any other processes/threads above that which you
create are not actually operating in parallel but are just a
programming abstraction. It is perfectly possible to structure almost
any application so there is just the one thread per core and you
otherwise handle the programming abstraction with events instead of
additional threads, processes or what have you.

> If the connection is persistent, the overhead associated with task
> switching is just part of the normal multitasking of the OS.

That overhead is VERY high. Incredibly high. Most people don't really
understand how high it is.
If you compare the performance of an http server that manages 10,000
simultaneous connections with events, versus one that handles it with
threads, you'll see there is no comparison -- events always beat
threads into the ground, because you can't get away from threads
requiring a new stack for each thread, and you can't get away from the
fact that context switching is far more expensive than a procedure
dispatch.

> Similarly, many daemon-driven tasks tend to be quite bounded. If a
> server load average is down under 0.1 nearly all the time, nobody cares,

That implies almost nothing is in the run queue. For an HPC system,
one hopes that the load is hovering around 1. Less means you're
wasting processor, more means you're spending too much time context
switching. But I digress..

> Still, it is important to understand why there are a lot of applications
> that are. In the old days, there were limits on how many processes, and
> open connections, and open files, and nearly any other related thing you
> could have at the same time, because memory was limited.

Believe it or not, memory is still limited, and context switch time is
still pretty bad. Changing MMU contexts is unpleasant. Even if you
don't have to do that, because you're using another thread in the same
MMU context rather than a process, the overhead is still quite
painful.

Seeing is believing. There are lots of good papers out there on
concurrency strategies for systems with vast numbers of sockets to
manage, and there is no doubt what the answer is -- threads suck
compared to events, full stop. Event systems scale linearly for far
longer.

> Or maybe not. If you make writing event driven network code as easy,
> and as well documented, as writing standard socket code and standard
> daemon code, the forking daemon may become obsolete. Maybe it IS
> obsolete.

It is pretty easy. The only problem is getting your mind wrapped
around it and getting experience with it.
Most people have been writing fully linear programs for a whole
career. If you tell them to try events, or try functional programming,
or other things they're not used to, they almost always scream in
agony for weeks until they get used to it. "Weeks" is often more
overhead than people are willing to suffer. That said, I am
comfortable with both of those paradigms...

> So, what do you think? Should one "never" write a forking daemon, or
> inetd?

It depends. If you're doing something where there is going to be one
socket talking to the system a tiny percentage of the time, why would
you bother building an event driven server? If you're building
something to serve files to 20,000 client machines over persistent TCP
connections and the network interface is going to be saturated, hell
yes, you should never use 20,000 threads for that, write the thing
event driven or you'll die.

It is all about the right tool for the job. Apps that are all about
massive concurrent communication need events. Apps that are about very
little concurrent communication probably don't need them.

>> Event driven systems can also avoid locking if you keep global data
>> structures to a minimum, in a way you really can't manage well with
>> threaded systems. That makes it easier to write correct code.
>>
>> The price you pay is that you have to think in terms of events, and
>> few programmers have been trained that way.
>
> What do you mean by events? Things picked out with a select statement,
> e.g. I/O waiting to happen on a file descriptor? Signals?

More the former, not the latter. Event driven programming typically
uses registered callbacks that are triggered by a central "Event Loop"
when events happen. In such a system, one never blocks for anything --
all activity is performed in callbacks, and one simply returns from a
callback if one can't proceed further.

The programming paradigm is quite alien to most people. I'd read the
libevent man page to get a vague introduction.

-- 
Perry E. Metzger		perry at piermont.com
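
Editorial sketch, not part of the original message: the callback-plus-event-loop structure described above can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not libevent itself. It uses Python's standard `selectors` module in place of libevent, and `socket.socketpair()` as a stand-in for accepted TCP connections so that it runs without a network. The names `on_readable`, `serve_until_idle`, and `demo` are hypothetical and chosen only for illustration.

```python
import selectors
import socket


def on_readable(sel, conn):
    """Callback: runs when `conn` has data (or EOF) waiting; never blocks."""
    data = conn.recv(4096)
    if data:
        # Sketch only: assumes the tiny reply fits in the socket send buffer.
        conn.send(data.upper())
    else:
        # Peer finished sending: stop watching this socket and release it.
        sel.unregister(conn)
        conn.close()


def serve_until_idle(sel):
    """The event loop: wait for ready sockets, then call each one's callback."""
    while sel.get_map():  # keep going while any socket is still registered
        for key, _mask in sel.select(timeout=1.0):
            callback = key.data
            callback(sel, key.fileobj)  # dispatch is an ordinary function call


def demo(n_connections=50):
    """Serve many 'connections' from one thread and return the replies."""
    sel = selectors.DefaultSelector()
    clients = []
    for i in range(n_connections):
        # socketpair() stands in for an accepted client connection (assumption).
        server_end, client_end = socket.socketpair()
        server_end.setblocking(False)
        sel.register(server_end, selectors.EVENT_READ, on_readable)
        client_end.sendall(b"request %d" % i)
        client_end.shutdown(socket.SHUT_WR)  # signal "no more requests"
        clients.append(client_end)

    serve_until_idle(sel)

    replies = [c.recv(4096) for c in clients]
    for c in clients:
        c.close()
    sel.close()
    return replies


if __name__ == "__main__":
    replies = demo()
    print(len(replies), replies[0])
```

All fifty sockets are handled by one thread with no per-connection stack, and each handler simply returns when it has nothing more to do. A real server would also need to handle partial writes and errors, which this sketch leaves out.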
