dealing with lots of sockets (was Re: [Beowulf] automount on high ports)
Many of your questions may have already been answered in earlier discussions or in the FAQ. The search results page will indicate current discussions as well as past list serves, articles, and papers.
Robert G. Brown rgb at phy.duke.eduWed Jul 2 16:44:58 PDT 2008
- Previous message: dealing with lots of sockets (was Re: [Beowulf] automount on high ports)
- Next message: dealing with lots of sockets (was Re: [Beowulf] automount on high ports)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Wed, 2 Jul 2008, Perry E. Metzger wrote: > > "Robert G. Brown" <rgb at phy.duke.edu> writes: >>> Well, it actually kind of is. Typically, a box in an HPC cluster is >>> running stuff that's compute bound and who's primary job isn't serving >>> vast numbers of teeny high latency requests. That's much more what a >>> web server does. However... >> >> I'd have to disagree. On some clusters, that is quite true. On others, >> it is very much not true, and whole markets of specialized network >> hardware that can manage vast numbers of teeny communications requests >> with acceptably low latency have come into being. And in between, there >> is, well, between, and TCP/IP at gigabit speeds is at least a contender >> for ways to fill it. > > I have to admit my experience here is limited. I'll take your word for > it that there are systems where huge numbers of small, high latency > requests are processed. (I thought that teeny stuff in HPC land was > almost always where you brought in the low latency fabric and used > specialized protocols, but...) Not so much high latency, but there are many different messaging patterns. Some are BW dominated with large messages, some are many small latency dominated messages and use specialized networks, and some are in between -- medium sized messages, medium sized amounts. YMMV is a standard watchword. Some people do fine with TCP/IP over ethernet for whatever message size. I'm not quite sure what you mean by "vast numbers of teeny high latency requests" so I'm not sure if we really are disagreeing or agreeing in different words. If you mean that the problem of HA computing on a human timescale is different than the typical HPC problem then we agree very much, but then I don't see the point in the context of the current discussion. > Sure, but it is way inefficient. Every single process you fork means > another data segment, another stack segment, which means lots of > memory. Every process you fork also means that concurrency is achieved > only by context switching, which means loads of expense on changing > MMU state and more. Even thread switching is orders of magnitude worse > than a procedure call. Invoking an event is essentially just a > procedure call, so that wins big time. Sure, but for a lot of applications, one doesn't have a single server with umpty zillion connections -- which may be what you may mean with your "high latency teensy message" point above. If the connection is persistent, the overhead associated with task switching is just part of the normal multitasking of the OS. In cluster computing, one may have only a small set of these connections to any particular host, or one may have lots -- many to many communications, master-slave communications. Similarly, many daemon-driven tasks tend to be quite bounded. If a server load average is down under 0.1 nearly all the time, nobody cares, if the overhead of communication in a parallel application is down in the sub-1% range, people don't care much. But then, few cluster applications are built on forking daemons...;-) Still, it is important to understand why there are a lot of applications that are. In the old days, there were limits on how many processes, and open connections, and open files, and nearly any other related thing you could have at the same time, because memory was limited. Kernel resources (if nothing else) have to be allocated for each one, and kernel overhead associated with all of the connections, files, etc could scale up to where it more or less shut down a system. Nowadays, with my LAPTOP having 4 GB, multiple cores, far more scalable MP kernels, the limits are a lot more flexible, and it may well be better to maintain many persistent connections within a single application and make it essentially an extension of the kernel with the kernel managing the "multitasking" overhead of message reception per connection and then avoiding the additional multitasking associated with farming the information out per connection to a forked copy of a server process. As I said, very interesting and a good idea -- I'm learning from you -- but a good idea for certain applications, possibly more trouble than it's worth for others? Or maybe not. If you make writing event driven network code as easy, and as well documented, as writing standard socket code and standard daemon code, the forking daemon may become obsolete. Maybe it IS obsolete. So, what do you think? Should one "never" write a forking daemon, or inetd? [Incidentally, does this mean that you are similarly negative about forking applications in general, since similar resource constraints apply to ALL forks, right? Or should one use event driven servers only for big servers with no particular hurry on returning messages for any given connection? I'm guessing that when writing such a server, one has to do some of the work that the kernel would do for you for forked processes -- ensure that no connection is starved for timeslices or network slices, manage priorities if necessary, smoothly multitask any underlying computation associated with providing the data. After all, the MOST efficient server is one with the server code built into the kernel -- DOS plus an application, as it were. Why bother with the overhead of a general purpose multitasking operating system when you can handle all the multitasking native within your one monolithic application? Ditto networking -- why replicate general purpose features of the network stack in the kernel and network structs when you'll never need them for your ONE application? Usually one trades off the ease of programming and use in a general purpose environment against some penalty, as general purpose environments require more state information and overhead to maintain and operate. So are you arguing that there are no tradeoffs, and one should "always" write server network code (or code in a suitably segmented application) on an event model, or that it is a better one for some class of client applications, some pattern of use? This still is (I think) OT, as master-slave parallel applications are fairly common, with a toplevel master doling out units of work to the slaves and then collecting the results. I think that it is probably more usual to write the code for this as a non-forking application anyway, but I can still imagine exceptions. IIRC, some of these things are the motivation for e.g. Scyld and bproc. If anyone else on list is bored with this, let me know we can take it offline. > Event driven systems can also avoid locking if you keep global data > structures to a minimum, in a way you really can't manage well with > threaded systems. That makes it easier to write correct code. > > The price you pay is that you have to think in terms of events, and > few programmers have been trained that way. What do you mean by events? Things picked out with a select statement, e.g. I/O waiting to happen on a file descriptor? Signals? I think the bigger problem is that a lot of the events in question are basically (fundamentally) kernel interrupts, I/O being driven by one or more asynchronous processes, and you're right, a lot of programmers never learn to manage this because it is actually pretty difficult. One has to handle blocking vs non-blocking issues, raw I/O (in many cases), a scheduler of sorts to ensure that connections aren't starved (unless you are content to process events in FIFO order, letting an event piggy or buggy/crashed process hang the entire pending queue). Forking provides you with a certain amount of "automatic" robust parallelism. Without it, one has to make the code a lot more robust; if a forked connection crashes, it crashes just one connection, not the server or any of the rest of the existing connections. The kernel DOES do a lot of things for you on a forked process that you have to do for yourself in event driven code, and it isn't exactly trivial to provide it either well, efficiently, or robustly (where the kernel is perforce all three, within the limits imposed by its general purpose design). As I said, people wrote lots of applications on UDP because they thought "hmmm, I don't need ALL the overhead associated with making TCP robust, I'll use lightweight UDP instead and write my own packet sequencer, my own retransmit, etc." Then they discovered that by the time they ended up with something that was reliable, they hadn't really saved much -- or may well have ended up with something even slower than TCP. People work(ed) HARD on making TCP fairly efficient and making it handle edge cases. Doing it on your own is unlikely to match either one, unless you are an uberprogrammer. You sound like you probably are, but I'm not sure everyone is...;-) I'm not arguing, mind you -- I already believe that writing an event driven server (or client, or both in a more symmetric model) makes sense for a certain class of applications, including many/most of the ones relevant to cluster computing. I'm asking if one should NEVER write a forking daemon because the libraries you mention above provide schedulers and can manage dropped connections or hung resources or because you think that the programmer should always be able to add them as needed, or if there is a problem scale and server type for which it makes sense, and others for which it is overkill or for which the services provided by the kernel for forked processes (or threads, a rose by any other name...) are worth their cost. An event driven application IS basically a kernel, in a manner of speaking. Should every daemon be a kernel, or can some use the existing kernel for kernel-like functionality and focus on just provisioning a single connection well? rgb > > Perry > -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977
- Previous message: dealing with lots of sockets (was Re: [Beowulf] automount on high ports)
- Next message: dealing with lots of sockets (was Re: [Beowulf] automount on high ports)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Beowulf mailing list
