comparing kernel socket performance

rickf@transpect.net
Sun, 6 Sep 1998 17:29:27 -0400


On Fri, 4 Sep 1998, Perry Harrington wrote:

> > > }Well, the fundamental problem there is that it's hard to expand the #
> > > }of fd's available to a process. That alone disqualifies Linux for use
> > > }by most big IRC servers.
> > 
> > > You're saying that echo value > /proc/... or a recompile is "hard"?
> > 
> > The # of fd's available to the entire system can be increased by
> > echoing values to /proc. That's easy.
> > 
> > On most Unixes, you can make one syscall and up your # of fd's per
> > process to 1024 or 4096 or more. That's easy and standard.
> > 
> > In Linux, upping the # of fd's available to individual processes
> > involves a kernel recompile, and a glibc recompile, and then all
> > processes get larger. This is not what I would call "easy". Um, do you
> > also have to recompile all RPMS? I'd have to think about that.
> 
> You are mistaken.  Only a kernel and (application which uses > 1024 fds) recompile 
> is necessary to take advantage of thousands of descriptors.
> 
> There is a patch available for 2.0.33 for Squid which allows you to have an 
> arbitrary number of descriptors.  4096 per process is typically the sane upper
> limit, probably because of page size limits (FD_SET array occupies one page max).
> On alpha this is probably 8192.  The patch uses malloc to allocate the FD arrays,
> rather than statically allocating them.  
> 
> I have yet to see an application that could use > 4096 descriptors, and be properly
> written.

There are high-usage webserver applications that can easily go over 2048
descriptors, although arguably squid could be employed in some situations
(or the application redesigned to allow a more distributed access, via
mirroring or whatever else). 
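
For reference, the "one syscall" mentioned in the quoted text is, on most
Unixes, setrlimit() with RLIMIT_NOFILE. Below is a minimal sketch of raising
the per-process soft limit. The helper name raise_fd_limit is made up for
illustration, and whether the call can actually exceed the built-in ceiling
depends on the kernel, which is exactly the point under dispute above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: raise the soft RLIMIT_NOFILE limit to `wanted`,
 * capped at the hard limit. The soft limit is never lowered.
 * Stores the resulting soft limit in *result.
 * Returns 0 on success, -1 if getrlimit() or setrlimit() fails.
 */
int raise_fd_limit(rlim_t wanted, rlim_t *result)
{
    struct rlimit rl;

    assert(result != NULL);

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    /* An unprivileged process cannot go past the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && wanted > rl.rlim_max)
        wanted = rl.rlim_max;

    if (wanted > rl.rlim_cur) {
        rl.rlim_cur = wanted;
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            return -1;
    }

    *result = rl.rlim_cur;
    return 0;
}
```

Note that a higher limit does not by itself help a select()-based program:
fd_set is still sized by FD_SETSIZE at compile time, which is why the
recompile question comes up at all.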

Linux (at least on the 2.0.x kernels and earlier) does have a 'soft spot'
when it comes to FDs. I've found that trying to set the kernel limit past
1024 makes the machine less stable. There's also a very annoying thing I
discovered between 1am and 5am last night: Linux 2.0.x doesn't do BSD-style
open descriptor passing (via sendmsg/recvmsg) correctly.
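
For readers who haven't used it, descriptor passing over a Unix domain socket
looks roughly like the sketch below. It uses the 4.4BSD-style SCM_RIGHTS
control message; the helper names send_fd and recv_fd are hypothetical, and
this is the generic interface, not a claim about what does or doesn't work on
any particular kernel version.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer big enough for one descriptor, aligned for cmsghdr. */
union fd_ctrl {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send fd_to_send over the Unix domain socket `sock`. 0 on success, -1 on error. */
int send_fd(int sock, int fd_to_send)
{
    char byte = 'F';                 /* one byte of real data must go along */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd_to_send >= 0);
    memset(&ctrl, 0, sizeof ctrl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;    /* "this message carries descriptors" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from `sock`. Returns the new fd, or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&ctrl, 0, sizeof ctrl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received descriptor is a new number in the receiving process that refers
to the same open file as the sender's, so a quick sanity check is to pass one
end of a pipe across a socketpair() and write through the copy.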

Apparently the 2.1.x kernels solve both of those problems and more, freeing
up a lot of the FD limitations (yay!), fully implementing open descriptor
passing, more substantial /proc filesystem kernel tuning, as well as just
generally being faster 'n stuff. Linux users all round should have a serious
performance facelift with the release of the 2.2.x stable series kernel.

(In case you hadn't heard, 2.1.120 was released the other day. Get it at a
kernel distro point near you, or at ftp://ftp.kernel.org if you don't know
any other place).

> Squid is unique in that it has cached fd's for files and it has sockets.  IRC servers
> are mainly socket bound.  This can be gotten around by simply writing the code correctly;
> meaning, use shared memory segments or Unix domain sockets to share common data, then
> run multiple processes in accept-on-common-fd and select loops. (or SIGIO/SIGPOLL driven
> I/O).
> 
> If I was inclined to care, I'd write an IRC server the *right* way rather than
> the pseudo-college-student-Eric-Allman-wannabe-hacker way.

I've worked with a younger fellow who wrote his own IRC daemon from scratch
which happily hums away on Linux. His primary gripe (the last time I talked
to him) is the time it takes to rebuild the FD_SET for 1000 simultaneous
connections when a majority of the people there are idle (he was thinking
about some sort of bi-level deal where the 'passive' sockets only get polled
once a minute or so and are sorted into a separate set so's to free up cycles).
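
One way the bi-level idea could be sketched: keep two persistent fd_sets,
copy the 'active' one by plain struct assignment on every pass instead of
rebuilding it, and fold the 'passive' one in only when the interval has
elapsed. This is a guess at the shape of it, not his actual code; the
poll_sets names and the interval handling are made up for illustration.

```c
#include <assert.h>
#include <sys/select.h>
#include <time.h>

/* Hypothetical two-tier bookkeeping for a select() loop. */
struct poll_sets {
    fd_set active;        /* sockets checked on every pass             */
    fd_set passive;       /* idle sockets, checked once per interval   */
    int    maxfd;         /* highest fd ever added, or -1              */
    time_t last_passive;  /* when the passive set was last polled      */
    time_t interval;      /* seconds between passive polls             */
};

void poll_sets_init(struct poll_sets *ps, time_t interval, time_t now)
{
    FD_ZERO(&ps->active);
    FD_ZERO(&ps->passive);
    ps->maxfd = -1;
    ps->last_passive = now;
    ps->interval = interval;
}

/* Add fd to one tier, removing it from the other. Calling this again
   with a different `passive` flag demotes or promotes the socket. */
void poll_sets_add(struct poll_sets *ps, int fd, int passive)
{
    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_CLR(fd, passive ? &ps->active : &ps->passive);
    FD_SET(fd, passive ? &ps->passive : &ps->active);
    if (fd > ps->maxfd)
        ps->maxfd = fd;
}

/* Fill *out with the read set for this pass. The active set is copied
   wholesale; the passive set is merged in only when the interval has
   elapsed. Returns 1 if passive sockets were included, else 0. */
int poll_sets_build(struct poll_sets *ps, time_t now, fd_set *out)
{
    int fd;

    *out = ps->active;               /* struct copy, no per-fd loop */

    if (now - ps->last_passive < ps->interval)
        return 0;

    for (fd = 0; fd <= ps->maxfd; fd++)
        if (FD_ISSET(fd, &ps->passive))
            FD_SET(fd, out);
    ps->last_passive = now;
    return 1;
}
```

The main loop would call poll_sets_build(&ps, time(NULL), &rd) and then
select(ps.maxfd + 1, &rd, NULL, NULL, &tv), demoting sockets that stay quiet
and promoting any passive socket that turns up readable. The per-fd loop only
runs once per interval, which is where the saved cycles would come from.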

--
  __________________________________________
 |                                          |
 |  Rick Franchuk  -  TranSpecT Consulting  |
 |_______                            _______|
         \mailto:rickf@transpect.net/
          \_____ICQ_#_4435025______/