[Beowulf] Memory issue with Quad Dual-Core Opteron w/Two NICs
Jeremy Fleming jtfleming at gmail.com
Mon Sep 24 17:47:52 PDT 2007
- Previous message: [Beowulf] Whats up with these newer Intel NICs?
- Next message: [Beowulf] Re: Memory issue with Quad Dual-Core Opteron w/Two NICs
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I have a quad-Opteron machine in which each node is a dual-core processor with each core running at 2.0 GHz. The machine has 64 GB of RAM, two Broadcom gigabit Ethernet cards, and two Intel gigabit cards that supply 2 ports each and are supported by the e1000 driver. The machine is running the default install of Red Hat Enterprise Linux 5.0 (original release, no patches or updates).

Remote machines are supplying ~512 megabit/sec streams over gigabit Ethernet to this machine. There are two streams on separate Ethernet lines, and I have each stream connected to a different port on one of the Intel cards. The streams are sent via multicast, and there are 4 sub-streams per Ethernet line. Each sub-stream is approximately 131.072 megabits/sec.

On the Opteron machine I have a process that can pull a sub-stream off of an Ethernet port and dump it to a ring buffer in shared memory. To start, the process could never keep up with receiving the data via Ethernet and then doing a memcpy to shared memory. Then I found out about NUMA and decided to use sched_setaffinity to bind the process to a CPU. I bound the process to the same CPU the Ethernet card is bound to via its IRQ: I looked in /proc/interrupts and found "eth0" or "eth1", looked up its IRQ, then went into /proc/irq/<eth 0 IRQ>/smp_affinity and checked which CPU the IRQ was bound to. I bound the process to that processor and ran it again. Luckily there was no data loss and it could keep up. I bound the process before I allocated memory, so the memory was bound to the same processor too.

I was even able to run three more processes bound to the same CPU and have all 4 read the sub-streams from the Ethernet device eth0 with no data loss. I can even run another process that reads from the ring buffer and dumps the data to disk, and it causes no slowdowns or data loss.

Now I want to read a sub-stream from the other stream, connected to eth1, while still reading the other 4 sub-streams.
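For reference, the lookup-and-bind procedure described above (find the interface's IRQ in /proc/interrupts, read its smp_affinity mask, then pin the process to those CPUs) can be sketched roughly as follows. This is only an illustrative Python sketch, not the application's actual code: the helper names find_irq, cpus_from_mask, and bind_to_irq_cpus are hypothetical, and it assumes the usual /proc/interrupts row layout (IRQ number, per-CPU counts, controller, device names) and a hexadecimal smp_affinity mask.

```python
import os


def find_irq(interrupts_text, ifname):
    """Return the IRQ number whose /proc/interrupts row names ifname, or None."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Data rows start with "<irq>:"; the header row and rows such as
        # "NMI:" are skipped because their first field is not "<digits>:".
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue
        irq = fields[0][:-1]
        if not irq.isdigit():
            continue
        # Device names come last; a shared IRQ lists them comma-separated.
        names = [name.strip(",") for name in fields[1:]]
        if ifname in names:
            return int(irq)
    return None


def cpus_from_mask(mask_text):
    """Decode an smp_affinity hex mask (e.g. '00000004') into a set of CPU ids."""
    # Wide masks are written as comma-separated 32-bit groups.
    mask = int(mask_text.strip().replace(",", ""), 16)
    return {cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1}


def bind_to_irq_cpus(ifname):
    """Pin the calling process to the CPUs that service ifname's IRQ."""
    with open("/proc/interrupts") as f:
        irq = find_irq(f.read(), ifname)
    if irq is None:
        raise LookupError("no IRQ found for " + ifname)
    with open("/proc/irq/%d/smp_affinity" % irq) as f:
        cpus = cpus_from_mask(f.read())
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return cpus
```

Calling bind_to_irq_cpus("eth0") before allocating the ring buffer would mirror the ordering described above, since memory touched after the bind tends to be allocated on the local node.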
I start that up just by binding the same application to the processor associated with eth1, found by checking "/proc/irq/<eth 1 IRQ>/smp_affinity". When the process starts, the system can no longer keep up, just like in the beginning when I had just 1 process reading one stream without doing anything else.

I thought I was just trying to do too much work, so I turned off all streams and ran just two processes, each bound to the same processor as its associated eth device. I ran them both, and they lose data. If I run them separately they work fine, but when I read 1 sub-stream from each of the two streams at the same time they fail.

Are the two Ethernet devices dumping their multicast data into kernel buffers associated with different processors? How do I know which processor the kernel Ethernet buffers are associated with? Is there a way to set CPU affinity for Ethernet devices before they come up, so I know which processor's memory they are dumping data to? Any ideas on why there would be a problem with reading a stream from each eth device at the same time and not with reading 4 streams from one eth device? Do I need to turn a NUMA-aware scheduler on somehow, or is that on by default in RHEL 5? I also noticed that Linux assigns IRQs at bootup that vary with each boot; is there a way to statically assign IRQs to the Ethernet cards?

Any help or pointers at all would be great!

Thanks in advance,
Jeremy
