Opinion/experience with Intel 845E nodes?
Robert G. Brown rgb at phy.duke.edu
Fri Aug 2 09:04:57 PDT 2002
OK, I have time to do a bit more detail on the 845G/E question.

Before addressing the below, I should note in response to the original 845E question that tomshardware has a review of both 845G and 845E, gives thumbs up to G and thumbs down to E. For what it's worth. There are also two G versions, and from the look of things the GL is "G light" and probably to be avoided although there are some lovely micro ATX motherboards that might work for some people.

On Fri, 2 Aug 2002, Ferdinand Geier wrote:

> I've also a 845G board in my Fujitsu-Siemens box, but it does not correctly
> detect the IDE controller:
>
> <6>Uniform Multi-Platform E-IDE driver Revision: 6.31
> <4>ide: Assuming 33MHz system bus speed for PIO modes; override with
> idebus=xx
> <4>PCI_IDE: unknown IDE controller on PCI bus 00 device f9, VID=8086,
> DID=24cb
> <3>PCI: Device 00:1f.1 not available because of resource collisions
> <4>PCI_IDE: chipset revision 1
> <4>PCI_IDE: not 100%% native mode: will probe irqs later
> <4> ide0: BM-DMA at 0x2800-0x2807, BIOS settings: hda:DMA, hdb:DMA
> <4> ide1: BM-DMA at 0x2808-0x280f, BIOS settings: hdc:DMA, hdd:DMA
> <4>hda: MAXTOR 6L080L4, ATA DISK drive
> <4>hdb: MAXTOR 6L080L4, ATA DISK drive
> <4>hdc: IDE-CD R/RW 24x12A, ATAPI CD/DVD-ROM drive
> <4>hdd: LITEON DVD-ROM LTD163, ATAPI CD/DVD-ROM drive

Mine does the same thing (with 2.4.18-5) but, as with yours, the IDE-ATAPI driver still works and AFAICT the system is stable. There are certainly other messages of interest at boot. The PCI bridge isn't correctly identified early on, but the assumption of transparency seems to "work" (and this happens on a lot of boards, e.g. Tyan 2466).

I'd guess that these details (of identification and minor function tweaks) will be straightened out by the next stable major, as this is going to be a popular motherboard -- cheap, fast, and with damn near the whole computer (sound, video, network) on the motherboard.
If I weren't buried up to my nether region in scaly reptilians with sharp teeth I'd turn the kernel list back on and check to see if it is already fixed in bleeding edge snapshots and if not help out. Alas (ouch), I cannot manage that (let go, dammit!).

> With a stock 2.2.18 kernel dma could be enabled, but the SuSE kernel
> refused to do. Maybe the chip is too new...

There is also a small chance that it is a BIOS issue, although I didn't spend much time messing with it to find out. Reading the manual, for example (shudder:-) I see that it is quite possible that this motherboard comes with APIC disabled by default, and I see no APIC interaction at boot time. If I actually had a monitor plugged into mine I'd even reboot to find out.

It has an onboard i82562ET LAN chip (working through Intel's new ICH4 south bridge) that works with the eepro100 driver. It does WOL and ACPI but alas, nothing I can find mentions PXE. It would really suck to have to add a redundant NIC just to get PXE on a node, especially when (lacking 64/66 PCI on at least the implementation I have) it isn't going to be suitable for a gigE or myrinet node in most cases -- EP to coarse grained parallel only.

Note the following benchmarks: r00 is a Tyan 2466 with 1900+MP Athlons.
rgb at r00|T:105>cpu_rate -t 1 -s 1000
# ========================================================================
# Timing "Empty" Loop
# Samples = 100 Loop iterations per sample = 4194304
# Time(sec): 3.13821554e-09 +/- 2.32249621e-12
# ========================================================================
# Timing test 1
# Time(sec): 1.47502246e-05 +/- 3.40095362e-08
# Samples = 100 Loop iterations per sample = 1024
#========================================================================
# Vector Double Precision Float averaged over four operations:
# d[i] = (ad + d[i])*(bd - d[i])/d[i]
# with d[i] = ad = bd = 3.141593
# and vector size = 1000 (8000 bytes)
# Average Time: 3.69 nanoseconds
# BogomegaRate: 271.24 megafloats per second

rgb at r00|T:106>cpu_rate -t 1 -s 10000000
# ========================================================================
# Timing "Empty" Loop
# Samples = 100 Loop iterations per sample = 4194304
# Time(sec): 3.15272570e-09 +/- 3.90137720e-12
# ========================================================================
# Timing test 1
# Time(sec): 2.47412485e-01 +/- 3.04261604e-05
# Samples = 100 Loop iterations per sample = 2
#========================================================================
# Vector Double Precision Float averaged over four operations:
# d[i] = (ad + d[i])*(bd - d[i])/d[i]
# with d[i] = ad = bd = 3.141593
# and vector size = 10000000 (80000000 bytes)
# Average Time: 6.19 nanoseconds
# BogomegaRate: 161.67 megafloats per second
50.670user 0.090sys 91.9%, 0ib 0ob 0tx 0da 0to 0swp 0:55.23

Note the strong differential in performance between in-cache (-s 1000) and out of memory (-s 10^7 = 8x10^7 bytes in the vector).
rgb at lucifer2|T:123>cpu_rate -t 1 -s 1000
# ========================================================================
# Timing "Empty" Loop
# Samples = 100 Loop iterations per sample = 4194304
# Time(sec): 3.32885504e-09 +/- 1.54754165e-14
# ========================================================================
# Timing test 1
# Time(sec): 2.38647266e-05 +/- 6.61387967e-10
# Samples = 100 Loop iterations per sample = 512
#========================================================================
# Vector Double Precision Float averaged over four operations:
# d[i] = (ad + d[i])*(bd - d[i])/d[i]
# with d[i] = ad = bd = 3.141593
# and vector size = 1000 (8000 bytes)
# Size: 1000 Vector Length (bytes): 8000
# Average Time: 5.97 nanoseconds
# BogomegaRate: 167.63 megafloats per second

rgb at lucifer2|T:123>cpu_rate -t 1 -s 10000000
# ========================================================================
# Timing "Empty" Loop
# Samples = 100 Loop iterations per sample = 4194304
# Time(sec): 3.32885265e-09 +/- 1.04886505e-14
# ========================================================================
# Timing test 1
# Time(sec): 2.40166545e-01 +/- 4.82193280e-05
# Samples = 100 Loop iterations per sample = 2
#========================================================================
# Vector Double Precision Float averaged over four operations:
# d[i] = (ad + d[i])*(bd - d[i])/d[i]
# with d[i] = ad = bd = 3.141593
# and vector size = 10000000 (80000000 bytes)
# Size: 10000000 Vector Length (bytes): 80000000
# Average Time: 6.00 nanoseconds
# BogomegaRate: 166.55 megafloats per second
49.480user 0.180sys 91.7%, 0ib 0ob 0tx 0da 0to 0swp 0:54.13

Note the nearly flat performance out of cache and memory. Odd, no?
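The reported averages can be cross-checked from the raw timings. What follows is a minimal sketch in Python, not the cpu_rate source: it assumes (inferred only from the printed numbers) that the tool subtracts one "empty loop" time from the test time and divides by vector size times the four operations. The names TIMINGS, OPS_PER_ELEMENT and rate are made up for this illustration.

```python
# Raw numbers copied from the cpu_rate output quoted above.
# (host, vector size) -> (empty-loop time in s, test time in s)
TIMINGS = {
    ("r00", 1000): (3.13821554e-09, 1.47502246e-05),
    ("r00", 10_000_000): (3.15272570e-09, 2.47412485e-01),
    ("lucifer2", 1000): (3.32885504e-09, 2.38647266e-05),
    ("lucifer2", 10_000_000): (3.32885265e-09, 2.40166545e-01),
}
OPS_PER_ELEMENT = 4  # "averaged over four operations"


def rate(empty_s, test_s, size):
    """Return (nanoseconds per operation, megaflops) for one run."""
    # Assumption: a single empty-loop time is subtracted per pass over
    # the vector. This is a guess that happens to match the output.
    per_op_s = (test_s - empty_s) / (size * OPS_PER_ELEMENT)
    return per_op_s * 1e9, 1.0 / per_op_s / 1e6


results = {key: rate(e, t, key[1]) for key, (e, t) in TIMINGS.items()}

for (host, size), (ns, mflops) in results.items():
    print(f"{host:9s} size={size:<9d} {ns:5.2f} ns  {mflops:7.2f} Mflops")

# Ratio of out-of-cache to in-cache time per operation on each host.
for host in ("r00", "lucifer2"):
    slowdown = results[(host, 10_000_000)][0] / results[(host, 1000)][0]
    print(f"{host}: out-of-cache / in-cache = {slowdown:.2f}")
```

Under that assumption the sketch lands within rounding of the printed "Average Time" and "BogomegaRate" lines. The ratio at the end puts a number on the contrast: the Athlon takes roughly 1.7 times as long per operation once the vector no longer fits in cache, while the P4 is essentially unchanged.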
On the other hand, stream:

r00
# Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:          605.4189     0.0265     0.0264     0.0266
Scale:         673.5707     0.0238     0.0238     0.0238
Add:           780.8441     0.0309     0.0307     0.0323
Triad:         640.4618     0.0375     0.0375     0.0376

lucifer2
# Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:          993.2930     0.0162     0.0161     0.0164
Scale:        1009.9076     0.0159     0.0158     0.0159
Add:          1130.6354     0.0212     0.0212     0.0213
Triad:        1126.3845     0.0213     0.0213     0.0214

...an impressive difference.

One last benchmark I like to run is my Monte Carlo code, at a fixed size (the only benchmark that "matters", really:-). I've been running this for many years and thus have an excellent historical record of its performance on things from a Sparcstation 1 on. It tends to be CPU bound, not memory bound, and generally scales well with CPU clock within a processor family. Here I see a real anomaly:

#============================================================
# Benchmark run of On_spin3d on host ganesh (Mark III)
# CPU = 933 MHz PIII, Total RAM = 128 MB
# L = 16
# Time = 22.64user 0.00system 0:22.74elapsed
#============================================================
# Benchmark run of On_spin3d on host eve
# CPU = 800 MHz Athlon Tbird, Total RAM = 64
# L = 16
# Time = 25.870user 0.000system 0:25.895elapsed
#============================================================
# Benchmark run of On_spin3d on host lucifer (lucifer, Mark III)
# CPU = 1800 MHz P4, Total RAM = 512MB
# L = 16
# Time = 17.760user 0.020system 0:18.53elapsed
#============================================================
# Benchmark run of On_spin3d on host r00
# CPU = 1600.084 MHz Athlon (1900+MP), Total RAM = 1024MB
# L = 16
# Time = 13.160user 0.030sys 0:13.19elapsed

The Athlon scales (as expected) nearly perfectly with clock -- an 800 MHz Tbird takes twice as long as a 1600 MHz 1900+MP (pause for a Grrr at their silly numbering scheme). The 1800 MHz P4, on the other hand, is only about 25% faster than a 933 MHz P3!
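One way to see the anomaly is to convert each run to clock cycles spent on the same fixed-size job (clock times user time). A small sketch in Python using only the numbers quoted above; MC_RUNS, gigacycles and cycles are names invented here, and user time stands in for CPU time actually spent.

```python
# (host, CPU, clock in MHz, user time in s), copied from the runs above.
MC_RUNS = [
    ("ganesh", "PIII", 933.0, 22.64),
    ("eve", "Athlon Tbird", 800.0, 25.870),
    ("lucifer", "P4", 1800.0, 17.760),
    ("r00", "Athlon 1900+MP", 1600.084, 13.160),
]


def gigacycles(mhz, seconds):
    """Clock cycles consumed by the job, in units of 1e9 cycles."""
    return mhz * 1e6 * seconds / 1e9


cycles = {host: gigacycles(mhz, t) for host, _cpu, mhz, t in MC_RUNS}

for host, cpu, mhz, t in MC_RUNS:
    print(f"{host:8s} {cpu:15s} {mhz:8.1f} MHz  {t:6.2f} s  "
          f"{cycles[host]:6.2f} Gcycles")

# Perfect clock scaling within a family means equal cycle counts.
print(f"Tbird / 1900+MP cycles: {cycles['eve'] / cycles['r00']:.2f}")
print(f"P4 / PIII cycles:       {cycles['lucifer'] / cycles['ganesh']:.2f}")
```

On these numbers the two Athlons and the PIII each spend about 21 billion cycles on the job, while the P4 spends about 32 billion, roughly 1.5 times as many.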
This is so unbelievable that I recompiled, checked that I was using the same sources, and ran it on a couple of P4's (one of which I didn't configure). It seems consistent, and regardless of what user/system might say, wall clock does not lie.

So go figure. The obvious moral of THIS story is that "assume" makes an Ass out of U and Me (as my wife the doctor likes to say). Every P6 CPU from the PPro through the P3, including the Celeron, scaled on this application with clock:

# CPU = dual 200 MHz PentiumPro, Total RAM = 128 MB
# L = 16
# Time = 97.35user 0.06system 1:38.13elapsed

(933/200 = 4.665, 97.35/22.64 = 4.300, close enough for government work:-). The P4 does not, and it is obviously not MEMORY bound as the memory on lucifer2 screams (and besides, the other P4 I tested had different memory and motherboard altogether).

So be sure to TEST YOUR APPLICATION on ANY new CPU and do not assume that non-application benchmarks mean a damn thing. I don't know what feature of the P4 is killing my code (although I'm tempted to compile with profiling and find out) but SOMETHING it is doing is scaling terribly indeed with clock relative to the earlier P6-core designs.

Nevertheless, I'd say that the 845G system works more than well enough to be a cheap, fast compute node (especially for memory bound problems that can maximally benefit from its very fast memory features), is "working" (with this unknown controller problem and an also yet-unsupported onboard video -- beyond VGA mode -- and sound) well enough to be a functional desktop, and is bound to be totally supported (clean boot, functioning sound and XFree86 on the motherboard) by September at the outside.

rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
