[Beowulf] Home beowulf - NIC latencies
Many of your questions may have already been answered in earlier discussions or in the FAQ. The search results page will indicate current discussions as well as past list serves, articles, and papers.
Vincent Diepeveen diep at xs4all.nlFri Feb 4 04:35:22 PST 2005
- Previous message: [Beowulf] NFS over TCP or smth else... WHAT I've done wrong?
- Next message: [Beowulf] Home beowulf - NIC latencies
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
At 00:29 4-2-2005 -0800, Bill Broadley wrote: >On Thu, Feb 03, 2005 at 04:53:27AM +0100, Vincent Diepeveen wrote: >> Good morning! >> >> With the intention to run my chessprogram on a beowulf to be constructed >> here (starting with 2 dual-k7 machines here) i better get some good advice >> on which network to buy. Only interesting thing is how fast each node can >> read out 64 bytes randomly from RAM of some remote cpu. All nodes do that >> simultaneously. > >Is there any way to do this less often with a larger transfer? >If you >wrote a small benchmark that did only that (send 64 bytes randomly >from a large array in memory) and make it easy to download, build, run, >and report results, I suspect some people would. One way pingpong with 64 bytes will do great. Shared memory examples i have plenty, but one way pingpong approaches it excellent. Just multiply the time with 2 and one knows the bound :) >> The faster this can be done the better the algorithmic speedup for parallel >> search in a chess program (property of YBW, see publications in journal of >> icga: www.icga.org). This speedup is exponential (or better you get >> punished exponential compared to single cpu performance). >> >> Which network cards considering my small budget are having lowest latencies >> can be used? > >Define small budget. For more than 2 nodes myrinet needs a switch. >Do you expect to be totally network latency bound? How low is enough >to keep the processors busy? CPU's are 100% busy and after i know how many times a second the network can handle in theory requests i will do more probes per second to the hashtable. The more probes i can do the better for the game tree search. >> quadrics/dolphin seems bit out of pricerange. Myrinet is like 684 euro per >> card when i altavista'ed online and i wonder how to get more than 2 nodes >> to work without switch. Perhaps there is low cost switches with reasonable >> low latency? >Do you know that gigabit is too high latency? The few one way pingpong times i can find online from gigabit cards are not exactly promising, to say it very polite. Something in the order or 50 us one way pingpong time i don't even consider worth taking a look at at the picture. Each years cpu's get faster. For small networks 10 us really is the upper limit. >Can't you send enough >work, like say search 3 moves ahead on the head node, then for each legal >move send that search tree to a different node? Each node would reply with >the highest ranked moves when done. Let's not discuss parallel chess algorithm too much in depth. 100 different algorithms/enhancements get combined with each other. They are not the biggest latency problem. The latency problem is caused by the hashtable. Hashtable is a big cache. The bigger the better. It avoids researching the same tree again. In games like chess and every search terrain (even simulated flight) you can get back to the same spot by different means causing a transposition. Like suppose you start the game with 1.e4,e5 2.d4 that leads to the same position like 1.d4,e5 2.e4. So if we have searched already 1.e4,e5 2.d4 that position P we store into a large cache. Other cpu's first want to know whether we already searched that position. Those hashtable positions get created quite quickly. Deep Blue created them at a 100 million positions a second and simply didn't store vaste majority in hashtable (would be hard as it was in hardware). That's one of the reasons why it searched only 10-12 ply, already in 1999 that was no longer spectacular when 4 processor pc's showed up at world champs. At a PC with a shared hashtable nowadays i get 10-12 ply (ply = half move, full move is when both sides make a move) in a few seconds, searching a 100000 positions per second a cpu. So before we start searching every node (=position) we quickly want to find out whether other cpu's already searched it. At the origin3800 at 512 processors i used a 115 GB hashtable (i started search at 460 processors). Simply because the machine has 512GB ram. So in short you take everything you can get. The search works with internal iterative deepending which means we first search 1 ply, then 2 ply, then 3 ply and so on. The time it takes to get to the next iteration i hereby define as the branching factor (Knuth has a different definition as he just took into account 1 algorithm, the 'todays' definition looks more appropriate). In order to search 1 ply deeper obvious it's important to maintain a good branching factor. I'm very bad in writing out mathematical proofs, but it's obvious that the more memory we use, the more we can reduce the number of legal moves in this position P as next few ply it might be in hashtable, which trivially makes the time needed to search 1 ply deeper shorter. Storing closer to the root (position where we started searching) is of course more important than near the leafs of the search tree. When for example not storing in hashtable last 10 ply near the leafs in an overnight experiment the search depth dropped at 460 processors from 20 ply to 13 ply. Of course each processor of supercomputers is deadslow for game tree search (it's branchy 100% integer work completely knocking down the caches), so compared to pc's you already start at a disadvantage of a factor 16 or so very quickly, before you start searching (in case of TERAS i had to fight with outdated 500Mhz MIPS processors against opterons and high clocked quad Xeons), so upgrading my own networkcards is more clever. Yet getting yourself a network even between a few nodes as quick as those supercomputers is not so easy... Additional your own beowulf network you can first decently test at before playing at a tournament, and without good testing at the machine you play at in tournaments you have a hard 0% chance that it plays well. The only thing in software that matters is testing. >-- >Bill Broadley >Computational Science and Engineering >UC Davis
- Previous message: [Beowulf] NFS over TCP or smth else... WHAT I've done wrong?
- Next message: [Beowulf] Home beowulf - NIC latencies
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Beowulf mailing list
