<div>Richard,</div>

<div> </div>

<div>No, not at all ("...which you already know"). As you might guess from the abysmal typography of my question, I'm a newbie to distributed computing; I just have a distributable algorithm.</div>

<div> </div>

<div>1. I'll try to write a minimal wiki entry for DLP to go along with the existing ones for ILP and MLP, but it would be better if you did it :-)  But maybe your post almost makes a good entry itself.</div>

<div> </div>

<div>2. My distributable algorithm searches for algorithms (actually, in my case, for parameter vectors for itself, so it just searches the tiny subspace of its own operating parameters), so I'm incidentally interested in a metrizable classification of algorithms, and your "aspect ratio" looks toothsome. I'm imagining something like measuring the performance of one algorithm on one data set on each of several systems, such as a GPU, a hyperthreading CPU, a multi-core CPU, and a distributed network; the (ILP, DLP, TLP) metric would then be some linear combination of those several performances. But I'm having trouble picturing which would be which.</div>

<div> </div>

<div>Thanks,</div>

<div>Peter<br> </div>

<div>

<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">

<div>

<div> </div>

<blockquote style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #1010ff 2px solid"><span class="q">-------------- Original message -------------- <br>From: "Peter St. John" &lt;<a href="mailto:peter.st.john@gmail.com" target="_blank">peter.st.john@gmail.com</a>&gt; <br>

<div>DLP? Wiki has entries for Instruction Level Parallelism and Thread LP (also Memory LP) but </div>

<div>not DLP?</div>

<div> </div></span>

<div>Hey Peter,</div>

<div> </div>

<div>That would be "data level parallelism".  ILP is very low-level parallelism: it applies to</div>

<div>nearby instructions that are independent of one another, so a "wide" superscalar processor</div>

<div>can issue them side by side.  TLP is a slightly higher level of parallelism</div>

<div><span>that is instruction dominated and is associated with independent program blocks</span></div>

<div><span>or loop iterations (within or between programs or subroutines).  DLP refers</span></div>

<div><span>to data-dominated parallelism, usually associated with loops that can be vectorized;</span></div>

<div><span>in other words, loops that let a compiler drive, with a small number of instructions, a large pipeline (and/or parallel stream [think </span><span>GPUs here]) of independent data operations in the</span></div>

<div><span>CPU, so that the total instruction latency is, in theory, trivially small (i.e. when you have a</span></div>

<div><span>vector instruction set).</span></div>
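<div> </div>

<div>As a rough illustration of the three shapes described above, here is a minimal Python sketch. The function names are hypothetical and chosen only for this example, and plain Python will not actually vectorize or issue instructions in parallel; the point is only the dependence structure that a compiler or processor could exploit.</div>

<pre>
```python
from concurrent.futures import ThreadPoolExecutor


def saxpy(a, xs, ys):
    # DLP shape: the same operation is applied to every element and no
    # iteration depends on another, so the loop could be vectorized.
    return [a * x + y for x, y in zip(xs, ys)]


def independent_blocks(tasks):
    # TLP shape: independent program blocks, each of which can be handed
    # to its own thread. pool.map returns results in submission order.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda task: task(), tasks))


def ilp_pair(a, b, c, d):
    # ILP shape: two neighbouring instructions with no data dependence,
    # which a wide superscalar processor could issue side by side.
    s = a + b
    p = c * d
    return s, p


def running_sum(xs):
    # Counter-example: every iteration needs the previous total, so this
    # loop-carried dependence leaves no DLP to exploit as written.
    total = 0
    out = []
    for x in xs:
        total += x
        out.append(total)
    return out
```
</pre>

<div>For example, saxpy(2, [1, 2, 3], [10, 20, 30]) gives [12, 24, 36], and each of the three elements could have been computed independently, whereas running_sum([1, 2, 3]) gives [1, 3, 6] only by working through the list in order.</div>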

<div><span></span> </div>

<div><span>In one sense, the low-level parallelism (and the structural hazard performance limitations) of any program can be characterized by a sort of aspect </span><span>ratio, ILP x DLP x TLP, and every code/kernel has its own dimensionality and volume.</span></div>

<div><span></span> </div>

<div><span>A major question for performance and processor design is also which kind of latency dominates,</span></div>

<div><span>instruction latency or data latency, in limiting a code's performance to something less</span></div>

<div><span>than register-to-register optimal.  Generally, data latency dominates in HPC, while instruction latency dominates in the Enterprise world.</span></div>

<div><span></span> </div>

<div><span>But now I am probably rambling, and telling you something that you already know.</span></div>

<div><span></span> </div>

<div><span>Regards,</span></div><span class="q">

<div><span></span> </div>

<div><span>rbw</span></div>

<div><span><br></span></div></span></blockquote></div></blockquote></div>