[Beowulf] Tilera to Introduce 64-Core Processor

Thu Oct 18 11:10:38 PDT 2007

Richard,

No, not at all ("...which you already know"). As you might guess from the
abysmal typography of my question, I'm a newbie to distributed computing; I
just have a distributable algorithm.

1. I'll try to write a minimal wiki entry for DLP to go along with the
existing ones for ILP and MLP, but it would be better if you did it :-)
Then again, your post almost makes a good entry by itself.

2. My distributable algorithm searches for algorithms (actually, in my
case, for parameter vectors for itself, so it just searches the tiny
subspace of its own operating parameters), so I'm incidentally interested in
metrizable classification of algorithms, and your "aspect ratio" looks
toothsome. I'm imagining something like measuring the performance of one
algorithm on one data set on each of several systems, such as a GPU, a
hyperthreading CPU, a multi-core CPU, and a distributed network; the
(ILP, DLP, TLP) metric would then be some linear combination of those several
performances. But I'm having trouble picturing which would be which.
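
To make the idea above concrete, here is a minimal Python sketch. Everything
in it is an assumption for illustration: the names SYSTEMS, WEIGHTS,
speedups, parallelism_profile and volume are hypothetical, the weights are
placeholders (which system should inform which metric is exactly the open
question), and the timings are made up rather than measured.

```python
# Hypothetical sketch: all names, weights and timings are illustrative only.
SYSTEMS = ("gpu", "hyperthreading_cpu", "multicore_cpu", "network")

# Placeholder weights: one row per metric, one column per system (in the
# order of SYSTEMS). These values are made up; a real mapping would have
# to be fitted or argued for.
WEIGHTS = {
    "ILP": (0.0, 1.0, 0.0, 0.0),
    "DLP": (1.0, 0.0, 0.0, 0.0),
    "TLP": (0.0, 0.0, 0.5, 0.5),
}


def speedups(baseline_seconds, timings):
    """Speedup of each system relative to one baseline run of the same job."""
    return tuple(baseline_seconds / timings[name] for name in SYSTEMS)


def parallelism_profile(baseline_seconds, timings, weights=WEIGHTS):
    """Each metric is a linear combination of the per-system speedups."""
    s = speedups(baseline_seconds, timings)
    return {
        metric: sum(w * x for w, x in zip(row, s))
        for metric, row in weights.items()
    }


def volume(profile):
    """The ILP x DLP x TLP product from the quoted 'aspect ratio' remark."""
    return profile["ILP"] * profile["DLP"] * profile["TLP"]


# Made-up wall-clock times in seconds, not measurements.
example_timings = {
    "gpu": 1.0,
    "hyperthreading_cpu": 4.0,
    "multicore_cpu": 2.0,
    "network": 1.0,
}
profile = parallelism_profile(8.0, example_timings)
print(profile, volume(profile))
```

The weight table is the part that would need real thought; the rest is
bookkeeping around it.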

Thanks,
Peter

>
>
> -------------- Original message --------------
> From: "Peter St. John" <peter.st.john at gmail.com>
> DLP? Wiki has entries for Instruction Level Parallelism and Thread LP
> (also Memory LP), but not DLP?
>
> Hey Peter,
>
> That would be "data level parallelism".  ILP is very low-level
> parallelism: locally scoped instructions that are independent of each
> other are issued in a superscalar, fanned-out way on a "wide"
> processor.  TLP is a slightly higher level of parallelism that is
> instruction dominated and is associated with independent program blocks
> or loop iterations (within or between programs or subroutines).  DLP is
> data-dominant parallelism, usually associated with looping structures
> that could vectorize; in other words, loops that let a compiler drive,
> with a small number of instructions, a large pipeline (and/or parallel
> stream [think GPUs here]) of independent data operations in the CPU, for
> which the total instruction latency is trivially small in theory (i.e.
> when you have a vector instruction set).
>
> In one sense, the low level parallelism (and structural hazard performance
> limitations) of any program can be defined by a sort of aspect ratio that
> is ILP x DLP x TLP and every code/kernel has its own dimensionality and
> volume.
>
> A major question for performance and processor design is also which kind
> of latency, instruction latency or data latency, dominates in limiting a
> code's performance to something less than register-to-register optimal.
> Generally, data latency dominates in HPC and instruction latency
> dominates in the Enterprise world.
>
> But now I am probably rambling, and telling you something that you already
> know.
>
> Regards,
>
> rbw
>
>
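
The quoted distinction between a loop that could vectorize and one that
cannot can be sketched in Python as below. This is an illustration under
stated assumptions, not anyone's actual code: saxpy_chunk, saxpy_chunked and
running_sum are hypothetical names, and the thread pool only demonstrates
that independent iterations can be split up and recombined in order. In
CPython it should not be expected to run faster.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy_chunk(a, xs, ys):
    """Compute a * x + y elementwise; no iteration depends on another."""
    return [a * x + y for x, y in zip(xs, ys)]


def saxpy_chunked(a, xs, ys, chunks=4):
    """Split the index range into chunks and give each chunk to a worker."""
    n = len(xs)
    step = max(1, -(-n // chunks))  # ceiling division, at least 1
    bounds = [(i, min(i + step, n)) for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=chunks) as pool:
        # map() returns results in submission order, so simple
        # concatenation rebuilds the full result.
        parts = list(
            pool.map(lambda b: saxpy_chunk(a, xs[b[0]:b[1]], ys[b[0]:b[1]]), bounds)
        )
    return [value for part in parts for value in part]


def running_sum(xs):
    """Each iteration needs the previous total, so a naive split would be wrong."""
    total, out = 0, []
    for x in xs:
        total += x
        out.append(total)
    return out


xs = list(range(10))
ys = [1] * 10
# Chunked and unchunked versions agree because the iterations are independent.
print(saxpy_chunked(2, xs, ys) == saxpy_chunk(2, xs, ys))
print(running_sum([1, 2, 3, 4]))
```

The first loop is the kind a vector unit or GPU could consume as one stream
of independent data operations; the second carries a dependency from one
iteration to the next and would need a different algorithm (for example a
parallel prefix scan) before it could be split.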