<div dir="ltr">Adapteva's CEO, Andreas Olofsson, gave a very well attended talk Friday at ORNL. The talk, on how to program a 16,000-core chip, was interesting, though it was more about the architecture and design choices than about actually programming a 16K-core chip. The work is most impressive given that it came from a team of three over a period of three months.<div>

<br></div><div>The cores are simple, dual-issue RISC designs, each with 32 KB of scratchpad memory and a network router. There is no cache or coherency protocol. Every core can read and write every other core's memory, so the chip can appear as a distributed shared-memory machine. Non-local accesses are automatically converted to network transactions and sent out over the network-on-chip (NoC). Nearest-neighbor latency is 4 ns for writes and 16 ns for reads; farthest-neighbor latency is 16 ns for writes and 30 ns for reads. The cores form a 2D mesh, and routing goes east/west first, then north/south. He claims that they can build a 1,024-core chip today if there is demand for it.</div>
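<div><br></div><div>To make the routing order concrete, here is a minimal sketch of dimension-ordered routing on a 2D mesh: a packet first moves along its row (east/west) until it reaches the destination column, then along that column (north/south). The (row, column) coordinates and the names xy_route and hop_count are my own illustration, not anything from the talk or the Epiphany SDK, and the sketch does not model latency or the real router.</div>

```python
def xy_route(src, dst):
    """Return the (row, col) nodes visited from src to dst on a 2D mesh.

    Hypothetical model: move east/west along the row first,
    then north/south along the column.
    """
    row, col = src
    dst_row, dst_col = dst
    path = [(row, col)]

    # Phase 1: east/west, i.e. change the column index.
    step = 1 if dst_col > col else -1
    while col != dst_col:
        col += step
        path.append((row, col))

    # Phase 2: north/south, i.e. change the row index.
    step = 1 if dst_row > row else -1
    while row != dst_row:
        row += step
        path.append((row, col))

    return path


def hop_count(src, dst):
    """Number of router-to-router hops, which is the Manhattan distance."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


if __name__ == "__main__":
    # Corner to corner on a 4 x 4 mesh.
    print(xy_route((0, 0), (3, 3)))
    print(hop_count((0, 0), (3, 3)))
```

<div>Under this model a corner-to-corner transfer on a 4 x 4 mesh takes 6 hops, and the path between two given cores is always the same, since the order of the two phases is fixed.</div>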

<div><br></div><div>The initial markets are telecom, military, and medical, and the applications best suited to it are those that would otherwise need a DSP. For HPC, they claim 102 GF/s at 2 watts (51 GF/s per watt), which is almost exascale class (i.e., 1 EF/s at about 20 MW, ignoring cooling, networks, etc.). It currently has only single-precision floating point. They can add double precision given enough demand. Depending on how much memory is configured per core, a double-precision version could provide peak performance about 30-40% lower than the current board's.</div>
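<div><br></div><div>The efficiency arithmetic can be checked with a few lines. This is only a back-of-the-envelope sketch using the claimed figures (102 GF/s at 2 W); the function names are made up for illustration, and it ignores cooling, networks, memory, and everything else a real machine needs.</div>

```python
def gflops_per_watt(peak_gflops, power_watts):
    """Energy efficiency in GF/s per watt."""
    return peak_gflops / power_watts


def megawatts_for(target_gflops, efficiency_gflops_per_watt):
    """Power in MW needed to reach target_gflops at a given efficiency."""
    watts = target_gflops / efficiency_gflops_per_watt
    return watts / 1e6


if __name__ == "__main__":
    efficiency = gflops_per_watt(102.0, 2.0)  # claimed: 102 GF/s at 2 W
    exaflop_in_gflops = 1e9                   # 1 EF/s = 1e18 FLOP/s = 1e9 GF/s
    print(f"{efficiency:.1f} GF/s per watt")
    print(f"{megawatts_for(exaflop_in_gflops, efficiency):.1f} MW for 1 EF/s")
```

<div>At 51 GF/s per watt, 1 EF/s works out to roughly 19.6 MW, which is where the "about 20 MW" figure comes from.</div>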

<div><br></div><div>They support C/C++ and OpenCL. The latter is actually converted to C++, and C++ itself is constrained by the small amount of memory per core. That said, if the bulk of your program fits in under 1,500 lines of C, he asserts that it will scream.</div>

<div><br></div><div>Lastly, once all the Kickstarter boards have gone out, they hope to have them available on Amazon for immediate delivery.</div><div><br></div><div>Scott</div><div><br></div></div><div class="gmail_extra"><br>

<br><div class="gmail_quote">On Fri, May 23, 2014 at 9:32 AM, Eugen Leitl <span dir="ltr"><<a href="mailto:eugen@leitl.org" target="_blank">eugen@leitl.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

After finally getting my Kickstarter backer board and setting it<br>

up to boot (you will need the included heatsink on the Zynq 7020<br>

as well as a small fan), I ran a few of the included benchmarks.<br>

<br>

In no particular order of relevance:<br>

<br>

linaro-nano:~/Parallella/epiphany-examples/mesh_bandwidth_all2one> ./run.sh<br>

0x0000417e!<br>

The bandwidth of all-to-one is 4193.00MB/s!<br>

<br>

<br>

linaro-nano:~/Parallella/epiphany-examples/mesh_bandwidth_bisection> ./run.sh<br>

0x00000f46!<br>

The bandwidth of bisection is 9590.00MB/s!<br>

<br>

linaro-nano:~/Parallella/epiphany-examples/basic_math> ./run.sh<br>

<br>

The clock cycle count for addition is 5.<br>

<br>

The clock cycle count for subtraction is 5.<br>

<br>

The clock cycle count for multiplication is 6.<br>

<br>

The clock cycle count for division is 47.<br>

<br>

The clock cycle count for "fmodf()" is 66635.<br>

<br>

The clock cycle count for "sinf()" is 23930.<br>

<br>

The clock cycle count for "cosf()" is 51115.<br>

<br>

The clock cycle count for "sqrtf()" is 93785.<br>

<br>

The clock cycle count for "ceilf()" is 18475.<br>

<br>

The clock cycle count for "floorf()" is 17690.<br>

<br>

The clock cycle count for "log10f()" is 10735.<br>

<br>

The clock cycle count for "logf()" is 9976.<br>

<br>

The clock cycle count for "powf()" is 348243.<br>

<br>

The clock cycle count for "ldexpf()" is 36306.<br>

<br>

linaro-nano:~/Parallella/epiphany-examples/matmul-16> ./run.sh<br>

<br>

Matrix: C[512][512] = A[512][512] * B[512][512]<br>

<br>

Using 4 x 4 cores<br>

<br>

Seed = 0.000000<br>

Loading program on Epiphany chip...<br>

Writing C[1048576B] to address 00200000...<br>

Writing A[1048576B] to address 00000000...<br>

Writing B[1048576B] to address 00100000...<br>

GO Epiphany! ...   Writing the GO!...<br>

Done...<br>

Finished calculating Epiphany result.<br>

Reading result from address 00200000...<br>

Calculating result on Host ...   Finished calculating Host result.<br>

Reading time from address 00300008...<br>

<br>

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***<br>

Verifying result correctness ...   C_epiphany == C_host<br>

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***<br>

<br>

Epiphany -  time:     153.0 msec  (@ 600 MHz)<br>

Host     -  time:    1867.2 msec  (@ 667 MHz)<br>

<br>

* * *   EPIPHANY FTW !!!   * * *<br>

<br>

I can run the rest of the examples and post numbers if there's<br>

interest:<br>

<br>

naro-nano:~/Parallella/epiphany-examples> ls -la<br>

total 152<br>

drwxrwxr-x 36 linaro linaro 4096 May 22 15:46 ./<br>

drwxrwxr-x  5 linaro linaro 4096 Mar  7 12:09 ../<br>

drwxrwxr-x  8 linaro linaro 4096 Mar  6 23:47 .git/<br>

-rw-rw-r--  1 linaro linaro  227 Mar  6 23:42 .gitignore<br>

-rw-rw-r--  1 linaro linaro 1464 Mar  6 23:42 README.md<br>

drwxrwxr-x  4 linaro linaro 4096 May 17 11:47 assembly/<br>

drwxrwxr-x  4 linaro linaro 4096 Mar  6 23:44 basic_math/<br>

drwxrwxr-x  4 linaro linaro 4096 Mar  6 23:47 clockgating_mode/<br>

drwxrwxr-x  4 linaro linaro 4096 May 17 11:48 ctimer/<br>

drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 dma_2d/<br>

drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 dma_chain/<br>

drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 dma_interrupt/<br>

drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 dma_message_read/<br>

drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 dma_message_write/<br>

drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 dma_slave/<br>

drwxrwxr-x  4 linaro linaro 4096 May 22 15:48 e-dump-mem/<br>

drwxrwxr-x  4 linaro linaro 4096 May 22 15:46 e-dump-regs/<br>

drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 e-mem-sync/<br>

drwxrwxr-x  4 linaro linaro 4096 Mar  6 23:43 e-toggle-led/<br>

drwxrwxr-x  4 linaro linaro 4096 May 22 12:48 emesh_read_latency/<br>

drwxrwxr-x  4 linaro linaro 4096 May 22 12:48 emesh_traffic/<br>

drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 erm/<br>

drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 erm_example/<br>

drwxrwxr-x  4 linaro linaro 4096 Mar  6 23:42 fft2d/<br>

drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 hardware_barrier/<br>

drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 hardware_loops/<br>

drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 hello_parallella/<br>

drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 interrupts/<br>

drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 link_lowpower_mode/<br>

drwxrwxr-x  4 linaro linaro 4096 Mar  7 02:04 matmul-16/<br>

drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 mem_protect/<br>

drwxrwxr-x  4 linaro linaro 4096 May 23 13:26 mesh_bandwidth_all2one/<br>

drwxrwxr-x  4 linaro linaro 4096 May 22 12:42 mesh_bandwidth_bisection/<br>

drwxrwxr-x  4 linaro linaro 4096 May 22 12:41 mesh_bandwidth_neighbour/<br>

drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 mutex/<br>

drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 nested_interrupts/<br>

drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 register_test/<br>

drwxrwxr-x  4 linaro linaro 4096 May 22 12:07 remote_call/<br>

<br>

_______________________________________________<br>

Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org">Beowulf@beowulf.org</a> sponsored by Penguin Computing<br>

To change your subscription (digest mode or unsubscribe) visit <a href="http://www.beowulf.org/mailman/listinfo/beowulf" target="_blank">http://www.beowulf.org/mailman/listinfo/beowulf</a><br>

</blockquote></div><br></div>