libo at buaa.edu.cn
Mon Sep 1 17:43:16 PDT 2008
It seems you have found a very good example for GPGPU. As I said before, GPGPU is not yet ready for DP (double-precision) calculation. If you can live with SP (single-precision) computation, you will find much more to gain from it.
NVidia just sent me a special offer on their Tesla platforms, saying that a workstation equipped with two GTX280-class professional cards costs about $5000, which is not bad. But my intention is still to lower the core clock frequency of a gaming card and use it for computation.
----- Original Message -----
From: "Mikhail Kuzminsky" <kus at free.net>
To: "Kozin, I (Igor)" <i.kozin at dl.ac.uk>
Cc: <beowulf at beowulf.org>
Sent: Tuesday, September 02, 2008 1:34 AM
Subject: Re: [Beowulf] gpgpu
> I performed a simple estimate of the possible performance
> improvement from running dgemm on a FireStream 9250.
> It's an extremely good example for GPGPU.
> The source data for the 9250: peak DP performance 200 GFLOPS, 1 GB of
> GDDR3 RAM. 1 GB can hold 3 DP (64-bit) n x n matrices for n=6000; they
> require 864 Mbytes.
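The 864 MB figure is easy to check: each dense float64 n x n matrix takes 8*n^2 bytes. A minimal Python sketch, assuming decimal megabytes (10^6 bytes), which is what the quoted figure uses; the helper name `matrices_mb` is hypothetical:

```python
# Memory footprint of dense double-precision (float64) n x n matrices.
# Assumption: decimal megabytes (1 MB = 10**6 bytes), matching the 864 MB figure.
BYTES_PER_DOUBLE = 8


def matrices_mb(n: int, count: int) -> float:
    """Megabytes needed to hold `count` dense n x n float64 matrices."""
    return count * n * n * BYTES_PER_DOUBLE / 1e6


print(matrices_mb(6000, 1))  # one matrix: 288 MB
print(matrices_mb(6000, 3))  # A, B and C for dgemm: 864 MB
```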
> Let me suppose that the real performance of the FireStream will be 90% of
> peak (I'm afraid reality will be worse), i.e. 180 GFLOPS.
> dgemm requires 2*n^3 FP operations (I neglect the n^2 operations for
> matrix addition and scaling), i.e. 432 GFLOP.
> The calculation time will be 432/180 = 2.4 sec
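A small sketch of the FLOP count and compute-time arithmetic above, assuming the 90% efficiency guess from the message; the variable names are illustrative only:

```python
def dgemm_flops(n: int) -> float:
    # 2*n^3: n^3 multiplies plus n^3 adds; the O(n^2) scaling/addition terms are ignored.
    return 2.0 * n**3


peak_gflops = 200.0                  # FireStream 9250 DP peak, as quoted
efficiency = 0.90                    # assumed fraction of peak (optimistic guess)
sustained = peak_gflops * efficiency  # 180 GFLOPS

gflop = dgemm_flops(6000) / 1e9      # 432 GFLOP for n = 6000
compute_s = gflop / sustained        # 432 / 180 = 2.4 s
print(f"{gflop:.0f} GFLOP, {compute_s:.2f} s")
```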
> The dgemm calculation also needs 4 matrix transfers: 3 to the
> GPGPU and 1 from the GPGPU back to the server's main memory.
> That is 1152 Mbytes of data.
> For PCIe 2.0 x16 the peak throughput is 8 GB/s, so the
> transfer time will be about 0.144 sec (I don't know what the
> real PCIe throughput may be).
> The total calc. time is therefore about 2.54 sec.
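The transfer and total-time figures can be sketched the same way. This assumes, as the estimate does implicitly, that the four PCIe transfers run at the quoted 8 GB/s peak and do not overlap with computation; the 2.4 s compute time is carried over from the previous step:

```python
n = 6000
matrix_bytes = n * n * 8       # one float64 matrix: 288 MB
transfers = 4                  # A, B, C to the card; C back to host memory
pcie_bytes_per_s = 8e9         # PCIe 2.0 x16 peak, as quoted (assumed sustained)

transfer_s = transfers * matrix_bytes / pcie_bytes_per_s  # 1.152 GB / 8 GB/s
compute_s = 2.4                # GPU compute-time estimate from the message
total_s = compute_s + transfer_s  # no overlap of transfer and compute assumed

print(f"{transfer_s:.3f} s transfer, {total_s:.2f} s total")
```

Overlapping transfers with computation, if the software allowed it, would shrink the total slightly, but at 0.144 s the transfer is already a small fraction of the run.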
> On a dual-socket quad-core Xeon server with 3 GHz E5472 CPUs (8 cores)
> the peak performance is 96 GFLOPS. A parallelized dgemm will reach, I
> believe, about 80% of peak, i.e. 77 GFLOPS; therefore the calculation
> time is 432/77 = 5.6 sec.
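The 96 GFLOPS peak corresponds to 8 cores x 3 GHz x 4 DP FLOPs per cycle. The per-cycle figure is an assumption inferred from the quoted peak (one 2-wide SSE add plus one 2-wide SSE multiply per cycle), and the 80% efficiency is the message's own guess; a sketch:

```python
sockets, cores_per_socket = 2, 4
clock_ghz = 3.0
dp_flops_per_cycle = 4   # assumption: 2-wide SSE add + 2-wide SSE multiply per cycle

peak = sockets * cores_per_socket * clock_ghz * dp_flops_per_cycle  # 96 GFLOPS
sustained = 0.80 * peak  # 76.8 GFLOPS, rounded to 77 in the message
cpu_s = 432 / sustained  # dgemm with n = 6000 is 432 GFLOP

print(f"peak {peak:.0f} GFLOPS, sustained {sustained:.1f} GFLOPS, {cpu_s:.2f} s")
```

Using 76.8 rather than the rounded 77 GFLOPS gives about 5.63 s instead of 5.61 s; either way the estimate is about 5.6 s.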
> The speedup is 2.2 times. I don't know the price increase; for example,
> from $4500 to $6500 (if the FireStream costs $2000, though it may be
> $1000 as Igor Kozin wrote here), it's about 1.4 times.
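The comparison above reduces to two ratios. A sketch using the figures quoted in the message (the prices are the message's own hypothetical example); the last line, speedup divided by price increase, is an extra derived number rather than one stated in the original:

```python
gpu_total_s = 2.54   # GPU compute + PCIe transfer estimate
cpu_s = 5.6          # 8-core Xeon estimate

speedup = cpu_s / gpu_total_s              # about 2.2x
price_ratio = 6500 / 4500                  # about 1.4x (hypothetical prices)
perf_per_dollar_gain = speedup / price_ratio  # about 1.5x better performance per dollar

print(f"speedup {speedup:.2f}x, price {price_ratio:.2f}x, "
      f"perf/$ {perf_per_dollar_gain:.2f}x")
```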
> But I think there will not be many jobs that require multiplication
> of *dense* matrices of such a large size (6000 x 6000);
> for sparse matrices, I believe, the dimensions will be lower.
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf