[Beowulf] Fastest way to compute Euclidean distance [spin off from: Building new cluster - estimate]

Eric Thibodeau kyron at neuralbs.com
Thu Aug 7 10:02:58 PDT 2008

>>>   Most of the arguments I have heard are "oh but its compiled with 
>>> -O3" or whatever. Any decent HPC code person will tell you that that 
>>> is most definitely not a guaranteed way to a faster system ...
>> Hey...as I stated above, one would have to be quite silly to claim 
>> -O3 as the all well and all good optimization solution. At least you 
>> can rest assured your solutions will add up correctly with GCC. To get a 
> Well, sometimes.  You still need to be careful with it.
> This said, I am not sure icc/pgi/... are uniformly better than gcc.  I 
> did an admittedly tiny study of this http://scalability.org/?p=470 
> some time ago.  What I found was the gcc really held its own.  It did 
> a very good job on a very simple test case.
Very nice post, thanks for that. It so happens I am going through the 
exact same steps, trying to optimize a very simple piece of code that 
computes the Euclidean distance, and I was a little stumped to find that 
the simple C code outperforms BLAS (both GOTO and MKL). If you have 
gnuplot, a BLAS library with a cblas interface, and icc installed, all you 
have to do is run `make` with the three attached files in the same dir 
and you'll get nice plots of what's going on. I'm also attaching an 
example run with:

icc 10.1.017
gcc 4.3.1

PS: regular disclaimers about crappy code writing apply ;)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: EuclideanDist.c
Type: text/x-csrc
Size: 3596 bytes
Desc: not available
Url : http://www.scyld.com/pipermail/beowulf/attachments/20080807/63baf0c3/EuclideanDist.bin
-------------- next part --------------
### Global, generic variables ###
SHELL     = /bin/bash
ARCH      = $(shell uname -m)
GCC       = gcc-4.3.1
#GCCFLAGS  = -Wall -march=native -mfpmath=sse,387 -O3 -fomit-frame-pointer -fkeep-inline-functions -funsafe-loop-optimizations -freorder-blocks-and-partition -fno-math-errno -ffinite-math-only -fno-trapping-math -fno-signaling-nans -fwhole-program --param l1-cache-line-size=1 --param l1-cache-size=64 --param l2-cache-size=4096
GCCFLAGS  = -Wall -march=native -O3 

# For ICC 
# on Opteron: -xW
#ICCFLAGS    = -xW
# on Core2 Duo -xT
ICCFLAGS    = -xT 

LIBS      = -lm -lblas -lcblas

### TAU specific variables ###
TAU_MAKEFILE = ~/TAU/TAU/$(ARCH)/lib/Makefile.tau-pdt
TAU_CXX      = tau_cxx.sh
TAU_CC       = tau_cc.sh
TAU_OPTS     = -optNoRevert -optLinking="$(LIBS)" -optTauCC="$(CC)" -optCPPOpts="$(GCCFLAGS)" -tau_makefile=$(TAU_MAKEFILE)

PROGNAM    = EuclideanDist
PROGRAM    = $(PROGNAM)               # name of the executable
SRCS       = $(PROGNAM).c             # source files
OBJS       = $(PROGNAM).o             # object files

MKL_LIBS = -liomp5 -lpthread -I/opt/intel/mkl/ -L/opt/intel/mkl/

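.SUFFIXES: .c .o
# The suffix-rule target lines did not survive in the archive. The ".c.o:"
# target is inferred from the .SUFFIXES list above; the C++ rule is left
# commented out because its target cannot be inferred.
#	$(CXX) -c $(GCCFLAGS) $<
.c.o:
	$(CC) -c $(GCCFLAGS) $<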

# Targets
default: all
# all: $(PROGRAM) icc gcc
all: $(PROGRAM) icc gcc tests plots


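# The target lines below did not survive in the archive. Their names are
# inferred from the prerequisites of "all" and "tests"; any prerequisites
# or compile commands they originally carried are unknown and not restored.
icc:
	sudo eselect blas set mkl-gfortran

gcc:
	sudo eselect blas set goto

clean:
	/bin/rm -f $(OBJS) $(PROGRAM) $(TAU_PROG) *.dat

tests: icctest gcctest

icctest:
	./$(PROGOUT)_ICC     > icc.dat

gcctest:
	./$(PROGOUT)_GCC     > gcc.dat

plots:
	gnuplot Plot.gp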
-------------- next part --------------
set title "BLAS Vs C execution time for\n Euclidean Distance computation"
set xlabel "Vector Size (bytes)"
set ylabel "Time (sec)"
set logscale xy
set grid xtics mxtics
set key top left
set key box
#set term post enh         # enhanced PostScript, essentially PostScript
                           # with bounding boxes
#set term postscript
#set term png
set term postscript enhanced color

set out 'BlasVsC.eps'

plot "icc.dat" using 1:2 title 'icc-BLAS' w l lw 1 , \
"icc.dat" using 1:3 title 'icc' w l lw 1 , \
"gcc.dat" using 1:2 title 'gcc-BLAS'  w l lw 1 , \
"gcc.dat" using 1:3 title 'gcc' w l lw 1  

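Judging only by the plot commands above, each .dat file appears to hold three whitespace-separated columns: vector size, BLAS time, and plain C time. A minimal sketch of a C helper that emits rows in that layout follows. The name format_row and the exact number format are assumptions for illustration, not taken from the scrubbed attachment.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write one data row "size blas_sec c_sec\n" into buf,
 * matching the column layout the gnuplot script reads (1:2 and 1:3).
 * Returns the number of characters snprintf would have written. */
int format_row(char *buf, size_t buflen,
               size_t bytes, double blas_sec, double c_sec)
{
    return snprintf(buf, buflen, "%zu %.6e %.6e\n", bytes, blas_sec, c_sec);
}
```

Printing one such row per vector size to stdout and redirecting it to icc.dat or gcc.dat would give gnuplot the input it expects.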