taviso Retired Dev
Joined: 15 Apr 2003 Posts: 261 Location: United Kingdom
Posted: Wed Mar 23, 2005 4:00 pm Post subject: Tip: Maths performance tweak |
If you have an Intel CPU, try this:
Code: | # emerge dev-lang/icc
$ wget http://dev.gentoo.org/~taviso/cpml.c
$ gcc -O2 -o cpml cpml.c -ldl
$ ./cpml |
If you get interesting results, you can switch to using libimf globally. Take some benchmarks first:
Code: |
# emerge nbench
$ nbench
# echo /opt/intel/compiler80/lib/libimf.so >> /etc/ld.so.preload
$ nbench
|
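If you'd rather not touch /etc/ld.so.preload (it affects every dynamically linked program, and you have to edit it again to switch back), the dynamic loader also honours the LD_PRELOAD environment variable for a single command. A minimal sketch, assuming the icc 8.0 path from above; `with_libimf` and `LIBIMF` are hypothetical names made up for this example:

```shell
# Hypothetical helper: run ONE command with Intel's libimf preloaded,
# instead of preloading it system-wide via /etc/ld.so.preload.
# LIBIMF defaults to the icc 8.0 path used above (assumption: adjust
# it if your icc is installed elsewhere).
LIBIMF=${LIBIMF:-/opt/intel/compiler80/lib/libimf.so}

with_libimf() {
    if [ ! -r "$LIBIMF" ]; then
        echo "with_libimf: cannot read $LIBIMF" >&2
        return 1
    fi
    # Prepend libimf to any existing LD_PRELOAD list (colon-separated);
    # the variable is set only in the environment of the command run here.
    LD_PRELOAD="$LIBIMF${LD_PRELOAD:+:$LD_PRELOAD}" "$@"
}

# Example:
#   nbench               # glibc libm
#   with_libimf nbench   # same run with libimf preloaded
```

This keeps both sides of the comparison one command apart, and nothing needs undoing afterwards.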
Probably not a big difference for most people, but it's still interesting. I was investigating how the (aging) cpml library compared to glibc, found the results interesting, and so added x86 support. How do the benchmarks look for other people when preloading libimf? What about games performance? It might be interesting to find out.
ballero n00b
Joined: 10 Jul 2004 Posts: 62
Posted: Thu Mar 24, 2005 11:59 am Post subject: |
some benchies:
Code: | cycles per call:
function   libm.so   libimf.so (icc 8.1)   libimf.so (icc 9.0)
acos           373                   206                   204
asin           372                   187                   186
atan           296                   118                    85
atan2          303                    62                    62
cos            226                   102                   101
exp            383                    89                    89
hypot          128                    45                    47
log            200                   122                    92
log10          208                   128                   100
pow            859                   211                   111
sin            253                   102                   101
sqrt            64                    53                    40
tan            354                   186                   184 |
w/o preload
Code: | BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 1352.8 : 34.69 : 11.39
STRING SORT : 73 : 32.62 : 5.05
BITFIELD : 7.1218e+08 : 122.16 : 25.52
FP EMULATION : 232.24 : 111.44 : 25.71
FOURIER : 20760 : 23.61 : 13.26
ASSIGNMENT : 39.553 : 150.50 : 39.04
IDEA : 2430.2 : 37.17 : 11.04
HUFFMAN : 2170.4 : 60.19 : 19.22
NEURAL NET : 30.8 : 49.48 : 20.81
LU DECOMPOSITION : 1248.2 : 64.67 : 46.69
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 65.526
FLOATING-POINT INDEX: 42.271
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual GenuineIntel Intel(R) Pentium(R) 4 CPU 3.20GHz 3681MHz
L2 Cache : 1024 KB
OS : Linux 2.6.11-gentoo-r4
C compiler : 3.4.3-20050110
libc :
MEMORY INDEX : 17.133
INTEGER INDEX : 15.789
FLOATING-POINT INDEX: 23.445
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder. |
with preload
Code: | BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 1345.8 : 34.51 : 11.33
STRING SORT : 73.291 : 32.75 : 5.07
BITFIELD : 7.1013e+08 : 121.81 : 25.44
FP EMULATION : 232.16 : 111.40 : 25.71
FOURIER : 35437 : 40.30 : 22.64
ASSIGNMENT : 39.568 : 150.56 : 39.05
IDEA : 2435.1 : 37.24 : 11.06
HUFFMAN : 2179.3 : 60.43 : 19.30
NEURAL NET : 30.631 : 49.21 : 20.70
LU DECOMPOSITION : 1232.7 : 63.86 : 46.11
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 65.546
FLOATING-POINT INDEX: 50.217
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual GenuineIntel Intel(R) Pentium(R) 4 CPU 3.20GHz 3681MHz
L2 Cache : 1024 KB
OS : Linux 2.6.11-gentoo-r4
C compiler : 3.4.3-20050110
libc :
MEMORY INDEX : 17.141
INTEGER INDEX : 15.791
FLOATING-POINT INDEX: 27.852
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder. |
There's a huge difference in FOURIER (20760 vs. 35437 iterations/sec, roughly 70% faster), which should be useful to people who run SETI@home.
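To put a number on "huge", the percentage change between the two nbench runs can be computed with a one-line awk helper. A small sketch; `pct_change` is a hypothetical name, and the inputs are the FOURIER iterations/sec and the original floating-point index from the two runs above:

```shell
# Hypothetical helper: percentage change from OLD ($1) to NEW ($2),
# printed with a sign and one decimal place.
pct_change() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%+.1f%%\n", (new - old) / old * 100 }'
}

pct_change 20760 35437    # FOURIER, w/o vs. with preload   -> +70.7%
pct_change 42.271 50.217  # FLOATING-POINT INDEX (original)  -> +18.8%
```

For timings, where lower is better, a negative result is the improvement: `pct_change 28.784 26.598` on the super_pi times gives -7.6%.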
ftp://pi.super-computing.org/Linux/super_pi.tar.gz
Code: | ====================== super_pi ==================================
with preload w/o preload
Total calculation(I/O) time= 26.598 28.784
================================================================== |
Code: | ====================== Quake3-timedemo ==================================
with preload w/o preload
demo001 (fps) = 374 370
demo002 (fps) = 366 364
=========================================================================
|
Great tips, taviso, thanks.
Last edited by ballero on Sat Aug 06, 2005 10:46 am; edited 1 time in total |
Mac-or n00b
Joined: 20 Mar 2005 Posts: 15
Posted: Fri Apr 15, 2005 1:06 am Post subject: |
You opened a very interesting thread.
./cpml gave me interesting results, too.
acos:
libm.so->acos() (274 cycles)
libimf.so->acos() (163 cycles)
asin:
libm.so->asin() (282 cycles)
libimf.so->asin() (166 cycles)
atan:
libm.so->atan() (173 cycles)
libimf.so->atan() (86 cycles)
atan2:
libm.so->atan2() (410 cycles)
libimf.so->atan2() (274 cycles)
cos:
libm.so->cos() (109 cycles)
libimf.so->cos() (174 cycles)
exp:
libm.so->exp() (209 cycles)
libimf.so->exp() (137 cycles)
hypot:
libm.so->hypot() (677 cycles)
libimf.so->hypot() (199 cycles)
log:
libm.so->log() (153 cycles)
libimf.so->log() (105 cycles)
log10:
libm.so->log10() (162 cycles)
libimf.so->log10() (126 cycles)
pow:
libm.so->pow() (1164 cycles)
libimf.so->pow() (604 cycles)
sin:
libm.so->sin() (128 cycles)
libimf.so->sin() (166 cycles)
sqrt:
libm.so->sqrt() (91 cycles)
libimf.so->sqrt() (73 cycles)
tan:
libm.so->tan() (182 cycles)
libimf.so->tan() (197 cycles)
Note cos, sin and tan: those are slower with libimf. (Actually, I don't understand why those are slower while the rest of the functions are so much faster.)
I had an emerge running in the background, but I don't think that is responsible for these results: I ran cpml a few times with no change.
My system is a PIII and my CFLAGS include -O3, etc.
Nevertheless, I will investigate this further and use it.
BTW: Is prelink aware of /etc/ld.so.preload?
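To see at a glance which functions got slower, the cpml output can be turned into a libm/libimf ratio per function. A rough sketch, assuming the output format shown in this thread (lines like `libm.so->cos() (109 cycles)`); `cpml_compare` is a hypothetical helper, not part of cpml:

```shell
# Hypothetical helper: read cpml output on stdin and print, per function,
# libm cycles, libimf cycles and the speedup (libm / libimf).
# Ratios below 1.00x mean libimf is slower for that function.
cpml_compare() {
    awk '
        # Matches lines like: libm.so->cos() (109 cycles)
        /->/ && /cycles\)/ {
            split($1, part, "->")         # part[1] = library, part[2] = "cos()"
            name = part[2]
            sub(/\(\)$/, "", name)        # "cos()" -> "cos"
            cycles = $2
            sub(/^\(/, "", cycles)        # "(109" -> "109"
            if (part[1] == "libm.so") {
                libm[name] = cycles
            } else if (part[1] == "libimf.so") {
                imf[name] = cycles
                order[++count] = name     # keep cpml output order
            }
        }
        END {
            for (i = 1; i <= count; i++) {
                name = order[i]
                if (!(name in libm)) continue
                ratio = libm[name] / imf[name]
                printf "%-6s libm=%4d libimf=%4d  %.2fx%s\n", name, libm[name], imf[name], ratio, (ratio < 1 ? "  (libimf slower)" : "")
            }
        }'
}

# Example: ./cpml | cpml_compare
```

Fed the PIII numbers above, this would flag cos, sin and tan as the only functions below 1.00x.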
roothorick Tux's lil' helper
Joined: 30 May 2004 Posts: 83
Posted: Sat Apr 16, 2005 3:03 am Post subject: |
Out of curiosity, I tried it on an AMD Athlon XP 2200+:
Quote: | acos:
libm.so->acos() (296 cycles)
libimf.so->acos() (137 cycles)
asin:
libm.so->asin() (269 cycles)
libimf.so->asin() (138 cycles)
atan:
libm.so->atan() (203 cycles)
libimf.so->atan() (72 cycles)
atan2:
libm.so->atan2() (197 cycles)
libimf.so->atan2() (195 cycles)
cos:
libm.so->cos() (128 cycles)
libimf.so->cos() (167 cycles)
exp:
libm.so->exp() (144 cycles)
libimf.so->exp() (128 cycles)
hypot:
libm.so->hypot() (79 cycles)
libimf.so->hypot() (61 cycles)
log:
libm.so->log() (211 cycles)
libimf.so->log() (116 cycles)
log10:
libm.so->log10() (234 cycles)
libimf.so->log10() (142 cycles)
pow:
libm.so->pow() (419 cycles)
libimf.so->pow() (221 cycles)
sin:
libm.so->sin() (74 cycles)
libimf.so->sin() (173 cycles)
sqrt:
libm.so->sqrt() (64 cycles)
libimf.so->sqrt() (32 cycles)
tan:
libm.so->tan() (148 cycles)
libimf.so->tan() (206 cycles)
|
Interesting, to say the least. libm beats libimf on the basic trig functions (sine, cosine, tangent; for sine by more than a factor of two), while libimf beats libm in everything else, usually quite badly (atan2 is roughly a tie). I'll take a crack at nbench.
-UPDATE- I tried Doom 3 both ways: no performance difference whatsoever in the cached timedemo (timedemo demo1 usecache); it came back at 27.9 fps both times. UT2k4 may or may not be different, I'm not sure. (D3 might use statically linked or built-in math functions.)
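One way to check the "statically linked or built-in" theory: a preload can only replace functions that are resolved dynamically, and `nm -D --undefined-only` (GNU binutils) lists a binary's dynamic imports. A sketch; `filter_math_syms` and `dyn_math_imports` are hypothetical helpers, and the list only covers the double-precision functions cpml benchmarks (not e.g. `sinf`). Shared objects the game loads itself would need checking separately, and calls the compiler inlined never show up here at all:

```shell
# Hypothetical helper: from `nm -D --undefined-only` output on stdin, keep
# only the libm functions that cpml benchmarks. Version suffixes such as
# "cos@GLIBC_2.0" are stripped first.
filter_math_syms() {
    awk '{ sym = $NF; sub(/@.*/, "", sym); print sym }' |
        grep -xE 'a?(sin|cos|tan)|atan2|exp|hypot|log|log10|pow|sqrt'
}

# Hypothetical helper: list which of those functions BINARY ($1) imports
# dynamically. Empty output suggests a preloaded libimf can't affect
# that binary's own math calls.
dyn_math_imports() {
    nm -D --undefined-only "$1" | filter_math_syms
}

# Example: dyn_math_imports /path/to/game-binary
```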
caslca Tux's lil' helper
Joined: 24 Aug 2003 Posts: 85
Posted: Sat Apr 16, 2005 1:56 pm Post subject: |
I see an improvement on all calls:
Code: |
acos:
libm.so->acos() (305 cycles)
libimf.so->acos() (186 cycles)
asin:
libm.so->asin() (304 cycles)
libimf.so->asin() (165 cycles)
atan:
libm.so->atan() (244 cycles)
libimf.so->atan() (111 cycles)
atan2:
libm.so->atan2() (312 cycles)
libimf.so->atan2() (50 cycles)
cos:
libm.so->cos() (196 cycles)
libimf.so->cos() (89 cycles)
exp:
libm.so->exp() (309 cycles)
libimf.so->exp() (80 cycles)
hypot:
libm.so->hypot() (124 cycles)
libimf.so->hypot() (52 cycles)
log:
libm.so->log() (178 cycles)
libimf.so->log() (106 cycles)
log10:
libm.so->log10() (185 cycles)
libimf.so->log10() (109 cycles)
pow:
libm.so->pow() (855 cycles)
libimf.so->pow() (175 cycles)
sin:
libm.so->sin() (220 cycles)
libimf.so->sin() (89 cycles)
sqrt:
libm.so->sqrt() (55 cycles)
libimf.so->sqrt() (44 cycles)
tan:
libm.so->tan() (300 cycles)
libimf.so->tan() (131 cycles)
|
P4 3.2 HT laptop/768MB RAM |
sn4ip3r Guru
Joined: 14 Dec 2002 Posts: 325 Location: Tallinn, Estonia
Posted: Sat Apr 16, 2005 3:54 pm Post subject: |
Improvement on all calls on a Pentium M (Dothan) 1500:
Code: | acos:
libm.so->acos() (274 cycles)
libimf.so->acos() (150 cycles)
asin:
libm.so->asin() (275 cycles)
libimf.so->asin() (139 cycles)
atan:
libm.so->atan() (162 cycles)
libimf.so->atan() (80 cycles)
atan2:
libm.so->atan2() (149 cycles)
libimf.so->atan2() (40 cycles)
cos:
libm.so->cos() (114 cycles)
libimf.so->cos() (84 cycles)
exp:
libm.so->exp() (199 cycles)
libimf.so->exp() (59 cycles)
hypot:
libm.so->hypot() (121 cycles)
libimf.so->hypot() (99 cycles)
log:
libm.so->log() (144 cycles)
libimf.so->log() (75 cycles)
log10:
libm.so->log10() (149 cycles)
libimf.so->log10() (80 cycles)
pow:
libm.so->pow() (344 cycles)
libimf.so->pow() (112 cycles)
sin:
libm.so->sin() (122 cycles)
libimf.so->sin() (85 cycles)
sqrt:
libm.so->sqrt() (86 cycles)
libimf.so->sqrt() (71 cycles)
tan:
libm.so->tan() (169 cycles)
libimf.so->tan() (127 cycles) |