Thaidog Veteran


Joined: 19 May 2004 Posts: 1036 Location: Hilton Head, SC
Posted: Sat Feb 06, 2010 10:01 pm Post subject: Latest LinuxDNA kernel benchmarks - icc faster than gcc
Here are the latest benchmarks we have run on a kernel compiled with ICC using our LinuxDNA patch. The results are impressive, context switching in particular. The benchmarks were done with LMbench 3.0 (Linus' favorite benchmark):
| Code: |
Basic system parameters
------------------------------------------------------------------------------
Host OS Description Mhz tlb cache mem scal
pages line par load
bytes
--------- ------------- ----------------------- ---- ----- ----- ------ ----
atom-gcc Linux 2.6.33- x86_64-linux-gnu 1600 64 1.0000 1
atom-icc Linux 2.6.33- x86_64-linux-gnu 1600 64 1.0000 1
Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS Mhz  null null      open slct sig  sig  fork exec sh
                             call I/O  stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
atom-gcc  Linux 2.6.33- 1600 0.20 0.36 1.97 5.81 7.44 0.49 2.69 461. 1417 4756
atom-icc  Linux 2.6.33- 1600 0.21 0.36 2.07 5.85 7.18 0.49 2.90 332. 1320 4565
Basic integer operations - times in nanoseconds - smaller is better
-------------------------------------------------------------------
Host                 OS intgr  intgr  intgr  intgr  intgr
                        bit    add    mul    div    mod
--------- ------------- ------ ------ ------ ------ ------
atom-gcc  Linux 2.6.33- 0.6400 0.4100 0.2500   40.4   40.5
atom-icc  Linux 2.6.33- 0.6300 0.4100 0.2800   40.1   40.2
Basic uint64 operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host OS int64 int64 int64 int64 int64
bit add mul div mod
--------- ------------- ------ ------ ------ ------ ------
atom-gcc Linux 2.6.33- 0.630 0.7600 96.2 96.5
atom-icc Linux 2.6.33- 0.630 0.7800 95.3 96.0
Basic float operations - times in nanoseconds - smaller is better
-----------------------------------------------------------------
Host                 OS float  float  float  float
                        add    mul    div    bogo
--------- ------------- ------ ------ ------ ------
atom-gcc  Linux 2.6.33- 3.1000 2.4900   20.8   28.0
atom-icc  Linux 2.6.33- 3.1100 2.4900   20.6   27.8
Basic double operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host                 OS double double double double
                        add    mul    div    bogo
--------- ------------- ------ ------ ------ ------
atom-gcc  Linux 2.6.33- 3.1000 3.1300   39.1   47.0
atom-icc  Linux 2.6.33- 3.1100 3.1200   38.8   46.7
Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS 2p/0K  2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
atom-gcc  Linux 2.6.33-   15.0   16.6   13.5   17.9   24.2    22.9    27.0
atom-icc  Linux 2.6.33- 4.0300 7.2000 4.3400 9.0700   14.2    12.3    17.8
*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host OS 2p/0K Pipe AF UDP RPC/ TCP RPC/ TCP
ctxsw UNIX UDP TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
atom-gcc Linux 2.6.33- 15.0 34.6 34.2 71.1 90.9 158.
atom-icc Linux 2.6.33- 4.030 11.5 24.1 49.5 73.2 216.
*Remote* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host OS UDP RPC/ TCP RPC/ TCP
UDP TCP conn
--------- ------------- ----- ----- ----- ----- ----
atom-gcc Linux 2.6.33-
atom-icc Linux 2.6.33-
File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host OS 0K File 10K File Mmap Prot Page 100fd
Create Delete Create Delete Latency Fault Fault selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
atom-gcc Linux 2.6.33- 40.2 30.3 96.7 44.0 46.8K 0.896 3.713
atom-icc Linux 2.6.33- 46.4 34.6 101.1 45.3 45.2K 1.017 3.575
*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host                 OS Pipe AF   TCP  File   Mmap   Bcopy  Bcopy  Mem  Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
atom-gcc  Linux 2.6.33- 522. 1106 384. 1189.1 2970.8  963.9  968.3 2401 1130.
atom-icc  Linux 2.6.33- 767. 620. 350. 1196.4 2993.0  986.9  978.8 2430 1145.
Memory latencies in nanoseconds - smaller is better
(WARNING - may not be correct, check graphs)
------------------------------------------------------------------------------
Host OS Mhz L1 $ L2 $ Main mem Rand mem Guesses
--------- ------------- --- ---- ---- -------- -------- -------
atom-gcc Linux 2.6.33- 1600 1.9090 9.6120 39.7 286.8
atom-icc Linux 2.6.33- 1600 1.9010 9.5580 39.2 284.3
make[1]: Leaving directory `/root/lmbench-3.0-a9/lmbench-3.0-a9/results'
|
The icc kernel was compiled with: -O3 -xSSE3_ATOM -ip -fp-model fast=2 -unroll-aggressive -vec-guard-write
The gcc kernel was compiled with: -O3 -march=atom -mtune=atom
Tests were done on an Atom 330 dual-core 64-bit CPU.
_________________
www.LinuxDNA.com
http://groups.google.com/group/linuxdna
Registered Linux User: 437619
"I'm a big believer in technology over politics" - Linus Torvalds
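To put the context-switching numbers into percentages, here is a small Python sketch (not part of LMbench or LinuxDNA) that computes how much lower each icc latency is than the matching gcc latency. The values are hand-copied from the "Context switching" table; the helper names are made up for illustration, and the unweighted mean across columns is just one simple way to summarize them.

```python
# Hedged sketch: percent reduction in LMbench context-switch latency, icc vs gcc.
# Numbers are hand-copied from the "Context switching" table (microseconds,
# smaller is better). The helper names are hypothetical, not LMbench APIs.

COLUMNS = ["2p/0K", "2p/16K", "2p/64K", "8p/16K", "8p/64K", "16p/16K", "16p/64K"]
GCC_US = [15.0, 16.6, 13.5, 17.9, 24.2, 22.9, 27.0]
ICC_US = [4.03, 7.20, 4.34, 9.07, 14.2, 12.3, 17.8]


def percent_reduction(baseline: float, candidate: float) -> float:
    """How much lower `candidate` is than `baseline`, as a percentage of `baseline`."""
    return (baseline - candidate) / baseline * 100.0


def reductions(baseline: list[float], candidate: list[float]) -> list[float]:
    """Per-column percent reduction; both lists must line up column by column."""
    if len(baseline) != len(candidate):
        raise ValueError("rows must have the same number of columns")
    return [percent_reduction(b, c) for b, c in zip(baseline, candidate)]


if __name__ == "__main__":
    per_column = reductions(GCC_US, ICC_US)
    for name, gcc, icc, r in zip(COLUMNS, GCC_US, ICC_US, per_column):
        print(f"{name:>8}: gcc {gcc:5.2f} us, icc {icc:5.2f} us, {r:4.1f}% lower")
    # Unweighted mean over the seven columns.
    print(f"mean reduction: {sum(per_column) / len(per_column):.1f}%")
```

Run as a script, it reports per-column reductions from about 34% (16p/64K) to about 73% (2p/0K), with an unweighted mean of roughly 53%. Note that a 50% drop in latency is the same as a 2x speedup, so "percent faster" figures depend on which way the ratio is taken.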
r3tep Tux's lil' helper


Joined: 10 Sep 2005 Posts: 108
Posted: Sun Feb 07, 2010 10:35 am
Did you do any benchmarking on compatible CPUs not manufactured by Intel?
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 27783 Location: 56N 3W
Posted: Sun Feb 07, 2010 12:18 pm
Moved from Gentoo Chat to Unsupported Software.
There is nothing Gentoo-related there.
_________________
Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Thaidog Veteran


Joined: 19 May 2004 Posts: 1036 Location: Hilton Head, SC
Posted: Sun Feb 07, 2010 5:59 pm
r3tep wrote:
"Did you do any benchmarking on compatible CPUs not manufactured by Intel?"
Not yet... we don't have the hardware right now. If anyone has an AMD chip, they can try it, though.
Sadako Advocate


Joined: 05 Aug 2004 Posts: 3745 Location: sleeping in the bathtub
Posted: Sun Feb 07, 2010 6:16 pm
Just curious, why -O3?
The L1 cache on the Atom is fairly small, a little less than the Pentium II's AFAIK, so it'd be interesting to see what the comparison looks like with -Os, too.
Also, looking at the results, while Intel is a clear winner wrt context switching as you pointed out, in most other areas gcc is ahead as often as it is behind.
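One way to check a claim like "gcc is ahead as often as behind" is to count, column by column, which compiler posted the smaller time. The Python sketch below does that for the "Processor, Processes" table only; the values are hand-copied from that table, `tally_wins` is a made-up helper name, and equal displayed values count as ties.

```python
# Hedged sketch: count per-column wins between two LMbench result rows where
# smaller is better. Values are hand-copied from the "Processor, Processes"
# table (microseconds). `tally_wins` is a hypothetical helper, not an LMbench tool.

COLUMNS = ["null call", "null I/O", "stat", "open clos", "slct TCP",
           "sig inst", "sig hndl", "fork proc", "exec proc", "sh proc"]
GCC_US = [0.20, 0.36, 1.97, 5.81, 7.44, 0.49, 2.69, 461.0, 1417.0, 4756.0]
ICC_US = [0.21, 0.36, 2.07, 5.85, 7.18, 0.49, 2.90, 332.0, 1320.0, 4565.0]


def tally_wins(a: list[float], b: list[float]) -> tuple[int, int, int]:
    """Return (a_wins, b_wins, ties) for rows where a smaller value is better."""
    if len(a) != len(b):
        raise ValueError("rows must have the same number of columns")
    a_wins = sum(1 for x, y in zip(a, b) if x < y)
    b_wins = sum(1 for x, y in zip(a, b) if y < x)
    ties = sum(1 for x, y in zip(a, b) if x == y)
    return a_wins, b_wins, ties


if __name__ == "__main__":
    for name, g, i in zip(COLUMNS, GCC_US, ICC_US):
        winner = "gcc" if g < i else "icc" if i < g else "tie"
        print(f"{name:>10}: gcc {g:8.2f}  icc {i:8.2f}  -> {winner}")
    gcc_wins, icc_wins, ties = tally_wins(GCC_US, ICC_US)
    print(f"gcc wins {gcc_wins}, icc wins {icc_wins}, ties {ties}")
```

For this table the count comes out 4 wins each with 2 ties. Several of the gaps (e.g. 0.20 vs 0.21 us for a null call) are small enough that they may be within run-to-run noise, so a single run per kernel is weak evidence either way.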
-march=atom, so you used a build of gcc 4.5?
Thaidog wrote:
"Not yet... we don't have the hardware right now. If anyone has an AMD chip, they can try it, though."
I could try it (I just got a shiny new Phenom II 965, \o/), but I've read that binaries compiled with ICC are more or less intentionally gimped when running on non-Intel processors. Any idea whether that's the case?
edit: does ICC require multilib on x86_64?
If not, then there's no reason I couldn't try it.
edit #2: forget what I said about the Atom L1 cache. The Pentium II only had 32 KB in total; I thought it had 32 + 32 KB for instruction + data, my bad...
The Core 2s up to the i7s only have 32 + 32 KB, so the 32 + 24 KB on the Atom is much better than I had thought.
_________________
"You have to invite me in"
Thaidog Veteran


Joined: 19 May 2004 Posts: 1036 Location: Hilton Head, SC
Posted: Sun Feb 07, 2010 11:09 pm
Hopeless wrote:
"Just curious, why -O3? [...] in most other areas gcc is ahead as often as behind. [...] I read before that binaries compiled via ICC are more or less intentionally gimped when running on non-intel processors [...] edit: does ICC require multilib on x86_64?"
-O3 is actually the default for compiling the kernel with both GCC and ICC, unless you choose "optimize for size", in which case it's -Os. The other benchmarks are close, but averaged over them ICC wins every category except one, and since context switching is almost 50% faster, that makes for a seriously noticeable performance increase, especially for things like multitasking.
There are a few files in the kernel that are still 32-bit and need to be compiled with icc:
the files under arch/x86/boot/*
and those under arch/x86/kernel/acpi/realmode/*
There are ways around non-native code execution on non-Intel CPUs; I think this thread has the info:
http://groups.google.com/group/linuxdna/browse_thread/thread/c43035b9512c6ace/2b72ff9a47cff6fc?lnk=gst&q=64+bit+linuxdna#2b72ff9a47cff6fc
Most likely you need Zack's modified iccvars_intel64.sh file, which can be downloaded from the Google group. Look for it under "Files".