roylongbottom n00b

Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Wed Aug 21, 2019 11:30 am Post subject: 64 Bit Raspberry Pi 4B Benchmarks |
|
|
64 Bit Raspberry Pi 4B Benchmarks
Previously, I have run my 32 bit and 64 bit benchmarks on the appropriate range of Raspberry Pi computers, up to model 3B+. Details of the benchmarks, results and download links are available from ResearchGate in the report at
https://www.researchgate.net/publication/327467963_Raspberry_Pi_3B_32_bit_and_64_bit_Benchmarks_and_Stress_Tests
I have also run the 32 bit versions on the Raspberry Pi 4, with results in Raspberry-Pi-4-Benchmarks.pdf at
https://www.researchgate.net/publication/333973011_Raspberry_Pi_4B_32_Bit_Benchmarks
This report contains brief reminders of the benchmarks, with 64 bit results on the Raspberry Pi 4 using the Gentoo operating system. Existing benchmarks were used to provide comparisons with the old 3B+ model and with the Pi 4B system using 32 bit Raspbian. The first part is for my original single core programs.
Whetstone Benchmark
This has a number of simple programming loops, with the overall MWIPS rating dependent on floating point calculations, lately those identified as COS and EXP. The last three can be over optimised (N/A), but the time does not affect the overall rating much.
For this simple code, at 64 bits, average Pi 4 performance gain, over the Pi 3B+, was 2.12 times, but only around 1.3 times for straightforward floating point calculations. Then, as should be expected, the Pi 4B 32 bit speed was not much slower.
Code: |
System MHz MWIPS ----- MFLOPS------ ------------ MOPS--------------
1 2 3 COS EXP FIXPT IF EQUAL
Pi 3B+ 64b 1400 1071 383 403 328 20.9 12.4 1704 N/A 1357
Pi 4B 64b 1500 2269 522 534 398 54.8 39.8 2487 N/A 997
Pi4/3B+ 1.07 2.12 1.36 1.32 1.21 2.63 3.21 1.46 N/A 0.73
Pi 4B 32b 1500 1884 516 478 310 54.7 27.1 2498 2247 999
64b/32b 1.00 1.20 1.01 1.12 1.28 1.00 1.47 1.00 N/A 1.00
|
Dhrystone Benchmark
This appears to be the most popular ARM benchmark and is often subject to over optimisation, so you can’t compare results from different compilers. Ignoring this, results in VAX MIPS (aka DMIPS) and comparisons follow. This benchmark has no significant data arrays suitable for vectorisation.
Using the same 64 bit program, the Pi 4 was just over twice as fast as the Pi 3B+, and the 64 bit compilation was 52% faster than the 32 bit one on the Pi 4.
Code: |
DMIPS
System MHz DMIPS /MHz
Pi 3B+ 64b 1400 4028 2.88
Pi 4B 64b 1500 8176 5.45
Pi4/3B+ 1.07 2.03
Pi 4B 32b 1500 5366 3.58
64b/32b 1.00 1.52
|
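The DMIPS/MHz column and the two ratio rows follow from simple division. A minimal Python sketch, using only the figures in the table above (the dictionary and variable names are my own, for illustration):

```python
# Clock speed (MHz) and DMIPS copied from the Dhrystone table above.
results = {
    "Pi 3B+ 64b": (1400, 4028),
    "Pi 4B 64b": (1500, 8176),
    "Pi 4B 32b": (1500, 5366),
}

# DMIPS per MHz removes the clock speed difference between the boards.
dmips_per_mhz = {name: dmips / mhz for name, (mhz, dmips) in results.items()}

# Speed ratios as shown in the Pi4/3B+ and 64b/32b rows.
pi4_vs_pi3 = results["Pi 4B 64b"][1] / results["Pi 3B+ 64b"][1]
b64_vs_b32 = results["Pi 4B 64b"][1] / results["Pi 4B 32b"][1]

for name, value in dmips_per_mhz.items():
    print(f"{name:12s} {value:.2f} DMIPS/MHz")
print(f"Pi4/3B+ {pi4_vs_pi3:.2f}  64b/32b {b64_vs_b32:.2f}")
```

Because the clock speeds differ by only 7%, most of the Pi 4 gain shows up in the per MHz figure rather than in the MHz column.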
Linpack Benchmark
The original Linpack benchmark specified the use of double precision (DP) floating point arithmetic, and the code used here is identical to that initially approved for use on old PCs. For the benefit of early ARM computers, the code is also run using single precision (SP) numbers. A version was also produced, replacing the key Daxpy code with NEON Intrinsic Functions, using vector operations, also with single precision calculations.
The Pi 3B+ 32 bit results are also provided for clarification. My results were highlighted in the MagPi magazine, on announcement of the Pi 4, particularly the 2 GFLOPS 32 bit NEON speed. See:
https://www.raspberrypi.org/magpi/raspberry-pi-4-specs-benchmarks/
At 64 bits, Pi 4/3B+ performance ratios were generally higher than those for the earlier benchmarks. Then, as could be expected, the virtually compiler independent performance using NEON Intrinsic Functions was similar at 32 bits and 64 bits. The main 64 bit gain was with the compiled single precision version, which reached the same performance as that obtained via NEON Intrinsics.
Code: |
System MHz ------- MFLOPS --------
DP SP SP NEON
Pi 3B+ 64b 1400 396.6 562.1 604.2
Pi 4B 64b 1500 1059.9 1977.8 1968.6
Pi4/3B+ 1.07 2.67 3.52 3.26
Pi 4B 32b 1500 760.2 921.6 2010.5
64b/32b 1.00 1.39 2.15 0.98
Pi 3B+ 32b 1400 210.5 225.2 562.5
Pi4/3B+ 1.07 3.61 4.09 3.57
|
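For readers unfamiliar with it, the Daxpy kernel mentioned above computes y = a*x + y over a vector, which is why it maps well onto NEON vector operations. A plain Python sketch of the operation follows. It is not the benchmark's C code, and the function name and signature are illustrative only:

```python
def daxpy(a, x, y):
    """Return a*x + y element by element (the operation at the core of Linpack)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Each element needs one multiply and one add: 2 floating point operations.
    return [a * xi + yi for xi, yi in zip(x, y)]


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
result = daxpy(0.5, x, y)
flops = 2 * len(x)
print(result, flops)
```

Every element is independent of the others, so a compiler or a hand written intrinsic version can process several elements per instruction.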
Livermore Loops Benchmark
This original main benchmark for supercomputers was first introduced in 1970, initially comprising 14 kernels of numerical application, written in Fortran. This was increased to 24 kernels in the 1980s. Following are overall MFLOPS ratings, with the geometric mean being the official average performance, followed by details from the 24 kernels. Note that these are for double precision calculations.
All the ratings indicate reasonably significant performance gains of Pi 4 over Pi 3B+ and 64 bits over 32 bits. Results from the 24 kernels indicate some higher gains. Also note the maximum speed of 2.49 GFLOPS (Double Precision).
The speed of the original Raspberry Pi could be rated as 4.5 times faster than the Cray 1 supercomputer (Geomean 11.9) - see my quote on
https://www.webarchive.org.uk/wayback/archive/20131218132751/http://www.roylongbottom.org.uk/Raspberry%20Pi%20Benchmarks.htm#anchor7a
Now, one core of the Raspberry Pi 4B, at 64 bits, produces performance equivalent to 61 Cray 1 supercomputers.
Code: |
System MHz Maximum Average Geomean Harmean Minimum
Pi 3B+ 64b 1400 737.7 319.4 284.7 250.6 91.6
Pi 4B 64b 1500 2490.5 892 730.3 603.3 212.4
Pi4/3B+ 1.07 3.38 2.79 2.57 2.41 2.32
Pi 4B 32b 1500 1800.2 635.1 519 416.1 155.3
64b/32b 1.00 1.38 1.40 1.41 1.45 1.37
MFLOPS Of 24 Kernels
Pi 3B+ 540 296 539 527 226 175 738 428 484 251 169 245
64b 127 161 291 258 440 520 333 280 310 93 362 209
Pi 4B 2026 997 987 948 372 739 2033 2491 1980 758 495 875
64b 220 404 811 710 753 1124 444 397 1061 414 822 283
Pi4/3B+ 3.75 3.37 1.83 1.80 1.65 4.23 2.76 5.83 4.09 3.02 2.92 3.57
1.73 2.51 2.79 2.75 1.71 2.16 1.33 1.42 3.43 4.48 2.27 1.36
Min 1.33 Max 5.83
Pi 4B 32 746 964 988 943 212 538 1169 1800 1032 469 214 186
32b 159 335 778 623 732 1034 320 350 489 360 749 187
64b/32b 2.72 1.03 1.00 1.00 1.76 1.37 1.74 1.38 1.92 1.62 2.31 4.70
1.38 1.20 1.04 1.14 1.03 1.09 1.39 1.13 2.17 1.15 1.10 1.51
Min 1.00 Max 4.70
|
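As a reminder of how the three averages differ, here is a small Python sketch that computes them from the 24 Pi 4B 64 bit kernel speeds listed above. The summary row in the table does not come from exactly these 24 values (for example, the reported minimum of 212.4 is below the smallest kernel listed), so the computed figures are illustrative only. The Cray 1 comparison uses the reported Geomean and the 11.9 MFLOPS quoted in the text.

```python
import math

# MFLOPS for the 24 kernels, Pi 4B 64b rows of the table above.
kernels = [2026, 997, 987, 948, 372, 739, 2033, 2491, 1980, 758, 495, 875,
           220, 404, 811, 710, 753, 1124, 444, 397, 1061, 414, 822, 283]

n = len(kernels)
average = sum(kernels) / n
# Geometric mean via logarithms, to avoid a very large intermediate product.
geomean = math.exp(sum(math.log(k) for k in kernels) / n)
harmean = n / sum(1.0 / k for k in kernels)

print(f"Average {average:.1f}  Geomean {geomean:.1f}  Harmean {harmean:.1f}")

# Cray 1 equivalence: reported Pi 4B Geomean divided by the Cray 1 Geomean.
cray_equivalents = 730.3 / 11.9
print(f"Cray 1 equivalents: {cray_equivalents:.0f}")
```

The harmonic mean is pulled down most by the slow kernels and the arithmetic mean is pulled up most by the fast ones, with the geometric mean in between.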
Next are single core benchmarks that use data in caches and RAM. _________________ Regards
Roy
Last edited by roylongbottom on Sun Aug 25, 2019 11:41 am; edited 1 time in total |
|
 |
Sakaki Guru


Joined: 21 May 2014 Posts: 409
|
Posted: Wed Aug 21, 2019 11:50 am Post subject: |
|
|
roylongbottom,
very interesting analysis as always, thanks for all your continued hard work on this!
Will you be posting these results to your Raspberry Pi Benchmarking thread on the RPi forums in due course? _________________ Regards,
sakaki |
|
 |
roylongbottom n00b

Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Wed Aug 21, 2019 12:45 pm Post subject: |
|
|
Sakaki wrote: | roylongbottom,
very interesting analysis as always, thanks for all your continued hard work on this!
Will you be posting these results to your Raspberry Pi Benchmarking thread on the RPi forums in due course? |
Yes, nearly identical, including link to new Gentoo _________________ Regards
Roy |
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55279 Location: 56N 3W
|
Posted: Wed Aug 21, 2019 4:13 pm Post subject: |
|
|
roylongbottom,
Did you use the same binaries on the Pi3 and Pi4 or rebuild the code to take advantage of the out of order execution available on the Pi4?
Here, I'm being lazy and using Pi3 64 bit code everywhere.
Thank you for your Pi benchmark work. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
 |
roylongbottom n00b

Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Wed Aug 21, 2019 7:09 pm Post subject: |
|
|
NeddySeagoon wrote: | roylongbottom,
Did you use the same binaries on the Pi3 and Pi4 or rebuild the code to take advantage of the out of order execution available on the Pi4?
Here, I'm being lazy and using Pi3 64 bit code everywhere.
Thank you for your Pi benchmark work. |
The benchmarks were those compiled for the Pi 3. As for 32 bit benchmarks, I intend to recompile some on the Pi 4. Are there any special compile options? _________________ Regards
Roy |
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55279 Location: 56N 3W
|
Posted: Wed Aug 21, 2019 7:28 pm Post subject: |
|
|
roylongbottom,
On the Pi3, for 64 bit code, I use Code: | CFLAGS="-march=armv8-a+crc -mtune=cortex-a53 -ftree-vectorize -O2 -pipe -fomit-frame-pointer" |
The A53 does not support out of order execution.
The Pi4 has an A72 CPU, which does provide for out of order instruction execution.
If out of order instruction execution requires preparation in the code stream, Code: | CFLAGS="-march=armv8-a+crc -mtune=cortex-a72 -ftree-vectorize -O2 -pipe -fomit-frame-pointer" |
should produce code that is better matched to the Pi4.
I don't know if the A72 takes the A53 in order code stream and does what it can with instruction reordering.
I have not done any 32 bit work on either platform beyond booting 32 bit Raspbian and noting that it works.
That's a great confidence booster when you can't even get a serial console. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
 |
Sakaki Guru


Joined: 21 May 2014 Posts: 409
|
|
 |
roylongbottom n00b

Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Thu Aug 22, 2019 11:33 am Post subject: |
|
|
sakaki
I became distracted from reporting some benchmark results after building ATLAS Linear Algebra Subprograms overnight (13 hours), in order to run the High Performance Linpack Benchmark. All went well until the final stage compiling the HPL program, where mpicc could not be found. It was there on the 3B Gentoo, where I successfully installed and ran HPL on a Pi 3B+.
Is mpicc available for downloading for Pi 4 Gentoo? _________________ Regards
Roy |
|
 |
Sakaki Guru


Joined: 21 May 2014 Posts: 409
|
Posted: Thu Aug 22, 2019 3:41 pm Post subject: |
|
|
roylongbottom wrote: | All went well until the final stage compiling the HPL program, where mpicc could not be found. It was there on the 3B Gentoo, where I successfully installed and ran HPL on a Pi 3B+.
Is mpicc available for downloading for Pi 4 Gentoo? | Yes, if you do:
Code: |
demouser@pi64 ~ $ sudo emaint sync --repo genpi64
demouser@pi64 ~ $ sudo emerge -v sys-cluster/mpich |
you should get mpicc installed. This is built as a binary package on the binhost, so installation shouldn't take long. Please let me know if there are any problems. _________________ Regards,
sakaki |
|
 |
roylongbottom n00b

Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Thu Aug 22, 2019 4:32 pm Post subject: |
|
|
Sakaki
Thanks
Nearly there, mpicc is used but now error is mpif77: Command not found _________________ Regards
Roy |
|
 |
roylongbottom n00b

Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Thu Aug 22, 2019 6:05 pm Post subject: |
|
|
Memory Benchmarks
This batch of programs measures speed dependent on data from caches and RAM.
MemSpeed Benchmark
The MemSpeed benchmark measures data reading speeds in MegaBytes per second, carrying out calculations on arrays of cache and RAM data, normally sized 2 x 4 KB to 2 x 4 MB. Calculations are as shown in the result headings. For the first two double precision tests, speed in MFLOPS can be calculated by dividing MB/second by 8 and 16 respectively. For single precision, divide by 4 and 8.
Results are provided below for the Gentoo 64 bit version on the Pi 3B+ and Pi 4B, and the Raspbian 32 bit variety on the Pi 4B, then a sample of relative performance, covering data from L1 cache, L2 cache and RAM.
Gains, greater than the 7% CPU MHz difference, were recorded all round by the Pi 4B over the Pi 3B+. The most impressive were on using L2 cache based data and the more intensive floating point calculations.
On the Pi 4B, speeds of the 64 bit and 32 bit compilations were similar using RAM based data and executing some integer tests, but the 64 bit version was significantly faster on cache based floating point calculations.
Code: |
Gentoo 64b Pi 3B+
Memory Reading Speed Test armv8 64 Bit by Roy Longbottom
Start of test Fri Aug 16 12:48:51 2019
Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m] x[m]=y[m]
KBytes Dble Sngl Int32 Dble Sngl Int32 Dble Sngl Int32
Used MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S MB/S
8 4813 2897 4350 6180 3954 4831 5378 4324 4324
16 4540 2900 4356 6213 3961 4838 5401 4344 4333
32 4184 2780 4047 5540 3721 4483 5421 4285 4316
64 3784 2678 3803 4776 3547 4171 4925 4087 4051
128 3613 2694 3842 4731 3562 4188 4967 4087 4103
256 3133 2652 3800 4626 3493 4027 4967 4093 4096
512 670 882 1630 2913 2422 2718 3101 3141 2780
1024 587 774 1017 1310 1287 1184 1105 1526 1543
2048 555 746 917 1143 1131 1043 1071 1007 1128
4096 545 691 1130 1039 1015 1140 1045 1087 892
8192 537 795 1139 980 1133 1148 887 854 922
Max MFLOPS 602 725
Gentoo 64b Pi 4B
8 15530 13973 12509 15570 14025 15534 11417 9308 7798
16 15719 14042 12750 15745 14200 15660 11753 9447 7890
32 14062 12228 11435 14052 12699 12855 11864 9459 7937
64 12195 11344 10698 12211 11705 12025 8872 8752 7904
128 12172 11360 10755 12166 11862 11975 8569 8460 7913
256 12228 11369 10697 12123 11790 12082 8073 8222 7896
512 11269 10738 10206 10985 11164 11590 8017 6280 6557
1024 3407 2635 3281 3396 3242 2979 3765 3947 4029
2048 1525 1832 1838 1851 1607 1838 2819 2790 2770
4096 1407 1851 1859 1861 1666 1840 2485 2487 2410
8192 1913 1914 1922 1528 1895 1891 2496 2234 2489
Max MFLOPS 1965 3511
Comparison 64b Pi4/3B+
8 3.23 4.82 2.88 2.52 3.55 3.22 2.12 2.15 1.80
16 3.46 4.84 2.93 2.53 3.58 3.24 2.18 2.17 1.82
256 3.90 4.29 2.82 2.62 3.38 3.00 1.63 2.01 1.93
512 16.82 12.17 6.26 3.77 4.61 4.26 2.59 2.00 2.36
1024 5.80 3.40 3.23 2.59 2.52 2.52 3.41 2.59 2.61
4096 2.58 2.68 1.65 1.79 1.64 1.61 2.38 2.29 2.70
8192 3.56 2.41 1.69 1.56 1.67 1.65 2.81 2.62 2.70
Raspbian 32b Pi 4B
8 8459 4766 13344 8303 4768 15553 7806 9926 9927
16 7142 3918 8649 7103 4094 9309 7899 10086 10056
32 7969 4490 10339 7941 4532 11627 7758 10070 10048
64 8126 4602 9909 8114 4617 11069 7425 8021 8070
128 8302 4651 9623 8311 4657 10836 7374 8049 7934
256 8319 4663 9627 8360 4666 10768 7530 7922 7925
512 8088 4629 9453 8239 4650 10696 5023 7904 7949
1024 3581 3113 3618 3577 3150 3675 5358 2431 1560
2048 1338 1808 1780 1811 1832 1773 2131 950 956
4096 1881 1880 1852 1879 1664 1336 1988 984 1054
8192 1890 1901 1884 1729 1319 1367 2252 1018 1021
Max MFLOPS 1057 1192
Comparison Pi 4B 64b/32b
8 1.84 2.93 0.94 1.88 2.94 1.00 1.46 0.94 0.79
16 2.20 3.58 1.47 2.22 3.47 1.68 1.49 0.94 0.78
256 1.47 2.44 1.11 1.45 2.53 1.12 1.07 1.04 1.00
512 1.39 2.32 1.08 1.33 2.40 1.08 1.60 0.79 0.82
1024 0.95 0.85 0.91 0.95 1.03 0.81 0.70 1.62 2.58
4096 0.75 0.98 1.00 0.99 1.00 1.38 1.25 2.53 2.29
8192 1.01 1.01 1.02 0.88 1.44 1.38 1.11 2.19 2.44
|
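The MB/second to MFLOPS conversion described above can be checked against the Max MFLOPS lines. A short Python sketch, assuming the divisors given in the text (the function and dictionary names are mine):

```python
def mflops(mb_per_sec, bytes_per_flop):
    """Convert a MemSpeed MB/second figure to MFLOPS."""
    return mb_per_sec / bytes_per_flop


# Bytes read per floating point operation, from the divisors in the text.
# x[m]=x[m]+s*y[m] does 2 operations per pair of operands read,
# x[m]=x[m]+y[m] does 1.
DIVISORS = {
    ("dp", "multiply-add"): 8,
    ("dp", "add"): 16,
    ("sp", "multiply-add"): 4,
    ("sp", "add"): 8,
}

# Fastest x[m]=x[m]+s*y[m] results for the Pi 4B at 64 bits (16 KB row).
dp_max = mflops(15719, DIVISORS[("dp", "multiply-add")])
sp_max = mflops(14042, DIVISORS[("sp", "multiply-add")])
print(f"DP {dp_max:.1f} MFLOPS, SP {sp_max:.1f} MFLOPS")
```

These agree, to rounding, with the Max MFLOPS of 1965 and 3511 shown for the Gentoo 64b Pi 4B results.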
NeonSpeed Benchmark
This carries out some of the same calculations as MemSpeed. All results are for 32 bit floating point and integer calculations. Norm results are from code generated by the compiler, using NEON directives, and Neon results are from code using Intrinsic Functions.
Unlike running the same programs on the Pi 3B+, on the Pi 4 the compiled code was no longer slower than that produced via Intrinsic Functions. This led to performance gains of more than five times in the best case.
Except using L1 cache based data, performance was essentially the same using 32 bit and 64 bit benchmarks.
Code: |
Gentoo 64b Pi 3B+
NEON Speed Test armv8 64 Bit V 1.0 Fri Aug 16 2019
Vector Reading Speed in MBytes/Second
Memory Float v=v+s*v Int v=v+v+s Neon v=v+v
KBytes Norm Neon Norm Neon Float Int
16 2715 5110 3945 4826 5426 5598
32 2528 4326 3569 4191 4596 4661
64 2491 4153 3494 4068 4407 4429
128 2537 4228 3583 4120 4461 4473
256 2526 4265 3614 4140 4480 4514
512 1917 2830 2545 2579 2896 2964
1024 1166 1299 1152 1257 1205 1229
4096 1022 1135 1132 1122 1130 1100
16384 1080 1026 1131 1016 1064 1094
65536 996 1120 1061 831 1110 1069
Gentoo 64b Pi 4B
16 13982 16424 12505 15239 16065 17193
32 9554 10753 8981 9657 10970 11025
64 10658 11833 10274 10722 12110 12134
128 10657 11887 10337 10680 11994 11973
256 10709 11970 10360 10774 12003 12083
512 10147 11441 9733 10209 11264 11532
1024 2964 3222 2876 3216 3270 2942
4096 1734 1712 1729 1772 1586 1728
16384 1592 1922 1818 1923 1926 1667
65536 1970 1736 1997 1747 1884 2021
Comparison 64b Pi4/3B+
16 5.15 3.21 3.17 3.16 2.96 3.07
256 4.24 2.81 2.87 2.60 2.68 2.68
512 5.29 4.04 3.82 3.96 3.89 3.89
65536 1.98 1.55 1.88 2.10 1.70 1.89
Raspbian 32b Pi 4B
16 9677 10072 8905 9358 9776 10473
32 10149 10330 9364 9539 9988 10543
64 10948 11708 10466 10568 11318 11994
128 10484 11232 10410 10104 11200 11792
256 10509 11369 10428 10264 11273 11842
512 10406 11066 10134 10054 11075 11467
1024 3069 3202 3159 3166 3204 3203
4096 1721 1910 1908 1882 1903 1900
16384 2023 2009 2008 1965 2032 2013
65536 2073 2074 2074 2073 2068 2064
Comparison Pi 4B 64b/32b
16 1.44 1.63 1.40 1.63 1.64 1.64
256 1.02 1.05 0.99 1.05 1.06 1.02
512 0.98 1.03 0.96 1.02 1.02 1.01
65536 0.95 0.84 0.96 0.84 0.91 0.98
|
BusSpeed Benchmark
This is a read only benchmark with data from caches and RAM. The program reads one word with 32 word address increments, followed by decreasing increments, finally reading all data. This shows where data is read in bursts, enabling estimates to be made of bus speeds. The two comparison columns are for two word and one word increments.
Most data transfers were 2.0 to 2.5 times faster on the Pi 4, including from RAM, and somewhat higher with L2 cache based data.
The 64 bit version still deals with 32 bit words but transferred data somewhat quicker than the 32 bit program, as shown by the Pi 4 results.
Code: |
Gentoo 64b Pi 3B+
BusSpeed armv8 64 Bit Fri Aug 16 12:53:43 2019
Reading Speed 4 Byte Words in MBytes/Second
Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read Inc2 Read
KBytes Words Words Words Words Words All Words All
16 3819 4253 4622 5041 5089 3870
32 1234 1328 2067 3158 4082 3674
64 681 704 1325 2208 3350 3602
128 638 646 1214 2070 3238 3625
256 592 617 1165 1991 3164 3622
512 295 309 640 985 2085 2790
1024 108 120 271 525 1070 1636
4096 98 123 249 486 881 1840
16384 121 114 246 480 977 1642
65536 121 124 248 409 989 1864
Gentoo 64b Pi 4B
Pi4/3B+
16 4999 5042 5665 5885 5891 8217 1.16 2.12
32 1578 2105 3283 4339 5154 7507 1.26 2.04
64 585 911 1855 3085 5163 7918 1.54 2.20
128 590 932 1888 3110 5161 7874 1.59 2.17
256 598 934 1908 3056 5265 7883 1.66 2.18
512 603 939 1822 3019 5124 7716 2.46 2.77
1024 319 482 1060 1885 3283 5721 3.07 3.50
4096 209 253 503 1006 2009 4111 2.28 2.23
16384 209 261 520 1041 2071 4115 2.12 2.51
65536 203 263 489 1011 2023 4036 2.05 2.17
Raspbian 32b Pi 4B
64b/32b
16 3836 4049 4467 5885 4641 5858 1.14 1.14
32 761 1473 2594 3216 3960 4780 1.01 1.01
64 409 801 1684 2422 3745 3940 0.95 0.95
128 406 803 1202 1914 3037 5377 1.32 1.32
256 415 700 1165 2481 4789 5137 1.27 1.27
512 392 760 1243 2455 3764 4264 1.38 1.38
1024 230 256 623 1061 2455 3501 1.59 1.59
4096 197 214 454 938 1852 3195 1.80 1.80
16384 138 215 445 897 1724 3210 1.91 1.91
65536 174 215 398 744 1655 3130 1.61 1.61
|
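The access pattern described above (one word read at each of a decreasing series of address increments) can be sketched in Python as follows. This is only an illustration of the pattern, not the benchmark itself: the function name is hypothetical, the slice makes a copy rather than doing a pure read, and interpreter overhead means the absolute MB/second figures bear no relation to those from the C program.

```python
import array
import time


def strided_read_mb_per_sec(words, increment, passes=20):
    """Sum every `increment`-th word and return (MB/second of data touched, checksum)."""
    start = time.perf_counter()
    total = 0
    for _ in range(passes):
        total += sum(words[::increment])
    # Guard against a zero reading from a coarse timer.
    elapsed = max(time.perf_counter() - start, 1e-9)
    words_read = passes * len(range(0, len(words), increment))
    return (words_read * words.itemsize) / elapsed / 1e6, total


# 4096 unsigned ints: 16 KB when itemsize is 4 bytes, as on most platforms.
words = array.array("I", range(4096))
speeds = {}
for increment in (32, 16, 8, 4, 2, 1):
    speeds[increment], _ = strided_read_mb_per_sec(words, increment)
    print(f"Inc{increment:<2d} {speeds[increment]:10.1f} MB/s")
```

In the real benchmark, the point at which speed stops falling as the increment grows indicates the burst size being fetched from the next level of memory.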
Fast Fourier Transforms Benchmark
This is a real application provided by my collaborator at Compuserve Forum. There are two versions. The first one is the original C program. The second is an optimised version, originally using my x86 assembly code, but translated back into C code, making use of the partitioning and (my) arrangement to optimise for burst reading from RAM. Three measurements are made at each size, using both single and double precision data, calculating FFT sizes between 1K and 1024K. Results are in milliseconds, with those shown here being the average of the three measurements.
There were gains all round on the Pi 4, compared with the 3B+, mainly between 3 and 4 times on the optimised version, less so using FFT1, with more data transfer speed dependency.
On the Pi 4, performance from the 32 bit compilation was often similar to that at 64 bits. This is probably due to much of the data being read on a skipped sequential basis, not good for vectorisation.
Code: |
Gentoo 64b Pi 3B+
Size FFT1 FFT3
K SP DP SP DP
1 0.13 0.15 0.15 0.17
2 0.29 0.39 0.32 0.38
4 0.76 1.13 0.79 0.85
8 1.93 2.66 1.77 1.94
16 4.02 5.51 4.69 5.14
32 9.50 25.11 9.51 13.67
64 42.53 110.21 25.30 32.25
128 151.08 257.41 57.68 76.71
256 355.88 589.07 129.47 174.85
512 819.91 1324.89 297.80 390.74
1024 1746.23 2943.08 641.50 863.82
Gentoo 64b Pi 4B Pi4/3B+
Size FFT1 FFT3 FFT1 FFT3
K SP DP SP DP SP DP SP DP
1 0.04 0.04 0.04 0.04 3.30 3.62 3.60 4.13
2 0.08 0.14 0.11 0.09 3.81 2.88 2.82 4.03
4 0.25 0.38 0.19 0.22 3.05 2.93 4.13 3.86
8 0.79 1.31 0.46 0.50 2.45 2.04 3.87 3.87
16 2.15 2.91 1.15 1.09 1.87 1.89 4.07 4.71
32 5.71 6.76 2.48 3.18 1.66 3.71 3.83 4.30
64 15.22 51.00 5.43 9.29 2.79 2.16 4.66 3.47
128 83.47 151.95 16.28 24.75 1.81 1.69 3.54 3.10
256 231.24 362.64 39.13 57.28 1.54 1.62 3.31 3.05
512 561.16 765.18 90.20 133.21 1.46 1.73 3.30 2.93
1024 1250.51 1878.44 213.35 303.39 1.40 1.57 3.01 2.85
Raspbian 32b Pi 4B 64b/32b
Size FFT1 FFT3 FFT1 FFT3
K SP DP SP DP SP DP SP DP
1 0.04 0.04 0.06 0.05 0.99 0.96 1.44 1.18
2 0.08 0.12 0.13 0.11 1.04 0.89 1.14 1.18
4 0.32 0.37 0.27 0.24 1.28 0.96 1.42 1.09
8 0.77 0.97 0.58 0.55 0.98 0.74 1.26 1.09
16 1.69 2.01 1.49 1.35 0.78 0.69 1.29 1.24
32 4.37 4.89 2.96 3.63 0.77 0.72 1.19 1.14
64 9.12 26.55 7.46 10.75 0.60 0.52 1.37 1.16
128 55.52 160.11 17.93 26.03 0.67 1.05 1.10 1.05
256 305.92 423.06 41.16 55.06 1.32 1.17 1.05 0.96
512 833.10 854.88 86.93 120.53 1.48 1.12 0.96 0.90
1024 1617.49 1875.52 190.28 266.60 1.29 1.00 0.89 0.88
|
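Note that these results are elapsed times, so the comparison columns are the slower time divided by the faster one, the reverse of the MB/second and MFLOPS ratios earlier. A Python sketch for the 1024K row, with values copied from the tables above (list names are illustrative):

```python
# Elapsed milliseconds for the 1024K FFT, ordered FFT1 SP, FFT1 DP, FFT3 SP, FFT3 DP.
pi3_ms = [1746.23, 2943.08, 641.50, 863.82]
pi4_ms = [1250.51, 1878.44, 213.35, 303.39]

# For timings, the speed gain is the old (slower) time over the new (faster) time.
speedups = [old / new for old, new in zip(pi3_ms, pi4_ms)]
print(" ".join(f"{s:.2f}" for s in speedups))
```

The optimised FFT3 version gains about three times at this size, against about 1.4 to 1.6 times for FFT1, which depends more on data transfer speed.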
Next Multithreading Benchmarks _________________ Regards
Roy |
|
 |
Sakaki Guru


Joined: 21 May 2014 Posts: 409
|
Posted: Thu Aug 22, 2019 7:10 pm Post subject: |
|
|
roylongbottom wrote: | Sakaki
Thanks
Nearly there, mpicc is used but now error is mpif77: Command not found | Looks like mpich needs recompilation with the fortran USE flag enabled. I'll do that tonight and post again here when done. _________________ Regards,
sakaki |
|
 |
roylongbottom n00b

Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Thu Aug 22, 2019 7:44 pm Post subject: |
|
|
Sakaki wrote: | roylongbottom wrote: | Sakaki
Thanks
Nearly there, mpicc is used but now error is mpif77: Command not found | Looks like mpich needs recompilation with the fortran USE flag enabled. I'll do that tonight and post again here when done. |
Thanks, but maybe I should try something before putting you to the trouble.
The Pi 3 Gentoo appears to include mpicc, that worked when installing HPL on my Pi 3B+, using the following
https://computenodes.net/2018/06/28/building-hpl-an-atlas-for-the-raspberry-pi/
This installs mpich-3.2. I will try to recompile that. _________________ Regards
Roy |
|
 |
roylongbottom n00b

Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Thu Aug 22, 2019 9:37 pm Post subject: |
|
|
Sakaki
My recompile worked, so I now have a working Gentoo Pi 4 HPL Benchmark, but the speed is disappointing, same as the 32 bit version with a maximum of just over 10 GFLOPS (with 4 GB RAM). It might need some compiling parameters changing for HPL (or ATLAS) and wonder if I could find anyone to advise how and where.
At least it is three times faster than using the Gentoo Pi 3B+ version. _________________ Regards
Roy |
|
 |
roylongbottom n00b

Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Fri Aug 23, 2019 10:23 am Post subject: |
|
|
sakaki
For an up to date comparison, I have been running that HPL benchmark and my other MP tests on my Pi 3B+, using the new Gentoo. All ran without any problems, but there were two things to report.
The first was that TV display started at 1024 x 786. Settings did not provide an option anywhere near 1920 x 1080.
The second point was that WiFi connected, without any intervention, using the originally entered password. Back on the Pi 4, it still did not connect. _________________ Regards
Roy |
|
 |
Sakaki Guru


Joined: 21 May 2014 Posts: 409
|
Posted: Fri Aug 23, 2019 12:23 pm Post subject: |
|
|
roylongbottom wrote: | sakaki
For an up to date comparison, I have been running that HPL benchmark and my other MP tests on my Pi 3B+, using the new Gentoo. All ran without any problems, but there were two things to report.
The first was that TV display started at 1024 x 786. Settings did not provide an option anywhere near 1920 x 1080.
The second point was that WiFi connected, without any intervention, using the originally entered password. Back on the Pi 4, it still did not connect. |
Thanks for the feedback. Not sure what the issue with WiFi is, I have had no issue connecting on the 4B locally. If you could run: Code: | demouser@pi64 ~ $ dmesg > kernel.log |
and email me the results (removing anything sensitive first if you wish), I may be able to pinpoint what is happening.
Incidentally, I have pushed a version of mpich-3.3 (with the fortran USE flag set) to the binhost. To get it use: Code: | demouser@pi64 ~ $ sudo emaint sync --repo genpi64
demouser@pi64 ~ $ sudo emerge -v --oneshot sys-cluster/mpich |
As to the monitor settings, I think I introduced a regression in 1.5.0 by uncommenting the line "hdmi_drive=2" in /boot/config.txt.
Could you try reverting this and see if your monitor compatibility improves? You can do so by simply running the Applications -> Settings -> RPi Config Tool app, and unchecking the "Force audio output in DMT modes" box, then click "Save and Exit" rebooting when prompted. Be sure to confirm your settings when the system comes back up (you'll be prompted about this).
Also, per Neddy's points and the ARM paper on compiler settings linked above, it'd be interesting to see the effect on your benchmarks of e.g. compiling them with: Code: | gcc -march=armv8-a+crc -mtune=cortex-a72 -O2 -pipe a_benchmark.c | options under GCC (and possibly -O3 instead of -O2, if you don't already use that). Code with the above settings will be optimized for the Pi4, but will also run on the Pi3.
The equivalent for optimized code on the Pi3 (which will also run on the Pi4): Code: | gcc -march=armv8-a+crc -mtune=cortex-a53 -O2 -pipe a_benchmark.c |
_________________ Regards,
sakaki |
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55279 Location: 56N 3W
|
Posted: Fri Aug 23, 2019 1:02 pm Post subject: |
|
|
roylongbottom,
See Pi 4 Wifi. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55279 Location: 56N 3W
|
Posted: Fri Aug 23, 2019 1:09 pm Post subject: |
|
|
Team,
Sakaki wrote: | Also, per Neddy's points and the ARM paper on compiler settings linked above, it'd be interesting to see the effect on your benchmarks of e.g. compiling them with:
Code:
gcc -march=armv8-a+crc -mtune=cortex-a72 -O2 -pipe a_benchmark.c
options under GCC (and possibly -O3 instead of -O2, if you don't already use that). Code with the above settings will be optimized for the Pi4, but will also run on the Pi3.
The equivalent for optimized code on the Pi3 (which will also run on the Pi4):
Code:
gcc -march=armv8-a+crc -mtune=cortex-a53 -O2 -pipe a_benchmark.c |
It would be interesting to see which were the least worst settings for code to run on both platforms.
My binhost is cortex-a53 but I would rebuild it with cortex-a72 if that produced the best results across both systems.
I'm tempted to do that anyway. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
 |
Sakaki Guru


Joined: 21 May 2014 Posts: 409
|
Posted: Fri Aug 23, 2019 1:12 pm Post subject: |
|
|
NeddySeagoon wrote: | roylongbottom,
See Pi 4 Wifi. |
The 1.5.0 image ought to have the issue pointed out above fixed (commit). _________________ Regards,
sakaki |
|
 |
Sakaki Guru


Joined: 21 May 2014 Posts: 409
|
Posted: Fri Aug 23, 2019 1:24 pm Post subject: |
|
|
NeddySeagoon wrote: | Team,
It would be interesting to see which were the least worst settings for code to run on both platforms.
My binhost is cortex-a53 but I would rebuild it with cortex-a72 if that produced the best results across both systems.
I'm tempted to do that anyway. |
Also, ARM has a big.LITTLE architecture that allows work to be transferred on the fly between (e.g.) A72 and A53 cores depending on system load, and because of that there's a "cortex-a72.cortex-a53" mtune variant available for gcc also...
For the 1.5.0 release (and binhost --emptytree @world rebuild), I decided in the end that most people would end up shifting to the Pi4 in time, so migrated to straight-up a72 tuning. _________________ Regards,
sakaki |
|
 |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55279 Location: 56N 3W
|
Posted: Fri Aug 23, 2019 2:01 pm Post subject: |
|
|
Sakaki,
That's my thinking too but I've not done it yet.
My Acer R13 Chromebook is a big.LITTLE device but for now, it just runs my Pi3 Gentoo. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
 |
roylongbottom n00b

Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Fri Aug 23, 2019 10:06 pm Post subject: |
|
|
Optimisers or Misers
I have been trying the suggested compiling parameters on various benchmarks, via Gentoo on a Pi 4B, but have not found one where they made a great deal of difference - unlike hardware architecture. No doubt there are some.
Below are results for the Livermore Loops, comprising 24 program kernels, the most critical at Lawrence Livermore Laboratory for selecting a new supercomputer. The tables show the compile parameters used. The first table shows the measured MFLOPS for each kernel, and the second the relative ratios compared with the original 64 bit version.
Code: |
MFLOPS
original gcc lloops2.c cpuidc.c -lm -lrt -O3 -march=armv8-a
-o liverloopsPi64
1943 999 951 924 372 681 2067 2538 2041 674 495 862
224 445 812 711 753 1164 443 397 915 408 822 283
gcc 9 - gcc lloops2.c cpuidc.c -lm -lrt -O3 -march=armv8-a
-o liverloopsPi64
1982 986 961 964 384 753 2316 2743 1907 871 500 965
148 411 814 668 725 1167 449 397 1680 557 817 283
gcc 9 - gcc lloops2.c cpuidc.c -lm -lrt -O3 -march=armv8-a
-mtune=cortex-a72 -o liverloopsPi64
1965 962 996 965 388 512 2021 1900 1956 875 483 974
173 400 815 633 748 1184 450 397 1577 560 823 312
gcc 9 - gcc lloops2.c cpuidc.c -lm -lrt -O3 -march=armv8-a+crc
-mtune=cortex-a72 -o liverloopsPi64
1926 960 962 965 382 683 2043 2374 1441 624 500 969
175 413 815 637 748 1172 450 397 1488 553 824 312
gcc 9 - gcc lloops2.c cpuidc.c -lm -lrt -O3 -march=armv8-a+crc
-mtune=cortex-a72 -pipe -o liverloopsPi64
2153 961 992 964 388 668 2056 2399 2088 793 500 973
169 417 814 621 748 1152 449 397 1677 551 822 312
gcc 9 - gcc lloops2.c cpuidc.c -lm -lrt -O2 -march=armv8-a+crc
-mtune=cortex-a72 -pipe -o liverloopsPi64
2206 1218 995 965 206 766 2284 1739 2090 667 500 741
222 365 813 652 746 1116 449 393 639 560 602 125
gcc 9 gcc lloops2.c cpuidc.c -lm -lrt -O3 -march=armv8-a -pipe
-o liverloopsPi64
2130 986 989 965 389 681 2336 1692 1976 678 500 969
177 408 814 668 726 1199 449 397 1651 559 816 283
gcc 9 gcc lloops2.c cpuidc.c -lm -lrt -O3 -o liverloopsPi64
2154 988 977 965 389 731 2328 2841 2078 703 500 977
177 414 815 668 727 1188 450 397 1640 562 820 283
|
Following are the comparisons with speeds of the first, original 64 bit version. Using the same parameters with gcc 9 produced a slight average improvement. Performance then went downhill on including the suggested parameters; the worst was on using -O2 and the best with no parameters other than -O3.
Perhaps this is the result of compiling on the target computer.
Code: |
Ratios
original gcc lloops2.c cpuidc.c -lm -lrt -O3 -march=armv8-a Average
1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
gcc 9 - gcc lloops2.c cpuidc.c -lm -lrt -O3 -march=armv8-a
-o liverloopsPi64
1.02 0.99 1.01 1.04 1.03 1.10 1.12 1.08 0.93 1.29 1.01 1.12 1.06
0.66 0.93 1.00 0.94 0.96 1.00 1.01 1.00 1.84 1.37 0.99 1.00
gcc 9 - gcc lloops2.c cpuidc.c -lm -lrt -O3 -march=armv8-a
-mtune=cortex-a72 -o liverloopsPi64
1.01 0.96 1.05 1.04 1.04 0.75 0.98 0.75 0.96 1.30 0.97 1.13 1.03
0.77 0.90 1.00 0.89 0.99 1.02 1.02 1.00 1.72 1.37 1.00 1.11
gcc 9 - gcc lloops2.c cpuidc.c -lm -lrt -O3 -march=armv8-a+crc
-mtune=cortex-a72 -o liverloopsPi64
0.99 0.96 1.01 1.04 1.03 1.00 0.99 0.94 0.71 0.92 1.01 1.12 1.02
0.78 0.93 1.00 0.89 0.99 1.01 1.02 1.00 1.63 1.36 1.00 1.11
gcc 9 - gcc lloops2.c cpuidc.c -lm -lrt -O3 -march=armv8-a+crc
-mtune=cortex-a72 -pipe -o liverloopsPi64
1.11 0.96 1.04 1.04 1.04 0.98 0.99 0.95 1.02 1.18 1.01 1.13 1.05
0.76 0.94 1.00 0.87 0.99 0.99 1.01 1.00 1.83 1.35 1.00 1.10
gcc 9 - gcc lloops2.c cpuidc.c -lm -lrt -O2 -march=armv8-a+crc
-mtune=cortex-a72 -pipe -o liverloopsPi64
1.14 1.22 1.05 1.04 0.55 1.13 1.11 0.68 1.02 0.99 1.01 0.86 0.95
0.99 0.82 1.00 0.92 0.99 0.96 1.01 0.99 0.70 1.37 0.73 0.44
gcc 9 - gcc lloops2.c cpuidc.c -lm -lrt -O3 -march=armv8-a -pipe
-o liverloopsPi64
1.10 0.99 1.04 1.04 1.05 1.00 1.13 0.67 0.97 1.01 1.01 1.12 1.04
0.79 0.92 1.00 0.94 0.96 1.03 1.01 1.00 1.81 1.37 0.99 1.00
gcc 9 - gcc lloops2.c cpuidc.c -lm -lrt -O3 -o liverloopsPi64
1.11 0.99 1.03 1.05 1.04 1.07 1.13 1.12 1.02 1.04 1.01 1.13 1.07
0.79 0.93 1.00 0.94 0.97 1.02 1.01 1.00 1.79 1.38 1.00 1.00
|
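As a check on how the ratio rows are formed, here is a minimal Python sketch for the first gcc 9 row. The MFLOPS figures are copied from the two result sets posted earlier (original build and gcc 9 with the same parameters); the names `speed_ratios` and `mean` are hypothetical helpers, and the Average column is assumed to be a plain arithmetic mean of the 24 ratios, which does reproduce the 1.06 shown for this row.

```python
# MFLOPS for the 24 kernels, copied from the posted results.
ORIGINAL = [1943, 999, 951, 924, 372, 681, 2067, 2538, 2041, 674, 495, 862,
            224, 445, 812, 711, 753, 1164, 443, 397, 915, 408, 822, 283]
GCC9_SAME_FLAGS = [1982, 986, 961, 964, 384, 753, 2316, 2743, 1907, 871, 500, 965,
                   148, 411, 814, 668, 725, 1167, 449, 397, 1680, 557, 817, 283]


def speed_ratios(new, old):
    """Per-kernel speed of the new build relative to the old one (>1 is faster)."""
    if len(new) != len(old):
        raise ValueError("both builds must report the same number of kernels")
    return [n / o for n, o in zip(new, old)]


def mean(values):
    """Arithmetic mean, assumed here to be what the Average column uses."""
    return sum(values) / len(values)


ratios = speed_ratios(GCC9_SAME_FLAGS, ORIGINAL)
average = mean(ratios)

print(" ".join(f"{r:.2f}" for r in ratios))
print(f"average {average:.2f}")
```

One or two individual ratios can differ from the posted table in the last digit, presumably because the MFLOPS figures shown are themselves rounded.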
_________________ Regards
Roy |
|
roylongbottom n00b

Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Sun Aug 25, 2019 9:48 am Post subject: |
|
|
Sakaki wrote: |
As to the monitor settings, I think I introduced a regression in 1.5.0 by uncommenting the line "hdmi_drive=2" in /boot/config.txt.
Could you try reverting this and see if your monitor compatibility improves? You can do so by simply running the Applications -> Settings -> RPi Config Tool app, unchecking the "Force audio output in DMT modes" box, then clicking "Save and Exit" and rebooting when prompted. Be sure to confirm your settings when the system comes back up (you'll be prompted about this).
|
I reverted that line in config.txt. Then, the particular monitor worked perfectly with full screen displays on both input sockets. The other monitor is improved, displaying the coloured square and boot text, but it then goes offline. _________________ Regards
Roy |
|
roylongbottom n00b

Joined: 13 Feb 2017 Posts: 64 Location: Essex, UK
|
Posted: Mon Sep 02, 2019 7:47 am Post subject: |
|
|
Sakaki
My WiFi is now working using v1.5.1 bugfix release, on my two Pi 4s and a Pi 3B+. I have also found that two monitors and a TV display at the correct resolution. _________________ Regards
Roy |
|
Sakaki Guru


Joined: 21 May 2014 Posts: 409
|
Posted: Mon Sep 02, 2019 12:02 pm Post subject: |
|
|
roylongbottom wrote: | Sakaki
My WiFi is now working using v1.5.1 bugfix release, on my two Pi 4s and a Pi 3B+. I have also found that two monitors and a TV display at the correct resolution. | Thanks for the feedback, happy to hear these features are now working for you!
PS it appears that the Python interpreter (at least, v3.7.3) runs programs significantly more slowly, on average, in 64bit (both Gentoo and Debian) than in 32bit. I copy my original post to the RPi forums below, as it may be of interest (some further discussion may be found here, ff):
sakaki wrote: | Hello,
apologies for the slight OT, but the Python 64-bit performance discrepancy mentioned above caught my attention (and I observed it also, trying out selfgrams.py), so I decided to try some more detailed benchmarking, the results of which are reported below.
For the tests, I set up a Raspbian Buster system with a 64-bit kernel, but otherwise stock, on a 4GiB Pi4, 1.5GHz, performance CPU governor, Pimoroni fan shim (so no thermal throttling). I then ran the pyperformance benchmark suite in a clean virtualenv. Python v3.7.3 was used. This was the baseline.
I then ran the same test suite in:
- a 32-bit armhf Debian Buster chroot (same 64-bit kernel, Raspbian host OS, physical machine), Python v3.7.3;
- a 64-bit arm64 Debian Buster chroot (ditto);
- a Gentoo64 v1.5.0 system booted under the same kernel, Python v3.7.3 (built under the stock (-O2 no-pgo) settings);
- ditto (but with Python v3.7.3 built using -O3 and profile guided optimization, since Debian appear to use this now);
- ditto (but with Python v3.7.4 built using -O3 and profile guided optimization).
I then normalized the reported runtime statistics for each sub-benchmark in the suite, so that 1.00 = the time taken by the 32-bit baseline [1], and then took the median [2] of the full suite's relative performance for each platform as an overall performance measure (lower is better).
The results are tabulated below. Very rough, and with the caveats that apply to any benchmarks, but in summary:
- The Python interpreter (at least, v3.7.3) seems to run programs faster on average in 32-bit than 64-bit (whether Gentoo or Debian), by a significant margin.
- Debian armhf (32-bit) is marginally faster than Raspbian 32-bit, on a median basis.
- The stock (-O2, no pgo) Gentoo 64-bit v3.7.3 is significantly slower than Debian's 64-bit arm64 version... however
- Once I turned on -O3 and pgo (which appear to be Debian's default build settings, and are also now mine for Python from the forthcoming v1.5.1 release onwards ^-^), Gentoo 64 marginally outperformed Debian 64 at v3.7.3 (although it was still slower than both 32-bit variants tested).
- The v3.7.4 Gentoo 64-bit Python loses some ground against v3.7.3, but still keeps up with Debian64 v3.7.3 (there are some apparent performance regressions in there, such as unpickle_list, which account for most of this).
Results:
Code: |
pyperformance benchmark, Pi4, common 64-bit kernel, fan shim, 1.5GHz performance governor
Raspbian Debian Debian Gentoo Gentoo Gentoo
32-bit 32-bit 64-bit 64-bit 64-bit 64-bit
stock armhf stock stock -O3 pgo -O3 pgo
Benchmark v3.7.3 v3.7.3 v3.7.3 v3.7.3 v3.7.3 v3.7.4
-----------------------------------------------------------------------------------------
Median (lower=faster) 1.00 0.98 1.19 1.37 1.16 1.19
-----------------------------------------------------------------------------------------
2to3 1.00 1.00 1.23 1.34 1.18 1.20
chameleon 1.00 0.93 1.12 1.32 1.04 1.05
chaos 1.00 0.94 1.26 1.45 1.21 1.22
crypto_pyaes 1.00 0.97 1.23 1.41 1.13 1.20
deltablue 1.00 1.03 1.26 1.45 1.25 1.26
django_template 1.00 0.94 1.28 1.45 1.21 1.24
dulwich_log 1.00 0.93 1.15 1.33 1.13 1.13
fannkuch 1.00 1.08 1.17 1.29 1.06 1.07
float 1.00 1.02 1.16 1.45 1.20 1.26
genshi_text 1.00 1.05 1.27 1.39 1.16 1.20
genshi_xml 1.00 0.99 1.23 1.34 1.11 1.16
go 1.00 1.00 1.19 1.35 1.17 1.18
hexiom 1.00 1.07 1.23 1.47 1.20 1.25
html5lib 1.00 0.98 1.22 1.36 1.19 1.19
json_dumps 1.00 0.89 1.14 1.26 1.02 1.04
json_loads 1.00 0.94 1.09 1.24 0.92 0.93
logging_format 1.00 0.92 1.19 1.35 1.14 1.14
logging_silent 1.00 1.10 1.38 1.47 1.16 1.26
logging_simple 1.00 0.93 1.19 1.35 1.14 1.12
mako 1.00 1.03 1.26 1.46 1.15 1.19
meteor_contest 1.00 1.02 1.18 1.28 1.09 1.13
nbody 1.00 1.03 1.03 1.23 1.04 1.07
nqueens 1.00 1.00 1.22 1.53 1.22 1.24
pathlib 1.00 0.93 1.23 1.44 1.21 1.19
pickle 1.00 0.86 1.08 1.22 0.95 0.97
pickle_dict 1.00 1.03 1.09 1.46 1.09 1.10
pickle_list 1.00 1.03 1.20 1.59 1.10 1.07
pickle_pure_python 1.00 1.05 1.27 1.50 1.19 1.22
pidigits 1.00 0.96 0.57 0.60 0.56 0.56
python_startup 1.00 0.89 1.13 1.27 1.18 1.16
python_startup_no_site 1.00 0.88 1.09 1.25 1.18 1.14
raytrace 1.00 0.98 1.21 1.47 1.24 1.26
regex_compile 1.00 0.99 1.19 1.40 1.16 1.19
regex_dna 1.00 1.12 1.00 0.99 0.92 0.84
regex_effbot 1.00 1.22 1.03 1.04 0.94 0.96
regex_v8 1.00 1.36 1.18 1.28 1.07 1.06
richards 1.00 1.06 1.21 1.38 1.22 1.21
scimark_fft 1.00 0.96 1.07 1.22 1.08 1.14
scimark_lu 1.00 1.03 1.31 1.58 1.28 1.34
scimark_monte_carlo 1.00 1.01 1.27 1.51 1.25 1.41
scimark_sor 1.00 1.06 1.12 1.37 1.22 1.24
scimark_sparse_mat_mul 1.00 0.98 0.91 1.18 1.02 1.04
spectral_norm 1.00 0.98 1.11 1.27 1.10 1.09
sqlalchemy_declarative 1.00 0.94 1.20 1.34 1.23 1.33
sqlalchemy_imperative 1.00 0.90 1.20 1.37 1.24 1.54
sqlite_synth 1.00 0.84 1.12 1.29 1.08 1.12
sympy_expand 1.00 0.99 1.18 1.41 1.19 1.53
sympy_integrate 1.00 0.96 1.23 1.45 1.21 1.36
sympy_str 1.00 0.96 1.21 1.45 1.20 1.44
sympy_sum 1.00 0.93 1.24 1.46 1.23 1.39
telco 1.00 0.85 1.28 1.40 0.98 1.06
tornado_http 1.00 0.92 1.17 1.30 1.16 1.22
unpack_sequence 1.00 1.05 0.96 0.99 0.98 0.95
unpickle 1.00 0.97 1.12 1.32 1.04 1.52
unpickle_list 1.00 1.15 1.28 1.38 1.17 2.25
unpickle_pure_python 1.00 1.10 1.38 1.57 1.28 1.33
xml_etree_generate 1.00 0.96 1.40 1.69 1.31 1.29
xml_etree_iterparse 1.00 1.04 1.16 1.36 1.13 1.07
xml_etree_parse 1.00 0.88 1.00 1.16 0.97 0.99
xml_etree_process 1.00 0.96 1.39 1.58 1.29 1.28
|
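The normalise-then-take-the-median step described above can be sketched in Python as follows. This is an illustration only, not the script actually used: `normalise` and `median_scores` are hypothetical names, and since raw timings were not posted, the input is five rows of the already-normalised table (so the Raspbian baseline column is 1.00 throughout).

```python
from statistics import median

# Relative runtimes for five sub-benchmarks, copied from the table above.
# A real run would start from raw timings per platform instead.
TIMES = {
    "raspbian32":     {"2to3": 1.00, "chaos": 1.00, "float": 1.00, "nbody": 1.00, "pidigits": 1.00},
    "debian64":       {"2to3": 1.23, "chaos": 1.26, "float": 1.16, "nbody": 1.03, "pidigits": 0.57},
    "gentoo64_stock": {"2to3": 1.34, "chaos": 1.45, "float": 1.45, "nbody": 1.23, "pidigits": 0.60},
}


def normalise(times, baseline):
    """Divide every timing by the baseline platform's timing for the same benchmark."""
    base = times[baseline]
    return {
        platform: {bench: t / base[bench] for bench, t in per_bench.items()}
        for platform, per_bench in times.items()
    }


def median_scores(normalised):
    """One overall figure per platform: the median relative runtime (lower is faster)."""
    return {platform: median(per_bench.values())
            for platform, per_bench in normalised.items()}


scores = median_scores(normalise(TIMES, "raspbian32"))
for platform, score in scores.items():
    print(f"{platform:15s} {score:.2f}")
```

Because only five rows are used, the medians printed here will not match the full-suite medians in the table.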
Hope that is of some interest! On the basis of the above, I'd expect the performance gap for the Python programs in the chart below (which I copy again for ease of reference) to narrow significantly under the forthcoming v1.5.1 Gentoo 64 release, though they should still underperform the 32-bit Raspbian tests (in contrast to the Rust and most of the C/C++ tests, which do better under 64-bit). NB for avoidance of doubt this chart has not been updated using the -O3/pgo v3.7.3 or v3.7.4 64-bit Python builds yet.
http://fractal.math.unr.edu/~ejolson/pi/anagram/fame64.png
Best,
sakaki
[1] Oops, just noticed this is reversed relative to the "divide by 64-bit runtime" metric used in the chart. Apologies for any confusion!
[2] I guess taking logs might have been an idea first >< ... but don't think this will affect things too much.
Edit: just confirmed this: working with log relative performance gives the same comparative ranking:
- Debian 32-bit armhf v3.7.3 (fastest)
- Raspbian 32-bit v3.7.3
- Gentoo 64-bit v3.7.3 -O3 pgo
- Debian 64-bit v3.7.3 / Gentoo 64-bit v3.7.4 -O3 pgo (dead heat)
- Gentoo 64-bit v3.7.3 -O2 no-pgo (slowest)
|
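Footnote [2] mentions working with logs of the relative performance. One common way to do that, assumed here rather than taken from the original post, is a geometric mean (the exponential of the mean log), which treats a benchmark that is twice as slow and one that is twice as fast as cancelling out. A minimal Python sketch, using five rows of the results table (2to3, chaos, float, nbody, pidigits) and a hypothetical helper `geometric_mean`:

```python
import math


def geometric_mean(values):
    """exp(mean(log(v))): ratios of 2.0 and 0.5 average out to 1.0."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Relative runtimes (lower is faster) for 2to3, chaos, float, nbody, pidigits,
# copied from the results table. This is a small subset, for illustration only.
SUBSET = {
    "raspbian32":     [1.00, 1.00, 1.00, 1.00, 1.00],
    "debian32":       [1.00, 0.94, 1.02, 1.03, 0.96],
    "debian64":       [1.23, 1.26, 1.16, 1.03, 0.57],
    "gentoo64_stock": [1.34, 1.45, 1.45, 1.23, 0.60],
}

# Fastest platform first.
ranking = sorted(SUBSET, key=lambda platform: geometric_mean(SUBSET[platform]))

for platform in ranking:
    print(f"{platform:15s} {geometric_mean(SUBSET[platform]):.3f}")
```

The ordering printed for this subset is only indicative; the ranking listed above is for the full suite.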
_________________ Regards,
sakaki |
|