Bootable 64-bit RPi3 Gentoo image (OpenRC/Xfce/VC4) UPDATED

Gentoo on all things ARM. Both 32 bit and 64 bit.
Tell about your hardware and CHOST.
Problems with crossdev targeting ARM hardware go here too.
Sakaki
Guru
Posts: 409
Joined: Wed May 21, 2014 8:15 pm

Post by Sakaki » Thu Feb 16, 2017 9:22 pm

R0b0t1,
following on from what NeddySeagoon just said, what do you get on your SD-card image if you run lsmod? Check that the vc4 module is loaded. Here's the output from my gentoo-on-rpi3-64bit image, for example:

Code: Select all

pi64 ~ # lsmod
Module                  Size  Used by
configs                49152  0
cmac                   16384  1
rfcomm                 49152  12
hci_uart               32768  1
btbcm                  16384  1 hci_uart
bnep                   24576  2
bluetooth             397312  37 hci_uart,bnep,btbcm,rfcomm
ipv6                  466944  26
brcmfmac              262144  0
vc4                   139264  3
brcmutil               20480  1 brcmfmac
cfg80211              667648  1 brcmfmac
drm_kms_helper        204800  2 vc4
drm                   454656  6 vc4,drm_kms_helper
rfkill                 32768  5 bluetooth,cfg80211
snd_bcm2835            36864  1
joydev                 20480  0
evdev                  24576  3
snd_pcm               135168  1 snd_bcm2835
syscopyarea            16384  1 drm_kms_helper
sysfillrect            16384  1 drm_kms_helper
sysimgblt              16384  1 drm_kms_helper
fb_sys_fops            16384  1 drm_kms_helper
snd_timer              36864  1 snd_pcm
snd                   102400  5 snd_timer,snd_bcm2835,snd_pcm
uio_pdrv_genirq        16384  0
uio                    24576  1 uio_pdrv_genirq
pi64 ~ # uname -r
4.10.0-rc5-v8
You could also try running "vblank_mode=0 glxgears -info" to see that your Mesa etc. is correctly plumbed in...
To get the accelerated desktop, you need an appropriately configured kernel (and the necessary kernel modules loaded), an appropriately configured Mesa, and an appropriately configured X11. See the Gentoo wiki's Raspberry Pi VC4 page (https://wiki.gentoo.org/wiki/Raspberry_Pi_VC4), for example.
You can look at the /etc/portage/make.conf, /etc/portage/package.use/... on the above image for some pointers too.
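If it helps, the lsmod check can be scripted. This is only a minimal sketch: the function names and the optional file argument are my own, and it assumes /proc/modules carries the same list that lsmod prints.

```shell
#!/bin/sh
# module_loaded NAME [MODULES_FILE]
# Succeeds if NAME appears in the first column of /proc/modules.
# MODULES_FILE is a hypothetical override, handy for checking a saved listing.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# report_vc4_stack [MODULES_FILE]
# Reports on vc4 and the two DRM modules it depends on in the listing above.
report_vc4_stack() {
    for m in vc4 drm_kms_helper drm; do
        if module_loaded "$m" "${1:-/proc/modules}"; then
            echo "$m: loaded"
        else
            echo "$m: NOT loaded"
        fi
    done
}
```

Running report_vc4_stack with no arguments checks the live system.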
Regards,

sakaki

R0b0t1
Apprentice
Posts: 264
Joined: Thu Jun 05, 2008 9:26 pm

Post by R0b0t1 » Thu Feb 16, 2017 10:12 pm

Thanks NeddySeagoon,

When I switched kernels I left Sakaki's config.txt intact. I checked that the dtoverlay line for the VC4 firmware was there before posting. Please see the other thread I started if you still feel like helping; I don't want to clutter this one.

Sakaki, I will try that later and edit this post. Thanks.

roylongbottom
n00b
Posts: 64
Joined: Mon Feb 13, 2017 12:32 pm
Location: Essex, UK

64 Bit Benchmarks

Post by roylongbottom » Thu Feb 23, 2017 4:54 pm

The latest programs converted were my Fast Fourier Transform benchmarks, which showed some 64 bit performance improvements. Details, results, source code and execution files can be obtained by clicking on the links given earlier or via the www button below.

These execute FFTs sized 1K to 1024K, the larger ones depending on RAM speeds. Using Raspbian (32 bit) and Linux/RPi (64 bit), the short FFTs, with execution times of less than 0.5 milliseconds, produced inconsistent running times. This was only with "on demand" MHz settings and not when running another CPU benchmark at the same time, using a different core, or with a “performance” MHz setting. I haven’t found how to set “performance” with Gentoo. Is it possible?

To investigate this, I produced another test that executes 30 1K sized FFTs 500 times, with 32 bit and 64 bit compilations (These are included in the tar.gz file). Example results are below.

Code: Select all

       RPi 3 500 x 30 1K Single Precision FFT milliseconds
 
                   32 Bit Raspbian On Demand

  12.9  12.2   7.4   6.0   6.0   6.4   6.0   6.0   6.0   6.0
   6.1   6.0   6.0   6.0   6.0   6.0   6.1   6.1   6.0   6.2
   6.2   6.0   6.0   6.1   6.0   6.0   6.0   6.0   6.1   6.0
   6.2   6.0   6.0   7.0   6.1   6.0   6.0   6.0   6.1   6.0
   6.2   6.1   6.0   6.0   6.2   6.0   6.0   6.0   6.0   7.2
 To
   6.5   6.3   6.1   6.2   6.1   6.1   6.1   6.1   6.1   6.1
   6.5   6.3   6.1   6.1   6.1   6.1   6.1   6.1   6.1   6.1
   6.4   6.2   6.1   6.1   6.2   6.1   6.1   6.1   6.1   6.1

                  Raspbian With Stress Test

   6.7   6.2   6.0   6.0   6.0   6.0   6.1   6.0   6.1   6.0
   6.5   6.2   6.0   6.0   6.0   6.0   6.0   6.0   6.0   6.0
   6.4   6.2   6.0   6.0   6.0   6.0   6.0   6.1   6.0   6.0
 To
   6.3   6.2   6.0   6.0   6.0   6.0   6.0   6.0   6.0   6.0
   6.3   6.2   6.0   6.0   6.0   6.0   6.0   6.0   6.0   6.0
   6.3   6.2   6.0   6.0   6.1   6.0   6.0   6.0   6.0   6.0

                    64 Bit Gentoo On Demand

  17.5  15.4  11.8   8.6   5.4   5.4   5.4   5.4   5.4   5.4
   5.5   5.8   6.0   5.4   5.5   5.4   5.5   5.4   5.4   5.4
   5.5   5.6   6.1   5.4   5.5   5.4   5.5   5.5   5.4   5.4
 To
   5.7   6.9   5.7   5.4   5.4   5.4   5.5   5.4   5.4   5.4
   5.8   6.8   5.8   5.6   5.4   5.4   5.4   5.5   5.4   5.4
   5.7   6.4   5.7   5.5   5.4   5.4   5.5   5.4   5.4   5.4

                   Gentoo With Stress Test

   5.9   7.2   5.9   5.5   5.4   5.4   5.4   5.4   5.4   5.5
   5.6   6.9   5.7   5.4   5.4   5.4   5.4   5.4   5.4   5.4
   5.6   6.5   5.7   5.4   5.4   5.4   5.4   5.4   5.4   5.4
   5.8   7.1   5.9   5.4   5.4   5.4   5.4   5.4   5.4   5.4
 To
   5.7   6.8   5.7   5.4   5.4   5.4   5.4   5.4   5.4   5.4
   5.7   6.7   6.1   5.4   5.4   5.4   5.4   5.4   5.4   5.4
   5.8   6.6   5.6   5.4   5.4   5.4   5.4   5.4   5.4   5.4

Regards

Roy

NeddySeagoon
Administrator
Posts: 56088
Joined: Sat Jul 05, 2003 9:37 am
Location: 56N 3W

Post by NeddySeagoon » Thu Feb 23, 2017 6:25 pm

roylongbottom,

All things are possible in Gentoo, it's just missing the GUI, so you need to poke about a bit from the console.

Code: Select all

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
will tell the available governors.

Code: Select all

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
will tell the governor in use.

Code: Select all

echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
will set the performance governor, provided it's available.
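The three commands above can be rolled into one helper that covers every core and checks availability first. Treat it as a rough sketch rather than the one true way: the function name and the optional CPU_ROOT argument are made up for illustration, and on the real sysfs tree it has to run from a root shell.

```shell
#!/bin/sh
# set_governor GOVERNOR [CPU_ROOT]
# Writes GOVERNOR to every cpuN/cpufreq/scaling_governor under CPU_ROOT,
# skipping CPUs whose scaling_available_governors does not list it.
# CPU_ROOT is a hypothetical override (default: /sys/devices/system/cpu).
set_governor() {
    gov=$1
    root=${2:-/sys/devices/system/cpu}
    status=0
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue
        if grep -qw -- "$gov" "$dir/scaling_available_governors"; then
            echo "$gov" > "$dir/scaling_governor" || status=1
        else
            echo "$gov not available for ${dir%/cpufreq}" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, set_governor performance sets all cores, and the return status is non-zero if any core could not be switched.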
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

khayyam
Watchman
Posts: 6227
Joined: Thu Jun 07, 2012 2:45 am
Location: Room 101

Re: 64 Bit Benchmarks

Post by khayyam » Thu Feb 23, 2017 7:12 pm

roylongbottom wrote:I haven’t found how to set “performance” with Gentoo. Is it possible?
roylongbottom ... to follow on from NeddySeagoon, there are a number of ways this can be set. sys-power/cpupower is used for this purpose, and is configured and started like any other service. However, assuming 'local' is in a runlevel (which it is by default), you could do the following:

Code: Select all

#!/bin/sh

for i in /sys/devices/system/cpu/cpu[0-9]/cpufreq/scaling_governor ; do
    echo performance > "$i"
done
Save this as /etc/local.d/cpufreq-performance.start, then 'chmod u+x /etc/local.d/cpufreq-performance.start', and the governor will be set on boot.

For other tuneables look under '/sys/devices/system/cpu/cpufreq/<governor>', and/or see /usr/src/linux/Documentation/cpu-freq/user-guide.txt.
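For the cpupower route, something like the fragment below should do. It is a sketch under an assumption: that the service's /etc/conf.d/cpupower passes a START_OPTS variable to cpupower when it starts, so check the comments in your installed copy before relying on it.

```shell
# /etc/conf.d/cpupower (fragment)
# Assumed variable name: options handed to cpupower when the service starts.
START_OPTS="--governor performance"

# Then, from a root shell:
#   emerge --ask sys-power/cpupower
#   rc-update add cpupower default
#   rc-service cpupower start
# A one-off change without the service:
#   cpupower frequency-set -g performance
```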

HTH & best ... khay

NeddySeagoon
Administrator
Posts: 56088
Joined: Sat Jul 05, 2003 9:37 am
Location: 56N 3W

Post by NeddySeagoon » Thu Feb 23, 2017 8:09 pm

Heh, just like everything else in Gentoo, there are lots of ways to do everything and they are all equally right.
Regards,

NeddySeagoon

mDup
Apprentice
Posts: 212
Joined: Fri Apr 14, 2006 11:38 pm

Post by mDup » Fri Feb 24, 2017 2:14 am

Does anyone have a prebuilt Firefox v51.0.1 arm64 Gentoo package tarball?
I run Gentoo on an Amlogic S905 and cannot build Firefox, but that's my own fault because I use gcc 6.3.0 for the whole of portage.
Nevertheless I can run the prebuilt rpi3-64 Firefox v50.1.0 package, so now I wonder if I can get an upgrade.

NeddySeagoon
Administrator
Posts: 56088
Joined: Sat Jul 05, 2003 9:37 am
Location: 56N 3W

Post by NeddySeagoon » Fri Feb 24, 2017 8:50 am

mDup,

It won't build here. It looks like the build system is broken.

Code: Select all

USE="${ARCH} egl gles1 icu minizip openssl pcre16 postproc python 
     qt5 script sqlite svc threads virt-network xvmc
     -modemmanager -pam -skia"
# skia wants to link to neon stuff it doesn't build, in firefox anyway.
Even with USE="-skia" it tries and fails to use skia.

I'm a gcc-6.3 on arm64 user too.

Code: Select all

genlop -t firefox

     Tue Jan 10 06:34:04 2017 >>> www-client/firefox-50.1.0-r1
       merge time: 6 hours, 53 minutes and 21 seconds.
is the last one I have.
Regards,

NeddySeagoon

roylongbottom
n00b
Posts: 64
Joined: Mon Feb 13, 2017 12:32 pm
Location: Essex, UK

Post by roylongbottom » Fri Feb 24, 2017 11:55 am

NeddySeagoon wrote:roylongbottom,

All things are possible in Gentoo, it's just missing the GUI, so you need to poke about a bit from the console.

Code: Select all

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
will tell the available governors.

Code: Select all

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
will tell the governor in use.

Code: Select all

echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
will set the performance governor, provided it's available.
I had already tried those, but "echo performance" resulted in "Permission denied" and "sudo" made no difference. Trying "su" would not accept a password that I thought was "raspberrypi64". As recommended, I tried a bit more poking, and the command worked after first entering "sudo su", which produced a red "pi64" prompt - je ne comprends pas and I don't know much French either.
Regards

Roy

khayyam
Watchman
Posts: 6227
Joined: Thu Jun 07, 2012 2:45 am
Location: Room 101

Post by khayyam » Fri Feb 24, 2017 8:35 pm

roylongbottom wrote:I had already tried those, but "echo performance" resulted in "Permission denied" and "sudo" made no difference. Trying "su" would not accept a password that I thought was "raspberrypi64". As recommended, I tried a bit more poking, and the command worked after first entering "sudo su" that produced a "pi64" red line prompt - je ne comprends pas and I don't know much French either.
roylongbottom ... this is why the use of 'sudo' (as a magic bullet) is frowned upon by more experienced shell users. The expectation is that 'sudo echo foo > /foo' is going to work because it is prefaced with a magic word, but the shell doesn't interpret that command in the way that inexperienced shell users expect. It is the current shell which interprets the command and performs the redirection, not a root shell:

Code: Select all

% sudo echo foo > /foo
zsh: permission denied: /foo
% sudo "echo foo > /foo"
sudo: echo foo > /foo: command not found
% sudo "/bin/echo foo > /foo"
sudo: /bin/echo foo > /foo: command not found
% sudo sh -c "/bin/echo foo > /foo"
% ls -l /foo
-rw------- 1 root root 4 2017-02-24 21:23 /foo
In the above you can see that it is only by running a shell via sudo that the 'command' (the full command, that is) is run as superuser, and that 'command' needs to be protected by quotes (so as to be passed to the shell being executed, and not interpreted by the running shell). This fact is a trap for the unwary. So, either invoke a shell, or use 'su -' to acquire one.
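Another common way around the same trap, for what it's worth, is to let a privileged 'tee' open the file, so the unprivileged shell never does the redirection. A small sketch follows; the function name and the PRIV variable are invented here for illustration.

```shell
#!/bin/sh
# write_privileged VALUE FILE
# tee opens FILE, and tee runs under sudo, so the calling shell
# only writes to a pipe. PRIV is a hypothetical override for the
# privilege command (default: sudo); set it empty to run tee directly.
write_privileged() {
    printf '%s\n' "$1" | ${PRIV-sudo} tee "$2" > /dev/null
}

# The one-line equivalent:
#   echo performance | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
```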

best ... khay

mDup
Apprentice
Posts: 212
Joined: Fri Apr 14, 2006 11:38 pm

Post by mDup » Fri Feb 24, 2017 10:29 pm

NeddySeagoon wrote:mDup,

It won't build here. It looks like the build system is broken.

Code: Select all

USE="${ARCH} egl gles1 icu minizip openssl pcre16 postproc python 
     qt5 script sqlite svc threads virt-network xvmc
     -modemmanager -pam -skia"
# skia wants to link to neon stuff it doesn't build, in firefox anyway.
Even with USE="-skia" it tries and fails to use skia.

I'm a gcc-6.3 on arm64 user too.

Code: Select all

genlop -t firefox

     Tue Jan 10 06:34:04 2017 >>> www-client/firefox-50.1.0-r1
       merge time: 6 hours, 53 minutes and 21 seconds.
is the last one I have.
Thanks for information.
Have you been able then to build genpi64 firefox-50.1.0-r1 with gcc-6.3?
I get linker relocation errors, like:

Code: Select all

../../gfx/skia/SkBitmapProcState_matrixProcs.o: In function `SkBitmapProcState::chooseMatrixProc(bool)':
SkBitmapProcState_matrixProcs.cpp:(.text+0xa0c): undefined reference to `ClampX_ClampY_Procs_neon'
/usr/lib/gcc/aarch64-unknown-linux-gnu/6.3.0/../../../../aarch64-unknown-linux-gnu/bin/ld: ../../gfx/skia/SkBitmapProcState_matrixProcs.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against external symbol `ClampX_ClampY_Procs_neon' can not be used when making a shared object; recompile with -fPIC
SkBitmapProcState_matrixProcs.cpp:(.text+0xa10): undefined reference to `ClampX_ClampY_Procs_neon'

NeddySeagoon
Administrator
Posts: 56088
Joined: Sat Jul 05, 2003 9:37 am
Location: 56N 3W

Post by NeddySeagoon » Fri Feb 24, 2017 10:55 pm

mDup,

I've only tried the firefox in the tree. From your code fragment,

Code: Select all

../../gfx/skia
is a bad sign.
I would expect it to fail using skia.

I'll let sakaki answer for the build in genpi64
Regards,

NeddySeagoon

roylongbottom
n00b
Posts: 64
Joined: Mon Feb 13, 2017 12:32 pm
Location: Essex, UK

OpenMP

Post by roylongbottom » Sat Feb 25, 2017 12:11 pm

I am converting my MP benchmarks to run at 64 bits. Initially they are being successfully compiled and run via OpenSUSE using gcc-6. The multithreaded programs also run via Gentoo but not the OpenMP tests, where libgomp.so.1 is not found and the benchmarks can't be compiled using Gentoo gcc 5.4. Is OpenMP or the library available and, if so, how do I install them?

A future requirement is OpenGL, particularly an equivalent of Raspberry Pi freeglut3. Is that available? I installed OpenGL 7.0 (I think) but can't find it.
Regards

Roy

NeddySeagoon
Administrator
Posts: 56088
Joined: Sat Jul 05, 2003 9:37 am
Location: 56N 3W

Post by NeddySeagoon » Sat Feb 25, 2017 1:57 pm

roylongbottom,

equery can tell you lots of things about installed packages.

Code: Select all

emerge gentoolkit
to install it.
For example

Code: Select all

$ equery b openmp      
 * Searching for openmp ... 
dev-libs/boost-1.63.0 (/usr/include/boost/numeric/odeint/external/openmp)
libgomp.so.1 appears to belong to gcc.
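To see whether the installed compiler ships the OpenMP runtime at all, you can ask gcc where it would find libgomp. A minimal sketch, assuming a gcc-compatible driver; the function name and the optional compiler argument are my own.

```shell
#!/bin/sh
# libgomp_path [COMPILER]
# Prints the path the compiler resolves for libgomp.so and succeeds,
# or fails if the compiler only echoes the bare name back (its way of
# saying "not found"). COMPILER defaults to gcc.
libgomp_path() {
    cc=${1:-gcc}
    path=$("$cc" -print-file-name=libgomp.so) || return 1
    case $path in
        /*) printf '%s\n' "$path" ;;
        *)  return 1 ;;
    esac
}
```

If that fails, it may be worth checking whether gcc was built with its openmp USE flag (equery uses sys-devel/gcc), and 'equery b libgomp.so.1' will name the owning package once the file exists.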
Regards,

NeddySeagoon

mDup
Apprentice
Posts: 212
Joined: Fri Apr 14, 2006 11:38 pm

Post by mDup » Sat Feb 25, 2017 2:46 pm

NeddySeagoon wrote:mDup,
I've only tried the firefox in the tree. From your code fragment,

Code: Select all

../../gfx/skia
is a bad sign.
I would expect it to fail using skia.
I'll let sakaki answer for the build in genpi64
Thanks for spotting skia!
When I use -skia I can emerge =www-client/firefox-50.1.0-r1::rpi3 from the rpi3 repo.

Sakaki
Guru
Posts: 409
Joined: Wed May 21, 2014 8:15 pm

Post by Sakaki » Sat Feb 25, 2017 11:25 pm

@mDup, @NeddySeagoon
mDup wrote:
NeddySeagoon wrote:mDup,
I've only tried the firefox in the tree. From your code fragment,

Code: Select all

../../gfx/skia
is a bad sign.
I would expect it to fail using skia.
I'll let sakaki answer for the build in genpi64
Thanks for spotting skia!
When I use -skia I can emerge =www-client/firefox-50.1.0-r1::rpi3 from the rpi3 repo.
The gentoo-on-rpi3-64bit image does have -skia set, and as you note I have retained firefox-50.1.0-r1 in the rpi3 overlay; more modern versions I could not get to build even with -skia. Thunderbird I haven't been able to get running reliably on arm64 at all (it builds, but segfaults shortly after starting up).
In case it is useful, the per-package USE flags on the image are as follows:

Code: Select all

pi64 package.use # tail -n 100 *
==> cairo <==
# Per https://wiki.gentoo.org/wiki/Raspberry_Pi_VC4
x11-libs/cairo opengl xlib-xcb

==> claws-mail <==
# requirements of mail-client/claws-mail
dev-libs/libdbusmenu gtk3

==> elogviewer <==
# requirements of app-portage/elogviewer
dev-libs/libpcre pcre16

==> ffmpeg <==
# enable Multi-Media Abstraction Layer (MMAL) decoding support
media-video/ffmpeg	mmal

==> firefox <==
# no sneaky downloading of binary blobs on first run, please...
# and also disable skia; as this seems to try to pull in neon stuff
www-client/firefox -gmp-autoupdate -skia system-harfbuzz system-icu system-jpeg system-libevent system-libvpx
# requirements of firefox
dev-lang/python:2.7 sqlite
media-libs/harfbuzz icu
media-libs/libvpx postproc

==> genup <==
app-portage/genup::sakaki-tools -buildkernel

==> mesa <==
# Per https://wiki.gentoo.org/wiki/Raspberry_Pi_VC4
media-libs/mesa -classic xa xvmc

==> mplayer <==
media-video/mplayer -dvdnav

==> mpv <==
media-video/mpv	-lua -luajit -iconv -uchardet

==> seahorse <==
# requirements of app-crypt/seahorse
app-crypt/pinentry gnome-keyring

==> vlc <==
media-video/vlc gnutls x264

==> xorg-server <==
# Per https://wiki.gentoo.org/wiki/Raspberry_Pi_VC4
x11-base/xorg-server glamor

==> zlib <==
# required by media-video/vlc
sys-libs/zlib minizip


==> zzz_via_autounmask <==
That is in addition to those in /etc/portage/make.conf:

Code: Select all

# Additional USE flags in addition to those specified by the current profile.
USE="bindist -mudflap -sanitize"
USE="${USE} bluetooth egl gles1 gles2 lock thunar qt4 ffmpeg"
USE="${USE} -gnome -kde"
and of course by the default/linux/arm64/13.0/desktop profile.

Incidentally, all the packages used in the image are also available in binary form at my arm64 binhost, at https://www.isshoni.org/pi64.
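For anyone wanting to pull from the binhost, a make.conf fragment along these lines should be enough. It is a sketch: only the URL comes from this post, and whether a given binary package is accepted depends on your own CHOST, profile and USE flags matching.

```shell
# /etc/portage/make.conf (fragment)
# Point portage at the binhost mentioned above.
PORTAGE_BINHOST="https://www.isshoni.org/pi64"

# Then ask emerge to prefer a prebuilt package where one matches, e.g.:
#   emerge --ask --getbinpkg =www-client/firefox-50.1.0-r1
```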

@roylongbottom - khayyam's suggestion to use a ".start" file to set the performance governor on boot will work, but you need to be a little careful with this approach on the image, as there already is a .start file (/etc/local.d/ondemand_freq_scaling.start) in place to set the ondemand scaling. Be sure to move or delete this file if you are putting an alternative governor setting in place, otherwise the .start file that runs later during startup will "win" (and that will depend upon the lexical ordering of their filenames).
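To see which script would "win", you can list the executable .start files in glob order. This sketch assumes the 'local' service runs them in the same lexical order the shell glob returns; the function name is made up.

```shell
#!/bin/sh
# list_start_order [DIR]
# Prints the executable *.start files in DIR (default /etc/local.d)
# in lexical glob order. The last name printed is the script whose
# governor setting ends up in effect.
list_start_order() {
    dir=${1:-/etc/local.d}
    for f in "$dir"/*.start; do
        [ -x "$f" ] && basename "$f"
    done
    return 0
}
```

With both cpufreq-performance.start and ondemand_freq_scaling.start executable, the ondemand script sorts later and so runs last.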
Regards,

sakaki

mDup
Apprentice
Posts: 212
Joined: Fri Apr 14, 2006 11:38 pm

Post by mDup » Sun Feb 26, 2017 4:23 am

Sakaki wrote:@mDup, @NeddySeagoon
mDup wrote:
NeddySeagoon wrote:mDup,
I've only tried the firefox in the tree. From your code fragment,

Code: Select all

../../gfx/skia
is a bad sign.
I would expect it to fail using skia.
I'll let sakaki answer for the build in genpi64
Thanks for spotting skia!
When I use -skia I can emerge =www-client/firefox-50.1.0-r1::rpi3 from the rpi3 repo.
The gentoo-on-rpi3-64bit image does have -skia set, and as you note I have retained firefox-50.1.0-r1 in the rpi3 overlay; more modern versions I could not get to build even with -skia.[...]
In case it is useful, the per-package USE flags on the image are as follows:[...]
Thanks for the USE flags.
I do not have an rpi3 (I have an Amlogic device), so I do not run your image and do not have your flags readily to hand.
Nice idea to use the system- style flags for Firefox. I'll adjust it on all my Gentoo systems.
Yes, more recent versions would not build for me even with -skia. I think we are on the same page.

khayyam
Watchman
Posts: 6227
Joined: Thu Jun 07, 2012 2:45 am
Location: Room 101

Post by khayyam » Sun Feb 26, 2017 4:51 am

Sakaki wrote:@roylongbottom - khayyam's suggestion to use a ".start" file to set the performance governor on boot will work, but you need to be a little careful with this approach on the image, as there already is a .start file (/etc/local.d/ondemand_freq_scaling.start) in place to set the ondemand scaling. Be sure to move or delete this file if you are putting an alternative governor setting in place, otherwise the .start file that runs later during startup will "win" (and that will depend upon the lexical ordering of their filenames).
Sakaki, roylongbottom, et al ... all you need do is 'chmod u-x' it, then it won't be run.

Code: Select all

# chmod u-x /etc/local.d/ondemand_freq_scaling.start
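One caveat, offered tentatively: if the 'local' service decides what to run with a 'test -x' style check (an assumption on my part), then for root that check can still succeed while a group or other execute bit remains set. Clearing all three is the safer variant; the function name below is just for illustration.

```shell
#!/bin/sh
# disable_start_script FILE
# Clears the owner, group and other execute bits, so an executable
# check fails even when performed as root.
disable_start_script() {
    chmod a-x "$1"
}
```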
best ... khay

roylongbottom
n00b
Posts: 64
Joined: Mon Feb 13, 2017 12:32 pm
Location: Essex, UK

MultiThreading Benchmarks

Post by roylongbottom » Tue Mar 07, 2017 5:08 pm

Most of my multithreading benchmarks run using 1, 2, 4 and 8 threads. Many have tests that use approximately 12 KB, 120 KB and 12 MB, to use both caches and RAM. The first set attempts to measure maximum MFLOPS, with two test procedures, one with two floating point operations per data word and the other with 32. The latter includes a mixture of multiplications and additions, coded to enable SIMD operation. In this case, using single precision numbers, four at a time, plus linked multiply and add, a top end CPU can execute eight operations per clock cycle per core. It is not clear what the potential maximum MFLOPS is on an ARM Cortex-A53, but eight per core is mentioned. The same benchmark code obtained a maximum of 24 MFLOPS/MHz on a top end quad core Intel CPU, via Linux - see the following:

http://www.roylongbottom.org.uk/linux%2 ... tm#anchor6

Then this ARM CPU might need a different combination of arithmetic operations for higher values, where best case obtained with this benchmark was 2.2 MFLOPS/MHz using a single core.
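For reference, the MFLOPS/MHz figure is simply measured MFLOPS divided by clock speed (and by core count for a per-core value). A quick sketch of the arithmetic, assuming the Pi 3's nominal 1200 MHz clock, which is my assumption rather than a measured value, and taking the 32 Ops/Word, 12.8 KB results from the table below; the function name is made up.

```shell
#!/bin/sh
# mflops_per_mhz MFLOPS MHZ [CORES]
# Divides a measured MFLOPS figure by clock speed times core count.
# CORES defaults to 1.
mflops_per_mhz() {
    awk -v f="$1" -v c="$2" -v n="${3:-1}" 'BEGIN { printf "%.2f\n", f / (c * n) }'
}

mflops_per_mhz 2640 1200       # one thread: prints 2.20
mflops_per_mhz 10113 1200 4    # four threads, per core: prints 2.11
```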

The following shows the format of the MP-MFLOPS benchmarks with the best 64 bit Raspberry Pi 3 results. Note that performance increases using more threads, except when limited by RAM speed. These benchmarks carry out a fixed number of test passes, with each thread carrying out the same calculations on different sections of data. Numeric results produced (x 100000) are output to show that all data has been used.

Code: Select all

 MP-MFLOPS NEON Intrinsics 64 Bit Tue Feb 28 15:37:39 2017

    FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      697     725     420    2640    2544    2441
 2T     1452    1420     348    5135    5258    4430
 4T     1438    2679     343   10113    9905    5370
 8T     1914    2533     358    9332   10124    6041
 Results x 100000, 12345 indicates ERRORS
 1T    76406   97075   99969   66015   95363   99951
 2T    76406   97075   99969   66015   95363   99951
 4T    76406   97075   99969   66015   95363   99951
 8T    76406   97075   99969   66015   95363   99951

         End of test Tue Feb 28 15:37:43 2017
Benchmarks appropriate for comparison of 32 and 64 bit versions are single and double precision versions, compiled for normal floating point and one using NEON intrinsic functions that are clearly suitable for SIMD operation and are converted to different types of vector operation.
64 bit/32 bit speed comparisons are below. Single precision MP-MFLOPS has the highest gain by using vector instructions, instead of scalar. With compiled intrinsics the systems use different forms of vector instructions.

Code: Select all

 Average 64 bit performance gains

         2 Ops/Word              32 Ops/Word
         12.8     128   12800    12.8     128   12800

 MF SP   4.31    3.87    1.24    2.19    2.35    2.04
 MF DP   2.45    1.71    0.83    1.92    1.92    1.42
 Intrin  1.81    1.84    0.82    1.67    1.75    1.08
There is also an OpenMP benchmark that carries out the same calculations, but the OpenMP shared object file is not provided with Gentoo gcc. The other 64 bit Linux I am testing included it with gcc 4.8 and gcc-6. As usual, the benchmarks, source code, details and results are in:

http://www.roylongbottom.org.uk/Rpi3-64 ... rks.tar.gz
http://www.roylongbottom.org.uk/Raspber ... hmarks.htm
Regards

Roy

roylongbottom
n00b
Posts: 64
Joined: Mon Feb 13, 2017 12:32 pm
Location: Essex, UK

More 64 Bit MultiThreading Benchmarks

Post by roylongbottom » Thu Mar 16, 2017 11:03 am

The other MP benchmarks, included in the tar.gz file, demonstrate some MP and 64 bit performance gains, with others identifying that multithreading provided little or no benefit and, sometimes, much worse performance.

MP-Whetstone - Multiple threads each run the eight test functions at the same time, but with some dedicated variables. MP performance is good but the simple test functions are not appropriate for more advanced instructions at 64 bits, so relative 32 bit performance is between 0.48 and 2.08.

MP-Dhrystone - This runs multiple copies of the whole program at the same time. Dedicated data arrays are used for each thread but there are numerous other variables that are shared. The latter reduces performance gains via multiple threads and, in some cases, these can be slower than using a single thread. In this case, some quad core improvements are shown as up to 2.5 times faster than a single core. Single core 64 bit/32 bit speed ratio was 1.50 reducing to 1.10 using four threads.

MP-Linpack - The original Linpack Benchmark operates on double precision floating point 100x100 matrices. This one runs on 100x100, 500x500 and 1000x1000 single precision matrices using 0, 1, 2 and 4 separate threads, mainly via NEON intrinsic functions that are compiled into different forms of vector instructions. The benchmark was produced to demonstrate that the original Linpack code could not be converted (by me) to show increased performance using multiple threads. The official line is that users are allowed to implement their own linear equation solver for this purpose. At 100 x 100, data is in L2 cache, others depend more on RAM speed. The critical daxpy function is affected by numerous thread create and join directives, even on using one thread. This leads to slow and constant performance using all thread tests - see example below. The 32 bit version produced slightly slower speeds.

Code: Select all

 Linpack Single Precision MultiThreaded Benchmark
  64 Bit NEON Intrinsics, Wed Mar  8 11:36:25 2017

   MFLOPS 0 to 4 Threads, N 100, 500, 1000

 Threads      None        1        2        4

 N  100     552.47   112.73   105.19   105.31 
 N  500     442.32   303.75   303.64   305.03 
 N 1000     353.88   315.96   309.15   308.31 
MP-BusSpeed - This runs integer read only tests using caches and RAM, each thread accessing the same data, but with staggered starting points. It includes tests with variable address increments, to identify burst reading and bus speeds. The main “Read All” test is intended to identify maximum RAM speed. The benchmark demonstrated some appropriate MP performance gains, but slow 64 bit speeds, with the 32 bit version being 2.5 times faster via cache based data. The reason is that the latter compiled arithmetic as 16 four way NEON operations compared with 64 scalar instructions.

MP-RandMem - The benchmark has cache and RAM read only and read/write tests using sequential and random access, each thread accessing the same data but starting at different points. The read only L1 cache based tests demonstrated MP gains of 3.6 times, with the 64 bit version 43% faster than the 32 bit variety. Read/write tests produced no multithreading performance improvement and the latest benchmark appeared to be somewhat slower than the 32 bit version.
Regards

Roy

Zucca
Administrator
Posts: 4693
Joined: Thu Jun 14, 2007 10:31 pm
Location: Rasi, Finland

Post by Zucca » Tue Mar 21, 2017 9:49 am

Has anyone tried to convert the existing ext4 filesystem to btrfs?
I think the snapshotting feature of it could be useful there.
..: Zucca :..

Code: Select all

init=/sbin/openrc-init
-systemd -logind -elogind seatd
I am NaN! I am a man!

roylongbottom
n00b
Posts: 64
Joined: Mon Feb 13, 2017 12:32 pm
Location: Essex, UK

OpenGL and Java Benchmarks

Post by roylongbottom » Sat Mar 25, 2017 11:23 am

OpenGL GLUT Benchmark

This was produced for use on Linux based PCs. It has four tests using coloured or textured simple objects then a wireframe and textured complex kitchen structure. It can be run from a script file specifying different window sizes and a command to disable VSYNC, enabling speeds greater than 60 FPS to be demonstrated. The benchmark, source code and details are in the following:

http://www.roylongbottom.org.uk/Rpi3-64 ... rks.tar.gz
http://www.roylongbottom.org.uk/Raspber ... #anchor19a

In 2012, I approved a request from a Quality Engineer at Canonical, to use this OpenGL benchmark in the testing framework of the Unity desktop software. One reason probably was that a test can be run for extended periods as a stress test.

Below are results from a Raspberry Pi 3, using the experimental desktop GL driver and the new 64 bit version. It can be seen that, using smaller windows, the 32 bit version was much faster running simple coloured objects, with the 64 bit benchmark being ahead with complex structures. Then, performance was quite similar with full screen displays.

Code: Select all

 ######################### RPi 3 Original #########################

 GLUT OpenGL Benchmark 32 Bit Version 1, Wed Jul 27 20:31:52 2016

 Window Size  Coloured Objects  Textured Objects  WireFrm  Texture
    Pixels        Few      All      Few      All  Kitchen  Kitchen
  Wide  High      FPS      FPS      FPS      FPS      FPS      FPS

   320   240    308.4    182.1     82.6     52.3     21.6     13.7
   640   480    129.5    119.6     74.6     49.2     21.6     13.8
  1024   768     54.8     52.2     43.7     39.2     21.4     13.6
  1920  1080     21.5     17.9     20.3     19.6     20.6     13.4


 ########################## RPi 3 Gentoo ##########################

 GLUT OpenGL Benchmark 64 Bit Version 1, Sat Mar 18 18:21:44 2017

 Window Size  Coloured Objects  Textured Objects  WireFrm  Texture
    Pixels        Few      All      Few      All  Kitchen  Kitchen
  Wide  High      FPS      FPS      FPS      FPS      FPS      FPS

   320   240    161.8    116.0     67.1     46.3     26.7     16.7
   640   480     76.8     74.8     49.8     41.4     25.9     16.3
  1024   768     35.7     34.8     29.7     26.7     25.0     15.7
  1920  1080     18.0     18.7     16.4     15.8     17.1     13.1
Java Drawing and Whetstone Benchmarks

After a struggle, I gave up trying to emerge Java but managed to download Oracle JDK 1.8 for temporary use (not installed in the right place?). This could compile Java code and run the Whetstone program but not my JavaDraw benchmark. The benchmarks and results can be obtained via the above links. On running the Whetstone benchmark, excluding two tests, where each was much faster, the average 64 bit speed was twice as fast.
Regards

Roy

NeddySeagoon
Administrator
Posts: 56088
Joined: Sat Jul 05, 2003 9:37 am
Location: 56N 3W

Post by NeddySeagoon » Sat Mar 25, 2017 3:12 pm

roylongbottom,

I haven't tried 32 bit Java for the Pi, but you can build Java 1.7 and, once you have 1.7, you can use it to build 1.8.
If I got the keywording right, keywording is no longer required.

It's also possible to build IcedTea with Oracle's Java. That's documented there too.
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

64 Bit I/O Benchmarks

Post by roylongbottom » Tue Apr 25, 2017 10:17 am

My DriveSpeed and LanSpeed programs have now been recompiled as DriveSpeed64 and LanSpeed64, with benchmarks, source code, details and results in the tar.gz and htm files quoted earlier. The code for the two is identical, except that DriveSpeed opens files to use direct I/O, avoiding caching. LanSpeed normally runs without using local caching. The benchmarks measure writing and reading speeds for relatively large files, random access and numerous small files.

There might be a tuning parameter for this, but DriveSpeed64 produced errors on the installed Gentoo operating system, where direct I/O did not appear to be available. The benchmark was validated on a different 64 bit system.
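
Whether a given filesystem accepts direct I/O at all can be probed separately from the benchmark. The following is a minimal sketch, not the method used by DriveSpeed64: it assumes Linux, where `O_DIRECT` is the open flag that requests uncached I/O, and the function name `direct_io_supported` is made up for illustration.

```python
import os
import tempfile


def direct_io_supported(directory):
    """Return True if a scratch file in `directory` can be opened with O_DIRECT.

    A successful open only shows that the filesystem accepts the flag.
    Actual reads and writes through such a descriptor still need suitably
    aligned buffers, offsets and lengths.
    """
    o_direct = getattr(os, "O_DIRECT", None)
    if o_direct is None:
        # This platform's Python does not expose O_DIRECT at all.
        return False

    # Create a scratch file normally, then try to reopen it with O_DIRECT.
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        fd = os.open(path, os.O_RDWR | o_direct)
    except OSError:
        # Filesystems that refuse direct I/O typically fail the open here.
        return False
    else:
        os.close(fd)
        return True
    finally:
        os.unlink(path)


if __name__ == "__main__":
    target = tempfile.gettempdir()
    print(target, "accepts O_DIRECT:", direct_io_supported(target))
```

Running it against the directory that the benchmark's file path points to shows whether the open fails before any data is transferred.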

DriveSpeed can also be used for testing USB connected drives. This produced errors using flash drives but did run when testing a micro SD card via a USB card reader, though only in one case, via a btrfs formatted partition. Results are below but, compared with earlier 32 bit tests, some speeds are not as expected.

Code: Select all

 ################## DriveSpeed64 External SD Card ###################
                    Gentoo via USB, btrfs format

   DriveSpeed RasPi 64 Bit 1.1 Tue Apr  4 10:28:11 2017
 
 Selected File Path: 
 /run/media/demouser/ROOT/home/roy/benchmarks//
 Total MB   29465, Free MB   27511, Used MB    1953

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8     5.53    10.64    12.23    29.99    31.88    33.25
  16     6.88     6.82     8.53    31.21    26.41    28.64
 Cached
   8   159.30   175.77   158.98   235.45   229.22   266.71

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.016    0.006    0.006    20.67    50.55    22.84

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      0.25     0.40     0.97    58.44   160.18   150.07
 ms/file    16.09    20.66    16.87     0.07     0.05     0.11    0.160

  Large Files > Performance restricted by USB speed
  Random      > Writing exceptionally slow, reading far too fast, data cached? 
  Small Files > Writing exceptionally slow, reading far too fast, data cached?
As Samba for Gentoo was initially said to be untested at 64 bits, the LAN was not available to run the benchmark on this system. LanSpeed64 was instead compiled and run on another 64 bit configuration, accessing a Windows based PC. Results are below.

Code: Select all

######################### LanSpeed64 Example #######################
 
   LanSpeed RasPi 64 Bit 1.0 Tue Apr  4 13:04:06 2017
 
 Selected File Path: 
 /root/Desktop/sharepc/
 Total MB  266240, Free MB  134653, Used MB  131587

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8    11.23    11.40    11.40     8.10    11.62    11.64
  16    11.27    11.42    11.44    11.66    11.66    11.64

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.724    0.886    1.333     1.58     1.50     1.37

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      0.99     1.81     2.73     1.77     3.02     4.50
 ms/file     4.13     4.54     6.01     2.32     2.71     3.64    0.201


                End of test Tue Apr  4 13:04:43 2017
 
 >>>>>>>>>>>>> Comparison with 32 Bit Version Rpi 3 Ph Win <<<<<<<<<<<<<

  Large Files > Similar speeds reflecting 100 Mbps
  Random      > Similar but writing faster, no apparent caching 
  Small Files > Similar speeds
LanSpeed64 was also run successfully against the main and USB drives that would not run DriveSpeed64. This identified speeds when data was cached, and suggests that the earlier failures were due to the way the programs open files to force direct I/O. Details are available in the aforementioned htm report.
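
The "reflecting 100 Mbps" note in the LanSpeed64 results can be checked with simple arithmetic. This sketch assumes the reported MBytes are decimal megabytes and ignores protocol overheads; the function names are made up for illustration.

```python
LINK_MBPS = 100  # nominal link rate in megabits per second, as noted in the results


def link_ceiling_mbytes(link_mbps):
    """Upper bound on payload rate in MBytes/second: 8 bits per byte."""
    return link_mbps / 8


def utilisation(measured_mbytes, link_mbps=LINK_MBPS):
    """Fraction of the nominal link rate achieved by a measured speed."""
    return measured_mbytes / link_ceiling_mbytes(link_mbps)


if __name__ == "__main__":
    print(f"ceiling: {link_ceiling_mbytes(LINK_MBPS):.1f} MBytes/second")
    # Slowest and fastest 16 MB figures from the LanSpeed64 table above.
    for speed in (11.27, 11.66):
        print(f"{speed:5.2f} MBytes/second = {utilisation(speed):.0%} of the link")
```

The large file speeds of around 11.3 to 11.7 MBytes/second are therefore close to the 12.5 MBytes/second ceiling of a 100 Mbps link.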
Regards

Roy

Stress Testing Programs

Post by roylongbottom » Mon May 08, 2017 11:40 am

The Cortex-A53 CPU, used in the Raspberry Pi 3, is known to be subject to overheating. Assuming correct software implementation, the first noticeable effect is that, as the temperature increases beyond a critical point, the CPU MHz is throttled. At normal room temperatures, this might only occur when all CPU cores are executing at higher speeds, with a possible contribution from graphics activity. Where this matters, special cooling arrangements might be needed, and these stress tests can be used to evaluate the different arrangements. For this series of procedures, the RPi 3 board was "out of case", where recorded temperatures are often shown to be lower than those obtained using a standard plastic enclosure.

A main consideration for stress testing is that programs have parameters to run for defined durations, but with short term reports on progress, including performance and, in this case, CPU temperature and clock MHz. These details should also be saved in constantly updated log files, so that there is some evidence if the system crashes.

In this case, multiple programs are run using a different terminal window for each, normally with a 15 minute test duration specified. One of these measures CPU temperature and MHz at specified intervals, for which the vcgencmd function has to be installed (as used by Raspbian). Two of the programs are benchmarks, already reported on, but with alternative run time parameters, and two are new ones, now with programs, source code and detailed results included in:

http://www.roylongbottom.org.uk/Rpi3-64 ... rks.tar.gz
http://www.roylongbottom.org.uk/Raspber ... hmarks.htm
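
For anyone wanting to roll their own monitor, the temperature and MHz sampling described above can be sketched as follows. This is not the monitoring program included in the tar.gz file. It assumes that `vcgencmd measure_temp` prints a line like `temp=55.8'C` and that `vcgencmd measure_clock arm` prints a line like `frequency(45)=1200000000`; the function names and the log layout are made up for illustration.

```python
import re
import subprocess
import time


def parse_temp(text):
    """Extract degrees C from output such as "temp=55.8'C"."""
    return float(re.search(r"temp=([0-9.]+)", text).group(1))


def parse_mhz(text):
    """Extract MHz from output such as "frequency(45)=1200000000" (value in Hz)."""
    return int(re.search(r"=(\d+)", text).group(1)) / 1e6


def run_vcgencmd(*args):
    """Run vcgencmd with the given arguments and return its standard output."""
    done = subprocess.run(["vcgencmd", *args], capture_output=True,
                          text=True, check=True)
    return done.stdout


def monitor(log_path, interval_secs, samples):
    """Print and log elapsed seconds, CPU MHz and temperature at intervals."""
    with open(log_path, "a") as log:
        for i in range(samples):
            mhz = parse_mhz(run_vcgencmd("measure_clock", "arm"))
            temp = parse_temp(run_vcgencmd("measure_temp"))
            line = f"{i * interval_secs:6.0f} {mhz:6.0f} {temp:6.1f}"
            print(line)
            log.write(line + "\n")
            log.flush()  # keep the log current in case the system crashes
            time.sleep(interval_secs)
```

For example, `monitor("heat.log", 30, 30)` would record 15 minutes of samples at 30 second intervals, with each line written to the log as soon as it is measured.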

The oldest uses linverloopsPi64, the Livermore Loops Benchmark, which has 24 different test kernels, repeated three times with different (cache) memory demands. This was known to produce wrong numeric answers on an overclocked PC. For reliability testing, a parameter specifies a standard time for each of the loops. Results are displayed during the tests, but performance is only reported and logged at the end. The main benefit is a continuously changing processing profile.

The other existing benchmark is videogl64, via OpenGL. This has six test procedures, where one is chosen along with the number of passes and the duration of each. The window width and height can be specified, leaving visible screen space where other terminal windows can be displayed.

The new tests have a run time parameter to specify the amount of cache or memory space to use, and carry out high speed integer and floating point calculations via stressintPi64 and burninfpuPi64.

A summary of test results follows:

Integer Arithmetic Stress Test - comprising four runs of stressintPi64, each using 40 KB of data so that all use L2 cache, with 12 tests each running for 80 seconds. Performance on all cores was essentially the same, with CPU throttling starting after 30 seconds, eventually reducing CPU MHz by nearly 32%, with a maximum recorded sample CPU temperature of 84.4 °C. Compared with stand alone results, CPU performance was degraded to a greater extent due to MP overheads.

Floating Point Arithmetic Stress Test - having four burninfpuPi64 test procedures, using L2 cache with 8 operations per data word. Again performance was effectively constant from all cores, with maximum total throughput of 13.7 GFLOPS, reducing by nearly 4 GFLOPS due to CPU throttling down to 843 MHz, again with a maximum temperature of 84.4 °C.

Livermore Loops Stress Test - This uses four copies of the Livermore Loops Benchmark. Overall MFLOPS speeds are shown to be significantly degraded, but RPiHeatMHz64 results demonstrate inconsistent effects of different arithmetic functions. Maximum temperature recorded was 84.9 °C with a CPU MHz of 744.

Integer and OpenGL Stress Tests - The most complicated OpenGL kitchen test was used, along with three Integer Stress Tests, this time using L1 cache based data. The same procedures were used with CPU MHz settings of On-demand and Performance, where results are shown to be virtually the same. The first summary of speeds and temperatures below is with the Performance setting. There, OpenGL FPS and integer MB/second reduced to around 60% of initial speeds, with many temperatures of 84.9 °C recorded, and CPU MHz temporarily dropped to half speed at 600 MHz. The tests were repeated with the system in a FLIRC case, where the whole aluminium case becomes the heatsink. Performance was then consistently high, but temperatures approached the critical point at which CPU throttling would occur.

Code: Select all

       Performance out of case             Performance FLIRC case
       Total   OGL   CPU   CPU             Total   OGL   CPU   CPU
  Secs  MB/s   FPS   MHz    'C        Secs  MB/s   FPS   MHz    'C

     0              1200  55.8           0              1200  44.0
    30          13  1107  80.6          30          13  1200  60.1
    60          11   910  82.7          60          13  1199  63.4
    80  6064     9   850  83.8          80  7116    13  1200  65.0
   160  4656     9   744  84.9         160  7041    13  1199  68.8
   240  4305     8   600  82.7         240  7072    13  1200  70.9
   320  4217     8   600  82.7         320  7075    13  1200  72.0
   400  4209     8   738  84.9         400  7095    13  1200  74.1
   480  4209     8   600  82.7         480  7081    13  1200  75.8
   560  4802     8   738  84.9         560  8067    13  1200  74.7
   640  4768     8   722  84.9         640  8092    13  1200  76.8
   720  4730     8   743  84.9         720  7989    13  1200  77.4
   800  4664     8   823  84.9         800  8050    13  1200  78.4
   880  4712     8   719  84.9         880  7984    13  1200  79.5
   960  5917     8   938  82.7         960  8344    13  1200  74.1
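
The degree of throttling in the table above can be expressed by dividing each out of case figure by the FLIRC case figure at the same elapsed time. This is only a sketch for reading the table: the numbers are copied from it, the FLIRC run is assumed to stand in for unthrottled speed, and the names are made up for illustration.

```python
# Rows copied from the table above:
# (secs, out-of-case MB/s, FPS, MHz, degC, FLIRC-case MB/s, FPS, MHz, degC)
# None marks cells where the table has no figure.
ROWS = [
    (0, None, None, 1200, 55.8, None, None, 1200, 44.0),
    (30, None, 13, 1107, 80.6, None, 13, 1200, 60.1),
    (60, None, 11, 910, 82.7, None, 13, 1199, 63.4),
    (80, 6064, 9, 850, 83.8, 7116, 13, 1200, 65.0),
    (160, 4656, 9, 744, 84.9, 7041, 13, 1199, 68.8),
    (240, 4305, 8, 600, 82.7, 7072, 13, 1200, 70.9),
    (320, 4217, 8, 600, 82.7, 7075, 13, 1200, 72.0),
    (400, 4209, 8, 738, 84.9, 7095, 13, 1200, 74.1),
    (480, 4209, 8, 600, 82.7, 7081, 13, 1200, 75.8),
    (560, 4802, 8, 738, 84.9, 8067, 13, 1200, 74.7),
    (640, 4768, 8, 722, 84.9, 8092, 13, 1200, 76.8),
    (720, 4730, 8, 743, 84.9, 7989, 13, 1200, 77.4),
    (800, 4664, 8, 823, 84.9, 8050, 13, 1200, 78.4),
    (880, 4712, 8, 719, 84.9, 7984, 13, 1200, 79.5),
    (960, 5917, 8, 938, 82.7, 8344, 13, 1200, 74.1),
]

MBS, FPS, MHZ = 1, 2, 3  # column positions of the out-of-case figures
FLIRC_OFFSET = 4         # the matching FLIRC-case column is four places along


def relative(rows, col):
    """Map elapsed seconds to out-of-case value divided by FLIRC-case value."""
    result = {}
    for row in rows:
        out_of_case, flirc = row[col], row[col + FLIRC_OFFSET]
        if out_of_case is not None and flirc is not None:
            result[row[0]] = round(out_of_case / flirc, 2)
    return result


if __name__ == "__main__":
    for name, col in (("MB/s", MBS), ("FPS", FPS), ("MHz", MHZ)):
        rel = relative(ROWS, col)
        print(f"{name:>5}: lowest ratio {min(rel.values()):.2f}")
```

The lowest ratios come out at about 0.5 for MHz and about 0.6 for both FPS and MB/second.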
These are the last of my current benchmarks and test programs for Raspberry Pi 3.
Regards

Roy