Gentoo Forums
Bootable 64-bit RPi3 Gentoo image (OpenRC/Xfce/VC4) UPDATED
Sakaki
Guru
Joined: 21 May 2014
Posts: 409

Posted: Thu Feb 16, 2017 9:22 pm

R0b0t1,
following on from what NeddySeagoon just said, what do you get on your SD-card image if you run lsmod? Check that the vc4 module is loaded. For example, here's the output from my gentoo-on-rpi3-64bit image:
Code:
pi64 ~ # lsmod
Module                  Size  Used by
configs                49152  0
cmac                   16384  1
rfcomm                 49152  12
hci_uart               32768  1
btbcm                  16384  1 hci_uart
bnep                   24576  2
bluetooth             397312  37 hci_uart,bnep,btbcm,rfcomm
ipv6                  466944  26
brcmfmac              262144  0
vc4                   139264  3
brcmutil               20480  1 brcmfmac
cfg80211              667648  1 brcmfmac
drm_kms_helper        204800  2 vc4
drm                   454656  6 vc4,drm_kms_helper
rfkill                 32768  5 bluetooth,cfg80211
snd_bcm2835            36864  1
joydev                 20480  0
evdev                  24576  3
snd_pcm               135168  1 snd_bcm2835
syscopyarea            16384  1 drm_kms_helper
sysfillrect            16384  1 drm_kms_helper
sysimgblt              16384  1 drm_kms_helper
fb_sys_fops            16384  1 drm_kms_helper
snd_timer              36864  1 snd_pcm
snd                   102400  5 snd_timer,snd_bcm2835,snd_pcm
uio_pdrv_genirq        16384  0
uio                    24576  1 uio_pdrv_genirq
pi64 ~ # uname -r
4.10.0-rc5-v8


You could also try running "vblank_mode=0 glxgears -info" to see that your Mesa etc. is correctly plumbed in...
To get the accelerated desktop, you need an appropriately configured kernel (& the necessary kernel modules loaded), an appropriately configured Mesa, and appropriately configured X11. See this wiki page for example.
You can look at the /etc/portage/make.conf, /etc/portage/package.use/... on the above image for some pointers too.
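If you want to script the module check rather than eyeball the lsmod listing, a minimal sketch along these lines may help. It assumes a POSIX shell and awk. `module_loaded` is a hypothetical helper name, not something shipped on the image, and its optional second argument exists only so the check can be tried against a saved listing instead of the live /proc/modules (the file lsmod reads).

```shell
#!/bin/sh
# module_loaded NAME [FILE]
# Succeeds if NAME is the first field of some line in FILE.
# FILE defaults to /proc/modules, which is where lsmod gets its data.
# (Hypothetical helper, not part of the image.)
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Report on the modules the VC4 graphics stack in the listing above uses.
for mod in vc4 drm drm_kms_helper; do
    if module_loaded "$mod" 2>/dev/null; then
        echo "$mod: loaded"
    else
        echo "$mod: not loaded"
    fi
done
```

Note that a driver compiled into the kernel rather than built as a module will not show up this way, so "not loaded" is only a hint.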
_________________
Regards,

sakaki

R0b0t1
Apprentice
Joined: 05 Jun 2008
Posts: 264

Posted: Thu Feb 16, 2017 10:12 pm

Thanks NeddySeagoon,

When I switched kernels I left Sakaki's config.txt intact. I checked to make sure that the dtoverlay line for the VC4 firmware was there before posting. Please see the other thread I started if you still feel like helping; I don't want to clutter this one.

Sakaki, I will attempt later and edit this post. Thanks.

roylongbottom
n00b
Joined: 13 Feb 2017
Posts: 64
Location: Essex, UK

Posted: Thu Feb 23, 2017 4:54 pm    Post subject: 64 Bit Benchmarks

The latest programs converted were my Fast Fourier Transform benchmarks, which showed some 64 bit performance improvements. Details, results, source code and executable files can be obtained by clicking on the links given earlier or via the www button below.

These execute FFTs sized 1K to 1024K, the larger ones depending on RAM speeds. Using Raspbian (32 bit) and Linux/RPi (64 bit), the short FFTs, with execution times of less than 0.5 milliseconds, produced inconsistent running times. This was only with "on demand" MHz settings and not when running another CPU benchmark at the same time, using a different core, or with a “performance” MHz setting. I haven’t found how to set “performance” with Gentoo. Is it possible?

To investigate this, I produced another test that executes 30 1K sized FFTs 500 times, with 32 bit and 64 bit compilations (These are included in the tar.gz file). Example results are below.

Code:

       RPi 3 500 x 30 1K Single Precision FFT milliseconds
 
                   32 Bit Raspbian On Demand

  12.9  12.2   7.4   6.0   6.0   6.4   6.0   6.0   6.0   6.0
   6.1   6.0   6.0   6.0   6.0   6.0   6.1   6.1   6.0   6.2
   6.2   6.0   6.0   6.1   6.0   6.0   6.0   6.0   6.1   6.0
   6.2   6.0   6.0   7.0   6.1   6.0   6.0   6.0   6.1   6.0
   6.2   6.1   6.0   6.0   6.2   6.0   6.0   6.0   6.0   7.2
 To
   6.5   6.3   6.1   6.2   6.1   6.1   6.1   6.1   6.1   6.1
   6.5   6.3   6.1   6.1   6.1   6.1   6.1   6.1   6.1   6.1
   6.4   6.2   6.1   6.1   6.2   6.1   6.1   6.1   6.1   6.1

                  Raspbian With Stress Test

   6.7   6.2   6.0   6.0   6.0   6.0   6.1   6.0   6.1   6.0
   6.5   6.2   6.0   6.0   6.0   6.0   6.0   6.0   6.0   6.0
   6.4   6.2   6.0   6.0   6.0   6.0   6.0   6.1   6.0   6.0
 To
   6.3   6.2   6.0   6.0   6.0   6.0   6.0   6.0   6.0   6.0
   6.3   6.2   6.0   6.0   6.0   6.0   6.0   6.0   6.0   6.0
   6.3   6.2   6.0   6.0   6.1   6.0   6.0   6.0   6.0   6.0

                    64 Bit Gentoo On Demand

  17.5  15.4  11.8   8.6   5.4   5.4   5.4   5.4   5.4   5.4
   5.5   5.8   6.0   5.4   5.5   5.4   5.5   5.4   5.4   5.4
   5.5   5.6   6.1   5.4   5.5   5.4   5.5   5.5   5.4   5.4
 To
   5.7   6.9   5.7   5.4   5.4   5.4   5.5   5.4   5.4   5.4
   5.8   6.8   5.8   5.6   5.4   5.4   5.4   5.5   5.4   5.4
   5.7   6.4   5.7   5.5   5.4   5.4   5.5   5.4   5.4   5.4

                   Gentoo With Stress Test

   5.9   7.2   5.9   5.5   5.4   5.4   5.4   5.4   5.4   5.5
   5.6   6.9   5.7   5.4   5.4   5.4   5.4   5.4   5.4   5.4
   5.6   6.5   5.7   5.4   5.4   5.4   5.4   5.4   5.4   5.4
   5.8   7.1   5.9   5.4   5.4   5.4   5.4   5.4   5.4   5.4
 To
   5.7   6.8   5.7   5.4   5.4   5.4   5.4   5.4   5.4   5.4
   5.7   6.7   6.1   5.4   5.4   5.4   5.4   5.4   5.4   5.4
   5.8   6.6   5.6   5.4   5.4   5.4   5.4   5.4   5.4   5.4


_________________
Regards

Roy

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54216
Location: 56N 3W

Posted: Thu Feb 23, 2017 6:25 pm

roylongbottom,

All things are possible in Gentoo; it's just missing the GUI, so you need to poke about a bit from the console.
Code:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
will tell the available governors.
Code:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
will tell the governor in use.
Code:
echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
will set the performance governor, provided it's available.
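The commands above look at cpu0 only. To see what every core is doing at a glance, something like the following sketch could be used. It assumes a POSIX shell. `show_governors` is a hypothetical helper name, and its optional argument (a base directory) is there only so the function can be tried on a copy of the tree.

```shell
#!/bin/sh
# show_governors [BASE]
# Print "cpuN: <governor>" for every CPU directory under BASE that has a
# readable cpufreq/scaling_governor file. BASE defaults to the sysfs path
# used above; the parameter is a hypothetical convenience.
show_governors() {
    base=${1:-/sys/devices/system/cpu}
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue
        cpu=${f#"$base"/}          # strip the leading base path
        cpu=${cpu%%/*}             # keep just "cpuN"
        printf '%s: %s\n' "$cpu" "$(cat "$f")"
    done
}

show_governors
```

Reading these files does not need root; only writing to scaling_governor does.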
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

khayyam
Watchman
Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

Posted: Thu Feb 23, 2017 7:12 pm    Post subject: Re: 64 Bit Benchmarks

roylongbottom wrote:
I haven’t found how to set “performance” with Gentoo. Is it possible?

roylongbottom ... to follow on from NeddySeagoon, there are a number of ways this can be set. sys-power/cpupower exists for this purpose, and is configured and started like any other service. However, assuming 'local' is in a runlevel (which it is by default), you could do the following:

/etc/local.d/cpufreq-performance.start:
#!/bin/sh

for i in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor ; do
    echo performance > "$i"
done

You then 'chmod u+x /etc/local.d/cpufreq-performance.start' and this will be set on boot.

For other tuneables look under '/sys/devices/system/cpu/cpufreq/<governor>', and/or see /usr/src/linux/Documentation/cpu-freq/user-guide.txt.
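A slightly more defensive variant of the loop above checks that the requested governor is actually offered before writing it. This is only a sketch under stated assumptions: `set_governor` is a hypothetical name, the optional second argument is a base directory added so the function can be tried on a scratch tree, and on a real system it needs a root shell.

```shell
#!/bin/sh
# set_governor GOVERNOR [BASE]
# For each CPU under BASE, write GOVERNOR to scaling_governor, but only if it
# is listed in that CPU's scaling_available_governors. Returns non-zero if any
# CPU was skipped or a write failed.
# BASE is a hypothetical parameter (default: the real sysfs path).
set_governor() {
    gov=$1
    base=${2:-/sys/devices/system/cpu}
    rc=0
    for d in "$base"/cpu[0-9]*/cpufreq; do
        [ -d "$d" ] || continue
        avail=$(cat "$d/scaling_available_governors" 2>/dev/null)
        case " $avail " in
            *" $gov "*)
                echo "$gov" > "$d/scaling_governor" || rc=1
                ;;
            *)
                echo "skipping $d: '$gov' not available" >&2
                rc=1
                ;;
        esac
    done
    return "$rc"
}
```

From a root shell this would be called as `set_governor performance`. If sys-power/cpupower is installed, `cpupower frequency-set -g performance` should achieve the same thing without a script.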

HTH & best ... khay

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54216
Location: 56N 3W

Posted: Thu Feb 23, 2017 8:09 pm

Heh, just like everything else in Gentoo, there are lots of ways to do everything and they are all equally right.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

mDup
Apprentice
Joined: 14 Apr 2006
Posts: 212

Posted: Fri Feb 24, 2017 2:14 am

Does anyone have a prebuilt Firefox v51.0.1 arm64 gentoo package tarball?
I run gentoo on an amlogic s905 and cannot build firefox, but that's my own fault because I build everything in portage with gcc 6.3.0.
Nevertheless, I can run the prebuilt rpi3-64 Firefox v50.1.0 package, so now I wonder if I can get an upgrade.

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54216
Location: 56N 3W

Posted: Fri Feb 24, 2017 8:50 am

mDup,

It won't build here. It looks like the build system is broken.
Code:
USE="${ARCH} egl gles1 icu minizip openssl pcre16 postproc python
     qt5 script sqlite svc threads virt-network xvmc
     -modemmanager -pam -skia"
# skia wants to link to neon stuff it doesn't build, in firefox anyway.

Even with USE="-skia" it tries and fails to use skia.

I'm a gcc-6.3 on arm64 user too.

Code:
genlop -t firefox

     Tue Jan 10 06:34:04 2017 >>> www-client/firefox-50.1.0-r1
       merge time: 6 hours, 53 minutes and 21 seconds.
is the last one I have.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

roylongbottom
n00b
Joined: 13 Feb 2017
Posts: 64
Location: Essex, UK

Posted: Fri Feb 24, 2017 11:55 am

NeddySeagoon wrote:
roylongbottom,

All things are possible in Gentoo; it's just missing the GUI, so you need to poke about a bit from the console.
Code:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
will tell the available governors.
Code:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
will tell the governor in use.
Code:
echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
will set the performance governor, provided it's available.


I had already tried those, but "echo performance" resulted in "Permission denied" and "sudo" made no difference. Trying "su" would not accept a password that I thought was "raspberrypi64". As recommended, I tried a bit more poking, and the command worked after first entering "sudo su" that produced a "pi64" red line prompt - je ne comprends pas and I don't know much French either.
_________________
Regards

Roy

khayyam
Watchman
Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

Posted: Fri Feb 24, 2017 8:35 pm

roylongbottom wrote:
I had already tried those, but "echo performance" resulted in "Permission denied" and "sudo" made no difference. Trying "su" would not accept a password that I thought was "raspberrypi64". As recommended, I tried a bit more poking, and the command worked after first entering "sudo su" that produced a "pi64" red line prompt - je ne comprends pas and I don't know much French either.

roylongbottom ... this is why the use of 'sudo' (as a magic bullet) is frowned upon by more experienced shell users. The expectation is that 'sudo echo foo > /foo' is going to work because it is prefaced with a magic word, but the shell doesn't interpret that command in the way that inexperienced shell users expect. It is the current shell which performs the redirection, not a root shell:

Code:
% sudo echo foo > /foo
zsh: permission denied: /foo
% sudo "echo foo > /foo"
sudo: echo foo > /foo: command not found
% sudo "/bin/echo foo > /foo"
sudo: /bin/echo foo > /foo: command not found
% sudo sh -c "/bin/echo foo > /foo"
% ls -l /foo
-rw------- 1 root root 4 2017-02-24 21:23 /foo

In the above you can see that it is only by running a shell via sudo that the 'command' (the full command, that is) is run as superuser, and that 'command' needs to be protected by quotes (so that it is passed to the shell being executed, and not interpreted by the running shell). This fact is a trap for the unwary. So, either invoke a shell, or use 'su -' to acquire one.
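Another common idiom, not shown above, is to have sudo run tee, so that the privileged process is the one that opens the file. A minimal sketch, assuming sudo and tee are installed; `write_as_root` and the `SUDO` override variable are hypothetical names introduced only for illustration.

```shell
#!/bin/sh
# write_as_root VALUE FILE
# The pipeline is set up by the calling (unprivileged) shell, but FILE is
# opened by tee, and tee is the process that sudo runs as root.
# SUDO is a hypothetical override: leave it unset to use sudo, or set it to
# an empty string to try the function on a file you already own.
write_as_root() {
    printf '%s\n' "$1" | ${SUDO-sudo} tee "$2" > /dev/null
}
```

For the governor case this would be called as `write_as_root performance /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor`. The trailing `> /dev/null` is harmless here because it only discards tee's copy to standard output.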

best ... khay

mDup
Apprentice
Joined: 14 Apr 2006
Posts: 212

Posted: Fri Feb 24, 2017 10:29 pm

NeddySeagoon wrote:
mDup,

It won't build here. It looks like the build system is broken.
Code:
USE="${ARCH} egl gles1 icu minizip openssl pcre16 postproc python
     qt5 script sqlite svc threads virt-network xvmc
     -modemmanager -pam -skia"
# skia wants to link to neon stuff it doesn't build, in firefox anyway.

Even with USE="-skia" it tries and fails to use skia.

I'm a gcc-6.3 on arm64 user too.

Code:
genlop -t firefox

     Tue Jan 10 06:34:04 2017 >>> www-client/firefox-50.1.0-r1
       merge time: 6 hours, 53 minutes and 21 seconds.
is the last one I have.


Thanks for the information.
Have you been able then to build genpi64 firefox-50.1.0-r1 with gcc-6.3?
I get linker relocation errors, like:
Code:
../../gfx/skia/SkBitmapProcState_matrixProcs.o: In function `SkBitmapProcState::chooseMatrixProc(bool)':
SkBitmapProcState_matrixProcs.cpp:(.text+0xa0c): undefined reference to `ClampX_ClampY_Procs_neon'
/usr/lib/gcc/aarch64-unknown-linux-gnu/6.3.0/../../../../aarch64-unknown-linux-gnu/bin/ld: ../../gfx/skia/SkBitmapProcState_matrixProcs.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against external symbol `ClampX_ClampY_Procs_neon' can not be used when making a shared object; recompile with -fPIC
SkBitmapProcState_matrixProcs.cpp:(.text+0xa10): undefined reference to `ClampX_ClampY_Procs_neon'

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54216
Location: 56N 3W

Posted: Fri Feb 24, 2017 10:55 pm

mDup,

I've only tried the firefox in the tree. From your code fragment,
Code:
../../gfx/skia
is a bad sign.
I would expect it to fail using skia.

I'll let sakaki answer for the build in genpi64
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

roylongbottom
n00b
Joined: 13 Feb 2017
Posts: 64
Location: Essex, UK

Posted: Sat Feb 25, 2017 12:11 pm    Post subject: OpenMP

I am converting my MP benchmarks to run at 64 bits. Initially they are being successfully compiled and run via OpenSUSE using gcc-6. The multithreaded programs also run via Gentoo but not the OpenMP tests, where libgomp.so.1 is not found and the benchmarks can't be compiled using Gentoo gcc 5.4. Is OpenMP or the library available and, if so, how do I install them?

A future requirement is OpenGL, particularly an equivalent of Raspberry Pi freeglut3. Is that available? I installed OpenGL 7.0 (I think) but can't find it.
_________________
Regards

Roy

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54216
Location: 56N 3W

Posted: Sat Feb 25, 2017 1:57 pm

roylongbottom,

equery can tell you lots of things about installed packages.
Code:
emerge gentoolkit
to install it.
For example
Code:
$ equery b openmp     
 * Searching for openmp ...
dev-libs/boost-1.63.0 (/usr/include/boost/numeric/odeint/external/openmp)


libgomp.so.1 appears to belong to gcc.
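To see whether the library is present on disk at all, a plain search can be quicker than guessing paths. The sketch below assumes a POSIX shell and find. `find_lib` is a hypothetical helper name and the default directory list is an assumption, so adjust it for your system.

```shell
#!/bin/sh
# find_lib NAME [DIR...]
# Print every path under the given directories whose file name is NAME.
# With no directories given, a few common library locations are searched;
# that default list is an assumption, not something taken from the image.
find_lib() {
    name=$1
    shift
    [ "$#" -gt 0 ] || set -- /usr/lib /usr/lib64 /lib /lib64
    find "$@" -name "$name" 2>/dev/null
    return 0
}
```

Calling `find_lib libgomp.so.1` prints nothing if the library is not installed. As far as I know, Gentoo ships libgomp with sys-devel/gcc only when gcc is built with the openmp USE flag, and `equery uses sys-devel/gcc` should show how the installed compiler was built.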
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

mDup
Apprentice
Joined: 14 Apr 2006
Posts: 212

Posted: Sat Feb 25, 2017 2:46 pm

NeddySeagoon wrote:
mDup,
I've only tried the firefox in the tree. From your code fragment,
Code:
../../gfx/skia
is a bad sign.
I would expect it to fail using skia.
I'll let sakaki answer for the build in genpi64

Thanks for spotting skia!
When I use -skia I can emerge the =www-client/firefox-50.1.0-r1::rpi3 from the rpi3 repo.

Sakaki
Guru
Joined: 21 May 2014
Posts: 409

Posted: Sat Feb 25, 2017 11:25 pm

@mDup, @NeddySeagoon

mDup wrote:
NeddySeagoon wrote:
mDup,
I've only tried the firefox in the tree. From your code fragment,
Code:
../../gfx/skia
is a bad sign.
I would expect it to fail using skia.
I'll let sakaki answer for the build in genpi64

Thanks for spotting skia!
When I use -skia I can emerge the =www-client/firefox-50.1.0-r1::rpi3 from the rpi3 repo.

The gentoo-on-rpi3-64bit image does have -skia set, and as you note I have retained firefox-50.1.0-r1 in the rpi3 overlay; more modern versions I could not get to build even with -skia. Thunderbird I haven't been able to get running reliably on arm64 at all (it builds, but segfaults shortly after starting up).
In case it is useful, the per-package USE flags on the image are as follows:
Code:
pi64 package.use # tail -n 100 *
==> cairo <==
# Per https://wiki.gentoo.org/wiki/Raspberry_Pi_VC4
x11-libs/cairo opengl xlib-xcb

==> claws-mail <==
# requirements of mail-client/claws-mail
dev-libs/libdbusmenu gtk3

==> elogviewer <==
# requirements of app-portage/elogviewer
dev-libs/libpcre pcre16

==> ffmpeg <==
# enable Multi-Media Abstraction Layer (MMAL) decoding support
media-video/ffmpeg   mmal

==> firefox <==
# no sneaky downloading of binary blobs on first run, please...
# and also disable skia; as this seems to try to pull in neon stuff
www-client/firefox -gmp-autoupdate -skia system-harfbuzz system-icu system-jpeg system-libevent system-libvpx
# requirements of firefox
dev-lang/python:2.7 sqlite
media-libs/harfbuzz icu
media-libs/libvpx postproc

==> genup <==
app-portage/genup::sakaki-tools -buildkernel

==> mesa <==
# Per https://wiki.gentoo.org/wiki/Raspberry_Pi_VC4
media-libs/mesa -classic xa xvmc

==> mplayer <==
media-video/mplayer -dvdnav

==> mpv <==
media-video/mpv   -lua -luajit -iconv -uchardet

==> seahorse <==
# requirements of app-crypt/seahorse
app-crypt/pinentry gnome-keyring

==> vlc <==
media-video/vlc gnutls x264

==> xorg-server <==
# Per https://wiki.gentoo.org/wiki/Raspberry_Pi_VC4
x11-base/xorg-server glamor

==> zlib <==
# required by media-video/vlc
sys-libs/zlib minizip


==> zzz_via_autounmask <==
That is in addition to those in /etc/portage/make.conf:
Code:
# Additional USE flags in addition to those specified by the current profile.
USE="bindist -mudflap -sanitize"
USE="${USE} bluetooth egl gles1 gles2 lock thunar qt4 ffmpeg"
USE="${USE} -gnome -kde"
and of course by the default/linux/arm64/13.0/desktop profile.

Incidentally, all the packages used in the image are also available in binary form at my arm64 binhost, at https://www.isshoni.org/pi64.
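For anyone wanting to pull from that binhost, the usual Portage mechanism is the PORTAGE_BINHOST variable. The fragment below is a sketch only: the URL is the one quoted above, and whether its packages suit a board other than the RPi3 is an assumption you would need to check yourself.

```shell
# /etc/portage/make.conf fragment (make.conf uses shell-style assignments).
# The URL is the one quoted above; suitability of its packages for a
# non-RPi3 arm64 board is an assumption, not something confirmed here.
PORTAGE_BINHOST="https://www.isshoni.org/pi64"

# Then, as root, ask emerge to use binary packages where available:
#   emerge --ask --getbinpkg =www-client/firefox-50.1.0-r1
```

By default Portage only accepts a binary package whose USE flags match your own configuration, so the per-package flags listed above are worth mirroring if a binary is unexpectedly ignored.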

@roylongbottom - khayyam's suggestion to use a ".start" file to set the performance governor on boot will work, but you need to be a little careful with this approach on the image, as there already is a .start file (/etc/local.d/ondemand_freq_scaling.start) in place to set the ondemand scaling. Be sure to move or delete this file if you are putting an alternative governor setting in place, otherwise the .start file that runs later during startup will "win" (and that will depend upon the lexical ordering of their filenames).
_________________
Regards,

sakaki

mDup
Apprentice
Joined: 14 Apr 2006
Posts: 212

Posted: Sun Feb 26, 2017 4:23 am

Sakaki wrote:
@mDup, @NeddySeagoon
mDup wrote:
NeddySeagoon wrote:
mDup,
I've only tried the firefox in the tree. From your code fragment,
Code:
../../gfx/skia
is a bad sign.
I would expect it to fail using skia.
I'll let sakaki answer for the build in genpi64

Thanks for spotting skia!
When I use -skia I can emerge the =www-client/firefox-50.1.0-r1::rpi3 from the rpi3 repo.

The gentoo-on-rpi3-64bit image does have -skia set, and as you note I have retained firefox-50.1.0-r1 in the rpi3 overlay; more modern versions I could not get to build even with -skia.[...]
In case it is useful, the per-package USE flags on the image are as follows:[...]

Thanks for the USE flags.
I do not have an rpi3 (I have an amlogic device), so I do not run your image and do not have your flags readily to hand.
Nice idea to use the system-* style flags for firefox. I'll adjust that on all my gentoo systems.
Yes, more recent versions would not build here either, even with -skia. I think we are on the same page.

khayyam
Watchman
Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

Posted: Sun Feb 26, 2017 4:51 am

Sakaki wrote:
@roylongbottom - khayyam's suggestion to use a ".start" file to set the performance governor on boot will work, but you need to be a little careful with this approach on the image, as there already is a .start file (/etc/local.d/ondemand_freq_scaling.start) in place to set the ondemand scaling. Be sure to move or delete this file if you are putting an alternative governor setting in place, otherwise the .start file that runs later during startup will "win" (and that will depend upon the lexical ordering of their filenames).

Sakaki, roylongbottom, et al ... all you need do is 'chmod u-x' it, then it won't be run.

Code:
# chmod u-x /etc/local.d/ondemand_freq_scaling.start
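To check which .start files would actually run, and in what order, a listing like the following sketch can help. It assumes a POSIX shell. `start_order` is a hypothetical helper name, and its optional directory argument is there only so it can be tried on a scratch directory.

```shell
#!/bin/sh
# start_order [DIR]
# List the *.start files in DIR that pass the caller's -x test, in shell glob
# (lexical) order, i.e. the ordering Sakaki describes.
# DIR defaults to /etc/local.d; the parameter is a hypothetical convenience.
start_order() {
    for f in "${1:-/etc/local.d}"/*.start; do
        if [ -x "$f" ]; then
            printf '%s\n' "${f##*/}"
        fi
    done
}

start_order
```

With the two file names from this thread, cpufreq-performance.start sorts before ondemand_freq_scaling.start, so the ondemand script would run last and win unless it is disabled. One caveat, as far as I understand it: for root the -x test succeeds if any execute bit is set, so if the file is also group- or world-executable, `chmod a-x` is the more thorough way to switch it off.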

best ... khay

roylongbottom
n00b
Joined: 13 Feb 2017
Posts: 64
Location: Essex, UK

Posted: Tue Mar 07, 2017 5:08 pm    Post subject: MultiThreading Benchmarks

Most of my multithreading benchmarks run using 1, 2, 4 and 8 threads. Many have tests that use approximately 12 KB, 120 KB and 12 MB, to use both caches and RAM. The first set attempts to measure maximum MFLOPS, with two test procedures: one with two floating point operations per data word and the other with 32. The latter includes a mixture of multiplications and additions, coded to enable SIMD operation. In this case, using single precision numbers, four at a time, plus linked multiply and add, a top end CPU can execute eight operations per clock cycle per core. It is not clear what the potential maximum MFLOPS is on an ARM Cortex-A53, but eight operations per clock cycle per core is mentioned. The same benchmark code obtained a maximum of 24 MFLOPS/MHz on a top end quad core Intel CPU, via Linux - see the following:

http://www.roylongbottom.org.uk/linux%20multithreading%20benchmarks.htm#anchor6

This ARM CPU might need a different combination of arithmetic operations to reach higher values; the best case obtained with this benchmark was 2.2 MFLOPS/MHz using a single core.

The following shows the format of the MP-MFLOPS benchmarks, with the best 64 bit Raspberry Pi 3 results. Note that performance increases using more threads, except when limited by RAM speed. These benchmarks carry out a fixed number of test passes, with each thread carrying out the same calculations on different sections of data. Numeric results produced (x 100000) are output to show that all data has been used.
Code:
 MP-MFLOPS NEON Intrinsics 64 Bit Tue Feb 28 15:37:39 2017

    FPU Add & Multiply using 1, 2, 4 and 8 Threads

        2 Ops/Word              32 Ops/Word
 KB     12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      697     725     420    2640    2544    2441
 2T     1452    1420     348    5135    5258    4430
 4T     1438    2679     343   10113    9905    5370
 8T     1914    2533     358    9332   10124    6041
 Results x 100000, 12345 indicates ERRORS
 1T    76406   97075   99969   66015   95363   99951
 2T    76406   97075   99969   66015   95363   99951
 4T    76406   97075   99969   66015   95363   99951
 8T    76406   97075   99969   66015   95363   99951

         End of test Tue Feb 28 15:37:43 2017
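The MFLOPS/MHz figure can be reproduced from the table with a one-line calculation. The sketch below assumes the RPi 3 is running at its nominal 1200 MHz, which is not stated in the post; `mflops_per_mhz` is a hypothetical helper name.

```shell
#!/bin/sh
# mflops_per_mhz MFLOPS MHZ: print the ratio to two decimal places.
mflops_per_mhz() {
    awk -v mflops="$1" -v mhz="$2" 'BEGIN { printf "%.2f\n", mflops / mhz }'
}

# 1200 MHz is an assumed clock (the nominal RPi 3 figure), not from the post.
mflops_per_mhz 2640 1200     # 1 thread, 32 ops/word, 12.8 KB
mflops_per_mhz 10124 1200    # 8 threads, 32 ops/word, 128 KB (all cores)
```

Under that assumption the single-thread case works out at 2.20, in line with the figure quoted, and the best multithreaded result at 8.44 across all four cores.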

Benchmarks appropriate for comparison of 32 and 64 bit versions are single and double precision versions, compiled for normal floating point and one using NEON intrinsic functions that are clearly suitable for SIMD operation and are converted to different types of vector operation.
64 bit/32 bit speed comparisons are below. Single precision MP-MFLOPS has the highest gain by using vector instructions, instead of scalar. With compiled intrinsics the systems use different forms of vector instructions.
Code:
 Average 64 bit performance gains

         2 Ops/Word              32 Ops/Word
         12.8     128   12800    12.8     128   12800

 MF SP   4.31    3.87    1.24    2.19    2.35    2.04
 MF DP   2.45    1.71    0.83    1.92    1.92    1.42
 Intrin  1.81    1.84    0.82    1.67    1.75    1.08

There is also an OpenMP benchmark that carries out the same calculations, but the OpenMP shared object file is not provided with Gentoo gcc. The other 64 bit Linux that I am testing included it with gcc 4.8 and gcc-6. As usual, the benchmarks, source code, details and results are in:

http://www.roylongbottom.org.uk/Rpi3-64-Bit-Benchmarks.tar.gz
http://www.roylongbottom.org.uk/Raspberry%20Pi%20Benchmarks.htm
_________________
Regards

Roy

roylongbottom
n00b
Joined: 13 Feb 2017
Posts: 64
Location: Essex, UK

Posted: Thu Mar 16, 2017 11:03 am    Post subject: More 64 Bit MultiThreading Benchmarks

The other MP benchmarks, included in the tar.gz file, demonstrate some MP and 64 bit performance gains, with others identifying that multithreading provided little or no benefit and, sometimes, much worse performance.

MP-Whetstone - Multiple threads each run the eight test functions at the same time, but with some dedicated variables. MP performance is good but the simple test functions are not appropriate for more advanced instructions at 64 bits, so relative 32 bit performance is between 0.48 and 2.08.

MP-Dhrystone - This runs multiple copies of the whole program at the same time. Dedicated data arrays are used for each thread but there are numerous other variables that are shared. The latter reduces performance gains via multiple threads and, in some cases, these can be slower than using a single thread. In this case, some quad core improvements are shown as up to 2.5 times faster than a single core. Single core 64 bit/32 bit speed ratio was 1.50 reducing to 1.10 using four threads.

MP-Linpack - The original Linpack Benchmark operates on double precision floating point 100x100 matrices. This one runs on 100x100, 500x500 and 1000x1000 single precision matrices using 0, 1, 2 and 4 separate threads, mainly via NEON intrinsic functions that are compiled into different forms of vector instructions. The benchmark was produced to demonstrate that the original Linpack code could not be converted (by me) to show increased performance using multiple threads. The official line is that users are allowed to implement their own linear equation solver for this purpose. At 100 x 100, data is in L2 cache, others depend more on RAM speed. The critical daxpy function is affected by numerous thread create and join directives, even on using one thread. This leads to slow and constant performance using all thread tests - see example below. The 32 bit version produced slightly slower speeds.

Code:
 Linpack Single Precision MultiThreaded Benchmark
  64 Bit NEON Intrinsics, Wed Mar  8 11:36:25 2017

   MFLOPS 0 to 4 Threads, N 100, 500, 1000

 Threads      None        1        2        4

 N  100     552.47   112.73   105.19   105.31
 N  500     442.32   303.75   303.64   305.03
 N 1000     353.88   315.96   309.15   308.31

MP-BusSpeed - This runs integer read only tests using caches and RAM, each thread accessing the same data, but with staggered starting points. It includes tests with variable address increments, to identify burst reading and bus speeds. The main “Read All” test is intended to identify maximum RAM speed. The benchmark demonstrated some appropriate MP performance gains, but slow 64 bit speeds, with the 32 bit version being 2.5 times faster via cache based data. The reason is that the latter compiled arithmetic as 16 four way NEON operations compared with 64 scalar instructions.

MP-RandMem - The benchmark has cache and RAM read only and read/write tests using sequential and random access, each thread accessing the same data but starting at different points. The read only L1 cache based tests demonstrated MP gains of 3.6 times, with the 64 bit version 43% faster than the 32 bit variety. Read/write tests produced no multithreading performance improvement and the latest benchmark appeared to be somewhat slower than the 32 bit version.
_________________
Regards

Roy

Zucca
Moderator
Joined: 14 Jun 2007
Posts: 3339
Location: Rasi, Finland

Posted: Tue Mar 21, 2017 9:49 am

Has anyone tried to convert the existing ext4 filesystem to btrfs?
I think the snapshotting feature of it could be useful there.
_________________
..: Zucca :..
Gentoo IRC channels reside on Libera.Chat.
--
Quote:
I am NaN! I am a man!

roylongbottom
n00b
Joined: 13 Feb 2017
Posts: 64
Location: Essex, UK

Posted: Sat Mar 25, 2017 11:23 am    Post subject: OpenGL and Java Benchmarks

OpenGL GLUT Benchmark

This was produced for use on Linux based PCs. It has four tests using coloured or textured simple objects then a wireframe and textured complex kitchen structure. It can be run from a script file specifying different window sizes and a command to disable VSYNC, enabling speeds greater than 60 FPS to be demonstrated. The benchmark, source code and details are in the following:

http://www.roylongbottom.org.uk/Rpi3-64-Bit-Benchmarks.tar.gz
http://www.roylongbottom.org.uk/Raspberry%20Pi%20Benchmarks.htm#anchor19a

In 2012, I approved a request from a Quality Engineer at Canonical, to use this OpenGL benchmark in the testing framework of the Unity desktop software. One reason probably was that a test can be run for extended periods as a stress test.

Below are results from a Raspberry Pi 3, using the experimental desktop GL driver and the new 64 bit version. It can be seen that, using smaller windows, the 32 bit version was much faster running simple coloured objects, with the 64 bit benchmark being ahead with complex structures. Then, performance was quite similar with full screen displays.

Code:
 ######################### RPi 3 Original #########################

 GLUT OpenGL Benchmark 32 Bit Version 1, Wed Jul 27 20:31:52 2016

 Window Size  Coloured Objects  Textured Objects  WireFrm  Texture
    Pixels        Few      All      Few      All  Kitchen  Kitchen
  Wide  High      FPS      FPS      FPS      FPS      FPS      FPS

   320   240    308.4    182.1     82.6     52.3     21.6     13.7
   640   480    129.5    119.6     74.6     49.2     21.6     13.8
  1024   768     54.8     52.2     43.7     39.2     21.4     13.6
  1920  1080     21.5     17.9     20.3     19.6     20.6     13.4


 ########################## RPi 3 Gentoo ##########################

 GLUT OpenGL Benchmark 64 Bit Version 1, Sat Mar 18 18:21:44 2017

 Window Size  Coloured Objects  Textured Objects  WireFrm  Texture
    Pixels        Few      All      Few      All  Kitchen  Kitchen
  Wide  High      FPS      FPS      FPS      FPS      FPS      FPS

   320   240    161.8    116.0     67.1     46.3     26.7     16.7
   640   480     76.8     74.8     49.8     41.4     25.9     16.3
  1024   768     35.7     34.8     29.7     26.7     25.0     15.7
  1920  1080     18.0     18.7     16.4     15.8     17.1     13.1
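
To make the 32 bit versus 64 bit comparison easier to read, here is a minimal Python sketch that divides each 64 bit FPS figure by the matching 32 bit one. The numbers are copied from the two tables above; the names (FPS_32, FPS_64, ratios) are made up for this illustration and are not part of the benchmark.

```python
# FPS values copied from the two result tables above.
# Column order: coloured few, coloured all, textured few, textured all,
#               wireframe kitchen, textured kitchen.
FPS_32 = {
    (320, 240): [308.4, 182.1, 82.6, 52.3, 21.6, 13.7],
    (640, 480): [129.5, 119.6, 74.6, 49.2, 21.6, 13.8],
    (1024, 768): [54.8, 52.2, 43.7, 39.2, 21.4, 13.6],
    (1920, 1080): [21.5, 17.9, 20.3, 19.6, 20.6, 13.4],
}
FPS_64 = {
    (320, 240): [161.8, 116.0, 67.1, 46.3, 26.7, 16.7],
    (640, 480): [76.8, 74.8, 49.8, 41.4, 25.9, 16.3],
    (1024, 768): [35.7, 34.8, 29.7, 26.7, 25.0, 15.7],
    (1920, 1080): [18.0, 18.7, 16.4, 15.8, 17.1, 13.1],
}


def ratios(fps_64, fps_32):
    """Return 64 bit / 32 bit FPS ratios per window size (above 1.0 means 64 bit is faster)."""
    return {
        size: [round(new / old, 2) for new, old in zip(fps_64[size], fps_32[size])]
        for size in fps_32
    }


if __name__ == "__main__":
    for (wide, high), row in ratios(FPS_64, FPS_32).items():
        print(f"{wide:5d} x {high:<5d}", "  ".join(f"{r:4.2f}" for r in row))
```

Ratios below 1.0 in the first columns and above 1.0 in the kitchen columns correspond to the pattern described above for the smaller windows.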


Java Drawing and Whetstone Benchmarks

After a struggle, I gave up trying to emerge Java but managed to download Oracle JDK 1.8 for temporary use (not installed in the right place?). This could compile Java code and run the Whetstone program but not my JavaDraw benchmark. The benchmarks and results can be obtained via the above links. On running the Whetstone benchmark, excluding two tests, each of which was much faster, the average 64 bit speed was twice that of the 32 bit version.
_________________
Regards

Roy
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54216
Location: 56N 3W

Posted: Sat Mar 25, 2017 3:12 pm

roylongbottom,

I haven't tried 32 bit Java for the Pi but you can build Java 1.7 and, once you have 1.7, you can use it to build 1.8.
If I got the keywording right, keywording is no longer required.

It's also possible to build Icedtea with Oracle's Java. That's documented there too.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
roylongbottom
n00b


Joined: 13 Feb 2017
Posts: 64
Location: Essex, UK

Posted: Tue Apr 25, 2017 10:17 am    Post subject: 64 Bit I/O Benchmarks

My DriveSpeed and LanSpeed programs have now been recompiled as DriveSpeed64 and LanSpeed64, with benchmarks, source code, details and results in the tar.gz and htm files quoted earlier. The code for the two is identical, except that DriveSpeed opens files for direct I/O, avoiding caching. LanSpeed normally runs without local caching. The benchmarks measure writing and reading speeds for relatively large files, random access and numerous small files.

There might be a tuning parameter, but DriveSpeed64 produced errors using the installed Gentoo operating system, where direct I/O did not appear to be available. The benchmark was validated on a different 64 bit system.
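
For anyone wanting to check whether a given partition accepts direct I/O, here is a rough Python sketch. It is not taken from DriveSpeed64 (which is a C program, and whose exact open flags are not shown here); it just assumes the usual Linux behaviour, where open() with O_DIRECT returns EINVAL if the file system does not support it. The function names are invented for the example.

```python
import errno
import mmap
import os


def open_for_write(path, want_direct=True):
    """Open path for writing, with O_DIRECT if platform and file system allow it.

    Returns (fd, direct), where direct says whether O_DIRECT is in effect.
    """
    base = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    direct_flag = getattr(os, "O_DIRECT", 0)  # not defined on every platform
    if want_direct and direct_flag:
        try:
            return os.open(path, base | direct_flag, 0o644), True
        except OSError as exc:
            # EINVAL is the documented error when O_DIRECT is not supported
            if exc.errno != errno.EINVAL:
                raise
    # Fall back to ordinary buffered (cached) I/O
    return os.open(path, base, 0o644), False


def aligned_buffer(size=mmap.PAGESIZE):
    """Anonymous mmap gives page-aligned memory, which direct I/O transfers generally need."""
    return mmap.mmap(-1, size)
```

Calling open_for_write() on a test file in each mount point and printing the second return value shows where caching can be bypassed. Note that some file systems accept the open and only reject misaligned reads or writes later, so a True result is a hint rather than a guarantee.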

DriveSpeed can also be used for testing USB connected drives. This produced errors using flash drives but happened to run when testing a micro SD card, via a USB card reader, but only via a btrfs formatted partition. Results are below but, compared with earlier 32 bit tests, some speeds are not as expected.

Code:
 ################## DriveSpeed64 External SD Card ###################
                    Gentoo via USB, btrfs format

   DriveSpeed RasPi 64 Bit 1.1 Tue Apr  4 10:28:11 2017
 
 Selected File Path:
 /run/media/demouser/ROOT/home/roy/benchmarks//
 Total MB   29465, Free MB   27511, Used MB    1953

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8     5.53    10.64    12.23    29.99    31.88    33.25
  16     6.88     6.82     8.53    31.21    26.41    28.64
 Cached
   8   159.30   175.77   158.98   235.45   229.22   266.71

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.016    0.006    0.006    20.67    50.55    22.84

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      0.25     0.40     0.97    58.44   160.18   150.07
 ms/file    16.09    20.66    16.87     0.07     0.05     0.11    0.160

  Large Files > Performance restricted by USB speed
  Random      > Writing exceptionally slow, reading far too fast, data cached?
  Small Files > Writing exceptionally slow, reading far too fast, data cached?

As Samba for Gentoo was initially said to be untested at 64 bits, the LAN was not available to run the benchmark on this system, but it was compiled and run on another 64 bit configuration, accessing a Windows based PC. Results are below.
Code:
######################### LanSpeed64 Example #######################
 
   LanSpeed RasPi 64 Bit 1.0 Tue Apr  4 13:04:06 2017
 
 Selected File Path:
 /root/Desktop/sharepc/
 Total MB  266240, Free MB  134653, Used MB  131587

                        MBytes/Second
  MB   Write1   Write2   Write3    Read1    Read2    Read3

   8    11.23    11.40    11.40     8.10    11.62    11.64
  16    11.27    11.42    11.44    11.66    11.66    11.64

 Random         Read                       Write
 From MB        4        8       16        4        8       16
 msecs      0.724    0.886    1.333     1.58     1.50     1.37

 200 Files      Write                      Read                  Delete
 File KB        4        8       16        4        8       16     secs
 MB/sec      0.99     1.81     2.73     1.77     3.02     4.50
 ms/file     4.13     4.54     6.01     2.32     2.71     3.64    0.201


                End of test Tue Apr  4 13:04:43 2017
 
 >>>>>>>>>>>>> Comparison with 32 Bit Version Rpi 3 Ph Win <<<<<<<<<<<<<

  Large Files > Similar speeds reflecting 100 Mbps
  Random      > Similar but writing faster, no apparent caching
  Small Files > Similar speeds
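
As a quick check on the "reflecting 100 Mbps" comment, the link rate can be converted to MBytes/second and compared with a measured figure. This is only a sketch; it ignores protocol overheads, and the function names are invented for the example.

```python
LINK_MBPS = 100  # nominal Ethernet rate in megabits per second


def link_limit_mbytes(mbps):
    """Theoretical ceiling in MBytes/second (8 bits per byte, no protocol overhead)."""
    return mbps / 8.0


def percent_of_limit(measured_mbytes, mbps=LINK_MBPS):
    """Measured MBytes/second as a percentage of the theoretical ceiling."""
    return 100.0 * measured_mbytes / link_limit_mbytes(mbps)


if __name__ == "__main__":
    print(link_limit_mbytes(LINK_MBPS))       # 12.5
    # Read3 at 16 MB, from the large file results above
    print(round(percent_of_limit(11.64), 1))  # 93.1
```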

LanSpeed64 was also successfully run targeting the main and USB drives that would not run DriveSpeed64, identifying speeds when data was cached, and suggesting that the earlier failures were due to trying to open files (as used in the programs) to force direct I/O. Details are available in the aforementioned htm report.
_________________
Regards

Roy
roylongbottom
n00b


Joined: 13 Feb 2017
Posts: 64
Location: Essex, UK

Posted: Mon May 08, 2017 11:40 am    Post subject: Stress Testing Programs

Stress Testing Programs

The Cortex-A53 CPU, used in the Raspberry Pi 3, is known to be subject to overheating. Assuming correct software implementation, the first noticeable effect is that, as the temperature increases beyond a critical point, the CPU MHz is throttled. At normal room temperatures, this might only occur when all CPU cores are executing at higher speeds, with a possible contribution from graphics activity. Where this matters, special cooling arrangements might be needed, and these stress tests can be used to evaluate them. For this series of procedures, the RPi 3 board was “out of case”, where recorded temperatures are often shown to be lower than those obtained using a standard plastic enclosure.

A main consideration for stress testing is that programs have parameters to run for defined durations but with short term reports on progress, including performance and, in this case, CPU temperature and clock MHz. These details should also be saved in constantly updated log files. Then, there will be some evidence, if the system crashes.

In this case, multiple programs are run, using a different terminal window for each, normally with a test duration of 15 minutes specified. One of these measures CPU temperature and MHz at specified intervals, for which the vcgencmd utility has to be installed (as used by Raspbian). Two of the programs are benchmarks, already reported on, but with alternative run time parameters, and two are new ones, now with programs, source code and detailed results included in:

http://www.roylongbottom.org.uk/Rpi3-64-Bit-Benchmarks.tar.gz
http://www.roylongbottom.org.uk/Raspberry%20Pi%20Benchmarks.htm
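
For illustration only, the temperature and MHz sampling described above could be sketched in Python as follows. This is not the code of the monitoring program in the download (that is a compiled program, and how it calls vcgencmd is not shown here). The sketch assumes the usual vcgencmd output formats, "temp=55.8'C" for measure_temp and "frequency(45)=1200000000" for measure_clock arm; the function names, log layout and default interval are invented for the example.

```python
import re
import subprocess
import time

TEMP_RE = re.compile(r"temp=([0-9.]+)'C")
CLOCK_RE = re.compile(r"frequency\(\d+\)=(\d+)")


def parse_temp(text):
    """Extract degrees C from output such as "temp=55.8'C"."""
    match = TEMP_RE.search(text)
    if match is None:
        raise ValueError(f"unexpected measure_temp output: {text!r}")
    return float(match.group(1))


def parse_clock_mhz(text):
    """Extract MHz from output such as "frequency(45)=1200000000" (value is in Hz)."""
    match = CLOCK_RE.search(text)
    if match is None:
        raise ValueError(f"unexpected measure_clock output: {text!r}")
    return int(match.group(1)) / 1_000_000


def vcgencmd(*args):
    """Run vcgencmd with the given arguments and return its standard output."""
    result = subprocess.run(["vcgencmd", *args], capture_output=True, text=True, check=True)
    return result.stdout


def log_samples(path, interval_s=10, samples=90):
    """Append nominal elapsed seconds, CPU MHz and temperature to a log file."""
    with open(path, "a") as log:
        for i in range(samples):
            temp = parse_temp(vcgencmd("measure_temp"))
            mhz = parse_clock_mhz(vcgencmd("measure_clock", "arm"))
            log.write(f"{i * interval_s:6d} {mhz:7.0f} {temp:6.1f}\n")
            log.flush()  # keep the file current in case the system crashes
            time.sleep(interval_s)
```

With the defaults, log_samples("heat.log") would cover roughly 15 minutes at 10 second intervals, running in its own terminal window alongside the stress tests.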

The oldest uses linverloopsPi64, the Livermore Loops Benchmark, which has 24 different test kernels, repeated three times with different (cache) memory demands. This was known to produce wrong numeric answers on an overclocked PC. For reliability testing, a parameter specifies a standard run time for each of the loops. Results are displayed during the tests, but performance is only reported and logged at the end. The main benefit is a continuously changing processing profile.

The other existing benchmark is videogl64, via OpenGL. This has six test procedures, one of which is selected along with the number of passes and the duration of each. The window width and height can be specified, leaving visible screen space where other terminal windows can be displayed.

The new tests have a run time parameter to specify the amount of cache or memory space to use, and carry out high speed integer and floating point calculations via stressintPi64 and burninfpuPi64.

A summary of test results follows:

Integer Arithmetic Stress Test - comprising four runs of stressintPi64 using 40 KB of data, aimed at all using L2 cache, with 12 tests each running for 80 seconds. Performance on all cores was essentially the same, with CPU throttling starting after 30 seconds, eventually reducing CPU MHz by nearly 32%, with maximum recorded sample CPU temperature of 84.4 °C. Compared with stand alone results, CPU performance was degraded to a greater extent due to MP overheads.

Floating Point Arithmetic Stress Test - having four burninfpuPi64 test procedures, using L2 cache with 8 operations per data word. Again performance was effectively constant from all cores, with maximum total throughput of 13.7 GFLOPS, reducing by nearly 4 GFLOPS due to CPU throttling down to 843 MHz, again with a maximum temperature of 84.4 °C.

Livermore Loops Stress Test - This uses four copies of the Livermore Loops Benchmark. Overall MFLOPS speeds are shown to be significantly degraded, but RPiHeatMHz64 results demonstrate inconsistent effects of different arithmetic functions. Maximum temperature recorded was 84.9 °C with a CPU MHz of 744.

Integer and OpenGL Stress Tests - The most complicated OpenGL kitchen test was used, along with three Integer Stress Tests, this time using L1 cache based data. The same procedures were used with CPU MHz settings of On-demand and Performance, where results are shown to be virtually the same. The first summary of speeds and temperatures below is with the Performance setting. In that run, OpenGL FPS and integer MB/second reduced to around 60% of initial speeds, with many temperatures of 84.9 °C recorded, and CPU MHz temporarily dropped to half speed at 600 MHz. The tests were repeated with the system in a FLIRC case, where the whole aluminium case becomes the heatsink. The performance was consistently high, but temperatures approached the critical point where CPU throttling would occur.

Code:
       Performance out of case             Performance FLIRC case
       Total   OGL   CPU   CPU             Total   OGL   CPU   CPU
  Secs  MB/s   FPS   MHz    'C        Secs  MB/s   FPS   MHz    'C

     0              1200  55.8           0              1200  44.0
    30          13  1107  80.6          30          13  1200  60.1
    60          11   910  82.7          60          13  1199  63.4
    80  6064     9   850  83.8          80  7116    13  1200  65.0
   160  4656     9   744  84.9         160  7041    13  1199  68.8
   240  4305     8   600  82.7         240  7072    13  1200  70.9
   320  4217     8   600  82.7         320  7075    13  1200  72.0
   400  4209     8   738  84.9         400  7095    13  1200  74.1
   480  4209     8   600  82.7         480  7081    13  1200  75.8
   560  4802     8   738  84.9         560  8067    13  1200  74.7
   640  4768     8   722  84.9         640  8092    13  1200  76.8
   720  4730     8   743  84.9         720  7989    13  1200  77.4
   800  4664     8   823  84.9         800  8050    13  1200  78.4
   880  4712     8   719  84.9         880  7984    13  1200  79.5
   960  5917     8   938  82.7         960  8344    13  1200  74.1
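
The two halves of the table can be summarised with a few lines of Python. This is just a sketch over the rows that report MB/s (80 seconds onwards), copied from the table above; the 1200 MHz nominal clock is taken from the 0 second samples, and the names are invented for the example.

```python
NOMINAL_MHZ = 1200  # clock reported at 0 seconds in both runs

# (secs, total MB/s, OpenGL FPS, CPU MHz, CPU degrees C), copied from the table above
OUT_OF_CASE = [
    (80, 6064, 9, 850, 83.8), (160, 4656, 9, 744, 84.9), (240, 4305, 8, 600, 82.7),
    (320, 4217, 8, 600, 82.7), (400, 4209, 8, 738, 84.9), (480, 4209, 8, 600, 82.7),
    (560, 4802, 8, 738, 84.9), (640, 4768, 8, 722, 84.9), (720, 4730, 8, 743, 84.9),
    (800, 4664, 8, 823, 84.9), (880, 4712, 8, 719, 84.9), (960, 5917, 8, 938, 82.7),
]
FLIRC_CASE = [
    (80, 7116, 13, 1200, 65.0), (160, 7041, 13, 1199, 68.8), (240, 7072, 13, 1200, 70.9),
    (320, 7075, 13, 1200, 72.0), (400, 7095, 13, 1200, 74.1), (480, 7081, 13, 1200, 75.8),
    (560, 8067, 13, 1200, 74.7), (640, 8092, 13, 1200, 76.8), (720, 7989, 13, 1200, 77.4),
    (800, 8050, 13, 1200, 78.4), (880, 7984, 13, 1200, 79.5), (960, 8344, 13, 1200, 74.1),
]


def summarise(rows, nominal_mhz=NOMINAL_MHZ):
    """Return the lowest clock and throughput and the highest temperature in a run."""
    min_mhz = min(row[3] for row in rows)
    return {
        "min_mhz": min_mhz,
        "clock_fraction": min_mhz / nominal_mhz,
        "min_mbytes": min(row[1] for row in rows),
        "max_temp": max(row[4] for row in rows),
    }


if __name__ == "__main__":
    for name, rows in (("out of case", OUT_OF_CASE), ("FLIRC case", FLIRC_CASE)):
        print(name, summarise(rows))
```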

These are the last of my current benchmarks and test programs for Raspberry Pi 3.
_________________
Regards

Roy