Hey there,
I just got my RPi3 a few days ago and was wondering about the proper CFLAGS.
Obtaining them the usual way by running
gcc -march=native -mtune=cortex-a53 -Q --help=target
always resulted in a segfault. It turns out that the Cortex-A53 and the Cortex-A57 were not in GCC's CPU list for AArch32 (32-bit ARM), although they are present for AArch64.
A patch was proposed quite some time ago here:
https://gcc.gnu.org/ml/gcc-patches/2014 ... 00585.html
So I ran the ebuild commands manually and patched the file to include the two CPUs. After that, GCC recognized the "native" switch on my RPi3.
Of course, this is not a permanent solution, as any GCC update will overwrite your manually patched installation.
For reference, I ran the command above with -march=native to obtain the flags used for the RPi3; please find them below:
The following options are target specific:
  -mabi=                            aapcs-linux
  -mabort-on-noreturn               [disabled]
  -mandroid                         [disabled]
  -mapcs                            [disabled]
  -mapcs-float                      [disabled]
  -mapcs-frame                      [disabled]
  -mapcs-reentrant                  [disabled]
  -mapcs-stack-check                [disabled]
  -march=                           armv8-a+crc
  -marm                             [enabled]
  -masm-syntax-unified              [disabled]
  -mbig-endian                      [disabled]
  -mbionic                          [disabled]
  -mcallee-super-interworking       [disabled]
  -mcaller-super-interworking       [disabled]
  -mcpu=                            [default]
  -mfix-cortex-m3-ldrd              [enabled]
  -mfloat-abi=                      hard
  -mfp16-format=                    none
  -mfpu=                            vfpv3-d16
  -mglibc                           [enabled]
  -mhard-float
  -mlittle-endian                   [enabled]
  -mlong-calls                      [disabled]
  -mneon-for-64bits                 [disabled]
  -mnew-generic-costs               [disabled]
  -mold-rtx-costs                   [disabled]
  -mpic-data-is-text-relative       [enabled]
  -mpic-register=
  -mpoke-function-name              [disabled]
  -mprint-tune-info                 [disabled]
  -mrestrict-it                     [enabled]
  -msched-prolog                    [enabled]
  -msingle-pic-base                 [disabled]
  -mslow-flash-data                 [disabled]
  -msoft-float
  -mstructure-size-boundary=        0x20
  -mthumb                           [disabled]
  -mthumb-interwork                 [enabled]
  -mtls-dialect=                    gnu
  -mtp=                             auto
  -mtpcs-frame                      [disabled]
  -mtpcs-leaf-frame                 [disabled]
  -mtune=                           cortex-a53
  -muclibc                          [disabled]
  -munaligned-access                [enabled]
  -mvectorize-with-neon-double      [disabled]
  -mvectorize-with-neon-quad        [enabled]
  -mword-relocations                [disabled]
Known ARM ABIs (for use with the -mabi= option):
    aapcs aapcs-linux apcs-gnu atpcs iwmmxt
Known ARM architectures (for use with the -march= option):
    armv2 armv2a armv3 armv3m armv4 armv4t armv5 armv5e armv5t armv5te armv6 armv6-m armv6j armv6k armv6s-m armv6t2 armv6z armv6zk armv7 armv7-a armv7-m armv7-r armv7e-m armv7ve armv8-a armv8-a+crc iwmmxt
    iwmmxt2 native
Known __fp16 formats (for use with the -mfp16-format= option):
    alternative ieee none
Known ARM FPUs (for use with the -mfpu= option):
    crypto-neon-fp-armv8 fp-armv8 fpv4-sp-d16 fpv5-d16 fpv5-sp-d16 neon neon-fp-armv8 neon-fp16 neon-vfpv4 vfp vfp3 vfpv3 vfpv3-d16 vfpv3-d16-fp16 vfpv3-fp16 vfpv3xd vfpv3xd-fp16 vfpv4 vfpv4-d16
Valid arguments to -mtp=:
    auto cp15 soft
Known floating-point ABIs (for use with the -mfloat-abi= option):
    hard soft softfp
Known ARM CPUs (for use with the -mcpu= and -mtune= options):
    arm1020e arm1020t arm1022e arm1026ej-s arm10e arm10tdmi arm1136j-s arm1136jf-s arm1156t2-s arm1156t2f-s arm1176jz-s arm1176jzf-s arm2 arm250 arm3 arm6 arm60 arm600 arm610 arm620 arm7 arm70 arm700 arm700i
    arm710 arm7100 arm710c arm710t arm720 arm720t arm740t arm7500 arm7500fe arm7d arm7di arm7dm arm7dmi arm7m arm7tdmi arm7tdmi-s arm8 arm810 arm9 arm920 arm920t arm922t arm926ej-s arm940t arm946e-s arm966e-s
    arm968e-s arm9e arm9tdmi cortex-a12 cortex-a15 cortex-a15.cortex-a7 cortex-a17 cortex-a17.cortex-a7 cortex-a5 cortex-a53 cortex-a57 cortex-a57.cortex-a53 cortex-a7 cortex-a72 cortex-a72.cortex-a53 cortex-a8
    cortex-a9 cortex-m0 cortex-m0.small-multiply cortex-m0plus cortex-m0plus.small-multiply cortex-m1 cortex-m1.small-multiply cortex-m3 cortex-m4 cortex-m7 cortex-r4 cortex-r4f cortex-r5 cortex-r7 ep9312
    exynos-m1 fa526 fa606te fa626 fa626te fa726te fmp626 generic-armv7-a iwmmxt iwmmxt2 marvell-pj4 mpcore mpcorenovfp native strongarm strongarm110 strongarm1100 strongarm1110 xgene1 xscale
TLS dialect to use:
    gnu gnu2
My system printed this in German, so I have given it in English here ("eingeschaltet" means "enabled" and "ausgeschaltet" means "disabled").
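A dump like the one above can be produced in English regardless of the system locale by setting LC_ALL=C for the GCC call. The following is only a rough sketch of how one might capture such dumps and check whether two flag sets resolve to the same target options; the function names and the CC override are my own assumptions, not part of GCC.

```shell
#!/bin/sh
# Sketch: dump the target options GCC resolves for a flag set, then compare two sets.
# CC is an assumed override for picking another compiler; it defaults to gcc.

dump_target_flags() {
    # LC_ALL=C keeps GCC's messages in English regardless of the system locale.
    LC_ALL=C "${CC:-gcc}" "$@" -Q --help=target
}

same_target_flags() {
    # Usage: same_target_flags "<flag set A>" "<flag set B>"
    # $1 and $2 are left unquoted on purpose so each string splits into flags.
    a=$(dump_target_flags $1) || return 2
    b=$(dump_target_flags $2) || return 2
    [ "$a" = "$b" ]
}

# Example (run on the Pi itself):
#   same_target_flags "-march=native -mtune=cortex-a53" \
#                     "-march=armv8-a+crc -mtune=cortex-a53" && echo identical
```

The function returns 0 when both dumps are identical, 1 when they differ, and 2 when GCC itself fails (for example with the segfault described earlier).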
Obtaining this output with "-march=native -mtune=cortex-a53" or with "-march=armv8-a+crc -mtune=cortex-a53" gives the same results, so setting the CFLAGS statically has the same effect as letting the autodetection do it for you.
I have obtained my CFLAGS and am setting them statically to
CFLAGS="-march=armv8-a+crc -mtune=cortex-a53 -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard -ftree-vectorize -funsafe-math-optimizations -O2 -pipe"
in order to avoid hassle when GCC gets upgraded and is once again unable to handle "-march=native" on a Cortex-A53 in AArch32.
My -mfpu differs from the GCC output, but it is supposed to work on AArch32, judging by what it says here:
http://infocenter.arm.com/help/index.js ... 24052.html It says:
The -mfpu option is ignored with AArch64 targets, for example aarch64-arm-none-eabi. Use the -mcpu option to override the default FPU for aarch64-arm-none-eabi targets. For example, to prevent the use of floating-point instructions or floating-point registers for the aarch64-arm-none-eabi target use the -mcpu=name+nofp+nosimd option. Subsequent use of floating-point data types in this mode is unsupported.
Now, if it does not affect AArch64 targets, why would we be able to set it to "crypto-neon-fp-armv8" on AArch32 when that value is always referred to as 64-bit stuff?
So I assumed it would have an effect on AArch32, and I was right: see the helloworld test below.
According to the GCC manpage, you need to pass "-funsafe-math-optimizations" for GCC to generate NEON-vectorized floating-point code, as it refuses to do so otherwise. The following is taken from the GCC manpage:
If the selected floating-point hardware includes the NEON extension (e.g. -mfpu=‘neon’), note that floating-point operations are not generated by GCC's auto-vectorization pass unless -funsafe-math-optimizations is also specified. This is because NEON hardware does not fully implement the IEEE 754 standard for floating-point arithmetic (in particular denormal values are treated as zero), so the use of NEON instructions may lead to a loss of precision.
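To check that statement on your own toolchain, one rough approach is to compile a small float loop to assembly with and without -funsafe-math-optimizations and count the float operations that use NEON quad registers. The helpers below are only a sketch: the function names, the file name saxpy.c and the grep pattern are my own assumptions, and the pattern only catches .f32 instructions whose first operand is a q register.

```shell
#!/bin/sh
# Sketch: helpers to see whether GCC vectorized a float loop with NEON.

write_test_loop() {
    # Writes a small float loop to the given file (default: saxpy.c).
    cat > "${1:-saxpy.c}" <<'EOF'
void saxpy(float *__restrict y, const float *__restrict x, float a, int n)
{
    int i;
    for (i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
EOF
}

count_neon_f32() {
    # Prints the number of assembly lines with an .f32 operation on a q register.
    # Note: grep -c prints 0 and returns 1 when nothing matches.
    grep -cE '\.f32[[:space:]]+q[0-9]+' "$1"
}

# Example (run on the Pi itself):
#   write_test_loop saxpy.c
#   F="-march=armv8-a+crc -mtune=cortex-a53 -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard -O2 -ftree-vectorize"
#   gcc $F -S -o plain.s saxpy.c
#   gcc $F -funsafe-math-optimizations -S -o unsafe.s saxpy.c
#   count_neon_f32 plain.s; count_neon_f32 unsafe.s
```

If the manpage statement holds for your GCC version, the second count should be higher than the first, but I have not verified that for every version.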
I've tested my CFLAGS above with a simple helloworld, and asked diff to tell me if there was any difference in the output files when using different flags.
I can confirm that the helloworld binary built with "-mfpu=vfpv3-d16" differs from the one built with "-mfpu=crypto-neon-fp-armv8", but it works as expected with both of them.
If anyone knows whether setting -mfpu to neon-fp-armv8 is right or wrong, I'd be most grateful. In the meantime, I'll go ahead and recompile my base system with those flags and let you know how it turns out.
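One way to see what a given -mfpu value actually switches on is to ask GCC which ARM feature macros it predefines for that flag set. This is a minimal sketch, assuming your GCC defines the usual ACLE macros (__ARM_ARCH, __ARM_FP, __ARM_NEON, __ARM_FEATURE_CRC32, __ARM_FEATURE_CRYPTO); the function name and the CC override are hypothetical.

```shell
#!/bin/sh
# Sketch: list selected ARM feature macros GCC predefines for a flag set.
# CC is an assumed override for picking another compiler; it defaults to gcc.

arm_feature_macros() {
    # -dM -E dumps all predefined macros after preprocessing an empty input.
    "${CC:-gcc}" "$@" -dM -E -x c /dev/null |
        grep -E '^#define __ARM_(ARCH|FP|NEON|FEATURE_CRC32|FEATURE_CRYPTO) ' |
        sort
}

# Example (run on the Pi itself):
#   arm_feature_macros -march=armv8-a+crc -mfpu=vfpv3-d16 -mfloat-abi=hard
#   arm_feature_macros -march=armv8-a+crc -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard
```

Comparing the two listings shows which features each -mfpu value advertises to the code being compiled, which is a more direct check than diffing the resulting binaries.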
UPDATE
Overnight, I had the RPi emerge GCC (changing the CHOST as well, but that shouldn't matter here). The important point is that the GCC compiled with the flags listed above works well and compiles itself and other programs just fine.