Gentoo Forums
Binhost: Illegal instruction (LUA) (SandyBridge VS Znver1)
Gentoo Forums Forum Index » Portage & Programming
kgdrenefort
Apprentice


Joined: 19 Sep 2023
Posts: 195
Location: Somewhere in the 77

Posted: Wed Apr 17, 2024 3:16 pm    Post subject: Binhost: Illegal instruction (LUA) (SandyBridge VS Znver1)

Hello,

From this topic, I realized and confirmed that something was very wrong with the configuration between my binhost machine (an nspawn container on my main desktop) and my client (an old laptop).

The two machines do not use the same CPU, so some settings were needed if I wanted to build packages suited to the client's CPU.

This resulted in a broken build of Lua, which made the Awesome window manager fail to start, even though X itself was working.

The solution was to rebuild Lua on the client and make sure the right slot was selected. Now Awesome works when I use the startx command.

But of course I need to prevent this from happening again, so, with your help, I have to find out what I did wrong.

Some information about the protagonists:

Client:
- Manufacturer and model: HP Elitebook 8560w
- CPU: Intel(R) Core(TM) i7-2820QM CPU @ 2.30GHz
- Output of /proc/cpuinfo: https://bpa.st/RN4A
- Family: SandyBridge
- Flags (from GCC documentation):
Code:
Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE and PCLMUL instruction set support.

- Current state of /etc/portage/make.conf:
(Please note that for now I have disabled fetching binary packages. I only uncomment those lines for testing purposes, as the problem is not fixed yet!)
Code:
# These settings were set by the catalyst build script that automatically
# built this stage.
# Please consult /usr/share/portage/config/make.conf.example for a more
# detailed example.
COMMON_FLAGS="-march=native -O2 -pipe"
CFLAGS="${COMMON_FLAGS}"
CXXFLAGS="${COMMON_FLAGS}"
FCFLAGS="${COMMON_FLAGS}"
FFLAGS="${COMMON_FLAGS}"

MAKEOPTS="-j2 -l2"

USE="X grub acpi bash-completion branding cups curl colord dbus git gui hddtemp lm-sensors man ncurses networkmanager pcmcia pcre posix scanner spell systemd udev udisks unicode upower usb vim-syntax x264 -bluetooth -geoip -geolocation -gnome -gnome-keyring -gtk -gtk-doc -handbook -kde -plasma -qt5 -qt6 -semantic-desktop -telemetry -tk -wayland -webkit -wifi"

VIDEO_CARDS="nouveau"

L10N="en fr"

# NOTE: This stage was built with the bindist Use flag enabled

# This sets the language of build output to English.
# Please keep this setting intact when reporting bugs.
LC_MESSAGES=C.utf8

GENTOO_MIRRORS="https://mirrors.ircam.fr/pub/gentoo-distfiles/ \
    https://gentoo.mirrors.ovh.net/gentoo-distfiles/ \
    https://mirrors.soeasyto.com/distfiles.gentoo.org/"

ACCEPT_LICENSE="-* @FREE @BINARY-REDISTRIBUTABLE"

### Binary package settings (client) ###
#EMERGE_DEFAULT_OPTS="${EMERGE_DEFAULT_OPTS} --getbinpkg"
#FEATURES="getbinpkg"
#EMERGE_DEFAULT_OPTS="${EMERGE_DEFAULT_OPTS} --usepkg-exclude 'sys-kernel/gentoo-sources virtual/* www-servers/lighttpd'"
#PORTAGE_BINHOST="http://192.168.1.103:81/packages"

- Current state of /etc/portage/package.use/00cpu-flags:
Code:
*/* CPU_FLAGS_X86: aes avx mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3

- Output of cpuid2cpuflags (this looks redundant since it matches the above, but I'm including it in case I forgot something…):
Code:
CPU_FLAGS_X86: aes avx mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3


- Output of resolve-march-native:
Code:
-march=sandybridge -maes --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=8192
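The client's /proc/cpuinfo is only linked above. A quick local check that the laptop really lacks the extra extensions that -march=znver1 would enable could look like the sketch below. missing_cpu_flags is a hypothetical helper, not a Gentoo tool. The flag names are the kernel's /proc/cpuinfo spellings (bmi1, sha_ni, ...), which differ from GCC's -m option names.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Gentoo tool): print every requested flag that
# the first "flags" line of a cpuinfo-style file does not list.
missing_cpu_flags() {
    local cpuinfo=$1 have flag
    shift
    # All cores normally report the same set, so the first line is enough.
    have=$(awk -F': ' '$1 ~ /^flags/ { print $2; exit }' "$cpuinfo")
    for flag in "$@"; do
        case " $have " in
            *" $flag "*) ;;              # listed: the CPU advertises it
            *) printf '%s\n' "$flag" ;;  # not listed: report it
        esac
    done
}
# Example on the client, with extensions that -march=znver1 enables:
# missing_cpu_flags /proc/cpuinfo avx2 fma bmi1 bmi2 movbe f16c adx rdseed sha_ni
```

On an i7-2820QM these should all be reported as missing, which is exactly why a package built with -march=znver1 could die with SIGILL there.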



Binhost:

- Manufacturer and model: Self-built computer (I picked each hardware part myself), running Gentoo inside an nspawn container.
- CPU: AMD Ryzen 5 2600 Six-Core Processor
- Output of /proc/cpuinfo: https://bpa.st/QS2Q
- Family: znver1
- Flags (from GCC documentation):
Code:
AMD Family 17h core based CPUs with x86-64 instruction set support. (This supersets BMI, BMI2, F16C, FMA, FSGSBASE, AVX, AVX2, ADCX, RDSEED, MWAITX, SHA, CLZERO, AES, PCLMUL, CX16, MOVBE, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM, XSAVEC, XSAVES, CLFLUSHOPT, POPCNT, and 64-bit instruction set extensions.)

- Current state of /etc/portage/make.conf:
Code:
# These settings were set by the catalyst build script that automatically
# built this stage.
# Please consult /usr/share/portage/config/make.conf.example for a more
# detailed example.
COMMON_FLAGS="-march=x86-64-v2 -O2 -pipe -mavx -mavx256-split-unaligned-store -mpclmul -mxsave -mxsaveopt"
CFLAGS="${COMMON_FLAGS}"
CXXFLAGS="${COMMON_FLAGS}"
FCFLAGS="${COMMON_FLAGS}"
FFLAGS="${COMMON_FLAGS}"

MAKEOPTS="-j8 -l8"

USE="X grub acpi bash-completion branding cups curl colord dbus git gui hddtemp lm-sensors man ncurses networkmanager pcmcia pcre posix scanner spell systemd udev udisks unicode upower usb vim-syntax x264 -bluetooth -geoip -geolocation -gnome -gnome-keyring -gtk -gtk-doc -handbook -kde -plasma -qt5 -qt6 -semantic-desktop -telemetry -tk -wayland -webkit -wifi"

VIDEO_CARDS="nouveau"

L10N="en fr"

# NOTE: This stage was built with the bindist Use flag enabled

# This sets the language of build output to English.
# Please keep this setting intact when reporting bugs.
LC_MESSAGES=C.utf8

GENTOO_MIRRORS="https://mirrors.ircam.fr/pub/gentoo-distfiles/ \
    https://gentoo.mirrors.ovh.net/gentoo-distfiles/ \
    https://mirrors.soeasyto.com/distfiles.gentoo.org/"

ACCEPT_LICENSE="-* @FREE @BINARY-REDISTRIBUTABLE"

### Binary package setting (host) ###
BINPKG_FORMAT="gpkg"
FEATURES="buildpkg"
EMERGE_DEFAULT_OPTS="${EMERGE_DEFAULT_OPTS} --usepkg-exclude 'sys-kernel/gentoo-sources virtual/* www-servers/lighttpd'"

- Current state of /etc/portage/package.use/00cpu-flags:
Code:
*/* CPU_FLAGS_X86: aes avx mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3

- Output of cpuid2cpuflags:
Code:
CPU_FLAGS_X86: aes avx avx2 f16c fma3 mmx mmxext pclmul popcnt rdrand sha sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3

- Output of resolve-march-native:
Code:
-march=znver1 --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=512


---

As far as I know, I set up the binhost to use only what the Sandy Bridge family is known to support, just as I did for another laptop that is served by the same desktop from a different nspawn container. From the beginning of this nspawn container's life, it has used the parameters set for Sandy Bridge, not Znver1.

Since I'm new to the process, I'm being what I consider "lazy" by packaging everything the binhost installs (in case I forget something; this feels safer to me). The goal is for the client to use the binhost's binaries 100% of the time. So when I install something on the client, I let the make.conf settings make emerge fetch the binaries, and before emerging I check that every package is marked as binary. That was always the case until the broken Lua problem arose.

Since then, some packages have been installed and rebuilt with the client's own settings; -march=native can hardly be wrong there, I guess.

As written above, the binhost's make.conf uses these settings:
Code:
COMMON_FLAGS="-march=x86-64-v2 -O2 -pipe -mavx -mavx256-split-unaligned-store -mpclmul -mxsave -mxsaveopt"


That looked fine to me, given what the GCC page says for sandybridge:
Code:
Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE and PCLMUL instruction set support.


These flags (-mavx -mavx256-split-unaligned-store -mpclmul -mxsave -mxsaveopt) are either in the list above or in the output of a trick someone on #gentoo gave me, which I reported in this topic:

Code:
satanBinhost ~ # diff -u0 <(flags x86-64-v2) <(flags znver1) | cat
--- /dev/fd/63   2024-04-17 17:02:47.127994844 +0200
+++ /dev/fd/62   2024-04-17 17:02:47.131328152 +0200
@@ -12 +12 @@
-  -mabm                             [disabled]
+  -mabm                             [enabled]
@@ -15,2 +15,2 @@
-  -madx                             [disabled]
-  -maes                             [disabled]
+  -madx                             [enabled]
+  -maes                             [enabled]
@@ -29 +29 @@
-  -march=                           x86-64-v2
+  -march=                           znver1
@@ -31,2 +31,2 @@
-  -mavx                             [disabled]
-  -mavx2                            [disabled]
+  -mavx                             [enabled]
+  -mavx2                            [enabled]
@@ -34 +34 @@
-  -mavx256-split-unaligned-store    [disabled]
+  -mavx256-split-unaligned-store    [enabled]
@@ -58,2 +58,2 @@
-  -mbmi                             [disabled]
-  -mbmi2                            [disabled]
+  -mbmi                             [enabled]
+  -mbmi2                            [enabled]
@@ -65 +65 @@
-  -mclflushopt                      [disabled]
+  -mclflushopt                      [enabled]
@@ -67 +67 @@
-  -mclzero                          [disabled]
+  -mclzero                          [enabled]
@@ -78 +78 @@
-  -mf16c                            [disabled]
+  -mf16c                            [enabled]
@@ -83 +83 @@
-  -mfma                             [disabled]
+  -mfma                             [enabled]
@@ -89 +89 @@
-  -mfsgsbase                        [disabled]
+  -mfsgsbase                        [enabled]
@@ -118 +118 @@
-  -mlzcnt                           [disabled]
+  -mlzcnt                           [enabled]
@@ -124 +124 @@
-  -mmovbe                           [disabled]
+  -mmovbe                           [enabled]
@@ -127 +127 @@
-  -mmove-max=                       128
+  -mmove-max=                       256
@@ -132 +132 @@
-  -mmwaitx                          [disabled]
+  -mmwaitx                          [enabled]
@@ -145 +145 @@
-  -mpclmul                          [disabled]
+  -mpclmul                          [enabled]
@@ -151 +151 @@
-  -mprefer-vector-width=            none
+  -mprefer-vector-width=            128
@@ -155 +155 @@
-  -mprfchw                          [disabled]
+  -mprfchw                          [enabled]
@@ -160,2 +160,2 @@
-  -mrdrnd                           [disabled]
-  -mrdseed                          [disabled]
+  -mrdrnd                           [enabled]
+  -mrdseed                          [enabled]
@@ -175 +175 @@
-  -msha                             [disabled]
+  -msha                             [enabled]
@@ -186 +186 @@
-  -msse4a                           [disabled]
+  -msse4a                           [enabled]
@@ -196 +196 @@
-  -mstore-max=                      128
+  -mstore-max=                      256
@@ -204 +204 @@
-  -mtune=                           generic
+  -mtune=                           znver1
@@ -218,4 +218,4 @@
-  -mxsave                           [disabled]
-  -mxsavec                          [disabled]
-  -mxsaveopt                        [disabled]
-  -mxsaves                          [disabled]
+  -mxsave                           [enabled]
+  -mxsavec                          [enabled]
+  -mxsaveopt                        [enabled]
+  -mxsaves                          [enabled]

Code:
satanBinhost ~ # diff -u0 <(flags x86-64-v2) <(flags sandybridge) | cat
--- /dev/fd/63   2024-04-17 17:02:14.401578937 +0200
+++ /dev/fd/62   2024-04-17 17:02:14.404912244 +0200
@@ -29 +29 @@
-  -march=                           x86-64-v2
+  -march=                           sandybridge
@@ -31 +31 @@
-  -mavx                             [disabled]
+  -mavx                             [enabled]
@@ -33,2 +33,2 @@
-  -mavx256-split-unaligned-load    [disabled]
-  -mavx256-split-unaligned-store    [disabled]
+  -mavx256-split-unaligned-load    [enabled]
+  -mavx256-split-unaligned-store    [enabled]
@@ -145 +145 @@
-  -mpclmul                          [disabled]
+  -mpclmul                          [enabled]
@@ -204 +204 @@
-  -mtune=                           generic
+  -mtune=                           sandybridge
@@ -218 +218 @@
-  -mxsave                           [disabled]
+  -mxsave                           [enabled]
@@ -220 +220 @@
-  -mxsaveopt                        [disabled]
+  -mxsaveopt                        [enabled]


If you could double-check this, please, I would appreciate it, because I think the problem lies right here. If Lua hit an illegal instruction then, as far as I understand, wrong flags on the binhost made it incompatible with the Sandy Bridge CPU.
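One way to make that check mechanical is to list every option that the binhost's exact COMMON_FLAGS enable but -march=sandybridge does not. This assumes the flags helper from #gentoo simply wraps gcc -Q --help=target -march=NAME. Its real definition is not shown here, so the reconstruction below is a guess, and enabled_only is a hypothetical helper. An empty result would suggest the binhost flags stay within Sandy Bridge, as far as GCC's per-option report goes. Using -march=native on the client itself for the second dump would be stricter still.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of the "flags" helper from #gentoo (assumption:
# the real one was not posted).
flags() { gcc -Q --help=target -march="$1"; }

# Hypothetical helper: print options reported as [enabled] in the first
# "gcc -Q --help=target" dump but not [enabled] in the second one.
# The NR == FNR trick assumes the first dump is not empty.
enabled_only() {
    awk '
        NR == FNR { if ($2 == "[enabled]") on[$1] = 1; next }
        $2 == "[enabled]" { delete on[$1] }
        END { for (opt in on) print opt }
    ' "$1" "$2" | sort
}

# Example (run on the binhost):
# enabled_only <(gcc -Q --help=target -march=x86-64-v2 -mavx \
#     -mavx256-split-unaligned-store -mpclmul -mxsave -mxsaveopt) \
#     <(flags sandybridge)
```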

I guess the fault is mine, but something is grinding my gears:

How could so many packages be built and installed on the client while only Lua fails? I may be missing some necessary knowledge, because it is really not obvious to me. The client was installed from the binhost's binary packages from the very beginning. If I had misconfigured the binary build process with incorrect flags, I expected that I would not even be able to finish that installation.

I was wrong!

If you need more information, please ask. I tried hard to make all of the above as clear and readable as I could, but it is a large amount of output and it started to confuse me.

Regards,
GASPARD DE RENEFORT Kévin
_________________
«Gentoo does not have problems, only learning opportunities.» - NeddySeagoon
«If your Gentoo installation isn't valuable to you, feel free to continue to ignore the instructions.» - figueroa
Genone
Retired Dev


Joined: 14 Mar 2003
Posts: 9537
Location: beyond the rim

Posted: Wed Apr 17, 2024 4:21 pm

Unfortunately without more information this will turn into a guessing game. While SIGILL is most likely caused by a binary containing an instruction not supported by the current CPU, it can also have other causes (e.g. https://github.com/luau-lang/luau/issues/446 )
So unless you can determine what exactly caused the problem within LUA (which opcode or at least which function) I don't think there is much you can do as from a quick glance your flags seem to be fine. And of course there is always the tiny chance of hardware failure, compiler or kernel bugs, and so on. If you still have the failing binary you could try sth. like https://github.com/baryluk/elf-opcode-stats to check which opcodes are used by it and compare that to the working binary.
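A much cruder alternative to elf-opcode-stats (whose usage is not described here) is to grep the disassembly for mnemonics from extensions that Zen 1 has but Sandy Bridge lacks. This is only a sketch, and suspicious_ops is a hypothetical helper. The mnemonic list is deliberately incomplete: AVX2 versions of ordinary vector instructions, such as vpaddd on ymm registers, slip through. A hit is not proof of a bug either, since libraries such as glibc legitimately ship such code behind runtime CPU detection.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough scan of default "objdump -d" output (raw bytes
# shown) for FMA, AVX2, BMI1/2, MOVBE, F16C, ADX, RDSEED and SHA mnemonics.
suspicious_ops() {
    # objdump -d separates address, raw bytes and instruction with tabs, so
    # field 3 starts with the mnemonic; byte-only continuation lines have no
    # third field and are skipped.
    awk -F'\t' 'NF >= 3 { split($3, f, " "); print f[1] }' |
        grep -E '^(vfn?m(add|sub)|vpbroadcast|vperm(2i128|d|q|pd|ps)$|v(insert|extract)i128$|vpmaskmov|vps(llv|rlv|rav)|vp?gather|andn$|bextr$|bls(i|msk|r)$|bzhi$|pdep$|pext$|rorx$|s(ar|hl|hr)x$|mulx$|movbe$|vcvt(ph2ps|ps2ph)$|ad(cx|ox)$|rdseed$|sha(1|256))' |
        sort -u
}
# Example (the library path is an assumption):
# objdump -d /usr/lib64/liblua5.4.so | suspicious_ops
```

Comparing the result for the failing binary with the result for the working one is the same idea Genone describes, just far less precise.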
kgdrenefort
Apprentice


Joined: 19 Sep 2023
Posts: 195
Location: Somewhere in the 77

Posted: Thu Apr 18, 2024 9:40 am

Genone wrote:
Unfortunately without more information this will turn into a guessing game. While SIGILL is most likely caused by a binary containing an instruction not supported by the current CPU, it can also have other causes (e.g. https://github.com/luau-lang/luau/issues/446 )
So unless you can determine what exactly caused the problem within LUA (which opcode or at least which function) I don't think there is much you can do as from a quick glance your flags seem to be fine. And of course there is always the tiny chance of hardware failure, compiler or kernel bugs, and so on. If you still have the failing binary you could try sth. like https://github.com/baryluk/elf-opcode-stats to check which opcodes are used by it and compare that to the working binary.


Hello, and thanks for this information.

A friend of mine who does sysadmin and development work suggested that I try a package that seems to use Lua heavily: minetest. Maybe it will bring more information, maybe not. It's free to try anyway…

After some discussion, I have thought of a dirty workaround:

If the problem is only with dev-lang/lua (which isn't 100% certain at this point; it is just the first package that caused trouble), I could have the client build only this package from source instead of using a binary, in case the build itself is the real cause behind all this.
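On the client, that workaround could be expressed in make.conf roughly as follows. This is a sketch that assumes the existing EMERGE_DEFAULT_OPTS lines stay as they are and that emerge accepts --usepkg-exclude more than once. On the binhost, --buildpkg-exclude dev-lang/lua would instead stop a Lua binary package from being produced at all.

```shell
# Client /etc/portage/make.conf (sketch): never install dev-lang/lua from a
# binary package; always build it locally with the client's -march=native.
EMERGE_DEFAULT_OPTS="${EMERGE_DEFAULT_OPTS} --usepkg-exclude dev-lang/lua"
```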

As I explained to my friend, it is weird that only this single package hits these illegal instructions, which makes me think my settings may not really be the root of the issue.

I'll dig further into this problem, though, because it is interesting and, if it's a bug somewhere, it would be neat to find it and report it.

I'll come back to this topic after trying elf-opcode-stats from GitHub and looking further into the bug report linked in your reply.

Thanks, as usual.

Regards,
GASPARD DE RENEFORT Kévin
kgdrenefort
Apprentice


Joined: 19 Sep 2023
Posts: 195
Location: Somewhere in the 77

Posted: Fri Apr 19, 2024 11:28 am

Hello,

I did a quick test: Minetest runs great from the binhost's binary.

I'm really starting to think that I simply did something wrong that needed no fix beyond changing the Lua slot, or that I just had bad luck! I mean, Lua was the only problem; everything else is working great so far.

I'll try rebuilding Lua on my binhost, pushing it to the client and switching the slot to the binary package I made, then see how it goes.

If it goes well, I'll simply assume I did something wrong at some point that probably won't happen again, until it does.

If it doesn't, I'll keep looking for the root of this problem and, as a workaround, force Lua to be compiled on the client instead of fetching a broken build.

Regards,
GASPARD DE RENEFORT Kévin
All times are GMT