[Solved] >=wpa_supplicant-2.10 panics my kernels

sublogic · Last edited by sublogic on Thu Apr 21, 2022 11:16 pm; edited 2 times in total

EDIT: deagol gave a patch in post 8701895 that will probably make it into the kernel tree. Marking as solved.

I've been affected by the recent wpa_supplicant issue as well (thanks jburns for the tip about tkip). But on my old, old laptop it's worse. How old?

sublogic · Posted: Fri Mar 25, 2022 3:42 am

Progress. I got a crash dump. Here's the dmesg with the panic, starting at the point where I started net.wlp8s9. Now to teach myself kernel debugging.

Gentlenoob · n00b Joined: 10 Apr 2008 Posts: 66

Guess I won't be of much help, just got triggered by having an old laptop (Samsung Q35) with similar (same?) processor, fairly up to date, in particular wpa_supplicant-2.10-r1 with tkip set now, although not in use. Works just fine, WiFi hardware seems different, though (Intel 3945ABG).

This I've never seen

Hu · Moderator Joined: 06 Mar 2007 Posts: 21631

That kernel stack trace might be more useful with verbose debug information, so that we could see file and line number details.

Or you could ignore it and just disable TKIP. No one should be using TKIP if avoidable. I see in your earlier post that disabling TKIP caused other problems, but fixing those might be easier than debugging this kernel crash.

drvolk68 · n00b Joined: 26 Mar 2022 Posts: 1

I just want to let you know that I am having the same issue (no kernel crash, but no AP available when scanning) on my desktop and my notebook. Both have Ryzen CPUs; maybe this has something to do with it? I also use ~amd64 as keyword in make.conf and a hardened Gentoo kernel. I had to install the libressl overlay to get back to version 2.9, which works without any problem.

UPDATE:
Just noticed that there is another thread for exactly the issue I have. There the solution was to set the tkip USE flag (it seems that old routers or so need that .. I did not really understand).

Hu · Moderator Joined: 06 Mar 2007 Posts: 21631

sublogic · Posted: Sun Mar 27, 2022 12:00 am

sublogic · Posted: Sun Mar 27, 2022 4:39 am

Sigh. I'm doing everything right. The kernel has debugging info, I got a post-panic dump (vmcore) with kexec, and I pulled the dmesg with the vmcore-dmesg utility, but the panic traceback at the end has no line numbers. To do more I need the crash utility. So I emerged it, but:

Hu · Moderator Joined: 06 Mar 2007 Posts: 21631

divide error may mean that the kernel attempted a division by zero. A cursory inspection of the affected code shows there are several sites which use % where the denominator comes from a variable. One such, which could even match your output, is rtl8180_tx, which will take:
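To make that failure mode concrete, here is a minimal user-space sketch, not the driver's code. It assumes a ring whose size field can be zero when the ring was never initialized; the struct `tx_ring`, its fields, and `next_tx_slot()` are hypothetical names made up for illustration. On x86 an integer `%` with a zero denominator raises exactly this kind of divide error, so the sketch checks the size before computing the slot.

```c
#include <stddef.h>

/* Hypothetical stand-in for a driver's transmit ring; not the real struct. */
struct tx_ring {
    unsigned int idx;      /* index of the oldest in-flight descriptor */
    unsigned int entries;  /* ring size; 0 if the ring was never set up */
};

/*
 * Compute the next free slot as (idx + queued) % entries.
 * Returns 0 and stores the slot in *slot, or -1 when entries is 0,
 * because the modulo would then be a division by zero.
 */
int next_tx_slot(const struct tx_ring *ring, unsigned int queued,
                 unsigned int *slot)
{
    if (ring == NULL || slot == NULL || ring->entries == 0)
        return -1;
    *slot = (ring->idx + queued) % ring->entries;
    return 0;
}
```

With a ring of 16 entries, `idx` 14 and 3 frames queued, the slot wraps around to 1; with `entries` equal to 0 the function refuses instead of dividing.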

sublogic · Posted: Sun Mar 27, 2022 10:01 pm

Hu · Moderator Joined: 06 Mar 2007 Posts: 21631

Rutcha · Posted: Tue Mar 29, 2022 1:12 am

same here for ' Intel(R) Centrino(R) Advanced-N 6230 AGN, REV=0xB0 '

sublogic · Posted: Wed Mar 30, 2022 1:35 am

sublogic · Posted: Wed Mar 30, 2022 3:19 am

Okay, here's what I found so far using the crash and objdump utilities. The start of the panic in dmesg:

Hu · Moderator Joined: 06 Mar 2007 Posts: 21631

Rutcha · Posted: Thu Mar 31, 2022 7:06 pm

I'm sorry! My problem was not wpa_supplicant related.

I was brief at first, but only because I didn't want to disturb this thread much. I actually panicked myself after updating wpa_supplicant and having no wifi/internet and not being able to google myself out of it.
I'm sorry!
I did not have any kernel panic; I was only unable to connect to wifi. wpa_supplicant only echoed connection attempts and failures ( wpa_supplicant -i wlp1s0 -c /etc/wpa_supplicant/wpa_supplicant.conf )
I tried a USB dongle after I posted here and still couldn't get a connection, so it was unlikely to be a hardware issue.
It turns out everything went back to normal after I got rid of a custom wireless-regdb database I had previously tampered with to allow use of a wider frequency spectrum. I suppose I had to update that table too after updating wpa_supplicant, but anyway, for simplicity I changed back the USE flags for wpa_supplicant and my kernel .config. I don't know for sure what made it work, but I decided to be more conservative in all of this and something did make it go back to normal. First my wifi but not 5 GHz, then everything.

Sorry again and thanks for being prompt.
Have a nice day!

sublogic · Posted: Fri Apr 01, 2022 8:26 pm

Quick update to post 8696272: I found the "dev" argument to rtl8180_tx(). The kernel is compiled with -mregparm=3, so the arguments are passed in registers, not on the stack where I was looking for them. Anyway, the dev struct looks plausible. I won't post it as it is pretty big. The problem is this:

sublogic · Posted: Sat Apr 16, 2022 5:15 am

Small progress. I obtained a new crash dump from a 5.15.32-gentoo-r1 kernel. In the crashing function,

deagol · n00b Joined: 12 Jul 2014 Posts: 61

Quite informative how to debug kernel crashes!

But to me it looks like this question is already (mostly) answered:

sublogic · Posted: Sun Apr 17, 2022 3:11 am

deagol · n00b Joined: 12 Jul 2014 Posts: 61

We can also disable the control port in mac80211, if you prefer kernel patches. I have not tested it, but removing these lines in net/mac80211/main.c should do the trick:

deagol · n00b Joined: 12 Jul 2014 Posts: 61

I had a closer look at the code and I think I found the issue. I'm not sure that this is the proper fix, but it aligns control port frames with how "normal" packets are handled.
To me it looks like that is also wrong: WME and the mac80211 pull API are different things, and it currently looks like only mac80211 drivers implementing the pull API are able to use WME (QoS) correctly.

Nevertheless this patch should fix the issue if my understanding of what happens is right:

sublogic · Posted: Mon Apr 18, 2022 6:52 pm

Okay. I'll put the kernel under revision control and try your patch.

Incidentally, "iperf3 -S 0xE0 -c <IP address>" doesn't crash. Sorry.

sublogic · Posted: Tue Apr 19, 2022 2:52 am

@deagol: what do you know, the patch worked !

We must be using different kernel versions. I have 5.15.32-r1 . I had to apply the patch 34 lines earlier than in your diff, but with a two-liner that was easy.

I'd like to follow the thread on the wireless mailing list. Please post a link.

THANK YOU ! Good work.

deagol · n00b Joined: 12 Jul 2014 Posts: 61

Things are not as complex as I initially assumed. mac80211, and anybody else who wants to, is allowed to set the skb priority as it desires; the driver just should not select an unavailable queue based on that.

Can you undo all previous patches and test if this also fixes the issue?
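To illustrate the rule that a driver should never pick a queue it does not have, here is a rough user-space sketch. It is not the kernel patch, and everything in it is an assumption: `pick_tx_queue()` is a hypothetical helper, and falling back to the last available queue is an arbitrary choice made for this example.

```c
/*
 * Hypothetical helper: choose a transmit queue from a requested index.
 *   requested - queue index derived from the frame's priority
 *   available - number of queues the hardware actually set up
 * Returns the queue to use, or -1 if no queue exists at all.
 */
int pick_tx_queue(unsigned int requested, unsigned int available)
{
    if (available == 0)
        return -1;                    /* nothing to transmit on */
    if (requested >= available)
        return (int)(available - 1);  /* clamp to an existing queue */
    return (int)requested;
}
```

With four queues, a request for queue 3 is honoured and a request for queue 5 is clamped to queue 3, so whoever set the priority cannot steer the driver to an uninitialized ring.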