Gentoo Forums
dhcpcd + wpa_cli: carrier lost [solved]
mv
Watchman

Joined: 20 Apr 2005
Posts: 6747

Posted: Mon Oct 15, 2018 10:21 pm    Post subject: dhcpcd + wpa_cli: carrier lost [solved]

My machine connects to a router using openrc (oldnet) with dhcpcd and wpa_supplicant. (The wireless interface is the only network interface.)
About once per hour - sometimes less, sometimes considerably more often - I see log entries like these:
Quote:
22:34:21 [dhcpcd] wifi0: carrier lost
22:34:21 [wpa_cli] interface wifi0 DISCONNECTED
22:34:21 [wpa_cli] executing 'false /etc/init.d/net.wifi0 --quiet stop' failed
22:34:21 [dhcpcd] wifi0: deleting route to 192.168.0.0/24
22:34:21 [dhcpcd] wifi0: deleting default route via 192.168.0.1
22:35:38 [kernel] wifi0: authenticate with ...
22:35:38 [kernel] wifi0: send auth to ... (try 1/3)
22:35:38 [kernel] wifi0: authenticated
22:35:38 [kernel] wifi0: associate with ... (try 1/3)
22:35:38 [kernel] wifi0: RX AssocResp from ... (capab=0x1431 status=0 aid=1)
22:35:38 [kernel] wifi0: associated
22:35:38 [dhcpcd] wifi0: carrier acquired
22:35:38 [wpa_cli] interface wifi0 CONNECTED

The time between deleting the route and re-authentication can be close to zero, a minute or more (as above), and sometimes even unbounded.
Of course, until the re-authentication happens, the machine is practically disconnected from the network.
At first I suspected a connection problem, but if I manually restart /etc/init.d/net.wifi0, the re-authentication happens very quickly (even if it appeared to hang forever before).
So it looks like something is either misconfigured or not working.
Any idea what might be wrong?


Last edited by mv on Wed Oct 17, 2018 1:41 pm; edited 1 time in total
Anon-E-moose
Watchman

Joined: 23 May 2008
Posts: 6095
Location: Dallas area

Posted: Mon Oct 15, 2018 10:25 pm

power saving option?
_________________
PRIME x570-pro, 3700x, 6.1 zen kernel
gcc 13, profile 17.0 (custom bare multilib), openrc, wayland
mv
Watchman

Joined: 20 Apr 2005
Posts: 6747

Posted: Tue Oct 16, 2018 5:57 am

Anon-E-moose wrote:
power saving option?

I do not understand what you mean. My card is
Quote:
Realtek Semiconductor Co., Ltd. RTL8192CE PCIe Wireless Network Adapter

and I do not see any such option in the corresponding rtlwifi family of drivers in the kernel (nor in any other wlan driver). I have set
Code:
CONFIG_SND_HDA_POWER_SAVE_DEFAULT=20

but I guess that this concerns only the sound card and is unrelated.
mv
Watchman

Joined: 20 Apr 2005
Posts: 6747

Posted: Tue Oct 16, 2018 6:17 am

Actually, I do not understand at all what is going on: what is this strange "carrier" which is lost? Can't I tell dhcpcd to ignore this loss or prolong some timeout instead of immediately deleting the route? And is it really the kernel itself trying to connect, or is it actually dhcpcd retrying all of the time? (But if the latter, why can it happen that this fails for such a long time but works immediately after triggering a net.wifi0 restart [which presumably essentially means restarting dhcpcd]?)

And another observation I made: The "carrier lost" happens quite often immediately when I am visiting webpages which require a quick exchange of information from both sides (e.g. in the forum when logging in or attempting to send a posting).
khayyam
Watchman

Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

Posted: Tue Oct 16, 2018 9:29 am

mv ... the above log is what I'd expect to see when the signal is dropped, you'll get a better idea of why if you set wpa_supplicant to log:

/etc/conf.d/net:
wpa_supplicant_wifi0="-Dnl80211 -dd -f /var/log/wpa_supplicant.log"

You should then see "reason" and a code in wpa_supplicant.log ... and the code can be translated.
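Once the log exists, the reason codes can be tallied rather than read by eye. The following is a minimal Python sketch, assuming the log contains wpa_supplicant's usual "CTRL-EVENT-DISCONNECTED ... reason=N" event lines; the function name is made up for illustration and the exact line layout may differ between versions.

```python
import re
from collections import Counter

# Assumed line shape (illustrative, not taken from this thread's log):
#   wifi0: CTRL-EVENT-DISCONNECTED bssid=00:11:22:33:44:55 reason=6
REASON_RE = re.compile(r"CTRL-EVENT-DISCONNECTED\b.*?\breason=(\d+)")


def count_disconnect_reasons(lines):
    """Return a Counter mapping reason code -> number of disconnect events."""
    counts = Counter()
    for line in lines:
        match = REASON_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

To use it on the file configured above, something like `count_disconnect_reasons(open("/var/log/wpa_supplicant.log", errors="replace"))` should do; `.most_common()` on the result lists the most frequent reason first.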

mv wrote:
Actually, I do not understand at all what is going on: what is this strange "carrier" which is lost? Can't I tell dhcpcd to ignore this loss or prolong some timeout instead of immediately deleting the route?

The carrier is the (radio) link between the STA and the AP; if this is lost then it needs to be re-established. Other protocols (such as dhcp) expect this link to be continuous, and will react (i.e., by marking the network as disestablished) if it is absent. If the carrier is re-established (which is what /etc/wpa_supplicant/wpa_cli attempts to do on receiving 'DISCONNECTED') then dhcp will again request the lease, and so bring the network back up. You want this to happen because this is the only way to recover from carrier loss.

mv wrote:
And is it really the kernel itself trying to connect, or is it actually dhcpcd retrying all of the time? (But if the latter, why can it happen that this fails for such a long time but works immediately after triggering a net.wifi0 restart [which presumably essentially means restarting dhcpcd]?)

The kernel doesn't do anything except control the hardware; it's wpa_supplicant that does all the supplication. wpa_supplicant will attempt to reconnect, but if there is some kind of radio interference, or if the driver has some bug, wpa_supplicant may not see the AP, and so marks it as unavailable (for an allotted time period). It then moves on to other network{} stanzas, attempting to find an AP. A driver bug can cause this process to hang, while a restart will reset the driver and/or wpa_supplicant.

The causes may be a shitty wireless driver/firmware (most likely), or radio interference (from adjacent networks), so you should provide 'modinfo $driver', and some details of the network (ie, G,N,mixed), as well as the wpa_supplicant.log (can be big, best to pastebin).

mv wrote:
And another observation I made: The "carrier lost" happens quite often immediately when I am visiting webpages which require a quick exchange of information from both sides (e.g. in the forum when logging in or attempting to send a posting).

It would be hard to attribute the cause in this way: some issues are triggered when idling but do not appear when pulling a continuous stream, while others you can't see because they are hidden in firmware (all you see is the effect). I'd expect you to see some 'missed beacon' or 'beacon loss', or your 'signal level' to be high (which equates to tighter packed frames ... and so more errors) ... whatever the case, it's hard to differentiate between environmental noise (as the cause) and poor driver/firmware. You should also provide the output of 'iw dev wifi0 station dump'.

best ... khay
UberLord
Retired Dev

Joined: 18 Sep 2003
Posts: 6835
Location: Blighty

Posted: Tue Oct 16, 2018 9:36 am

Carrier lost means that the network interface is reporting that the connection to the other end is down for whatever reason.
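The flag the interface reports can also be inspected directly. A minimal Python sketch, assuming the standard Linux sysfs file `/sys/class/net/<iface>/carrier`; the helper name and the `sysfs_root` parameter are hypothetical and exist only to make the function easy to test.

```python
from pathlib import Path


def read_carrier(iface, sysfs_root="/sys/class/net"):
    """Return True if the link is up, False if it is down, None if unknown.

    The kernel exposes the carrier flag as "1" or "0" in
    <sysfs_root>/<iface>/carrier. Reading it can fail (for example while
    the interface is administratively down); that case is reported as None.
    """
    try:
        text = (Path(sysfs_root) / iface / "carrier").read_text().strip()
    except OSError:
        return None
    if text == "1":
        return True
    if text == "0":
        return False
    return None
```

Calling `read_carrier("wifi0")` while the log shows "carrier lost" would be expected to return False (or None), and True again after "carrier acquired".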

You can tell dhcpcd to ignore this by using the "nolink" directive in /etc/dhcpcd.conf.
However, the kernel will still think the network is down and traffic won't actually flow.
As such, whilst you no longer see carrier lost messages (from dhcpcd at least) you have the false sense that the network is fine when really it isn't.

Now, there are very good reasons for removing routes and addresses on carrier down. All modern network management programs do this today.
The *only* exception to this is dhcpcd on NetBSD because we have DAD (Duplicate Address Detection) in the kernel there.
_________________
Use dhcpcd for all your automated network configuration needs
Use dhcpcd-ui (GTK+/Qt) as your System Tray Network tool
mv
Watchman

Joined: 20 Apr 2005
Posts: 6747

Posted: Tue Oct 16, 2018 10:02 am

@all posters: Thanks a lot for the explanations. So it really seems that the kernel alone is responsible for the connection loss and for (sometimes not) switching on again.

I will try the log option and report back if I find something.
khayyam
Watchman

Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

Posted: Tue Oct 16, 2018 10:34 am

mv wrote:
@all posters: Thanks a lot for the explanations. So it really seems that the kernel alone is responsible for the connection loss and for (sometimes not) switching on again.

mv ... you're welcome. Depending on the driver you may be able to turn off certain features (like 802.11n, hwcrypt, etc.); the output of 'modinfo' will show these options as 'parm', and what to provide in modprobe.d/*.conf as the switch (bool, or what-have-you). Search the forum for card/parm, as there are many reports here of issues similar to yours.

mv wrote:
I will try the log option and report back if I find something.

I was meaning to provide a link to deauthentication reason codes ... but forgot :)
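For reference, the low-numbered reason codes can be kept in a small lookup table. This is a sketch only: the descriptions follow the IEEE 802.11 reason-code list as commonly reproduced (wording paraphrased), only codes 1 to 9 are included, and the helper name is made up for illustration.

```python
# Reason codes 1-9 from the IEEE 802.11 deauthentication/disassociation
# reason-code list (descriptions paraphrased; higher codes omitted).
REASON_CODES = {
    1: "Unspecified reason",
    2: "Previous authentication no longer valid",
    3: "Deauthenticated because sending STA is leaving (or has left) the IBSS or ESS",
    4: "Disassociated due to inactivity",
    5: "Disassociated because AP is unable to handle all currently associated STAs",
    6: "Class 2 frame received from nonauthenticated STA",
    7: "Class 3 frame received from nonassociated STA",
    8: "Disassociated because sending STA is leaving (or has left) the BSS",
    9: "STA requesting (re)association is not authenticated with responding STA",
}


def describe_reason(code):
    """Translate a numeric reason code into its description."""
    return REASON_CODES.get(code, f"Reason {code} (not in this table)")
```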

best ... khay
mv
Watchman

Joined: 20 Apr 2005
Posts: 6747

Posted: Tue Oct 16, 2018 3:29 pm

Of course, now it took a long time until the first failure, but finally I have some result:

The "bad" reason for disconnecting appears to be
Code:
Reason 6 (Class 2 frame received from nonauthenticated STA)

Occasionally, there is also Reason 4 (disassociated due to inactivity), but the latter only happens when the machine is idle for a while (hence, the disconnection is perhaps correct, although I do not see the meaning if it connects immediately again).
However, I completely fail to understand what this reason 6 above means.
Maybe some of my iptables rules are too strict? OTOH, I observed no tcp or udp blocks at the relevant times (I log them all), and I block only icmptype 11 and 3 (but letting codes 4,9,10,13 pass). Maybe icmptype 11 must not be blocked?

Concerning power saving, I obtained
modinfo rtl8192ce wrote:
parm: swenc:Set to 1 for software crypto (default 0)
parm: ips:Set to 0 to not use link power save (default 1)
parm: swlps:Set to 1 to use SW control power save (default 0)
parm: fwlps:Set to 1 to use FW control power save (default 1)
parm: aspm:Set to 1 to enable ASPM (default 1)
parm: debug_level:Set debug level (0-5) (default 0) (int)
parm: debug_mask:Set debug mask (default 0) (ullong)

I will retry with aspm and fwlps set to 0 (not sure what I should do with swlps).
Anon-E-moose
Watchman

Joined: 23 May 2008
Posts: 6095
Location: Dallas area

Posted: Tue Oct 16, 2018 3:37 pm

re aspm - https://lkml.org/lkml/2018/2/14/290

Edit to add: looking at google it seems that card is *ahem* a little crappy *ahem*
khayyam
Watchman

Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

Posted: Tue Oct 16, 2018 5:29 pm

mv wrote:
Code:
Reason 6 (Class 2 frame received from nonauthenticated STA)

mv ... that is bleed-over from an adjacent network. If you have access to the AP, and so can change channel, you might scan (using airodump-ng from net-wireless/aircrack-ng) for what is operating on what frequency, and choose another channel for your AP. The further separated adjacent networks are from your channel/signal the better. That can be difficult because the spectrum in most areas is oversubscribed: a neighbour need only buy a signal-boosting antenna and it can cause havoc with your signal if both are operating on the same, or an adjacent, channel. If you have 5 GHz hardware then disabling 2.4 GHz may get you a less congested part of the spectrum, but 5 GHz is fairly common now, and you may just be running into more of the same, only on a different part of the spectrum.

mv wrote:
Occasionally, there is also Reason 4 (disassociated due to inactivity), but the latter only happens when the machine is idle for a while (hence, the disconnection is perhaps correct, although I do not see the meaning if it connects immediately again).

This can happen if/when the card goes into powersave, but only a buggy driver will cause the network to drop (it should be able to reduce power consumption when idle, but not so far as to cause the carrier to be dropped). The rtl8192ce seems to have a number of options in this regard (fw, sw, link). I can't say which combination will be best, but as both ips and fwlps are currently enabled by default you might try disabling one or the other.

mv wrote:
However, I completely fail to understand what this reason 6 above means. Maybe some of my iptables rules are too strict? OTOH, I observed no tcp or udp blocks at the relevant times (I log them all), and I block only icmptype 11 and 3 (but letting codes 4,9,10,13 pass). Maybe icmptype 11 must not be blocked?

It's on the link layer (802.11 management frames), so nothing to do with iptables (which filters at the network and transport layers).

best ... khay
josephg
l33t

Joined: 10 Jan 2016
Posts: 783
Location: usually offline

Posted: Tue Oct 16, 2018 11:43 pm

Code:
$ modinfo -p mac80211
max_nullfunc_tries:Maximum nullfunc tx tries before disconnecting (reason 4). (int)
max_probe_tries:Maximum probe tries before disconnecting (reason 4). (int)
beacon_loss_count:Number of beacon intervals before we decide beacon was lost. (int)
probe_wait_ms:Maximum time(ms) to wait for probe response before disconnecting (reason 4). (int)
ieee80211_default_rc_algo:Default rate control algorithm for mac80211 to use (charp)

You could play around with the parameters of the mac80211 module, if you suspect frequent unnecessary disconnections.
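Before changing anything it helps to know the current values. A minimal Python sketch, assuming the module exports its parameters under `/sys/module/<module>/parameters/` (the usual sysfs location); the function name and the `sysfs_root` argument are hypothetical.

```python
from pathlib import Path


def read_module_params(module, sysfs_root="/sys/module"):
    """Return {parameter name: current value} for a loaded kernel module.

    Only parameters the module exports through sysfs appear; an unloaded
    module, or one without exported parameters, yields an empty dict.
    """
    params_dir = Path(sysfs_root) / module / "parameters"
    values = {}
    if not params_dir.is_dir():
        return values
    for entry in sorted(params_dir.iterdir()):
        try:
            values[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters are not readable by every user; skip those.
            continue
    return values
```

`read_module_params("mac80211")` would then be expected to include keys such as `beacon_loss_count` and `probe_wait_ms` on a system where the module is loaded.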

It might be that your AP is actually disconnecting you, perhaps after a specified timeout. There is nothing you could do from the client side, other than look at your wifi router config.
_________________
"Growth for the sake of growth is the ideology of the cancer cell." Edward Abbey
mv
Watchman

Joined: 20 Apr 2005
Posts: 6747

Posted: Wed Oct 17, 2018 5:23 am

With the setting
Code:
ips=0 swlps=0 fwlps=0 aspm=0

reason 4 dropped to a single occurrence that night, and reason 6 did not occur.
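To make such settings persist across reboots they are normally written as an "options" line in a file under /etc/modprobe.d/. The Python sketch below only formats that line; the helper name is made up for illustration, and the file name (say, /etc/modprobe.d/rtl8192ce.conf) is an assumption.

```python
def modprobe_options_line(module, params):
    """Format an 'options' line for a file under /etc/modprobe.d/."""
    settings = " ".join(f"{name}={value}" for name, value in params.items())
    return f"options {module} {settings}"


# The four rtl8192ce parameters tried in this post, all switched off.
line = modprobe_options_line(
    "rtl8192ce", {"ips": 0, "swlps": 0, "fwlps": 0, "aspm": 0}
)
print(line)
```

The module has to be reloaded (or the machine rebooted) before new values take effect.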

However, it was during the night, when perhaps some neighbors switched off their routers. I am afraid that reason 6 is really physical: it seems my card can only do 2.4 GHz, and it can see 10-15 SSIDs. If each SSID really needs its own channel, I suppose that my router simply changes frequency regularly (it was set to "auto"), and that the delay occurs while it finds a new channel (or that sometimes there is no new channel at all). The only thing I do not understand now is why a hung reconnection speeds up when I restart net.wifi0 manually.

I will try now a random fixed frequency on the router, hoping that the other nearby routers will detect and avoid it...
khayyam
Watchman

Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

Posted: Wed Oct 17, 2018 11:38 am

mv wrote:
However, it was during the night, when perhaps some neighbors switched off their routers. I am afraid that reason 6 is really physical: it seems my card can only do 2.4 GHz, and it can see 10-15 SSIDs. If each SSID really needs its own channel, I suppose that my router simply changes frequency regularly (it was set to "auto"), and that the delay occurs while it finds a new channel (or that sometimes there is no new channel at all).

mv ... "10-15 SSIDs" would be extremely low. This is probably all that NetworkManager, or wpa_gui, shows, but I expect in excess of 50 APs within beacon range. That is not necessarily an issue; it's those APs that are close, and on the same or adjacent channels, that would cause you problems. Understand that these "auto" settings are manufacturers attempting to deal with an oversubscribed spectrum. They are a blunt tool; you need to look at the spectrum yourself and analyse what is where (in relation to the AP, and clients) and what (static) channel is going to provide the best separation. If "auto" is flipping between channels on such a regular basis then it's almost certainly non-optimal.

If the situation can't be improved then there are other tricks you might be able to use, such as blocking the unwanted signals with strategically placed radio reflectors/blockers (otherwise known as aluminium foil ;) ... or using a directional/parabolic antenna to focus the AP's signal in a certain direction (increasing the signal quality, and lessening the noise). These you can make with foil, cardboard, and glue ... or you could buy a 2.4 GHz 5 dBi antenna to replace the 1 dBi one that your (probably ISP supplied) wireless-router came with.

mv wrote:
The only thing I do not understand now is why a hung reconnection speeds up when I restart net.wifi0 manually.

I've never had an AP with an "auto" setting but I expect what happens is this: when "reconnecting" wpa_supplicant doesn't scan to see if the AP is still on the same channel, it's expected that all that was lost was the carrier, not that the AP decided to change channel. However, it does scan channels looking for SSID on execution (unless 'freq=' is provided in the network stanza).

mv wrote:
I will try now a random fixed frequency on the router, hoping that the other nearby routers will detect and avoid it...

Best to look at the spectrum and make the decision about a fixed channel from that.
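One rough way to turn a scan into a decision is to score each candidate channel by how much nearby traffic overlaps it. The Python sketch below is a heuristic under stated assumptions: 2.4 GHz channel centres are 5 MHz apart, a 20 MHz signal is treated as overlapping up to four channels on either side, and both the linear overlap weight and the signal weighting are invented for illustration. The scan data is assumed to be (channel, signal in dBm) pairs copied by hand from a tool such as airodump-ng.

```python
def overlap_weight(channel_a, channel_b):
    """Rough overlap between two 2.4 GHz channels (1.0 same, 0.0 clear).

    Channels five or more apart are treated as non-overlapping, which is
    where the usual 1/6/11 advice comes from.
    """
    distance = abs(channel_a - channel_b)
    return max(0.0, 1.0 - distance / 5.0)


def rank_channels(observed, candidates=range(1, 14)):
    """Rank candidate channels by estimated interference, best first.

    observed: list of (channel, signal_dbm) pairs for neighbouring APs.
    Stronger (less negative) signals count more; this weighting is a
    heuristic for the sketch, not a standard metric.
    """
    def score(candidate):
        total = 0.0
        for channel, signal_dbm in observed:
            strength = max(0.0, 100.0 + signal_dbm)  # -100 dBm -> 0, -40 dBm -> 60
            total += overlap_weight(candidate, channel) * strength
        return total

    return sorted(candidates, key=score)
```

For example, with one strong neighbour on channel 1 and two weaker ones on channel 6, `rank_channels([(1, -50), (6, -60), (6, -70)])` puts channel 11 first and channel 6 last.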

best ... khay
mv
Watchman

Joined: 20 Apr 2005
Posts: 6747

Posted: Wed Oct 17, 2018 1:41 pm

Thanks for your help.

The SSIDs I mentioned were from wpa_supplicant.log. Their frequencies are logged in wpa_supplicant.log, too, but I do not know the relation between the frequencies and the 13 channels I can choose in my router for 2.4 GHz.

Anyway, it seems my "random" choice of the channel was fine: I have had no "carrier lost" message since then, i.e. for 8 hours. So I guess I can mark the topic as "solved".
Anon-E-moose
Watchman

Joined: 23 May 2008
Posts: 6095
Location: Dallas area

Posted: Wed Oct 17, 2018 2:03 pm

I made my router stick to a single channel a while back instead of auto.

I have a tablet, and one of the apps (I don't remember which) lets me see the differences between the channels' strengths and conflicts with other routers' channels.
khayyam
Watchman

Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

Posted: Wed Oct 17, 2018 2:26 pm

mv wrote:
Thanks for your help.

mv ... you're welcome.

mv wrote:
The SSIDs I mentioned were from wpa_supplicant.log. Their frequencies are logged in wpa_supplicant.log, too, but I do not know the relation between the frequencies and the 13 channels I can choose in my router for 2.4 GHz.

You'd see more ESSIDs/BSSIDs in airodump-ng as it doesn't care about signal quality, only whether it sees a beacon. You can find a frequency/channel correspondence here (plus other useful info re overlap, etc).
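For the 2.4 GHz band the correspondence is simple arithmetic: channels 1 to 13 are centred at 2412, 2417, ..., 2472 MHz (5 MHz apart), and channel 14 sits apart at 2484 MHz. A minimal Python sketch of that mapping, with a made-up function name:

```python
def freq_to_channel(freq_mhz):
    """Map a 2.4 GHz centre frequency in MHz to its channel number.

    Channels 1-13 follow channel = (freq - 2407) / 5; channel 14
    (2484 MHz, only permitted in Japan) does not follow the pattern.
    """
    if freq_mhz == 2484:
        return 14
    if 2412 <= freq_mhz <= 2472 and (freq_mhz - 2407) % 5 == 0:
        return (freq_mhz - 2407) // 5
    raise ValueError(f"{freq_mhz} MHz is not a 2.4 GHz channel centre")
```

So a frequency of 2437 in wpa_supplicant.log corresponds to channel 6 in the router's menu, and 2462 to channel 11.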

mv wrote:
Anyway, it seems my "random" choice of the channel was fine: I have had no "carrier lost" message since then, i.e. for 8 hours. So I guess I can mark the topic as "solved".

OK, good. If you have airodump-ng you can use the '--channel,-c' switch and look at what else is on the currently selected channel ... if you wanted to verify this as a good option (you should see your ESSID/BSSID as having the strongest signal, and so top of the list).

best ... khay
mv
Watchman

Joined: 20 Apr 2005
Posts: 6747

Posted: Fri Oct 19, 2018 9:57 pm

The fixed frequency turned out to be a disaster: there was a time when I wasn't able to connect at all.
After switching back to "auto" mode with my smartphone(!), it first worked fine, but eventually there was an apparent "rush hour" where hell broke loose: All "random" sort of reasons (1, 3, 4, 6, ...) within a few minutes, and I was never getting to a stable connection.

I found that my router can show traffic by itself, and when the chaos started there were 60 different traffic signals on 2.4 GHz - apparently no chance to get a stable connection.
OTOH, there were only 10 traffic signals on 5 GHz.

I have now bought a dual-band network card, and currently everything seems to work fine with it. I guess it will work at least until my neighbors arm, too ;)
Anon-E-moose
Watchman

Joined: 23 May 2008
Posts: 6095
Location: Dallas area

Posted: Fri Oct 19, 2018 10:00 pm

Yeah, 2.4 GHz can get overpopulated. 5 GHz should be better: the frequency is higher but the range is shorter, which cuts down on some interference.
All times are GMT