Author |
Message |
chrisk2305 Tux's lil' helper
Joined: 05 Sep 2007 Posts: 110
|
Posted: Mon Dec 12, 2016 12:29 pm Post subject: Bonding stopped working after power outage |
|
|
Hi,
I have had a problem with my dual 10GbE NIC and bonding since a power outage a few days ago. After a cold boot I noticed that my server had no connectivity, although the bond was brought up normally. I checked the switch config and everything seemed fine. I cleared the MAC address table on the switch and rebooted, but still no luck.
I haven't seen any errors in the log.
I could ping other devices in the same subnet, but I was not able to reach the gateway. I did a traceroute and it took 3000 ms (!) to reach the gateway. Then I disconnected one of the two fibre cables and voilà: internet etc. was working again.
Do you guys have any idea what the problem could be?
Thanks in advance,
Christian |
|
|
|
szatox Advocate
Joined: 27 Aug 2013 Posts: 3135
|
Posted: Mon Dec 12, 2016 7:52 pm Post subject: |
|
|
I suppose that bond is connected with at least 2 wires to a managed switch.
Is the same mode configured on both ends of the link? A mismatch here will only let it work by accident (so you should have _sort_of_ connectivity), but the packet loss that can occur in such a scenario would make any smart protocol retransmit at a reduced rate, then reduce the rate and retransmit again, and so on.
Just a guess.
Providing some more details on your setup and pointing out which devices were affected by the power outage could allow for another guess.
Also, do you often restart pieces of your equipment? Perhaps you hotfixed the setup at runtime on some device and forgot to make the change permanent. |
|
|
|
chrisk2305 Tux's lil' helper
Joined: 05 Sep 2007 Posts: 110
|
Posted: Tue Dec 13, 2016 8:18 am Post subject: |
|
|
Hi,
Sorry, I did not provide enough information. Yes, the bond consists of two LC cables with the appropriate SFP+ modules. It has been working for 6 months without a problem.
The switch is a D-Link DGS-1510-28X and the NIC in the server is a dual-port 10GbE card with a Broadcom chipset (NetXtreme II driver). The bond is configured via netctl.
Here is the output of the bonding status with one NIC (eth3) disconnected:
Code: |
cat /proc/net/bonding/bond4
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 74:d0:2b:98:c2:25
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 1
Actor Key: 13
Partner Key: 1
Partner Mac Address: 00:00:00:00:00:00
Slave Interface: eth2
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 74:d0:2b:98:c2:25
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: churned
Actor Churned Count: 0
Partner Churned Count: 1
details actor lacp pdu:
system priority: 65535
system mac address: 74:d0:2b:98:c2:25
port key: 13
port priority: 255
port number: 1
port state: 77
details partner lacp pdu:
system priority: 65535
system mac address: 00:00:00:00:00:00
oper key: 1
port priority: 255
port number: 1
port state: 1
Slave Interface: eth3
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 1
Permanent HW addr: 74:d0:2b:98:c2:27
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 2
Partner Churned Count: 2
details actor lacp pdu:
system priority: 65535
system mac address: 74:d0:2b:98:c2:25
port key: 0
port priority: 255
port number: 2
port state: 69
details partner lacp pdu:
system priority: 65535
system mac address: 00:00:00:00:00:00
oper key: 1
port priority: 255
port number: 1
port state: 1
|
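The `port state` values in the status output above (77, 69 and 1) are bit fields. A minimal Python sketch for decoding them follows, assuming the values are printed in decimal and use the LACP state bit layout from IEEE 802.3ad / 802.1AX. The function name `decode_port_state` and the flag labels are illustrative, not part of any tool.

```python
# LACP port-state flags, least significant bit first (IEEE 802.3ad / 802.1AX).
# The short labels are illustrative names chosen for this sketch.
LACP_STATE_BITS = (
    "activity",         # 0x01: port sends LACPDUs actively
    "short_timeout",    # 0x02: short (fast) timeout in use
    "aggregation",      # 0x04: link may be aggregated
    "synchronization",  # 0x08: port is in sync with its aggregator
    "collecting",       # 0x10: receiving frames is enabled
    "distributing",     # 0x20: sending frames is enabled
    "defaulted",        # 0x40: default partner info in use (no LACPDU received)
    "expired",          # 0x80: receive state machine has expired
)


def decode_port_state(value: int) -> list[str]:
    """Return the names of the flags set in a decimal `port state` value."""
    return [name for bit, name in enumerate(LACP_STATE_BITS) if value & (1 << bit)]


if __name__ == "__main__":
    # Values copied from the status output above.
    for state in (77, 69, 1):
        print(state, decode_port_state(state))
```

Under these assumptions both actor states (77 and 69) have the `defaulted` flag set, which would be consistent with the all-zero partner MAC address: the host does not appear to be receiving LACPDUs from the switch on either port.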
Here is the netctl service file:
Code: |
Description='Bond Interface'
Interface='bond4'
Connection=bond
BindsToInterfaces=('eth2' 'eth3')
IP=static
Address=('192.168.1.2/24')
Gateway=('192.168.1.1')
DNS=('192.168.1.1')
|
Kernel Options in grub.conf
Code: |
title Gentoo Linux 4.6.4
root (hd0,0)
kernel /boot/vmlinuz-4.6.4-gentoo root=/dev/md125 init=/usr/lib/systemd/systemd bonding.mode=4 bonding.miimon=100
|
Thanks! |
|
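Since the bonding mode here is set through kernel options rather than in the netctl profile, it can be worth confirming which mode the running bond actually uses. A small Python sketch, assuming the bonding driver's sysfs file `/sys/class/net/<bond>/bonding/mode` reports the mode as a name followed by its number (for example `802.3ad 4`). The helper names are hypothetical.

```python
from pathlib import Path


def parse_bonding_mode(raw: str) -> tuple[str, int]:
    """Split a sysfs mode string such as '802.3ad 4' into (name, number)."""
    name, number = raw.split()
    return name, int(number)


def read_bonding_mode(bond: str = "bond4") -> tuple[str, int]:
    """Read the active mode of a bond from sysfs (Linux only, bond must exist)."""
    raw = Path(f"/sys/class/net/{bond}/bonding/mode").read_text()
    return parse_bonding_mode(raw)


if __name__ == "__main__":
    # Parsing a sample string so the sketch also runs on machines without a bond.
    print(parse_bonding_mode("802.3ad 4\n"))
```

On the server itself, `read_bonding_mode("bond4")` would be the call to use. This only verifies the host side; the switch ports have to be configured for the matching mode separately.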
|
|
bbgermany Veteran
Joined: 21 Feb 2005 Posts: 1844 Location: Oranienburg/Germany
|
Posted: Tue Dec 13, 2016 2:22 pm Post subject: |
|
|
Hi,
Have you checked whether your switch still has a valid bond/LACP/EtherChannel/port-channel configuration on the ports your server is attached to?
Greets, bb |
|
|
|
chrisk2305 Tux's lil' helper
Joined: 05 Sep 2007 Posts: 110
|
Posted: Wed Dec 14, 2016 1:35 pm Post subject: |
|
|
Yes, I checked the switch and everything is fine there. I just rebooted the server with only one cable attached (the setup that had worked before) and had no connectivity. Just out of curiosity I attached the second cable (the bond worked fine and enslaved eth3), but there was still no connectivity. Then I removed the second cable again and voilà, connectivity was there.
Here is the dmesg output:
Code: |
[ 97.020592] bond4: link status definitely up for interface eth3, 10000 Mbps full duplex
[ 97.021183] bond4: first active interface up!
[ 97.021788] IPv6: ADDRCONF(NETDEV_CHANGE): macvtap0: link becomes ready
[ 241.396098] bond4: Removing an active aggregator
[ 241.396397] bond4: Releasing backup interface eth2
[ 241.396682] bond4: the permanent HWaddr of eth2 - 74:d0:2b:98:c2:25 - is still in use by bond4 - set the HWaddr of eth2 to a different address to avoid conflicts
[ 241.397328] bond4: first active interface up!
[ 241.676587] bond4: Removing an active aggregator
[ 241.676848] bond4: Releasing backup interface eth3
[ 242.069901] bond4 (unregistering): Released all slaves
[ 242.102068] IPv6: ADDRCONF(NETDEV_UP): bond4: link is not ready
[ 242.102404] 8021q: adding VLAN 0 to HW filter on device bond4
[ 242.651348] bnx2x 0000:03:00.0 eth2: using MSI-X IRQs: sp 55 fp[0] 57 ... fp[7] 64
[ 242.902742] 8021q: adding VLAN 0 to HW filter on device eth2
[ 242.940906] bnx2x 0000:03:00.0 eth2: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
[ 242.942916] bond4: Enslaving eth2 as a backup interface with an up link
[ 243.459481] bnx2x 0000:03:00.1 eth3: using MSI-X IRQs: sp 65 fp[0] 67 ... fp[7] 74
[ 243.716956] 8021q: adding VLAN 0 to HW filter on device eth3
[ 243.755862] bnx2x 0000:03:00.1 eth3: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
[ 243.757492] bond4: Enslaving eth3 as a backup interface with an up link
[ 243.758101] IPv6: ADDRCONF(NETDEV_CHANGE): bond4: link becomes ready
[ 286.267443] bnx2x 0000:03:00.1 eth3: NIC Link is Down
[ 286.268483] bnx2x 0000:03:00.1 eth3: speed changed to 0 for port eth3
[ 286.301446] bond4: link status definitely down for interface eth3, disabling it
|
|
|
|
|
chrisk2305 Tux's lil' helper
Joined: 05 Sep 2007 Posts: 110
|
Posted: Wed Dec 14, 2016 2:46 pm Post subject: |
|
|
Hi again,
I double-checked the switch config and saw that the protocol had been changed to static instead of LACP.
Thanks for your help, guys... I was just blind. |
|
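A static/LACP mismatch like this one tends to show up on the Linux side as an all-zero `Partner Mac Address` in `/proc/net/bonding/<bond>`, as in the status output posted earlier in this thread. A minimal Python sketch of such a check follows. The function names are hypothetical, and `SAMPLE` is an excerpt of that output.

```python
import re

NULL_MAC = "00:00:00:00:00:00"


def lacp_partner_macs(status_text: str) -> list[str]:
    """Collect every 'Partner Mac Address' value found in the bonding status text."""
    return re.findall(r"Partner Mac Address:\s*([0-9A-Fa-f:]{17})", status_text)


def has_lacp_partner(status_text: str) -> bool:
    """True if at least one non-zero partner MAC address is reported."""
    return any(mac != NULL_MAC for mac in lacp_partner_macs(status_text))


# Excerpt of the /proc/net/bonding/bond4 output from this thread.
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 1
Actor Key: 13
Partner Key: 1
Partner Mac Address: 00:00:00:00:00:00
"""

if __name__ == "__main__":
    if not has_lacp_partner(SAMPLE):
        print("No LACP partner seen: check that the switch ports use LACP, not static")
```

On a live system the text could be read with `open("/proc/net/bonding/bond4").read()` instead of using the sample. A non-zero partner MAC does not prove the configuration is correct, but an all-zero one is a strong hint to look at the switch side.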
|
|
|