Gentoo Forums
Bonding stopped working after power outage
chrisk2305
Tux's lil' helper


Joined: 05 Sep 2007
Posts: 110

Posted: Mon Dec 12, 2016 12:29 pm    Post subject: Bonding stopped working after power outage

Hi,

I have had a problem with my dual-port 10GbE NIC and bonding since a power outage a few days ago. After a cold boot I noticed that my server had no connectivity, although the bond was brought up normally. I checked the switch config and everything seemed fine. I cleared the MAC address table on the switch and rebooted, but still no luck.

I haven't seen any errors in the log.

I could ping other devices in the same subnet, but I was not able to reach the gateway. I did a traceroute and it took 3000 ms (!) to reach the gateway. Then I disconnected one of the two fibre cables and voilà, internet access etc. was working again.

Do you guys have any idea what the problem could be?

Thanks in advance,
Christian
szatox
Advocate


Joined: 27 Aug 2013
Posts: 3135

Posted: Mon Dec 12, 2016 7:52 pm

I suppose the bond is connected with at least 2 wires to a managed switch.
Is the same mode configured on both ends of the link? A mismatch at this point will only let it work by accident (so you should have _sort_of_ connectivity), but the packet loss that can occur in such a scenario would make any smart protocol retransmit at a reduced rate, then reduce the rate and retransmit again, and so on.

Just a guess.
Providing some more details on your setup and pointing out which devices were affected by the power outage could allow for another guess.
Also, do you often restart pieces of your equipment? Perhaps you hotfixed the setup at runtime on some device and forgot to make the change permanent.
chrisk2305
Tux's lil' helper


Joined: 05 Sep 2007
Posts: 110

Posted: Tue Dec 13, 2016 8:18 am

Hi,

Sorry, I did not provide enough information. Yes, the bond consists of two LC cables with the appropriate SFP+ modules. It has been working for 6 months without a problem.

The switch is a D-Link DGS-1510-28X and the NIC in the server is a dual-port 10GbE card with a Broadcom chipset (NetXtreme II driver). The bond is configured via netctl.

Here is the output of the bonding status with one NIC (eth3) disconnected:

Code:

 cat /proc/net/bonding/bond4
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 74:d0:2b:98:c2:25
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 1
        Actor Key: 13
        Partner Key: 1
        Partner Mac Address: 00:00:00:00:00:00

Slave Interface: eth2
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 74:d0:2b:98:c2:25
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: churned
Actor Churned Count: 0
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 74:d0:2b:98:c2:25
    port key: 13
    port priority: 255
    port number: 1
    port state: 77
details partner lacp pdu:
    system priority: 65535
    system mac address: 00:00:00:00:00:00
    oper key: 1
    port priority: 255
    port number: 1
    port state: 1

Slave Interface: eth3
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 1
Permanent HW addr: 74:d0:2b:98:c2:27
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 2
Partner Churned Count: 2
details actor lacp pdu:
    system priority: 65535
    system mac address: 74:d0:2b:98:c2:25
    port key: 0
    port priority: 255
    port number: 2
    port state: 69
details partner lacp pdu:
    system priority: 65535
    system mac address: 00:00:00:00:00:00
    oper key: 1
    port priority: 255
    port number: 1
    port state: 1
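
The `port state` values in the output above are bit fields. As a rough sketch, assuming the kernel prints them in decimal and that the bits follow the standard 802.3ad actor/partner state layout, they could be decoded like this (the helper name `decode_port_state` is made up for illustration and is not part of any tool):

```python
# Flag names follow the LACP actor/partner state byte from IEEE 802.3ad.
LACP_STATE_BITS = [
    (0x01, "LACP_ACTIVITY"),
    (0x02, "LACP_TIMEOUT"),
    (0x04, "AGGREGATION"),
    (0x08, "SYNCHRONIZATION"),
    (0x10, "COLLECTING"),
    (0x20, "DISTRIBUTING"),
    (0x40, "DEFAULTED"),
    (0x80, "EXPIRED"),
]


def decode_port_state(value):
    """Return the names of the flags set in an LACP port state byte."""
    return [name for bit, name in LACP_STATE_BITS if value & bit]


if __name__ == "__main__":
    # Values copied from the bonding status output above.
    for label, state in [("eth2 actor", 77), ("eth3 actor", 69), ("partner", 1)]:
        print(f"{label}: {state} -> {', '.join(decode_port_state(state))}")
```

Under those assumptions, 77 and 69 both have DEFAULTED set and neither has COLLECTING or DISTRIBUTING. That would be consistent with the host never receiving LACPDUs from the switch, although it does not prove it on its own.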


Here is the netctl service file:

Code:

Description='Bond Interface'
Interface='bond4'
Connection=bond
BindsToInterfaces=('eth2' 'eth3')



IP=static
Address=('192.168.1.2/24')
Gateway=('192.168.1.1')
DNS=('192.168.1.1')


Kernel options in grub.conf:

Code:

title Gentoo Linux 4.6.4
root (hd0,0)
kernel /boot/vmlinuz-4.6.4-gentoo root=/dev/md125 init=/usr/lib/systemd/systemd bonding.mode=4 bonding.miimon=100
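
For readers unsure what `bonding.mode=4` selects: the numeric modes of the Linux bonding driver map to names, and 4 is 802.3ad (LACP), which matches the status output earlier in this post. A small sketch of that mapping, with a hypothetical helper `bonding_params` that pulls the `bonding.*` options out of a kernel command line:

```python
# Numeric bonding modes and their names in the Linux bonding driver.
BONDING_MODES = {
    0: "balance-rr",
    1: "active-backup",
    2: "balance-xor",
    3: "broadcast",
    4: "802.3ad",
    5: "balance-tlb",
    6: "balance-alb",
}


def bonding_params(cmdline):
    """Collect bonding.<name>=<value> options from a kernel command line."""
    params = {}
    for token in cmdline.split():
        if token.startswith("bonding.") and "=" in token:
            key, value = token[len("bonding."):].split("=", 1)
            params[key] = value
    return params


if __name__ == "__main__":
    # Command line taken from the grub.conf entry above.
    cmdline = ("root=/dev/md125 init=/usr/lib/systemd/systemd "
               "bonding.mode=4 bonding.miimon=100")
    params = bonding_params(cmdline)
    mode = params["mode"]
    # The mode may also be given by name, so only map purely numeric values.
    name = BONDING_MODES[int(mode)] if mode.isdigit() else mode
    print(f"mode={name}, miimon={params.get('miimon')} ms")
```

On a running system the same string could be read from `/proc/cmdline`; the hard-coded value here only keeps the sketch self-contained.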


Thanks!
bbgermany
Veteran


Joined: 21 Feb 2005
Posts: 1844
Location: Oranienburg/Germany

Posted: Tue Dec 13, 2016 2:22 pm

Hi,

Have you checked whether your switch still has a valid bond/LACP/EtherChannel/port-channel configuration on the ports your server is attached to?

greets, bb
chrisk2305
Tux's lil' helper


Joined: 05 Sep 2007
Posts: 110

Posted: Wed Dec 14, 2016 1:35 pm

Yes, I checked the switch and everything is fine there. I just rebooted the server with only one cable attached (which had worked before) and had no connectivity. Just out of curiosity I attached the second cable (the bond worked fine and enslaved eth3), but there was still no connectivity. Then I removed the second cable again and voilà, connectivity was there.

Here is the dmesg output:

Code:

[   97.020592] bond4: link status definitely up for interface eth3, 10000 Mbps full duplex
[   97.021183] bond4: first active interface up!
[   97.021788] IPv6: ADDRCONF(NETDEV_CHANGE): macvtap0: link becomes ready
[  241.396098] bond4: Removing an active aggregator
[  241.396397] bond4: Releasing backup interface eth2
[  241.396682] bond4: the permanent HWaddr of eth2 - 74:d0:2b:98:c2:25 - is still in use by bond4 - set the HWaddr of eth2 to a different address to avoid conflicts
[  241.397328] bond4: first active interface up!
[  241.676587] bond4: Removing an active aggregator
[  241.676848] bond4: Releasing backup interface eth3
[  242.069901] bond4 (unregistering): Released all slaves
[  242.102068] IPv6: ADDRCONF(NETDEV_UP): bond4: link is not ready
[  242.102404] 8021q: adding VLAN 0 to HW filter on device bond4
[  242.651348] bnx2x 0000:03:00.0 eth2: using MSI-X  IRQs: sp 55  fp[0] 57 ... fp[7] 64
[  242.902742] 8021q: adding VLAN 0 to HW filter on device eth2
[  242.940906] bnx2x 0000:03:00.0 eth2: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
[  242.942916] bond4: Enslaving eth2 as a backup interface with an up link
[  243.459481] bnx2x 0000:03:00.1 eth3: using MSI-X  IRQs: sp 65  fp[0] 67 ... fp[7] 74
[  243.716956] 8021q: adding VLAN 0 to HW filter on device eth3
[  243.755862] bnx2x 0000:03:00.1 eth3: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
[  243.757492] bond4: Enslaving eth3 as a backup interface with an up link
[  243.758101] IPv6: ADDRCONF(NETDEV_CHANGE): bond4: link becomes ready
[  286.267443] bnx2x 0000:03:00.1 eth3: NIC Link is Down
[  286.268483] bnx2x 0000:03:00.1 eth3: speed changed to 0 for port eth3
[  286.301446] bond4: link status definitely down for interface eth3, disabling it
chrisk2305
Tux's lil' helper


Joined: 05 Sep 2007
Posts: 110

Posted: Wed Dec 14, 2016 2:46 pm

Hi again,

I double-checked the switch config and saw that the protocol had been changed to static instead of LACP.

Thanks for your help, guys... I was just blind ;)
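
In hindsight, the all-zero `Partner Mac Address` in the bonding status was probably the hint: a bond in 802.3ad mode that has negotiated with an LACP peer normally shows the peer's MAC there. A minimal sketch of an automated check, assuming the `/proc/net/bonding/<bond>` text format shown earlier in the thread (the function name `lacp_partner_missing` is hypothetical):

```python
import re

ZERO_MAC = "00:00:00:00:00:00"


def lacp_partner_missing(status_text):
    """True if the bond runs 802.3ad but reports an all-zero partner MAC."""
    mode = re.search(r"^Bonding Mode:\s*(.+)$", status_text, re.MULTILINE)
    partner = re.search(r"^\s*Partner Mac Address:\s*(\S+)", status_text, re.MULTILINE)
    if not mode or not partner:
        return False  # not an 802.3ad status dump, nothing to conclude
    return "802.3ad" in mode.group(1) and partner.group(1) == ZERO_MAC


if __name__ == "__main__":
    # Excerpt in the format of /proc/net/bonding/bond4 as posted earlier.
    sample = (
        "Bonding Mode: IEEE 802.3ad Dynamic link aggregation\n"
        "Active Aggregator Info:\n"
        "        Aggregator ID: 1\n"
        "        Partner Mac Address: 00:00:00:00:00:00\n"
    )
    if lacp_partner_missing(sample):
        print("No LACP partner seen; check that the switch ports use LACP, not static.")
```

This only flags a symptom on the host side. The switch configuration still has to be checked by hand, as it was here.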
All times are GMT