Gentoo Forums
very odd behaviour with bridge / Intel e1000e driver
Atom2
Apprentice
Joined: 01 Aug 2011
Posts: 185

Posted: Tue Feb 04, 2014 5:32 pm    Post subject: very odd behaviour with bridge / Intel e1000e driver

Hello Forum.
I have installed a new Gentoo system that will be used for XEN, with a dom0 kernel based on hardened 3.11.7.

The mainboard has two on-board 1 Gbit/s network interfaces, one being an Intel 82579LM and the other an Intel 82574L. Both of these on-board NICs work with the Intel e1000e kernel driver.

In order to communicate with domUs from the dom0 under XEN, it is recommended to use a bridge interface (commonly named xenbr0) and link the virtual domU interfaces to the bridge. The bridge in dom0 obviously has to be backed by a real ethernet device so that the machine can communicate with the outside world.

Creating the bridge works flawlessly, and once the domU is started, communication works as expected. But if the system is left idle for some time, communication suddenly stops working and only comes up again if I ping the domU from the dom0 a couple of times (the first few pings fail) or if I log in to the domU using ssh from the dom0. I seem to have tracked this issue down to a very odd behaviour/bug somewhere in the kernel/the Intel e1000e driver/the bridging that does not make any sense to me - unless I completely and fundamentally misunderstand something:

When the bridge starts up, it takes over the MAC address of the enslaved device, and ifconfig reads as follows:
Code:
enp7s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        ether 00:18:7d:1d:71:de  txqueuelen 1000  (Ethernet)
        RX packets 4600  bytes 640581 (625.5 KiB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 251  bytes 31744 (31.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 16  memory 0xf7740000-f7760000

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 0  (Local Loopback)
        RX packets 8  bytes 772 (772.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 772 (772.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vif1.0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        ether fe:ff:ff:ff:ff:ff  txqueuelen 32  (Ethernet)
        RX packets 149  bytes 34204 (33.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4423  bytes 603293 (589.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

xenbr0: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST>  mtu 9000
        inet 192.168.19.2  netmask 255.255.255.0  broadcast 192.168.19.255
        ether 00:18:7d:1d:71:de  txqueuelen 0  (Ethernet)
        RX packets 4748  bytes 591939 (578.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 276  bytes 29469 (28.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
Please note that vif1.0 is the virtual interface in dom0 that is used by the domU. The second Intel adapter does not show up in the listing above because I have removed net.enp0s25 from the default runlevel. The relevant parts of the above ifconfig output stay the same even when the error occurs.
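To make the "bridge takes over the MAC address of the enslaved device" observation easy to re-check each time the problem occurs, here is a minimal Python sketch that extracts the `ether` addresses from net-tools style ifconfig output like the listing above. The function name `parse_ifconfig_macs` and the shortened `SAMPLE` text are purely illustrative and not part of any existing tool.

```python
import re


def parse_ifconfig_macs(text):
    """Return {interface_name: mac} from net-tools style ifconfig output."""
    macs = {}
    current = None
    for line in text.splitlines():
        # An interface block starts at column 0, e.g. "xenbr0: flags=..."
        header = re.match(r"^(\S+): flags=", line)
        if header:
            current = header.group(1)
            continue
        # Indented "ether aa:bb:cc:dd:ee:ff" lines carry the MAC address.
        ether = re.match(r"^\s+ether\s+([0-9a-fA-F:]{17})\b", line)
        if ether and current:
            macs[current] = ether.group(1).lower()
    return macs


# Shortened copy of the ifconfig output shown above (counters omitted).
SAMPLE = """\
enp7s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        ether 00:18:7d:1d:71:de  txqueuelen 1000  (Ethernet)
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 0  (Local Loopback)
vif1.0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        ether fe:ff:ff:ff:ff:ff  txqueuelen 32  (Ethernet)
xenbr0: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST>  mtu 9000
        inet 192.168.19.2  netmask 255.255.255.0  broadcast 192.168.19.255
        ether 00:18:7d:1d:71:de  txqueuelen 0  (Ethernet)
"""

macs = parse_ifconfig_macs(SAMPLE)
# True when the bridge reports the same MAC as the enslaved NIC.
print(macs["xenbr0"] == macs["enp7s0"])
```

Running the same function on a fresh ifconfig capture taken while communication is broken would show whether the bridge MAC really is unchanged at that moment.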

Using tcpdump, I think I have managed to track this down to incorrect answers to the domU's ARP requests for the address of the dom0 xenbr0 interface. Initially everything is o.k., but later on the MAC address in the replies changes, resulting in a breakdown of communication - until a dom0-initiated ping refreshes the ARP cache in the domU.
Code:
16:51:59.673731 ARP, Request who-has 192.168.19.2 tell 192.168.19.3, length 28
16:51:59.673775 ARP, Reply 192.168.19.2 is-at 00:18:7d:1d:71:de (oui Unknown), length 28
[...]
16:57:04.753562 ARP, Request who-has 192.168.19.2 tell 192.168.19.3, length 28
16:57:04.764520 ARP, Reply 192.168.19.2 is-at 00:18:7d:1d:72:74 (oui Unknown), length 46
16:57:06.473074 IP 192.168.19.3.764 > 192.168.19.2.nfs: Flags [F.], seq 585, ack 489, win 210, options [nop,nop,TS val 1441 ecr 4294952503], length 0
16:57:11.481566 ARP, Request who-has 192.168.19.2 tell 192.168.19.3, length 28
16:57:11.492898 ARP, Reply 192.168.19.2 is-at 00:18:7d:1d:72:74 (oui Unknown), length 46
16:57:13.201065 IP 192.168.19.3.764 > 192.168.19.2.nfs: Flags [F.], seq 585, ack 489, win 210, options [nop,nop,TS val 2114 ecr 4294952503], length 0
16:57:18.209603 ARP, Request who-has 192.168.19.2 tell 192.168.19.3, length 28
16:57:18.220945 ARP, Reply 192.168.19.2 is-at 00:18:7d:1d:72:74 (oui Unknown), length 46
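Spotting the moment the advertised MAC address changes is tedious in a long capture. The following Python sketch scans tcpdump text output in the format shown above and reports every point at which the MAC announced for an IP differs from the previous reply. The names `ARP_REPLY` and `find_mac_changes` are hypothetical helpers, and the regular expression assumes exactly the line layout seen in this capture.

```python
import re

# Matches lines such as:
# 16:51:59.673775 ARP, Reply 192.168.19.2 is-at 00:18:7d:1d:71:de (oui Unknown), length 28
ARP_REPLY = re.compile(
    r"^(?P<ts>\S+) ARP, Reply (?P<ip>\S+) is-at (?P<mac>[0-9a-fA-F:]{17})"
)


def find_mac_changes(lines):
    """Return (timestamp, ip, old_mac, new_mac) for each change in an IP's advertised MAC."""
    last_seen = {}
    changes = []
    for line in lines:
        match = ARP_REPLY.match(line)
        if not match:
            continue  # requests, IP traffic, "[...]" markers, etc.
        ip = match.group("ip")
        mac = match.group("mac").lower()
        previous = last_seen.get(ip)
        if previous is not None and previous != mac:
            changes.append((match.group("ts"), ip, previous, mac))
        last_seen[ip] = mac
    return changes


# Excerpt of the capture shown above.
SAMPLE = """\
16:51:59.673731 ARP, Request who-has 192.168.19.2 tell 192.168.19.3, length 28
16:51:59.673775 ARP, Reply 192.168.19.2 is-at 00:18:7d:1d:71:de (oui Unknown), length 28
[...]
16:57:04.753562 ARP, Request who-has 192.168.19.2 tell 192.168.19.3, length 28
16:57:04.764520 ARP, Reply 192.168.19.2 is-at 00:18:7d:1d:72:74 (oui Unknown), length 46
16:57:11.481566 ARP, Request who-has 192.168.19.2 tell 192.168.19.3, length 28
16:57:11.492898 ARP, Reply 192.168.19.2 is-at 00:18:7d:1d:72:74 (oui Unknown), length 46
"""

changes = find_mac_changes(SAMPLE.splitlines())
for ts, ip, old, new in changes:
    print(f"{ts}: {ip} changed from {old} to {new}")
```

The sketch only tracks the most recent MAC per IP, so a reply that flips back to the original address would be reported as a second change.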

Now comes the funny part: the new MAC address sent in response to the domU's query is the MAC address of the other, currently inactive (and not linked to the bridge) Intel network card in my machine.
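Because ifconfig without options lists only interfaces that are up, the inactive card does not appear in the listing earlier in this post. One way to confirm which interface owns a given MAC address is to read the `address` file that Linux exposes for every network device under /sys/class/net. The sketch below does that; the function name `interfaces_by_mac` and the `sysfs_root` parameter are illustrative choices, and the lookup of 00:18:7d:1d:72:74 simply reuses the address seen in the later ARP replies.

```python
import os


def interfaces_by_mac(sysfs_root="/sys/class/net"):
    """Map each MAC address to the list of interface names that report it."""
    owners = {}
    for name in sorted(os.listdir(sysfs_root)):
        address_file = os.path.join(sysfs_root, name, "address")
        try:
            with open(address_file) as fh:
                mac = fh.read().strip().lower()
        except OSError:
            continue  # entry without a readable address file
        owners.setdefault(mac, []).append(name)
    return owners


if __name__ == "__main__":
    # Only meaningful on a Linux host where sysfs is mounted.
    if os.path.isdir("/sys/class/net"):
        owners = interfaces_by_mac()
        # MAC address seen in the later ARP replies.
        print(owners.get("00:18:7d:1d:72:74", []))
```

A bridge and its enslaved NIC would be expected to show up together under the same key, since the bridge takes over the NIC's address.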

This behaviour is the same regardless of whether the driver is statically linked into the kernel or compiled as a module. To me it looks like the Intel e1000e driver from time to time wrongly picks the MAC address from the lowest numbered card (in terms of the PCI bus as shown by lspci - see further below) instead of the active card it should actually take.

I am currently at a loss and would very much appreciate it if somebody could help me resolve this issue - even if it is just by telling me that I did something wrong or that my thought process is flawed. I am running out of ideas, having already spent countless hours before arriving at the ARP issue. The problem started off as unreliable network communication between dom0 and domU with an NFS-mounted filesystem; now it looks as if it were something fundamentally bigger.

Further information: My /etc/conf.d/net file which includes the bridge configuration:
Code:
config_enp0s25="null"
config_enp7s0="null"
mtu_enp7s0="9000"
bridge_xenbr0="enp7s0"
mtu_xenbr0="9000"
brctl_xenbr0="stp off setfd 0 sethello 0"
rc_net_xenbr0_need="net.enp7s0"
config_xenbr0="192.168.19.2 netmask 255.255.255.0 broadcast 192.168.19.255"
routes_xenbr0="default gw 192.168.19.1"
dns_servers_xenbr0="192.168.19.1"
dns_domain_lo="xxxxx.com"


Information about the NICs:
Code:
# lspci | fgrep Gigabit
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 05)
07:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection



Thanks in advance,

Atom2