
Xen DomU Networking Stops working under load

23 posts • Page 1 of 1

Post by koan » Sun Aug 24, 2008 1:10 pm

Hello,

I have a 2.6.21 xen DomU running pv under a 2.6.21 Dom0.

In general everything works well, no problems. However, if I load the network card, it (almost) stops sending or receiving packets.

Load is normally very light. I am running a couple of database servers, an SNMP server and a few other things in the domU, but traffic is extremely low, with one or two users at most.

If I start something like bittorrent, it will kill the networking almost immediately. I was running a Samba server on the domu for a while, but whenever I would save a file of significant size, it would kill the network.

If I xm console in, the network card appears fine, and there are a small number of bytes ticking up on transmit and receive. Nothing in messages or dmesg.

The dom0 network is fine, and I am using a single physical nic to bridge to. I have three other vms (all hvm) running on the same Xen box and they don't suffer any network issues.

I am not sure how to proceed with diagnosing this...
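
One low-effort way to start narrowing this down is to watch the byte counters on both ends of the virtual link while a transfer is running. The sketch below is only an illustration and not something from the thread: it assumes the kernel exposes per-interface counters under /sys/class/net/<iface>/statistics, and the function name iface_delta and the interface names in the usage comment (vif11.0, xenbr0) are hypothetical.

```shell
#!/bin/sh
# Sketch: report how many bytes an interface moved during a short interval.
# Arguments: interface name, seconds to wait (default 5), and the sysfs
# root (default /sys/class/net; the third argument exists only so the
# function can be pointed at a test directory).
iface_delta() {
    ifname=$1
    wait_s=${2:-5}
    root=${3:-/sys/class/net}
    stats="$root/$ifname/statistics"
    if [ ! -r "$stats/rx_bytes" ] || [ ! -r "$stats/tx_bytes" ]; then
        echo "no byte counters found for $ifname" >&2
        return 1
    fi
    # Take two samples of the counters and print the difference.
    rx1=$(cat "$stats/rx_bytes"); tx1=$(cat "$stats/tx_bytes")
    sleep "$wait_s"
    rx2=$(cat "$stats/rx_bytes"); tx2=$(cat "$stats/tx_bytes")
    echo "$ifname rx=$((rx2 - rx1)) tx=$((tx2 - tx1)) bytes in ${wait_s}s"
}

# Example (run in dom0 during a transfer; vif11.0 is a hypothetical name):
#   iface_delta vif11.0 5
#   iface_delta xenbr0 5
```

If the dom0-side vif keeps counting while eth0 inside the domU does not (or the other way round), that hints at which side of the link is stalling.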

Cheers,

Paul

Post by koan » Sun Aug 24, 2008 1:38 pm

The domU can ping itself incidentally, and restarting the domU interface doesn't help.

Sometimes it seems to right itself, sometimes it needs a reboot. tcpdump on the dom0 doesn't show anything.

Also, I use this domU for asterisk, and when the network is working, there are no problems with calls. So the issue seems to be more about taxing the virtual nic than packet frequency.

Post by koan » Sun Aug 31, 2008 11:51 am

Well, ok.

I have changed the kernel in the domU from 2.6.21 to 2.6.25, and the problem is the same. I have also swapped the network card in the dom0, with no change.

The domU doesn't log anything wrong in messages or dmesg; it just cannot connect to anything. If I shut it down, it hangs, and it isn't reachable via the xm console.

If I xm destroy, it destroys. If I then try to xm create the domain again, I get

Code:

Error: Device 0 (vif) could not be connected. Hotplug scripts not working.

Nothing appears in the xen-hotplug.log.

xend.log gives:

Code:

...
[2008-08-31 19:14:11 5531] DEBUG (DevController:595) hotplugStatusCallback /local/domain/0/backend/vif/11/0/hotplug-status.
[2008-08-31 19:15:51 5531] DEBUG (XendDomainInfo:1897) XendDomainInfo.destroy: domid=11
...

So it tries for a while and then destroys the VM.

All the other VMs are working fine at this point, but if I shutdown and attempt to restart any, they will fail to get the vif too.

Adding interfaces to the bridge seems to work fine, so I guess the problem must be in the creation of the vif interface. Or at least, the vif breaks, and then xen is no longer able to create a new one.

I am not sure at what stage this takes place - prior to vif-script by the look of it...
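
The xend.log line above names a xenstore key, and reading that key directly may show whether the hotplug script ever reported back. A minimal sketch, assuming the xenstore-read utility from the Xen tools is installed in dom0; the helper names vif_hotplug_path and vif_hotplug_status are made up for illustration, and 11 and 0 are simply the domain and device IDs from the log excerpt.

```shell
#!/bin/sh
# Build the xenstore path that xend polls for a vif backend.
# $1 = domain ID, $2 = device ID (both taken from xend.log).
vif_hotplug_path() {
    printf '/local/domain/0/backend/vif/%s/%s/hotplug-status\n' "$1" "$2"
}

# Read the key if the xenstore-read tool is available.
# Whether the key exists at all depends on the hotplug script having run.
vif_hotplug_status() {
    if ! command -v xenstore-read >/dev/null 2>&1; then
        echo "xenstore-read not found" >&2
        return 127
    fi
    xenstore-read "$(vif_hotplug_path "$1" "$2")"
}

# Example (dom0, as root, while xm create is waiting):
#   vif_hotplug_status 11 0
```

If the key never appears while xm create is waiting, the hotplug script probably never ran, which would fit the empty xen-hotplug.log.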

Post by bbgermany » Mon Sep 01, 2008 2:39 pm

How do you create the xenbr interface? I'm doing it this way:

/etc/conf.d/net

Code:

config_eth0=( "null" )
config_eth1=( "null" )
bridge_xenbr0="eth0 eth1"
config_xenbr0=( "192.168.23.252 netmask 255.255.255.0" )
routes_xenbr0=( "default via 192.168.23.1" )
dns_servers=( "192.168.23.20" )
dns_domain="xxx.xxx"
dns_search="xxx.xxx xxx.yyy"

brctl_xenbr0=(
        "setfd 0"
        "sethello 0"
        "stp off"
)

/etc/xen/xend-config.sxp

Code:

(network-script network-dummy)

This solved the script issues while creating the bridge.

bb
Desktop: Ryzen 7 5800X, 32GB, 2TB, RX7700XT
Notebook: Dell XPS 13 9370, 16GB, 1TB
Server #1: Ryzen 5 Pro 4650G, 64GB, 16.5TB
Server #2: Ryzen 4800H, 32GB, 22TB

Post by koan » Mon Sep 01, 2008 11:48 pm

I am running the network-bridge script for the bridge create - with a slight mod as I have multiple addresses on my physical nic, and these were not getting set up correctly on the bridge.

Your script sets up the bridge normally, but then also does this:

Code:

        "setfd 0"
        "sethello 0"
        "stp off"

My bridge has forward delay set to zero, and stp off. So the only difference is that you have the hello time set to zero, whereas mine is 2 seconds.

With stp off, I am not sure that hello time does anything - but googling it I have found a number of occasions where setting hello time to something other than zero fixed some Xen networking issues (high numbers of interrupts).

Can you remember why you set it to zero?
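
To compare the settings two bridges are actually running with, rather than what the config files ask for, the values can be read back from the kernel. This is a rough sketch under the assumption that the bridge driver exposes its parameters under /sys/class/net/<bridge>/bridge; the function name bridge_params is hypothetical. As far as I know the time values there are in hundredths of a second (so a 2 second hello time would read 200), but treat that as an assumption and cross-check with "brctl showstp xenbr0".

```shell
#!/bin/sh
# Print the forward delay, hello time and STP state of a bridge.
# $1 = bridge name, $2 = sysfs root (default /sys/class/net; only there
# so the function can be tested against a scratch directory).
bridge_params() {
    br=$1
    root=${2:-/sys/class/net}
    dir="$root/$br/bridge"
    if [ ! -d "$dir" ]; then
        echo "$br does not look like a bridge (no $dir)" >&2
        return 1
    fi
    for key in forward_delay hello_time stp_state; do
        printf '%s=%s\n' "$key" "$(cat "$dir/$key")"
    done
}

# Example:
#   bridge_params xenbr0
```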

Post by bbgermany » Tue Sep 02, 2008 6:17 am

IIRC, I used the Gentoo wiki entry to configure my Xen setup, and these settings came from there. I use multiple addresses on the bridge as well; iproute2 did the trick for me.

bb

Post by koan » Tue Sep 02, 2008 2:20 pm

The forward delay and STP settings are the defaults in current Gentoo Xen installs. The hello interval only controls how often BPDUs are sent, so it is unlikely to have any bearing on my issue.

I'll give it a test at some point, because right now I have exhausted the leads available to me - at least, the ones I can think of. Well I do have another, and that is to build Xen with a stock kernel from another distribution, to see if it helps. But that represents a whole new set of difficulties, as I couldn't find a stock kernel that did everything I wanted, which is why I came back to gentoo...

Post by maslo64 » Thu Sep 04, 2008 10:17 pm

Hello Koan,
I have exactly the same issue with Xen. When starting the domU I noticed this message:

Code:

Bringing up eth0
 *     dhcp
 *       Running dhcpcd ...err, eth0: Failed to lookup hostname via DNS: Name or service not known
                                               [ ok ]
 *       eth0 received address 192.168.1.122/24

After logging in, everything is fine, but when I run something like "emerge -eauDN world", some packages are transferred to the domU and after a while it looks as if the bridge is down; then, after another while, the network interface starts working again.

Below is my bonding setup for eth0 and eth1 in dom0

Code:

config_eth0=( "null" )
config_eth1=( "null" )
RC_NEED_bond0=("net.eth0 net.eth1")
slaves_bond0="eth0 eth1"
config_bond0=( "null" )

RC_NEED_xenbr0="net.bond0"

bridge_xenbr0="bond0"
config_xenbr0=("dhcp")
brctl_xenbr0=(
        "setfd 0"
        "sethello 0"
        "stp off"
)

I am now testing whether this is caused by bonding or IPv6.

Any help will be appreciated.

Post by koan » Thu Sep 04, 2008 10:35 pm

Hi,

I am not using bonded nics, or IPv6.

Xen 3.3 came into unstable a couple of days ago, so I upgraded, but the problem still remains.

Someone on the Xensource mailing list suggested lowering the NIC rate so that the domU never transfers at a speed that breaks networking, but last time I tried to test, the break happened at 3.6Mbs. That is pretty slow!

So the changes I have made are:

1) Change domU kernel (2.6.21, 2.6.24, 2.6.25)
2) Change domU userland (gentoo, ubuntu)
3) Change dom0 physical nic + driver (Realtek 8169 -> 8168)
4) Change Xen version (3.2.1 -> 3.3)

The only thing I haven't changed is the dom0 kernel. I am using a stock 2.6.21 gentoo kernel, so it would be great if anyone watching this that has pv domUs working under a 2.6.21 kernel would post their .config so I can compare it to mine.

Paul

Post by maslo64 » Fri Sep 05, 2008 10:02 am

Hmm, so I switched back to an eth0 -> xenbr0 configuration and disabled IPv6, and everything is OK now.
I am going to try different bonding modes. If the problem persists, I think I will have to try NAT :(

Post by bbgermany » Fri Sep 05, 2008 10:09 am

What kind of bond do you use? Maybe your switch doesn't support the mode, so packets get lost in transfer.

bb

Post by maslo64 » Fri Sep 05, 2008 10:27 am

I was using mode=1, but when I tested unplugging and replugging the cables, connections were not restored.
Then I tried mode=0, which worked fine from dom0, but then the issue with the domU appeared.

Post by bbgermany » Fri Sep 05, 2008 10:57 am

Did you try mode 5 (balance-tlb) or 6 (balance-alb) as well? Mode 0 is round-robin and 1 is active-backup. If your switch supports the Link Aggregation Control Protocol (LACP), you should consider mode 4 (802.3ad).

bb
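
Since the mode numbers come up repeatedly in this thread, here is a small lookup for the Linux bonding driver's numeric modes, plus a way to see which mode a running bond actually uses. It is only a sketch: the function names bond_mode_name and bond_mode_active are hypothetical, and the second function assumes the driver reports a "Bonding Mode:" line in /proc/net/bonding/bond0.

```shell
#!/bin/sh
# Map a numeric bonding mode (as passed via mode=N) to its name.
bond_mode_name() {
    case $1 in
        0) echo balance-rr ;;
        1) echo active-backup ;;
        2) echo balance-xor ;;
        3) echo broadcast ;;
        4) echo 802.3ad ;;
        5) echo balance-tlb ;;
        6) echo balance-alb ;;
        *) echo "unknown bonding mode: $1" >&2; return 1 ;;
    esac
}

# Print the mode description reported by a running bond.
# $1 = status file (default /proc/net/bonding/bond0).
bond_mode_active() {
    file=${1:-/proc/net/bonding/bond0}
    sed -n 's/^Bonding Mode: //p' "$file"
}

# Example:
#   bond_mode_name 5
#   bond_mode_active
```

Comparing the two is a quick check that the mode= option given to the bonding module was really picked up.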

Post by maslo64 » Fri Sep 05, 2008 12:41 pm

I am going to test this this evening, because I don't have console access to the server at the moment, and I will have to try again and again and again 8)

Post by maslo64 » Sat Sep 06, 2008 10:20 am

Still no luck with bonding. I tried 6 bonding modes, but still no progress. I am going to downgrade the kernels for dom0 and domU from 2.6.21 to 2.6.18-r12 and check whether it helps. I also set "sethello 2", since from what I found a hello time of zero is a known problem with Xen, and you were right about this.

Post by koan » Sat Sep 06, 2008 2:17 pm

Ok,

It looks like the bonding issue isn't related to the original issue reported in this thread, but that's OK, we can share ;)

Anyway, in an effort to eliminate the NIC as the source of the problem I used another with different drivers, but the problem remained. They were both Realtek, however, and it was pointed out that this would not necessarily rule out a driver issue.

I am testing with a 10/100 tulip card and it is looking promising. No lockups yet and I have shifted a couple of gigs across the link.

Post by maslo64 » Sat Sep 06, 2008 3:41 pm

I am sorry if I messed up your thread with my own problem :)
Anyway, it looks like I have reached a solution. My machine is an HP DL380 G5 and the network cards are:

03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)
05:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)

After I recompiled the kernel with the network card driver as a module, it reported a "Call trace" in the dmesg log.
I tried to compile and install the drivers from the Broadcom website, but I could not work out how to enable ZLIB_INFLATE in the kernel (all my attempts were unsuccessful), so I can't load that module.

Everything looks fine when I compile it as a kernel module through 'make menuconfig' and set mode=5.

So these are my configs:

Code:

master ~ # cat /etc/conf.d/net
   config_eth0=( "null" )
   config_eth1=( "null" )
   RC_NEED_bond0=("net.eth0 net.eth1")
   slaves_bond0="eth0 eth1"
   config_bond0=( "null" )

   RC_NEED_xenbr0="net.bond0"

   bridge_xenbr0="bond0"

 config_xenbr0=("dhcp")
 brctl_xenbr0=(
        "setfd 0"
        "sethello 2"
        "stp off"
)

Code:

master ~ # cat /etc/modules.autoload.d/kernel-2.6
bnx2
bonding miimon=100 mode=5
loop max_loop=256
master ~ # uname -a
Linux 2.6.18-xen-r12 #9 SMP Sat Sep 6 14:41:34 CEST 2008 x86_64 Intel(R) Xeon(R) CPU E5420 @ 2.50GHz GenuineIntel GNU/Linux
master ~ # cat /xen/reference/gentoo.xen.cfg
kernel = "/boot/vmlinuz-2.6.21-xenU"
memory = 1024
name = "reference"
vif = [ 'mac=00:16:3E:6A:49:54, bridge=xenbr0'  ]
dhcp = "dhcp"
disk = ['file:/xen/reference/gentoo64.img,sda1,w', \
        'file:/xen/reference/gentoo64_lvm.img,sdc1,w', \
        'file:/xen/reference/swap_disk.img,sds1,w', ]
root = "/dev/sda1 ro"
extra = "gentoo=nodevfsi"
master ~ #
app-emulation/xen-3.3.0
app-emulation/xen-tools-3.3.0


To me it looks like something in /usr/src/linux/drivers/net/bonding/* isn't quite right when used with bnx2.

Post by Hibbelharry » Sun Sep 07, 2008 7:49 pm

You might try disabling checksum offloading to hardware by using ethtool. This solved some network dying problems with Xen for me.

Greetz
Hibbelharry
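
For anyone wanting to try this, the suggestion corresponds to ethtool's -K option with the rx and tx checksum settings. The helper below is a hypothetical sketch that only prints the commands, so they can be reviewed before running them as root. Which interface to apply them to (eth0 inside the domU, the physical NIC in dom0, or both) is not stated in the thread, so the interface name is an assumption to experiment with.

```shell
#!/bin/sh
# Print (do not run) the ethtool commands that switch receive and
# transmit checksum offload off for one interface.
# $1 = interface name.
checksum_off_cmds() {
    for dir in rx tx; do
        echo "ethtool -K $1 $dir off"
    done
}

# Example: review the commands, then pipe them to a root shell.
#   checksum_off_cmds eth0
#   checksum_off_cmds eth0 | sh
```

The change does not survive a reboot, so if it helps it would need to go into the network startup configuration.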

Post by maslo64 » Mon Sep 08, 2008 6:33 am

Hello Hibbelharry,
My problem is solved, as far as I can tell now. I can't add [solved] to the topic, as this isn't my thread and I messed up koan's thread :)
Btw, koan, did switching to different drivers help?

Post by bbgermany » Mon Sep 08, 2008 8:39 am

Hibbelharry wrote: You might try disabling checksum offloading to hardware by using ethtool. This solved some network dying problems with Xen for me.

Greetz
Hibbelharry

I have checksum offload disabled for tx but not rx. Did you disable both?

Code:

zeus ~ # ethtool -k eth1
Offload parameters for eth1:
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
zeus ~ #
bb
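
When comparing several hosts it can help to reduce output like the above to just the features that are still enabled. A rough sketch, assuming the "name: on" / "name: off" line format shown in this post (newer ethtool versions print more fields, which this pattern ignores); the function name offloads_on is hypothetical.

```shell
#!/bin/sh
# Read "ethtool -k" output on stdin and print only the offload
# features whose line ends in ": on".
offloads_on() {
    sed -n 's/^\([a-z][a-z -]*\): on$/\1/p'
}

# Example:
#   ethtool -k eth1 | offloads_on
```

Fed the output above, it would print only rx-checksumming.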

Post by BlackEye » Fri Oct 17, 2008 6:50 am

Is there a solution for this problem?
I have exactly the same problem as the original poster!

Instead of Samba, I discovered the problem while using NFS. When copying large files (several MB) over NFS, the Xen virtual network dies unrecoverably. I need to restart the whole dom0 to be able to restart the domU and use the network again.
By reducing the NFS rsize and wsize, I observed that the problem may not appear again. However, this could just be because transmission is slower with these changes, so the bug isn't triggered. The strange thing is that I could copy large files using netcat between domU and dom0 without any problems.
I'm afraid that this is a security issue. I could lower the rsize and wsize, but what happens if someone is able to send a large packet through the pipe and crash my connection?

Is there any real solution for this problem? New kernels? New bugfixes? Or any other hints?

I use xen 3.3 with 2.6.21-xen kernel sources (dom0 and domU).
Any help would be really appreciated!

Greetings,
Martin

Post by koan » Fri Oct 17, 2008 7:54 am

I am currently running with a non-Realtek based 10/100 card, and haven't experienced any issues with the network failing even at max.

However, I do want this to be a gigabit connection, so I have a dlink gig card waiting to test, and I'll report back if I get good results (or not).

My other plan is to use the SUSE Xen patchset against the Gentoo 2.6.25 kernel to see if that helps. I have the kernel built, but it remains to be seen if it even boots - other people have working Gentoo installs with this mix of kernel, but I don't know whether this will address the networking problem.

What network card are you using?

Post by BlackEye » Fri Oct 17, 2008 9:59 am

koan wrote: What network card are you using?

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)

The Realtek Ethernet controller seems to have some issues with Xen; I read something about this on the net. Unfortunately I can't change the NIC, because this is a root server that I don't have any direct access to.

About the kernel source I use, maybe this link is interesting for you too -> http://forums.gentoo.org/viewtopic-t-709908.html
There you can get a new vanilla kernel with Xen patches. This is the kernel I currently use on my dom0.
If I use NFS with this kernel without setting rsize and wsize, I get horrible transfer rates. If I manually set rsize and wsize to 8192, I get a vastly better result (you can see my post in the other thread about this).
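
For reference, rsize and wsize are ordinary NFS mount options, so the workaround amounts to a mount command like the one this sketch prints. The helper name nfs_mount_cmd and the server, export and mount point in the example are hypothetical; only the 8192 value comes from this post.

```shell
#!/bin/sh
# Print an NFS mount command with explicit read/write block sizes.
# $1 = server, $2 = exported path, $3 = local mount point,
# $4 = block size in bytes (default 8192, the value used above).
nfs_mount_cmd() {
    size=${4:-8192}
    echo "mount -t nfs -o rsize=$size,wsize=$size $1:$2 $3"
}

# Example (hypothetical names):
#   nfs_mount_cmd dom0 /export /mnt/export
```

The same options can also go in the options column of /etc/fstab.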

However, I don't know whether this is the real solution for this problem or not.

About the NIC: although I found some reports of issues with Realtek in conjunction with Xen, I really don't know why the NIC should have anything to do with it, because all transfers between dom0 and the domUs go over the virtual devices.