Gentoo Forums
Xen guest not working through OpenVswitch on 5.15.32

Gentoo Forums Forum Index Networking & Security

Colt45
Tux's lil' helper


Joined: 05 Sep 2007
Posts: 122
Location: Central Washington

Posted: Fri Apr 29, 2022 10:01 pm    Post subject: Xen guest not working through OpenVswitch on 5.15.32

I have a Gentoo system running Xen, and one of the guests is an OPNsense router. I'm using Open vSwitch (OVS) to configure the networking.
Under 5.15.26 it works fine; under 5.15.32 there is no connectivity to the OPNsense guest. Obviously, I have not checked every version in between. I was running 5.15.26, and after rebooting into 5.15.32 I noticed it wasn't working. I went back to 5.15.26 and it works again.
This is what the OVS configuration looks like:
Code:
    Bridge vbr0
        Port vif1.0-emu
            Interface vif1.0-emu
                error: "could not open network device vif1.0-emu (No such device)"
        Port bond0
            Interface enp65s0f1
            Interface enp65s0f0
        Port vif1.0
            Interface vif1.0
        Port vbr0
            Interface vbr0
                type: internal
        Port vif2.0
            Interface vif2.0
    Bridge vbr1
        Port enp65s0f2
            Interface enp65s0f2
        Port vbr1
            Interface vbr1
                type: internal
        Port vif1.1-emu
            Interface vif1.1-emu
                error: "could not open network device vif1.1-emu (No such device)"
        Port vif1.1
            Interface vif1.1

The concerning part, obviously, is the "No such device" error. I'm not sure what causes it. Yet even with that error, it works, at least under 5.15.26. I suspect it is the reason 5.15.32 is not working, but I don't know how to fix it.
This is the networking portion of the Xen config for the OPNsense VM:
Code:
vif = [ 'script=vif-openvswitch,bridge=vbr0',
        'script=vif-openvswitch,bridge=vbr1'
]
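
The `vif` entries above are comma-separated `key=value` lists, one per guest NIC. The Python sketch below is a hedged illustration, not part of the Xen toolstack: `parse_vif_spec` and `expected_ports` are hypothetical helpers, and it assumes the usual `vif<domid>.<devid>` naming, default device IDs (each entry's position in the list), and an HVM guest that also gets an emulated-NIC tap with an `-emu` suffix. Under those assumptions, domain 1 with this config maps onto the same port pairs that appear on `vbr0` and `vbr1` in the OVS listing.

```python
from typing import Dict, List


def parse_vif_spec(spec: str) -> Dict[str, str]:
    """Split one xl 'vif' entry such as 'script=...,bridge=vbr0' into a dict."""
    settings: Dict[str, str] = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        # A bare token without '=' is kept with an empty value.
        settings[key.strip()] = value.strip() if sep else ""
    return settings


def expected_ports(domid: int, vifs: List[str], hvm: bool = True) -> Dict[str, List[str]]:
    """Map each bridge to the interface names this guest's NICs should add.

    Assumptions (hypothetical helper, not toolstack code): the devid is the
    entry's position in the list, paravirtual ports are named vif<domid>.<devid>,
    and an HVM guest also gets an emulated-NIC tap named vif<domid>.<devid>-emu.
    """
    ports: Dict[str, List[str]] = {}
    for devid, spec in enumerate(vifs):
        bridge = parse_vif_spec(spec).get("bridge", "")
        names = [f"vif{domid}.{devid}"]
        if hvm:
            names.append(f"vif{domid}.{devid}-emu")
        ports.setdefault(bridge, []).extend(names)
    return ports


# The vif list from the OPNsense guest config above.
vif = ['script=vif-openvswitch,bridge=vbr0',
       'script=vif-openvswitch,bridge=vbr1']

print(expected_ports(1, vif))
# {'vbr0': ['vif1.0', 'vif1.0-emu'], 'vbr1': ['vif1.1', 'vif1.1-emu']}
```

An `-emu` port reported as "No such device" would then simply mean OVS still has that port configured while no matching tap device exists on the host at that moment; the sketch only shows the expected names, not why a device is missing.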
Colt45
Tux's lil' helper


Joined: 05 Sep 2007
Posts: 122
Location: Central Washington

Posted: Sat Apr 30, 2022 2:50 am

I found that the "Virtio network driver" option was disabled in the kernel. I enabled it, recompiled, and booted into 5.15.32, and that fixed the error in the Open vSwitch listing. However, the network is still not coming up on the guest.
Colt45
Tux's lil' helper


Joined: 05 Sep 2007
Posts: 122
Location: Central Washington

Posted: Sun May 01, 2022 11:19 pm

I'm currently building every kernel from 5.15.26 to 5.15.32 so I can step through them quickly when I get a chance.
Colt45
Tux's lil' helper


Joined: 05 Sep 2007
Posts: 122
Location: Central Washington

Posted: Mon May 02, 2022 4:40 am

I downloaded 5.15.26 through 5.15.32 directly from kernel.org.
I built and installed each version, then rebooted, starting at 5.15.26; when that worked, I moved on to 5.15.27, and so on.
The failure first appears in 5.15.29: 5.15.28 works, and 5.15.29 does not. Now to figure out which of the hundreds of changes is the culprit.
https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.15.29
These are my prime suspects from the 5.15.29 changelog:
Code:
commit 2708ceb4e5cc84ef179bad25a2d7890573ef78be
Author: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Date:   Tue Feb 22 01:18:17 2022 +0100

    Revert "xen-netback: Check for hotplug-status existence before watching"
   
    [ Upstream commit e8240addd0a3919e0fd7436416afe9aa6429c484 ]
   
    This reverts commit 2afeec08ab5c86ae21952151f726bfe184f6b23d.
   
    The reasoning in the commit was wrong - the code expected to setup the
    watch even if 'hotplug-status' didn't exist. In fact, it relied on the
    watch being fired the first time - to check if maybe 'hotplug-status' is
    already set to 'connected'. Not registering a watch for non-existing
    path (which is the case if hotplug script hasn't been executed yet),
    made the backend not waiting for the hotplug script to execute. This in
    turns, made the netfront think the interface is fully operational, while
    in fact it was not (the vif interface on xen-netback side might not be
    configured yet).
   
    This was a workaround for 'hotplug-status' erroneously being removed.
    But since that is reverted now, the workaround is not necessary either.
   
    More discussion at
    https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe.org/T/#u
   
    Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
    Reviewed-by: Paul Durrant <paul@xen.org>
    Reviewed-by: Michael Brown <mbrown@fensystems.co.uk>
    Link: https://lore.kernel.org/r/20220222001817.2264967-2-marmarek@invisiblethingslab.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

commit fe39ab30dcc204e321c2670cc1cf55904af35d01
Author: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Date:   Tue Feb 22 01:18:16 2022 +0100

    Revert "xen-netback: remove 'hotplug-status' once it has served its purpose"
   
    [ Upstream commit 0f4558ae91870692ce7f509c31c9d6ee721d8cdc ]
   
    This reverts commit 1f2565780e9b7218cf92c7630130e82dcc0fe9c2.
   
    The 'hotplug-status' node should not be removed as long as the vif
    device remains configured. Otherwise the xen-netback would wait for
    re-running the network script even if it was already called (in case of
    the frontent re-connecting). But also, it _should_ be removed when the
    vif device is destroyed (for example when unbinding the driver) -
    otherwise hotplug script would not configure the device whenever it
    re-appear.
   
    Moving removal of the 'hotplug-status' node was a workaround for nothing
    calling network script after xen-netback module is reloaded. But when
    vif interface is re-created (on xen-netback unbind/bind for example),
    the script should be called, regardless of who does that - currently
    this case is not handled by the toolstack, and requires manual
    script call. Keeping hotplug-status=connected to skip the call is wrong
    and leads to not configured interface.
   
    More discussion at
    https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe.org/T/#u
   
    Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
    Reviewed-by: Paul Durrant <paul@xen.org>
    Link: https://lore.kernel.org/r/20220222001817.2264967-1-marmarek@invisiblethingslab.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
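
The first commit message hinges on one detail: the backend registers a watch on `hotplug-status`, and that watch fires once right away, so the backend can see whether the hotplug script has already written `connected`. The toy Python model below illustrates that handshake under stated assumptions; `ToyStore`, `ToyBackend` and `simulate` are hypothetical names, not the real xenstore or xen-netback API, and the store path is illustrative only. `check_exists_first=True` models the commit that 5.15.29 reverts (skip the watch when the node is missing); `check_exists_first=False` models the behaviour after the revert.

```python
from typing import Callable, Dict, List, Optional, Tuple

Callback = Callable[[Optional[str]], None]


class ToyStore:
    """Hypothetical key/value store with watches, loosely modelled on xenstore."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.watches: Dict[str, List[Callback]] = {}

    def watch(self, path: str, callback: Callback) -> None:
        self.watches.setdefault(path, []).append(callback)
        # The behaviour the commit message relies on: a new watch fires
        # once right away, even if the node does not exist yet.
        callback(self.data.get(path))

    def write(self, path: str, value: str) -> None:
        self.data[path] = value
        for callback in self.watches.get(path, []):
            callback(value)


class ToyBackend:
    """Reports ready once hotplug-status says 'connected' (toy model)."""

    def __init__(self, store: ToyStore, path: str, check_exists_first: bool) -> None:
        self.ready = False
        if check_exists_first and path not in store.data:
            # Behaviour of the commit that 5.15.29 reverts: no watch for a
            # missing node, so nothing makes the backend wait for the script.
            self.ready = True
            return
        store.watch(path, self.on_change)

    def on_change(self, value: Optional[str]) -> None:
        if value == "connected":
            self.ready = True


def simulate(check_exists_first: bool) -> Tuple[bool, bool]:
    """Return (ready before the hotplug script ran, ready after it ran)."""
    store = ToyStore()
    path = "backend/vif/1/0/hotplug-status"  # illustrative path only
    backend = ToyBackend(store, path, check_exists_first)
    ready_before = backend.ready
    store.write(path, "connected")  # the hotplug script finishes
    return ready_before, backend.ready


print(simulate(check_exists_first=True))   # (True, True)
print(simulate(check_exists_first=False))  # (False, True)
```

In this model, if nothing ever writes `connected`, the watching backend never becomes ready, while the check-first variant reports ready regardless. Whether that is what happens with `vif-openvswitch` on 5.15.29 is not established in this thread; the sketch only illustrates the mechanism the commit message describes.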


But I'm no kernel expert. Does anyone have any ideas on how to go about determining the exact problem?
Colt45
Tux's lil' helper


Joined: 05 Sep 2007
Posts: 122
Location: Central Washington

Posted: Mon May 02, 2022 2:52 pm

I reverted these two commits and now the network is operating normally in 5.15.32
Hu
Moderator


Joined: 06 Mar 2007
Posts: 21633

Posted: Tue May 03, 2022 1:16 am

Colt45 wrote:
But I'm no kernel expert. Does anyone have any ideas on how to go about determining the exact problem?
Normally, git bisect would be the next step, to let you examine good and bad kernels without needing to visit every commit in the path. However, now that you found the offending commits, this is unnecessary.
Colt45 wrote:
I reverted these two commits and now the network is operating normally in 5.15.32
Good. The next step then is to report this so that those commits can be handled properly upstream. Since you are reverting revert commits, that suggests the reverted commits were important. Upstream may be able to come up with a commit that implements the useful parts of these reverted commits while not introducing the problems that motivated reverting the commits.
Colt45
Tux's lil' helper


Joined: 05 Sep 2007
Posts: 122
Location: Central Washington

Posted: Tue May 03, 2022 1:47 am

I'm trying right now to bring this up with the Xen developers, but they don't make it easy.
All times are GMT