Gentoo Forums
wan performance sucks: raw vs ssh tunnel vs wireguard

 
jesnow
l33t
Joined: 26 Apr 2006
Posts: 856

Posted: Sun Feb 19, 2023 5:41 pm    Post subject: wan performance sucks: raw vs ssh tunnel vs wireguard

Hi everybody:

I previously posted about setting up wireguard; that's now wrapped up:

https://forums.gentoo.org/viewtopic-t-1161123-highlight-.html

My experience in practical use is still that network performance over the WAN is much, much worse than on the LAN. It doesn't make sense. The standard answer is "what do you expect, wi-fi is slow", but there is no wi-fi involved here: I have wired ethernet all the way. My WAN ping is 30ms (that's supposed to be good), so there's no reason for overall network performance to suck. I'm being methodical about it, and I already discovered a bum ethernet card, which I replaced.

I next want to show my testing of pure network performance using raw, ssh-tunnel and wireguard-tunnel methods. For those of you who don't want to read it all: raw WAN connections are slower than connections on the LAN by only about 13%. Tunneling through ssh or wireguard costs about another 30-40%, and wireguard beats ssh head to head by about 35% in both transmit and receive. It's a very consistent and not very surprising result, though I didn't expect wg to beat ssh by so much.

My setup is: the local (pogacar) and remote (merckx) machines are both oldish 3GHz Core i7 machines with 16GB memory, each connected by 1GbE to its respective provider. Ping is a consistent 30ms in both directions, and both get <1ms ping and 930 Mb/s within their respective LANs. I have an iperf3 server running on merckx and run the iperf3 client on pogacar in both forward and reverse mode. So I can transfer data six ways: 1,2) through an open raw port on my router; 3,4) through an ssh tunnel (running on localhost:45201) on another open port; 5,6) through a wireguard tunnel on yet a third open public port.
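For completeness: the ssh tunnel is just a plain local port forward. I haven't pasted the exact invocation, but it was something along these lines (assuming the iperf3 instance behind it listens on merckx's default port 5201):

Code:

# forward localhost:45201 to the iperf3 server on merckx; -N = tunnel only, no shell
ssh -N -L 45201:localhost:5201 merckx.vesarius.net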

This was a very simple test based on iperf3 default settings: I simply ran iperf3 in transmit and receive mode in both directions. I spent a few hours tuning various network parameters and found out, guess what: Linux has pretty good settings, don't mess with them. I could make network throughput dramatically *worse* without much effort, but I never made a dent in either ssh or wg performance. So my recommendation is: don't mess with the default network settings. Smarter people than me chose them.
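If you're curious which settings I mean, you can at least inspect the relevant knobs read-only without touching anything (roughly, on any recent kernel):

Code:

# TCP autotuning limits (min/default/max, in bytes) and the congestion control in use
sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem net.ipv4.tcp_congestion_control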

Here are the commands I ran:

Code:

jesnow@pogacar ~ $ cat netperf
#!/bin/bash
# raw WAN: straight to the open iperf3 port on merckx (-R = reverse/receive)
iperf3 -c merckx.vesarius.net -p 55202
iperf3 -c merckx.vesarius.net -p 55202 -R
# through the ssh tunnel listening on localhost:45201
iperf3 -c localhost -p 45201
iperf3 -c localhost -p 45201 -R
# through the wireguard tunnel to merckx's wg address (default port 5201)
iperf3 -c merckxw
iperf3 -c merckxw -R


Yes, I left iperf3 running on that port on that server for now, and you can try it too. Please don't DDoS me.
Here are the data.

Code:

jesnow@pogacar ~ $ ./netperf
Connecting to host merckx.vesarius.net, port 55202
[  5] local 130.39.190.4 port 54816 connected to 104.176.81.55 port 55202
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  85.2 MBytes   715 Mbits/sec    0   3.48 MBytes       
[  5]   1.00-2.00   sec   105 MBytes   881 Mbits/sec    0   3.48 MBytes       
[  5]   2.00-3.00   sec   104 MBytes   870 Mbits/sec    0   3.48 MBytes       
[  5]   3.00-4.00   sec   102 MBytes   860 Mbits/sec    0   3.48 MBytes       
[  5]   4.00-5.00   sec   105 MBytes   881 Mbits/sec    0   3.48 MBytes       
[  5]   5.00-6.00   sec   104 MBytes   871 Mbits/sec    0   3.48 MBytes       
[  5]   6.00-7.00   sec   104 MBytes   870 Mbits/sec    0   3.48 MBytes       
[  5]   7.00-8.00   sec   104 MBytes   870 Mbits/sec    0   3.48 MBytes       
[  5]   8.00-9.00   sec   105 MBytes   881 Mbits/sec    0   3.48 MBytes       
[  5]   9.00-10.00  sec   104 MBytes   870 Mbits/sec    0   3.48 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1022 MBytes   857 Mbits/sec    0             sender
[  5]   0.00-10.03  sec  1022 MBytes   854 Mbits/sec                  receiver

iperf Done.
Connecting to host merckx.vesarius.net, port 55202
Reverse mode, remote host merckx.vesarius.net is sending
[  5] local 130.39.190.4 port 52580 connected to 104.176.81.55 port 55202
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  27.3 MBytes   229 Mbits/sec                 
[  5]   1.00-2.00   sec  89.2 MBytes   748 Mbits/sec                 
[  5]   2.00-3.00   sec  93.0 MBytes   781 Mbits/sec                 
[  5]   3.00-4.00   sec  93.1 MBytes   781 Mbits/sec                 
[  5]   4.00-5.00   sec  93.5 MBytes   784 Mbits/sec                 
[  5]   5.00-6.00   sec  93.8 MBytes   786 Mbits/sec                 
[  5]   6.00-7.00   sec  93.2 MBytes   782 Mbits/sec                 
[  5]   7.00-8.00   sec  93.4 MBytes   783 Mbits/sec                 
[  5]   8.00-9.00   sec  93.4 MBytes   784 Mbits/sec                 
[  5]   9.00-10.00  sec  93.5 MBytes   784 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.03  sec   866 MBytes   724 Mbits/sec    0             sender
[  5]   0.00-10.00  sec   863 MBytes   724 Mbits/sec                  receiver

iperf Done.
Connecting to host localhost, port 45201
[  5] local 127.0.0.1 port 36060 connected to 127.0.0.1 port 45201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  72.5 MBytes   608 Mbits/sec    4   4.37 MBytes       
[  5]   1.00-2.00   sec  63.8 MBytes   535 Mbits/sec    8   4.37 MBytes       
[  5]   2.00-3.00   sec  63.8 MBytes   535 Mbits/sec    5   4.37 MBytes       
[  5]   3.00-4.00   sec  63.8 MBytes   535 Mbits/sec    2   4.37 MBytes       
[  5]   4.00-5.00   sec  63.8 MBytes   535 Mbits/sec    2   4.37 MBytes       
[  5]   5.00-6.00   sec  62.5 MBytes   524 Mbits/sec    6   4.37 MBytes       
[  5]   6.00-7.00   sec  63.8 MBytes   535 Mbits/sec   13   4.37 MBytes       
[  5]   7.00-8.00   sec  63.8 MBytes   535 Mbits/sec    2   4.37 MBytes       
[  5]   8.00-9.00   sec  63.8 MBytes   535 Mbits/sec    0   4.37 MBytes       
[  5]   9.00-10.00  sec  65.0 MBytes   545 Mbits/sec    0   4.37 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   646 MBytes   542 Mbits/sec   42             sender
[  5]   0.00-10.03  sec   637 MBytes   533 Mbits/sec                  receiver

iperf Done.
Connecting to host localhost, port 45201
Reverse mode, remote host localhost is sending
[  5] local 127.0.0.1 port 44686 connected to 127.0.0.1 port 45201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  54.0 MBytes   453 Mbits/sec                 
[  5]   1.00-2.00   sec  64.0 MBytes   537 Mbits/sec                 
[  5]   2.00-3.00   sec  63.9 MBytes   536 Mbits/sec                 
[  5]   3.00-4.00   sec  63.9 MBytes   536 Mbits/sec                 
[  5]   4.00-5.00   sec  63.6 MBytes   534 Mbits/sec                 
[  5]   5.00-6.00   sec  63.2 MBytes   531 Mbits/sec                 
[  5]   6.00-7.00   sec  63.4 MBytes   532 Mbits/sec                 
[  5]   7.00-8.00   sec  63.5 MBytes   532 Mbits/sec                 
[  5]   8.00-9.00   sec  63.4 MBytes   532 Mbits/sec                 
[  5]   9.00-10.00  sec  63.6 MBytes   533 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.04  sec   638 MBytes   533 Mbits/sec    2             sender
[  5]   0.00-10.00  sec   626 MBytes   526 Mbits/sec                  receiver

iperf Done.
Connecting to host merckxw, port 5201
[  5] local 10.0.17.2 port 51338 connected to 10.0.17.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  69.5 MBytes   583 Mbits/sec    0   3.22 MBytes       
[  5]   1.00-2.00   sec  88.8 MBytes   744 Mbits/sec    0   3.22 MBytes       
[  5]   2.00-3.00   sec  90.0 MBytes   755 Mbits/sec    0   3.22 MBytes       
[  5]   3.00-4.00   sec  88.8 MBytes   744 Mbits/sec    0   3.22 MBytes       
[  5]   4.00-5.00   sec  88.8 MBytes   744 Mbits/sec    0   3.22 MBytes       
[  5]   5.00-6.00   sec  90.0 MBytes   755 Mbits/sec    0   3.22 MBytes       
[  5]   6.00-7.00   sec  88.8 MBytes   744 Mbits/sec    0   3.22 MBytes       
[  5]   7.00-8.00   sec  88.8 MBytes   744 Mbits/sec    0   3.22 MBytes       
[  5]   8.00-9.00   sec  82.5 MBytes   692 Mbits/sec    1   2.32 MBytes       
[  5]   9.00-10.00  sec  77.5 MBytes   650 Mbits/sec    0   2.53 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   853 MBytes   716 Mbits/sec    1             sender
[  5]   0.00-10.03  sec   853 MBytes   713 Mbits/sec                  receiver

iperf Done.
Connecting to host merckxw, port 5201
Reverse mode, remote host merckxw is sending
[  5] local 10.0.17.2 port 45546 connected to 10.0.17.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  16.1 MBytes   135 Mbits/sec                 
[  5]   1.00-2.00   sec  70.3 MBytes   590 Mbits/sec                 
[  5]   2.00-3.00   sec  89.5 MBytes   751 Mbits/sec                 
[  5]   3.00-4.00   sec  90.9 MBytes   763 Mbits/sec                 
[  5]   4.00-5.00   sec  90.6 MBytes   760 Mbits/sec                 
[  5]   5.00-6.00   sec  88.7 MBytes   744 Mbits/sec                 
[  5]   6.00-7.00   sec  89.7 MBytes   752 Mbits/sec                 
[  5]   7.00-8.00   sec  91.9 MBytes   771 Mbits/sec                 
[  5]   8.00-9.00   sec  91.9 MBytes   771 Mbits/sec                 
[  5]   9.00-10.00  sec  88.5 MBytes   742 Mbits/sec                 
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.03  sec   811 MBytes   678 Mbits/sec    0             sender
[  5]   0.00-10.00  sec   808 MBytes   678 Mbits/sec                  receiver

iperf Done.


So the encryption and transport overhead of the two tunnel methods is substantial but not awful. You can get a lot done with 65% of your network speed, given that you get a secure connection in return. There's no inherent reason that moving data across that link should be noticeably slower than moving it between local machines.

The next step in the performance question is at the OS and application layer.

Hint -- there are horrors ahead.

Cheers,
Jon.
Goverp
Advocate
Joined: 07 Mar 2007
Posts: 2003

Posted: Sun Feb 19, 2023 6:18 pm

Out of interest, what is the WAN? Is it asymmetric? I only get 17 Mb/s down and 800 Kb/s up on my copper wires and ADSL, but you obviously have something faster. It may be worth logging on to your router/modem, if that's possible, and seeing what speeds it thinks the lines are.
_________________
Greybeard
jesnow
l33t
Joined: 26 Apr 2006
Posts: 856

Posted: Sun Feb 19, 2023 6:40 pm

Goverp wrote:
Out of interest, what is the WAN? Is it asymmetric? I only get 17 Mb/s down and 800 Kb/s up on my copper wires and ADSL, but you obviously have something faster. It may be worth logging on to your router/modem, if that's possible, and seeing what speeds it thinks the lines are.


Thanks, I'll add that. It's AT&T fiber to the home, which does 1Gb/s up and down. Ordinary internet speed-test apps get ~870 Mb/s in both directions here (close to what I measure all the way to work), and the same is true at my work location: about ~870 Mb/s up and down.

That's the whole point: now that my WAN connection in principle has LAN speed all the way to work, I don't understand why everything over the WAN is still much slower. What I've shown here is that the slowdown is not happening at the network link layer.

Cheers,

Jon.
pingtoo
l33t
Joined: 10 Sep 2021
Posts: 925
Location: Richmond Hill, Canada

Posted: Sun Feb 19, 2023 7:12 pm

Maybe it is my poor English comprehension, but I don't get what your expectation for the WAN connection is. Are you expecting it to be the same as, or very close to, LAN speed?

I think part of the answer you seek already lies in your initial post: you already state that the ping response between WAN nodes is 30ms, compared to 1ms between LAN nodes.

In some sense, those "30ms" and "1ms" figures are what is known as latency.

In a TCP/IP network we usually measure the number of bytes and the number of packets transferred. A normal LAN ethernet packet is 1500 bytes (or less), so to transfer 1 MByte of data you need approximately 699 packets; if the latency is 1ms, this means it will take 699 * 1ms = 699 * 0.001 = 0.699 seconds to transmit the 1 MByte.

However, if the same 1 MByte is transmitted over the WAN with its 30ms latency, that becomes 699 * 30ms = 699 * 0.03 = 20.97 seconds. And WAN packet size usually varies from network to network, so it is even harder to estimate.
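You can check this arithmetic with a quick shell calculation (taking 1 MByte = 1048576 bytes; this is only the naive one-packet-at-a-time model):

Code:

echo $(( 1048576 / 1500 ))   # ~699 packets per MByte
echo '699 * 0.030' | bc      # 20.97 seconds at 30ms per packet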

My explanation above is purely an illustration; the real WAN is never as simple as I posted.

So if you want to increase transmission speed, it is very important to try to reduce the latency.

In a TCP/IP network, one of the factors that introduces latency is the number of hops. A hop is the step from one node to the next node on the same wire.
s0ulslack1
n00b
Joined: 06 Mar 2022
Posts: 20

Posted: Sun Feb 19, 2023 7:23 pm

Throughput is not latency. You have two different locations on different providers, going over multiple networks, which involves a lot more hops than your LAN.
deagol
n00b
Joined: 12 Jul 2014
Posts: 61

Posted: Sun Feb 19, 2023 9:28 pm

pingtoo wrote:
In a TCP/IP network we usually measure the number of bytes and the number of packets transferred. A normal LAN ethernet packet is 1500 bytes (or less), so to transfer 1 MByte of data you need approximately 699 packets; if the latency is 1ms, this means it will take 699 * 1ms = 699 * 0.001 = 0.699 seconds to transmit the 1 MByte.

However, if the same 1 MByte is transmitted over the WAN with its 30ms latency, that becomes 699 * 30ms = 699 * 0.03 = 20.97 seconds. And WAN packet size usually varies from network to network, so it is even harder to estimate.
Latency as a factor (or multiplier) would also be my main suspect here. But the argument above is not right: with TCP/IP, of course, more than one packet can be in flight, whereas your calculation assumes only one. How many packets can be in flight depends on the window size the receiver advertises to the sender in the ACKs.
(Linux also scales the window size dynamically and normally does a great job, allowing the full bandwidth to be utilized.)

With increased latency, more unacknowledged (in-flight) bytes must be allowed so the sender can fully utilize the available bandwidth. We need a window size large enough that the sender can keep transmitting at full speed until at least the ACK for the first packet arrives.

If the window is too small, the sender has to stop transmitting once all the bytes the window permits have been sent. We basically get bursts of full speed followed by a totally idle connection, in intervals of the latency between the two systems...
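To put a number on it for this thread: the window needed to keep a link busy is the bandwidth-delay product. A quick sketch for a 1 Gbit/s link at 30ms RTT:

Code:

# bandwidth-delay product = bandwidth * RTT, converted to bytes
echo $(( 1000000000 * 30 / 1000 / 8 ))   # 3750000 bytes, about 3.6 MBytes

Which fits nicely with the 3.48 MByte Cwnd iperf3 reported on the raw WAN runs in the first post.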

In my experience, the TCP window size between two Linux systems is normally not what causes suboptimal throughput (short of non-TCP/IP-compliant systems on the connection, or some "tuning" on one or both Linux systems).
More likely there is some packet loss on the connection, requiring retransmits and breaking the ideal tx scenario assumed above. (At around 10% packet loss things really break down, and even 1% will cause serious degradation of throughput.)
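As a rough sanity check on those percentages there is the classic Mathis et al. back-of-envelope formula, throughput ~ MSS / (RTT * sqrt(loss)). For this thread's 30ms path, 1% loss and an assumed 1460-byte MSS:

Code:

# approximate TCP throughput under random loss, in bytes/sec
echo '1460 / 0.030 / sqrt(0.01)' | bc -l   # ~486666 bytes/s, i.e. under 4 Mbit/s

So even 1% loss would cap this link far below what jesnow measured.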
Even more often the problem is at the application level: if, for example, smb only allows 1MB of data to be outstanding without an smb-level acknowledgment, the biggest tcp/ip window won't help until smb also allows enough bytes in flight to use the available bandwidth...

Now iperf - especially with udp transport - should not suffer problems at the application level.
But analysing this properly requires taking captures of the traffic and looking at them with wireshark.
We then either see what's wrong - or see that the sender simply is not using the available bandwidth. That dictates what to look at next.
Ideally the traffic is captured at both ends simultaneously, but either end should be sufficient for an initial assessment.
In reality, getting the captures right and analysing them may take quite some effort. Sometimes the issue is obvious; sometimes it needs many hours.
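For a first look, something like this on pogacar would do (interface name assumed; 55202 is the raw iperf3 port from the first post):

Code:

# capture headers only (-s 96) for the raw WAN iperf3 traffic; eth0 assumed
tcpdump -i eth0 -s 96 -w iperf-wan.pcap port 55202
# then open iperf-wan.pcap in wireshark and check the retransmission
# counts and the TCP stream graphs under Statistics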
jesnow
l33t
Joined: 26 Apr 2006
Posts: 856

Posted: Sun Feb 19, 2023 9:53 pm

It's not that simple. By your reasoning no video streaming would ever be possible, but we stream HD video every day. Those protocols are optimized in such a way that latency doesn't matter, only throughput.

But you're asking for the larger context: I'm trying to mount filesystems over the internet so I can work anywhere in the world exactly like I do at home. Those protocols are obviously *not* optimized to make latency unimportant. That's why I'm methodically checking that everything functions correctly at each level, and then seeing what I can change to improve my ability to work.
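Concretely, the end goal looks something like this, using merckx's wireguard address from the tests above (the share name and credentials file here are just placeholders):

Code:

# mount a samba share on merckx across the wireguard tunnel
mount -t cifs //10.0.17.1/data /mnt/merckx -o credentials=/root/.smbcreds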

My conclusion from today's post is that tunneling protocols don't cost very much throughput, and that wireguard is definitively faster than ssh. That's a step in the right direction.



pingtoo wrote:
Maybe it is my poor English comprehension, but I don't get what your expectation for the WAN connection is. Are you expecting it to be the same as, or very close to, LAN speed?

I think part of the answer you seek already lies in your initial post: you already state that the ping response between WAN nodes is 30ms, compared to 1ms between LAN nodes.

In some sense, those "30ms" and "1ms" figures are what is known as latency.

In a TCP/IP network we usually measure the number of bytes and the number of packets transferred. A normal LAN ethernet packet is 1500 bytes (or less), so to transfer 1 MByte of data you need approximately 699 packets; if the latency is 1ms, this means it will take 699 * 1ms = 699 * 0.001 = 0.699 seconds to transmit the 1 MByte.

However, if the same 1 MByte is transmitted over the WAN with its 30ms latency, that becomes 699 * 30ms = 699 * 0.03 = 20.97 seconds. And WAN packet size usually varies from network to network, so it is even harder to estimate.

My explanation above is purely an illustration; the real WAN is never as simple as I posted.

So if you want to increase transmission speed, it is very important to try to reduce the latency.

In a TCP/IP network, one of the factors that introduces latency is the number of hops. A hop is the step from one node to the next node on the same wire.


Yes, we used to be able to learn something about where our packets were going using traceroute, but it hasn't been that way for a long time. When I moved institutes from France to Germany in 1994, we discovered that IP packets between the two national backbones were routed through New York. Things improved when everybody got on EUNET, and I have no idea what they do now.

Cheers,
Jon.
szatox
Advocate
Joined: 27 Aug 2013
Posts: 3133

Posted: Sun Feb 19, 2023 10:02 pm

Quote:
Throughput is not Latency
No, but latency does affect throughput.
Naive implementation of TCP would:
1) send a data packet
2) wait for ACK or timeout
3) send another data packet or resend dropped packet
4) wait for ACK or timeout
...rinse and repeat...
Round trip time makes a huge difference to transfer speed in this scenario.

A less naive implementation keeps multiple packets in flight, possibly even increasing the transmit buffer to cover all the latency encountered on the link, but the more packets you keep in the loop, the more data you have to resend after the network drops a single packet.
The sender will also retransmit data if the network delivers some packets out of order, which will never happen in a SOHO LAN but might happen on a load-balanced link.

Quote:
By your reasoning no video streaming would ever be possible, but we stream HD video every day.
This is wrong. Streaming video requires only that the available throughput be higher than the video's bitrate. This ensures the data stream is not throttled by the network.
Testing network speed means you're pushing the link to its limits. This guarantees the network will drop some of your data to force you to slow down.

Two completely different scenarios.


Last edited by szatox on Sun Feb 19, 2023 10:05 pm; edited 1 time in total
jesnow
l33t
Joined: 26 Apr 2006
Posts: 856

Posted: Sun Feb 19, 2023 10:04 pm

Well, what this definitely shows is that my throughput is >50% of the wire speed; that's pretty good for an encrypted connection. What comes next is to see how well operating systems and applications shove data through that bandwidth. The answer is: horrors await.

Cheers,
Jon.
pingtoo
l33t
Joined: 10 Sep 2021
Posts: 925
Location: Richmond Hill, Canada

Posted: Sun Feb 19, 2023 11:01 pm

I am glad the TCP window size was brought into the conversation.

As a matter of fact, I cannot really answer jesnow's point about streaming HD video, or even defend my own example of a 1 MB transfer taking 20-some seconds, because that is not the reality. I admit there are a lot of factors in a WAN that affect performance.

In my post I was just trying to point out that a WAN is very different from a LAN, with many more factors you have no control over, and those factors usually show up blended together in the ping response, which I termed "latency".

I am not sure whether there are any parameters for tuning Wireguard's window size (send/receive buffers), but as others in this thread have pointed out, latency is the main concern, and tuning the window size will most likely give you the best bang for the buck.

From my own experience, though: do not make the window-size change global to everything running on the Wireguard peer nodes. I learned that lesson the hard way; if you do, you will run out of memory sooner than you expect, and that is genuinely hard to diagnose when you have to debug it in a production environment.
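If you want to experiment without making global changes, note that iperf3 can request a bigger socket buffer for a single test, something like:

Code:

# per-test socket buffer instead of a global tcp_rmem/tcp_wmem change
iperf3 -c merckx.vesarius.net -p 55202 -w 4M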
szatox
Advocate
Joined: 27 Aug 2013
Posts: 3133

Posted: Mon Feb 20, 2023 11:58 am

Well, wireguard itself doesn't really need any window, since it uses UDP (and is stealthy), and it does not need to guarantee delivery, just like the plain copper wire itself does not guarantee it.
However, the point still stands: most of the data transferred will be sent using TCP (inside the tunnel).
So, wg should be able to operate at link speed (if you have enough CPU power for the encryption to meet the demand), but the underlying WAN will always peak lower than an underlying LAN of the same theoretical speed.
jesnow
l33t
Joined: 26 Apr 2006
Posts: 856

Posted: Sat Feb 25, 2023 11:44 pm

I did all of this just with iperf, concluding that the raw network speed with wg was consistently better than with ssh. I continued testing (I did a *lot* of it) using samba over both. Of course WAN speed is going to be less than LAN speed, and of course latency is the culprit. But it's a really, really bad culprit, as it turns out: much worse for file performance than for raw network pipe speed -- many orders of magnitude of slowdown! I did not expect that. I will start a new thread about this.
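A back-of-envelope way to see how that happens: a chatty protocol that waits for each reply before issuing the next request pays one full round trip per operation, no matter how fat the pipe is:

Code:

# at 30ms per round trip, a strict request/reply protocol manages only:
echo $(( 1000 / 30 ))   # ~33 operations per second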

Most of the testing and tweaking posts about samba predate GBE becoming common in WANs.

More shortly.

Cheers,
Jon.
Back to top
View user's profile Send private message