Gentoo Forums
Openssl performance issues
Gentoo Forums Forum Index :: Networking & Security
jesnow
l33t


Joined: 26 Apr 2006
Posts: 856

PostPosted: Sun Dec 04, 2022 7:43 pm    Post subject: Openssl performance issues

I am using a lot of ssh these days via ssh tunnels, but the performance is pretty bad. My new rabbit hole is to make sure that I have a correct ssl setup and that it's performing optimally. I'm writing down what I found in case it is useful later. Please criticize me in this thread if I make any mistakes, and please help if you have any ideas about how to troubleshoot my rotten throughput. I'm tunneling from my place of work (workstations vanaert and pogacar) to my home (bartali and merckx). All have wired ethernet and get ~1Gb/s connection speeds to the internet. It turns out it's not that easy to find out what's really going on, because there aren't many introductory guides for people like me. Skip to the end for a description of the problems I've been getting.

Introduction

Openssl is a robust, full-featured Open Source toolkit for Transport Layer Security (TLS). It provides cryptographic engines and cipher implementations, plus the certificate machinery used to establish encrypted connections, both between machines and within a single operating system instance.

https://www.openssl.org/
https://packages.gentoo.org/packages/dev-libs/openssl

The number of Gentoo packages that depend on openssl is impressive, and it's sometimes surprising what uses it. It's a fundamental part of the Linux operating system, and of Gentoo in particular. Fortunately, it's pretty trouble-free: the default implementation included in Gentoo "just works", and if you follow the Gentoo installation guide you have a solid engine and plenty of high-quality ciphers to work with. Most people can stop reading right here.

Openssl vs libressl

Those of us old enough will remember that there was a controversy, a fork, many flame wars, and a confusing situation for users. For a while there was a choice of which TLS implementation to use, openssl or libressl. This is a feature of open source software, not a bug. Many electrons have been spilled over this, but this particular competition is now resolved in favor of openssl; it's a long story:

https://wiki.gentoo.org/wiki/LibreSSL

ssl vs ssh

Most of us knowingly use ssl only when using ssh to log into another machine. Strictly speaking ssh doesn't speak the TLS protocol at all, but on most builds OpenSSH links against OpenSSL's crypto library for its ciphers, so ssh is still the best way to exercise that code in the real world. TLS itself is the underlying technology for a lot of other communication protocols. You should have public key authentication set up in ssh, at the very least with generated keys installed for localhost. How to do that is beyond the scope of this article; there are a million guides out there.

https://wiki.gentoo.org/wiki/SSH

Versions

Openssl is under steady development, but the changes are intended to be transparent to the user, and only really relevant to developers. Currently in tree are:

Code:

jesnow@bartali ~ $ equery list openssl -p
 * Searching for openssl ...
[-P-] [M ] dev-libs/openssl-1.0.2u-r1:0
[IP-] [  ] dev-libs/openssl-1.1.1q:0/1.1
[-P-] [ ~] dev-libs/openssl-1.1.1s:0/1.1
[-P-] [M~] dev-libs/openssl-3.0.7:0/3


1.0.2 is deprecated, there are two versions of 1.1.1, and there is a 3.0 version that incorporates some big under-the-hood changes, so I'm going to limit myself to 1.1.1 until 3.0 goes stable in Gentoo. There are constant security updates.

openssl command

Openssl's user interface is the openssl command, which has an excellent man page. It has its own internal cli shell, kind of like sftp does, where you can give commands and see their results. Most things can also be done from the system shell by using command line arguments. This seems more convenient to me because it gives you access to shell history, which the internal cli does not.

Code:

jesnow@bartali ~ $ openssl version
OpenSSL 1.1.1q  5 Jul 2022
jesnow@bartali ~ $ openssl
OpenSSL> version
OpenSSL 1.1.1q  5 Jul 2022
OpenSSL> quit
jesnow@bartali ~ $


Openssl performance

The "openssl speed" command gives extensive benchmarking of the internal performance of the openssl stack using all the available ciphers and a variety of block sizes. Even the summary of the output is a bit daunting:

Code:


OpenSSL 1.1.1q  5 Jul 2022
built on: Sun Jul 17 19:44:42 2022 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: x86_64-pc-linux-gnu-gcc -fPIC -pthread -m64 -Wa,--noexecstack -O2 -march=x86-64 -pipe -fno-strict-aliasing -Wa,--noexecstack -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG  -DOPENSSL_NO_BUF_FREELISTS
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md2                  0.00         0.00         0.00         0.00         0.00         0.00
mdc2             18974.29k    20271.81k    20910.41k    20977.66k    21020.67k    21129.90k
md4             105874.85k   321972.84k   744771.81k  1116778.51k  1315785.39k  1329971.20k
md5             146403.54k   339441.17k   595976.02k   728138.41k   783141.50k   787111.47k
hmac(md5)        60493.44k   189497.54k   439870.50k   656708.32k   764900.69k   782448.33k
sha1            162097.83k   351767.01k   702521.45k   993588.57k  1093025.79k  1102555.53k
rmd160           53346.66k   126231.69k   229794.42k   288154.97k   311227.51k   311077.55k
rc4             482920.52k   797182.25k   919254.53k   976198.66k   981281.45k   982870.70k
des cbc          81204.94k    83484.01k    83802.62k    83806.55k    83992.58k    84262.91k
des ede3         30962.74k    31396.61k    31408.04k    31390.38k    31358.98k    31490.05k
idea cbc        106584.92k   110611.41k   111297.45k   111853.57k   111894.53k   111869.95k
seed cbc         90919.63k    95345.56k    95699.63k    96067.93k    95783.59k    95890.09k
rc2 cbc          59871.08k    61316.80k    61841.72k    62059.88k    61952.34k    61773.14k
rc5-32/12 cbc   295779.62k   328290.77k   336580.01k   338123.43k   336980.65k   338231.30k
blowfish cbc    133783.65k   141040.42k   142769.83k   143025.83k   143431.23k   143527.13k
cast cbc        118728.76k   127222.69k   129503.46k   129676.63k   130129.51k   130061.65k
aes-128 cbc     253446.09k   261655.32k   262920.28k   263661.23k   263495.68k   264525.14k
aes-192 cbc     217894.45k   224816.81k   225606.14k   226702.68k   223349.42k   225836.18k
aes-256 cbc     186522.45k   195011.20k   196768.68k   198703.95k   195739.32k   194035.71k
camellia-128 cbc   112228.90k   181183.55k   207081.39k   215213.74k   217022.46k   217366.53k
camellia-192 cbc   101708.66k   141470.12k   156160.34k   161113.43k   162922.50k   162725.89k
camellia-256 cbc   100172.22k   141594.88k   156356.27k   161319.25k   163190.10k   163015.34k
sha256           88805.05k   196290.47k   366010.37k   461756.07k   494441.81k   500230.83k
sha512           58311.16k   233388.71k   402951.77k   602200.06k   702980.10k   726947.16k
whirlpool        38657.68k    82188.91k   135896.49k   164015.00k   173129.73k   174344.39k
aes-128 ige     232342.22k   248786.02k   253756.07k   257386.84k   258916.35k   258790.74k
aes-192 ige     205817.59k   214868.31k   215990.10k   221858.13k   217251.84k   218054.66k
aes-256 ige     182099.57k   190099.82k   193063.42k   194466.47k   194874.03k   194707.46k
ghash          1395582.57k  4997389.55k  7899978.41k  8950806.19k  9575333.89k  9473845.93k
rand             15550.39k    62755.20k   242432.81k   825508.39k  2768350.84k  3304924.92k



What's obvious is that there's a pretty big variation in speed between the different ciphers. Whether that makes a difference or not in the real world is another matter, because modern processors are fast compared to the width of network pipes they are trying to push the encrypted bytes through. Or are they?
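One way to put a number on that question: compare the aes-128-cbc figure from the table above against a raw gigabit pipe (the comparison itself is mine, not from openssl):

```python
# aes-128-cbc at 16 KB blocks, from the `openssl speed` table above.
# The table's units are "1000s of bytes per second".
aes128_cbc = 264525.14 * 1000          # bytes per second through the cipher
gige = 1_000_000_000 / 8               # raw 1 Gb/s pipe in bytes per second
print(f"cipher is {aes128_cbc / gige:.1f}x faster than the pipe")
```

So even this mid-pack cipher stays ahead of gigabit ethernet, but only by about 2x; on a 10 GbE link the cipher, not the network, would be the bottleneck.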

Hardware acceleration

Both Intel and AMD feature on-die hardware acceleration for cryptographic calculations: the AES-NI instructions (implemented by both vendors these days), plus AMD's separate Cryptographic Coprocessor (ccp). Until pretty recently openssl was not configured to use these features out of the box, and the present situation is cloudy. Previously there were two different kernel interfaces for crypto offload: af_alg and cryptodev. It looks like neither is enabled by default. So for example:

Code:

jesnow@bartali ~ $ openssl engine -t -c
(rdrand) Intel RDRAND engine
 [RAND]
     [ available ]
(dynamic) Dynamic engine loading support
     [ unavailable ]


That was supposed to show the hardware encryption engine if one is available, but it doesn't. Apparently some hardware encryption is going on anyway: I'm encouraged by the compiler flag -DAESNI_ASM in the output above, which suggests AES-NI is used via openssl's built-in assembly paths rather than via an engine. Based on this documentation from the OpenWRT project, you can turn the hardware encryption on and off using an environment variable:

https://openwrt.org/docs/techref/hardware/cryptographic.hardware.accelerators

Like this:

Stock (AES-NI on?):
Code:

jesnow@pogacar ~ $ openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 151067326 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 40611140 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 10344060 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 2596655 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 324284 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 16384 size blocks: 161880 aes-128-cbc's in 3.00s
OpenSSL 1.1.1q  5 Jul 2022
built on: Sun Jul 17 19:44:42 2022 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: x86_64-pc-linux-gnu-gcc -fPIC -pthread -m64 -Wa,--noexecstack -O2 -march=x86-64 -pipe -fno-strict-aliasing -Wa,--noexecstack -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG  -DOPENSSL_NO_BUF_FREELISTS
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc     805692.41k   866370.99k   882693.12k   886324.91k   885511.51k   884080.64k


Hardware AES-NI switched off:
Code:

jesnow@pogacar ~ $ OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 60032317 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 18128897 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 4796872 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 1196739 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 150431 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 16384 size blocks: 75582 aes-128-cbc's in 3.00s
OpenSSL 1.1.1q  5 Jul 2022
built on: Sun Jul 17 19:44:42 2022 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: x86_64-pc-linux-gnu-gcc -fPIC -pthread -m64 -Wa,--noexecstack -O2 -march=x86-64 -pipe -fno-strict-aliasing -Wa,--noexecstack -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG  -DOPENSSL_NO_BUF_FREELISTS
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc     320172.36k   386749.80k   409333.08k   408486.91k   410776.92k   412778.50k


So the stock run (i.e. using the crypto hardware) is a bit over 2x faster than the software-only implementation on this 10-year-old Intel i7, suggesting that openssl indeed incorporates hardware encryption for some ciphers. Interestingly this worked the same on my AMD machine as well.

Crypto Throughput:

All of this is just numerology unless you can measure bytes moved per second by the application, in this case ssh. What seems to be a standard way of doing this is the following script:

Code:

jesnow@bartali ~ $ for i in `ssh -Q cipher`; do dd if=/dev/zero bs=1M count=100 2> /dev/null   | ssh -c $i localhost "(time -p cat) > /dev/null" 2>&1   | grep real | awk '{print "'$i': "100 / $2" MB/s" }'; done
aes128-ctr: 526.316 MB/s
aes192-ctr: 526.316 MB/s
aes256-ctr: 555.556 MB/s
aes128-gcm@openssh.com: 555.556 MB/s
aes256-gcm@openssh.com: 555.556 MB/s
chacha20-poly1305@openssh.com: 588.235 MB/s
jesnow@bartali ~ $


I have seen this used on multiple sites to gauge ssh performance. On this machine (AMD Ryzen 3600) the throughput to itself is well above what can fit through a 1GBE network pipe, so if you're getting that kind of throughput to localhost, the network pipe is going to be the limiting factor. Here's the same local throughput test on an intel 11K at work:

Code:

jesnow@vanaert ~ $ for i in `ssh -Q cipher`; do dd if=/dev/zero bs=1M count=100 2> /dev/null   | ssh -c $i localhost "(time -p cat) > /dev/null" 2>&1   | grep real | awk '{print "'$i': "100 / $2" MB/s" }'; done
aes128-ctr: 625 MB/s
aes192-ctr: 833.333 MB/s
aes256-ctr: 769.231 MB/s
aes128-gcm@openssh.com: 769.231 MB/s
aes256-gcm@openssh.com: 769.231 MB/s
chacha20-poly1305@openssh.com: 769.231 MB/s
jesnow@vanaert ~ $


On an older intel i7 (merckx, that happens to be my home file and external sshd server):

Code:

jesnow@merckx ~ $ for i in `ssh -Q cipher`; do dd if=/dev/zero bs=1M count=100 2> /dev/null   | ssh -c $i localhost -p 2224 "(time -p cat) > /dev/null" 2>&1   | grep real | awk '{print "'$i': "100 / $2" MB/s" }'; done
aes128-ctr: 100 MB/s
aes192-ctr: 106.383 MB/s
aes256-ctr: 97.0874 MB/s
aes128-gcm@openssh.com: 106.383 MB/s
aes256-gcm@openssh.com: 104.167 MB/s
chacha20-poly1305@openssh.com: 196.078 MB/s
jesnow@merckx ~ $


This is significant, since we're now down below the theoretical 1GbE network throughput of ~116MB/s, meaning that even with 100GbE I could never do better than that, and on top of the samba protocol the cryptographic overhead for any tunneled connection is a significant part of the overall workload. This is probably a very real bottleneck in my system, and probably accounts for the big performance hit I take when using my home server while at work.
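For reference, that ~116MB/s ceiling can be derived from first principles; the frame-size assumptions below are mine, and the exact figure shifts a little with TCP options:

```python
# Back-of-the-envelope goodput of 1 GbE: raw line rate minus Ethernet framing
# and IPv4/TCP header overhead per 1500-byte MTU frame.
LINE_RATE = 1_000_000_000 / 8     # 1 Gb/s in bytes per second
MTU = 1500                        # bytes of IP packet per Ethernet frame
TCP_PAYLOAD = MTU - 20 - 32       # minus IPv4 header, TCP header w/ timestamps
ON_WIRE = MTU + 14 + 4 + 8 + 12   # plus Ethernet header, FCS, preamble, gap

goodput = LINE_RATE * TCP_PAYLOAD / ON_WIRE
print(f"max TCP goodput over 1 GbE: {goodput / 1e6:.1f} MB/s")
```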

Using ssh between machines:

From one machine to the other in the local network, I get performance close to the theoretical maximum. In the local net at work I get:

Code:

jesnow@vanaert ~ $ for i in `ssh -Q cipher`; do dd if=/dev/zero bs=1M count=200 2> /dev/null   | ssh -T -c $i pogacar "(time -p cat) > /dev/null" 2>&1   | grep real | awk '{print "'$i': "200 / $2" MB/s" }'; done
3des-cbc: 108.108 MB/s
aes128-cbc: 111.111 MB/s
aes192-cbc: 111.111 MB/s
aes256-cbc: 111.111 MB/s
aes128-ctr: 109.89 MB/s
aes192-ctr: 109.89 MB/s
aes256-ctr: 111.111 MB/s
aes128-gcm@openssh.com: 109.89 MB/s
aes256-gcm@openssh.com: 111.111 MB/s
chacha20-poly1305@openssh.com: 111.111 MB/s
jesnow@vanaert ~ $


And in my local net at home I get:

Code:

jesnow@bartali ~ $ for i in `ssh -Q cipher`; do dd if=/dev/zero bs=1M count=200 2> /dev/null   | ssh -T -c $i merckx "(time -p cat) > /dev/null" 2>&1   | grep real | awk '{print "'$i': "200 / $2" MB/s" }'; done
3des-cbc: 81.3008 MB/s
aes128-cbc: 83.3333 MB/s
aes192-cbc: 108.108 MB/s
aes256-cbc: 104.712 MB/s
aes128-ctr: 107.527 MB/s
aes192-ctr: 99.0099 MB/s
aes256-ctr: 104.167 MB/s
aes128-gcm@openssh.com: 104.167 MB/s
aes256-gcm@openssh.com: 100 MB/s
chacha20-poly1305@openssh.com: 102.041 MB/s
jesnow@bartali ~ $


This performance in the local net is about the same as merckx got just talking to itself.

Into the tunnel: Throughput problems!

Finally, I have ssh tunnels (with connection sharing) running between each of my work machines (vanaert and pogacar) and my home server, merckx. Here's where things break down and I really stop understanding. I get much worse performance tunneling over the work-to-home ethernet connection (>1GbE all the way) than I do in either local network. In the download direction (from the point of view of work) I get:

Code:
jesnow@bartali ~ $ for i in `ssh -Q cipher`; do dd if=/dev/zero bs=1M count=1 2> /dev/null   | ssh -T -c $i vanaert "(time -p cat) > /dev/null" 2>&1   | grep real | awk '{print "'$i': "1 / $2" MB/s" }'; done
3des-cbc: 3.7037 MB/s
aes128-cbc: 11.1111 MB/s
aes192-cbc: 11.1111 MB/s
aes256-cbc: 11.1111 MB/s
aes128-ctr: 5.88235 MB/s
aes192-ctr: 7.69231 MB/s
aes256-ctr: 7.14286 MB/s
aes128-gcm@openssh.com: 7.14286 MB/s
aes256-gcm@openssh.com: 11.1111 MB/s
chacha20-poly1305@openssh.com: 7.14286 MB/s


I wasn't expecting to get the same performance I get in my local net, but a factor of 10 seems like a really big hit to take for going through the ssh tunnel. But it gets worse! In the upload direction (from the point of view of work), it's even slower:

Code:

jesnow@vanaert ~ $ for i in `ssh -Q cipher`; do dd if=/dev/zero bs=1M count=1 2> /dev/null   | ssh -T -c $i merckx "(time -p cat) > /dev/null" 2>&1   | grep real | awk '{print "'$i': "1 / $2" MB/s" }'; done
3des-cbc: 1.26582 MB/s
aes128-cbc: 1.44928 MB/s
aes192-cbc: 0.877193 MB/s
aes256-cbc: 1.42857 MB/s
aes128-ctr: 0.970874 MB/s
aes192-ctr: 1.81818 MB/s
aes256-ctr: 2 MB/s
aes128-gcm@openssh.com: 1.40845 MB/s
aes256-gcm@openssh.com: 1.49254 MB/s
chacha20-poly1305@openssh.com: 1.07527 MB/s


I have been getting a similar pattern using iperf3 through the tunnel, but that's a story for another day. All of the work and home computers get ~800 Mb/s connections to the internet.
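For the record, the iperf3-through-the-tunnel test looks something like this (the port-5201 forward is illustrative, not a literal transcript from my machines):

```shell
# on merckx (server side):
iperf3 -s

# on vanaert, with the tunnel forwarding local port 5201 to merckx:
ssh -L 5201:localhost:5201 -N -f merckx
iperf3 -c localhost -p 5201        # upload direction
iperf3 -c localhost -p 5201 -R     # reverse: download direction
```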

Conclusion:

The ssh tunnel seems to be costing me a huge performance penalty: on a gigabit ethernet connection I'm getting 10MB/s in one direction and 1MB/s in the other! But it's not clear why. I went into this thinking a misconfiguration of openssl was maybe to blame, or at least that I could improve the situation by choosing a better cipher or enabling the hardware crypto. In fact, what I seem to have found is big differences in network performance that have some other explanation, despite the fact that all four machines have a super fast cabled ethernet connection.
Genone
Retired Dev


Joined: 14 Mar 2003
Posts: 9532
Location: beyond the rim

PostPosted: Wed Dec 14, 2022 12:42 pm

First, your tunnel test is only using a 1MB sample, so any delay caused by connection setup/teardown has a much greater effect than in your other tests using 100MB / 200MB samples.
Also, your tests show that the bottleneck is not openssl itself but rather something specific to the tunneling setup, so the title is a bit misleading. More so given that you've omitted any information about how you set up your tunnels. For example, running an ssh connection over an ssh tunnel obviously won't work very well, as you'd be encrypting/decrypting the data twice on both ends for no reason (though I'm not an expert on ssh tunnels, so that could be wrong).

What is really odd is that the 256-bit ciphers in your tunnel test seem to perform much better than the 128-bit variants. You should really retest with a much larger sample to ensure you're actually looking at bandwidth rather than (random) latency. As a general rule, high-level performance tests should run for more than just milliseconds, especially if network connections are involved.
Or just drop the |grep|awk part and look at the actual times. Also look at actual CPU usage during your tests, especially if it changes significantly between different tests.

Last but not least, ssh has a -v option for diagnostics that might be useful to you.
no101
n00b


Joined: 10 Oct 2022
Posts: 11
Location: Piney Woods

PostPosted: Wed Dec 14, 2022 4:29 pm

You might be experiencing "TCP meltdown". Running TCP over TCP can cause problems; Wireguard uses UDP for exactly this reason, and OpenVPN suggests using UDP as well. A VPN is an inversion of what you're doing, but the issue is the same.

I suspect TCP meltdown because the localhost network is perfect: you never get dropped packets or congestion. Since you never get errors, localhost is pretty useless for testing network code performance. There are other differences too: for example, you will likely get zero-copy transmission, because the kernel knows it can simply pass the existing buffer along.

I don't really have a good reference but OpenVPN has a simple explanation in their FAQ.
https://openvpn.net/faq/what-is-tcp-meltdown/
szatox
Advocate


Joined: 27 Aug 2013
Posts: 3137

PostPosted: Wed Dec 14, 2022 10:59 pm

I just want to add that in my experience SSH makes a very slow data pipe. Even an otherwise idle system wouldn't transfer more than 20MB/s per connection.
Fortunately, in my particular case I could work around this by splitting a job into a few thousand chunks and sending up to 20 of them in parallel, for a total bandwidth of 400MB/s, but I still do not consider it a good general-purpose solution.

Just saying: if you do have full control over both ends, you'll be better off simply shoving your data down netcat wrapped in wireguard.
Anyway, what do you need it for? I wonder what other options are available.
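For what it's worth, the netcat idea looks roughly like this (the 10.0.0.x addresses assume an existing wireguard interface between the hosts, and flag syntax varies between netcat variants):

```shell
# receiver, listening on its wireguard address (assumed 10.0.0.1):
nc -l -p 9000 > backup.tar

# sender, pushing through the wireguard interface:
nc 10.0.0.1 9000 < backup.tar
```

Wireguard handles the encryption, so the TCP stream itself stays plain and fast.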
jesnow
l33t


Joined: 26 Apr 2006
Posts: 856

PostPosted: Fri Jan 27, 2023 4:36 pm

To summarize:

I've got networking running very reliably on multiple machines outside my home network using dynamic dns, a pinhole in my router firewall on a random port, ssh reverse tunneling (using openssh) and cifs. The nice thing is it all works about the same as it does at home, and my home systems can connect easily to my remote systems at work through the tunnel. It's like a simple vpn.

Here's my remote machine .ssh/config:

Code:

Host *
        ForwardX11 yes
        ForwardX11Trusted yes
        ControlMaster auto
        ControlPath /tmp/ssh-%r@%h:%p
        ServerAliveInterval 60
        ServerAliveCountMax 10
        ConnectTimeout 300


Host merckx
        Hostname merckx.*****.***
        User ******
        RemoteForward 42223 localhost:22
        RemoteForward 43632 localhost:3632
        RemoteForward 44000 localhost:4000
        LocalForward 44445 merckx:445
        LocalForward 44000 bartali:4000
        Port *****

Host vanaert
        User jesnow


It's a little cumbersome in that I have to bring up the tunnel by hand on the remote machine with

Code:

jesnow@pogacar ~ $ autossh -M 0 -f -T -N merckx


but since passwordless login is all set up, this takes one second and I can see error messages if something goes wrong. Up comes the tunnel, and I can mount a samba share easily with

Code:

mount -t cifs //localhost/jesnow /mnt/merckx-jesnow -o port=44445,credentials=/root/smb-merckx/.cred,vers=3.11,uid=jesnow,gid=users


It all works fine on the download side: I can copy files *from* my home server to my work machines at near-network speeds over samba, but copying them in the reverse direction, *to* the server, is extremely slow, as documented above. I'm not using samba in the tests above, so that's not the issue; samba is doing its job fine. It's the tunnel that's slowing everything down, but only in one direction. There is no visible load on the home server during transfers, and the net connection is nowhere near saturated.

Responding to comments above:
I don't think the TCP-over-TCP issue is what's wrong: that would slow transfers in both directions. Because I get acceptable speed in one direction, it just doesn't make sense that it won't go both ways. But after a few months of messing with it, I'm still no further than I was when I posted.

@Genone: My tunnel setup is above.

@szatox: I'm surprised by that too. Yes, ssh is a slow data pipe, but I think this is way too slow; I must be missing just one detail that would get me to 80% (which would be enough) instead of 1%, which really doesn't work.

@no101: I agree that localhost and the local net aren't that interesting; I just wanted to show that it all works. "TCP meltdown" probably is why I only get about 80% speed from vanaert (remote).

Well, I was hoping someone would say "ah, we get this all the time, be sure you set X in .ssh/config"; useful tips like that are what got me as far as I've gotten. I thought it was an inefficiency in the crypto setup, so in the post above I set out to test that proposition. I now don't think that's the case -- yes, it's not as fast as it might be, but that doesn't explain the up/down disparity.

I think I'm at a dead end in getting this particular setup to work. Probably the "just use wireguard" option is the way to go. But there's a lot of work behind that "just": I have firewalls at both ends and a heterogeneous gaggle of 6 machines.

Anybody who has seen this one-way slowdown before, please let me know.

Cheers,
Jon