Previously I was having trouble working because of single-digit MB/s WAN file throughput, even though I have GbE all the way from work to home. Learn from my mistakes: I started trying all kinds of things before I looked at my network hardware. That was dumb:
viewtopic-t-1161123-highlight-.html
viewtopic-t-1161386-highlight-.html
viewtopic-t-1161818-highlight-.html
Then there was the setup of a 2.5GbE network card in gentoo (yes, redundant and unnecessary at the moment), which was a whole saga. Finally it worked at near enough to theoretical throughput. The hardware setup is all in the above links so I won't repeat it. It's 10-year-old Core i7 gentoo -- WAN -- 10-year-old Core i7 gentoo.
I had followed the advice "just set up WireGuard" -- wow, that "just" was a doozy. WireGuard is hard. This, I think, is why so many businesses have sprung up offering to manage WireGuard for you with their proprietary software and servers. But *I* of course had to do it the hard way. Just set up WireGuard they said, it will be easy they said. OK, well, I did it. Works great now.
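For anyone attempting the same, this is roughly the bare-bones shape of a working point-to-point config. Keys, addresses, ports and the endpoint below are placeholders, not my real values:

```
# /etc/wireguard/wg0.conf on the home peer (all values hypothetical)
[Interface]
PrivateKey = <this-host-private-key>
Address = 10.0.0.2/24
ListenPort = 51820

[Peer]
PublicKey = <work-host-public-key>
Endpoint = work.example.com:51820
AllowedIPs = 10.0.0.0/24
# keepalive helps when one side is behind NAT
PersistentKeepalive = 25
```

The work side mirrors this with the addresses and keys swapped.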
So having obtained near enough to theoretical pure WAN throughput, I set out to benchmark file performance. Still horrible. Felt like dialup! What I had noticed and wanted to figure out is this:
1) Transferring large files was OK, not perfect, maybe 30-40MB/s. BUT it was tricky to benchmark, because you have at least three things in the way besides ssh or WireGuard plus network transport (which we now think are working OK): the disks on both sides, the OS (just gentoo), and the application. I noticed that rsync and cp often had quite different speeds. And what I mostly use is Dolphin, and who knows what protocol that uses or how to write a benchmark script for it.
2) Small single files have lower throughput, maybe half. No surprise there, this is well known.
3) Certain applications take *FOREVER* over the WAN to do simple things. The File|Open and File|Save dialogs in LibreOffice, for example, take ~10 seconds to populate the directory you're trying to save to, and you can sit there and watch each directory item load one by one at morse code speeds. Directory traversal.
4) Copying folders with even a few files drops to near-zero throughput. A kernel tree whose tarball transfers in a few seconds may take half an hour.
So I set about testing Samba with cp and rsync (avoiding the obvious advantages rsync brings, just doing plain file transfers). I transferred four things of different sizes in both directions: 1GB, 100MB, 20MB and 1MB. The last of those was not a single file but a subfolder from the kernel source tree with ~295 files in 32 directories. I ran this over Samba with every combination of WireGuard and ssh tunnel transport, so for each of a dozen or so configurations I had 32 speed measurements. I'll try to summarize them now. Anybody who wants the data is welcome to it.
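For anyone who wants to reproduce the methodology, the harness was of roughly this shape -- a sketch, not my exact script. `bench_copy`'s source and destination are placeholders (e.g. a local directory and a CIFS mount point):

```shell
#!/bin/sh
# Time a recursive copy and report throughput plus average per-file cost.
# Sketch only: run as  bench_copy /path/to/src /mnt/share/dst
bench_copy() {
    src=$1; dst=$2
    bytes=$(du -sb "$src" | cut -f1)          # total payload in bytes
    files=$(find "$src" -type f | wc -l)      # how many files we touch
    start=$(date +%s.%N)
    cp -r "$src" "$dst"
    end=$(date +%s.%N)
    awk -v b="$bytes" -v f="$files" -v s="$start" -v e="$end" 'BEGIN {
        t = e - s
        printf "%.1f MB/s over %d file(s), %.1f ms/file\n",
               b / t / 1048576, f, t * 1000 / f
    }'
}
```

Swap `cp -r` for `rsync -a` to compare the two; the ms/file column is what exposes the per-file overhead discussed below.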
** LAN performance, figuring that's the best I could ever expect. Recall I was now getting ~930Mbit/s netperf LAN speed in both directions (0.1ms RTT), so I should expect a maximum of about 120MB/s file throughput. Indeed I got 109MB/s download and 64MB/s upload for a single 1GB file. OK. BUT immediately I noticed that the folder of many small files (just 1MB total, for heaven's sake!) was super slow even on the LAN. Transferring that folder was >500 times slower than an equivalent-sized single file, on the *LAN*! And I get perfectly adequate responsiveness on the LAN; you maybe notice that getting Dolphin thumbnails for image folders is a little slow, but so what. That was a surprise: 500 times slower! It was taking nearly 30ms per file just in open/close overhead. But wait, there's more.
** Starting with the default WAN configuration, I was now getting maximum file transfer speeds of about 40MB/s in both directions: 1/3 of wire speed. WireGuard was again the clear winner in tunnel speed over ssh, to the point that I'm going to stop using ssh for tunneling. The kernel-samples (small files) folder, though, was now *5300* times slower than an equivalent WAN single-file transfer. Holy cow. Per-file overhead is somewhere around 400ms: nearly half a second just to open and close each file.
** I switched to an insanely faster machine at work, with a much higher clock rate, core count and disk speed. Strangely, all the speeds went down very slightly. No explanation for that; same OS and applications. The faster client CPU/disk made not one bit of difference to transfers. But the new machine is vastly more responsive at the user interface, I think because it loads applications much faster.
** I did a bunch of Samba optimizations (like "socket options = TCP_NODELAY") which people promise can up to double throughput. Nope. I tried ksmbd: nope. I tried moving the server test folder to an SSD (a staid 300MB/s) from the 140MB/s rotating disk: no difference. The standard Samba optimizations did zero for me.
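For the record, the tuning was of this shape (note the smb.conf syntax is `socket options = ...`, with an equals sign). These are the commonly recommended knobs; none of them measurably helped here:

```
[global]
    # commonly suggested throughput tuning -- no measurable effect for me
    socket options = TCP_NODELAY IPTOS_LOWDELAY
    use sendfile = yes
    # any nonzero size enables async I/O for reads/writes
    aio read size = 1
    aio write size = 1
```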
** Things that maybe did help: adding "noserverino" to the mount command. That brought the per-file overhead down to 0.3s up to the server and 0.13s down (from the server). Also: close stray open files! I'm trying to work during all of this, so I had a few dozen open files, Dolphin instances and whatnot. Closing all of them made a big improvement.
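In fstab form, that looks roughly like the entry below. Server, share, paths and credentials file are hypothetical; `noserverino` is the option I actually tested, and `actimeo=30` (attribute caching for 30 seconds) is a further knob that targets the same per-file round trips and seems worth trying:

```
# /etc/fstab -- hypothetical CIFS entry. noserverino skips server inode
# lookups; actimeo=30 caches file attributes to cut metadata round trips.
//server/share  /mnt/share  cifs  credentials=/etc/cifs-creds,noserverino,actimeo=30,_netdev  0  0
```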
After all of that my performance is now up to ~46MB/s upload and ~51MB/s download for large single files, and the perceived responsiveness is much better. The per-file transfer overhead is down to 330ms (a third of a second!) in one direction and 130ms in the other. While that is a massive improvement, it's still slow and awful to work with.
I don't think there's much left to squeeze out of the total transfer rate, as it doesn't seem to be affected by much that I do. The killer is clearly the per-file overhead. The ~100x drop in effective rate from LAN to WAN does seem to track the total latency, but I think there are latencies built into Samba, cp and rsync that could be improved. Samba is not a WAN protocol, despite the "Common Internet" in the CIFS name. That was/is marketing b&^%$&t I bought for years.
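One way to think about that per-file overhead: each open/stat/read/close costs some number of protocol round trips, so per-file time is roughly round-trips x RTT. A quick back-of-envelope -- the 400ms/file figure is from my measurements above, but the 10ms WAN RTT here is only an assumption for illustration:

```shell
# Back-of-envelope: per-file time / RTT ~= protocol round trips per file.
# 400 ms/file measured; the ~10 ms WAN RTT is an assumed example value.
awk -v per_file_ms=400 -v rtt_ms=10 'BEGIN {
    printf "~%.0f round trips per file\n", per_file_ms / rtt_ms
}'
# prints: ~40 round trips per file
```

Dozens of synchronous round trips per file is exactly the kind of thing a chatty LAN protocol gets away with at 0.1ms RTT and dies from over a WAN.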
My questions to you now:
What's next to try? I was thinking:
** Test NFS vs Samba. I'll drop ssh tunnels from my testing, though, and just use WireGuard. Linux-NFS-WireGuard-NFS-Linux is a stack people claim works well.
** Test NFS and Samba with their built-in encryption instead of WireGuard: Linux-Samba-Linux. Both now support encryption in transit. Use a nonstandard port, though.
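For the NFS-over-WireGuard test, the minimal setup I have in mind looks like this. Hostpaths, mount points and the 10.0.0.x addresses are placeholders (the WireGuard tunnel subnet), not a recommendation:

```
# /etc/exports on the server -- export only to the WireGuard subnet
/srv/work  10.0.0.0/24(rw,sync,no_subtree_check)

# /etc/fstab on the client -- NFSv4 over the server's WireGuard address
10.0.0.1:/srv/work  /mnt/work  nfs4  rw,hard,_netdev  0  0
```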
** Turn off oplocks. Those things have to be expensive.
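If I try that, disabling them is a two-liner in smb.conf (a sketch; whether it actually helps over a WAN is the open question):

```
[global]
    oplocks = no
    level2 oplocks = no
```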
** There is an industry of companies offering "WAN optimization" to solve this exact problem. They employ a number of promising-sounding techniques to reduce the number of round trips needed to read a single file. It's a combination of caching, deduplication, request bundling, and lots of other cool-sounding stuff, all built on top of these same FOSS network technologies, but extending them for the enterprise deep-pockets world.
https://www.gartner.com/reviews/market/wan-optimization
Is there anything like that you don't have to pay big bucks for, or is that layer a 100% proprietary harvesting of FOSS IP?
** Lastly: Syncthing, which appears to be FOSS, and Resilio (a monetization of the BitTorrent protocol), which is very much not free. The model is like caching your entire volume locally and letting the daemon synchronize as it has time and bandwidth.
** To be clear, anything involving an external cloud server is out. I won't even consider it. I own my data, I own everything it exists on, period.
Does anybody have experience with those things? Any help gratefully received. Thank you for even reading this far.
Cheers,
Jon.