How do I prevent a large file from spoiling the page cache?

as.gentoo · Guru Joined: 07 Aug 2004 Posts: 319

Hello,

I have a partition containing videos and music. I won't watch the same movie(s) every day! However, watching a movie will (in case the page cache is fully used) evict older data from the page cache, most probably data that is far better suited to be kept in RAM.
I know that there is a tool that can force data to be kept in the page cache ( dev-util/vmtouch ), but I need the opposite.

In ZFS there are the <primarycache|secondarycache>=<all|none|metadata> dataset properties, but I'd rather not create a second zpool solely containing one drive.
FUSE has/had (?) a direct_io flag but I've read that it was deactivated and there is the overhead… I did not find a mount option like that in the ext4, jfs, reiserfs and xfs man pages.

Can / how do I tell the kernel to omit putting files / directory contents into the page cache?

Thanks in advance!

EDIT (2018-08-17): man mount.fuse states that direct_io refers to reading and writing from/to the page cache. Sorry for not pointing that out earlier.

VinzC · Posted: Fri Aug 10, 2018 1:27 pm Post subject:

Just being curious here.

From my (admittedly limited) technical understanding of file systems, I don't expect the page cache to be directly involved in caching files from the disk. The filesystem cache, however, is; did you mean that cache, BTW? It is involved in write caching and you can turn that off with mount -o sync. But I guess you're referring to read caching, right?

There's also buffering, which is always needed, if only to accommodate disk latencies, and I doubt you could safely disable it without noticeable performance issues. From my understanding of I/O, the Linux kernel will determine how much memory it needs and which pages it needs to evict, possibly involving the swap file, according to what memory is used. Memory usage is adjusted continuously with the system's load, usage, and so on. And if buffering requires the page cache, well, then let it be.

Next, I don't expect huge files to be cached in large chunks. Megabytes maybe, but gigabytes, probably not. Instead you can expect caching in small portions, usually 4 KiB pages. And even then the filesystem cache is cleared by the kernel at given checkpoints (correct me if I'm wrong), depending on the I/O scheduler you've selected in your kernel configuration and the mount options you've selected.

I may be totally wrong, though, so please educate me if so ;-)

I'd like to ask: what makes you believe the page cache is involved at all and why do you believe that is a problem?
_________________
Gentoo addict: tomorrow I quit, I promise!... Just one more emerge...
1739!

as.gentoo · Guru Joined: 07 Aug 2004 Posts: 319

Hello, thanks for replying

https://en.wikipedia.org/wiki/Page_cache reads:

Ant P. · Watchman Joined: 18 Apr 2009 Posts: 6920

I can't remember where I read this, but iirc the page cache is supposed to recognise sequential reads of large files and evict them early for exactly this reason. I can't find any tunables related to this in /proc/ or /sys/ though and I'm wondering if it really does this.

as.gentoo · Guru Joined: 07 Aug 2004 Posts: 319

I think your information is related to the device-mapper cache described here: /usr/src/linux/Documentation/device-mapper/cache-policies.txt

Ant P. · Watchman Joined: 18 Apr 2009 Posts: 6920

I'm pretty sure it doesn't care about the layout on disk in that case, only whether the read() calls increment monotonically. There'd probably be a lot of complaints about non-deterministic performance if it were the opposite.

VinzC · Posted: Thu Aug 16, 2018 5:16 am Post subject:

as.gentoo · Guru Joined: 07 Aug 2004 Posts: 319

as.gentoo · Guru Joined: 07 Aug 2004 Posts: 319

Looks like all previously cached data is replaced. :-\

VinzC · Posted: Sun Aug 19, 2018 10:27 am Post subject:

Goverp · Advocate Joined: 07 Mar 2007 Posts: 2007

As no-one seems to have mentioned it yet, I shall:
swappiness
_________________
Greybeard

VinzC · Posted: Tue Aug 21, 2018 4:23 am Post subject:

Goverp · Advocate Joined: 07 Mar 2007 Posts: 2007

Good question. I mentioned swappiness 'cos lots of articles on the web mention decreasing swappiness to "cure" the impact of processing large files on other applications. I was parroting that answer, but it may not be correct.

Swapping certainly has a big impact on a swapped application; you obviously have to wait for it to be swapped in before you get a response. But I was assuming the problem expressed in the original post was poor response time. Actually, rereading that, I'm not sure what the actual problem is, or even if there is one.

It's normal operation for old pages to be kicked from the page cache when something new comes along. Why is that a problem? If there's a response time problem, then OK, there may be an issue with caching a huge file when we know in fact each block of its data will be processed once and once only, so there's no need to cache it at all, but do we know it's actually causing a problem? As mentioned above, the kernel should recognise reading a file sequentially, so it's likely only leaving it in cache 'cos either that algorithm's not working here, or there's no pressure to use the other stuff cached. Why keep all the browser tabs memory resident when there's a limit to how fast you can switch between them and read the results? Whereas you might pause and backspace a video to see "did that really happen?".

Actually, I do wonder if swappiness is too high. If browser tabs count as swappable units (Chromium sandbox?), having that many open when reading a big file might cause a tab that's not been touched for ages to be swapped out. Or actually almost anything might cause swapping if the tab's not been hit for an hour. And that would cause a noticeable response lag when you next come back to it.
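
For anyone who wants to check the current setting before changing anything, here is a minimal sketch, assuming Python 3 on Linux. The function name read_swappiness is made up for illustration; the procfs path is the standard location of the vm.swappiness sysctl. It only reads the value and says nothing about whether changing it would help in this case.

```python
from pathlib import Path

# Standard procfs location of the vm.swappiness sysctl on Linux.
SWAPPINESS_PATH = "/proc/sys/vm/swappiness"


def read_swappiness(path=SWAPPINESS_PATH):
    """Return the current vm.swappiness value as an integer."""
    # The file holds a single decimal number followed by a newline.
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    print("vm.swappiness =", read_swappiness())
```

A lower value makes the kernel more reluctant to swap out anonymous memory and more inclined to reclaim page cache instead, so the reading is worth noting alongside any response-time observations.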
_________________
Greybeard

VinzC · Posted: Tue Aug 21, 2018 2:30 pm Post subject:

1clue · Advocate Joined: 05 Feb 2006 Posts: 2569

What we really need is for browsers to ask the kernel to inhibit caching by mime type.

That would of course apply to any other app that uses video files. Doing this entirely at the kernel level would require some number of bytes to be cached before the kernel could tell, whereas the user app can easily recognize a data stream which should not be cached.

Yes, I understand that would mean that all the user apps which deal with these big files would have to get on board, and it would be like herding cats.
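
For what it's worth, an application can already pass this kind of hint to the kernel through posix_fadvise(2). A minimal sketch, assuming Python 3 on Linux; the function name stream_without_caching and the 1 MiB chunk size are made up for illustration, and the advice is only a hint that the kernel is free to ignore (dirty or mapped pages, for instance, stay where they are):

```python
import os

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def stream_without_caching(path, chunk_size=CHUNK_SIZE):
    """Read a file front to back, asking the kernel to drop each chunk afterwards.

    Returns the number of bytes read.
    """
    # posix_fadvise is missing on some platforms; fall back to a plain read there.
    can_advise = hasattr(os, "posix_fadvise")
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            data = os.read(fd, chunk_size)
            if not data:
                break
            # A real player would decode or display `data` here.
            if can_advise:
                # Hint: the byte range just consumed will not be needed again.
                os.posix_fadvise(fd, total, len(data), os.POSIX_FADV_DONTNEED)
            total += len(data)
    finally:
        os.close(fd)
    return total
```

Keeping the chunk size a multiple of the page size matters a little here, because the kernel only discards whole pages inside the advised range. This also illustrates the herding-cats point: every player or copy tool would have to make the call itself.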

VinzC · Posted: Tue Aug 21, 2018 5:47 pm Post subject:

1clue · Advocate Joined: 05 Feb 2006 Posts: 2569

VinzC · Posted: Tue Aug 21, 2018 8:22 pm Post subject:

as.gentoo · Guru Joined: 07 Aug 2004 Posts: 319

1clue · Advocate Joined: 05 Feb 2006 Posts: 2569

Goverp · Advocate Joined: 07 Mar 2007 Posts: 2007

VinzC · Posted: Fri Aug 24, 2018 9:20 am Post subject:

tholin · Apprentice Joined: 04 Oct 2008 Posts: 203

1clue · Advocate Joined: 05 Feb 2006 Posts: 2569

IMO what it comes down to is that:

The kernel manages the caches of various things using information it can know about those things.
An application can know more about data it uses than the kernel can in terms of its repeated use.
Having more information about the nature of data allows for better caching algorithms.

The part about "write it yourself" from a few posts up made me think of a filesystem driver partially in user space, with a module that accepts hints from the end-user app but has defaults which make it act just like it does today.

It seems overly complicated and exploitable, but it might be an interesting exercise WRT the problem at hand.

Back in the days when a good programmer could code in assembly language and actually get better performance than the compiler did, we spent a lot of time thinking about code optimization. When compilers had inarguably surpassed us, we started to realize that the modules that really needed to be optimized were not necessarily the ones that we thought needed to be optimized. It may be that this issue is really not a big deal, or that it actually is. I can't tell. What's needed is real data on the "as-is" and "modified" scenarios.
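
One rough way to collect such data is to record the size of the page cache before and after playing a large file. A minimal sketch, assuming Python 3 on Linux; the function names parse_meminfo and cached_kib are made up for illustration, while /proc/meminfo and its "Cached" field are standard:

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Name: value" line
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def cached_kib(meminfo_path="/proc/meminfo"):
    """Return the current size of the page cache in kB, as reported by the kernel."""
    with open(meminfo_path) as f:
        return parse_meminfo(f.read())["Cached"]


if __name__ == "__main__":
    print("Cached:", cached_kib(), "kB")
```

Note that this only shows how big the cache is, not which files are in it. To see whether a particular file's pages survived, a tool such as the vmtouch mentioned at the start of the thread can report per-file residency.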

as.gentoo · Guru Joined: 07 Aug 2004 Posts: 319