pingtoo wrote: Did you try /proc/sys/vm/nr_overcommit_hugepages? Your initial post shows you did it in a different location.
The procfs interface to hugetlbfs is (basically) deprecated, and is there only for backwards compatibility. sysfs is now preferred:
Code:
``/proc/sys/vm/nr_overcommit_hugepages`` specifies how large the pool of
huge pages can grow, if more huge pages than ``/proc/sys/vm/nr_hugepages`` are
requested by applications. Writing any non-zero value into this file
indicates that the hugetlb subsystem is allowed to try to obtain that
number of "surplus" huge pages from the kernel's normal page pool, when the
persistent huge page pool is exhausted. As these surplus huge pages become
unused, they are freed back to the kernel's normal page pool.

When increasing the huge page pool size via ``nr_hugepages``, any existing
surplus pages will first be promoted to persistent huge pages. Then, additional
huge pages will be allocated, if necessary and if possible, to fulfill
the new persistent huge page pool size.

The administrator may shrink the pool of persistent huge pages for
the default huge page size by setting the ``nr_hugepages`` sysctl to a
smaller value. The kernel will attempt to balance the freeing of huge pages
across all nodes in the memory policy of the task modifying ``nr_hugepages``.
Any free huge pages on the selected nodes will be freed back to the kernel's
normal page pool.

Caveat: Shrinking the persistent huge page pool via ``nr_hugepages`` such that
it becomes less than the number of huge pages in use will convert the balance
of the in-use huge pages to surplus huge pages. This will occur even if
the number of surplus pages would exceed the overcommit value. As long as
this condition holds--that is, until ``nr_hugepages+nr_overcommit_hugepages`` is
increased sufficiently, or the surplus huge pages go out of use and are freed--
no more surplus huge pages will be allowed to be allocated.

With support for multiple huge page pools at run-time available, much of
the huge page userspace interface in ``/proc/sys/vm`` has been duplicated in
sysfs.
The ``/proc`` interfaces discussed above have been retained for backwards
compatibility. The root huge page control directory in sysfs is::

        /sys/kernel/mm/hugepages

For each huge page size supported by the running kernel, a subdirectory
will exist, of the form::

        hugepages-${size}kB

Inside each of these directories, the set of files contained in ``/proc``
will exist. In addition, two additional interfaces for demoting huge
pages may exist::

        demote
        demote_size
        nr_hugepages
        nr_hugepages_mempolicy
        nr_overcommit_hugepages
        free_hugepages
        resv_hugepages
        surplus_hugepages
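As a concrete illustration of that directory layout, here is a small read-only sketch that walks the per-size sysfs controls the docs describe. It assumes a kernel built with hugetlbfs support and a mounted sysfs; the write example in the trailing comment is illustrative and needs root.

```shell
# Walk each huge page size the running kernel supports and print the
# per-size pool counters from sysfs (read-only; no root needed).
SYSFS=/sys/kernel/mm/hugepages
for dir in "$SYSFS"/hugepages-*; do
    [ -d "$dir" ] || continue          # skip if no hugepage sizes are exposed
    echo "pool: ${dir##*/}"
    for f in nr_hugepages nr_overcommit_hugepages free_hugepages surplus_hugepages; do
        if [ -r "$dir/$f" ]; then
            printf '  %-24s %s\n' "$f" "$(cat "$dir/$f")"
        fi
    done
done

# Raising the overcommit ceiling for 2 MiB pages (root only) would be, e.g.:
#   echo 64 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_overcommit_hugepages
```

Note that for the default huge page size, writing the sysfs file and writing ``/proc/sys/vm/nr_overcommit_hugepages`` touch the same counter; the sysfs path is just the preferred, per-size spelling.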
pingtoo wrote: As for making tmpfs use huge pages (in fact transparent huge pages, THP) in the case of Portage (specifically, the compiler dumping intermediate object files into /tmp): I doubt that will have any significant benefit. Compiler intermediate objects are usually read and written sequentially, so there is no benefit to mmap access, and the compiler does not actually use mmap for them. The kernel's default tmpfs memory allocation handles this easily. THP, on the other hand, is not easily managed: the kernel has to run a separate kernel thread to handle it, which creates unnecessary context switches.
Hang on a minute; isn't it true that if every file in the build tree fits in a single (huge) page, or a few pages for large binaries (and this should usually be the case), the number of system calls during the build will be dramatically reduced? The read() and write() calls still have to go somewhere, in this case tmpfs, and if a file only needs one page, that's less work farther down in the kernel, I think. Isn't that desirable?
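Either way, it's worth remembering that tmpfs won't use THP at all unless asked. A quick sketch for checking this on your own box (assumes a THP-enabled kernel; the mount point and size are illustrative):

```shell
# tmpfs does not use transparent huge pages unless mounted with the huge=
# option; e.g. (as root, mount point illustrative):
#   mount -t tmpfs -o huge=always,size=2g tmpfs /var/tmp/portage
#
# How much shmem/tmpfs (and anonymous) memory is currently THP-backed
# is visible system-wide in /proc/meminfo:
grep -E 'ShmemHugePages|AnonHugePages' /proc/meminfo
```

If ShmemHugePages stays at 0 kB during a build, tmpfs isn't using huge pages for it, and the whole question is moot.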