lseek(..., SEEK_HOLE) giving ENOENT

ddawson · n00b Joined: 24 Jul 2018 Posts: 21 Location: United States

I have chromium installed. Lately, for this package only, I've been setting PORTAGE_TMPDIR to a location with plenty of space (I normally keep it in /tmp, but there just isn't enough RAM for this gargantuan package). Now it's failing at the install stage with the following output:

sam_ · Developer Joined: 14 Aug 2020 Posts: 1678

What filesystem is PORTAGE_TMPDIR on in this instance?

ddawson · n00b Joined: 24 Jul 2018 Posts: 21 Location: United States

Ext4. And if it matters, features are has_journal ext_attr dir_index filetype meta_bg extent 64bit flex_bg inline_data sparse_super large_file huge_file dir_nlink extra_isize metadata_csum

Also, in that virtual machine I mentioned, I also used ext4, though with defaults. I might try it again with matching features. And use a kernel with the same config as well.
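For comparing feature sets between the two machines, the list can be checked mechanically. Below is a minimal sketch in Python, assuming the text comes from the "Filesystem features:" line that tune2fs -l prints; the helper name parse_features is made up for illustration and is not part of any tool.

```python
def parse_features(tune2fs_output: str) -> set[str]:
    """Return the feature names found on the 'Filesystem features:' line."""
    for line in tune2fs_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            return set(value.split())
    # No such line in the input: report an empty feature set.
    return set()


# Feature list quoted in the post above, laid out the way tune2fs -l prints it.
sample = (
    "Filesystem features:      has_journal ext_attr dir_index filetype "
    "meta_bg extent 64bit flex_bg inline_data sparse_super large_file "
    "huge_file dir_nlink extra_isize metadata_csum\n"
)

features = parse_features(sample)
print("inline_data" in features)
```

Taking the difference of two such sets (one per machine) would show which features the VM with default ext4 settings is missing.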

ddawson · n00b Joined: 24 Jul 2018 Posts: 21 Location: United States

I've made progress in debugging this. First of all, it's happening with other packages, e.g. dev-qt/qtgui. After debugging the kernel and filesystem for a while, I found the following.

This only happens when the inline_data feature is enabled, because the affected code checks for the feature before checking whether inline data is actually present.
The problem file for dev-qt/qtgui is temp/qtgui-qconfig.h, which is 207 bytes, too large to be inline. The on-disk inode is good. It has i_flags == 0x80000 (EXT4_EXTENTS_FL), as should be expected.
However, for some reason, in memory, the flags (in struct ext4_inode_info) appear to be messed up. I found i_flags == 0x2210000000, which includes EXT4_INLINE_DATA_FL. The kernel code determines the size of the inline data (in this case, 111 bytes) to be less than the file size, tries to find more, and can't, so it returns -ENOENT.
I haven't yet determined why i_flags has a bad value.
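The two flag values above can be decoded with a few lines of Python. This is only an illustrative sketch: the two constants are the values of EXT4_EXTENTS_FL and EXT4_INLINE_DATA_FL from the kernel's ext4.h, and set_bits is a made-up helper.

```python
EXT4_EXTENTS_FL = 0x00080000      # inode uses extents
EXT4_INLINE_DATA_FL = 0x10000000  # inode has inline data


def set_bits(value: int) -> list[int]:
    """Return the positions of all bits set in value, lowest first."""
    return [bit for bit in range(value.bit_length()) if value >> bit & 1]


on_disk = 0x80000          # i_flags as read from the on-disk inode
in_memory = 0x2210000000   # i_flags as seen in struct ext4_inode_info

print(bool(on_disk & EXT4_INLINE_DATA_FL))    # False
print(bool(in_memory & EXT4_INLINE_DATA_FL))  # True
print(bool(in_memory & EXT4_EXTENTS_FL))      # False
print(set_bits(in_memory))                    # [28, 33, 37]
```

So the in-memory value has the inline-data bit (bit 28) set and the extents bit (bit 19) cleared, the opposite of the on-disk inode. Bits 33 and 37 sit above bit 31; if I read ext4.h correctly, the on-disk i_flags field is only 32 bits wide, so those two cannot have been read from disk.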

Hu · Moderator Joined: 06 Mar 2007 Posts: 21651

With what kernel version(s) have you observed this? Have you found a simpler reproducer than running emerge affected-package? If this is only observed with emerge (so far), does disabling the Portage sandbox have any effect? As a user library, the sandbox should not be causing this, but it could be causing the relevant programs to use different system calls than the non-sandbox path, allowing them to hit a kernel bug that is not seen when using the non-sandbox path.

If you mount a tmpfs on PORTAGE_TMPDIR, does the emerge then succeed? I would expect so since this appears to be in the ext4 fs code, but if not, that would suggest a more general VFS problem and that ext4 is just where the error finally becomes noticeable.
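For testing individual files outside of emerge, something like the following could serve as a probe. It is a minimal sketch, assuming Python 3 on Linux (where os.SEEK_HOLE is available); probe_seek_hole is a hypothetical helper name, and this is not the original poster's reproducer. It only issues the same lseek(..., SEEK_HOLE) call on an existing file and reports what comes back.

```python
import errno
import os


def probe_seek_hole(path: str) -> tuple[str, int]:
    """Call lseek(fd, 0, SEEK_HOLE) on path.

    Returns ("ok", offset) on success or ("error", errno_value) on failure.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return ("ok", os.lseek(fd, 0, os.SEEK_HOLE))
    except OSError as exc:
        return ("error", exc.errno)
    finally:
        os.close(fd)


def describe(path: str) -> str:
    """Format the probe result for one file as a single line."""
    status, value = probe_seek_hole(path)
    if status == "error":
        # Map the numeric errno to its symbolic name, e.g. 2 -> ENOENT.
        return f"{path}: lseek failed: {errno.errorcode.get(value, str(value))}"
    return f"{path}: first hole at offset {value}"
```

On a healthy filesystem, a small fully written file should report its first hole at the end of the file. Running describe() on the same file copied to tmpfs and to the ext4 PORTAGE_TMPDIR would show whether the failure follows the filesystem.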

grknight · Retired Dev Joined: 20 Feb 2015 Posts: 1666

Fun quote from Ted Ts'o at https://bugzilla.kernel.org/show_bug.cgi?id=200681#c5 :

ddawson · n00b Joined: 24 Jul 2018 Posts: 21 Location: United States

Okay, I think I have a simple way to reproduce this.

ddawson · n00b Joined: 24 Jul 2018 Posts: 21 Location: United States

One can also see the bad flags just by running this command line:

Spacey · n00b Joined: 07 Apr 2024 Posts: 1

For what it is worth, I am running into this too. Many months ago I couldn't figure this out, so I just started excluding that one package from updates. But then a 2nd package came along. And then after a profile update, I wanted to emerge world and found a 3rd affected package. hddtemp, qemu, and edk2-ovmf-bin are the ones I ran into.

They all broke with an Errno 2 on copying README.gentoo ... just like yours. I tested on 6.1.12 and 6.6.21 and both had the same issue. I did an strace, and ended up deciding this was the point where things broke: