Gentoo Forums
Bug in Zram
axl
Veteran
Joined: 11 Oct 2002
Posts: 1144
Location: Romania

Posted: Mon Mar 18, 2024 3:41 am    Post subject: Bug in Zram

Hi

I've encountered a really unpleasant bug that hangs zram. I usually use a zram device as the build directory for Portage, with zstd compression. The odd part is that the behaviour depends on the filesystem: with ext4 on the zram device, Portage hangs while untarring the archive into it; with xfs, it hangs later, while configuring or compiling. I'm not sure why one filesystem hangs right at the start and the other takes a bit longer. I am able to trigger it on the 6.6 kernel line, but not on the 6.1 line, and so far only on ARM. I do have some x86_64 systems with the same layout, but they don't seem to be affected. Those machines are on 6.6.13, so maybe the regression is more recent. I have to test more, I guess.

I was assuming I did something wrong when building the kernel for my Pi 5s, because with the same config zram works as expected on 6.1.77 but hangs on 6.6.21. I was fine with blaming myself, or with concluding that it's a bug somewhere in the Raspberry Pi part of the kernel. However, I am also trying Asahi/Gentoo on one of my Apple M1 machines, and I triggered it again there on the Fedora kernel it came with, vmlinuz-6.6.3-413.asahi.fc39.aarch64+16k.

To trigger it, or test, I just run: zramctl /dev/zram0 -a zstd -s 16G -t8; mkfs.xfs -L ramfs /dev/zram0; mount /dev/zram0 /ramfs -o noatime,nodiratime,discard; PORTAGE_TMPDIR="/ramfs" PORTAGE_TMPFS="/ramfs" emerge iptraf-ng.

It is most likely a bug that needs to be reported, but before doing that I wanted to ask whether it happens to anyone else, and on which kernel versions and architectures. It might be easier to track down if others have encountered it too.

sam_
Developer
Joined: 14 Aug 2020
Posts: 1678

Posted: Mon Mar 18, 2024 7:59 am

If you have a simple reproducer, I'd just bisect it.
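
For anyone following along, here is a rough sketch of how a single bisect step could be scored. It assumes a mainline git checkout and that the hang shows up as the trigger command never finishing; neither is confirmed in this thread. The `verdict` helper name and the 600-second default are hypothetical, and because every step means booting a freshly built kernel, the helper only prints the word to hand to `git bisect`.

```shell
#!/bin/sh
# Sketch: score one bisect step by running a command under a time limit.
# Assumption: a hang means the command does not finish within LIMIT seconds.
# Caveat: if the task is stuck in uninterruptible sleep, timeout itself may
# never return; in that case the hang is the answer and the step is "bad".

LIMIT="${LIMIT:-600}"   # hypothetical default, tune to your build times

# verdict CMD [ARGS...]
# Prints "good" if CMD exits 0 in time, "bad" if it timed out,
# and "skip" for any other failure (for example a broken build).
verdict() {
    timeout "$LIMIT" "$@" >/dev/null 2>&1
    case $? in
        0)   echo good ;;
        124) echo bad ;;    # GNU timeout exits 124 when the limit is hit
        *)   echo skip ;;
    esac
}
```

Setup in the kernel tree would be along the lines of `git bisect start`, `git bisect bad v6.6`, `git bisect good v6.1` (tags assumed), and then, after booting each candidate kernel and preparing the zram device, something like `git bisect "$(verdict env PORTAGE_TMPDIR=/ramfs emerge --oneshot iptraf-ng)"`.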

Child_of_Sun_24
Guru
Joined: 28 Jul 2004
Posts: 578

Posted: Mon Mar 18, 2024 10:40 am

Here on amd64 with kernel 6.8.1-gentoo I have no problems with zram, but I use btrfs for /var/tmp/portage. I also use zstd compression.

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54245
Location: 56N 3W

Posted: Mon Mar 18, 2024 12:17 pm

axl,

arm or arm64?
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

axl
Veteran
Joined: 11 Oct 2002
Posts: 1144
Location: Romania

Posted: Mon Mar 18, 2024 5:32 pm

Neddy, arm64 on both: the RPi 5 and the Apple M1.

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54245
Location: 56N 3W

Posted: Mon Mar 18, 2024 8:08 pm

axl,

Code:
Pi5 ~ # uname -a
Linux Pi5 6.6.21-v8-16k+ #1743 SMP PREEMPT Thu Mar 14 11:40:50 GMT 2024 aarch64 GNU/Linux

works here, as does 6.1.69-v8-16k.

That's running your trigger on xfs.

I did not run

Code:
zramctl /dev/zram0 -a zstd -s 16G -t8; mkfs.xfs -L ramfs /dev/zram0; mount /dev/zram0 /ramfs -o noatime,nodiratime,discard; PORTAGE_TMPDIR="/ramfs" PORTAGE_TMPFS="/ramfs" emerge iptraf-ng

as a single command line. I waited for each command to complete before I issued the next.

axl
Veteran
Joined: 11 Oct 2002
Posts: 1144
Location: Romania

Posted: Mon Mar 18, 2024 8:22 pm

Thank you for taking the time to test, Neddy. Hmm. That's the stock kernel from GitHub, right?

It's not the pause between the commands. I only wrote them like that here on the forum. In real life I type them manually, one by one, with 5 to 10 seconds between them. The whole thing only hangs when Portage starts writing in there. I've noticed that it doesn't hang when Portage creates the .distcc directory and the cpu sockets, only when it untars archives as the portage user. I tried to untar manually as root and it didn't hang.

I guess I will have to think of more creative ways to test. I'll try to install the stock kernel from GitHub and see if that triggers it when Portage is writing to the device. Thanks.

NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54245
Location: 56N 3W

Posted: Mon Mar 18, 2024 8:24 pm

axl,

It's a stock binary kernel from the Foundation firmware repo, just like https://wiki.gentoo.org/wiki/Raspberry_Pi_Install_Guide#Installing_the_Raspberry_Pi_Foundation_files

axl
Veteran
Joined: 11 Oct 2002
Posts: 1144
Location: Romania

Posted: Mon Mar 18, 2024 11:53 pm

Well, it seems only zstd is affected. I had a bit of time to test, and LZ4 is working just fine. I'll do more testing as soon as I get some more free time.
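
When comparing compressors it helps to confirm which one the device is really using. A small sketch, assuming the sysfs attribute /sys/block/zram0/comp_algorithm, which lists the supported algorithms with the selected one in square brackets; the `active_algorithm` helper is a made-up name for illustration.

```shell
#!/bin/sh
# active_algorithm LINE
# Print the bracketed (selected) entry of a comp_algorithm line,
# for example "lzo lzo-rle lz4 [zstd]" gives "zstd".
active_algorithm() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Only query the device if it exists on this machine.
if [ -r /sys/block/zram0/comp_algorithm ]; then
    active_algorithm "$(cat /sys/block/zram0/comp_algorithm)"
fi
```

Switching algorithms means resetting the device first (`zramctl --reset /dev/zram0`) and then setting it up again with `-a lz4`, so the check above is worth repeating after each change.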

axl
Veteran
Joined: 11 Oct 2002
Posts: 1144
Location: Romania

Posted: Tue Mar 19, 2024 5:30 am

I might have spoken too early. It still hangs the same way when trying to do fstrim, and without that it's back to square one. I noticed that the 6.6 kernel has a new option under block devices, ZRAM_MULTI_COMP. It isn't that; I tried with it on and off. I have to get myself a proper x86 machine to test on. It sucks to test on ARM. Asahi only has one kernel, and I'm not exactly sure how to bisect between the 6.1 and 6.6 branches that Raspberry Pi has. I'm not sure there's a direct correlation of patches between those two branches. But I might be wrong. Wouldn't be the first time.

gentoo_ram
Guru
Joined: 25 Oct 2007
Posts: 474
Location: San Diego, California USA

Posted: Tue Mar 19, 2024 10:31 pm

I'm currently stable on my RPi5 using the raspberry-pi linux kernel sources. /var/tmp/portage is formatted ext4. But I'm using a 4k kernel page size. I was getting weird effects with the 16k page size, and I think I remember zram hanging with 16k. Maybe that's what you are seeing? I figured it had to do with the filesystem being 4k page size when the kernel was 16k page size. If you aren't using a 4k page size, try that instead.

Code:
user@genpi ~ $ zramctl
NAME       ALGORITHM DISKSIZE  DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd            4G    4K   68B   20K       4 [SWAP]
/dev/zram1 zstd            8G 14.5M  1.1M  1.9M       4 /var/tmp/portage
user@genpi ~ $ uname -a
Linux genpi 6.8.0-v8+ #13 SMP PREEMPT Mon Mar 18 13:22:52 PDT 2024 aarch64 GNU/Linux
user@genpi ~ $ lsblk -t
NAME         ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE  RA WSAME
zram0                0   4096   4096    4096    4096    0               128    0B
zram1                0   4096   4096    4096    4096    0               128    0B
nvme0n1              0    512      0     512     512    0 none     1023 128    0B
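
A quick way to see which page size the running kernel uses, handy for comparing the setups in this thread. `getconf PAGESIZE` is standard; the `describe_page_size` helper is only an illustrative name.

```shell
#!/bin/sh
# page_size: print the kernel page size in bytes.
page_size() {
    getconf PAGESIZE
}

# describe_page_size BYTES: turn a byte count into a label such as "16k pages".
describe_page_size() {
    echo "$(( $1 / 1024 ))k pages"
}

describe_page_size "$(page_size)"
```

On arm64 the page size is fixed at build time through CONFIG_ARM64_4K_PAGES or CONFIG_ARM64_16K_PAGES, so changing it means rebuilding or swapping the kernel.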

axl
Veteran
Joined: 11 Oct 2002
Posts: 1144
Location: Romania

Posted: Thu Mar 21, 2024 6:09 am

gentoo_ram wrote:
I'm currently stable on my RPi5 using the raspberry-pi linux kernel sources. /var/tmp/portage is formatted ext4. But I'm using a 4k kernel page size. I was getting weird effects with the 16k page size, and I think I remember zram hanging with 16k. Maybe that's what you are seeing? I figured it had to do with the filesystem being 4k page size when the kernel was 16k page size. If you aren't using a 4k page size, try that instead.


Is this 16k thing documented anywhere? I am starting to get corruption even on the 6.1 line now, and the 16k page size is the only new thing in this equation.

Code:
/usr/lib/gcc/aarch64-unknown-linux-gnu/13/../../../../aarch64-unknown-linux-gnu/bin/ld: warning: lib64/libLLVMCodeGen.a(StackMapLivenessAnalysis.cpp.o) has a corrupt string table index
/usr/lib/gcc/aarch64-unknown-linux-gnu/13/../../../../aarch64-unknown-linux-gnu/bin/ld: error: lib64/libLLVMCodeGen.a: ELF section name out of range


Not the first time. And I'm not really happy about being unable to rely on this functionality. I don't really think it's the hardware. It's not overclocked. It's not just one Pi. They are set to ondemand, with proper cooling. And it's not just the Pis: it's happening on the M1 too, and that has a completely different kernel (but also 16k pages).

axl
Veteran
Joined: 11 Oct 2002
Posts: 1144
Location: Romania

Posted: Thu Mar 21, 2024 6:59 am

Still weird if you think about it. How come Neddy wasn't able to trigger it? He has 16k pages.

Meanwhile, I'm noticing that after a few days of use, even on kernel 6.1, the zram device simply becomes unusable. I was trying to compile clang on one of the new Pi 5s, which had 3 days of uptime. At one point I couldn't even get libcxx compiled, and that's fairly small. Every time, at least one of the files would randomly get filled with garbage. It works fine if you reboot the machine, or reset the zram device and start over.
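
One way to catch this kind of silent corruption without waiting for a build to fail is to write known data to the mount and re-check its checksums later. This is only a sketch under assumptions: the helper names, file count and sizes are invented here, and reads may be served from the page cache, so verifying after a remount gives a more honest result.

```shell
#!/bin/sh
# write_samples DIR COUNT
# Create COUNT compressible text files (roughly 1 MB each) in DIR
# and record their checksums in DIR/SHA256SUMS.
write_samples() {
    dir=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        seq "$i" "$((i + 150000))" > "$dir/sample.$i"
        i=$((i + 1))
    done
    ( cd "$dir" && sha256sum sample.* > SHA256SUMS )
}

# verify_samples DIR
# Print "ok" if every file still matches its recorded checksum,
# "corrupt" otherwise.
verify_samples() {
    if ( cd "$1" && sha256sum -c SHA256SUMS >/dev/null 2>&1 ); then
        echo ok
    else
        echo corrupt
    fi
}
```

For example, `write_samples /ramfs 100` right after mounting, then `verify_samples /ramfs` every so often while the machine stays up.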

axl
Veteran
Joined: 11 Oct 2002
Posts: 1144
Location: Romania

Posted: Thu Mar 21, 2024 7:55 am

Seems like you were right. 4k pages on the 6.6 line seem to work just fine. Thanks for the suggestion.

gentoo_ram
Guru
Joined: 25 Oct 2007
Posts: 474
Location: San Diego, California USA

Posted: Thu Mar 21, 2024 9:00 pm

That's another person with the same observation I had. At least I know I wasn't crazy! I noticed it pretty quickly after I changed my kernel config to 16k pages, having run stable for a while at 4k. The page size was the only thing I changed when zram started acting up.

All times are GMT