GBs of memory wasted in Percpu thanks to stale cgroups

18 posts • Page 1 of 1
eaf
n00b
Posts: 13
Joined: Fri Apr 27, 2018 1:24 am

GBs of memory wasted in Percpu thanks to stale cgroups


Post by eaf » Thu Dec 19, 2024 6:48 pm

Hi,

I've been hunting a significant memory leak on my system where the amount of used memory would go up by a few GB every day. I'm not talking about caches, buffers, ARC, etc.; I'm talking about Percpu in /proc/meminfo, which climbed all the way up to 50GB at some point.

I think I've traced it down to cgroups (because I also noticed that I had an explosion of them) and then to elogind and OpenRC.

Apparently, elogind creates a new cgroup for every new login. With cgroups v.1 it also set a per-cgroup release_agent to /lib64/elogind/elogind-cgroups-agent, which was supposed to be called when the corresponding cgroup became empty. That agent would then clean up the empty cgroup. On systemd installations the cleanup would be done by systemd.

With cgroups v.2 the cleanup mechanism has changed: someone is now supposed to monitor the corresponding cgroup.events file and, when that file reports "populated 0", get rid of the cgroup. I guess elogind does not support this cleanup mechanism, because tens of thousands of empty cgroups were left lying around on my system.
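For illustration, the v2 contract described above can be sketched in a few lines of shell. This is only a sketch of the handshake a manager process is expected to perform (a real manager would use inotify on cgroup.events rather than polling, and the function name is made up here):

```shell
#!/bin/sh
# Sketch of the cgroup v2 cleanup contract: wait until a session cgroup's
# cgroup.events reports "populated 0", then delete the group with rmdir.
# Polling is used purely for illustration; elogind/systemd would watch the
# file with inotify instead.
wait_and_reap() {   # $1 = path of a cgroup directory
    while grep -q '^populated 1$' "$1/cgroup.events" 2>/dev/null; do
        sleep 1                       # still has live processes; wait
    done
    echo "reaping $1"
    rmdir "$1" 2>/dev/null            # how a cgroup is actually deleted
    return 0
}
```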

I think, and I may be totally wrong here, that the issue is that OpenRC by default mounts cgroups v.2 under /sys/fs/cgroup, and elogind doesn't know how to do cgroup cleanup for v.2.

Has anybody observed this pileup of unused cgroups and Percpu memory on their setups? Am I perhaps missing some /etc/init.d service that I neglected to activate, which would do this cleanup automatically and avoid the pileup?

Thanks!
pingtoo
Advocate
Posts: 2183
Joined: Fri Sep 10, 2021 8:37 pm
Location: Richmond Hill, Canada

Post by pingtoo » Thu Dec 19, 2024 7:01 pm

How do I check for this "cgroup pile-up" symptom? And if I don't see it in an obvious way, does that mean I don't have this situation?
eaf
n00b
Posts: 13
Joined: Fri Apr 27, 2018 1:24 am

Post by eaf » Thu Dec 19, 2024 7:24 pm

"grep Percpu /proc/meminfo" was showing tens of GB allocated by "per cpu" allocators.

"cat /proc/cgroups" was showing tens of thousands of groups on my setup. Once I noticed that, I looked for cgroups in /sys/fs/cgroup that had empty cgroup.procs file or "populated 0" in cgroup.events file. Most of those groups counted by /proc/cgroups were found empty. Upon destroying them, the Percpu in /proc/meminfo dropped from 50GB to 2GB.

This box sees a ton of ssh and sftp traffic, which I guess accounts for the rapid growth of abandoned per-session cgroups.
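The checks described above can be put into a small script. A minimal sketch, assuming the standard cgroup2 mount point (the `count_stale` helper name is made up; the "populated 0" test is the one from this thread):

```shell
#!/bin/sh
# List and count cgroups under a cgroup2 mount that have no live
# processes, i.e. whose cgroup.events reports "populated 0". The root
# is a parameter so the logic can be tried against any directory tree.
count_stale() {
    root="$1"
    stale=0
    for ev in "$root"/*/cgroup.events; do
        [ -f "$ev" ] || continue
        if grep -q '^populated 0$' "$ev"; then
            stale=$((stale + 1))
            dirname "$ev"             # print the stale cgroup's path
        fi
    done
    echo "stale cgroups: $stale"
}

grep Percpu /proc/meminfo             # per-cpu allocator usage
count_stale "${1:-/sys/fs/cgroup}"
```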
Hu
Administrator
Posts: 24398
Joined: Tue Mar 06, 2007 5:38 am

Post by Hu » Thu Dec 19, 2024 7:37 pm

With what version(s) of elogind did you observe this? The output of emerge --pretend --verbose sys-apps/openrc sys-auth/elogind might be useful.
eaf
n00b
Posts: 13
Joined: Fri Apr 27, 2018 1:24 am

Post by eaf » Thu Dec 19, 2024 7:42 pm

Code:

[ebuild   R    ] sys-apps/openrc-0.54.2::gentoo  USE="netifrc pam sysvinit unicode -audit -bash -caps -debug -newnet -s6 (-selinux) -sysv-utils" 245 KiB
[ebuild   R    ] sys-auth/elogind-252.9-r2::gentoo  USE="acl pam policykit -audit -cgroup-hybrid -debug -doc (-selinux) -test" 1,878 KiB
pingtoo
Advocate
Posts: 2183
Joined: Fri Sep 10, 2021 8:37 pm
Location: Richmond Hill, Canada

Post by pingtoo » Thu Dec 19, 2024 8:40 pm

eaf wrote:"grep Percpu /proc/meminfo" was showing tens of GB allocated by "per cpu" allocators.

"cat /proc/cgroups" was showing tens of thousands of groups on my setup. Once I noticed that, I looked for cgroups in /sys/fs/cgroup that had empty cgroup.procs file or "populated 0" in cgroup.events file. Most of those groups counted by /proc/cgroups were found empty. Upon destroying them, the Percpu in /proc/meminfo dropped from 50GB to 2GB.

This box sees a ton of ssh and sftp traffic, which I guess accounts for the rapid growth of abandoned per-session cgroups.
Thanks for the information.

Code:

me@rpi5 ~ $ cat /proc/meminfo |grep Per
Percpu:             1664 kB

Code:

me@rpi5 ~ $ cat /proc/cgroups 
#subsys_name	hierarchy	num_cgroups	enabled
cpuset	0	93	1
cpu	0	93	1
cpuacct	0	93	1
blkio	0	93	1
memory	0	93	0
devices	0	93	1
freezer	0	93	1
net_cls	0	93	1
perf_event	0	93	1
net_prio	0	93	1
pids	0	93	1
Linux rpi5 6.6.31+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.31-1+rpt1 (2024-05-29) aarch64 GNU/Linux

This is on an RPi 5 with the RPi 16k-page kernel.
eaf
n00b
Posts: 13
Joined: Fri Apr 27, 2018 1:24 am

Post by eaf » Thu Dec 19, 2024 8:48 pm

That's cool, and that's what I would expect to see too. But aren't you running Debian, and likely systemd too? I'm wondering if perhaps I'm seeing some conflicting configuration on Gentoo, where OpenRC mounts cgroups v.2 and elogind can't cope with it. But I didn't specially configure any of that; it's all defaults.
pingtoo
Advocate
Posts: 2183
Joined: Fri Sep 10, 2021 8:37 pm
Location: Richmond Hill, Canada

Post by pingtoo » Thu Dec 19, 2024 9:03 pm

eaf wrote:That's cool, and that's what I would expect to see too. But aren't you running Debian, and likely systemd too? I'm wondering if perhaps I'm seeing some conflicting configuration on Gentoo, where OpenRC mounts cgroups v.2 and elogind can't cope with it. But I didn't specially configure any of that; it's all defaults.
No, I am just using the RPi kernel; my rootfs is Gentoo-based.

my make.profile is

Code:

make.profile -> ../../var/db/repos/gentoo/profiles/default/linux/arm64/23.0/desktop/gnome/systemd
So yes. I am using systemd.

Code:

me@rpi5 ~ $ mount|grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
sublogic
Guru
Posts: 388
Joined: Mon Mar 21, 2022 3:02 am
Location: Pennsylvania, USA

Post by sublogic » Thu Dec 19, 2024 11:38 pm

I see it too! But not on the same scale as eaf.

Code:

$ mount | grep cgroup
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)

$ ls /sys/fs/cgroup
10  31  51  70  92                      memory.stat
11  32  52  71  c1                      openrc.apt-cacher-ng
12  33  53  72  c2                      openrc.avahi-daemon
13  34  54  73  c3                      openrc.bluetooth
14  35  55  76  c4                      openrc.cronie
15  36  56  78  cgroup.controllers      openrc.cupsd
16  37  57  79  cgroup.max.depth        openrc.dbus
17  38  58  8   cgroup.max.descendants  openrc.display-manager
18  39  59  80  cgroup.procs            openrc.distccd
19  4   6   81  cgroup.stat             openrc.net.wlp6s0
20  40  60  82  cgroup.subtree_control  openrc.ntpd
21  41  61  83  cgroup.threads          openrc.rasdaemon
22  42  62  84  cpu.stat                openrc.rpc.idmapd
23  43  63  85  cpu.stat.local          openrc.rpc.statd
24  45  64  86  cpuset.cpus.effective    openrc.rpcbind
25  46  65  87  cpuset.mems.effective    openrc.rsyncd
26  47  66  88  elogind                 openrc.sshd
27  48  67  89  io.cost.model           openrc.sysklogd
28  49  68  9   io.cost.qos             openrc.udev
29  5   69  90  io.stat
30  50  7   91  memory.reclaim
Among the two-digit cgroups, 80 and c2 are my xfce4 session and a tigervnc session. The others are stale.

Code:

$ grep -l populated\ 1 /sys/fs/cgroup/??/cgroup.events
/sys/fs/cgroup/80/cgroup.events
/sys/fs/cgroup/c2/cgroup.events

$ grep -l populated\ 0 /sys/fs/cgroup/??/cgroup.events
/sys/fs/cgroup/10/cgroup.events
/sys/fs/cgroup/11/cgroup.events
...
/sys/fs/cgroup/91/cgroup.events
/sys/fs/cgroup/92/cgroup.events
/sys/fs/cgroup/c1/cgroup.events
/sys/fs/cgroup/c3/cgroup.events
/sys/fs/cgroup/c4/cgroup.events
eaf
n00b
Posts: 13
Joined: Fri Apr 27, 2018 1:24 am

Post by eaf » Fri Dec 20, 2024 1:50 am

It's definitely elogind that's creating these groups:

Code:

mkdir("/sys/fs/cgroup/4041", 0755)      = 0
Interestingly, its source code does have some inotify handlers, so it should be able to recognize changes to cgroup.events and do the cleanup. Yet it doesn't.

Also, if I change /etc/rc.conf to mount /sys/fs/cgroup in "legacy" mode, elogind starts creating cgroups in a different place, and the OpenRC controller takes care of the cleanup by running /lib/rc/sh/cgroup-release-agent.sh for each released group:

Code:

mkdir("/sys/fs/cgroup/openrc/5", 0755)  = 0
I figure I'll open an issue with the elogind devs, and perhaps they'll tell me right off the bat what's missing here.
sam_
Developer
Posts: 2816
Joined: Fri Aug 14, 2020 12:33 am

Post by sam_ » Sun Dec 22, 2024 12:47 am

For completeness, the bug OP has filed seems to be https://github.com/elogind/elogind/issues/296
eaf
n00b
Posts: 13
Joined: Fri Apr 27, 2018 1:24 am

Post by eaf » Sun Dec 22, 2024 6:14 pm

I poked around elogind source code, and I kinda wish I didn't. There's a lot of #if 0 sprinkled all over the place, hundreds of lines of commented-out code at a time, and the functions that are supposed to set up monitoring of cgroup.events files are never even called. It might be intentional: right before a big chunk of disabled inotify code they say that "elogind is not init, and does not install the agent here." I get that elogind was extracted from systemd, so some scars are to be expected, but boy was that an invasive surgery, and things were just left patched and bandaged throughout the code. No reaction from the elogind folks about the issue yet. I'm starting to think we're just lucky that whatever works, works.

So, the options to avoid the leak appear to be:
  • Switch to the original systemd;
  • Change /etc/rc.conf to mount cgroups v.1, and then openrc will take care of the cleanup;
  • Set up a cronjob to scan empty cgroups and delete them manually.
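For the cronjob option, here is a hedged sketch (the `prune_empty_cgroups` name is made up; `rmdir` is how a cgroup is deleted and fails harmlessly if the group has children or picked up a process again; you would likely want to narrow the glob to elogind's numeric session groups rather than touching every top-level cgroup):

```shell
#!/bin/sh
# Remove every cgroup directory under the given root whose cgroup.events
# reports "populated 0". Intended to be run from cron as root; on real
# cgroupfs the virtual files vanish together with the directory.
prune_empty_cgroups() {
    for ev in "$1"/*/cgroup.events; do
        [ -f "$ev" ] || continue
        if grep -q '^populated 0$' "$ev"; then
            d=$(dirname "$ev")
            echo "removing $d"
            rmdir "$d" 2>/dev/null   # no-op if repopulated or has children
        fi
    done
    return 0
}

prune_empty_cgroups "${1:-/sys/fs/cgroup}"
```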
sam_
Developer
Posts: 2816
Joined: Fri Aug 14, 2020 12:33 am

Post by sam_ » Mon Dec 23, 2024 4:23 am

eaf wrote:I poked around elogind source code, and I kinda wish I didn't. There's a lot of #if 0 sprinkled all over the place, hundreds of lines of commented-out code at a time, and the functions that are supposed to set up monitoring of cgroup.events files are never even called. It might be intentional: right before a big chunk of disabled inotify code they say that "elogind is not init, and does not install the agent here." I get that elogind was extracted from systemd, so some scars are to be expected, but boy was that an invasive surgery, and things were just left patched and bandaged throughout the code. No reaction from the elogind folks about the issue yet. I'm starting to think we're just lucky that whatever works, works.
[...]
I'm afraid that I've held this opinion too for quite some time.
Yamakuzure
Advocate
Posts: 2323
Joined: Wed Jun 21, 2006 11:06 am
Location: Adendorf, Germany

Post by Yamakuzure » Fri Dec 27, 2024 8:53 am

Are you sure it must be elogind's fault?

Code:

 ~ # grep -i cgroup_mode /etc/rc.conf 
#rc_cgroup_mode="unified"

 ~ # mount | grep cgroup
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)

 ~ # loginctl --version
elogind 255 (255)
+PAM +AUDIT -SELINUX -APPARMOR +SMACK -SECCOMP +ACL +UTMP default-hierarchy=unified

 ~ # grep Percpu /proc/meminfo 
Percpu:            10496 kB
That doesn't look like "tens of gigabyte" to me.
eaf wrote:... There's a lot of #if 0 sprinkled all over the place ...
Yes. This was done so that the original systemd code that is not needed can be kept in. This is required to migrate their commits automatically instead of having to do it manually line-by-line. At the end of the day, elogind is just systemd-logind cut out of the swamp and then enhanced with a rather small list of extras.

If you think elogind does something wrong here or is missing out on something, I'll look into it. But probably not before next year, sorry.
eaf
n00b
Posts: 13
Joined: Fri Apr 27, 2018 1:24 am

Post by eaf » Fri Dec 27, 2024 2:33 pm

Yes, I'm afraid all evidence points to elogind:
  • Every time when I ssh in and out of the box, a new cgroup is leaked under /sys/fs/cgroup;
  • strace of elogind and a peek into its source code show that it's elogind that's creating the cgroups;
  • With every newly created cgroup, Percpu in /proc/meminfo goes up 1536KB;
  • Once the accumulated cgroups are manually destroyed with cgdelete, the Percpu memory is reclaimed.
The effect of the leak will depend on the number of SSH sessions that get opened and closed per second. I suspect it also depends on the number of CPUs in the box, or perhaps even NUMA nodes; this box reports 128 CPUs in /proc/cpuinfo. On another box with 56 CPUs I see only a 256KB Percpu increase for each leaked cgroup.
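To measure the per-cgroup Percpu cost on a given box, a rough root-only sketch (the probe group name `percpu-probe` is arbitrary, and the helper name is made up):

```shell
#!/bin/sh
# Read the Percpu figure (kB) from a meminfo-formatted file, create a
# throwaway cgroup, and report the difference. Needs root and a cgroup2
# mount at the standard path.
percpu_kb() { awk '/^Percpu:/ {print $2}' "$1"; }

if [ "$(id -u)" -eq 0 ] && [ -d /sys/fs/cgroup ]; then
    before=$(percpu_kb /proc/meminfo); before=${before:-0}
    mkdir /sys/fs/cgroup/percpu-probe 2>/dev/null
    after=$(percpu_kb /proc/meminfo); after=${after:-0}
    rmdir /sys/fs/cgroup/percpu-probe 2>/dev/null
    echo "Percpu delta per cgroup: $((after - before)) kB"
fi
```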
gentoo_ram
Guru
Posts: 528
Joined: Thu Oct 25, 2007 10:04 pm
Location: San Diego, California USA

Post by gentoo_ram » Sat Dec 28, 2024 4:31 pm

Interesting that I am not seeing this leak issue. Running OpenRC and elogind on a Raspberry Pi 5 with the RPi kernel. I generally SSH into the device a couple of times a day.

Code:

host ~ # mount | grep cgr
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)

host ~ # grep cpu /proc/meminfo 
Percpu:              880 kB

host ~ # loginctl --version
elogind 252.9 (252.9)
+PAM -AUDIT -SELINUX -APPARMOR +SMACK -SECCOMP -GCRYPT +ACL -BLKID -KMOD +UTMP default-hierarchy=unified

host ~ # uname -a
Linux host 6.12.3-v8+ #22 SMP PREEMPT Mon Dec  9 17:10:10 PST 2024 aarch64 GNU/Linux

host ~ # ls -1F /sys/fs/cgroup/
cgroup.controllers
cgroup.max.depth
cgroup.max.descendants
cgroup.procs
cgroup.stat
cgroup.subtree_control
cgroup.threads
cpuset.cpus.effective
cpuset.cpus.isolated
cpuset.mems.effective
cpu.stat
cpu.stat.local
elogind/
io.stat
openrc.apache2/
openrc.avahi-daemon/
openrc.bluetooth/
openrc.cronie/
openrc.dbus/
openrc.distccd/
openrc.dovecot/
openrc.net.end0/
openrc.net.wlan0/
openrc.nfs/
openrc.ntpd/
openrc.postfix/
openrc.rpcbind/
openrc.rpc.statd/
openrc.rsyncd/
openrc.samba/
openrc.scanmsgs/
openrc.sshd/
openrc.syslog-ng/
openrc.udev/
openrc.upsd/
openrc.upsdrv/
openrc.upsmon/
Yamakuzure
Advocate
Posts: 2323
Joined: Wed Jun 21, 2006 11:06 am
Location: Adendorf, Germany

Post by Yamakuzure » Sun Dec 29, 2024 11:00 am

So, if this issue is specific to SSH'ing into the box, then it is no wonder I have not yet been able to reproduce it.

There is another hard-to-reproduce (alleged) bug with SSH'ing into OpenRC+elogind boxes, which I also haven't managed to reproduce yet. Maybe the information you gathered can help me track both issues down. That would be nice!
sublogic
Guru
Posts: 388
Joined: Mon Mar 21, 2022 3:02 am
Location: Pennsylvania, USA

Post by sublogic » Mon Dec 30, 2024 1:02 am

Yamakuzure wrote:So, if this issue is specific to SSH'ing into the box, then it is no wonder I have not yet been able to reproduce it.
You can ssh to localhost, or switch to a VT console and log in there. Check /proc/$$/cgroup before and after to confirm that you are in a new cgroup. Also check with loginctl that you have a new session. When you exit the ssh or the VT login, the new cgroup is not reclaimed.
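That check can also be scripted: snapshot the top-level cgroup directories, cycle a login, and diff the snapshots. A sketch, assuming passwordless ssh to localhost (BatchMode keeps it from prompting), with anything that appears and survives the logout being a leaked session group:

```shell
#!/bin/sh
# Snapshot the set of top-level cgroup directories, open and close an
# ssh session, then print any directory that appeared and stayed --
# on an affected box that is the leaked session cgroup.
before=$(mktemp); after=$(mktemp)
ls -d /sys/fs/cgroup/*/ 2>/dev/null | sort > "$before"
ssh -o BatchMode=yes -o ConnectTimeout=3 localhost true 2>/dev/null || true
sleep 2                               # give the login manager time to clean up
ls -d /sys/fs/cgroup/*/ 2>/dev/null | sort > "$after"
comm -13 "$before" "$after"           # lines only in "after" = leaked groups
rm -f "$before" "$after"
```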