Gentoo Forums
Speeding up compilation on Raspberry Pi 5 bare metal
Gentoo Forums Forum Index    Gentoo on ARM
netsplit
n00b
Joined: 10 Jun 2024
Posts: 20

Posted: Fri Jul 11, 2025 5:40 pm    Post subject: Speeding up compilation on Raspberry Pi 5 bare metal

Obviously it's never gonna be blazing, but so far I've noticed:

By default the kernel's CPU frequency governor is in power saving mode. It can be switched to ondemand with echo "ondemand" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor. You can also change the default governor in your kernel config.
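The echo above only writes cpu0's file. A minimal sketch, assuming the standard Linux cpufreq sysfs layout, that applies a governor to every CPU exposing one; if the Pi 5 cores share a single policy the loop is simply harmless. The function name set_governor and its optional second argument (a sysfs root, there only so the function can be tested) are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: write a cpufreq governor (e.g. "ondemand") to every
# CPU's scaling_governor file. Assumes the usual Linux sysfs layout; the
# optional second argument overrides the sysfs root for testing.
set_governor() {
    gov="$1"
    root="${2:-/sys/devices/system/cpu}"
    found=0
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue          # unmatched glob: no cpufreq here
        echo "$gov" > "$f" || return 1   # needs root on a real system
        found=1
    done
    [ "$found" -eq 1 ]                   # fail if no CPU was updated
}

# Usage (as root): set_governor ondemand
```

A sysfs write does not survive a reboot. scaling_available_governors in the same directory lists what your kernel offers, and for the kernel config route the default is picked with one of the CONFIG_CPU_FREQ_DEFAULT_GOV_* options, e.g. CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y.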

The MAKEOPTS load-average limit (-l N) seems to be overly aggressive compared to amd64. Setting it forced many gcc builds down to a single thread for me, and unsetting it seems to have fixed that. However, now it seems I might need to investigate water cooling, because it throttles from the heat lol
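For context on why -l can pin a build to one job: GNU make will not start a new job while others are running and the load average is at least the -l value, and the Linux load average also counts tasks blocked in uninterruptible I/O wait, so slow storage can raise it even when the CPU is not busy. A rough sketch of that check against /proc/loadavg; load_below_limit is a hypothetical name, and make's own logic is similar but not identical:

```shell
#!/bin/sh
# Hypothetical helper: would a "make -l LIMIT" style check let another job
# start now? Compares the 1-minute load average (awk's $1, the first field
# of /proc/loadavg) with LIMIT. Exit status 0 means "below the limit".
# The optional second argument overrides the file for testing.
load_below_limit() {
    awk -v lim="$1" '{ exit !($1 < lim) }' "${2:-/proc/loadavg}"
}

# Usage: if load_below_limit 4; then echo "below 4"; else echo "at or above 4"; fi
```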
NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 55432
Location: 56N 3W

Posted: Fri Jul 11, 2025 6:29 pm

netsplit,

For cooling you need the official Pi 5 fan-assisted cooler. That keeps the Pi out of thermal throttling, unless you put the Pi and cooler into a case that restricts the airflow.
With MAKEOPTS="-j4" and emerge --jobs=1, that will let you build Chromium in only 32h.

You could also try distcc with cross compilers, and ccache. Both have their drawbacks, as they don't work for everything.

-- edit --

Try
Code:
# vcgencmd get_throttled && vcgencmd measure_temp && vcgencmd measure_clock arm
throttled=0x0
temp=65.9'C
frequency(0)=2400030464
to see what's going on and
Code:
# genlop -c

 Currently merging 1 out of 1

 * www-client/chromium-139.0.7258.31

       current merge time: 2 hours, 20 minutes and 28 seconds.
       ETA: 1 day, 4 hours, 31 minutes and 22 seconds.
That estimate is about 2 hours short.
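The throttled= value from vcgencmd get_throttled is a bit field. A small decoder sketch, assuming the bit meanings given in the Raspberry Pi vcgencmd documentation (bits 0-3 describe the current state, bits 16-19 record that the condition has occurred); decode_throttled is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: decode "throttled=0x..." (or a bare 0x... value)
# into one human-readable line per set bit.
decode_throttled() {
    val=$(( ${1#throttled=} ))   # strip the prefix, parse hex
    if [ "$val" -eq 0 ]; then
        echo "no throttling events"
        return 0
    fi
    for pair in "0:under-voltage detected" "1:ARM frequency capped" \
                "2:currently throttled" "3:soft temperature limit active" \
                "16:under-voltage has occurred" "17:ARM frequency capping has occurred" \
                "18:throttling has occurred" "19:soft temperature limit has occurred"; do
        bit=${pair%%:*}
        msg=${pair#*:}
        if [ $(( (val >> bit) & 1 )) -eq 1 ]; then
            echo "$msg"
        fi
    done
}

# Usage: decode_throttled "$(vcgencmd get_throttled)"
```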
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
netsplit
n00b
Joined: 10 Jun 2024
Posts: 20

Posted: Sat Jul 12, 2025 12:02 am

NeddySeagoon wrote:
netsplit,

For cooling you need the official Pi5 fan assisted cooler. That keeps the Pi out of thermal throttling, unless you put the Pi and cooler into a case that restricts the airflow.
With MAKEOPTS="-j4" and --jobs=1 that will let you build Chromium in only 32h.


When the updates finish I'll try building Chromium. A 32-hour build sounds like an experience.

Quote:

You could also try cross distcc and ccache. Both have their drawback as they don't work for everything.


I tried getting distcc working but ran into problems. I also tried setting up an arm64 VM, and discovered QEMU has some issues with threading. The Pi seems to build fast enough. One thing that seems to make a difference is the storage medium: it builds a lot faster off an NVMe drive than off a USB stick.

Quote:

Try
Code:
# vcgencmd get_throttled && vcgencmd measure_temp && vcgencmd measure_clock arm
throttled=0x0
temp=65.9'C
frequency(0)=2400030464
to see what's going on and
Code:
# genlop -c


Code:

raspberrypi ~ # vcgencmd get_throttled && vcgencmd measure_temp && vcgencmd measure_clock arm
throttled=0x0
temp=67.5'C
frequency(0)=2400037120


Seems like it's doing okay. I don't have the official Raspberry Pi heatsink fan, but I have a third-party one. At some point it's going to be installed in my car, and I live in a sunny place, so it'll probably need better cooling, at least when the car starts.
NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 55432
Location: 56N 3W

Posted: Sat Jul 12, 2025 9:59 am

netsplit,

Chromium on the Pi 5 needs both -mcpu and -march unset to avoid build failures.
The bug has been reported to Gentoo but not yet upstream.
It needs more work first.
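One way to follow that advice without touching the global CFLAGS is a per-package env file, using the same package.env mechanism netsplit uses for tmpfs.conf in this thread. A minimal sketch of the filtering step, with a hypothetical helper strip_arch_flags that drops -march=... and -mcpu=... from a flags string and keeps everything else in order:

```shell
#!/bin/sh
# Hypothetical helper: remove -march=* and -mcpu=* from a flags string.
# Assumes the flags contain no glob characters (the loop relies on word
# splitting of $1).
strip_arch_flags() {
    out=""
    for flag in $1; do
        case "$flag" in
            -march=*|-mcpu=*) ;;              # drop CPU-targeting flags
            *) out="${out:+$out }$flag" ;;   # keep, space-separated
        esac
    done
    printf '%s\n' "$out"
}

# Usage: strip_arch_flags "$CFLAGS"
```

The result (e.g. -O2 -pipe) could then be written as CFLAGS and CXXFLAGS into an env file such as a hypothetical /etc/portage/env/chromium-noarch.conf, mapped to www-client/chromium in /etc/portage/package.env.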

NVMe is a lot faster than USB, especially if your USB device only supports bulk-only mode.
netsplit
n00b
Joined: 10 Jun 2024
Posts: 20

Posted: Sat Jul 12, 2025 3:04 pm

NeddySeagoon wrote:
netsplit,

Chromium on Pi5 needs both -mcpu and -march unset to avoid build failures.
The bug has been reported to Gentoo but not yet upstream.
It needs more work first.

NVMe is a lot faster than USB. Especially if your USB is bulk mode only.


Thanks for the tips!

I think the storage medium was the biggest surprise, but it makes sense. Even on an NVMe drive (which I've verified is running at PCIe 3.0), I'm seeing CPU usage ranging between 25% and 75% per build thread. Assuming those percentages are accurate, the CPU isn't the current bottleneck. When I do the Chromium build I'll try setting the build environment to use a RAM drive.
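One rough way to test that assumption is to see how much CPU time goes to iowait while a build runs. A sketch, assuming the Linux /proc/stat layout (cpu user nice system idle iowait irq softirq steal ...); iowait_pct is a hypothetical helper, and proc(5) itself notes that iowait is not fully reliable, so treat the number as a hint rather than a measurement:

```shell
#!/bin/sh
# Hypothetical helper: percentage of CPU time spent in iowait between two
# samples of the aggregate "cpu" line of /proc/stat. Sums user..steal
# (fields 2-9); guest time is already included in user/nice, so it is skipped.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            if (NR == 1) { t1 = total; w1 = $6 } else { t2 = total; w2 = $6 }
        }
        END {
            if (t2 == t1) { print "0.0"; exit }
            printf "%.1f\n", 100 * (w2 - w1) / (t2 - t1)
        }'
}

# Usage during a build:
#   a=$(grep '^cpu ' /proc/stat); sleep 10; b=$(grep '^cpu ' /proc/stat)
#   iowait_pct "$a" "$b"
```

A high figure alongside 25-75% per-thread CPU use would point at storage; a low one would suggest looking elsewhere.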
netsplit
n00b
Joined: 10 Jun 2024
Posts: 20

Posted: Wed Jul 16, 2025 3:40 am

Just a follow-up: tmpfs was actually worse, a 38.25-hour build. Oddly, the build threads had higher CPU use. It seems the RAM file system had more CPU overhead than I would have guessed. I also suspect there was some swap usage, which would have negated the point of tmpfs.

Here's the setup:
Pi 5, 16 GB RAM

/etc/fstab
Code:

#size=10G would fail for not enough disk space
tmpfs           /var/tmp/tmpfs  tmpfs   size=102401M,uid=portage,gid=portage,mode=775        0 0


/etc/portage/env/tmpfs.conf
Code:

PORTAGE_TMPDIR="/var/tmp/tmpfs"


/etc/portage/package.env
Code:

www-client/chromium             tmpfs.conf



Code:

[ebuild   R    ] www-client/chromium-138.0.7204.92:0/stable::gentoo  USE="cups hangouts official proprietary-codecs pulseaudio qt6 rar screencast system-harfbuzz system-png system-zstd wayland -X -bindist -bundled-toolchain -custom-cflags -debug -ffmpeg-chromium -gtk4 (-headless) -kerberos -pax-kernel (-pgo) (-selinux) (-system-icu) -test -vaapi (-widevine)" L10N="-af -am -ar -bg -bn -ca -cs -da -de -el -en-GB -es -es-419 -et -fa -fi -fil -fr -gu -he -hi -hr -hu -id -it -ja -kn -ko -lt -lv -ml -mr -ms -nb -nl -pl -pt-BR -pt-PT -ro -ru -sk -sl -sr -sv -sw -ta -te -th -tr -uk -ur -vi -zh-CN -zh-TW" LLVM_SLOT="20 -19" 0 KiB



Results:
Code:

raspberrypi ~ # genlop -t chromium
 * www-client/chromium

     Tue Jul 15 07:07:49 2025 >>> www-client/chromium-138.0.7204.92
       merge time: 1 day, 14 hours, 14 minutes and 10 seconds.



So tmpfs added an extra 6 hours lol.
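genlop reports durations in words, which makes comparing runs a little awkward. A sketch (genlop_hours is a hypothetical helper, and it assumes genlop's English "N day, N hours, N minutes and N seconds" wording) that turns such a line into decimal hours:

```shell
#!/bin/sh
# Hypothetical helper: convert a genlop duration such as
# "1 day, 14 hours, 14 minutes and 10 seconds." into decimal hours.
# Each unit word takes the number just before it; missing units count as 0.
genlop_hours() {
    printf '%s\n' "$1" | awk '{
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^day/)    s += $(i-1) * 86400
            if ($i ~ /^hour/)   s += $(i-1) * 3600
            if ($i ~ /^minute/) s += $(i-1) * 60
            if ($i ~ /^second/) s += $(i-1)
        }
    } END { printf "%.2f\n", s / 3600 }'
}
```

On the merge time above it gives 38.24, about 6.2 hours more than the 32h quoted earlier in the thread, which is presumably the baseline behind the "extra 6 hours".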
NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 55432
Location: 56N 3W

Posted: Wed Jul 16, 2025 9:31 am

netsplit,

That result does not surprise me. If you have the RAM to build in RAM, the kernel cache will do it anyway.
Building in tmpfs saves writes that will never be read. That may be a good thing for SSDs.
Reads/writes are all DMA, so time savings from not setting up DMA will be too small to measure.

If you have swap, the content of tmpfs can be moved to swap under pressure of RAM.

When you don't have swap, there is no home on disk for dynamically allocated RAM, so the kernel has to 'swap' in other ways.
It can drop clean pages, then reload them later. This includes code that it will execute real soon now.
It can write 'dirty' pages out, so that they are clean, then drop/reload them.

All my Pis have either 8G of swap (for RAM <= 8G) or 16G on the 16G ones.
It may not be used but it makes it easy to spot when things are being pushed a bit hard.
Then it's time to reduce MAKEOPTS on a per package basis.
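A quick way to spot that pressure during a build is to watch how much swap is in use. A minimal sketch reading SwapTotal and SwapFree from /proc/meminfo (both reported in kB); swap_used_mib and its optional file argument, there only for testing, are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print swap in use, in MiB, from /proc/meminfo
# (or from the file given as the first argument).
swap_used_mib() {
    awk '/^SwapTotal:/ { t = $2 }
         /^SwapFree:/  { f = $2 }
         END { printf "%d\n", (t - f) / 1024 }' "${1:-/proc/meminfo}"
}

# Usage: while sleep 60; do swap_used_mib; done
```

If it climbs during a particular package, a per-package MAKEOPTS override can use the same package.env mechanism as the tmpfs.conf example earlier in the thread, e.g. an env file containing MAKEOPTS="-j2" mapped to that package.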
Hu
Administrator
Joined: 06 Mar 2007
Posts: 23652

Posted: Wed Jul 16, 2025 1:19 pm

netsplit wrote:
Here's the setup:
pi 5, 16gb ram

/etc/fstab
Code:
#size=10G would fail for not enough disk space
tmpfs           /var/tmp/tmpfs  tmpfs   size=102401M,uid=portage,gid=portage,mode=775        0 0
Normalizing that size:
Code:
$ numfmt --to=iec --from=iec 102401M
101G
You told the kernel to allow up to ~101G in the tmpfs, but you only had 16G of real RAM to use, with that split between the tmpfs pages and ordinary usage. As you and Neddy noted, swapping is bad for performance, and the configuration you set here makes it easy to overload the tmpfs to the point that swapping is needed. Typical guidance for C++-heavy programs is to plan for 2GiB of RAM per compiler process, so even if your 10G had been sufficient disk space and you had only emerge running, you could not count on using more than ~3 concurrent compiler processes. I'm assuming you bumped to 101G without measuring exactly how much you needed, but even if bumping to 20G had been sufficient, that would have left you with at most negative 2 compilers running, if you wanted to avoid swapping. You need at least positive 1 compiler running to make forward progress, and ideally you want enough compiler processes to saturate every CPU core.

Chromium is well known to be huge and slow to build. I think the Pi 5 is just not powerful enough to build Chromium in tmpfs in a reasonable time, and arguably not powerful enough to build Chromium in reasonable time at all.
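The budget in that post is integer arithmetic: compiler jobs = (RAM - tmpfs) / RAM per job. A sketch with a hypothetical max_jobs helper, in GiB, ignoring everything else running on the system as the post does:

```shell
#!/bin/sh
# Hypothetical helper: how many compiler jobs fit in RAM once the tmpfs has
# taken its share. Arguments in GiB: RAM, tmpfs size, RAM per job (default 2).
# Shell integer division truncates toward zero.
max_jobs() {
    ram_gib="$1"
    tmpfs_gib="$2"
    per_job_gib="${3:-2}"
    echo $(( (ram_gib - tmpfs_gib) / per_job_gib ))
}
```

max_jobs 16 10 gives 3 and max_jobs 16 20 gives -2, matching the figures above; going the other way, four jobs at 2 GiB each would leave at most 8 GiB of the 16 GiB for the tmpfs.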
netsplit
n00b
Joined: 10 Jun 2024
Posts: 20

Posted: Wed Jul 16, 2025 2:43 pm

NeddySeagoon wrote:
If you have swap, the content of tmpfs can be moved to swap under pressure of RAM.


The swap usage was indeed not surprising once I discovered just how much space Chromium needs to build. I still tried it. Swap had 32GB available, so I'm certain it had enough at least; it shouldn't ever have needed more than 48GB (32 + 16) of total memory. I was mostly just testing things to learn. The actual paging is quite interesting, so thank you for shedding more light on it. I used to believe setting swappiness to 0 was ideal because it would prevent swap usage. That might have been better on old systems where using the page file would cause freezes, but in hindsight 0 probably wasn't ideal.


Hu wrote:
Normalizing that size:
Code:
$ numfmt --to=iec --from=iec 102401M
101G


I meant to use 10 GB + 1 MB, because 10G was erroring out saying it needed 10 gigs of disk space, so I assumed it was a greater-than check (but in hindsight, perhaps emerge checks in gibibytes and tmpfs works in gigabytes). Silly math goof aside, luckily tmpfs didn't try to take all that and only used the RAM needed for the files actually stored in it. I was attempting to use RAM storage to saturate the CPU cores, because it seemed a single-lane PCIe 3.0 link wasn't enough. The problems you noted prevented that. Thank you for catching my goof with tmpfs.

Anyway, I agree with your conclusion. If I ever need Chromium on the Raspberry Pi 5, I'll use a binary package.
All times are GMT