Gentoo Forums
[SOLVED] Load too high
cfgauss
l33t
Joined: 18 May 2005
Posts: 688
Location: USA

Posted: Sat Jul 14, 2018 5:28 pm    Post subject: [SOLVED] Load too high

I have a quad-core machine and often htop measures my 1-minute load average in the 10-20 range. htop sorts by CPU% and, strangely, the max CPU% is 0.0 when this happens. During this high load, KDE is unusable and I have to ssh in to the box to reboot.

This high load seems to be triggered by an emerge world but even after I kill the emerge process, the high load continues.

How can I find out which processes are killing the CPU?

Any debugging hints will be gratefully received.

[SOLVED] Many thanks to both Ant P. and khayyam for their knowledgeable suggestions. I ended up following Ant P.'s instructions to install the BFQ IO scheduler in my gentoo-sources-4.17.6 kernel: I set the CONFIG_SCSI_MQ_DEFAULT=y flag and used his echo | tee one-liner in an /etc/local.d boot script. As a test I re-emerged chromium, libreoffice, and qtwebengine, long-running emerges which, in the past, have completely frozen the GUI. When the load was in the 10.0-20.0 range, responsiveness did, of course, suffer but the box never froze and during periods of that load seemed normally responsive. I am delighted with this solution as I had feared my only other recourse was to buy newer, more powerful hardware or drop Gentoo and use a binary distribution. [/SOLVED]


Last edited by cfgauss on Wed Jul 18, 2018 3:55 am; edited 1 time in total
khayyam
Watchman
Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

Posted: Sat Jul 14, 2018 6:10 pm    Post subject: Re: Load too high

cfgauss wrote:
I have a quad-core machine and often htop measures my 1-minute load average in the 10-20 range. htop sorts by CPU% and, strangely, the max CPU% is 0.0 when this happens. During this high load, KDE is unusable and I have to ssh in to the box to reboot.

cfgauss ... to be clear, do you mean '0.10-20' or '10-20.0'? The former would be perfectly normal, the latter extreme. As you see no process using those resources, we should probably assume that the kernel is using them, so you should save the output of dmesg:

Code:
% ssh user@host "dmesg > dmesg-$(date +%s)"


cfgauss wrote:
This high load seems to be triggered by an emerge world but even after I kill the emerge process, the high load continues.

Which suggests you're triggering a kernel bug, or segfault ... so, again, we should see dmesg.

best ... khay
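
A note for anyone hitting the same symptom: on Linux the load average counts tasks in uninterruptible sleep (state D, usually waiting on disk IO) as well as runnable ones, which is how the load can sit at 10-20 while htop shows 0.0 CPU%. A minimal sketch for spotting such tasks; the helper name list_blocked is made up for illustration, and the ps invocation in the comment assumes a ps that accepts POSIX-style -o options:

```shell
#!/bin/sh
# list_blocked: hypothetical helper. Reads "STAT PID COMMAND" lines on stdin
# and prints only the tasks whose state starts with D (uninterruptible sleep).
list_blocked() {
    awk '$1 ~ /^D/'
}

# On a live system, during the high-load period:
#   ps -e -o stat= -o pid= -o comm= | list_blocked
```

If the list is full of compiler, linker, or kworker entries during an emerge, the box is most likely waiting on the disk rather than on the CPU.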
cfgauss
l33t
Joined: 18 May 2005
Posts: 688
Location: USA

Posted: Sat Jul 14, 2018 6:48 pm    Post subject: Re: Load too high

khayyam wrote:
cfgauss ... to be clear, do you mean '0.10-20' or '10-20.0'?

Sadly, 10.0-20.0.
khayyam wrote:
Code:
% ssh user@host "dmesg > dmesg-$(date +%s)"

After a reboot, the system is normal. Would a dmesg in this state be useful or only a dmesg in a high-load state?
PrSo
Tux's lil' helper
Joined: 01 Jun 2017
Posts: 136

Posted: Sat Jul 14, 2018 7:21 pm

During that "stalling", can you see in iotop a long queue of processes sitting at 99.9% in the IO> column?

I have been fighting this since 4.14 (I am now on 4.17), and sometimes, especially when stalls occur while compiling qtwebengine, I see a "load avg" of 40.0!!!

And yes, it starts swapping at that moment. (The scheduler is deadline.)

Things get better when I set MAKEOPTS="-j4 -l3.6", according to https://wiki.gentoo.org/wiki/EMERGE_DEFAULT_OPTS (I have a 4-core AMD APU and run Portage in a tmpfs that is limited to 5 GB because of the SSD).

My machine has only 8 GB of RAM.

I have tried other schedulers as well, but the same thing happens. BTW, setting CONFIG_SCSI_MQ_DEFAULT=y in the kernel config file causes a hang on resume from suspend.
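
For reference, a sketch of how the -j/-l pair above can be derived from the core count. The 0.9 x cores load ceiling is an assumption that simply reproduces the -l3.6 figure for 4 cores, and makeopts_for is a made-up helper name:

```shell
#!/bin/sh
# makeopts_for: hypothetical helper. Prints a MAKEOPTS value for N cores:
# -jN parallel jobs, with a load-average ceiling of 0.9 * N (assumed ratio).
makeopts_for() {
    awk -v n="$1" 'BEGIN { printf "-j%d -l%.1f\n", n, n * 0.9 }'
}

# Line for /etc/portage/make.conf on a 4-core machine:
echo "MAKEOPTS=\"$(makeopts_for 4)\""
```

The -l value tells make not to start new jobs while the load average is above it, so it throttles the build without lowering the job count when the machine is otherwise idle.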
cfgauss
l33t
Joined: 18 May 2005
Posts: 688
Location: USA

Posted: Sat Jul 14, 2018 7:54 pm

PrSo wrote:
During that "stalling", can you see in iotop a long queue of processes sitting at 99.9% in the IO> column?

Just installed iotop and will look at this during the next high load incident.
PrSo wrote:
I have been fighting this since 4.14 (I am now on 4.17), and sometimes, especially when stalls occur while compiling qtwebengine, I see a "load avg" of 40.0!!!

I'm on 4.17.5 and have seen 40.0, also. High loads usually happen if a single emerge is lengthy: qtwebengine, libreoffice, chromium.
PrSo wrote:
Things get better when I set MAKEOPTS="-j4 -l3.6"

I have the same MAKEOPTS as well as FEATURES="ccache". Neither seems to help.
PrSo wrote:
My machine has only 8 GB of RAM.

As do I.

I would think if this really were a kernel bug that there would be posts everywhere. Since there aren't, I'm assuming the problem is related to my particular hardware/software setup. I sincerely hope I don't have to buy new hardware simply to run Gentoo.
khayyam
Watchman
Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

Posted: Sat Jul 14, 2018 8:18 pm    Post subject: Re: Load too high

cfgauss wrote:
After a reboot, the system is normal. Would a dmesg in this state be useful or only a dmesg in a high-load state?

cfgauss ... the kernel ring buffer, from which dmesg is derived, will have been cleared by a reboot, so it needs to be captured at the time of occurrence ... that is why I provided the command via ssh, as it saves the additional overhead of acquiring an interactive shell, etc.

If you have a logger and are logging crit, err, warn, segfault, then it's possible that this is in /var/log/messages (or whatever file you log such things to).

best ... khay
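
If the logger did capture the incident, the kernel's hung-task detector normally leaves lines of the form "INFO: task NAME:PID blocked for more than N seconds." A rough filter for them, assuming that message format and a syslog file at /var/log/messages; hung_tasks is a made-up helper name:

```shell
#!/bin/sh
# hung_tasks: hypothetical helper. Reads syslog text on stdin and prints each
# distinct NAME:PID reported by the hung-task detector, preceded by how many
# times it was reported, most frequent first.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than [0-9][0-9]* seconds.*/\1:\2/p' |
        sort | uniq -c | sort -rn
}

# On a live system:
#   hung_tasks < /var/log/messages
```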
cfgauss
l33t
Joined: 18 May 2005
Posts: 688
Location: USA

Posted: Sat Jul 14, 2018 8:31 pm

Here is the portion of /var/log/messages that appears to be kernel errors. Does this shed any light on the problem?
khayyam
Watchman
Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

Posted: Sat Jul 14, 2018 9:04 pm

cfgauss wrote:
Here is the portion of /var/log/messages that appears to be kernel errors. Does this shed any light on the problem?

cfgauss ... it does, but those hangs may be the result, rather than the cause, so we really need to see dmesg, 'ps auxwww', and your .config.

best ... khay
cfgauss
l33t
Joined: 18 May 2005
Posts: 688
Location: USA

Posted: Sat Jul 14, 2018 9:09 pm

khayyam wrote:
cfgauss wrote:
Here is the portion of /var/log/messages that appears to be kernel errors. Does this shed any light on the problem?

cfgauss ... it does, but those hangs may be the result, rather than the cause, so we really need to see dmesg, 'ps auxwww', and your .config.

During the next high load incident I can collect dmesg and ps auxwww. What, exactly, is .config?
Ant P.
Watchman
Joined: 18 Apr 2009
Posts: 6920

Posted: Sat Jul 14, 2018 9:51 pm

Those kernel errors are all hanging on disk IO. Set a better default scheduler, bfq for disks or kyber/deadline for SSD.
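
As a rough illustration of that rule of thumb: the kernel exposes /sys/block/<dev>/queue/rotational (1 for spinning disks, 0 for SSDs), so the choice can be scripted. pick_sched is a made-up name, and kyber for SSDs is just one of the two options mentioned above.

```shell
#!/bin/sh
# pick_sched: hypothetical helper. Maps the value read from
# /sys/block/<dev>/queue/rotational to a scheduler name.
pick_sched() {
    case "$1" in
        1) echo bfq ;;      # rotational disk
        0) echo kyber ;;    # SSD (deadline is the other suggestion)
        *) return 1 ;;      # unexpected value: make no choice
    esac
}

# Print the suggestion for every sd* disk present on this machine.
for f in /sys/block/sd*/queue/rotational; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}
    echo "${dev%%/*}: $(pick_sched "$(cat "$f")")"
done
```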
bunder
Bodhisattva
Joined: 10 Apr 2004
Posts: 5934

Posted: Sat Jul 14, 2018 10:03 pm

I can't find a more recent link (I know I saw another one of these recently), but BFQ and kyber kind of suck. :P

https://www.phoronix.com/scan.php?page=article&item=linux-415-iosched
_________________
Neddyseagoon wrote:
The problem with leaving is that you can only do it once and it reduces your influence.

banned from #gentoo since sept 2017
Ant P.
Watchman
Joined: 18 Apr 2009
Posts: 6920

Posted: Sat Jul 14, 2018 10:23 pm

That link only shows that CFQ wins on synthetic tests with a high-end laptop, fast SSD and stock Ubuntu. Good for a stereotypical kernel developer to shoo people away with a "works for me", not so much for real world users on real world hardware.

My suggestion comes from my own experience with 8-9 years of BFQ on Gentoo. Among other things, I've compiled Chromium on my netbook, eating into swap and load average going into space, while it remains somewhat responsive.
bunder
Bodhisattva
Joined: 10 Apr 2004
Posts: 5934

Posted: Sun Jul 15, 2018 1:21 pm

Load is relative. I have a VM that hits over 150 every time I run a test suite on it, and it's perfectly usable while the tests are running. I'm more inclined to believe the graphs, regardless of the "high end laptop", because they're trying to test raw disk performance and block schedulers. If they ran the test on a Raspberry Pi, they would be spending a lot more time in wait cycles because the hardware is so slow (therefore tainting the test).
cfgauss
l33t
Joined: 18 May 2005
Posts: 688
Location: USA

Posted: Sun Jul 15, 2018 9:39 pm

Ant P. wrote:
Those kernel errors are all hanging on disk IO. Set a better default scheduler, bfq for disks or kyber/deadline for SSD.

Does BFQ have any advantages over CFQ for rotational HDDs (rather than SSDs)? If so, could you provide a link for configuring BFQ (kernel/grub/udev (?) etc)?

Many thanks for this suggestion.
Ant P.
Watchman
Joined: 18 Apr 2009
Posts: 6920

Posted: Mon Jul 16, 2018 4:15 pm

In the kernel:
make menuconfig:
[*] Enable the block layer  --->
├   [*]   Enable support for block device writeback throttling
├   [*]     Multiqueue writeback throttling
└── IO Schedulers  --->
    ├   Default I/O scheduler (BFQ)
    └── <*> BFQ I/O scheduler

Depending on the kernel version it may be called BFQ-MQ instead. The other two options are unrelated to BFQ but highly recommended as they help with IO overloading too.

You can check what you already have enabled using "cat /sys/block/sd*/queue/scheduler", and change it at runtime with e.g. "echo bfq | sudo tee /sys/block/sd*/queue/scheduler"
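
The active scheduler is the name shown in square brackets in that sysfs file. A small sketch for extracting it; active_sched is a made-up helper name:

```shell
#!/bin/sh
# active_sched: hypothetical helper. Reads the contents of a
# /sys/block/<dev>/queue/scheduler file on stdin and prints the bracketed
# (currently active) scheduler name.
active_sched() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Report the active scheduler of every sd* disk present on this machine.
for f in /sys/block/sd*/queue/scheduler; do
    [ -r "$f" ] || continue
    echo "$f: $(active_sched < "$f")"
done
```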
cfgauss
l33t
Joined: 18 May 2005
Posts: 688
Location: USA

Posted: Tue Jul 17, 2018 12:19 am

Ant P. wrote:
In the kernel:
make menuconfig:
[*] Enable the block layer  --->
├   [*]   Enable support for block device writeback throttling
├   [*]     Multiqueue writeback throttling
└── IO Schedulers  --->
    ├   Default I/O scheduler (BFQ)
    └── <*> BFQ I/O scheduler

Depending on the kernel version it may be called BFQ-MQ instead. The other two options are unrelated to BFQ but highly recommended as they help with IO overloading too.

You can check what you already have enabled using "cat /sys/block/sd*/queue/scheduler", and change it at runtime with e.g. "echo bfq | sudo tee /sys/block/sd*/queue/scheduler"

Here are the IO scheduler choices in my sys-kernel/gentoo-sources-4.17.6 kernel:
Code:
    < > Deadline I/O scheduler                                         
    < > CFQ I/O scheduler                                             
        Default I/O scheduler (No-op)  --->                           
    <*> MQ deadline I/O scheduler                                     
    <*> Kyber I/O scheduler                                         
    <*> BFQ I/O scheduler

Also, the Default I/O scheduler menu lists only the single-queue schedulers: Deadline, CFQ, and No-op.
Code:
~$ cat /sys/block/sd*/queue/scheduler
[noop]
[noop]

Is it possible to install BFQ in this kernel?
Ant P.
Watchman
Joined: 18 Apr 2009
Posts: 6920

Posted: Tue Jul 17, 2018 2:01 am

It looks like vanilla/gentoo doesn't have the traditional BFQ patch; you need to either boot with scsi_mod.use_blk_mq=1 or set CONFIG_SCSI_MQ_DEFAULT=y.

The default option there won't change the mq scheduler (the default is mq-deadline); you'll need to put the echo|tee command in an /etc/local.d/ script to set it at boot.
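
A minimal sketch of such a boot script, assuming OpenRC's local service runs executable /etc/local.d/*.start files at boot (as root, so no sudo is needed). The file name and the set_sched helper are made up for illustration; the optional second argument exists only so the function can be pointed at a scratch directory instead of /sys/block.

```shell
#!/bin/sh
# Hypothetical /etc/local.d/bfq.start
# set_sched NAME [ROOT]: write NAME into every ROOT/sd*/queue/scheduler
# file that lists NAME as available. ROOT defaults to /sys/block.
set_sched() {
    name=$1
    root=${2:-/sys/block}
    for f in "$root"/sd*/queue/scheduler; do
        [ -w "$f" ] || continue
        # Only write if the kernel offers this scheduler for the device.
        if grep -qw -- "$name" "$f"; then
            echo "$name" > "$f"
        fi
    done
}

set_sched bfq
```

Devices that do not offer bfq (for example, because blk-mq is not enabled for them) are skipped rather than treated as an error.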
khayyam
Watchman
Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

Posted: Tue Jul 17, 2018 2:10 am

cfgauss ...

you can either set 'elevator=bfq' as a kernel parameter (ie, in grub.cfg, or what-have-you), or have the following run via 'local':

/etc/local.d/iosched-bfq.start:
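#!/bin/sh
# Set the I/O scheduler for sda. The scheduler name comes from the first
# argument or, failing that, from this file's name (iosched-<name>.start).
set -e

# Extend PATH so that ebegin/eend can be found.
PATH="$PATH:/lib/rc/bin"
export PATH

sysfs_sched="/sys/block/sda/queue/scheduler"

if [ -n "$1" ] ; then
    scheduler="$1"
else
    # Strip the directory and extension, then everything up to the first "-".
    filename="$(basename -- "$0" ".${0##*.}")"
    scheduler="${filename#*-}"
fi

ebegin "Setting I/O scheduler to $scheduler"

# in case the required scheduler happens to be built as a module.
modprobe -q "${scheduler}-iosched" || exit $?

# Write the name only if the device exists and offers this scheduler.
if [ -e "$sysfs_sched" ] && grep -q "$scheduler" "$sysfs_sched" ; then
    echo "$scheduler" > "$sysfs_sched"
fi

eend $?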

Note that the filename, or a parameter passed to the script, will set the "$scheduler" to be used.

best ... khay