Gentoo Forums
Wrong ACPI thermal reading causes shutdown
Rinne
n00b


Joined: 14 Sep 2008
Posts: 17

Posted: Wed Feb 10, 2016 7:34 pm    Post subject: Wrong ACPI thermal reading causes shutdown

Hi all,

I'm currently having a problem with random shutdowns due to alleged overheating.

The following message is logged in /var/log/messages:

Code:
Feb 10 20:06:01 Raven kernel: thermal thermal_zone0: critical temperature reached(80 C),shutting down
Feb 10 20:06:05 Raven shutdown[6201]: shutting down for system halt
Feb 10 20:06:05 Raven root[6208]: ACPI event unhandled: thermal_zone LNXTHERM:00 000000f0 00000001
Feb 10 20:06:05 Raven init[1]: Switching to runlevel: 0
Feb 10 20:06:06 Raven sshd[1449]: Received signal 15; terminating.
...

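To see how often this happens and which temperature gets reported, the log can be scanned for the message above. This is a minimal sketch: the regular expression is an assumption modelled on the single line quoted here, and other kernel versions may word the message differently.

```python
import re

# Modelled on the one kernel message quoted in this post; the wording is an
# assumption and may differ between kernel versions.
CRITICAL_RE = re.compile(
    r"thermal (?P<zone>thermal_zone\d+): critical temperature reached\s*"
    r"\((?P<temp>-?\d+) C\)"
)


def find_critical_events(lines):
    """Return (zone, temperature in C) for every critical-temperature line."""
    events = []
    for line in lines:
        match = CRITICAL_RE.search(line)
        if match:
            events.append((match.group("zone"), int(match.group("temp"))))
    return events


sample = [
    "Feb 10 20:06:01 Raven kernel: thermal thermal_zone0: critical "
    "temperature reached(80 C),shutting down",
    "Feb 10 20:06:05 Raven shutdown[6201]: shutting down for system halt",
]
print(find_critical_events(sample))  # [('thermal_zone0', 80)]
```

On a real system the same function could be fed the lines of /var/log/messages instead of the sample list.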

thermal_zone0 is the only zone acpi recognizes when I run acpi -t. I don't even know which sensor this zone represents:
Code:
rinne@Raven ~ $ acpi -t
Thermal 0: ok, 0.0 degrees C

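One way to find out what a zone represents is to read its entries under /sys/class/thermal, where each zone normally exposes a `type` file, a `temp` file in millidegrees Celsius, and trip point files. The sketch below assumes that layout; the function name and the returned dictionary shape are made up for illustration, and some kernels or zones may not expose every file.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Describe every thermal_zone* directory found under `base`."""
    zones = []
    for zone_dir in sorted(Path(base).glob("thermal_zone*")):
        zone = {"name": zone_dir.name, "type": None, "temp_c": None, "trips": []}
        try:
            zone["type"] = (zone_dir / "type").read_text().strip()
            # The kernel reports millidegrees Celsius in the temp file.
            zone["temp_c"] = int((zone_dir / "temp").read_text()) / 1000.0
        except (OSError, ValueError):
            pass  # some zones refuse reads; leave the fields as None
        for type_file in sorted(zone_dir.glob("trip_point_*_type")):
            temp_file = type_file.with_name(
                type_file.name.replace("_type", "_temp")
            )
            try:
                trip = (
                    type_file.read_text().strip(),
                    int(temp_file.read_text()) / 1000.0,
                )
            except (OSError, ValueError):
                continue  # skip trip points that cannot be read
            zone["trips"].append(trip)
        zones.append(zone)
    return zones


if __name__ == "__main__":
    for zone in read_thermal_zones():
        print(zone)
```

A zone whose type is reported as an ACPI zone and whose only trip point is a critical one at 80 °C would match the acpitz entry that lm_sensors shows.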

The reading sometimes jumps to 7°C at random and, apparently, occasionally to above 80°C, which triggers the shutdown.

I have a mainboard with an it87 chipset whose driver support is rather controversial and requires me to boot with acpi_enforce_resources=lax, so I might have some interference here.
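To confirm whether the running kernel was actually booted with that option, the contents of /proc/cmdline can be checked. A minimal sketch follows: the helper name is hypothetical, the sample command line (including the root device) is invented, and quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None.

    Simplification: quoted values containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# Invented example; on a live system read the text from /proc/cmdline.
example = "root=/dev/sda2 quiet acpi_enforce_resources=lax"
params = parse_cmdline(example)
print(params.get("acpi_enforce_resources"))  # lax
```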

I have lm_sensors installed and all my (real) temperatures are rather ok:
Code:
acpitz-virtual-0
Adapter: Virtual device
temp1:         +0.0°C  (crit = +80.0°C)

k10temp-pci-00c3
Adapter: PCI adapter
CPU:           +0.0°C  (high = +70.0°C)
                       (crit = +70.0°C, hyst = +69.0°C)

it8772-isa-0a30
Adapter: ISA adapter
in0:          +0.50 V  (min =  +0.00 V, max =  +0.10 V)  ALARM
in1:          +1.52 V  (min =  +0.00 V, max =  +3.06 V)
in2:          +2.02 V  (min =  +0.00 V, max =  +3.06 V)
in3:          +2.06 V  (min =  +0.00 V, max =  +3.06 V)
in4:          +1.13 V  (min =  +0.00 V, max =  +3.06 V)
in5:          +1.13 V  (min =  +0.00 V, max =  +3.06 V)
in6:          +2.22 V  (min =  +0.00 V, max =  +3.06 V)
3VSB:         +3.31 V  (min =  +0.00 V, max =  +6.12 V)
Vbat:         +3.34 V 
fan1:           0 RPM  (min =   18 RPM)  ALARM
fan2:         384 RPM  (min =   10 RPM)
fan3:         740 RPM  (min =   13 RPM)
temp1:        +42.0°C  (low  = +75.0°C, high = +80.0°C)  sensor = thermistor
temp2:        +43.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
temp3:         +0.0°C  (low  = +127.0°C, high = +80.0°C)  sensor = thermal diode
intrusion0:  ALARM

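Readings of exactly 0.0°C stand out in this output. A small sketch for flagging them automatically is shown below; it assumes the text layout of the `sensors` output pasted above, and treating 0.0°C as implausible is a heuristic chosen for illustration, not something lm_sensors itself reports.

```python
import re

# Matches lines such as "temp1:        +42.0°C  (low = ...)".
READING_RE = re.compile(
    r"^(?P<label>\S[^:]*):\s+(?P<value>[+-]?\d+(?:\.\d+)?)°C"
)


def suspicious_temperatures(sensors_output, implausible=0.0):
    """Return (chip, label, value) for readings equal to `implausible`."""
    chip = None
    hits = []
    for line in sensors_output.splitlines():
        if not line.strip():
            chip = None  # a blank line ends the current chip section
            continue
        if chip is None and ":" not in line:
            chip = line.strip()  # first line of a section is the chip name
            continue
        match = READING_RE.match(line)
        if match and float(match.group("value")) == implausible:
            hits.append((chip, match.group("label"), float(match.group("value"))))
    return hits


sample = """
acpitz-virtual-0
Adapter: Virtual device
temp1:         +0.0°C  (crit = +80.0°C)

k10temp-pci-00c3
Adapter: PCI adapter
CPU:           +0.0°C  (high = +70.0°C)

it8772-isa-0a30
Adapter: ISA adapter
temp1:        +42.0°C  (low  = +75.0°C, high = +80.0°C)  sensor = thermistor
temp3:         +0.0°C  (low  = +127.0°C, high = +80.0°C)  sensor = thermal diode
"""
for hit in suspicious_temperatures(sample):
    print(hit)
```

With the excerpt above this flags temp1 on acpitz, CPU on k10temp, and temp3 on the it8772, while the two thermistor readings near 42°C pass.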

The shutdown can happen both at idle and under load.

Can I somehow safely disable the shutdown function or, better, completely disable the broken sensor?
Or can anyone think of any other possible fix?

Best regards,
Rinne
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9677
Location: almost Mile High in the USA

Posted: Fri Feb 12, 2016 12:38 am

Does it do this if you don't set acpi_enforce_resources=lax? Unfortunately, this could indeed cause issues. I stopped using lm_sensors on machines that have this issue, since there is real potential for stepping on one another's toes.

You could try compiling the kernel without CONFIG_ACPI_THERMAL ...
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
Roman_Gruber
Advocate


Joined: 03 Oct 2006
Posts: 3846
Location: Austro Bavaria

Posted: Fri Feb 12, 2016 6:58 pm

The GRUB kernel line accepts options such as acpi=off.

I suggest you fire up a rescue live CD and check whether it happens there. If it does not, then it is probably a kernel issue. If it happens there too, then it is a hardware issue.

I assume you have already inspected your hardware cooling system.

I also expect that you are using, or have already tested, the latest kernel release from kernel.org, not the stable Gentoo sources.
Rinne
n00b


Joined: 14 Sep 2008
Posts: 17

Posted: Sun Feb 14, 2016 1:41 pm

Hi, thanks a lot for your replies.

eccerr0r wrote:
Does it do this if you don't set acpi_enforce_resources=lax? Unfortunately, this could indeed cause issues. I stopped using lm_sensors on machines that have this issue, since there is real potential for stepping on one another's toes.

You could try compiling the kernel without CONFIG_ACPI_THERMAL ...


It has only happened since I configured acpi_enforce_resources=lax. However, the ACPI readings were wrong before as well; they just didn't jump around as much.
I removed CONFIG_ACPI_THERMAL and that seems to have done the trick. I need lm_sensors in order to use fancontrol. Unfortunately, the motherboard's BIOS fan control doesn't work as expected (I'm very unsatisfied with the motherboard in general).
It seems to have issues recognizing the fans' PWM range. With fancontrol it works flawlessly for some reason.

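Whether CONFIG_ACPI_THERMAL is really off in a given kernel can be checked from its configuration text, for example /usr/src/linux/.config, or /proc/config.gz if the kernel was built with CONFIG_IKCONFIG_PROC. The helper below is a hypothetical sketch that only understands the usual `OPTION=value` and `# OPTION is not set` lines.

```python
def config_option_state(config_text, option):
    """Return the value of `option` ('y', 'm', ...) or None if unset/absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return None
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


# Invented excerpt of a kernel .config with the thermal driver disabled.
example = """CONFIG_ACPI=y
# CONFIG_ACPI_THERMAL is not set
CONFIG_ACPI_FAN=y
"""
print(config_option_state(example, "CONFIG_ACPI_THERMAL"))  # None
print(config_option_state(example, "CONFIG_ACPI"))          # y
```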

tw04l124 wrote:
The GRUB kernel line accepts options such as acpi=off.

I suggest you fire up a rescue live CD and check whether it happens there. If it does not, then it is probably a kernel issue. If it happens there too, then it is a hardware issue.

I assume you have already inspected your hardware cooling system.

I also expect that you are using, or have already tested, the latest kernel release from kernel.org, not the stable Gentoo sources.


The hardware cooling is fine. I actually put quite some effort into it and sensor readings are ok.
For now I'm going with disabling CONFIG_ACPI_THERMAL. Completely disabling ACPI doesn't make sense to me.
I'll follow the it87 driver news on new kernel releases. For now I'm not keen on leaving stable gentoo-sources.

Thanks a lot again.
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9677
Location: almost Mile High in the USA

Posted: Sun Feb 14, 2016 8:38 pm

Indeed, outright disabling ACPI will cause serious issues with the system including disabling additional cores on the machine.

However, I do not have much hope for the I2C drivers when they conflict with ACPI. There is no real way to make sure they do not conflict with each other, unless the driver somehow finds the semaphore that ACPI uses and takes the same lock. Otherwise it pretty much comes down to one or the other, ACPI or lm_sensors; I ended up forced to use the former and go without the additional features of lm_sensors. (Or pray that they do not clash. And it really is praying, since a conflict is very likely at some point when ACPI and lm_sensors try to use the I2C bus at the same time.)
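The "same lock" idea can be illustrated with a user-space analogy. This is not kernel code and not how the it87 driver or ACPI is implemented; `FakeBus`, the owner names, and the three-step transaction are all invented for the sketch. It only shows that multi-step transfers stay intact when both users take one shared lock.

```python
import threading


class FakeBus:
    """Stand-in for a shared bus controller (purely illustrative)."""

    def __init__(self):
        self.log = []

    def transaction(self, owner, steps=3):
        # A multi-step transfer; interleaving with another owner would corrupt it.
        for step in range(steps):
            self.log.append((owner, step))


def run_with_shared_lock(bus, lock, owner, rounds):
    for _ in range(rounds):
        with lock:  # both users must take the *same* lock
            bus.transaction(owner)


def transactions_intact(log, steps=3):
    """True if each run of `steps` entries has one owner and ordered steps."""
    for i in range(0, len(log), steps):
        chunk = log[i:i + steps]
        owners = {owner for owner, _ in chunk}
        if len(owners) != 1 or [s for _, s in chunk] != list(range(steps)):
            return False
    return True


bus = FakeBus()
bus_lock = threading.Lock()
threads = [
    threading.Thread(target=run_with_shared_lock, args=(bus, bus_lock, name, 100))
    for name in ("acpi", "it87")
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(transactions_intact(bus.log))  # True
```

If each user held its own private lock instead, nothing would stop the two transfers from interleaving, which is the situation described above.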
All times are GMT