Gentoo Forums
[solved] SATA power problem after starting XFCE nxserver
mbar
Veteran


Joined: 19 Jan 2005
Posts: 1990
Location: Poland

Posted: Fri May 18, 2012 7:17 am    Post subject: [solved] SATA power problem after starting XFCE nxserver

Hi all, after some months trying to track down this issue, I'm asking for help :)

I have a home file server running Gentoo (~amd64 keyword) with 9 SATA drives. It boots to console only, but from time to time I need to run some GUI applications (like brasero / xfburn), so I have XFCE and nxserver installed.
Everything is OK until I log in from another computer using nxclient. Then the following shows up in dmesg:

Code:
ata1.00: configured for UDMA/133
ata1: EH complete
ata3.00: configured for UDMA/133
ata3: EH complete
ata4.00: configured for UDMA/133
ata4: EH complete
ata7.00: configured for UDMA/100
ata7: EH complete
ata8.00: configured for UDMA/100
ata8: EH complete
ata9.00: configured for UDMA/100
ata9: EH complete
ata10.00: configured for UDMA/100
ata10: EH complete
ata13.00: configured for UDMA/100
ata13: EH complete
ata14.00: configured for UDMA/100
ata14: EH complete
EXT4-fs (sda3): re-mounted. Opts: commit=30,commit=0


It started happening about 4 months ago.
Because my server is built from desktop components, I thought the mainboard was failing (it had been running almost non-stop since 2009), so I bought a new mainboard and RAM and swapped them in yesterday. Unfortunately that did not solve my problem.

Right now I have:
Code:
00:00.0 Host bridge: Advanced Micro Devices [AMD] nee ATI RX780/RX790 Chipset Host Bridge
00:02.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD790 PCI to PCI bridge (external gfx0 port A)
00:04.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD790 PCI to PCI bridge (PCI express gpp port A)
00:05.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD790 PCI to PCI bridge (PCI express gpp port B)
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD790 PCI to PCI bridge (PCI express gpp port F)
00:11.0 SATA controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]
00:12.0 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:12.1 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0 USB OHCI1 Controller
00:12.2 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:13.0 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:13.1 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0 USB OHCI1 Controller
00:13.2 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:14.0 SMBus: Advanced Micro Devices [AMD] nee ATI SBx00 SMBus Controller (rev 3c)
00:14.1 IDE interface: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 IDE Controller
00:14.2 Audio device: Advanced Micro Devices [AMD] nee ATI SBx00 Azalia (Intel HDA)
00:14.3 ISA bridge: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 LPC host controller
00:14.4 PCI bridge: Advanced Micro Devices [AMD] nee ATI SBx00 PCI to PCI Bridge
00:14.5 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI RV370 5B60 [Radeon X300 (PCIE)]
01:00.1 Display controller: Advanced Micro Devices [AMD] nee ATI RV370 [Radeon X300SE]
02:00.0 RAID bus controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
03:00.0 RAID bus controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
05:07.0 RAID bus controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)


My main problem with the ATA error handling (EH) is that after these dmesg messages, HDD power saving stops completely until reboot! My drives never spin down, even after restarting the hdparm service.
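
One way to confirm that the drives really stay spun up is to collect the power state of each drive and compare it before and after the nxclient login. Below is a minimal sketch that parses saved `hdparm -C` output. The output layout it expects (a `/dev/sdX:` header followed by a `drive state is:` line) is an assumption about the installed hdparm version, and `parse_drive_states` and `SAMPLE` are hypothetical names used only for illustration.

```python
import re

# Assumed shape of `hdparm -C` output: a "/dev/sdX:" header line,
# followed by a line containing "drive state is:  <state>".
STATE_LINE = re.compile(r"drive state is:\s*(\S+)")

def parse_drive_states(hdparm_output):
    """Map each device header to the power state reported under it."""
    states = {}
    device = None
    for line in hdparm_output.splitlines():
        line = line.strip()
        if line.startswith("/dev/") and line.endswith(":"):
            device = line[:-1]  # drop the trailing colon
        else:
            m = STATE_LINE.search(line)
            if m and device is not None:
                states[device] = m.group(1)
    return states

# Illustrative sample only, not output captured from this server.
SAMPLE = """
/dev/sda:
 drive state is:  active/idle

/dev/sdb:
 drive state is:  standby
"""

if __name__ == "__main__":
    for dev, state in parse_drive_states(SAMPLE).items():
        print(dev, state)
```

If every drive still reports an active state long after the configured spindown timeout, power saving really is blocked rather than just delayed.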

What can I do now? PSU is relatively new, 400W Chieftec. If you need any other info, I'll post it.


Last edited by mbar on Tue Nov 06, 2012 1:13 pm; edited 1 time in total
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54232
Location: 56N 3W

Posted: Fri May 18, 2012 6:29 pm

mbar,

I suspect that with 9 drives on a 400 W PSU you are pushing your luck.

Your CPU core and all of your drive spin motors will run off the 12 V rail. The data sheet I linked to is far from clear.
It claims a combined 12 V power of 336 W, but 336 W is far more than can be provided by the 18 A maximum output current (12 V x 18 A = 216 W).

I don't know your drives, but 1 A at 12 V each is reasonable, so there is half of your 18 A gone already.
Your CPU core will run off the 12 V too. A modern CPU working hard can easily soak up your other 9 A on its own.
On top of that you have your case fans and graphics card.
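
The 12 V budget above can be written out as a quick calculation. This is only a sketch using the rough estimates from this post (18 A rail limit, about 1 A per drive, about 9 A for a busy CPU), not measured values. The names `rail_budget` and the constants are made up for illustration.

```python
# Rough 12 V rail budget using the estimates from this thread (not measurements).
RAIL_VOLTS = 12.0
RAIL_MAX_AMPS = 18.0   # maximum 12 V output current quoted for the PSU
DRIVES = 9
AMPS_PER_DRIVE = 1.0   # assumed typical 12 V draw per spinning drive
CPU_AMPS = 9.0         # assumed 12 V draw of a CPU working hard

def rail_budget(drives=DRIVES, amps_per_drive=AMPS_PER_DRIVE, cpu_amps=CPU_AMPS):
    """Return (max_watts, load_amps, headroom_amps) for the 12 V rail."""
    max_watts = RAIL_VOLTS * RAIL_MAX_AMPS
    load_amps = drives * amps_per_drive + cpu_amps
    return max_watts, load_amps, RAIL_MAX_AMPS - load_amps

if __name__ == "__main__":
    max_watts, load_amps, headroom = rail_budget()
    print(f"12 V rail limit: {max_watts:.0f} W")
    print(f"Estimated load: {load_amps:.0f} A, headroom: {headroom:.0f} A")
```

With these estimates the drives and CPU alone use the whole 18 A, which leaves no headroom for the fans and graphics card.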

I suspect the HDDs reset because of PSU issues. For testing can you disconnect some of the drives?

At 34 euros, that's not a particularly good PSU. You cannot get good quality parts at that price.
Do you have another higher power PSU you can switch to for testing?
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
mbar
Veteran


Joined: 19 Jan 2005
Posts: 1990
Location: Poland

Posted: Fri May 18, 2012 6:49 pm

I suspect that you may be right. The problem started about the time I added another 2 drives to my RAID. The drives are almost all Samsungs (6 x 1.5 TB and 2 x 1 TB) plus a single Seagate 320 GB for the system. The Phenom II X3 and 16 GB of RAM don't help either...

I will try removing some HDDs; I guess it is also time to try a 500 W PSU.

But do you have any idea why ATA EH happens during the start of the nxserver/xorg/xfce combo?
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54232
Location: 56N 3W

Posted: Fri May 18, 2012 9:31 pm

mbar,

The CPU load is high while the program loads and initialises. That pushes up the power consumption.

It's not just total output power you need to look at in a PSU. With so many HDDs, you want one that can supply a lot of power on the 12 V.
Choose one that has two separate 12 V supplies: one for your HDDs and one for your CPU.
Something like this
Notice it has 12v1 and 12v2 for a total of 27 A at 12 V.

It's a poor choice; that link is just to show the feature. At that price you will be lucky if it lasts a month past the end of the warranty period.
_________________
Regards,

NeddySeagoon

mbar
Veteran


Joined: 19 Jan 2005
Posts: 1990
Location: Poland

Posted: Sat May 19, 2012 5:50 am

Guess it's PSU hunting time...
mbar
Veteran


Joined: 19 Jan 2005
Posts: 1990
Location: Poland

Posted: Mon May 21, 2012 7:46 am

Corsair CX600 V2 http://www.corsair.com/builder-series-cx600-v2-80plus-certified-power-supply.html
or
XFX Core 550W http://xfxforce.com/en-gb/Products/Power-Supply/XFX/Pro-Series/ProSeries-550W-PSU/550W-Core-Edition-Full-Wired-Bronze.aspx

are within my budget. Which one?
I have started getting CRC errors on files copied to my file server. Is this consistent with PSU problems? Fortunately the filesystem itself (XFS) is healthy; I just ran xfs_check.
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54232
Location: 56N 3W

Posted: Mon May 21, 2012 5:26 pm

mbar

Corsair is known to be good. I don't know the other one.
_________________
Regards,

NeddySeagoon

mbar
Veteran


Joined: 19 Jan 2005
Posts: 1990
Location: Poland

Posted: Mon May 21, 2012 6:53 pm

Did some testing today. I'm no longer sure that the PSU is the cause.
I left only one HDD connected (the system drive), and the problem is still there:
Code:
ata7: SATA link down (SStatus 0 SControl 0)
ata9: SATA link down (SStatus 0 SControl 0)
ata8: SATA link down (SStatus 0 SControl 0)
ata10: SATA link down (SStatus 0 SControl 0)
ata12: SATA link down (SStatus 0 SControl 310)
ata13: SATA link down (SStatus 0 SControl 310)
ata14: SATA link down (SStatus 0 SControl 310)
md: Skipping autodetection of RAID arrays. (raid=autodetect will force)
EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: (null)
VFS: Mounted root (ext4 filesystem) readonly on device 8:3.
devtmpfs: mounted
Freeing unused kernel memory: 460k freed
BFS CPU scheduler v0.420 by Con Kolivas.
scsi 14:0:0:0: Direct-Access     USB2.0   Mobile Disk      1.00 PQ: 0 ANSI: 2
sd 14:0:0:0: Attached scsi generic sg2 type 0
sd 14:0:0:0: [sdb] 983808 512-byte logical blocks: (503 MB/480 MiB)
sd 14:0:0:0: [sdb] Write Protect is on
sd 14:0:0:0: [sdb] Mode Sense: 0b 00 80 00
sd 14:0:0:0: [sdb] No Caching mode page present
sd 14:0:0:0: [sdb] Assuming drive cache: write through
sd 14:0:0:0: [sdb] No Caching mode page present
sd 14:0:0:0: [sdb] Assuming drive cache: write through
 sdb: sdb1
sd 14:0:0:0: [sdb] No Caching mode page present
sd 14:0:0:0: [sdb] Assuming drive cache: write through
sd 14:0:0:0: [sdb] Attached SCSI removable disk
udevd[1177]: starting version 182
EXT4-fs (sdb1): mounting ext2 file system using the ext4 subsystem
EXT4-fs (sdb1): mounted filesystem without journal. Opts: (null)
EXT4-fs (sdb1): mounting ext2 file system using the ext4 subsystem
EXT4-fs (sdb1): mounted filesystem without journal. Opts: (null)
EXT4-fs (sda3): re-mounted. Opts: commit=30
Adding 8388604k swap on /dev/sda1.  Priority:-1 extents:1 across:8388604k
EXT4-fs (sda2): mounting ext2 file system using the ext4 subsystem
EXT4-fs (sda2): mounted filesystem without journal. Opts: (null)
EXT4-fs (sdb1): mounting ext2 file system using the ext4 subsystem
EXT4-fs (sdb1): mounted filesystem without journal. Opts: (null)
EXT4-fs (sdb1): mounting ext2 file system using the ext4 subsystem
EXT4-fs (sdb1): mounted filesystem without journal. Opts: (null)
r8169 0000:04:00.0: eth0: link down
r8169 0000:04:00.0: eth0: link down
r8169 0000:04:00.0: eth0: link up
ata1.00: configured for UDMA/133
ata1: EH complete
EXT4-fs (sda3): re-mounted. Opts: commit=30,commit=0


I noticed (by refreshing dmesg continuously) that the ATA EH happens at the end of the xfce/nxserver load, at (almost) the precise moment the xfce desktop is shown in the nxclient window. Maybe there is some kind of rogue service/process that tries to detect something? Or mount something?

What are my options for debugging this, assuming the problem is software in nature?
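
One low-tech starting point is to save the kernel log right after the login and list which ports went through error handling, so the list can be compared while individual services are disabled one at a time. Below is a minimal sketch, assuming the log has been saved as text. It matches lines of the form `ataN: EH complete` as seen in the output above; the optional timestamp prefix is an assumption for kernels with printk timestamps enabled. `ports_with_eh` and `SAMPLE` are hypothetical names.

```python
import re

# Matches libata lines such as "ata1: EH complete", with an optional
# "[  123.456789]" timestamp prefix (assumed, not present in the log above).
EH_COMPLETE = re.compile(r"^(?:\[\s*\d+\.\d+\]\s*)?(ata\d+): EH complete\s*$")

def ports_with_eh(dmesg_text):
    """Return the ATA ports that logged 'EH complete', in order of first appearance."""
    ports = []
    for line in dmesg_text.splitlines():
        m = EH_COMPLETE.match(line.strip())
        if m and m.group(1) not in ports:
            ports.append(m.group(1))
    return ports

# Tail of the dmesg output posted above.
SAMPLE = """\
r8169 0000:04:00.0: eth0: link up
ata1.00: configured for UDMA/133
ata1: EH complete
EXT4-fs (sda3): re-mounted. Opts: commit=30,commit=0
"""

if __name__ == "__main__":
    print(ports_with_eh(SAMPLE))
```

An empty list after a login would mean that whatever was disabled for that run is a likely trigger.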
mbar
Veteran


Joined: 19 Jan 2005
Posts: 1990
Location: Poland

Posted: Tue Nov 06, 2012 1:16 pm

Solved with help from user fraser!
This was caused by the sys-power/upower package. After modifying USE flags, unmerging upower and a poweroff / poweron cycle, everything is back to normal. No more ATA EH in dmesg and no more blocked HDD sleep.

Seems to be a problem with udev flag for xfce-base/xfce4-session?