Nothing to do with Gentoo (My PSU is failing)

LIsLinuxIsSogood · Veteran Joined: 13 Feb 2016 Posts: 1179

Hi,
First off, I apologize for a question that really has nothing to do with Gentoo, but it is equipment-related and I don't have another convenient place to go with this, and I've seen similar issues come up on this forum and get handled smoothly.

The issue I'm having is a power issue to the main board, which as everyone knows involves at least the two central components of Mobo and PSU.

Now a few years back I bought both of these second hand from someone that did provide a warning about some potential issues...too bad I don't recall which of these the warning actually was about!!! That might save time if I knew that now. I'm guessing it is PSU since those tend to fail, and would like to confirm that it did not also damage the motherboard in any way.

So far the suspicious part has been the connection. Every so often over the course of several years the PC has refused to power on, or has cycled on and off (rebooting repeatedly), and most of the time this is resolved by physically manipulating the cable at the connector. Once the PC is on it almost always keeps working, except for at least several times when I bumped into it hard enough to perhaps cause a similar connection problem. (This is not a production host/server or anything like that.)

Since the machine works most of the time, I went ahead and tested the PSU, following a tutorial I found on YouTube. It involved turning on the PSU by shorting the green and black wires on the 24-pin connector (male end). I got it running and took voltage readings with a digital multimeter. The results looked questionable, and I suspect a failed PSU because the blue wire was reading 11.31V. That is a substantial amount off 12V, and more than 5% off; I don't know how much deviation is considered acceptable.

So I would like help confirming whether my tests are accurate and what the results mean in terms of damage. If there is good reason to replace the PSU, I'm fine with doing that, but should I also consider buying a new board? In other words, could the motherboard have been damaged at some point, and how would I know?!

saboya · Guru Joined: 28 Nov 2006 Posts: 552 Location: Brazil

Not sure what the blue wire is for, but the 12v cables that actually power your devices are the yellow ones.

Jaglover · Posted: Fri Jan 18, 2019 2:28 pm

Generally, a PSU should be tested under load. In the simplest case, hook it up to the motherboard, start the computer and measure the voltages. The blue wire is unlikely to be used by your computer; modern motherboards have no use for -12 V. In any case, -12 V has an allowed tolerance of 10%, so yours is within limits.
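
As a rough check of that arithmetic, here is a minimal Python sketch. The function name is made up for illustration, and it assumes the 11.31 reading from the first post is the magnitude seen on the blue (-12 V) wire; the 10% limit is the one quoted above.

```python
def deviation_percent(measured: float, nominal: float) -> float:
    """Return how far |measured| is from |nominal|, as a percentage of |nominal|."""
    return abs(abs(measured) - abs(nominal)) / abs(nominal) * 100.0


# Reading from the first post (assumed to be the magnitude on the -12 V rail).
reading_v = 11.31
dev = deviation_percent(reading_v, 12.0)

print(f"deviation from nominal: {dev:.2f}%")
print("within the 10% allowed for -12 V:", dev <= 10.0)
print("would pass a 5% limit:", dev <= 5.0)
```

The deviation comes out at about 5.75%, which is why the reading looks bad against a 5% limit but is acceptable against the 10% allowed on this rail.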
_________________
My Gentoo installation notes.
Please learn how to denote units correctly!

NeddySeagoon · Posted: Fri Jan 18, 2019 6:58 pm

LIsLinuxIsSogood,

PC power supplies are 'switched mode' power supplies. It's good for size and efficiency.
Switched mode power supplies should not normally be operated at no load. A few have a tendency to self destruct when you do that.
All switched mode power supplies depend on some minimum load to regulate.

A PC power supply provides +12v, +5v, +3.3v, 0v, -5v and -12v, with the negative voltages being optional on newer revisions of the ATX specification.
Only one output is actually regulated, either the +5v or +3.3v.

The +12v operates the HDD spin motors, which have servo controls anyway, and also powers the CPU core voltage regulator on the motherboard.
The +5v operates the HDD electronics and odds and ends on the motherboard.
The +3.3v operates the rest of the motherboard.

If you want to measure the PSU output voltages, do it carefully, while the PC operates.

There are several failure modes to look for but PSU (metal box) failures are usually rare, total and spectacular. You won't overlook it.
So your PSU is probably good.

Switch the PC off and remove the cover. You will want a good inspection lamp.
Unplug the auxiliary 12v connector at the motherboard. It has only Black (0v) and Yellow (+12v) wires.
Have a good look at the plastic parts and the pins. There should be no charring of the plastic on either half and the contacts should be bright Yellow. That's a very thin layer of gold.
Charring of the plastic indicates the connector has been getting hot. If that's present, reconnect the connector. Take care it's the right way round, and 'waggle' the mated connector in an attempt to reduce the contact resistance. There is about 10A or so carried by that connector, so the contact resistance must be low to avoid heating.
This sort of problem shows up mostly under high CPU loads as the actual current is related to the CPU load.

Look at the region around the CPU. You may see 10 to 20 cylindrical objects. They should all be the same, fitted flat to the motherboard, with no jelly leaking out.
Do not touch the jelly if it's there. It may be one of several unpleasant materials. Bulging tops and/or leaking contents show that the CPU core voltage regulator has failed.
These things can be replaced, a) if you can get them and b) if you are moderately skilled in the use of a soldering iron.
They all need to be changed if even one has failed.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

Goverp · Advocate Joined: 07 Mar 2007 Posts: 2007

I disagree that PSU failures are rare. Old PSUs seem to lose the ability to provide all the stated current. I've had two fail gracefully.

I've just replaced the one that came with my 10-year-old desktop. Admittedly, it was rated at about 215W and with all the disks it was drawing 205W (or something similar, I forget). The desktop's symptoms were (a) the disks took about 30 seconds to spin up - if I hit enter on the Grub boot menu too soon, Grub failed to find stage 2, and (b) once booted, plugging a drive into my USB-3 hub caused a click as one drive reset itself, complete with messages in syslog. A new 750W power supply (they were out of the cheapest 500W ones) cured it (and meant I could install a gee whiz graphics card with more compute power than the desktop it's attached to!).
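
To put those numbers in perspective, a small Python sketch of the load fraction follows. It uses only the approximate wattages quoted above (which were given from memory), and the helper name is hypothetical.

```python
def load_fraction(draw_w: float, rated_w: float) -> float:
    """Fraction of the PSU's rated output that the system is drawing."""
    return draw_w / rated_w


draw_w = 205.0  # approximate draw quoted above

for rated_w in (215.0, 750.0):  # old PSU rating, new PSU rating
    pct = load_fraction(draw_w, rated_w) * 100.0
    print(f"{rated_w:.0f} W supply: running at {pct:.0f}% of rating")
```

The old supply was running at roughly 95% of its rating, which leaves almost no margin for disk spin-up surges; the replacement sits at under 30%.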
_________________
Greybeard

NeddySeagoon · Posted: Sat Jan 19, 2019 3:46 pm

Goverp,

In the interests of keeping things simple, I glossed over rated power output and useful power output.
There are limits on the total output power and separately, on some combinations of output voltages. You need to stop before you hit the first limit.
I've glossed over HDD stalled motor currents too. The kernel SCSI stack has had a feature for a long time to avoid all the HDDs spinning up at the same time and embarrassing the PSU with the spin up current demand.

What you describe sounds like expected behaviour from an intermittently overloaded PSU.

I agree that PSU output quality gets worse as a PSU ages. Ripple in particular gets measurably worse. I've not had any like that interfere with normal operation, but I tend to derate my PSUs when new, as I know I will connect more stuff throughout the life of the system.
_________________
Regards,

NeddySeagoon


LIsLinuxIsSogood · Veteran Joined: 13 Feb 2016 Posts: 1179

So unfortunately I am having a hard time determining the course of action...I will start by replacing the PSU. But then does it make sense to wait to test the motherboard until I've put in the new PSU? With all the detailed information provided in the posts, I don't think anybody said that I have to purchase a new PSU, which is what I'm asking now of course.

NeddySeagoon · Posted: Sun Jan 20, 2019 12:57 pm

LIsLinuxIsSogood,

If you have, or can borrow, a PSU to test with, do it. Don't spend money yet.

Look at the connectors and Vcore regulator. That's only a visual exam.
Post images if you want a second opinion.
_________________
Regards,

NeddySeagoon


Ashie · n00b Joined: 09 Apr 2016 Posts: 54

Do inspect all capacitors on the mainboard and inside the PSU for signs of bulging or leaking. If any are found, there is your problem

Caution with the PSU: it contains one or two large capacitors on its primary side, which are charged to up to 320VDC (in PSUs without PFC) or 450VDC (in PSUs with PFC) whenever the PSU is plugged into the wall, and they can stay charged for long periods after it is unplugged. They can give quite a big electric shock if you touch any internal conductors on the PSU's primary side (component leads and the underside of the PCB). There is no danger from just removing the cover to take a look, but don't poke a finger in there

Many modern (approx Core ix era, some of the last Core 2 boards too) mainboards have solid Aluminum electrolytic capacitors. Those don't tend to fail as often and if you have those, most likely they are intact. They can be identified by lack of rupture lines (shaped like "X" "Y" "K" etc) pressed into their tops. However it seems to me that modern mainboards are plagued with other reliability problems (not capacitor related), which were not as common with older boards

The output voltages of a PC power supply can go wrong not only in terms of presence/absence or measured average value, but also in terms of ripple. An intact power supply will keep ripple to a minimum. A failing power supply can put out excessive ripple, which can cause anything from random failures to no POST to hardware damage, while still being invisible to a multimeter measurement of DC voltage

Some power supplies will put out high ripple just because of poor quality of the power supply itself, even when brand new. They get worse as they age. Those power supplies tend to be included for free with PC cases or be sold at PC shops as the cheapest option for a PS. If you post a picture of your power supply with its cover off, it will be possible to tell if it's one of them

A failing ATX12V connector won't necessarily have an effect on performance... Thing is, it feeds the CPU VRM on the mainboard - a buck converter that steps the 12V at approx 10A down to some 1.2V at approx 100A. A buck converter can keep working happily even when its input voltage is a little low, unless some extra effort is made to detect this condition

A couple Watts of resistive loss is enough to melt the connector. That would be approx. 0.2V at 10A. 11.8V is not only sufficient for the VRM to work, it's even still within ATX spec. That is, the VRM should not detect this condition as a fault at all, even if all protections are in place

Twenty Watts of resistive loss is probably enough to set the connector on fire. That would be approx 1.6V at 12A. (higher current draw since the buck tries to make up for the lower input voltage). 10.4V is out of spec and ideally there gotta be an undervoltage protection to shut down the VRM, but if there isn't one, the VRM itself is capable of continuing to work just fine even in this condition....
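
The figures in the last three paragraphs can be reproduced with a short Python sketch. The function names are made up for illustration, and the buck converter is assumed to be ideal (lossless, constant output power), which is a simplification and not something measured on a real board.

```python
def connector_loss_w(drop_v: float, current_a: float) -> float:
    """Power dissipated in the connector: P = V_drop * I."""
    return drop_v * current_a


def contact_resistance_ohm(drop_v: float, current_a: float) -> float:
    """Contact resistance implied by a given drop and current: R = V_drop / I."""
    return drop_v / current_a


def buck_input_current_a(output_w: float, input_v: float) -> float:
    """Input current of an ideal (lossless) buck converter at constant output power."""
    return output_w / input_v


cpu_power_w = 1.2 * 100.0  # approx 1.2 V at approx 100 A, as quoted above

# (voltage drop across the connector, current through it) for the two cases above
for drop_v, current_a in ((0.2, 10.0), (1.6, 12.0)):
    vrm_input_v = 12.0 - drop_v
    print(
        f"drop {drop_v} V at {current_a} A: "
        f"loss {connector_loss_w(drop_v, current_a):.1f} W, "
        f"contact resistance {contact_resistance_ohm(drop_v, current_a) * 1000:.0f} mOhm, "
        f"VRM sees {vrm_input_v:.1f} V, "
        f"ideal input current {buck_input_current_a(cpu_power_w, vrm_input_v):.1f} A"
    )
```

Under these assumptions the second case draws about 11.5 A for the same CPU power, which is where the rounded 12A figure comes from: the lower the input voltage, the more current the converter pulls through the already bad contact.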

Just look if there are any signs of heating on the connector, not hard to spot if it's been going on for a while

As you mentioned having to play with the power connector to get the PC going, i think you might be having an entirely different problem

The big chips of the mainboard chipset (MCH, ICH) are soldered to the board with an array of tin balls (BGA soldering). Those tend to fail as a result of a combination of mechanical stress on the board (from flexing when inserting RAM sticks and connectors, pressure applied by the CPU cooler, etc), heating/cooling cycles (especially if overheating) which again translate into mechanical stress, and sometimes iffy soldering quality. (The lead free solder, while not the root cause of the problem, is much more susceptible to failure from all those causes compared to leaded solder, so can be considered a contributing factor)

When a solder point in a BGA has failed, it intermittently loses contact. It might lose contact or start contacting again whenever it heats above a certain temperature, whenever the board is flexed, and such. It is possible that you have a failing solder point on the mainboard, which you happen to get to touch again every time you flex the board by playing with the power connectors. It will progressively fail until this won't help anymore

Such a fault is repairable but it requires some moderate or serious messing with the board (repair by reflow or complete replacement of the solder balls, respectively)

LIsLinuxIsSogood · Veteran Joined: 13 Feb 2016 Posts: 1179

https://imgur.com/a/kpO9s2l
These are the specs that are written on the back of the PSU. Taking it apart seems sort of like a last resort (I feel).

The connector appears in the images as well, but I can't see any problems with that.

As for the motherboard and the explanation provided, what ways are there to test the PSU, if any? Or should I just go ahead and replace the PSU and see whether the problem persists, thereby determining if something "bigger" could be wrong with the board?

UPDATE:
I have more to report since I decided to boot the machine again with the same components. The BIOS settings were reset (no biggie there), but some noise seems to start at the very moment power is provided to the board/peripherals. It sounds like it could be specific to one of the drives, so I could disconnect them one at a time to find out which one is causing it. I would really prefer that the drives not fail before I've been able to copy the data off them, if they still work. What is the preferred strategy for testing a drive? Could testing itself cause it to fail, given that I assume any read or write activity may cause damage? Also, a faint beep is heard after the loud scratchy/crashy noise of the disk. What does that mean?

REUPDATE:
Never mind about the loud noise. I totally forgot that I had placed another drive in there that never belonged there; when I went to check what was causing the noise, it was the first one I disconnected.

Ashie · n00b Joined: 09 Apr 2016 Posts: 54

This is a good quality PSU. I think as long as it powers on at all, provides all the right voltages (in their average values i.e. what you can measure on a multimeter), and there aren't puffed capacitors, it is unlikely to have other problems

To test whether it powers on (if it doesn't with the mainboard), connect the AC input, and short the PS-ON Green wire in the 24 pin connector (4th from the side, Orange-Blue-Black-Green) to one of the Earth Black wires. The fan gotta spin, and you gotta be able to measure the right voltages on the outputs. (This is not an indication that the PS is fully intact, but it does show that all the switching and rectification components are working)

+12V (Yellow)
+5V (Red)
+3.3V (Orange)
-12V (Blue... This voltage is not essential for PC power up)
+5V standby (Violet, gotta be 5V whether PS is on or off)
PowerGood (Grey, gotta be at 5V when PS is on. This is an internal "PS is OK" signal)
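
The list above can be turned into a simple checklist, sketched here in Python. The colour-to-rail mapping is taken from the list; the tolerance bands (5% on the positive rails and standby, 10% on -12V) are what I understand the ATX spec to allow, so treat them as an assumption. The sample readings are hypothetical except the blue wire, which uses the 11.31 figure from the first post with a negative sign assumed. PowerGood is left out because it is a logic signal and not a supply rail.

```python
# wire colour -> (rail name, nominal volts, allowed deviation in percent)
# Tolerances are assumed from the ATX spec, not taken from this PSU's label.
RAILS = {
    "yellow": ("+12V", 12.0, 5.0),
    "red": ("+5V", 5.0, 5.0),
    "orange": ("+3.3V", 3.3, 5.0),
    "blue": ("-12V", -12.0, 10.0),
    "violet": ("+5V standby", 5.0, 5.0),
}


def check_readings(readings: dict[str, float]) -> dict[str, bool]:
    """Map each measured wire colour to True if the reading is within tolerance."""
    results = {}
    for colour, measured in readings.items():
        name, nominal, tol_pct = RAILS[colour]
        # sorted() keeps low <= high for the negative rail too
        low, high = sorted(
            (nominal * (1 - tol_pct / 100), nominal * (1 + tol_pct / 100))
        )
        results[name] = low <= measured <= high
    return results


# Hypothetical multimeter readings, apart from the blue wire.
sample = {"yellow": 12.10, "red": 5.02, "orange": 3.31, "blue": -11.31, "violet": 5.05}

for rail, ok in check_readings(sample).items():
    print(f"{rail}: {'OK' if ok else 'out of tolerance'}")
```

Readings taken with the PSU unloaded are only a rough indication; a rail can pass this check and still sag or ripple under real load.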

To test whether the capacitors are intact, open and inspect (can't do much else without having loads and an oscilloscope to test it electrically). It is not perfectly accurate, but a good indication (capacitors can fail without visible damage, but it is rare for the type of capacitors used here)

No signs of heating means your connector is ok

This is as much as you can test without replacing a PS and without having more specialized instruments to test this one. If you take another PS for the test, make sure that other one is not failing either..

Ashie · n00b Joined: 09 Apr 2016 Posts: 54

If you have failing hard drives, leave them disconnected until you are ready to actually back them up. All the extra spin up/down cycles aren't doing them good...

An intact hard drive will spin up in a single go, then make some more sounds of the head actuator moving to different places for a few seconds, and then stay quiet (just spinning) until accessed by the OS. Some drives have an actuator locking mechanism that makes an audible click when it releases; this happens at the same time as spin up. Some hard drives have a loud-ish spin up, but it's hard to tell if it is a bad sound without hearing it myself

Sounds of a bad hard drive are - no sound at all (no spin up), trying to spin up more than once (spin up, slow down or stop, spin up again), repeated clicks, or continued head actuator repositioning sounds for long after the first few seconds while still not being accessed by OS (for example, if you stay in the BIOS setup screen)

The best strategy for testing a hard drive is to back it up right away

You can use dd to dump the entire drive to a file on another (bigger) drive :

LIsLinuxIsSogood · Veteran Joined: 13 Feb 2016 Posts: 1179

Thanks for the info about smartctl. It was already installed, so I just ran it on three drives and all seem healthy.

I guess that, apart from the potential for continued PSU/motherboard failures (which are intermittent), I will go about my business as usual in the hope that no damage occurs to the machine. It is really only the data on disk that I care about, which is why I will now prioritize backups and store them elsewhere, such as on an external HD. Thanks to everyone for the suggestions. I will be sure to look for a good PSU to replace this one, although as Ashie says I'm not sure that is actually going to fix the problem, since it could be other electrical issues in the board that are causing the malfunction.

Ashie · n00b Joined: 09 Apr 2016 Posts: 54

If the PSU is cleared by further testing, i'd not consider it as being bad or needing replacement, at least for a non mission critical machine

PS. Have you tried to wiggle the RAM and video card, and reseat the CPU in its socket (also look for bent pins in the socket)? (beware that if it really is failing soldering points, the stress from taking off and reinstalling the CPU heatsink can push the board over the edge)

NeddySeagoon · Posted: Tue Jan 22, 2019 12:00 pm

LIsLinuxIsSogood,

That's a good choice of PSU. The specification says it has a 1 x 4+4pin CPU +12V power connector, not shown in your images.
That's the important one as it supplies the power to the CPU, RAM and so on, via the regulator on the motherboard.

As your PSU generates two separate +12v supplies, one will be used for the CPU and one for everything else that needs 12v.

You can probably see enough of the insides to spot failing capacitors without taking the cover off, so don't do that.
_________________
Regards,

NeddySeagoon
