Gentoo Forums
sync hangs
Pasketti
Tux's lil' helper

Joined: 04 Sep 2003
Posts: 101
Location: Austin, Texas

Posted: Sat Nov 10, 2012 6:33 pm    Post subject: sync hangs

I'm just looking for some sort of pointer here or maybe something else I can try.

My problem is that the sync command will intermittently hang up and never return.

When it does this, the sync process is unkillable.

I am also unable to shut down the system via "shutdown -r now" - that just says "System going down now!" and hangs up.

CTRL-ALT-DEL also doesn't work.

I can switch to another console or open another terminal session and everything seems fine, but if I issue another sync, that session will also lock up.

The only way to get rid of it is to manually shut down all server processes and power the box off.

When it comes back up, sync will work fine. Until it doesn't. Then it locks up.

I discovered this when a nightly job that does a sync never stopped. ps reported half a dozen processes that had been hanging for days. I commented out the sync line in the script, but that's just a bandaid. I'd like to find out what's going on and fix it.

I checked /sys/block/<device>/stat. Supposedly the 9th column shows pending requests, and that is 0, so it seems there is nothing for it to do, but still the sync never returns.
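For reference, a minimal sketch of that check. It assumes the layout described in the kernel's block-layer stat documentation, where the 9th whitespace-separated field of /sys/block/<device>/stat is the in-flight request count. The function name in_flight is made up for this example.

```shell
#!/bin/sh
# Print the 9th field (requests currently in flight) of one stat file.
# $1 is a path such as /sys/block/sda/stat (sda is only an example name).
in_flight() {
  awk '{ print $9 }' "$1"
}

# Report every block device that exposes a readable stat file.
for stat in /sys/block/*/stat; do
  [ -r "$stat" ] || continue
  dev=${stat%/stat}
  printf '%s: %s in flight\n' "${dev##*/}" "$(in_flight "$stat")"
done
```

A value of 0 only says that no requests are queued at the block layer at that instant; it does not show what a blocked sync is waiting on.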

Has anyone else ever had this happen, and what did you do to fix it?

I'm running vanilla-sources 3.4.9, amd64. It's a server, and does not have X installed. I can be more vague on my configuration if necessary.

Thanks!
NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 31907
Location: 56N 3W

Posted: Sat Nov 10, 2012 7:26 pm

Pasketti,

That it's random suggests a hardware error of some sort.
A properly working sync should retry after a time limit and, when that fails too, exit cleanly once the retry count is exhausted.
We can say that it's unlikely to be a network issue, as the above behavior is not observed.

Find a boot CD that has memtest or memtest86+ on it and run a few cycles. It's important that you boot directly into memtest, as it needs to run on the bare hardware to get useful results.
Errors reported by memtest are not always memory errors. If it reports problems, post the error reports.

Install lm-sensors and check the CPU temperature. It could be overheating for any number of reasons, but syncing isn't nearly as CPU intensive as building packages.

It can also be a CPU Vcore regulator issue. That will require a visual, covers-off inspection to check.
This part of the motherboard gets hot and is worked very hard. Cheap motherboards skimp here and early failures are common. How old is the system?
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Pasketti
Tux's lil' helper

Joined: 04 Sep 2003
Posts: 101
Location: Austin, Texas

Posted: Sun Nov 11, 2012 1:57 am

Bah. I was hoping for something like "Oh all you need to do is enable option XYZ and recompile your kernel."

I ran memtest for two passes, no errors found.

I installed lm_sensors. I'll see what it tells me.

It's older hardware (Core 2 Duo). The install is maybe three years old.

The only other thing I can think of is the power supply. Maybe I'll try that next.

Thank you for your assistance, good sir and/or madam!
NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 31907
Location: 56N 3W

Posted: Sun Nov 11, 2012 11:42 am

Pasketti,

memtest actually gives most of your motherboard components a good workout, as it uses the CPU to thrash the RAM.
As it passed, I'm inclined to think it's probably not a thermal issue either.

As you still have control when the hang occurs, look in dmesg for any disc IO errors.

I'm not aware of any kernel issues. If there were, it would be all over the forums as many users would hit the issue.
As it seems to be 'just yourself', it's likely to be hardware. It's still possible that it's software due to hardware. For example, rsync or something that it uses may have been compiled incorrectly because of a transient hardware issue.

It's also possible that, if you do not use an rsync rotation, the single rsync server you do use has a problem. It's been known.

Run emerge --info; the SYNC= line should be one of the following. If not, fix it and try again.
Code:
#   Default:       "rsync://rsync.gentoo.org/gentoo-portage"
#   North America: "rsync://rsync.namerica.gentoo.org/gentoo-portage"
#   South America: "rsync://rsync.samerica.gentoo.org/gentoo-portage"
#   Europe:        "rsync://rsync.europe.gentoo.org/gentoo-portage"
#   Asia:          "rsync://rsync.asia.gentoo.org/gentoo-portage"
#   Australia:     "rsync://rsync.au.gentoo.org/gentoo-portage"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
These are not individual rsync servers. They are 'rotations' in that they pass you on to a real, randomly chosen rsync server. If you get a dud, the next try normally gets you a different server. Make sure you are using a rotation (one of the above), not a single server.
eccerr0r
Advocate

Joined: 01 Jul 2004
Posts: 3899
Location: USA

Posted: Mon Nov 12, 2012 11:37 pm

'Sync' is the command to tell the kernel to dump all dirty buffers to disk. When sync hangs, it means it thinks it has something to write (perhaps metadata) and can't do it.
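One way to see how much data the kernel still considers unwritten is to look at the Dirty and Writeback counters in /proc/meminfo. The sketch below is only an illustration; the function name dirty_pages is made up, and the optional file argument exists so the function can be pointed at a saved copy.

```shell
#!/bin/sh
# Print the Dirty and Writeback lines from a meminfo-format file.
# Defaults to /proc/meminfo when no file is given.
dirty_pages() {
  grep -E '^(Dirty|Writeback):' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
  dirty_pages
fi
```

If both values stay near zero while sync is still blocked, the hang is probably not a matter of a large backlog waiting to be flushed.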

If the problem is with a disk, it could mean the disk is failing, but that usually shows up as other errors in dmesg.

Are you using NFS or other network filesystem? Are you (or your users) using FUSE (which I despise though it's a good concept)? These two cause a lot of hangs for me...
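A quick way to answer that question is to scan /proc/mounts for network and FUSE filesystem types. This is a rough sketch: the function name network_mounts and the list of types it matches are assumptions and are not exhaustive.

```shell
#!/bin/sh
# List mount points whose filesystem type is NFS, CIFS or FUSE.
# Input is a file in /proc/mounts format: device mountpoint fstype options ...
# The fusectl and nfsd control filesystems are deliberately not matched.
network_mounts() {
  awk '$3 ~ /^(nfs|nfs4|cifs|fuse|fuseblk|fuse\..*)$/ { print $2 " (" $3 ")" }' \
    "${1:-/proc/mounts}"
}

if [ -r /proc/mounts ]; then
  network_mounts
fi
```

No output means none of the listed types is currently mounted.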
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed to be advocating?
NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 31907
Location: 56N 3W

Posted: Tue Nov 13, 2012 7:19 pm

eccerr0r,

How did I misread that ...

Sorry Pasketti
Pasketti
Tux's lil' helper

Joined: 04 Sep 2003
Posts: 101
Location: Austin, Texas

Posted: Wed Nov 14, 2012 4:53 am

No worries. It did get me to consider hardware issues, which I had been avoiding thinking about.

smartctl tells me the drives pass all the diagnostics.

I bought a new (bigger) power supply, but when I put it in, the machine became unstable and would cycle power randomly within a few minutes. Putting the old one back in made it stop doing that, so I'm returning the PS.

I did blow a huge amount of dust out of the thing before I put it back.

But I did notice something. I back everything up to a second hard drive using rsync. When that's done, I delete several directories via rm -rf from the backup drive that don't need to be backed up. And the rm terminated with a "Killed" message. Once that happened, sync would start hanging.

So I tried to delete the source folders (they hold backups from my kids' Windows machines, and get recreated every Sunday). And the rm terminated with "Killed" and my shell process hung. I opened a new terminal and tried a sync. It hung. I couldn't reboot either.

I shut everything down manually, then cycled power, booted from my handy Live CD and ran fsck on all my filesystems.

I'm thinking there may have been some corruption in the file system causing rm to puke, but it left some flag set in the kernel that caused a race condition with sync. Or not.

As of right now, sync is not hanging. If it's still working in a couple of days, then I'll start to be more optimistic.

I do appreciate having someone to bounce things off of. It helps.

Thanks again!
eccerr0r
Advocate

Joined: 01 Jul 2004
Posts: 3899
Location: USA

Posted: Wed Nov 14, 2012 5:21 am

When you see the "Killed" it's because the kernel decided that program was doing something really bad (or was confused itself) and there should be some diagnostics in 'dmesg' - check that when you get it killed. When you run dmesg you should see exactly what it didn't like. You might want to open another terminal and run "dmesg" once in a while so that the binary stays cached in RAM; then the next time something happens, you won't have to worry about it being unreadable from disk.

What filesystem is this on, for curiosity's sake?
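As a rough illustration of what to look for, the filter below picks out kernel log lines that often accompany a process being killed or a filesystem error. The function name kill_clues and the pattern list are assumptions for this sketch; real messages vary between kernel versions.

```shell
#!/bin/sh
# Read kernel log text on stdin and keep lines that commonly explain
# why a process was killed or why disk writes are failing.
kill_clues() {
  grep -E 'Out of memory|Killed process|BUG:|kernel BUG|Oops|Call Trace|EXT4-fs error|I/O error'
}

# Typical use; prints nothing if no line matches or dmesg is unavailable.
dmesg 2>/dev/null | kill_clues || true
```

Anything printed here is a starting point for reading the full dmesg output around the same timestamps, not a diagnosis by itself.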
Pasketti
Tux's lil' helper

Joined: 04 Sep 2003
Posts: 101
Location: Austin, Texas

Posted: Wed Nov 14, 2012 1:01 pm

It's ext4.

My backup ran last night, and there were no problems with rm. And sync isn't hanging either.

I am cautiously optimistic.
Hu
Watchman

Joined: 06 Mar 2007
Posts: 8938

Posted: Thu Nov 15, 2012 2:38 am

Pasketti wrote:
But I did notice something. I back everything up to a second hard drive using rsync. When that's done, I delete several directories via rm -rf from the backup drive that don't need to be backed up.
You may be able to avoid the deletion step by using rsync exclude directives to skip copying the files in the first place.
Pasketti
Tux's lil' helper

Joined: 04 Sep 2003
Posts: 101
Location: Austin, Texas

Posted: Thu Nov 15, 2012 2:23 pm

Hu wrote:
You may be able to avoid the deletion step by using rsync exclude directives to skip copying the files in the first place.


Yeah, I know. It was a tradeoff.

Here's the backup script:

Code:

#!/bin/bash

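# Date stamp used as the name of tonight's snapshot directory
DATE=$(date "+%Y-%m-%d")

date

# No leading slash on dir name

for DIR in root etc var/bind var/www var/lib/portage opt/msm home
do
  echo "============ Backup $DIR"
  mkdir -p "/backup/$DATE/$DIR"
  # Unchanged files are hard-linked against the previous snapshot
  rsync -avx --delete --link-dest="/backup/current/$DIR" "/$DIR/" "/backup/$DATE/$DIR"
done

# Rotate the symlinks: current becomes prev, tonight's snapshot becomes current
rm -f /backup/prev
mv /backup/current /backup/prev
ln -s "/backup/$DATE" /backup/current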


So I'd have to unwind the FOR loop if I wanted to exclude some directories. But the for loop makes it easy to add more paths. So I back it all up and then delete a few paths afterward. It happens at 3 AM, so it's not like I'm waiting for it to finish.
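Hu's suggestion does not necessarily require unwinding the loop. The sketch below assumes the unwanted directories are known by their path relative to / (the two paths shown are made up for illustration). A small helper, here called excludes_for, turns each such path into an rsync --exclude pattern anchored at the transfer root of the matching loop entry.

```shell
#!/bin/sh
# Hypothetical list of directories to skip, one per line, no leading slash.
EXCLUDES="home/kids/winbackup
var/www/cache"

# Print one --exclude option per entry that lives under the given DIR.
# The leading slash anchors the pattern at the transfer root (/$DIR/);
# the trailing slash makes it match directories only.
excludes_for() {
  printf '%s\n' "$EXCLUDES" | while IFS= read -r path; do
    case $path in
      "$1"/*) printf '%s\n' "--exclude=/${path#"$1"/}/" ;;
    esac
  done
}

excludes_for home
```

Inside the loop, in bash 4 or later, the options could be collected with `mapfile -t EXCL < <(excludes_for "$DIR")` and passed to rsync as `"${EXCL[@]}"` ahead of the source path. Adding a path to the for list stays as easy as before, and the later rm -rf step is no longer needed.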
Pasketti
Tux's lil' helper

Joined: 04 Sep 2003
Posts: 101
Location: Austin, Texas

Posted: Thu Nov 15, 2012 2:34 pm

I think it's fixed. It's been two days now, and everything's fine.

Before I ran fsck, it would hang up every morning.

So the lesson here is "if sync hangs, and you can't find a hardware problem, run fsck -pf"
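As a cautious way to prepare that step, the sketch below only prints the fsck commands for the ext4 entries in an fstab-format file; it runs nothing. The function name fsck_plan is made up, and the assumption is that the printed commands are later run by hand from a rescue environment such as a Live CD, with the filesystems unmounted.

```shell
#!/bin/sh
# Print one "fsck -pf" command per ext4 entry in an fstab-format file.
# -p repairs safe problems without prompting, -f forces a check even if
# the filesystem is marked clean. Nothing is executed here.
fsck_plan() {
  awk '$1 !~ /^#/ && $3 == "ext4" { print "fsck -pf " $1 }' "${1:-/etc/fstab}"
}

if [ -r /etc/fstab ]; then
  fsck_plan
fi
```

Running fsck against a mounted read-write filesystem can cause damage, which is why this sketch stops at printing the commands.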

Thank you guys again for letting me bounce things off you!
All times are GMT