sync hangs

Pasketti
Tux's lil' helper
Posts: 109
Joined: Thu Sep 04, 2003 12:47 am
Location: Austin, Texas

Post by Pasketti » Sat Nov 10, 2012 6:33 pm

I'm just looking for some sort of pointer here or maybe something else I can try.

My problem is that the sync command will intermittently hang up and never return.

When it does this, the sync process is unkillable.

I am also unable to shut down the system via "shutdown -r now" - that just says "System going down now!" and hangs up.

CTRL-ALT-DEL also doesn't work.

I can switch to another console or open another terminal session and everything seems fine, but if I issue another sync, that session will also lock up.

The only way to get rid of it is to manually shut down all server processes and power the box off.

When it comes back up, sync will work fine. Until it doesn't. Then it locks up.

I discovered this when a nightly job that does a sync never stopped. ps reported half a dozen processes that had been hanging for days. I commented out the sync line in the script, but that's just a bandaid. I'd like to find out what's going on and fix it.

I checked /sys/block/<device>/stat. Supposedly the 9th column shows pending requests, and that is 0, so it seems there is nothing for it to do, but still the sync never returns.
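For reference, the check described above can be scripted. This is only a minimal sketch, assuming the field layout the kernel documents for /sys/block/<device>/stat, where the 9th field is the number of I/Os currently in flight. The function name inflight_from_stat is made up for illustration.

```shell
#!/bin/bash
# Print the 9th whitespace-separated field (in_flight) of a
# /sys/block/<dev>/stat line read from stdin.
inflight_from_stat() {
  awk '{ print $9 }'
}

# Report in-flight requests for every block device that exposes a stat file.
for stat_file in /sys/block/*/stat; do
  [ -r "$stat_file" ] || continue
  dev=${stat_file#/sys/block/}
  dev=${dev%/stat}
  printf '%s: %s in flight\n' "$dev" "$(inflight_from_stat < "$stat_file")"
done
```

A value of 0 for every device, as reported here, suggests the block layer has nothing queued, so whatever sync is waiting on is probably stuck higher up than the disk queue.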

Has anyone else ever had this happen, and what did you do to fix it?

I'm running vanilla-sources 3.4.9, amd64. It's a server, and does not have X installed. I can be more vague on my configuration if necessary.

Thanks!

NeddySeagoon
Administrator
Posts: 56264
Joined: Sat Jul 05, 2003 9:37 am
Location: 56N 3W

Post by NeddySeagoon » Sat Nov 10, 2012 7:26 pm

Pasketti,

That it's random suggests a hardware error of some sort.
A properly working sync should retry after a time limit and, when that fails too, exit cleanly once the retry count is exhausted.
We can say that it's unlikely to be a network issue, as the above behavior is not observed.

Find a boot CD that has memtest or memtest86+ on it and run a few cycles. It's important that you boot directly into memtest, as it needs to run on the bare hardware to get useful results.
Errors reported by memtest are not always memory errors. If it reports problems, post the error reports.

Install lm-sensors and check the CPU temperature. It could be overheating for any number of reasons, but syncing isn't nearly as CPU intensive as building packages.

It can also be a CPU Vcore regulator issue. That will require a visual, covers-off inspection to check.
This part of the motherboard gets hot and is worked very hard. Cheap motherboards skimp here and early failures are common. How old is the system?
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

Post by Pasketti » Sun Nov 11, 2012 1:57 am

Bah. I was hoping for something like "Oh all you need to do is enable option XYZ and recompile your kernel."

I ran memtest for two passes, no errors found.

I installed lm_sensors. I'll see what it tells me.

It's older hardware (Core 2 Duo). The install is maybe three years old.

The only other thing I can think of is the power supply. Maybe I'll try that next.

Thank you for your assistance, good sir and/or madam!

Post by NeddySeagoon » Sun Nov 11, 2012 11:42 am

Pasketti,

memtest actually gives most of your motherboard components a good workout, as it uses the CPU to thrash the RAM.
As it passed, I'm inclined to think it's probably not a thermal issue either.

As you still have control when the hang occurs, look in dmesg for any disc I/O errors.

I'm not aware of any kernel issues. If there were, it would be all over the forums, as many users would hit the issue.
As it seems to be 'just yourself', it's likely to be hardware. It's still possible that it's software due to hardware: for example, rsync or something that it uses may have been compiled incorrectly because of a transient hardware issue.

It's also possible that, if you do not use an rsync rotation, the single rsync server you do use has a problem. It's been known.

Run emerge --info; the SYNC= line should be one of the following. If not, fix it and try again.

Code:

#   Default:       "rsync://rsync.gentoo.org/gentoo-portage"
#   North America: "rsync://rsync.namerica.gentoo.org/gentoo-portage"
#   South America: "rsync://rsync.samerica.gentoo.org/gentoo-portage"
#   Europe:        "rsync://rsync.europe.gentoo.org/gentoo-portage"
#   Asia:          "rsync://rsync.asia.gentoo.org/gentoo-portage"
#   Australia:     "rsync://rsync.au.gentoo.org/gentoo-portage"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"

These are not individual rsync servers. They are 'rotations', in that they pass you on to a real rsync server chosen at random. If you get a dud, the next try normally gets you a different server. Make sure you are using a rotation (one of the above), not a single server.
Regards,

NeddySeagoon

eccerr0r
Watchman
Posts: 10245
Joined: Thu Jul 01, 2004 6:51 pm
Location: almost Mile High in the USA

Post by eccerr0r » Mon Nov 12, 2012 11:37 pm

'Sync' is the command to tell the kernel to dump all dirty buffers to disk. When sync hangs, it means it thinks it has something to write (perhaps metadata) and can't do it.

If the problem is with a disk, it could mean a failing disk, but usually that would show up as other errors in dmesg.

Are you using NFS or other network filesystem? Are you (or your users) using FUSE (which I despise though it's a good concept)? These two cause a lot of hangs for me...
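
To see whether the kernel still believes it has data to write, and which processes are stuck, something like the following may help. It is a rough sketch: it prints the Dirty and Writeback counters from /proc/meminfo and lists processes in uninterruptible sleep (state D), which is the state a hung, unkillable sync would normally be in. The function names are illustrative only.

```shell
#!/bin/bash
# Print the Dirty and Writeback lines from meminfo-formatted text on stdin.
dirty_and_writeback() {
  grep -E '^(Dirty|Writeback):'
}

# From "PID STAT WCHAN COMMAND" lines on stdin, keep processes whose
# state starts with D (uninterruptible sleep, usually waiting on I/O).
filter_d_state() {
  awk '$2 ~ /^D/'
}

if [ -r /proc/meminfo ]; then
  dirty_and_writeback < /proc/meminfo || true
fi

# wchan shows the kernel function the process is sleeping in, when available.
ps -eo pid,stat,wchan:32,args 2>/dev/null | filter_d_state
```

If the stuck processes all report a wait channel in a network filesystem or FUSE function, that would point toward the filesystems asked about above rather than a local disk.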
Intel Core i7 2700K/Radeon Firepro W2100/24GB DDR3/800GB SSD
What am I supposed watching?

Post by NeddySeagoon » Tue Nov 13, 2012 7:19 pm

eccerr0r,

How did I misread that ...

Sorry Pasketti
Regards,

NeddySeagoon

Post by Pasketti » Wed Nov 14, 2012 4:53 am

No worries. It did get me to consider hardware issues, which I had been avoiding thinking about.

smartctl tells me the drives pass all the diagnostics.

I bought a new (bigger) power supply, but when I put it in, the machine became unstable and would cycle power randomly within a few minutes. Putting the old one back in made it stop doing that, so I'm returning the PS.

I did blow a huge amount of dust out of the thing before I put it back.

But I did notice something. I back everything up to a second hard drive using rsync. When that's done, I delete several directories via rm -rf from the backup drive that don't need to be backed up. And the rm terminated with a "Killed" message. Once that happened, sync would start hanging.

So I tried to delete the source folders (they hold backups from my kids' Windows machines, and get recreated every Sunday). And the rm terminated with "Killed" and my shell process hung. I opened a new terminal and tried a sync. It hung. I couldn't reboot either.

I shut everything down manually, then cycled power, booted from my handy Live CD and ran fsck on all my filesystems.

I'm thinking there may have been some corruption in the file system causing rm to puke, but it left some flag set in the kernel that caused a race condition with sync. Or not.

As of right now, sync is not hanging. If it's still working in a couple of days, then I'll start to be more optimistic.

I do appreciate having someone to bounce things off of. It helps.

Thanks again!

Post by eccerr0r » Wed Nov 14, 2012 5:21 am

When you see "Killed", it's because the kernel decided that the program was doing something really bad (or the kernel itself got confused), and there should be some diagnostics in dmesg. Check it as soon as a process gets killed; the output should show exactly what the kernel didn't like. You might want to open another terminal and run dmesg once in a while so that the binary stays cached in RAM. Then, the next time something happens, you won't have to worry about it not being readable from disk.
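
A sketch of the kind of check suggested here: filter the kernel log for lines that often accompany a killed process or filesystem trouble. The keyword list is an assumption, not a complete set of kernel messages, and kernel_trouble_lines is an illustrative name.

```shell
#!/bin/bash
# Keep kernel log lines (read from stdin) that mention common signs of trouble.
# The keyword list is a guess at useful patterns; extend it as needed.
kernel_trouble_lines() {
  grep -iE 'ext4|i/o error|oops|bug:|blocked for more than|out of memory'
}

# Reading the kernel log may need root on some systems, so ignore failures.
dmesg 2>/dev/null | kernel_trouble_lines || true
```

Running this right after an rm ends with "Killed" should narrow the log down to the relevant lines, if there are any.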

Out of curiosity, what filesystem is this on?

Post by Pasketti » Wed Nov 14, 2012 1:01 pm

It's ext4.

My backup ran last night, and there were no problems with rm. And sync isn't hanging either.

I am cautiously optimistic.

Hu
Administrator
Posts: 24556
Joined: Tue Mar 06, 2007 5:38 am

Post by Hu » Thu Nov 15, 2012 2:38 am

Pasketti wrote: But I did notice something. I back everything up to a second hard drive using rsync. When that's done, I delete several directories via rm -rf from the backup drive that don't need to be backed up.

You may be able to avoid the deletion step by using rsync exclude directives to skip copying the files in the first place.

Post by Pasketti » Thu Nov 15, 2012 2:23 pm

Hu wrote: You may be able to avoid the deletion step by using rsync exclude directives to skip copying the files in the first place.

Yeah, I know. It was a tradeoff.

Here's the backup script:

Code:

#!/bin/bash

# Date stamp used as the name of tonight's snapshot directory.
DATE=$(date "+%Y-%m-%d")

date

# No leading slash on dir name

for DIR in root etc var/bind var/www var/lib/portage opt/msm home
do
  echo "============ Backup $DIR"
  mkdir -p "/backup/$DATE/$DIR"
  # Unchanged files are hard-linked against the previous snapshot.
  rsync -avx --delete --link-dest="/backup/current/$DIR" "/$DIR/" "/backup/$DATE/$DIR"
done

# Rotate the symlinks: current becomes prev, tonight's snapshot becomes current.
rm -f /backup/prev
mv /backup/current /backup/prev
ln -s "/backup/$DATE" /backup/current

So I'd have to unwind the for loop if I wanted to exclude some directories. But the for loop makes it easy to add more paths. So I back it all up and then delete a few paths afterward. It happens at 3 AM, so it's not like I'm waiting for it to finish.
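
For what it's worth, the loop may not need to be unwound to use excludes. One possible approach, sketched below, is a single exclude file passed to every rsync call with --exclude-from. A pattern with a leading slash is anchored at the root of each transfer, so an entry only matters for a source directory that actually contains that path. The exclude file location, the example pattern and the function name backup_cmd are all hypothetical, and the function only prints the command so it can be inspected before anything is run for real.

```shell
#!/bin/bash
# Hypothetical exclude file, one rsync pattern per line, for example:
#   /windows-backups/
EXCLUDES=/etc/backup-excludes

# Print the rsync command for one directory (no leading slash on the name).
backup_cmd() {
  local dir=$1 date=$2
  printf 'rsync -avx --delete --exclude-from=%s --link-dest=/backup/current/%s /%s/ /backup/%s/%s\n' \
    "$EXCLUDES" "$dir" "$dir" "$date" "$dir"
}

DATE=$(date "+%Y-%m-%d")
for DIR in root etc home; do
  backup_cmd "$DIR" "$DATE"
done
```

With this layout, adding a path to the for list and adding a pattern to the exclude file stay independent of each other.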

Post by Pasketti » Thu Nov 15, 2012 2:34 pm

I think it's fixed. It's been two days now, and everything's fine.

Before I ran fsck, it would hang up every morning.

So the lesson here is "if sync hangs, and you can't find a hardware problem, run fsck -pf"
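
One caution, as a sketch: fsck should only be run on a filesystem that is not mounted, which is why the check above was done from a Live CD. The helper below refuses to proceed if the device appears under that exact name in /proc/mounts. The function names are illustrative, and it only prints the fsck command rather than running it.

```shell
#!/bin/bash
# Succeed if device $1 appears as a mount source in the mounts table $2
# (defaults to /proc/mounts).
is_mounted() {
  local dev=$1 table=${2:-/proc/mounts}
  awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$table"
}

# Print the fsck command for an unmounted device, or refuse.
safe_fsck_cmd() {
  local dev=$1 table=${2:-/proc/mounts}
  if is_mounted "$dev" "$table"; then
    echo "refusing: $dev is mounted" >&2
    return 1
  fi
  echo "fsck -pf $dev"
}
```

For example, safe_fsck_cmd /dev/sdb1 (a placeholder device name) prints the command only when /dev/sdb1 is not listed as mounted.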

Thank you guys again for letting me bounce things off you!

© 2001–2026 Gentoo Authors
Gentoo is a trademark of the Gentoo Foundation, Inc. and of Förderverein Gentoo e.V.
The contents of this document, unless otherwise expressly stated, are licensed under the CC-BY-SA-4.0 license.
The Gentoo Name and Logo Usage Guidelines apply.