mudrii l33t
Joined: 26 Jun 2003 Posts: 789 Location: Singapore
|
Posted: Thu Sep 15, 2005 3:27 pm Post subject: |
|
|
Yep, notsc solved the problem for me too. Not 100%, but about 90% is working. _________________ www.gentoo.ro |
|
|
|
Pajarico Guru
Joined: 01 May 2004 Posts: 493 Location: Madrid, España.
|
Posted: Thu Sep 15, 2005 4:08 pm Post subject: |
|
|
I got this error sometimes (especially when doing hard-disk-intensive tasks). Activating Cool&Quiet solved it for me. Before I did that, I didn't get this message (did you try this?):
Code: | lxuser@hal2000 ~ $ dmesg | grep power
powernow-k8: Found 1 AMD Athlon 64 / Opteron processors (version 1.40.4)
powernow-k8: 0 : fid 0x2 (1000 MHz), vid 0x12 (1100 mV)
powernow-k8: 1 : fid 0xa (1800 MHz), vid 0x6 (1400 mV)
powernow-k8: 2 : fid 0xc (2000 MHz), vid 0x2 (1500 mV)
lxuser@hal2000 ~ $
|
I got an error about some modes not being found (can't remember exactly). _________________ Gentoo: the only software worth paying that is free. |
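To check whether the powernow-k8 driver registered its frequency states, the dmesg lines quoted above can be filtered for the MHz values. This is only a minimal sketch, assuming the message format shown in this post (driver version 1.40.4); the function name list_pstates_mhz is made up for illustration.

```shell
#!/bin/sh
# list_pstates_mhz: read kernel log text on stdin and print the frequency
# (in MHz) of every powernow-k8 P-state line, one value per line.
# Assumption: the "fid 0x.. (NNNN MHz)" format quoted in the post; other
# driver versions may log differently.
list_pstates_mhz() {
    sed -n 's/^.*powernow-k8:.*fid 0x[0-9a-f]* (\([0-9]*\) MHz).*$/\1/p'
}

# Usage (hypothetical): dmesg | list_pstates_mhz
```

If this prints nothing, the driver probably did not find any frequency states, which would match the case where Cool&Quiet is disabled.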
|
|
|
frozenJim Guru
Joined: 18 Jun 2004 Posts: 341 Location: Montreal, Quebec, Canada
|
Posted: Sun Sep 25, 2005 3:52 pm Post subject: |
|
|
I think that the problem is more basic. It has to be in the kernel's handling of time. So I'm in the same boat as Entropias Entropius wrote: | In my case it's not just error messages -- the clock is losing about twenty minutes a day, making me late for work yesterday. |
I am sitting in an office with 5 PC's. All of them are synched with ntpd to the same time source at boot time. By the end of the day - each computer has a different time showing.
They range from an ancient P100 running on the Intel 430TX chipset - to an AMD 751 [Irongate] - to a P4 on Intel 82850 (Tehama). Some use APM and some use ACPI and some use nothing.
About all that they have in common is that they cannot keep time.
All have been booted at least once in the past three days. The AMD has lost about 30 minutes and the P100 has lost about 20 (compared against the P4 which I am using as a benchmark here). Is the P4 correct? It is today! But I have noticed that after a full-system emerge even my P4 can be up to an hour off.
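To put numbers like these on a common scale, the drift can be expressed in parts per million (ppm). The following is a rough sketch, not a measurement tool: drift_ppm is a hypothetical helper name, and the inputs are simply the lost time and the elapsed time in seconds.

```shell
#!/bin/sh
# drift_ppm LOST_SECONDS ELAPSED_SECONDS
# Prints the clock drift in parts per million (integer, rounded down).
drift_ppm() {
    lost=$1
    elapsed=$2
    if [ "$elapsed" -le 0 ]; then
        echo "elapsed time must be positive" >&2
        return 1
    fi
    echo $(( lost * 1000000 / elapsed ))
}

# Examples using figures mentioned in this thread:
#   drift_ppm 1200 86400     # 20 minutes lost in one day
#   drift_ppm 1800 259200    # 30 minutes lost in three days
```

The first example works out to roughly 13888 ppm and the second to roughly 6944 ppm. For comparison, ntpd is generally described as able to correct frequency errors of up to about 500 ppm, so drift of this size is far outside what it can discipline on its own.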
It VERY MUCH APPEARS that the more I emerge - the slower my clocks get. I have not tested to see if it is "emerge" itself or, more likely, the compiler taking up so many cycles that the clock wonks out.
So either there is something that I am setting up incorrectly in my kernels (various kernels in play here folks) or "Houston, we have a problem". It sure doesn't appear to be processor or chipset or APM related because NONE of my systems have these things in common. Maybe I have 5 different problems - but if that is the case and getting an accurate clock is that impossible, then we have a BIG problem with Gentoo!
I hope that this reaches the eyes of a kernel developer who will be intrigued enough to "show that frozenJim guy what an ass he is" that he will dig into the issue and give us a definitive reply (OK OK... I know that our kernel developers work very, very hard for us and would never say "ass" ... but are their clocks all correct?). _________________ Who controls the past, controls the future. Who controls the present, controls the past. |
|
|
|
gralves Guru
Joined: 20 May 2003 Posts: 389 Location: Sao Paulo, Brazil
|
Posted: Sun Sep 25, 2005 4:00 pm Post subject: |
|
|
I would say that, Houston, we have a bigger problem: take a look at http://bugzilla.kernel.org/show_bug.cgi?id=5105 . I'll post a link to this thread there, but I think you should post even more info.
edit: It already reached a kernel developer and they are looking for more info. |
|
|
|
frozenJim Guru
Joined: 18 Jun 2004 Posts: 341 Location: Montreal, Quebec, Canada
|
Posted: Sun Sep 25, 2005 4:14 pm Post subject: |
|
|
Ah gralves, I believe you have found the solution. I read through the bug and guess what? THERE IS SOMETHING IN COMMON WITH ALL OF MY MACHINES!!! They are all single cpu machines and they all have SMP > 1.
Code: | Linux Kernel v2.6.12-gentoo-r10 Configuration
Symmetric multi-processing support
CONFIG_SMP:
This enables support for systems with more than one CPU. If you have a system with only one CPU, like most personal computers, say N. If you have a system with more than one CPU, say Y.
If you say N here, the kernel will run on single and multiprocessor machines, but will use only one CPU of a multiprocessor machine. If you say Y here, the kernel will run on many, but not all, single processor machines. On a single processor machine, the kernel will run faster if you say N here.
Note that if you say Y here and choose architecture "586" or "Pentium" under "Processor family", the kernel will not work on 486 architectures. Similarly, multiprocessor kernels for the "PPro" architecture may not work on all Pentium based boards. People using multiprocessor machines who say Y here should also say Y to "Enhanced Real Time Clock Support", below. The "Advanced Power Management" code will be disabled if you say Y here.
See also the <file:Documentation/smp.txt>, <file:Documentation/i386/IO-APIC.txt>, <file:Documentation/nmi_watchdog.txt> and the SMP-HOWTO available at <http://www.tldp.org/docs.html#howto>.
If you don't know what to do here, say N.
Symbol: SMP [=n]
Prompt: Symmetric multi-processing support
Defined at arch/i386/Kconfig:461
Location:
-> Processor type and features |
Wanna quote me on something? As a prophecy, frozenJim wrote: | I am going to set smp=1 in all of my processors and THIS WILL SOLVE THE PROBLEM. This will also make my mouse stop getting all "jerky" and unresponsive from time to time AND may even cure some of those "temporary freeze-ups" that seem to happen intermittently. |
I have always questioned the safety of leaving that number above one - but done it anyhow because I'm too lazy to look into it any further. And this is something that has NOT been explored yet in this thread.
So - 5 systems will be reset today to smp=1 and I shall return with the result, but I'm going to be really disappointed if this doesn't work. It sure FEELS like the culprit (doesn't it?). _________________ Who controls the past, controls the future. Who controls the present, controls the past. |
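Before rebuilding, it may help to confirm what the current kernel configuration actually says. A minimal sketch follows, assuming the configuration lives in a plain-text .config file such as /usr/src/linux/.config; kconfig_value is a made-up helper name. Note that CONFIG_SMP itself is a yes/no option; the "number above one" is presumably the maximum-CPU setting (CONFIG_NR_CPUS), which is an assumption on my part and not something the post states.

```shell
#!/bin/sh
# kconfig_value CONFIG_FILE SYMBOL
# Prints the value of CONFIG_<SYMBOL> (for example "y", "m" or "8") from a
# kernel .config file, or "n" when the symbol is unset or absent.
kconfig_value() {
    value=$(sed -n "s/^CONFIG_$2=\(.*\)$/\1/p" "$1" | head -n 1)
    echo "${value:-n}"
}

# Usage (path assumed, adjust to your kernel source tree):
#   kconfig_value /usr/src/linux/.config SMP
#   kconfig_value /usr/src/linux/.config NR_CPUS
#   kconfig_value /usr/src/linux/.config RTC
```

A line such as "# CONFIG_SMP is not set" is reported as "n", which is the state the help text above recommends for single-CPU machines.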
|
|
|
gralves Guru
Joined: 20 May 2003 Posts: 389 Location: Sao Paulo, Brazil
|
Posted: Sun Sep 25, 2005 4:30 pm Post subject: |
|
|
frozenJim wrote: |
Ah gralves, I believe you have found the solution. I read through the bug and guess what? THERE IS SOMETHING IN COMMON WITH ALL OF MY MACHINES!!! They are all single cpu machines and they all have SMP > 1. Wanna quote me on something? As a prophecy, frozenJim wrote: | I am going to set smp=1 in all of my processors and THIS WILL SOLVE THE PROBLEM. |
I have always questioned the safety of leaving that number above one - but done it anyhow. And this is something that has NOT been explored yet in this thread.
So - 5 systems will be reset today to smp=1 and I shall return with the result. Whaddya say, wanna bet this does the trick? |
The problem is that my machine (and most machines on the kernel bug thread) has a dual-core processor, so I must leave SMP alone (or else I lose a CPU). I think there is something fundamentally broken in the way the kernel deals with time. It might be related to SMP, but I don't think that is the root of the problem, just a factor that might trigger it.
But if this solves the problem for you, please post your findings on the kernel bugzilla. |
|
|
|
frozenJim Guru
Joined: 18 Jun 2004 Posts: 341 Location: Montreal, Quebec, Canada
|
Posted: Sun Sep 25, 2005 4:37 pm Post subject: |
|
|
gralves wrote: | so I must leave SMP alone (or else I lose a CPU). |
Well, it's certainly too soon to get into it too deeply until we have some results from my test. But it doesn't say that you are screwed if you have smp>1. Code: | People using multiprocessor machines who say Y here should also say Y to "Enhanced Real Time Clock Support" |
So if this IS the problem, you might still be OK (have you checked your Enhanced RTC support setting?). _________________ Who controls the past, controls the future. Who controls the present, controls the past. |
|
|
|
frozenJim Guru
Joined: 18 Jun 2004 Posts: 341 Location: Montreal, Quebec, Canada
|
Posted: Mon Sep 26, 2005 10:13 pm Post subject: |
|
|
frozenJim wrote: | So - 5 systems will be reset today to smp=1 and I shall return with the result |
OK, so I'm back. Sorry it took so long. Here's the skinny:
I set all 5 PC's so that smp=1 (all were set to either 2 or 8, I don't remember WHY) last night and rebuilt the kernels. All hell broke loose of course. In the end though, I have 5 computers all clicking over the minutes in perfect synchronization after ONE DAY of hard work. Too soon to be sure, but usually by the end of a day I'm out by up to a half hour between the different machines.
So here's what I ended up doing:
Code: | cd /usr/src/linux
make menuconfig |
Choose the 3rd option: Processor Type and Features ---->
Then clear the * from the 5th item: [*] Symmetric multiprocessing support
Escape your way out and save your new kernel configuration.
Compile your new kernel, copy it over to your boot drive and reboot. (note1: this assumes that you use Grub as your boot manager.) (note2: replace /boot/bzImage with whatever you have told Grub to expect for a kernel name. hint: Your system may prefer /boot/kernel-2.6.10-gentoo-r12 or something):
Code: | cd /usr/src/linux/
make && make modules_install
mount /boot
cp arch/i386/boot/bzImage /boot/bzImage
cp System.map /boot/System.map
cp .config /boot/config
reboot
|
System boots with (maybe) lots and lots of errors. I was able to get past these errors on all 5 systems without being stuck without network while needing to emerge (my greatest fear). Since I made it safely on 5 systems - you should be OK too.
It seems that GCC needs to be built a bit differently when you don't have SMP enabled. I won't swear to it - but it made the difference for me.
After the emerge, you will need to source your new profile like this:
Code: | source /etc/profile |
Then wipe out any of the old kernel and modules that might be left lying around in some kind of pre-compiled state. I am not sure that all of these steps are necessary, but I DID find that nothing failed when I did them ALL - any time I skipped a step it seemed like I'd have to start all over again. (note: modules-update is likely not necessary here, but it can't hurt either):
Code: | make clean
make menuconfig
make && make modules_install
mount /boot
cp arch/i386/boot/bzImage /boot/bzImage
cp System.map /boot/System.map
cp .config /boot/config
modules-update
reboot |
This system worked on all but one PIII - 350 which refused to load any of the modules. After three tries, I just compiled the drivers for sound and network into the kernel to get rid of the errors. Well they still failed to load - then I realized that maybe...just maybe I had forgotten to copy my new kernel over to my boot partition (it was late, I was tired). Once I ran through all the steps one more time - with the modules left compiled into the kernel due to laziness - and the kernel actually copied over to /boot/bzImage - it ran just fine.
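Since a forgotten copy step cost a few rebuilds here, a quick check before rebooting can confirm that the image in /boot really is the one just built. This is only a sketch: kernel_is_installed is a hypothetical helper, and the paths in the usage comment are the ones used earlier in this post.

```shell
#!/bin/sh
# kernel_is_installed BUILT_IMAGE INSTALLED_IMAGE
# Succeeds (exit status 0) only when both files exist and are
# byte-for-byte identical.
kernel_is_installed() {
    [ -f "$1" ] && [ -f "$2" ] && cmp -s "$1" "$2"
}

# Usage (replace /boot/bzImage with whatever name Grub expects):
#   kernel_is_installed /usr/src/linux/arch/i386/boot/bzImage /boot/bzImage \
#       || echo "new kernel was NOT copied to /boot" >&2
```

Remember that /boot has to be mounted first, otherwise the comparison runs against an empty mount point.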
Conclusion: I don't know enough yet about how everything works together. But for the first time in a year - all of my clocks indicate the same time at the end of the day. Perhaps it isn't yet proven enough for you to make the change on your server... but gimme a week and I'll be back to update you on just how synchronized my clocks are by then. ;0) _________________ Who controls the past, controls the future. Who controls the present, controls the past. |
|
|
|
gralves Guru
Joined: 20 May 2003 Posts: 389 Location: Sao Paulo, Brazil
|
Posted: Tue Sep 27, 2005 3:36 am Post subject: |
|
|
frozenJim wrote: | frozenJim wrote: | So - 5 systems will be reset today to smp=1 and I shall return with the result |
OK, so I'm back. Sorry it took so long. Here's the skinny:
I set all 5 PC's so that smp=1 (all were set to either 2 or 8, I don't remember WHY) last night and rebuilt the kernels. All hell broke loose of course. In the end though, I have 5 computers all clicking over the minutes in perfect synchronization after ONE DAY of hard work. Too soon to be sure, but usually by the end of a day I'm out by up to a half hour between the different machines.
...
Conclusion: I don't know enough yet about how everything works together. But for the first time in a year - all of my clocks indicate the same time at the end of the day. Perhaps it isn't yet proven enough for you to make the change on your server... but gimme a week and I'll be back to update you on just how synchronized my clocks are by then. ;0) |
from kernel bugzilla wrote: |
------- Additional Comment #69 From john stultz 2005-09-26 11:21 -------
Gustavo: Unacceptable time drift sounds like a different issue then what this
bug is covering. Would you mind filing a new bug describing the time drift you
are seeing as well as your NTP settings.
thanks.
------ Additional Comment #74 From Gustavo Ribeiro Alves 2005-09-26 20:26 -------
john: I will ask for the guy that posted on gentoo.org to open the bug. Since I
use a gentoo kernel tainted w/ ati binary-only drivers, and the bug seems only
to appear after more than 3 days of uptime, it will be hard for me to try to
reproduce it with a "clean" kernel (I need this machine running w/ the binary
drivers)
|
FrozenJim,
Could you try to reproduce the bug on one of your machines using a "vanilla" kernel? If so, please open a bug on http://bugzilla.kernel.org/ . I can't test it since I only have one machine and I can't let it be w/o ati binary drivers for the time needed for the bug to appear on it (about 3 days).
Thanks,
Gustavo |
|
|
|
robnotts Guru
Joined: 15 Mar 2004 Posts: 405 Location: Nottingham, UK
|
Posted: Tue Sep 27, 2005 5:52 am Post subject: |
|
|
Whilst I have a machine with a dual-core processor, the pmtmr and notsc options solved everything... this machine has been running for nearly 2 weeks now, and is still perfectly in time with the clock on the BBC.
Rob.
If you want any of my config files, output for dmesg, etc, please ask. _________________ ---
Gentoo Phenom][ X4 955 on AMD790 + Geforce 220GT 8GB/1.75TB (Desktop)
+ MythTV (3xFreeview,1xFreesat HD) on 1080p
Gentoo Turion64 X2 Geforce 6150 2GB/120GB (Laptop) |
|
|
|
piwacet Guru
Joined: 30 Dec 2004 Posts: 486
|
Posted: Wed Sep 28, 2005 3:31 am Post subject: |
|
|
I, too, have an X2, and I, too, have pmtmr and notsc set, but I still have the lost ticks message. My time source is relatively stable but I haven't left the computer on for very long lately.
Oh Well. On the kernel bug thread they think they made a patch which solves it. We'll see. |
|
|
|
gralves Guru
Joined: 20 May 2003 Posts: 389 Location: Sao Paulo, Brazil
|
Posted: Wed Sep 28, 2005 4:15 am Post subject: |
|
|
I'm not using the pmtmr option, will try it and see... |
|
|
|
Donpasquale Apprentice
Joined: 03 Jan 2003 Posts: 150 Location: Munich
|
Posted: Tue Oct 11, 2005 1:23 pm Post subject: |
|
|
Hi there,
I got the same problem with my nForce4 Asus A8N-SLI Premium. I have an AMD64 X2 3800 on it and an Asus GeForce 7800 GT. The error in dmesg is:
Code: |
oben hahn # dmesg | grep tick
Bootdata ok (command line is root=/dev/sdb3 report_lost_ticks=100 noapic notsc clock=pmtmr)
Kernel command line: root=/dev/sdb3 report_lost_ticks=100 noapic notsc clock=pmtmr
time.c: Lost 10 timer tick(s)! rip setup_boot_APIC_clock+0x112/0x120)
testing NMI watchdog ... <4>time.c: Lost 546 timer tick(s)! rip nmi_cpu_busy+0x3/0x20)
time.c: Lost 1 timer tick(s)! rip _spin_unlock_irqrestore+0x5/0x40)
time.c: Lost 1 timer tick(s)! rip _spin_unlock_irqrestore+0x5/0x40)
time.c: Lost 1 timer tick(s)! rip _spin_unlock_irqrestore+0x5/0x40)
|
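To compare boots or boot options, it can be handy to total up how many ticks the kernel reports as lost. A minimal sketch, assuming the "Lost N timer tick(s)" wording shown in the log above; count_lost_ticks is a made-up name.

```shell
#!/bin/sh
# count_lost_ticks: read kernel log text on stdin and print the total
# number of ticks reported in "Lost N timer tick(s)" messages.
# Prints 0 when no such message is present.
count_lost_ticks() {
    grep -o 'Lost [0-9][0-9]* timer tick' \
        | awk '{ total += $2 } END { print total + 0 }'
}

# Usage (hypothetical): dmesg | count_lost_ticks
```

Run against the five messages quoted above, this adds up 10 + 546 + 1 + 1 + 1 and prints 559. Most of that comes from the single NMI watchdog test line at boot.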
I tried lots of different parameters but can't get rid of this issue. I also tried some patches from LKML but they didn't fix it either. Perhaps one of you has a fix which I can pack into some kind of AMD64 X2 tutorial.
Thanks in advance,
Passi
edit:
What I also got now while trying to emerge sync is:
Code: |
emerge[23442] general protection rip:2aaaaac96d36 rsp:7fffffd5bc58 error:0
|
I think it is also related to the lost timer stuff. |
|
|
|
gralves Guru
Joined: 20 May 2003 Posts: 389 Location: Sao Paulo, Brazil
|
Posted: Tue Oct 11, 2005 4:27 pm Post subject: |
|
|
Donpasquale wrote: | Hi there,
I got the same problem with my nForce4 Asus A8N-SLI Premium. I have an AMD64 X2 3800 on it and an Asus GeForce 7800 GT. The error in dmesg is:
...
I think it is also related to the lost timer stuff. |
You have exactly my motherboard and CPU (the video card is different, though). I solved all the problems by using kernel 2.6.13-gentoo-r3 and the following boot options:
kernel /kernel-2.6.13-gentoo-r3 root=/dev/hdb3 notsc clock=pmtmr |
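After editing the Grub entry it is easy to boot the old line by mistake, so it may be worth checking that the running kernel really received the options. A small sketch, assuming the command line is read from /proc/cmdline; has_boot_opt is a hypothetical helper name.

```shell
#!/bin/sh
# has_boot_opt CMDLINE OPTION
# Succeeds when OPTION appears as a whole word in the kernel command line.
has_boot_opt() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage:
#   cmdline=$(cat /proc/cmdline)
#   has_boot_opt "$cmdline" notsc       || echo "notsc is missing" >&2
#   has_boot_opt "$cmdline" clock=pmtmr || echo "clock=pmtmr is missing" >&2
```

Matching on whole words avoids false positives, for example treating "notsc" as if plain "tsc" had been passed.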
|
|
|
ajc158 n00b
Joined: 17 Aug 2005 Posts: 42
|
Posted: Tue Oct 11, 2005 9:06 pm Post subject: |
|
|
4200+ X2, Gentoo kernel 2.6.13-r3 and losing ticks.
I will try those boot options and report back.
Alex |
|
|
|
Donpasquale Apprentice
Joined: 03 Jan 2003 Posts: 150 Location: Munich
|
Posted: Tue Oct 11, 2005 9:33 pm Post subject: |
|
|
I will also try to update the kernel. I just updated the BIOS as well and Windows is running fine now. Perhaps the lost ticks will be correctable too. I will also report back later |
|
|
|
Donpasquale Apprentice
Joined: 03 Jan 2003 Posts: 150 Location: Munich
|
Posted: Wed Oct 12, 2005 7:43 am Post subject: |
|
|
It didn't work. I still get lost timer ticks. I played around with the boot options but there seems to be no way to get these errors out of my log. I get
Code: | time.c: Lost 4 timer tick(s)! rip acpi_processor_idle+0x12e/0x380) | now most of the time.
Perhaps someone still has an idea how I can fix that error. I already tried the noapic option but it didn't work out.
Thanks in advance,
Passi |
|
|
|
ajc158 n00b
Joined: 17 Aug 2005 Posts: 42
|
Posted: Wed Oct 12, 2005 10:11 pm Post subject: |
|
|
Using notsc and clock=pmtmr I have no time gain on a 4200+ X2. These options appear to have solved the problem.
Alex |
|
|
|
piwacet Guru
Joined: 30 Dec 2004 Posts: 486
|
Posted: Thu Oct 13, 2005 4:14 am Post subject: |
|
|
AMD64 X2 3800 on linux-2.6.13-gentoo-r3, booting with notsc and clock=pmtmr, still getting lost ticks (but my time is stable). On the bug report they thought they had a patch which would work to fix this, but don't know if the patch has made it into the kernel yet. |
|
|
|
Donpasquale Apprentice
Joined: 03 Jan 2003 Posts: 150 Location: Munich
|
Posted: Thu Oct 13, 2005 8:35 am Post subject: |
|
|
As far as I understand, the patch only appends the notsc parameter when the processor is an X2. |
|
|
|
gralves Guru
Joined: 20 May 2003 Posts: 389 Location: Sao Paulo, Brazil
|
Posted: Thu Oct 13, 2005 8:36 am Post subject: |
|
|
piwacet wrote: | AMD64 X2 3800 on linux-2.6.13-gentoo-r3, booting with notsc and clock=pmtmr, still getting lost ticks (but my time is stable). On the bug report they thought they had a patch which would work to fix this, but don't know if the patch has made it into the kernel yet. |
The only thing the patch does is turn notsc on automatically on AMD64 (if I remember correctly). |
|
|
|
Bobtheguy n00b
Joined: 13 Oct 2005 Posts: 8 Location: UIUC
|
Posted: Thu Oct 13, 2005 4:24 pm Post subject: 4400+ will try |
|
|
It amazes me how many of us have pretty much the exact same config. Well I will try the updated kernel (currently running 2.6.12-r10) and those boot options on my 4400+, A8N-SLI Deluxe, and 7800 GTX. Will report back soon. |
|
|
|
cylamanae Tux's lil' helper
Joined: 26 Mar 2004 Posts: 126
|
Posted: Sat Oct 15, 2005 1:50 pm Post subject: |
|
|
I am having the same problem, only with a laptop... When it happens on my system and I have the screen saver on, X locks up and the only thing that will fix it is a reboot. When I disable the screen saver it will continue to run fine. I wonder if CPU frequency scaling could be the problem. I tried Symmetric multi-processing support and I still have the problem. I hope we can all find a fix for this soon.
Thanks Calvin
*EDIT*
On the bugzilla page, http://bugzilla.kernel.org/show_bug.cgi?id=5105 , the problem has supposedly been resolved for the 2.6.12.5 kernel. I will download that tonight to see if that will help fix it... |
|
|
|
Bobtheguy n00b
Joined: 13 Oct 2005 Posts: 8 Location: UIUC
|
Posted: Thu Oct 27, 2005 4:42 pm Post subject: Kernel upgrade has a fix |
|
|
The 2.6.13-r3 kernel has a patch for the X2 time source problem. I have been running it for about a week, no problems. |
|
|
|
tgh Apprentice
Joined: 05 Oct 2005 Posts: 222
|
Posted: Wed Nov 09, 2005 7:52 pm Post subject: |
|
|
I'm getting this in 2.6.13-gentoo-r5 with an Asus A8V (K8T800Pro and VT8237) w/ Athlon64 3200+ chip. Hard drives are hooked up to the onboard Promise controller (PDC20378), the onboard SATA controller and the onboard IDE controller. Plus the motherboard has an onboard gigabit ethernet NIC (Marvell 88E8001). In addition, I have even more hard drives hooked up to a Promise Ultra133 TX2 PCI card (PDC20269) and some HighPoint Rocket133SB PCI cards (HPT302). The dmesg output is:
Losing some ticks... checking if CPU frequency changed.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-4, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-5, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
skge eth0: enabling interface
skge eth0: Link is up at 1000 Mbps, full duplex, flow control tx and rx
eth0: no IPv6 routers present
warning: many lost ticks.
Your time source seems to be instable or some driver is hogging interupts
rip __do_softirq+0x48/0xb0
I already have SMP disabled (I've always disabled that in my kernels). The output from top shows heavy system and IRQ utilization:
Cpu(s): 0.9% us, 39.8% sy, 0.0% ni, 6.0% id, 0.3% wa, 1.0% hi, 52.0% si
The system is currently rebuilding a drive in a RAID1 array. The odd thing is that when I built this array using the AMD64 LiveCD, I was getting very good data rates when building the RAID array and there was no slowdown. Now I'm only getting around 3MB/s according to /proc/mdstat. The system is *extremely* sluggish as a result of the high sy/si utilizations. |
|