Gentoo Forums
nvidia + X + kernel => high latency/crashing
Gentoo Forums Forum Index Kernel & Hardware
redwood
Guru


Joined: 27 Jan 2006
Posts: 306

Posted: Fri Dec 15, 2006 4:12 am    Post subject: nvidia + X + kernel => high latency/crashing

I recently retired my old Socket A 2.8 GHz Athlon T-bird and put it into service as a zaptel/asterisk server,
and bought a new barebones computer: AMD64 4200+ (AM2) / ASUS M2N4-SLI / NVIDIA GeForce 7300GS / 1 GB RAM.
I put my old hard drives and Turtle Beach sound card into the new computer and compiled a new i386 kernel for it,
using the nvidia modules for my motherboard's chipset/bridges/ethernet/AC'97 sound. I also recompiled nvidia-drivers and alsa-driver.

gentoo-sources => 2.6.18-gentoo-r3
nvidia-drivers => 1.0.9631
alsa-driver => 1.0.13
gcc => i686-pc-linux-gnu-4.1.1

CFLAGS="-O2 -march=k8 -pipe"
CHOST="i686-pc-linux-gnu"
CXXFLAGS="${CFLAGS}"
ALSA_CARDS="cs46xx intel8x0"
INPUT_DEVICES="keyboard mouse evdev"
VIDEO_CARDS="nv nvidia vesa"

As soon as I start X, the system starts randomly stalling/freezing and the hard disk light stays lit. Sometimes the system recovers and I can get ~10 minutes of work done before it locks up again.
Other times it eventually crashes and corrupts my ext2 LVM2 volumes (/var/tmp, /tmp and /mnt/backups).

At first I thought the problem was due to a recent "emerge -uvD world", which upgraded dbus and broke nearly all of GNOME and a good portion of KDE. I solved the dbus problem by installing dbus-glib and dbus-qt3-old and doing a complete "revdep-rebuild", followed by an upgrade of hal/pmount to the ~x86 testing versions. According to revdep-rebuild, my system is now OK.

But as soon as I started X and began some work, the system hung for a few minutes.
So I went to another computer on my network and looked at the tail of dmesg, which shows
a lot of the following:

gameport: CS46xx Gameport is pci0000:01:06.0/gameport0, speed 1704kHz
PCI: Setting latency timer of device 0000:05:00.0 to 64
NVRM: loading NVIDIA Linux x86 Kernel Module 1.0-9631 Thu Nov 9 17:38:10 PST 2006
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x380000 action 0x2
ata4.00: (BMDMA stat 0x20)
ata4.00: tag 0 cmd 0xc8 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x380000 action 0x2
ata3.00: (BMDMA stat 0x20)
ata3.00: tag 0 cmd 0xc8 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)
ata4: soft resetting port
ata3: soft resetting port
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata4.00: configured for UDMA/133
ata4: EH complete
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: configured for UDMA/133
ata3: EH complete
SCSI device sdc: 488397168 512-byte hdwr sectors (250059 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 3a 00 00
SCSI device sdc: drive cache: write back
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x380000 action 0x2
ata3.00: (BMDMA stat 0x20)
ata3.00: tag 0 cmd 0xc8 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)
ata3: soft resetting port
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: configured for UDMA/133
ata3: EH complete
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x180000 action 0x2 frozen
ata4.00: (BMDMA stat 0x21)
ata4.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata4: port is slow to respond, please be patient
ata4: port failed to respond (30 secs)
ata4: soft resetting port
ata4: port is slow to respond, please be patient
ata4: port failed to respond (30 secs)
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ATA: abnormal status 0xD0 on port 0x967
ATA: abnormal status 0xD0 on port 0x967
ATA: abnormal status 0xD0 on port 0x967
ATA: abnormal status 0xD0 on port 0x967
ATA: abnormal status 0xD0 on port 0x967
ata4.00: qc timeout (cmd 0xec)
ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata4.00: revalidation failed (errno=-5)
ata4: failed to recover some devices, retrying in 5 secs
ata4: hard resetting port
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata4.00: configured for UDMA/133
ata4: EH complete
SCSI device sdc: 488397168 512-byte hdwr sectors (250059 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 3a 00 00
SCSI device sdc: drive cache: write back
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x180000 action 0x2 frozen
ata3.00: (BMDMA stat 0x21)
ata3.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata3: port is slow to respond, please be patient
ata3: port failed to respond (30 secs)
ata3: soft resetting port
ata3: port is slow to respond, please be patient
ata3: port failed to respond (30 secs)
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ATA: abnormal status 0xD0 on port 0x9E7
ATA: abnormal status 0xD0 on port 0x9E7
ATA: abnormal status 0xD0 on port 0x9E7
ATA: abnormal status 0xD0 on port 0x9E7
ATA: abnormal status 0xD0 on port 0x9E7
ata3.00: qc timeout (cmd 0xec)
ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata3.00: revalidation failed (errno=-5)
ata3: failed to recover some devices, retrying in 5 secs
ata3: hard resetting port
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: configured for UDMA/133
ata3: EH complete
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back

These problems with my hard drives locking up only seem to happen when I'm running X.
I've tried kde/gnome/xfce4/fluxbox/twm with the exact same problem.
THANKS in advance for any ideas/suggestions on what is amiss.
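With a log this long, it can help to tally the libata error-handling events per port to see whether one drive or both are affected. Below is a minimal Python sketch, assuming the dmesg or syslog text is passed in as a string. The event patterns are copied from the message wording in the log above; they are not a complete list of libata messages, and the event names are made up for this sketch.

```python
import re
from collections import Counter, defaultdict

# Port prefix as printed by libata: "ata4: ..." or "ata4.00: ..."
PORT = r"^(ata\d+)(?:\.\d+)?: "

# Message wording copied from the log above (not a complete list).
EVENTS = {
    "exception": re.compile(PORT + r"exception"),
    "bus error": re.compile(PORT + r".*\(ATA bus error\)"),
    "timeout": re.compile(PORT + r".*\(timeout\)"),
    "soft reset": re.compile(PORT + r"soft resetting port"),
    "hard reset": re.compile(PORT + r"hard resetting port"),
    "disabled": re.compile(PORT + r"disabled"),
}

# Syslog prefix such as "Dec 15 01:39:27 [kernel] ", as in the later posts.
SYSLOG_PREFIX = re.compile(r"^.*?\[kernel\]\s*")


def summarize(log_text):
    """Return {port: Counter(event -> count)} for the libata events above."""
    summary = defaultdict(Counter)
    for raw in log_text.splitlines():
        line = SYSLOG_PREFIX.sub("", raw.strip(), count=1)
        for name, pattern in EVENTS.items():
            match = pattern.match(line)
            if match:
                summary[match.group(1)][name] += 1
    return summary
```

Calling `summarize()` on the excerpt above shows both ata3 and ata4 (sdb and sdc) going through exceptions and resets, so the problem is not confined to a single drive.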

Here's some more info on my system:

# lspci
00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev f3)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
00:04.0 Multimedia audio controller: nVidia Corporation CK804 AC'97 Audio Controller (rev a2)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev f2)
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev f2)
00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev f3)
00:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev f3)
00:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev f3)
00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev f3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:06.0 Multimedia audio controller: Cirrus Logic CS 4614/22/24/30 [CrystalClear SoundFusion Audio Accelerator] (rev 01)
05:00.0 VGA compatible controller: nVidia Corporation GeForce 7300 GS (rev a1)

# cat /proc/interrupts
CPU0 CPU1
0: 5671297 187676 XT-PIC timer
1: 3238 9 IO-APIC-edge i8042
6: 0 3 IO-APIC-edge floppy
8: 1 1 IO-APIC-edge rtc
9: 0 0 IO-APIC-level acpi
12: 12103 115 IO-APIC-edge i8042
14: 177 12 IO-APIC-edge ide0
15: 273 12 IO-APIC-edge ide1
50: 0 0 IO-APIC-level ehci_hcd:usb1
58: 0 0 IO-APIC-level CS46XX
217: 593317 1 IO-APIC-level ohci_hcd:usb2, eth0
225: 1340 5 IO-APIC-level libata, NVidia CK804
233: 52406 9 IO-APIC-level libata
NMI: 0 0
LOC: 5858860 5858859
ERR: 0
MIS: 0

# lsmod
Module Size Used by
nvidia 4705972 0
snd_cs46xx 71624 0
snd_intel8x0 26140 0
snd_ac97_codec 84004 2 snd_cs46xx,snd_intel8x0
snd_ac97_bus 2304 1 snd_ac97_codec
snd_seq_midi 6048 0
snd_pcm_oss 34976 0
snd_mixer_oss 13568 1 snd_pcm_oss
snd_seq_oss 27392 0
snd_seq_midi_event 5888 2 snd_seq_midi,snd_seq_oss
snd_seq 41808 5 snd_seq_midi,snd_seq_oss,snd_seq_midi_event
via_rhine 18056 0
snd_rawmidi 17440 2 snd_cs46xx,snd_seq_midi
snd_seq_device 6284 4 snd_seq_midi,snd_seq_oss,snd_seq,snd_rawmidi
snd_pcm 59140 4 snd_cs46xx,snd_intel8x0,snd_ac97_codec,snd_pcm_oss
snd_timer 16772 2 snd_seq,snd_pcm
i2c_nforce2 6016 0
snd 40036 11 snd_cs46xx,snd_intel8x0,snd_ac97_codec,snd_pcm_oss,snd_mixer_oss,snd_seq_oss,snd_seq,snd_rawmidi,snd_seq_device,snd_pcm,snd_timer
soundcore 7136 1 snd
snd_page_alloc 7304 3 snd_cs46xx,snd_intel8x0,snd_pcm


# emerge --info
Portage 2.1.2_rc3-r3 (default-linux/x86/2006.0, gcc-4.1.1, glibc-2.4-r4, 2.6.18-gentoo-r3 i686)
=================================================================
System uname: 2.6.18-gentoo-r3 i686 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+
Gentoo Base System version 1.12.6
Last Sync: Thu, 14 Dec 2006 11:00:01 +0000
ccache version 2.3 [enabled]
dev-java/java-config: 1.3.7, 2.0.30
dev-lang/python: 2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache: 2.3
sys-apps/sandbox: 1.2.17
sys-devel/autoconf: 2.13, 2.60
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils: 2.16.1-r3
sys-devel/gcc-config: 1.3.14
sys-devel/libtool: 1.5.22
virtual/os-headers: 2.6.17-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=k8 -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/lib/fax /usr/lib/mozilla/defaults/pref /usr/share/X11/xkb /usr/share/config /var/spool/fax/etc"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/java-config/vms/ /etc/revdep-rebuild /etc/splash /etc/terminfo /etc/texmf/web2c"
CXXFLAGS="-O2 -march=k8 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig ccache distcc distlocks loadpolicy metadata-transfer parallel-fetch sandbox sfperms"
GENTOO_MIRRORS="http://gentoo.osuosl.org/ http://distfiles.xgl-coffee.org/ http://www.schokokeks.org/~hanno/snapshots ftp://ftp.gtlib.cc.gatech.edu/pub/gentoo ftp://ftp.ussg.iu.edu/pub/linux/gentoo ftp://ftp.ucsb.edu/pub/mirrors/linux/gentoo/ http://gentoo.seren.com/gentoo http://gentoo.chem.wisc.edu/gentoo/ ftp://gentoo.chem.wisc.edu/gentoo/ http://cudlug.cudenver.edu/gentoo/ ftp://cudlug.cudenver.edu/pub/mirrors/distributions/gentoo/ http://gentoo.mirrors.pair.com/ ftp://gentoo.mirrors.pair.com/ http://gentoo.ccccom.com ftp://gentoo.ccccom.com http://gentoo.mirrors.tds.net/gentoo ftp://gentoo.mirrors.tds.net/gentoo http://gentoo.netnitco.net ftp://gentoo.netnitco.net/pub/mirrors/gentoo/source/ http://mirror.tucdemonic.org/gentoo/ http://mirrors.acm.cs.rpi.edu/gentoo/ ftp://ftp.ndlug.nd.edu/pub/gentoo/ ftp://gentoo.agsn.ca/ http://open-systems.ufl.edu/mirrors/gentoo http://gentoo.llarian.net/ ftp://gentoo.llarian.net/pub/gentoo http://gentoo.binarycompass.org http://gentoo.mirrored.ca/ ftp://gentoo.mirrored.ca/ http://mirror.datapipe.net/gentoo http://mirror.datapipe.net/gentoo http://gentoo.eliteitminds.com http://gentoo.cs.lewisu.edu/gentoo/ ftp://linux.cs.lewisu.edu/gentoo/ http://prometheus.cs.wmich.edu/gentoo http://modzer0.cs.uaf.edu/public/gentoo/ http://mirror.usu.edu/mirrors/gentoo/ ftp://mirror.usu.edu/mirrors/gentoo/ http://lug.mtu.edu/gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage /usr/portage/local/layman/xeffects"
SYNC="rsync://deeds.acjlaw.net/gentoo-portage"
USE="x86 16bit 3dnow 3dnowext 7zip S3TC X X509 Xaw3d a52 aac aalib acl acpi activefilter akode alsa alsa_cards_cs46xx alsa_cards_intel8x0 amd ao aotuv apache2 apm arts artswrappersuid artworkextra asf audiofile bash-completion bdf beagle berkdb bidi bigpatch bitmap-fonts bittorrent bl bonobo cairo cdda cdf cdio cdparanoia cdr cdrom cgi chroot cjk clamav clamd clanJavaScript cli corba cpudetection cracklib crypt css cups curlwrappers dbus dbx dga dhcp dillo dio directfb dlloader dmi dpms dri dts dv dvb dvbplayer dvd dvdr dvdread dynagraph eap-tls ecc edl eds effects elibc_glibc emboss emoticon encode enscript epiphany epson escreen esd evo exif exscalibar fame fastcgi fat fb ffmpeg firefox flac flash flatfile fmod font-server fontconfig foomaticdb fortran fping fpx ftp gb gcj gdbm gif gimp gimpprint glitz gnokii gnome gnustep gnutls gphoto2 gpm graphviz gs gstreamer gstreamer010 gstreamer08 gtk gtk2 gtkhtml gzip hal hardened hardenedphp hash hbci hddtemp hdf hdf5 hfs hlapi hpn iconv id3 idn imagemagick imap imlib inkjar input_devices_evdev input_devices_keyboard input_devices_mouse ipod ipv6 isdnlog jack jack-tmpfs java javascript jbig jce jfs jikes joystick jpeg jpeg2k jumpplay justify kde kdeenablefinal kdepim kdexdeltas kerberos kernel_linux kexi kipi koffice-plugin kqemu krb4 ladcca ladspa lame lapack lcms libcaca libclamav libg++ libgda libsamplerate libwww lids live lm_sensors logitech-mouse logrotate lpr lua lzo lzw mad maildir math matroska maya-shaderlibrary mbox mbrola mcve md5sum menubar mgetty mhash mikmod mime ming mjpeg mmx mmxext mod modplug motif mozsvg mp3 mp4 mp4live mpd-mad mpeg mpeg2 mpi mplayer musicbrainz mysql mysqli nas ncurses netjack netpbm network nextaw nforce2 nfs nls nptl nptlonly nsplugin ntfs nvidia odbc ofx ogg oggvorbis ole on-the-fly-crypt openal openexr opengl oss pam panel-plugin pcre pda pdf pdfkit perl pfpro php plotutils plugin pmu png posix postgres povray ppds pppd preview-latex python qemu-fast qt3 qt4 quicktime quotas 
quotes rar rc5 rdesktop readline real reflection reiser4 reiserfs rtc rtsp sasl sblive scanner sdl sensord server session seti setup-plugin sftp sftplogging shout silverxp skins slp smartcard sms sndfile sockets sox speedo spell spl spreadsheet sql sqlite sqlite3 sse sse2 ssl stream submenu subtitles svg svga svgz tcpd tetex tga theora thesaurus threads thumbnail thunar-vfs tidy tiff timidity tokenizer tomsfastmath toolbar transcode truetype truetype-fonts type1 type1-fonts udev unicode userland_GNU v4l vcd vcdimager vdr vfat vhosts video_cards_nv video_cards_nvidia video_cards_vesa videos vidix vim vim-with-x visualization vmdbmysql vmdbpostgres vorbis win32codecs wma wordperfect wsconvert wv x264 xanim xattr xcomposite xfs xine xml xorg xpm xprint xscreensaver xsettings xv xvid xvmc yaepg yv12 zlib"
Unset: CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_RSYNC_EXTRA_OPTS
yabbadabbadont
Advocate


Joined: 14 Mar 2003
Posts: 4791
Location: 2 exits past crazy

Posted: Fri Dec 15, 2006 4:19 am

Try using the 'nv' driver instead of 'nvidia' and be sure to use eselect to change your opengl to the xorg version. If it helps, then at least you know where to start experimenting to get a solution.
_________________
Bones McCracker wrote:
On the other hand, regex is popular with the ladies.
redwood
Guru


Joined: 27 Jan 2006
Posts: 306

Posted: Fri Dec 15, 2006 6:59 am    Post subject: tried "nv" driver to no avail

I generated a new xorg.conf with "Xorg --configure"
and moved xorg.conf.new to /etc/X11/xorg.conf.
Then I started an xfce4 session,
opened an xterm and ran "emerge -puvDt world" to get some disk activity,
and watched `tail -f /var/log/everything/current`:

Dec 15 01:39:27 [kernel] ata4.00: limiting speed to UDMA/66
Dec 15 01:39:27 [kernel] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x180000 action 0x2 frozen
Dec 15 01:39:27 [kernel] ata4.00: (BMDMA stat 0x21)
Dec 15 01:39:27 [kernel] ata4.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout)
Dec 15 01:39:34 [kernel] ata4: port is slow to respond, please be patient
Dec 15 01:39:57 [kernel] ata4: port failed to respond (30 secs)
Dec 15 01:39:57 [kernel] ata4: soft resetting port
Dec 15 01:40:01 [cron] (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Dec 15 01:40:04 [kernel] ata4: port is slow to respond, please be patient
Dec 15 01:40:27 [kernel] ata4: port failed to respond (30 secs)
Dec 15 01:40:27 [kernel] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Dec 15 01:40:27 [kernel] ATA: abnormal status 0xD0 on port 0x967
- Last output repeated 4 times -
Dec 15 01:40:57 [kernel] ata4.00: qc timeout (cmd 0xec)
Dec 15 01:40:57 [kernel] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Dec 15 01:40:57 [kernel] ata4.00: revalidation failed (errno=-5)
Dec 15 01:40:57 [kernel] ata4: failed to recover some devices, retrying in 5 secs

Dec 15 01:41:02 [kernel] ata4: hard resetting port
Dec 15 01:41:03 [kernel] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Dec 15 01:41:03 [kernel] ata4.00: configured for UDMA/66
Dec 15 01:41:03 [kernel] ata4: EH complete
Dec 15 01:41:03 [kernel] SCSI device sdc: 488397168 512-byte hdwr sectors (250059 MB)
Dec 15 01:41:03 [kernel] sdc: Write Protect is off
Dec 15 01:41:03 [kernel] SCSI device sdc: drive cache: write back



And some more dmesg output:
ata3: COMRESET failed (device not ready)
ata3: hardreset failed, retrying in 5 secs
ata3: hard resetting port
ata3: port is slow to respond, please be patient
ata3: port failed to respond (30 secs)
ata3: COMRESET failed (device not ready)
ata3: reset failed, giving up
ata3.00: disabled
ata3: EH complete
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 35252182
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 70644942
Buffer I/O error on device dm-5, logical block 1053
lost page write due to I/O error on dm-5
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 70886502
Buffer I/O error on device dm-5, logical block 61456
lost page write due to I/O error on dm-5
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 76407982
Buffer I/O error on device dm-6, logical block 393241
lost page write due to I/O error on dm-6
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 76408014
Buffer I/O error on device dm-6, logical block 393245
lost page write due to I/O error on dm-6
Buffer I/O error on device dm-6, logical block 393246
lost page write due to I/O error on dm-6
Buffer I/O error on device dm-6, logical block 393247
lost page write due to I/O error on dm-6
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 76417382
Buffer I/O error on device dm-6, logical block 395611
lost page write due to I/O error on dm-6
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 76432854
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 76449574
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 76457318
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 76468070
Buffer I/O error on device dm-6, logical block 408372
lost page write due to I/O error on dm-6
Aborting journal on device dm-5.
ReiserFS: dm-0: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [21119 260 0x0 SD]
ext3_abort called.
EXT3-fs error (device dm-5): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 76469094
Buffer I/O error on device dm-6, logical block 408690
lost page write due to I/O error on dm-6
ata3: exception Emask 0x10 SAct 0x0 SErr 0x150000 action 0x2 frozen
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
Buffer I/O error on device dm-6, logical block 409085
lost page write due to I/O error on dm-6
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdb, sector 76470118
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
raid1: Disk failure on sdb3, disabling device.
Operation continuing on 1 devices
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
__journal_remove_journal_head: freeing b_committed_data
sd 2:0:0:0: rejecting I/O to offline device
EXT3-fs error (device dm-5): ext3_find_entry: reading directory #327701 offset 0
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:1, o:0, dev:sdb3
disk 1, wo:0, o:1, dev:sdc3
RAID1 conf printout:
--- wd:1 rd:2
disk 1, wo:0, o:1, dev:sdc3
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
REISERFS: abort (device dm-3): Journal write error in flush_commit_list
REISERFS: Aborting journal for filesystem on dm-3
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
ata3: port is slow to respond, please be patient
ata3: port failed to respond (30 secs)
ata3: soft resetting port
ata3: port is slow to respond, please be patient
ata3: port failed to respond (30 secs)
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3: EH pending after completion, repeating EH (cnt=4)
ata3: EH complete
ata3.00: detaching (SCSI 2:0:0:0)


So I'm going to reboot this system before my volumes are trashed.
Dominique_71
Veteran


Joined: 17 Aug 2005
Posts: 1877
Location: Switzerland (Romandie)

Posted: Mon Dec 18, 2006 5:39 pm

It looks like a hardware problem to me. /proc/interrupts shows that nvidia shares its IRQ with libata. You should try changing the IRQ settings in your BIOS and/or moving some card(s) around in your box. Further reading: http://www.gentoo.org/doc/en/articles/hardware-stability-p2.xml
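To check which handlers actually share an IRQ line, the /proc/interrupts listing can be parsed. Below is a minimal Python sketch, assuming the 2.6.18-era layout shown in the first post: a CPU header row, then the IRQ number, one count per CPU, the controller type, and a comma-separated handler list. Newer kernels print extra columns, so treat it as illustrative. It only reports handler names; working out which driver registered a name such as "NVidia CK804" still has to be done by hand.

```python
def shared_irqs(text):
    """Return {irq: [handler, ...]} for IRQ lines with more than one handler.

    Assumes the 2.6.18-era /proc/interrupts layout: a "CPU0 CPU1 ..." header,
    then "IRQ: <one count per CPU> <controller type> <handler>[, <handler>...]".
    """
    lines = [line for line in text.splitlines() if line.strip()]
    ncpu = len(lines[0].split())            # header row, e.g. "CPU0 CPU1"
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        if not label.strip().isdigit():     # skip NMI, LOC, ERR, MIS rows
            continue
        fields = rest.split()
        if len(fields) <= ncpu + 1:         # counts and type, but no handler
            continue
        # Handler names may contain spaces ("NVidia CK804"); multiple
        # handlers on one IRQ are separated by ", ".
        handlers = " ".join(fields[ncpu + 1:]).split(", ")
        if len(handlers) > 1:
            shared[int(label)] = handlers
    return shared
```

On a live system it would be called as `shared_irqs(open("/proc/interrupts").read())`. On the listing from the first post, it reports IRQ 217 (ohci_hcd:usb2, eth0) and IRQ 225 (libata, NVidia CK804) as shared.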
_________________
"Confirm You are a robot." - the singularity
redwood
Guru
Guru


Joined: 27 Jan 2006
Posts: 306

Posted: Mon Dec 18, 2006 7:31 pm    Post subject: [SOLVED] upgraded to [~][M]nvidia-drivers-1.0.9742

I upgraded to the beta nvidia-drivers.

I did still get one kernel error message:
Dec 18 12:29:19 [kernel] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x380000 action 0x2
Dec 18 12:29:19 [kernel] ata4.00: (BMDMA stat 0x20)
Dec 18 12:29:19 [kernel] ata4.00: tag 0 cmd 0xc8 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)
Dec 18 12:29:19 [kernel] ata4: soft resetting port


But the SATA hard drive recovered and seems to be working, with no further errors in
'tail -f /var/log/everything/current'.
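The SErr value in these messages is the contents of the port's SATA SError register. Below is a small Python sketch that decodes it. It uses the bit names from the SATA/AHCI SError definition, which libata also defines as its SERR_* constants. It is only an aid for reading the log, not a diagnosis.

```python
# SError bit names per the SATA/AHCI SError register definition
# (libata's SERR_* constants use the same bit positions).
SERR_BITS = {
    0: "recovered data integrity error",
    1: "recovered communications error",
    8: "transient data integrity error",
    9: "persistent communication or data integrity error",
    10: "protocol error",
    11: "internal error",
    16: "PhyRdy change",
    17: "PHY internal error",
    18: "COMWAKE detected",
    19: "10b to 8b decode error",
    20: "disparity error",
    21: "CRC error",
    22: "handshake error",
    23: "link sequence error",
    24: "transport state transition error",
    25: "unknown FIS type",
    26: "device exchanged",
}


def decode_serr(value):
    """Return the names of the SError bits set in value, lowest bit first."""
    if isinstance(value, str):
        value = int(value, 16)  # accepts the "0x380000" form libata prints
    return [SERR_BITS.get(bit, f"reserved bit {bit}")
            for bit in range(32) if value & (1 << bit)]
```

For the values seen in this thread, 0x380000 decodes to 10b to 8b decode, disparity and CRC errors. 0x180000 decodes to 10b to 8b decode and disparity errors, and 0x150000 to PhyRdy change, COMWAKE and a disparity error. All of these are PHY/link-level conditions.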


I'll keep monitoring it for a while though.
I'm not sure libata is sharing an IRQ with the video card.
From reading in a forum, I learned that not all video cards require an IRQ.
The "NVidia CK804" entry may be the onboard AC'97 sound (it's kind of confusing, since all the
motherboard chips are "nForce4 CK804").


I also tried the VESA driver, and everything worked without hard disk problems, but graphics were
very, very slow. I could not get a working xorg.conf with "nv": all I got was a black screen with no X.
Dominique_71
Veteran


Joined: 17 Aug 2005
Posts: 1877
Location: Switzerland (Romandie)

Posted: Mon Dec 18, 2006 10:08 pm

If you are using the nvidia driver, it will use an IRQ (needed for 3D, as far as I know), but the 2D nv driver does not use any IRQ.

IRQs in a PC are a mess. The first PC (8088) had a PIC with only 8 IRQ channels. With the 286, the PIC setup was expanded to 16 IRQs. That is not much for modern PCs, so we now have the APIC, which can manage more IRQs. The problem with the APIC is that it is not a completely new interface but a layer on top of the PIC. I think that is why you have this shared IRQ.

If you take a look in your BIOS, you will see that the IRQ settings there only know about the PIC (16 IRQs). So if you want to track down this issue (and I recommend you do, since you will end up with a more reliable and stable system), the best thing to do is to disable the APIC in Linux with the noapic boot parameter in grub, or to disable it in the BIOS. Then you will see the same IRQs in the BIOS as in /proc/interrupts, and it will be easier to find your way through this problem. Once that is done, it is up to you whether to use the PIC or the APIC. Some people say the APIC has more overruns than the PIC.

A simple way to free up an IRQ is to add the acpi=off boot parameter in grub. It completely disables ACPI, so it is not an option on a laptop.
kamelli952
n00b


Joined: 06 Dec 2006
Posts: 4

Posted: Sun Dec 24, 2006 9:25 pm

Try using the 2.6.19.1 kernel. I had about the same problem, and that seemed to solve it.
Stolz
Moderator


Joined: 19 Oct 2003
Posts: 3028
Location: Hong Kong

Posted: Mon Dec 25, 2006 12:22 am    Post subject: Re: nvidia + X + kernel => high latency/crashing

redwood wrote:
As soon as I start X the system starts randomly stalling/freezing and the hard disk light stays lit. Sometimes the system recovers and I can get ~10 minutes of work done before it again locks up.
Other times I eventually crash and corrupt my ext2 lvm2 volumes /var/tmp /tmp and /mnt/backups.


I was having similar problems, but I didn't see any errors in dmesg. The problem was that kernels >=2.6.18 force the IOMMU, and the IOMMU forces the kernel's AGP support. The solution is in this post.

Hope it helps.

--Stolz
redwood
Guru


Joined: 27 Jan 2006
Posts: 306

Posted: Wed Dec 27, 2006 4:07 am    Post subject: 2.6.19-r2 seems to be stable

I upgraded the kernel to gentoo-sources 2.6.19-r2
and the M2N4-SLI BIOS to version 704

(which did not go off without glitches: after using the EZ-Flash BIOS update utility, the system would only beep on reboot. I had to unplug the computer, remove the battery and re-jumper the BIOS pins to reset the flash RAM. After doing all this and turning the
system back on, it finally booted with the new 704 BIOS)

I also probably made some changes to the kernel configuration, but I've finally got a system that seems to be running X without trashing my filesystems.

Today (via ssh) I'm getting some error messages from dmesg like the following:

ReiserFS: dm-13: warning: vs-13075: reiserfs_read_locked_inode: dead inode read from disk [295593 297014 0x0 SD]. This is likely to be race with knfsd. Ignore
audacious invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
[<c013ac4d>] out_of_memory+0x6c/0x18f
[<c013c0b3>] __alloc_pages+0x1fa/0x284
[<c013d590>] __do_page_cache_readahead+0xbd/0x1e8
[<c04bf15a>] io_schedule+0x26/0x30
[<c041622a>] <4>metalog invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
dm_table_any_congested+0x32/0x48
[<c0414881>] dm_any_congested+0x2f/0x35
[<c013a0aa>] filemap_nopage+0x176/0x348
[<c013ac4d>] [<c01426bc>] __handle_mm_fault+0x166/0x7a0
[<c012d334>] out_of_memory+0x6c/0x18f
hrtimer_try_to_cancel+0x3c/0x42
[<c013c0b3>] __alloc_pages+0x1fa/0x284
[<c012d44e>] hrtimer_nanosleep+0x3d/0xf0
[<c013d590>] [<c0112a78>] do_page_fault+0x219/0x51d
[<c012d1ce>] __do_page_cache_readahead+0xbd/0x1e8
hrtimer_wakeup+0x0/0x18
[<c0425c1c>] [<c011285f>] sock_aio_write+0xf6/0x102
do_page_fault+0x0/0x51d
[<c04c0679>] error_code+0x39/0x40
[<c013a0aa>] =======================
Mem-info:
DMA per-cpu:
CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
CPU 1: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Normal per-cpu:
CPU 0: Hot: hi: 186, btch: 31 usd: 4 Cold: hi: 62, btch: 15 usd: 15
CPU 1: Hot: hi: 186, btch: 31 usd: 30 Cold: hi: 62, btch: 15 usd: 47
Active:104768 inactive:105694 dirty:0 writeback:0 unstable:0 free:1822 slab:4282 mapped:115 pagetables:1093
filemap_nopage+0x176/0x348
DMA free:3544kB min:68kB low:84kB high:100kB active:3608kB inactive:3408kB present:16256kB pages_scanned:13290 all_unreclaimable? yes
[<c01426bc>] lowmem_reserve[]:__handle_mm_fault+0x166/0x7a0
[<c012ae9c>] 0 873


But at least the system isn't crashing. I had been running audacious + audacity just fine before the holidays (though judging from the dmesg output above, maybe something crashed; I'll see when I get back into the office).
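To see which processes triggered the OOM killer in output like the above, a rough Python sketch can pull the process names out of the "invoked oom-killer" lines. It also copes with the "<4>" printk log-level prefix that shows up where two messages got interleaved. Keep in mind that the named process is the one whose memory allocation triggered the OOM killer. It is not necessarily the one that got killed.

```python
import re
from collections import Counter

# Matches e.g. "audacious invoked oom-killer: gfp_mask=0x201d2, ...".
# The optional "<4>" is a printk log-level prefix left over when messages
# from the two CPUs are interleaved, as in the dmesg excerpt above.
OOM_RE = re.compile(r"(?:<\d>)?([^\s<>\[\]]+) invoked oom-killer")


def oom_invokers(log_text):
    """Count how often each process name triggered the OOM killer."""
    return Counter(OOM_RE.findall(log_text))
```

On the excerpt above it finds two invocations, one from audacious and one from metalog.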

Following the Gentoo Wiki on installing ardour + jack,
I tried to compile a kernel with "realtime lsm" built as a module, but kept getting a kernel panic during boot,
despite adding the modules "capability" and "realtime" to /etc/modules.autoload.d/kernel-2.6.
Dominique_71
Veteran


Joined: 17 Aug 2005
Posts: 1877
Location: Switzerland (Romandie)

Posted: Wed Dec 27, 2006 11:58 am

If you want to do serious audio work, I recommend installing a kernel from this overlay: Pro-Audio Gentoo Overlay (Wiki, forum thread). Both realtime-lsm and rlimits (with and without PAM) are supported for managing priorities. But be aware that you will run into trouble with such an rt kernel and your shared IRQ. Both 2.6.16-rt29 and 2.6.19-rt15 work fine in my box with rt-lsm, gensplash and the nvidia driver (2.6.19-rt15 doesn't work with alsa-driver but works fine with the in-kernel ALSA driver; 2.6.16-rt29 works fine with both ALSA drivers).

Capability and realtime (realcap on 2.6.19) must be built as modules, but only realtime (or realcap) should be listed in /etc/modules.autoload.d/kernel-2.6. You must add an option telling the module to grant the realtime capability to the audio group, and you must be a member of the audio group:
Code:
realtime   gid=18
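The gid=18 above is the audio group's ID on that box, and it may differ on another system. Below is a small Python sketch, assuming the group is named "audio", that looks up the local GID with the standard grp module and builds the matching line for /etc/modules.autoload.d/kernel-2.6 (use "realcap" as the module name on 2.6.19). The helper names are made up for this sketch.

```python
import grp
import os


def realtime_autoload_line(group="audio", module="realtime"):
    """Build a modules.autoload.d line like "realtime gid=18" for this system.

    Raises KeyError if the group does not exist.
    """
    gid = grp.getgrnam(group).gr_gid
    return f"{module} gid={gid}"


def in_group(group="audio"):
    """True if the current process has the group as primary or supplementary GID.

    This checks the running session's credentials, so after adding yourself
    to the group you need to log in again before it reports True.
    """
    gid = grp.getgrnam(group).gr_gid
    return gid == os.getgid() or gid in os.getgroups()
```

On a box where audio has GID 18, `realtime_autoload_line()` returns the same "realtime gid=18" line as above.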

All times are GMT