drescherjm Advocate
Joined: 05 Jun 2004 Posts: 2790 Location: Pittsburgh, PA, USA
|
Posted: Tue Oct 05, 2010 12:51 pm Post subject: [SOLVED] strange badblocks / defective firmware Samsung F4 |
|
|
I got a new 2TB Samsung F4 drive (actually 3 of them), and since I want to use the second one for real data I decided to test it first to make sure it was not DOA.
After 40+ hours of testing I was surprised to see that badblocks found bad blocks, yet the drive's SMART data says the drive is totally fine and there were no drive-related messages in my dmesg.
Here is the output:
Code: |
jmd0 ~ # badblocks -svw /dev/sdf -o S2HGJ1BZ836643.txt
Checking for bad blocks in read-write mode
From block 0 to 1953514583
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 736 bad blocks found.
|
Now the SMART output:
Code: |
jmd0 ~ # smartctl --all /dev/sdf
smartctl 5.39.1 2010-01-28 r3054 [x86_64-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: SAMSUNG HD204UI
Serial Number: S2HGJ1BZ836643
Firmware Version: 1AQ10001
User Capacity: 2,000,398,934,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 6
Local Time is: Mon Oct 4 19:47:42 2010 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (21060) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
SCT capabilities: (0x003f) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 0
2 Throughput_Performance 0x0026 252 252 000 Old_age Always - 0
3 Spin_Up_Time 0x0023 068 068 025 Pre-fail Always - 9724
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 2
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0
8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 56
10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 252 252 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 2
181 Program_Fail_Cnt_Total 0x0022 252 252 000 Old_age Always - 0
191 G-Sense_Error_Rate 0x0022 252 252 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
194 Temperature_Celsius 0x0002 064 064 000 Old_age Always - 28 (Lifetime Min/Max 22/36)
195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 252 252 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 0
223 Load_Retry_Count 0x0032 252 252 000 Old_age Always - 0
225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 2
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
Note: selective self-test log revision number (0) not 1 implies that no selective self-test has ever been run
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Completed [00% left] (0-65535)
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
|
The machine is an Intel Core 2 Quad Q9550 running at 3.1 GHz instead of 2.83 GHz. Yes, I know overclocking can cause this. However, the system has been rock stable (24/7/365) at this overclock (not a single kernel panic ...) for nearly 2 years, since I purchased it in November of 2008. I guess I should test the drive on my i7 box. The interesting thing is that the entire first pass returned no errors at all.
And here is a link to the badblocks output:
http://pastebin.com/QPURQ6Au
In a hard drive forum a user pointed out that this was strange, since the bad regions were 32 KB while badblocks was testing 64 1K blocks at a time. The pattern was always 32 KB of consecutive bad blocks, scattered throughout the disk.
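As a rough sanity check of that observation, here is a minimal Python sketch of the arithmetic. The numbers come from the badblocks run above; the premise that every bad run is exactly 32 KB is the forum observation, and the snippet does not verify it against the pastebin list.

```python
# Figures taken from the badblocks output in this post.
BLOCK_SIZE = 1024        # block size of the first run, in bytes
BATCH_BLOCKS = 64        # blocks tested at a time in the first run
BAD_BLOCKS = 736         # "Pass completed, 736 bad blocks found."
RUN_BYTES = 32 * 1024    # assumed size of every bad run (forum observation)

blocks_per_run = RUN_BYTES // BLOCK_SIZE
full_runs, leftover = divmod(BAD_BLOCKS, blocks_per_run)
batch_bytes = BATCH_BLOCKS * BLOCK_SIZE

print(f"{blocks_per_run} blocks per run")
print(f"{full_runs} runs, {leftover} blocks left over")
print(f"each run is {RUN_BYTES / batch_bytes:.0%} of a {batch_bytes // 1024} KB batch")
```

If the premise holds, the 736 bad blocks divide evenly into 32 KB runs, and each run covers only half of a 64 KB batch, which is why the result looked odd.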
Looking at the data on the disk after the 4th pass completed, the result is totally unexpected:
Code: | jmd0 ~ # dd if=/dev/sdf bs=1024 skip=1862898656 count=33 | hexdump -C
00000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
*
00008000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00008400
33+0 records in
33+0 records out
33792 bytes (34 kB) copied, 0.000801165 s, 42.2 MB/s
|
After the 4th pass every byte on the disk should be 00; ff was the pattern of the previous pass.
A second bad block shows the same pattern:
Code: | jmd0 ~ # dd if=/dev/sdf bs=1024 skip=1726351584 count=33 | hexdump -C
00000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
*
00008000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00008400
33+0 records in
33+0 records out
33792 bytes (34 kB) copied, 0.0218385 s, 1.5 MB/s |
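To make the "stale pattern" reading of these dumps explicit, here is a small Python sketch. It collapses a buffer into runs of identical bytes (much like the `*` lines in hexdump) and flags runs that still hold the previous pass's pattern. The buffer is a synthetic stand-in for the 33 KiB read above, not real disk data, and the helper names are made up for illustration.

```python
from itertools import groupby

# Pattern order used by the badblocks -w run shown earlier in this post.
PATTERNS = [0xAA, 0x55, 0xFF, 0x00]

def constant_runs(data: bytes):
    """Return (offset, length, byte_value) for each run of identical bytes."""
    runs, offset = [], 0
    for value, group in groupby(data):
        length = sum(1 for _ in group)
        runs.append((offset, length, value))
        offset += length
    return runs

def previous_pattern(pattern: int) -> int:
    """Pattern written in the pass before `pattern`; raises for the first pass."""
    index = PATTERNS.index(pattern)
    if index == 0:
        raise ValueError("the first pattern has no predecessor in this run")
    return PATTERNS[index - 1]

# Synthetic copy of the dump: 32 KiB of ff followed by 1 KiB of 00.
sample = bytes([0xFF]) * 0x8000 + bytes([0x00]) * 0x400
expected = 0x00  # what the 4th pass should have left on disk

for offset, length, value in constant_runs(sample):
    stale = value == previous_pattern(expected)
    note = "  <- previous pass's pattern" if stale else ""
    print(f"{offset:08x} {length:6d} bytes of {value:02x}{note}")
```

Run against a real dump, the same loop would show whether a mismatching region holds random garbage or exactly the data from one pass earlier.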
After that a user on the hardware forum suggested I change the block size to 4096 (since the drive has 4096-byte sectors) and retry:
Code: |
jmd0 ~ # badblocks -svw -c 32 -b 4096 /dev/sdf -o S2HGJ1BZ836643_test2.txt
Checking for bad blocks in read-write mode
From block 0 to 488378645
Testing with pattern 0xaa: done
|
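The block ranges badblocks prints in both runs follow directly from the capacity smartctl reports. A short Python sketch of that arithmetic (the function name is only for illustration):

```python
CAPACITY_BYTES = 2_000_398_934_016  # "User Capacity" from the smartctl output

def last_block(capacity_bytes: int, block_size: int) -> int:
    """Index of the last whole block, counting from 0."""
    return capacity_bytes // block_size - 1

print(last_block(CAPACITY_BYTES, 1024))  # first run, 1 KiB blocks
print(last_block(CAPACITY_BYTES, 4096))  # second run, -b 4096
```

Both values match the "From block 0 to ..." lines, so block numbers from the two runs can be converted to byte offsets by multiplying with the respective block size.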
So now it's done writing the first pass and is reading it back. And I have bad blocks again:
Code: |
22320280
22320281
22320282
22320283
22320284
22320285
22320286
22320287
132451576
132451577
132451578
132451579
132451580
132451581
132451582
132451583
184302392
184302393
184302394
184302395
184302396
184302397
184302398
184302399
234629240
234629241
234629242
234629243
234629244
234629245
234629246
234629247
282862744
282862745
282862746
282862747
282862748
282862749
282862750
282862751
327766616
327766617
327766618
327766619
327766620
327766621
327766622
327766623
|
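A list like this is easier to read when grouped into contiguous runs. Below is a minimal Python sketch; the start values are copied from the list above, while in practice the numbers would be read from the file written by -o. The helper name is hypothetical.

```python
def group_runs(blocks):
    """Collapse a sorted list of block numbers into (first, count) runs."""
    runs = []
    for block in blocks:
        if runs and block == runs[-1][0] + runs[-1][1]:
            # Block directly follows the current run, so extend it.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((block, 1))
    return runs

BLOCK_SIZE = 4096
# First block of each group in the list above; each group has 8 entries.
starts = [22320280, 132451576, 184302392, 234629240, 282862744, 327766616]
bad_blocks = [start + i for start in starts for i in range(8)]

for first, count in group_runs(bad_blocks):
    print(f"block {first}: {count} blocks = {count * BLOCK_SIZE // 1024} KB, "
          f"byte offset {first * BLOCK_SIZE:#x}")
```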
Notice these are again 32 KB runs (8 blocks of 4096 bytes each). And again they exhibit the same pattern:
Code: | jmd0 ~ # dd if=/dev/sdf bs=4096 skip=22320280 count=9 | hexdump -C
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00008000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa |................|
*
00009000
9+0 records in
9+0 records out
36864 bytes (37 kB) copied, 5.2471e-05 s, 703 MB/s |
The data after pass 1 should be all aa, not 00. 00 is what the data was after the 4th pass of the first run. _________________ John
My gentoo overlay
Instructions for overlay
Last edited by drescherjm on Fri Dec 10, 2010 1:33 am; edited 5 times in total |
|
|
|
richard.scott Veteran
Joined: 19 May 2003 Posts: 1497 Location: Oxfordshire, UK
|
Posted: Tue Oct 05, 2010 4:50 pm Post subject: |
|
|
In this output it doesn't say you have bad blocks:
Code: | jmd0 ~ # badblocks -svw -c 32 -b 4096 /dev/sdf -o S2HGJ1BZ836643_test2.txt
Checking for bad blocks in read-write mode
From block 0 to 488378645
Testing with pattern 0xaa: done |
How do you know that you have bad blocks when testing with 4K blocks?
Rich |
|
|
|
drescherjm Advocate
Joined: 05 Jun 2004 Posts: 2790 Location: Pittsburgh, PA, USA
|
Posted: Tue Oct 05, 2010 4:53 pm Post subject: |
|
|
Looking at the disk directly, I can see the problem: at random, some 32 KB blocks of data generated by badblocks are either not getting flushed to disk or the disk is ignoring the write. I now believe this is a kernel bug (as I cannot see a drive randomly rejecting a 32 KB write without issuing an error), but I will have to test other drives.
Quote: | In this output it doesn't say you have bad blocks: |
I have it save the bad block list to a file with the -o option. _________________ John
My gentoo overlay
Instructions for overlay |
|
|
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9678 Location: almost Mile High in the USA
|
Posted: Tue Oct 05, 2010 4:55 pm Post subject: |
|
|
Everything wears out. Overclocking will wear out CPUs faster.
You should clock your CPU back to stock frequency and see if you can reproduce the error. Perhaps the overclocking is starting to wear out the processor?
Are there any reports of bad blocks in your dmesg? If not, I would blame motherboard/CPU/RAM... _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
|
|
|
drescherjm Advocate
Joined: 05 Jun 2004 Posts: 2790 Location: Pittsburgh, PA, USA
|
Posted: Tue Oct 05, 2010 4:56 pm Post subject: |
|
|
No errors at all in dmesg.
Quote: | Everything wears out. Overclocking will wear out CPUs faster.
You should clock your CPU back to stock frequency and see if you can reproduce the error. Perhaps the overclocking is starting to wear out the processor? |
I agree. The thing is that I am not seeing a problem in anything else. The machine is on 24/7/365 and has been rock steady. No kernel panics. It records several GB of data for my HTPC daily and also serves as an svn server and backup server. None of these experience errors or glitches (at least none that I can see). That is easy enough to try, though. _________________ John
My gentoo overlay
Instructions for overlay |
|
|
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9678 Location: almost Mile High in the USA
|
Posted: Tue Oct 05, 2010 9:33 pm Post subject: |
|
|
Due to the multiplier-locked nature of these chips, the bumped clock frequency is probably also bumping the FSB speed. This will affect the northbridge. I think later northbridges contain SATA, and this could be the component stressed by the very high traffic going through it. Is the heatsink on the northbridge in good shape? Fan?
Still very weird.
Could even be an incompatibility between the SATA interface of the disk and the motherboard... that would be annoying. _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
|
|
|
drescherjm Advocate
Joined: 05 Jun 2004 Posts: 2790 Location: Pittsburgh, PA, USA
|
Posted: Tue Oct 05, 2010 9:42 pm Post subject: |
|
|
Thanks. I can see that this could be an issue, especially with many hours of continuous writing. I am not sure if the frequency of the PCIe bus or northbridge is fixed. I can back off the overclock tonight.
Quote: | Is the heatsink on the northbridge in good shape? Fan? |
I am pretty sure there is no fan. This is an Intel P45 board, an ASUS P5Q-Pro.
I would expect corruption to show up as random values on the disk. I cannot explain how 32 KB blocks (it's always 32 KB) are randomly not being flushed to disk without any errors recorded by the disk or the OS. _________________ John
My gentoo overlay
Instructions for overlay |
|
|
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9678 Location: almost Mile High in the USA
|
Posted: Tue Oct 05, 2010 9:52 pm Post subject: |
|
|
It seems kind of unintuitive but corruption does not necessarily mean data corruption. Sometimes program/logic control flow gets corrupted and whole chunks of data/code get executed wrong... _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
|
|
|
drescherjm Advocate
Joined: 05 Jun 2004 Posts: 2790 Location: Pittsburgh, PA, USA
|
Posted: Tue Oct 05, 2010 11:16 pm Post subject: |
|
|
I just touched the heatsink on the core and it was barely warm. It is a heatpipe design that connects with the MOSFETs (power circuitry) to keep them cool as well. I will reboot to set the frequency back to the stock 2.83 GHz. _________________ John
My gentoo overlay
Instructions for overlay |
|
|
|
Ant P. Watchman
Joined: 18 Apr 2009 Posts: 6920
|
Posted: Wed Oct 06, 2010 1:09 am Post subject: Re: strange badblocks problem. |
|
|
drescherjm wrote: | Code: | SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t] |
|
You should probably do a smartctl -t offline as well to see if it's really the disk. |
|
|
|
drescherjm Advocate
Joined: 05 Jun 2004 Posts: 2790 Location: Pittsburgh, PA, USA
|
Posted: Wed Oct 06, 2010 1:51 am Post subject: |
|
|
I now doubt that the disk is the problem, but I will test that at some point depending on where the stock frequency test takes me. I am around 2 hours 25 minutes into writing and it is at 51%, so I will not know whether the read test succeeded until I wake up. _________________ John
My gentoo overlay
Instructions for overlay |
|
|
|
drescherjm Advocate
Joined: 05 Jun 2004 Posts: 2790 Location: Pittsburgh, PA, USA
|
Posted: Wed Oct 06, 2010 1:02 pm Post subject: |
|
|
There were no errors on the first pass (without overclocking); however, there are 4 things to consider.
1. The first pass has worked before without this issue.
2. I was not logged into kde-4.4.
3. The drive was available at boot, not hotplugged like the last time.
4. Obviously the 270 MHz overclock of a 2830 MHz CPU was removed.
The second pass should be done in the next 5 to 6 hours. _________________ John
My gentoo overlay
Instructions for overlay |
|
|
|
drescherjm Advocate
Joined: 05 Jun 2004 Posts: 2790 Location: Pittsburgh, PA, USA
|
Posted: Wed Oct 06, 2010 4:49 pm Post subject: |
|
|
Two complete passes have finished successfully. The writing (and all reading) for the last half of the 2nd pass was done with a user logged into KDE. I will let it complete the 3rd pass this way, and then put the overclock back to see if the results are reproducible. _________________ John
My gentoo overlay
Instructions for overlay |
|
|
|
drescherjm Advocate
Joined: 05 Jun 2004 Posts: 2790 Location: Pittsburgh, PA, USA
|
Posted: Wed Oct 06, 2010 9:05 pm Post subject: |
|
|
On the third pass it appears to be back to its old ways. This is a partial listing because we are less than 2% into the reading of the disk. It seems the error rate is much worse this time.
Code: | jmd0 ~ # cat S2HGJ1BZ836643_test3.txt
251189752
251189753
251189754
251189755
251189756
251189757
251189758
251189759
297838360
297838361
297838362
297838363
297838364
297838365
297838366
297838367
341735416
341735417
341735418
341735419
341735420
341735421
341735422
341735423
382727320
382727321
382727322
382727323
382727324
382727325
382727326
382727327
454940152
454940153
454940154
454940155
454940156
454940157
454940158
454940159
486119800
486119801
486119802
486119803
486119804
486119805
486119806
486119807
|
_________________ John
My gentoo overlay
Instructions for overlay |
|
|
|
Herring42 Guru
Joined: 10 Mar 2004 Posts: 373 Location: Buckinghamshire
|
Posted: Thu Oct 07, 2010 11:40 am Post subject: |
|
|
Sounds like a hardware problem to me.
Thermal expansion, interference, anything like that really. When you look at the signal levels involved with SATA cables, and indeed, just the tracks on the motherboard it is statistically likely that you will get problems at least some of the time. Add a marginal component, and it becomes more likely. _________________ "The problem with quotes on the internet is that it is difficult
to determine whether or not they are genuine." -- Abraham Lincoln |
|
|
|
drescherjm Advocate
Joined: 05 Jun 2004 Posts: 2790 Location: Pittsburgh, PA, USA
|
Posted: Thu Oct 07, 2010 12:34 pm Post subject: |
|
|
If the problem was a SATA cable I would expect to see CRC errors in SMART.
To rule out the overclocking issue, I have the same board at work that I could test. The only problem is that I cannot downgrade the kernel at work to 2.6.32. Also, I would not expect this result from marginal components. I would more likely expect a single bit flip or some random data, not 32 KB writes failing to reach the disk. The bad regions are always 32 KB, and the problem is always that the entire 32 KB block on the disk holds the value it should have had after the previous pass. _________________ John
My gentoo overlay
Instructions for overlay |
|
|
|
drescherjm Advocate
Joined: 05 Jun 2004 Posts: 2790 Location: Pittsburgh, PA, USA
|
Posted: Fri Oct 08, 2010 12:44 pm Post subject: Re: strange badblocks problem. |
|
|
Ant_P wrote: | drescherjm wrote: | Code: | SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t] |
|
You should probably do a smartctl -t offline as well to see if it's really the disk. |
The disk says no errors.
Code: | jmd0 ~ # smartctl --all /dev/sdd
smartctl 5.39.1 2010-01-28 r3054 [x86_64-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: SAMSUNG HD204UI
Serial Number: S2HGJ1BZ836643
Firmware Version: 1AQ10001
User Capacity: 2,000,398,934,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 6
Local Time is: Fri Oct 8 08:40:57 2010 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (21060) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
SCT capabilities: (0x003f) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 0
2 Throughput_Performance 0x0026 056 056 000 Old_age Always - 19168
3 Spin_Up_Time 0x0023 077 068 025 Pre-fail Always - 7092
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 4
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0
8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 141
10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 252 252 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 4
181 Program_Fail_Cnt_Total 0x0022 252 252 000 Old_age Always - 0
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 144
192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
194 Temperature_Celsius 0x0002 064 064 000 Old_age Always - 29 (Lifetime Min/Max 22/36)
195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 252 252 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 3
223 Load_Retry_Count 0x0032 252 252 000 Old_age Always - 0
225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 4
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 138 -
Note: selective self-test log revision number (0) not 1 implies that no selective self-test has ever been run
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Completed [00% left] (0-65535)
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
|
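Since the two smartctl reports in this thread are long, a small Python sketch for diffing the attribute tables may help. The parsing rule (a numeric ID followed by ten whitespace-separated columns) is an assumption based on the output format shown here, and only a few rows are copied in as sample data.

```python
def parse_attributes(text: str) -> dict:
    """Map attribute name -> raw value string from smartctl attribute rows."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split(None, 9)
        # Attribute rows start with a numeric ID and have ten columns.
        if len(fields) == 10 and fields[0].isdigit():
            attrs[fields[1]] = fields[9].strip()
    return attrs

def changed(before: dict, after: dict) -> dict:
    """Return {name: (old_raw, new_raw)} for attributes whose raw value differs."""
    return {name: (before[name], after[name])
            for name in before
            if name in after and before[name] != after[name]}

# A few rows copied from the first and second smartctl reports in this thread.
first_report = """\
2 Throughput_Performance 0x0026 252 252 000 Old_age Always - 0
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
191 G-Sense_Error_Rate 0x0022 252 252 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 0
"""
second_report = """\
2 Throughput_Performance 0x0026 056 056 000 Old_age Always - 19168
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 144
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 3
"""

diff = changed(parse_attributes(first_report), parse_attributes(second_report))
for name, (old, new) in diff.items():
    print(f"{name}: {old} -> {new}")
```

On these rows the reallocation and CRC counters stay at 0, which is consistent with the drive itself not registering any write failure.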
_________________ John
My gentoo overlay
Instructions for overlay |
|
|
|
Ant P. Watchman
Joined: 18 Apr 2009 Posts: 6920
|
Posted: Fri Oct 08, 2010 9:20 pm Post subject: |
|
|
I'm guessing it might be a transient memory error. Try memtest86, and if that comes out fine, try it with an overclock slightly higher than normal. |
|
|
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9678 Location: almost Mile High in the USA
|
Posted: Fri Oct 08, 2010 10:53 pm Post subject: |
|
|
I'm tending to guess a motherboard issue at this time...
definitely should try another motherboard or computer... _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
|
|
|
drescherjm Advocate
Joined: 05 Jun 2004 Posts: 2790 Location: Pittsburgh, PA, USA
|
Posted: Fri Oct 08, 2010 11:27 pm Post subject: |
|
|
I have a few identical boards at work, but I will have to see what access I can get to them, given that the test takes a very long time. _________________ John
My gentoo overlay
Instructions for overlay |
|
|
|
drescherjm Advocate
Joined: 05 Jun 2004 Posts: 2790 Location: Pittsburgh, PA, USA
|
Posted: Sun Oct 10, 2010 7:29 pm Post subject: |
|
|
I put the drive in a different machine and it appears that the drive is fine. There are 2 bad blocks listed; however, I am pretty certain I caused them (messing with the drive mounting bracket while running the test). Unlike all the other errors, these were seen in dmesg. But again the hard drive itself did not log an error at all.
Code: | jmd1 ~ # badblocks -svw -c 32 -b 4096 /dev/sda -o S2HGJ1BZ836643_test4.txt
Checking for bad blocks in read-write mode
From block 0 to 488378645
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 2 bad blocks found. |
Code: | jmd1 ~ # smartctl --all /dev/sda
smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Device Model: SAMSUNG HD204UI
Serial Number: S2HGJ1BZ836643
Firmware Version: 1AQ10001
User Capacity: 2,000,398,934,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: Not recognized. Minor revision code: 0x28
Local Time is: Sun Oct 10 15:26:44 2010 EDT
==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (21060) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
SCT capabilities: (0x003f) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 24
2 Throughput_Performance 0x0026 056 056 000 Old_age Always - 19168
3 Spin_Up_Time 0x0023 069 068 025 Pre-fail Always - 9657
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 5
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0
8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 195
10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 252 252 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 5
181 Unknown_Attribute 0x0022 252 252 000 Old_age Always - 0
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 144
192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
194 Temperature_Celsius 0x0002 056 055 000 Old_age Always - 44 (Lifetime Min/Max 22/45)
195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 252 252 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 3
223 Load_Retry_Count 0x0032 252 252 000 Old_age Always - 0
225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 5
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 138 -
SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1
SMART Selective self-test log data structure revision number 0
Warning: ATA Specification requires selective self-test log data structure revision number = 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Completed [00% left] (0-65535)
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
jmd1 ~ #
|
_________________ John
My gentoo overlay
Instructions for overlay |
|
|
|
drescherjm Advocate
Joined: 05 Jun 2004 Posts: 2790 Location: Pittsburgh, PA, USA
|
Posted: Fri Dec 03, 2010 2:04 pm Post subject: |
|
|
According to the following page, the cause is a firmware problem on the new Samsung F4 drives.
http://sourceforge.net/apps/trac/smartmontools/wiki/SamsungF4EGBadBlocks
Quote: | The above suggests that the disk sometimes discards a pending 64 sector write command when a IDENTIFY DEVICE command is received. This data loss occurs silently. There is no error message in kernel log, SMART Error log, NCQ Command Error log page, or SATA Phy Event Counters log page.
Please note that the badblocks command reported "256 bad blocks" in the above test because the data read differs from the data written before. None of the tests resulted in actual bad (unreadable) blocks on the disk. Testing did not damage the disk itself. The problem is that new data already sent to the disk may not be written. Previously written data is not affected.
The problem could not be reproduced with the above test if any of the following conditions are met:
* Disk write cache is disabled.
* NCQ is disabled. This may not always be true as the c't lab also reported problems with NCQ disabled.
* A modified test version of smartctl which does not issue IDENTIFY DEVICE commands is used. Then all other SMART and non-SMART commands used by smartctl work without any data loss. |
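The "64 sector write" in the quote lines up with the 32 KB runs seen throughout this thread. A quick Python check of the arithmetic, assuming the drive exposes 512-byte logical sectors (that sector size is an assumption, not stated in the quote):

```python
SECTOR_BYTES = 512        # assumed logical sector size
SECTORS_PER_WRITE = 64    # from the quoted smartmontools analysis

lost_bytes = SECTOR_BYTES * SECTORS_PER_WRITE
print(lost_bytes // 1024, "KB per discarded write")
print(lost_bytes // 1024, "blocks at badblocks -b 1024")
print(lost_bytes // 4096, "blocks at badblocks -b 4096")
```

Under that assumption, one dropped write command accounts for exactly one of the 8-block runs in the 4096-byte tests above.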
_________________ John
My gentoo overlay
Instructions for overlay |
|
|
|
drescherjm Advocate
Joined: 05 Jun 2004 Posts: 2790 Location: Pittsburgh, PA, USA
|
|
|
|
|