Gentoo Forums
[SOLVED] strange badblocks / defective firmware Samsung F4
drescherjm
Advocate

Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Tue Oct 05, 2010 12:51 pm    Post subject: [SOLVED] strange badblocks / defective firmware Samsung F4

I got a new 2TB Samsung F4 drive (actually 3 of them) but I wanted to use the second for real data so I decided to test the drive first to make sure it was not DOA...

Anyway, after 40+ hours of testing I was surprised to see that badblocks found bad blocks, while the drive's SMART data says the drive is totally fine and there were no drive-related messages in my dmesg.

Here is the output:

Code:

jmd0 ~ # badblocks -svw /dev/sdf -o S2HGJ1BZ836643.txt
Checking for bad blocks in read-write mode
From block 0 to 1953514583
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 736 bad blocks found.


Now the SMART output:

Code:

jmd0 ~ # smartctl --all /dev/sdf
smartctl 5.39.1 2010-01-28 r3054 [x86_64-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model: SAMSUNG HD204UI
Serial Number: S2HGJ1BZ836643
Firmware Version: 1AQ10001
User Capacity: 2,000,398,934,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 6
Local Time is: Mon Oct 4 19:47:42 2010 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (21060) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   068   068   025    Pre-fail  Always       -       9724
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       2
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       56
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2
181 Program_Fail_Cnt_Total  0x0022   252   252   000    Old_age   Always       -       0
191 G-Sense_Error_Rate      0x0022   252   252   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   064   000    Old_age   Always       -       28 (Lifetime Min/Max 22/36)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       0
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       2

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]


Note: selective self-test log revision number (0) not 1 implies that no selective self-test has ever been run
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Completed [00% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


The machine is an Intel Core 2 Quad Q9550 running at 3.1 GHz instead of 2.83 GHz. Yes, I know overclocking can cause this. However, the system has been rock stable (24/7/365) at this overclock (not a single kernel panic ...) for nearly 2 years, since I purchased it in November of 2008. I guess I should test the drive on my i7 box. The interesting thing is that the entire first pass returned no errors at all.

And here is a link to the badblocks output:
http://pastebin.com/QPURQ6Au

On a hard drive forum, a user pointed out that this was strange: the bad runs were always 32 KB, while badblocks was testing 64 blocks of 1 KB at a time. The 32 KB runs of bad blocks were scattered throughout the disk.
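
The sizes mentioned so far can be cross-checked with a little arithmetic. This is only an illustrative sketch: the helper names are made up for this post, and the run count assumes every bad run is a full 32 KB with none partial.

```python
# Illustrative arithmetic only; the helper names are hypothetical.

def device_bytes(last_block: int, block_size: int) -> int:
    """Device size implied by badblocks' 'From block 0 to N' line."""
    return (last_block + 1) * block_size


def blocks_per_run(run_bytes: int, block_size: int) -> int:
    """Number of badblocks blocks covered by one bad run."""
    return run_bytes // block_size


RUN_BYTES = 32 * 1024  # size of each bad run reported in this thread

# With 1 KB blocks, one bad run covers 32 consecutive block numbers.
per_run_1k = blocks_per_run(RUN_BYTES, 1024)

# 736 bad blocks were reported; assuming only full runs, that is 23 runs.
runs_first_test = 736 // per_run_1k

print(device_bytes(1953514583, 1024))  # compare with smartctl's User Capacity
print(per_run_1k, runs_first_test)
```

The first number printed should equal the User Capacity that smartctl reports, which at least confirms badblocks covered the whole drive.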

In looking at the data on the disk after the 4th pass has completed the result is totally unexpected:

Code:
jmd0 ~ # dd if=/dev/sdf bs=1024 skip=1862898656 count=33 | hexdump -C
00000000  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00008000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00008400
33+0 records in
33+0 records out
33792 bytes (34 kB) copied, 0.000801165 s, 42.2 MB/s


At the 4th pass all bytes on the disk should be 0. FF was the previous pattern.
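
To make that reasoning explicit, here is a rough sketch that classifies a byte value read back from the disk. The pattern order is the one badblocks printed above; the function name and the wording of the messages are my own, not anything badblocks provides.

```python
# Pattern order used by 'badblocks -w', as shown in the output above.
PATTERNS = [0xAA, 0x55, 0xFF, 0x00]


def diagnose(observed: int, pass_index: int) -> str:
    """Classify a byte value read back after pass `pass_index` (0-based)."""
    expected = PATTERNS[pass_index]
    if observed == expected:
        return "ok"
    # Negative indexing makes pass 0 wrap around to the last pattern,
    # which is what an earlier complete run would have left on the disk.
    previous = PATTERNS[pass_index - 1]
    if observed == previous:
        return "stale: previous pattern 0x%02x still on disk" % previous
    return "corrupt: unexpected value 0x%02x" % observed


# After the 4th pass (index 3) the disk should hold 0x00.
print(diagnose(0xFF, 3))
print(diagnose(0x00, 3))
```

A "stale" result points at a write that never reached the platter, while a "corrupt" result would point at data being damaged on the way.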

Again a second bad block shows the same pattern:
Code:
jmd0 ~ # dd if=/dev/sdf bs=1024 skip=1726351584 count=33 | hexdump -C
00000000  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00008000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00008400
33+0 records in
33+0 records out
33792 bytes (34 kB) copied, 0.0218385 s, 1.5 MB/s


After that, a user on the hardware forum suggested I change the block size to 4096 (since the drive has 4096-byte sectors) and retry:

Code:

jmd0 ~ # badblocks -svw -c 32 -b 4096 /dev/sdf -o S2HGJ1BZ836643_test2.txt
Checking for bad blocks in read-write mode
From block 0 to 488378645
Testing with pattern 0xaa: done


So now it's done writing the first pass and is reading it back, and I have bad blocks again.

Code:

22320280
22320281
22320282
22320283
22320284
22320285
22320286
22320287
132451576
132451577
132451578
132451579
132451580
132451581
132451582
132451583
184302392
184302393
184302394
184302395
184302396
184302397
184302398
184302399
234629240
234629241
234629242
234629243
234629244
234629245
234629246
234629247
282862744
282862745
282862746
282862747
282862748
282862749
282862750
282862751
327766616
327766617
327766618
327766619
327766620
327766621
327766622
327766623


Notice that these are again 32 KB runs (8 consecutive 4 KB blocks each), and again they exhibit the same pattern:

Code:
jmd0 ~ #  dd if=/dev/sdf bs=4096 skip=22320280 count=9 | hexdump -C
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00008000  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
*
00009000
9+0 records in
9+0 records out
36864 bytes (37 kB) copied, 5.2471e-05 s, 703 MB/s


After pass 1 the data should be all aa, not 00. 00 is what the data was after the 4th pass of the first run.
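
The 32 KB grouping does not have to be checked by eye. Below is a small sketch that collapses a badblocks list into contiguous runs and reports each run's size and alignment. The function name is hypothetical, and the sample data is just the first two runs from the list posted above; in practice the numbers would be read from the file written with -o.

```python
def group_runs(blocks):
    """Collapse a sorted list of block numbers into (start, count) runs."""
    runs = []
    for b in blocks:
        if runs and b == runs[-1][0] + runs[-1][1]:
            # Block continues the current run, so extend it by one.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs


BLOCK_SIZE = 4096

# First two runs from the 4096-byte test listed above.
bad = list(range(22320280, 22320288)) + list(range(132451576, 132451584))

for start, count in group_runs(bad):
    size = count * BLOCK_SIZE
    # True when the run starts on a multiple of its own size.
    aligned = (start * BLOCK_SIZE) % size == 0
    print(start, count, size, aligned)
```

If every run comes out as 8 blocks starting on a 32 KB boundary, that suggests whole write requests are going missing rather than individual sectors failing.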
_________________
John

My gentoo overlay
Instructions for overlay


Last edited by drescherjm on Fri Dec 10, 2010 1:33 am; edited 5 times in total

richard.scott
Veteran

Joined: 19 May 2003
Posts: 1497
Location: Oxfordshire, UK

Posted: Tue Oct 05, 2010 4:50 pm

In this output it doesn't say you have bad blocks:

Code:
jmd0 ~ # badblocks -svw -c 32 -b 4096 /dev/sdf -o S2HGJ1BZ836643_test2.txt
Checking for bad blocks in read-write mode
From block 0 to 488378645
Testing with pattern 0xaa: done


How do you know that you have bad blocks when testing with 4K blocks?

Rich

drescherjm
Advocate

Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Tue Oct 05, 2010 4:53 pm

Looking at the disk directly, I can see the problem: for some reason, at random, some 32 KB blocks of data generated by badblocks are either not getting flushed to disk or the disk is ignoring the write. I now believe this is a kernel bug (as I cannot see a drive randomly rejecting a 32 KB write without issuing an error), but I will have to test other drives.
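
For anyone who wants to see the write-then-verify idea in isolation, here is a minimal sketch that does it on an ordinary scratch file instead of a raw device. It is not how badblocks is implemented; the function names are invented, and because reads of a regular file can be served from the page cache, it would not by itself catch a drive that silently drops writes. It only shows the compare-in-32-KB-chunks logic.

```python
import os
import tempfile

CHUNK = 32 * 1024  # compare in 32 KB units, the size of the bad runs


def write_pattern(path: str, pattern: int, size: int) -> None:
    """Fill `path` with `size` bytes of `pattern` and ask the OS to flush it."""
    with open(path, "wb") as f:
        f.write(bytes([pattern]) * size)
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path: str, pattern: int) -> list:
    """Return the indexes of CHUNK-sized pieces that do not hold `pattern`."""
    bad = []
    with open(path, "rb") as f:
        index = 0
        while True:
            data = f.read(CHUNK)
            if not data:
                break
            if data != bytes([pattern]) * len(data):
                bad.append(index)
            index += 1
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        scratch = os.path.join(d, "scratch.bin")
        write_pattern(scratch, 0xAA, 4 * CHUNK)
        print(verify_pattern(scratch, 0xAA))
```

A chunk index in the returned list corresponds to one 32 KB region that still holds something other than the pattern just written.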

Quote:
In this output it doesn't say you have bad blocks:


I have it save the badblocks list into a file with the -o option.

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9645
Location: almost Mile High in the USA

Posted: Tue Oct 05, 2010 4:55 pm

Everything wears out. Overclocking will wear out CPUs faster.

You should clock your CPU back at stock frequency and see if you can reproduce the error. Perhaps the overclocking is starting to wear out the processor?

Are there any reports of bad blocks in your dmesg? If not, I would blame the motherboard/CPU/RAM...
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?

drescherjm
Advocate

Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Tue Oct 05, 2010 4:56 pm

No errors at all in dmesg.

Quote:
Everything wears out. Overclocking will wear out CPUs faster.

You should clock your CPU back at stock frequency and see if you can reproduce the error. Perhaps the overclocking is starting to wear out the processor?


I agree. The thing is that I am not seeing a problem in anything else. The machine is on 24/7/365 and has been rock steady. No kernel panics. It records several GB of data for my HTPC daily, and it also serves as an svn server and backup server. None of these experience errors or glitches (at least none that I can see). That is easy to try, though.

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9645
Location: almost Mile High in the USA

Posted: Tue Oct 05, 2010 9:33 pm

Due to the multiplier-locked nature of these chips, the bumped clock frequency is probably also bumping the fsb speed. This will affect the northbridge. I think later northbridges contain SATA, and this could be the component stressed by the very high traffic going through it. Is the heatsink on the northbridge in good shape? Fan?

Still very weird.

Could even be an incompatibility between the disk's SATA interface and the motherboard... that would be annoying.

drescherjm
Advocate

Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Tue Oct 05, 2010 9:42 pm

Thanks. I can see that this could be an issue, especially with many hours of continuous writing. I am not sure if the frequency of the PCIe bus or northbridge is fixed. I can back off the overclock tonight.


Quote:
Is the heatsink on the northbridge in good shape? Fan?

I am pretty sure there is no fan. This is an Intel P45 board, an ASUS P5Q-Pro.


Quote:
Still very weird.

I would expect corruption to show up as random values on the disk. I cannot explain how 32 KB blocks (it's always 32 KB) are randomly not being flushed to disk without any errors recorded by the disk or the OS.

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9645
Location: almost Mile High in the USA

Posted: Tue Oct 05, 2010 9:52 pm

It seems kind of unintuitive but corruption does not necessarily mean data corruption. Sometimes program/logic control flow gets corrupted and whole chunks of data/code get executed wrong...

drescherjm
Advocate

Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Tue Oct 05, 2010 11:16 pm

I just touched the heatsink on the core and it was barely warm. It is a heatpipe design that connects with the MOSFETs (power circuitry) to keep them cool as well. I will reboot to set the frequency back to the stock 2.83 GHz.

Ant P.
Watchman

Joined: 18 Apr 2009
Posts: 6920

Posted: Wed Oct 06, 2010 1:09 am    Post subject: Re: strange badblocks problem.

drescherjm wrote:
Code:
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

You should probably do a smartctl -t offline as well to see if it's really the disk.

drescherjm
Advocate

Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Wed Oct 06, 2010 1:51 am

I now doubt that the disk is the problem, but I will test that at some point, depending on where the stock frequency test takes me. I am around 2 hours 25 minutes into writing and it is at 51%, so I will not know whether the read test succeeded until I wake up.

drescherjm
Advocate

Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Wed Oct 06, 2010 1:02 pm

There were no errors on the first pass (without overclocking); however, there are 4 things to consider:
1. The first pass has worked before without this issue.
2. I was not logged into kde-4.4.
3. The drive was available at boot, not hotplugged as it was last time.
4. Obviously, the 270 MHz overclock of a 2830 MHz CPU was removed.

The second pass should be done in the next 5 to 6 hours..

drescherjm
Advocate

Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Wed Oct 06, 2010 4:49 pm

Two complete passes have finished successfully. The writing (and all reading) for the last half of the 2nd pass was done with a user logged into KDE. I will let it complete the 3rd pass this way, and then put the overclock back to see if the results are reproducible.

drescherjm
Advocate

Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Wed Oct 06, 2010 9:05 pm

On the third pass it appears to be back to its old ways. This is a partial listing because we are less than 2% into the reading of the disk. It seems like the error rate is much worse this time.

Code:
jmd0 ~ # cat S2HGJ1BZ836643_test3.txt
251189752
251189753
251189754
251189755
251189756
251189757
251189758
251189759
297838360
297838361
297838362
297838363
297838364
297838365
297838366
297838367
341735416
341735417
341735418
341735419
341735420
341735421
341735422
341735423
382727320
382727321
382727322
382727323
382727324
382727325
382727326
382727327
454940152
454940153
454940154
454940155
454940156
454940157
454940158
454940159
486119800
486119801
486119802
486119803
486119804
486119805
486119806
486119807


Herring42
Guru

Joined: 10 Mar 2004
Posts: 373
Location: Buckinghamshire

Posted: Thu Oct 07, 2010 11:40 am

Sounds like a hardware problem to me.

Thermal expansion, interference, anything like that really. When you look at the signal levels involved with SATA cables, and indeed, just the tracks on the motherboard it is statistically likely that you will get problems at least some of the time. Add a marginal component, and it becomes more likely.
_________________
"The problem with quotes on the internet is that it is difficult
to determine whether or not they are genuine." -- Abraham Lincoln

drescherjm
Advocate

Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Thu Oct 07, 2010 12:34 pm

If the problem was a SATA cable I would expect to see CRC errors in SMART.
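
That check can be scripted. The sketch below pulls raw values out of a smartctl attribute table and looks at UDMA_CRC_Error_Count. The parser is a hypothetical helper of mine, not part of smartmontools, and it assumes the ten-column layout shown in the output earlier in this thread; the sample rows are copied from that output.

```python
def parse_attributes(text: str) -> dict:
    """Map attribute name to raw value string from a smartctl attribute table."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split(None, 9)
        # Attribute rows have 10 columns, a numeric ID and a hex FLAG.
        if len(parts) == 10 and parts[0].isdigit() and parts[2].startswith("0x"):
            attrs[parts[1]] = parts[9].strip()
    return attrs


SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
194 Temperature_Celsius 0x0002 064 064 000 Old_age Always - 28 (Lifetime Min/Max 22/36)
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
"""

attrs = parse_attributes(SAMPLE)
# The raw value can carry trailing notes, so keep only the first token.
crc_errors = int(attrs["UDMA_CRC_Error_Count"].split()[0])
print("CRC errors:", crc_errors)
```

Running the same parser on two smartctl dumps taken before and after a test would also show which raw values moved in between.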

To rule out the overclocking issue, I have the same board at work that I could test. The only problem is that I cannot downgrade the kernel at work to 2.6.32. Also, I would not expect this result from marginal components. I would more likely expect a single bit flip or some random data, not 32 KB writes failing to be flushed to disk. The bad blocks are always 32 KB, and the problem is always that the entire 32 KB block on the disk holds the value it should have had after the previous pass.

drescherjm
Advocate

Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Fri Oct 08, 2010 12:44 pm    Post subject: Re: strange badblocks problem.

Ant_P wrote:
drescherjm wrote:
Code:
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

You should probably do a smartctl -t offline as well to see if it's really the disk.


The disk says no errors.

Code:
jmd0 ~ # smartctl --all /dev/sdd
smartctl 5.39.1 2010-01-28 r3054 [x86_64-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG HD204UI
Serial Number:    S2HGJ1BZ836643
Firmware Version: 1AQ10001
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 6
Local Time is:    Fri Oct  8 08:40:57 2010 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (21060) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0026   056   056   000    Old_age   Always       -       19168
  3 Spin_Up_Time            0x0023   077   068   025    Pre-fail  Always       -       7092
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       4
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       141
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       4
181 Program_Fail_Cnt_Total  0x0022   252   252   000    Old_age   Always       -       0
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       144
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   064   000    Old_age   Always       -       29 (Lifetime Min/Max 22/36)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       3
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       4

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       138         -

Note: selective self-test log revision number (0) not 1 implies that no selective self-test has ever been run
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Completed [00% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Ant P.
Watchman

Joined: 18 Apr 2009
Posts: 6920

Posted: Fri Oct 08, 2010 9:20 pm

I'm guessing it might be a transient memory error. Try memtest86, and if that comes out fine, try it with an overclock slightly higher than normal.

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9645
Location: almost Mile High in the USA

Posted: Fri Oct 08, 2010 10:53 pm

I'm tending to guess a motherboard issue at this time...
definitely should try another motherboard or computer...

drescherjm
Advocate

Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Fri Oct 08, 2010 11:27 pm

I have a few identical boards at work but I will have to see what access I can get to them being that the test takes a very long time.

drescherjm
Advocate

Joined: 05 Jun 2004
Posts: 2790
Location: Pittsburgh, PA, USA

Posted: Sun Oct 10, 2010 7:29 pm

I put the drive in a different machine and it appears that the drive is fine. There are 2 bad blocks listed; however, I am pretty certain I caused that (messing with the drive mounting bracket while running the test). Unlike all the other errors, these were seen in dmesg. But again, the hard drive itself did not record an error at all.

Code:
jmd1 ~ # badblocks -svw -c 32 -b 4096 /dev/sda -o S2HGJ1BZ836643_test4.txt
Checking for bad blocks in read-write mode
From block 0 to 488378645
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 2 bad blocks found.


Code:
jmd1 ~ # smartctl --all /dev/sda
smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG HD204UI
Serial Number:    S2HGJ1BZ836643
Firmware Version: 1AQ10001
User Capacity:    2,000,398,934,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Not recognized. Minor revision code: 0x28
Local Time is:    Sun Oct 10 15:26:44 2010 EDT

==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (21060) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       24
  2 Throughput_Performance  0x0026   056   056   000    Old_age   Always       -       19168
  3 Spin_Up_Time            0x0023   069   068   025    Pre-fail  Always       -       9657
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       5
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       195
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       5
181 Unknown_Attribute       0x0022   252   252   000    Old_age   Always       -       0
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       144
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   056   055   000    Old_age   Always       -       44 (Lifetime Min/Max 22/45)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       3
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       5

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       138         -

SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1
SMART Selective self-test log data structure revision number 0
Warning: ATA Specification requires selective self-test log data structure revision number = 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Completed [00% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

jmd1 ~ #

_________________
John

My gentoo overlay
Instructions for overlay
drescherjm
Posted: Fri Dec 03, 2010 2:04 pm

The following page attributes the problem to a firmware bug in the new Samsung F4 drives.

http://sourceforge.net/apps/trac/smartmontools/wiki/SamsungF4EGBadBlocks

Quote:
The above suggests that the disk sometimes discards a pending 64 sector write command when a IDENTIFY DEVICE command is received. This data loss occurs silently. There is no error message in kernel log, SMART Error log, NCQ Command Error log page, or SATA Phy Event Counters log page.

Please note that the badblocks command reported "256 bad blocks" in the above test because the data read differs from the data written before. None of the tests resulted in actual bad (unreadable) blocks on the disk. Testing did not damage the disk itself. The problem is that new data already sent to the disk may not be written. Previously written data is not affected.

The problem could not be reproduced with the above test if any of the following conditions are met:

* Disk write cache is disabled.

* NCQ is disabled. This may not always be true as the c't lab also reported problems with NCQ disabled.

* A modified test version of smartctl which does not issue IDENTIFY DEVICE commands is used. Then all other SMART and non-SMART commands used by smartctl work without any data loss.
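To illustrate why a silently dropped write shows up in badblocks as "bad blocks" even though every sector is readable, here is a rough Python sketch of a write-then-compare pass. It is not how badblocks is implemented. `DroppingDevice` is a hypothetical in-memory stand-in for a disk that acknowledges a write without storing it, and the 1024-byte block size is the badblocks default.

```python
import io

BLOCK_SIZE = 1024  # badblocks' default block size in bytes


def write_pattern(dev, num_blocks, pattern_byte):
    """Fill the first num_blocks blocks of dev with pattern_byte."""
    dev.seek(0)
    block = bytes([pattern_byte]) * BLOCK_SIZE
    for _ in range(num_blocks):
        dev.write(block)


def mismatched_blocks(dev, num_blocks, pattern_byte):
    """Read back and return the block numbers that differ from the pattern."""
    expected = bytes([pattern_byte]) * BLOCK_SIZE
    dev.seek(0)
    bad = []
    for n in range(num_blocks):
        if dev.read(BLOCK_SIZE) != expected:
            bad.append(n)
    return bad


class DroppingDevice(io.BytesIO):
    """Hypothetical disk that silently discards writes to some blocks."""

    def __init__(self, size, drop_blocks):
        super().__init__(bytes(size))
        self.drop_blocks = set(drop_blocks)

    def write(self, data):
        block_no = self.tell() // BLOCK_SIZE
        if block_no in self.drop_blocks:
            # Report success but leave the old contents in place.
            self.seek(len(data), io.SEEK_CUR)
            return len(data)
        return super().write(data)


dev = DroppingDevice(8 * BLOCK_SIZE, drop_blocks={3, 4})
write_pattern(dev, 8, 0xAA)
print(mismatched_blocks(dev, 8, 0xAA))  # blocks 3 and 4 still hold old data
```

Every read succeeds here. The only symptom is that blocks 3 and 4 still contain the old data. Assuming 512-byte sectors, one discarded 64-sector write would cover 32 such 1024-byte blocks.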

drescherjm
Posted: Fri Dec 10, 2010 1:31 am

Samsung has released a patch to fix this bug.

http://www.samsung.com/global/business/hdd/faqView.do?b2b_bbs_msg_id=386

I have confirmed that the patch fixes the issue for me.
All times are GMT