matt2kjones Tux's lil' helper
Joined: 03 Mar 2004 Posts: 89
Posted: Fri Nov 06, 2015 11:52 am Post subject: strange issues with raid6 (file corruption or kernel oops)
Hello,
I have a RAID6 array with a damaged hard drive. However, when a write error occurs on the array, it doesn't fail the hard drive; instead, one of two things happens:
If I'm using kernel 3.18.12, it logs I/O error messages to dmesg and the file on the array ends up corrupt. The array does not fail the disk as it should, so I end up with tons of corrupt files.
If I'm using any 4.x kernel (I have tried both 4.0.9 and 4.1.12), then when a write error occurs I get a kernel oops logged to dmesg and all I/O to the array hangs. I have to forcefully reboot the server, because a ton of processes get stuck in state D, and the disks are never marked as failed.
Here is the output from dmesg when a write error occurs on kernel version 3.18.12:
Code: | [  172.679073] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 1052672 starting block 5172953088)
[ 172.679076] Buffer I/O error on device md4, logical block 5172953088
[ 172.679078] Buffer I/O error on device md4, logical block 5172953089
[ 172.679078] Buffer I/O error on device md4, logical block 5172953090
[ 172.679079] Buffer I/O error on device md4, logical block 5172953091
[ 172.679080] Buffer I/O error on device md4, logical block 5172953092
[ 172.679081] Buffer I/O error on device md4, logical block 5172953093
[ 172.679082] Buffer I/O error on device md4, logical block 5172953094
[ 172.679082] Buffer I/O error on device md4, logical block 5172953095
[ 172.679083] Buffer I/O error on device md4, logical block 5172953096
[ 172.679084] Buffer I/O error on device md4, logical block 5172953097
[ 172.983977] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 1576960 starting block 5172953216)
[ 173.489071] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 2101248 starting block 5172953344)
[ 174.330710] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 2625536 starting block 5172953472)
[ 175.123257] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 3149824 starting block 5172953600)
[ 175.406390] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 3674112 starting block 5172953728)
[ 175.608958] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 4198400 starting block 5172953856)
[ 175.968224] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 4722688 starting block 5172953984)
[ 176.130072] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 5246976 starting block 5172954112)
[ 176.215623] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 6819840 starting block 5172954240)
[ 177.925267] EXT4-fs warning: 6 callbacks suppressed
[ 177.925270] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 33554432 size 1052672 starting block 5172955136)
[ 177.925271] buffer_io_error: 2038 callbacks suppressed
[ 177.925272] Buffer I/O error on device md4, logical block 5172955136
[ 177.925274] Buffer I/O error on device md4, logical block 5172955137
[ 177.925275] Buffer I/O error on device md4, logical block 5172955138
[ 177.925276] Buffer I/O error on device md4, logical block 5172955139
[ 177.925276] Buffer I/O error on device md4, logical block 5172955140
[ 177.925277] Buffer I/O error on device md4, logical block 5172955141
[ 177.925278] Buffer I/O error on device md4, logical block 5172955142
[ 177.925279] Buffer I/O error on device md4, logical block 5172955143
[ 177.925280] Buffer I/O error on device md4, logical block 5172955144
[ 177.925280] Buffer I/O error on device md4, logical block 5172955145
[ 178.642566] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 33554432 size 1576960 starting block 5172955264)
[ 179.078914] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 33554432 size 2101248 starting block 5172955392)
[ 179.976324] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 33554432 size 2625536 starting block 5172955520)
[ 180.782833] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 33554432 size 3149824 starting block 5172955648)
[ 181.333570] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 33554432 size 3674112 starting block 5172955776)
[ 181.820475] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 33554432 size 4198400 starting block 5172955904)
[ 183.171425] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 33554432 size 4722688 starting block 5172956032)
[ 183.171428] buffer_io_error: 886 callbacks suppressed
[ 183.171429] Buffer I/O error on device md4, logical block 5172956032
[ 183.171431] Buffer I/O error on device md4, logical block 5172956033
[ 183.171432] Buffer I/O error on device md4, logical block 5172956034
[ 183.171433] Buffer I/O error on device md4, logical block 5172956035
[ 183.171434] Buffer I/O error on device md4, logical block 5172956036
[ 183.171435] Buffer I/O error on device md4, logical block 5172956037
[ 183.171436] Buffer I/O error on device md4, logical block 5172956038
[ 183.171436] Buffer I/O error on device md4, logical block 5172956039
[ 183.171437] Buffer I/O error on device md4, logical block 5172956040
[ 183.171438] Buffer I/O error on device md4, logical block 5172956041
|
Here is sample output from dmesg when a write error occurs on version 4.0.9 or 4.1.12:
Code: |
[ 158.138253] BUG: unable to handle kernel NULL pointer dereference at 0000000000000120
[ 158.138391] IP: [<ffffffffa024cc1f>] handle_stripe+0xdc0/0x1e1f [raid456]
[ 158.138482] PGD 24ff59067 PUD 24fe43067 PMD 0
[ 158.138646] Oops: 0000 [#1] SMP
[ 158.138758] Modules linked in: ipv6 binfmt_misc joydev x86_pkg_temp_thermal coretemp kvm_intel kvm microcode pcspkr video i2c_i801 thermal acpi_cpufreq fan battery rtc_cmos backlight processor thermal_sys xhci_pci button xts gf128mul aes_x86_64 cbc sha256_generic scsi_transport_iscsi multipath linear raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 dm_snapshot dm_bufio dm_crypt dm_mirror dm_region_hash dm_log dm_mod hid_sunplus hid_sony led_class hid_samsung hid_pl hid_petalynx hid_monterey hid_microsoft hid_logitech hid_gyration hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech sl811_hcd usbhid xhci_hcd ohci_hcd uhci_hcd usb_storage ehci_pci ehci_hcd usbcore usb_common megaraid_sas megaraid_mbox megaraid_mm megaraid sx8
[ 158.141809] DAC960 cciss mptsas mptfc scsi_transport_fc mptspi scsi_transport_spi mptscsih mptbase sg
[ 158.142226] CPU: 0 PID: 2017 Comm: md4_raid6 Not tainted 4.1.12-gentoo #1
[ 158.142272] Hardware name: Supermicro X10SAT/X10SAT, BIOS 2.0 04/21/2014
[ 158.142323] task: ffff880254267050 ti: ffff880095afc000 task.ti: ffff880095afc000
[ 158.142376] RIP: 0010:[<ffffffffa024cc1f>] [<ffffffffa024cc1f>] handle_stripe+0xdc0/0x1e1f [raid456]
[ 158.142493] RSP: 0018:ffff880095affc18 EFLAGS: 00010202
[ 158.142554] RAX: 000000000000000d RBX: ffff880095cfac00 RCX: 0000000000000002
[ 158.142617] RDX: 000000000000000d RSI: 0000000000000000 RDI: 0000000000001040
[ 158.142682] RBP: ffff880095affcf8 R08: 0000000000000003 R09: 00000000cd920408
[ 158.142745] R10: 000000000000000d R11: 0000000000000007 R12: 000000000000000d
[ 158.142809] R13: 0000000000000000 R14: 000000000000000c R15: ffff8802161f2588
[ 158.142873] FS: 0000000000000000(0000) GS:ffff88025ea00000(0000) knlGS:0000000000000000
[ 158.142938] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 158.143000] CR2: 0000000000000120 CR3: 0000000253ef4000 CR4: 00000000001406f0
[ 158.143062] Stack:
[ 158.143117] 0000000000000000 ffff880254267050 00000000000147c0 0000000000000000
[ 158.143328] ffff8802161f25d0 0000000effffffff ffff8802161f3670 ffff8802161f2ef0
[ 158.143537] 0000000000000000 0000000000000000 0000000000000000 0000000c00000000
[ 158.143747] Call Trace:
[ 158.143805] [<ffffffffa024dea3>] handle_active_stripes.isra.37+0x225/0x2aa [raid456]
[ 158.143873] [<ffffffffa024e31d>] raid5d+0x363/0x40d [raid456]
[ 158.143937] [<ffffffff814315bc>] ? schedule+0x6f/0x7e
[ 158.143998] [<ffffffff81372ae7>] md_thread+0x125/0x13b
[ 158.144060] [<ffffffff81061b00>] ? wait_woken+0x71/0x71
[ 158.144122] [<ffffffff813729c2>] ? md_start_sync+0xda/0xda
[ 158.144185] [<ffffffff81050609>] kthread+0xcd/0xd5
[ 158.144244] [<ffffffff8105053c>] ? kthread_create_on_node+0x16d/0x16d
[ 158.144309] [<ffffffff81434f92>] ret_from_fork+0x42/0x70
[ 158.144370] [<ffffffff8105053c>] ? kthread_create_on_node+0x16d/0x16d
[ 158.144432] Code: 8c 0f d0 01 00 00 48 8b 49 10 80 e1 10 74 0d 49 8b 4f 48 80 e1 40 0f 84 c2 0f 00 00 31 c9 41 39 c8 7e 31 48 8b b4 cd 50 ff ff ff <48> 83 be 20 01 00 00 00 74 1a 48 8b be 38 01 00 00 40 80 e7 01
[ 158.147700] RIP [<ffffffffa024cc1f>] handle_stripe+0xdc0/0x1e1f [raid456]
[ 158.147801] RSP <ffff880095affc18>
[ 158.147859] CR2: 0000000000000120
[ 158.147916] ---[ end trace 536b72bd7c91f068 ]---
|
Things that I have tried:
Disabled queuing (NCQ) on all drives
Disabled write cache on all drives
Built a minimal kernel that contains no SATA drivers for any controller other than the one I'm using
The drives are connected to two LSI PCI-Express SAS controllers. These controllers don't support hardware RAID and are set up as JBOD.
Any ideas? I can obviously change the faulty disk to stop this from happening, but I don't want to do that until this is fixed, because if a drive fails in the future and I don't notice, I could end up with corrupt files.
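For reference, a minimal sketch (not from the original post) of how the first two items above are typically done per drive. The drive list is an assumption; it is shown as a dry run that only prints the commands, so remove the echo quoting to apply them for real:

```shell
# Dry-run sketch: disable NCQ (queue depth 1) and the on-drive write cache.
# "sda sdb sdc" is a placeholder; extend it to all array member drives.
for dev in sda sdb sdc; do
  echo "echo 1 > /sys/block/$dev/device/queue_depth"  # NCQ effectively off
  echo "hdparm -W0 /dev/$dev"                         # drive write cache off
done
```

With queue_depth at 1 each drive handles one command at a time, which changes how (and where) I/O errors surface, as noted later in the thread.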
My /proc/mdstat:
Code: | Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] [linear] [multipath]
md2 : active raid1 sdk2[0] sdl2[1]
16760832 blocks super 1.2 [2/2] [UU]
md4 : active raid6 sdc1[0] sdp1[13] sdo1[12] sdn1[11] sdm1[10] sdj1[9] sdb1[8] sdg1[15] sdi1[6] sdh1[5] sda1[14] sdf1[3] sde1[2] sdd1[1]
23440588800 blocks super 1.2 level 6, 512k chunk, algorithm 2 [14/14] [UUUUUUUUUUUUUU]
bitmap: 2/15 pages [8KB], 65536KB chunk
md1 : active raid1 sdk1[0] sdl1[1]
1048512 blocks [2/2] [UU]
md3 : active raid1 sdk3[0] sdl3[1]
1935556672 blocks super 1.2 [2/2] [UU]
bitmap: 2/15 pages [8KB], 65536KB chunk
unused devices: <none> |
My mdadm --detail /dev/md4:
Code: | /dev/md4:
Version : 1.2
Creation Time : Thu May 21 09:36:16 2015
Raid Level : raid6
Array Size : 23440588800 (22354.69 GiB 24003.16 GB)
Used Dev Size : 1953382400 (1862.89 GiB 2000.26 GB)
Raid Devices : 14
Total Devices : 14
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Fri Nov 6 11:44:14 2015
State : clean
Active Devices : 14
Working Devices : 14
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : livecd:4
UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64
Events : 4122
Number Major Minor RaidDevice State
0 8 33 0 active sync /dev/sdc1
1 8 49 1 active sync /dev/sdd1
2 8 65 2 active sync /dev/sde1
3 8 81 3 active sync /dev/sdf1
14 8 1 4 active sync /dev/sda1
5 8 113 5 active sync /dev/sdh1
6 8 129 6 active sync /dev/sdi1
15 8 97 7 active sync /dev/sdg1
8 8 17 8 active sync /dev/sdb1
9 8 145 9 active sync /dev/sdj1
10 8 193 10 active sync /dev/sdm1
11 8 209 11 active sync /dev/sdn1
12 8 225 12 active sync /dev/sdo1
13 8 241 13 active sync /dev/sdp1 |
Thanks _________________ OSST - Formerly: The Linux Mirror Project
OSST - Open Source Software Downloads - Torrents for over 80 Distributions
frostschutz Advocate
Joined: 22 Feb 2005 Posts: 2977 Location: Germany
Posted: Fri Nov 06, 2015 12:01 pm
Can you post mdadm --detail for /dev/md*, mdadm --examine for /dev/sd*, and tune2fs -l /dev/md4?
Your issue is strange because it actually reports the I/O error on md4. With a bad disk it should report the I/O error on /dev/sdx instead. It's a RAID with double redundancy, so a bad disk should not cause I/O errors on the md device until you have a triple failure.
So your issue may be something different after all, such as a filesystem that believes itself to be larger than the device it's on, or some other structural or logical problem rather than a hardware one.
The kernel panic you should probably take to the raid mailing list (try the latest stable kernel first, in case it was fixed somewhere already).
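The "filesystem larger than the device" check suggested here can be sketched as simple arithmetic, using figures that appear later in this thread (assumed: md4 is 23440588800 1K-blocks per /proc/mdstat, and the ext4 filesystem uses 4 KiB blocks):

```shell
# Sketch: an ext4 with 4 KiB blocks fits on md4 only if its
# "Block count:" (from tune2fs -l /dev/md4) is at most this value.
array_kib=23440588800                 # md4 size in 1K-blocks, from /proc/mdstat
max_ext4_blocks=$((array_kib / 4))    # convert 1K-blocks to 4 KiB ext4 blocks
echo "$max_ext4_blocks"               # prints 5860147200
```

A tune2fs block count above that limit would point at a structural problem rather than a bad drive.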
matt2kjones Tux's lil' helper
Joined: 03 Mar 2004 Posts: 89
Posted: Fri Nov 06, 2015 12:38 pm
Thanks for the reply.
If I re-enable NCQ on the disks, the errors in the log do report against /dev/sdb, for example, but since I set the queue_depth to 1, they report against the RAID device.
Here is all the info you requested:
mdadm --detail for /dev/md*
Code: | /dev/md1:
Version : 0.90
Creation Time : Fri May 22 18:38:44 2015
Raid Level : raid1
Array Size : 1048512 (1024.11 MiB 1073.68 MB)
Used Dev Size : 1048512 (1024.11 MiB 1073.68 MB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Fri Nov 6 12:30:49 2015
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
UUID : 3021e831:c6f0b96b:cb201669:f728008a
Events : 0.24
Number Major Minor RaidDevice State
0 8 161 0 active sync /dev/sdk1
1 8 177 1 active sync /dev/sdl1
/dev/md2:
Version : 1.2
Creation Time : Fri May 22 18:39:20 2015
Raid Level : raid1
Array Size : 16760832 (15.98 GiB 17.16 GB)
Used Dev Size : 16760832 (15.98 GiB 17.16 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Fri Oct 30 11:21:19 2015
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Name : livecd:2
UUID : c841b565:9ce84038:33926cee:e78f907a
Events : 17
Number Major Minor RaidDevice State
0 8 162 0 active sync /dev/sdk2
1 8 178 1 active sync /dev/sdl2
/dev/md3:
Version : 1.2
Creation Time : Fri May 22 18:41:13 2015
Raid Level : raid1
Array Size : 1935556672 (1845.89 GiB 1982.01 GB)
Used Dev Size : 1935556672 (1845.89 GiB 1982.01 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Fri Nov 6 12:33:29 2015
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Name : livecd:3
UUID : cd185b80:08a5a8bf:fb3016b7:45891977
Events : 5592
Number Major Minor RaidDevice State
0 8 163 0 active sync /dev/sdk3
1 8 179 1 active sync /dev/sdl3
/dev/md4:
Version : 1.2
Creation Time : Thu May 21 09:36:16 2015
Raid Level : raid6
Array Size : 23440588800 (22354.69 GiB 24003.16 GB)
Used Dev Size : 1953382400 (1862.89 GiB 2000.26 GB)
Raid Devices : 14
Total Devices : 14
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Fri Nov 6 12:30:52 2015
State : clean
Active Devices : 14
Working Devices : 14
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : livecd:4
UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64
Events : 4128
Number Major Minor RaidDevice State
0 8 33 0 active sync /dev/sdc1
1 8 49 1 active sync /dev/sdd1
2 8 65 2 active sync /dev/sde1
3 8 81 3 active sync /dev/sdf1
14 8 1 4 active sync /dev/sda1
5 8 113 5 active sync /dev/sdh1
6 8 129 6 active sync /dev/sdi1
15 8 97 7 active sync /dev/sdg1
8 8 17 8 active sync /dev/sdb1
9 8 145 9 active sync /dev/sdj1
10 8 193 10 active sync /dev/sdm1
11 8 209 11 active sync /dev/sdn1
12 8 225 12 active sync /dev/sdo1
13 8 241 13 active sync /dev/sdp1 |
mdadm --examine for /dev/sd*
Code: | /dev/sda:
MBR Magic : aa55
Partition[0] : 3907027120 sectors at 2048 (type fd)
/dev/sda1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x9
Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64
Name : livecd:4
Creation Time : Thu May 21 09:36:16 2015
Raid Level : raid6
Raid Devices : 14
Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)
Array Size : 23440588800 (22354.69 GiB 24003.16 GB)
Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=176 sectors
State : clean
Device UUID : 7e11b910:f5624a24:38ed2418:7e309fd0
Internal Bitmap : 8 sectors from superblock
Update Time : Fri Nov 6 12:30:52 2015
Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
Checksum : 818939e0 - correct
Events : 4128
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 4
Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdb:
MBR Magic : aa55
Partition[0] : 3907027120 sectors at 2048 (type fd)
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64
Name : livecd:4
Creation Time : Thu May 21 09:36:16 2015
Raid Level : raid6
Raid Devices : 14
Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)
Array Size : 23440588800 (22354.69 GiB 24003.16 GB)
Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=176 sectors
State : clean
Device UUID : 23903a8f:b96bfb6e:04f35623:0c35676e
Internal Bitmap : 8 sectors from superblock
Update Time : Fri Nov 6 12:30:52 2015
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : f3b5cb95 - correct
Events : 4128
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 8
Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc:
MBR Magic : aa55
Partition[0] : 3907027120 sectors at 2048 (type fd)
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64
Name : livecd:4
Creation Time : Thu May 21 09:36:16 2015
Raid Level : raid6
Raid Devices : 14
Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)
Array Size : 23440588800 (22354.69 GiB 24003.16 GB)
Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=176 sectors
State : clean
Device UUID : 10911088:dadaf2c5:19a09b0a:91d51505
Internal Bitmap : 8 sectors from superblock
Update Time : Fri Nov 6 12:30:52 2015
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : c1768935 - correct
Events : 4128
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd:
MBR Magic : aa55
Partition[0] : 3907027120 sectors at 2048 (type fd)
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64
Name : livecd:4
Creation Time : Thu May 21 09:36:16 2015
Raid Level : raid6
Raid Devices : 14
Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)
Array Size : 23440588800 (22354.69 GiB 24003.16 GB)
Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=176 sectors
State : clean
Device UUID : aa6811d5:10d5679f:0c559636:ffceb688
Internal Bitmap : 8 sectors from superblock
Update Time : Fri Nov 6 12:30:52 2015
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 97880968 - correct
Events : 4128
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde:
MBR Magic : aa55
Partition[0] : 3907027120 sectors at 2048 (type fd)
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64
Name : livecd:4
Creation Time : Thu May 21 09:36:16 2015
Raid Level : raid6
Raid Devices : 14
Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)
Array Size : 23440588800 (22354.69 GiB 24003.16 GB)
Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=176 sectors
State : clean
Device UUID : dd21b987:7f344fee:05ba94e7:2e5e82c9
Internal Bitmap : 8 sectors from superblock
Update Time : Fri Nov 6 12:30:52 2015
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 8ae11634 - correct
Events : 4128
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf:
MBR Magic : aa55
Partition[0] : 3907027120 sectors at 2048 (type fd)
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64
Name : livecd:4
Creation Time : Thu May 21 09:36:16 2015
Raid Level : raid6
Raid Devices : 14
Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)
Array Size : 23440588800 (22354.69 GiB 24003.16 GB)
Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=176 sectors
State : clean
Device UUID : 81fc58c1:bd831960:ffbbc225:efff592c
Internal Bitmap : 8 sectors from superblock
Update Time : Fri Nov 6 12:30:52 2015
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : d750e3d0 - correct
Events : 4128
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg:
MBR Magic : aa55
Partition[0] : 3907027120 sectors at 2048 (type fd)
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x9
Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64
Name : livecd:4
Creation Time : Thu May 21 09:36:16 2015
Raid Level : raid6
Raid Devices : 14
Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)
Array Size : 23440588800 (22354.69 GiB 24003.16 GB)
Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=176 sectors
State : clean
Device UUID : 60cdfa5c:246ba2a4:5368f531:b10580ac
Internal Bitmap : 8 sectors from superblock
Update Time : Fri Nov 6 12:30:52 2015
Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
Checksum : 43d44e41 - correct
Events : 4128
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 7
Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdh:
MBR Magic : aa55
Partition[0] : 3907027120 sectors at 2048 (type fd)
/dev/sdh1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64
Name : livecd:4
Creation Time : Thu May 21 09:36:16 2015
Raid Level : raid6
Raid Devices : 14
Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)
Array Size : 23440588800 (22354.69 GiB 24003.16 GB)
Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=176 sectors
State : clean
Device UUID : 38403e44:8bf2a98f:cb3d98b7:10969838
Internal Bitmap : 8 sectors from superblock
Update Time : Fri Nov 6 12:30:52 2015
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 27daae45 - correct
Events : 4128
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 5
Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdi:
MBR Magic : aa55
Partition[0] : 3907027120 sectors at 2048 (type fd)
/dev/sdi1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x9
Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64
Name : livecd:4
Creation Time : Thu May 21 09:36:16 2015
Raid Level : raid6
Raid Devices : 14
Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)
Array Size : 23440588800 (22354.69 GiB 24003.16 GB)
Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=176 sectors
State : clean
Device UUID : e8953848:8a01645f:de181342:376666ba
Internal Bitmap : 8 sectors from superblock
Update Time : Fri Nov 6 12:30:52 2015
Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
Checksum : 7d7be37 - correct
Events : 4128
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 6
Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdj:
MBR Magic : aa55
Partition[0] : 3907027120 sectors at 2048 (type fd)
/dev/sdj1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64
Name : livecd:4
Creation Time : Thu May 21 09:36:16 2015
Raid Level : raid6
Raid Devices : 14
Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)
Array Size : 23440588800 (22354.69 GiB 24003.16 GB)
Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=176 sectors
State : clean
Device UUID : c1f36d7f:aa57e669:d7597f75:07b62e66
Internal Bitmap : 8 sectors from superblock
Update Time : Fri Nov 6 12:30:52 2015
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 28c402f4 - correct
Events : 4128
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 9
Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdk:
MBR Magic : aa55
Partition[0] : 2097152 sectors at 2048 (type fd)
Partition[1] : 33554432 sectors at 2099200 (type fd)
Partition[2] : 3871375536 sectors at 35653632 (type fd)
/dev/sdk1:
Magic : a92b4efc
Version : 0.90.00
UUID : 3021e831:c6f0b96b:cb201669:f728008a
Creation Time : Fri May 22 18:38:44 2015
Raid Level : raid1
Used Dev Size : 1048512 (1024.11 MiB 1073.68 MB)
Array Size : 1048512 (1024.11 MiB 1073.68 MB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Update Time : Fri Nov 6 12:30:49 2015
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Checksum : e321120 - correct
Events : 24
Number Major Minor RaidDevice State
this 0 8 161 0 active sync /dev/sdk1
0 0 8 161 0 active sync /dev/sdk1
1 1 8 177 1 active sync /dev/sdl1
/dev/sdk2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : c841b565:9ce84038:33926cee:e78f907a
Name : livecd:2
Creation Time : Fri May 22 18:39:20 2015
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 33521664 (15.98 GiB 17.16 GB)
Array Size : 16760832 (15.98 GiB 17.16 GB)
Data Offset : 32768 sectors
Super Offset : 8 sectors
Unused Space : before=32680 sectors, after=0 sectors
State : clean
Device UUID : a28b0b55:8a027224:ba4b1ca0:b84661dc
Update Time : Fri Oct 30 11:21:19 2015
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : ed26c709 - correct
Events : 17
Device Role : Active device 0
Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdk3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : cd185b80:08a5a8bf:fb3016b7:45891977
Name : livecd:3
Creation Time : Fri May 22 18:41:13 2015
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 3871113392 (1845.89 GiB 1982.01 GB)
Array Size : 1935556672 (1845.89 GiB 1982.01 GB)
Used Dev Size : 3871113344 (1845.89 GiB 1982.01 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=48 sectors
State : clean
Device UUID : 41b24115:a618293c:e4f20ee0:2af72266
Internal Bitmap : 8 sectors from superblock
Update Time : Fri Nov 6 12:34:31 2015
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : c08f6a12 - correct
Events : 5592
Device Role : Active device 0
Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdl:
MBR Magic : aa55
Partition[0] : 2097152 sectors at 2048 (type fd)
Partition[1] : 33554432 sectors at 2099200 (type fd)
Partition[2] : 3871375536 sectors at 35653632 (type fd)
/dev/sdl1:
Magic : a92b4efc
Version : 0.90.00
UUID : 3021e831:c6f0b96b:cb201669:f728008a
Creation Time : Fri May 22 18:38:44 2015
Raid Level : raid1
Used Dev Size : 1048512 (1024.11 MiB 1073.68 MB)
Array Size : 1048512 (1024.11 MiB 1073.68 MB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Update Time : Fri Nov 6 12:30:49 2015
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Checksum : e321132 - correct
Events : 24
Number Major Minor RaidDevice State
this 1 8 177 1 active sync /dev/sdl1
0 0 8 161 0 active sync /dev/sdk1
1 1 8 177 1 active sync /dev/sdl1
/dev/sdl2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : c841b565:9ce84038:33926cee:e78f907a
Name : livecd:2
Creation Time : Fri May 22 18:39:20 2015
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 33521664 (15.98 GiB 17.16 GB)
Array Size : 16760832 (15.98 GiB 17.16 GB)
Data Offset : 32768 sectors
Super Offset : 8 sectors
Unused Space : before=32680 sectors, after=0 sectors
State : clean
Device UUID : 6558b051:61dbd3fa:296798ee:2e82dcf0
Update Time : Fri Oct 30 11:21:19 2015
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 2324c38b - correct
Events : 17
Device Role : Active device 1
Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdl3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : cd185b80:08a5a8bf:fb3016b7:45891977
Name : livecd:3
Creation Time : Fri May 22 18:41:13 2015
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 3871113392 (1845.89 GiB 1982.01 GB)
Array Size : 1935556672 (1845.89 GiB 1982.01 GB)
Used Dev Size : 3871113344 (1845.89 GiB 1982.01 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=48 sectors
State : clean
Device UUID : 93c52ef1:1fc77f86:a37016c3:8bbe6b63
Internal Bitmap : 8 sectors from superblock
Update Time : Fri Nov 6 12:34:31 2015
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : c72370ff - correct
Events : 5592
Device Role : Active device 1
Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdm:
MBR Magic : aa55
Partition[0] : 3907027120 sectors at 2048 (type fd)
/dev/sdm1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64
Name : livecd:4
Creation Time : Thu May 21 09:36:16 2015
Raid Level : raid6
Raid Devices : 14
Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)
Array Size : 23440588800 (22354.69 GiB 24003.16 GB)
Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=176 sectors
State : clean
Device UUID : f6fe2e35:6d4fdccf:bde20ad0:a21b7f9c
Internal Bitmap : 8 sectors from superblock
Update Time : Fri Nov 6 12:30:52 2015
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : d556f46e - correct
Events : 4128
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 10
Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdn:
MBR Magic : aa55
Partition[0] : 3907027120 sectors at 2048 (type fd)
/dev/sdn1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64
Name : livecd:4
Creation Time : Thu May 21 09:36:16 2015
Raid Level : raid6
Raid Devices : 14
Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)
Array Size : 23440588800 (22354.69 GiB 24003.16 GB)
Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=176 sectors
State : clean
Device UUID : 4d95468f:26b94d0c:9fc8db13:7ab51494
Internal Bitmap : 8 sectors from superblock
Update Time : Fri Nov 6 12:30:52 2015
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : a7467438 - correct
Events : 4128
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 11
Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdo:
MBR Magic : aa55
Partition[0] : 3907027120 sectors at 2048 (type fd)
/dev/sdo1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64
Name : livecd:4
Creation Time : Thu May 21 09:36:16 2015
Raid Level : raid6
Raid Devices : 14
Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)
Array Size : 23440588800 (22354.69 GiB 24003.16 GB)
Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=176 sectors
State : clean
Device UUID : d3f2f2a7:ccb804fa:15b8dce3:25928566
Internal Bitmap : 8 sectors from superblock
Update Time : Fri Nov 6 12:30:52 2015
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 501b9d88 - correct
Events : 4128
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 12
Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdp:
MBR Magic : aa55
Partition[0] : 3907027120 sectors at 2048 (type fd)
/dev/sdp1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x9
Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64
Name : livecd:4
Creation Time : Thu May 21 09:36:16 2015
Raid Level : raid6
Raid Devices : 14
Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)
Array Size : 23440588800 (22354.69 GiB 24003.16 GB)
Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=176 sectors
State : clean
Device UUID : c891d33f:47ac354c:ad47f2ea:832e7cd1
Internal Bitmap : 8 sectors from superblock
Update Time : Fri Nov 6 12:30:52 2015
Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
Checksum : ac39644e - correct
Events : 4128
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 13
Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing) |
tune2fs -l /dev/md4
Code: | tune2fs 1.42.13 (17-May-2015)
Filesystem volume name: <none>
Last mounted on: /mnt/DataArray
Filesystem UUID: 68d335b1-4d92-4945-ab5b-e7416f346468
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 366260224
Block count: 5860147200
Reserved block count: 586014
Free blocks: 606915714
Free inodes: 360358423
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 2048
Inode blocks per group: 128
RAID stride: 128
RAID stripe width: 1536
Flex block group size: 16
Filesystem created: Fri May 22 22:18:26 2015
Last mount time: Fri Nov 6 12:30:52 2015
Last write time: Fri Nov 6 12:30:52 2015
Mount count: 18
Maximum mount count: -1
Last checked: Fri Jul 17 10:54:53 2015
Check interval: 0 (<none>)
Lifetime writes: 88 TB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: 9a96b45d-93ee-4faf-b081-74a7ebe2b0b4
Journal backup: inode blocks |
Thanks for the help _________________ OSST - Formerly: The Linux Mirror Project
OSST - Open Source Software Downloads - Torrents for over 80 Distributions |
frostschutz Advocate
Joined: 22 Feb 2005 Posts: 2977 Location: Germany
|
Posted: Fri Nov 06, 2015 12:47 pm Post subject: |
|
|
Maybe your issue has something to do with the bad block log, which is a relatively new feature in MD. A drive might get a bad block recorded in this log instead of being kicked from the array.
But /dev/sda1, /dev/sdg1, /dev/sdi1 and /dev/sdp1 all claim to have "bad blocks present", and that probably shouldn't be the case; it shouldn't affect this many disks at once.
Do the disks all pass a 'smartctl -t long' self-test?
Quote: |
man md
BAD BLOCK LIST
When a block cannot be read and cannot be repaired by writing data
recovered from other devices, the address of the block is stored in the
bad block list. Similarly if an attempt to write a block fails, the
address will be recorded as a bad block. If attempting to record the
bad block fails, the whole device will be marked faulty.
|
Maybe that's your issue: the md device itself has bad blocks, hence the I/O errors on md rather than on an individual drive.
Please also run mdadm --examine-badblocks for all member devices.
But none of this explains the kernel panic you're getting, so you should still take it to the RAID mailing list, so one of the developers can take a peek at it. |
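A loop along these lines dumps the bad-block list for every member (a sketch; the /dev/sd[a-p]1 member names match this thread, so adjust to your layout):

```shell
#!/bin/sh
# Dump the md bad-block log for each member device. Member names are
# whatever you pass in; prints "(not available)" when a device has no
# md superblock or mdadm isn't installed.
dump_badblocks() {
    for d in "$@"; do
        echo "== $d =="
        mdadm --examine-badblocks "$d" 2>/dev/null || echo "(not available)"
    done
}

dump_badblocks /dev/sd[a-p]1
```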
matt2kjones Tux's lil' helper
Joined: 03 Mar 2004 Posts: 89
|
Posted: Fri Nov 06, 2015 12:59 pm Post subject: |
|
|
I did notice while pasting the logs that I had badblocks on multiple discs. But surely the raid array should degrade if a write fails? I assume that if a write fails and the block is recorded in the badblock list, the array will use another part of the disk to write that data?
It's also strange that on 3.18.12 I get I/O errors and on 4.0.9 / 4.1.12 I get a kernel oops, as if the condition is being handled differently.
I will post this to the kernel raid mailing list as well.
frostschutz Advocate
Joined: 22 Feb 2005 Posts: 2977 Location: Germany
|
Posted: Fri Nov 06, 2015 1:11 pm Post subject: |
|
|
Quote: | But surely the raid array should take the approach of degrading if a write fails. |
It depends. Failing 8TB worth of disk for a single bad sector may not always be what you want. If you have a single bad sector on 3 different disks, but the sectors are in different places on each disk, you can still use them for rebuilding with the intact disks. As long as you're smart enough to actually replace disks that have bad sectors, your RAID survives where, without the bad block log, it would have failed already.
My own RAID setup is a bit older, from before the bad block log, but I took the same approach after a fashion: I use a split RAID. Instead of making one big multi-terabyte array, I use smaller partitions (250G each, so four partitions per terabyte per disk) and build an independent array for each set of partitions, which are later joined back together using LVM. That way my RAID too survives multiple single bad sectors on different disks, as long as they're 250G apart, since a single bad sector only degrades the 250G partition it is in and not the whole disk.
The way I understand it, the bad block log implements my split-RAID idea at actual block-level resolution.
But I don't have personal experience with that system yet, my disks refuse to die on me. |
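The split-RAID layout described above can be sketched like this (purely illustrative: the disk names, partition scheme, and md/LVM names are assumptions, and run() only echoes each command, so nothing is executed for real):

```shell
#!/bin/sh
# Split RAID: one RAID6 per 250G partition "slice", glued back together
# with LVM. run() echoes instead of executing; drop the echo to do this
# on disks that really carry four equal partitions each.
run() { echo "+ $*"; }

for slice in 1 2 3 4; do
    run mdadm --create /dev/md10$slice --level=6 --raid-devices=14 /dev/sd[a-n]$slice
done
run pvcreate /dev/md101 /dev/md102 /dev/md103 /dev/md104
run vgcreate vg_split /dev/md101 /dev/md102 /dev/md103 /dev/md104
run lvcreate -l 100%FREE -n data vg_split
```

A single bad sector then only degrades the one slice array it falls in, exactly as described.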
matt2kjones Tux's lil' helper
Joined: 03 Mar 2004 Posts: 89
|
Posted: Fri Nov 06, 2015 2:11 pm Post subject: |
|
|
Makes sense.
I have posted my issues to the linux-raid kernel mailing list, and linked them to this thread for more information.
I have issued the smartctl tests on all drives. That is going to take 4 hours, so I will come back with the results later. I am expecting some of the drives to fail, in which case I would have thought that Linux RAID would degrade the array; unless, obviously, the badblocks list can work around the errors, but in that case I should still have the corrupt files / kernel oopses.
Thanks for all your help.
Matt
frostschutz Advocate
Joined: 22 Feb 2005 Posts: 2977 Location: Germany
|
Posted: Fri Nov 06, 2015 3:05 pm Post subject: |
|
|
And what does the --examine-badblocks output look like?
If my theory is right, it should show the same blocks bad on 3 disks, and that block should translate to the sector ext4 was complaining about.
Since it's incredibly unlikely for the same block to go bad on three disks, maybe a controller issue triggered it. You'd have to check your logs for old messages, if you have them. |
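For what it's worth, the geometry from the --examine output above makes that translation simple arithmetic (a sketch that ignores the left-symmetric parity rotation, which changes which member holds a chunk but not the sector offset on that member):

```shell
#!/bin/sh
# Map the failing ext4 block from the dmesg at the top of the thread to
# a member-disk sector: 4K fs blocks, 512K chunks (1024 sectors),
# 14-disk RAID6 = 12 data chunks per stripe, 262144-sector data offset.
fsblock=5172953088                      # "starting block" from the ext4 warning
array_sector=$(( fsblock * 8 ))         # 4096-byte blocks -> 512-byte sectors
chunk=$(( array_sector / 1024 ))
stripe=$(( chunk / 12 ))
member_sector=$(( stripe * 1024 + array_sector % 1024 + 262144 ))

echo "array sector  : $array_sector"
echo "stripe        : $stripe"
echo "member sector : $member_sector"
```

This prints a member sector of 3448897536; assuming the bad-block entries are counted from the start of the member device, the 1024-sector chunk beginning there contains the "3448897824 for 512 sectors" runs that show up later in this thread's --examine-badblocks output.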
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54239 Location: 56N 3W
|
Posted: Fri Nov 06, 2015 8:13 pm Post subject: |
|
|
matt2kjones,
When you get a write failure on a single drive in a raid set, the drive will attempt to reallocate the failed sector.
This is internal to the drive; the kernel is not involved.
Similarly with a read failure: the drive will want to reallocate the sector, but can't, because it can't read it.
Events like this are recorded in the drive's internal SMART log. Take a look with smartmontools.
Drive level errors look like
Code: | [415787.257222] ata1.00: exception Emask 0x0 SAct 0xfff000 SErr 0x0 action 0x0
[415787.257229] ata1.00: irq_stat 0x40000008
[415787.257243] ata1.00: cmd 60/08:60:08:d4:f4/00:00:bd:00:00/40 tag 12 ncq 4096 in
[415787.257246] res 41/40:00:08:d4:f4/00:00:bd:00:00/40 Emask 0x409 (media error) <F>
[415787.267041] ata1.00: configured for UDMA/133
[415787.267075] ata1: EH complete | in dmesg.
What do your SMART logs look like?
In particular, these parameters
Code: | ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 1
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 1 |
A non-zero Reallocated_Sector_Ct is not a cause for concern. That's how drives hide bad sectors from the operating system.
Current_Pending_Sectors are a bad thing. That's a count of the sectors the drive has tried to read and can't.
On a single-drive filesystem, that data is probably lost. On a raid set, it can be reconstructed from the redundant data.
That tells me I need to run a repair on that raid set nowish or even sooner. That should force that pending sector to be reconstructed from the other members of the set.
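Such a repair pass can be started through sysfs; a sketch (md4 as in this thread, guarded so it is a no-op when the array or the permissions aren't there):

```shell
#!/bin/sh
# Ask md to rewrite unreadable sectors from redundancy, then show the
# mismatch counter. Progress is visible in /proc/mdstat.
md_repair() {
    sa="/sys/block/$1/md/sync_action"
    if [ -w "$sa" ]; then
        echo repair > "$sa"
        cat "/sys/block/$1/md/mismatch_cnt"
    else
        echo "$1: no writable $sa, skipping"
    fi
}

md_repair md4
```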
Your problems appear to be related to the filesystem on the raid set itself, rather than the individual members of the set. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
frostschutz Advocate
Joined: 22 Feb 2005 Posts: 2977 Location: Germany
|
Posted: Sun Nov 08, 2015 2:04 am Post subject: |
|
|
matt2kjones wrote: | I have posted my issues to the linux-raid kernel mailing list, and linked them to this thread for more information. |
Do you have a link for your mail in the mailinglist archives? I can't find it... |
matt2kjones Tux's lil' helper
Joined: 03 Mar 2004 Posts: 89
|
Posted: Mon Nov 09, 2015 9:06 am Post subject: |
|
|
Yeah, I'm not sure what's happening.
I subscribed to the linux-raid mailing list, and everything went fine, I am now receiving mails sent to that list.
I posted a message to the list and got no response back; however, if I send a command like "help" to the list server, I do get a reply, so I'm not sure why my message isn't being posted to the list.
matt2kjones Tux's lil' helper
Joined: 03 Mar 2004 Posts: 89
|
Posted: Mon Nov 09, 2015 12:02 pm Post subject: |
|
|
OK, I have managed to post to the kernel mailing list using a different email address.
I have the output from mdadm --examine-badblocks. I am only listing drives here which have anything in the list:
/dev/sda1:
Code: | Bad-blocks on /dev/sda1:
1938038928 for 512 sectors
1938039440 for 512 sectors
1938977144 for 512 sectors
1938977656 for 512 sectors
3303750816 for 512 sectors
3303751328 for 512 sectors
3313648904 for 512 sectors
3313649416 for 512 sectors
3313651976 for 512 sectors
3313652488 for 512 sectors
3418023432 for 512 sectors
3418023944 for 512 sectors
3418024456 for 512 sectors
3418024968 for 512 sectors
3418037768 for 512 sectors
3418038280 for 512 sectors
3418038792 for 512 sectors
3418039304 for 512 sectors
3418112520 for 512 sectors
3418113032 for 512 sectors
3418113544 for 512 sectors
3418114056 for 512 sectors
3418114568 for 512 sectors
3418115080 for 512 sectors
3418124808 for 512 sectors
3418125320 for 512 sectors
3418165768 for 512 sectors
3418166280 for 512 sectors
3418187272 for 512 sectors
3418187784 for 512 sectors
3418213224 for 512 sectors
3418213736 for 512 sectors
3418214248 for 512 sectors
3418214760 for 512 sectors
3418215272 for 512 sectors
3418215784 for 512 sectors
3420607528 for 512 sectors
3420608040 for 512 sectors
3420626984 for 512 sectors
3420627496 for 512 sectors
3448897824 for 512 sectors
3448898336 for 512 sectors
3458897888 for 512 sectors
3458898400 for 512 sectors
3519403992 for 512 sectors
3519404504 for 512 sectors
3617207456 for 512 sectors
3617207968 for 512 sectors
|
/dev/sdg1:
Code: | Bad-blocks on /dev/sdg1:
1938038928 for 512 sectors
1938039440 for 512 sectors
1938977144 for 512 sectors
1938977656 for 512 sectors
3303750816 for 512 sectors
3303751328 for 512 sectors
3313648904 for 512 sectors
3313649416 for 512 sectors
3313651976 for 512 sectors
3313652488 for 512 sectors
3418023432 for 512 sectors
3418023944 for 512 sectors
3418024456 for 512 sectors
3418024968 for 512 sectors
3418037768 for 512 sectors
3418038280 for 512 sectors
3418038792 for 512 sectors
3418039304 for 512 sectors
3418112520 for 512 sectors
3418113032 for 512 sectors
3418113544 for 512 sectors
3418114056 for 512 sectors
3418114568 for 512 sectors
3418115080 for 512 sectors
3418124808 for 512 sectors
3418125320 for 512 sectors
3418165768 for 512 sectors
3418166280 for 512 sectors
3418187272 for 512 sectors
3418187784 for 512 sectors
3418213224 for 512 sectors
3418213736 for 512 sectors
3418214248 for 512 sectors
3418214760 for 512 sectors
3418215272 for 512 sectors
3418215784 for 512 sectors
3420607528 for 512 sectors
3420608040 for 512 sectors
3420626984 for 512 sectors
3420627496 for 512 sectors
3448897824 for 512 sectors
3448898336 for 512 sectors
3458897888 for 512 sectors
3458898400 for 512 sectors
3519403992 for 512 sectors
3519404504 for 512 sectors
3617207456 for 512 sectors
3617207968 for 512 sectors
|
/dev/sdi1:
Code: | Bad-blocks on /dev/sdi1:
1938977144 for 512 sectors
1938977656 for 512 sectors |
/dev/sdp1:
Code: | Bad-blocks on /dev/sdp1:
1938038928 for 512 sectors
1938039440 for 512 sectors
3303750816 for 512 sectors
3303751328 for 512 sectors
3313648904 for 512 sectors
3313649416 for 512 sectors
3313651976 for 512 sectors
3313652488 for 512 sectors
3418023432 for 512 sectors
3418023944 for 512 sectors
3418024456 for 512 sectors
3418024968 for 512 sectors
3418037768 for 512 sectors
3418038280 for 512 sectors
3418038792 for 512 sectors
3418039304 for 512 sectors
3418112520 for 512 sectors
3418113032 for 512 sectors
3418113544 for 512 sectors
3418114056 for 512 sectors
3418114568 for 512 sectors
3418115080 for 512 sectors
3418124808 for 512 sectors
3418125320 for 512 sectors
3418165768 for 512 sectors
3418166280 for 512 sectors
3418187272 for 512 sectors
3418187784 for 512 sectors
3418213224 for 512 sectors
3418213736 for 512 sectors
3418214248 for 512 sectors
3418214760 for 512 sectors
3418215272 for 512 sectors
3418215784 for 512 sectors
3420607528 for 512 sectors
3420608040 for 512 sectors
3420626984 for 512 sectors
3420627496 for 512 sectors
3448897824 for 512 sectors
3448898336 for 512 sectors
3458897888 for 512 sectors
3458898400 for 512 sectors
3519403992 for 512 sectors
3519404504 for 512 sectors
3617207456 for 512 sectors
3617207968 for 512 sectors
|
It seems very odd that I have 3 drives with badblocks at the same locations. Something looks very wrong there :/
I have also unmounted the filesystem and run an fsck on this array, and nothing is wrong.
As for the extended tests with smartmontools: one drive out of the set failed with a "read error" at 60%; all the other discs passed.
frostschutz Advocate
Joined: 22 Feb 2005 Posts: 2977 Location: Germany
|
Posted: Mon Nov 09, 2015 12:21 pm Post subject: |
|
|
Yup, I'm not sure that's how bad blocks are supposed to work. In your case it seems to have resulted in a "raid that never fails", which is not particularly useful if it leaves the filesystem or system to deal with the mess, or even crash...
I wish the bad blocks feature were more exposed: say, in /proc/mdstat, instead of showing [UUU] or [U_U] it could show something like [BBB] for disks with known bad blocks, and mdadm monitor should send you mails about it if it doesn't already.
RAID survival depends on detecting errors early and replacing disks immediately; if the bad block log is designed to hide errors from you, then it would be better to go without this feature (even though it is a nice idea, depending on the implementation, as I mentioned earlier in this thread).
Quote: | It seems very odd that I have 3 drives with badblocks at the same locations? Something looks very wrong there :/ |
Kernel panic aside, it explains why the read errors show on /dev/mdX rather than on a specific /dev/sdX.
As for how those bad blocks came to be, you'd have to check your logs if you have them; maybe some controller jitter...
I'm not sure what the mailing list will recommend; I would probably attempt recovery by clearing the bad block log on the disks that passed the SMART long self-test, and then replace the drive that failed.
The problem is you can't see when those sectors were added to the log. If it was a controller fluke, it probably all happened at the same time; but if one disk got its bad block log earlier than the others, then that disk would be less likely to have good data in those sectors. So you should only clear the lists that have good data, or simply try several combinations (once you've determined what is stored in those locations, using filefrag). |
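That recovery could look roughly like this (a sketch, not a tested procedure: the device names are from this thread, run() only echoes each command, and whether --update=no-bbl will clear a non-empty list depends on the mdadm version, an assumption worth checking first):

```shell
#!/bin/sh
# Reassemble without the bad-block list, then check what lives at a
# suspect filesystem block before trusting it. run() echoes only.
run() { echo "+ $*"; }

run mdadm --stop /dev/md4
run mdadm --assemble /dev/md4 --update=no-bbl /dev/sd[a-p]1
# debugfs icheck maps an fs block to its inode (filefrag does the reverse)
run debugfs -R "icheck 5172953088" /dev/md4
```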
matt2kjones Tux's lil' helper
Joined: 03 Mar 2004 Posts: 89
|
Posted: Mon Nov 09, 2015 1:04 pm Post subject: |
|
|
Thanks again for the reply.
I think I read somewhere that if you use metadata version 0.9, the badblock functionality isn't enabled.
This array is split over two controllers (two 8-port SAS cards), and one of the drives with badblocks is on a different controller to the others, so I don't think it's a controller error, unless there was a power glitch or something, which could be possible, although the array and server are attached to a UPS.
I can actually destroy this array. This server contains backups of our live master server, which uses hardware RAID10 with many more discs. So I can easily destroy this array, re-create it with good discs and see if the problem goes away. The main reason I am looking to resolve it without destroying the data is so that I can understand why it happened, and how to get around it in the future if it happens again.
I will probably go down the route you suggest and clear the badblock logs for all the drives and replace the known faulty drive (we have lots of unopened spares here).
One question, if you don't mind? If a drive has a write error, then the block is added to the badblocks list, and if the write to the badblocks list fails, the drive is marked faulty; I understand that. But what happens if the drive successfully writes the bad block to the badblocks list? Do I then only have one copy of that data elsewhere on the array? What happens if I have drive A with badblocks and then drives B and C fail? Theoretically I can recover the array, but I assume the data in those badblocks would be lost.
frostschutz Advocate
Joined: 22 Feb 2005 Posts: 2977 Location: Germany
|
Posted: Mon Nov 09, 2015 1:16 pm Post subject: |
|
|
matt2kjones wrote: | I think I read somewhere that if you use metadata version 0.9, the badblock functionality isn't enabled. |
Don't use 0.90 metadata for anything.
The badblock list can be enabled or disabled as you like (--update=bbl / --update=no-bbl, something like that).
Quote: | Do I only have one copy of that data elsewhere on the array? |
Yes, that block is no longer redundant (or in case of RAID6, less redundant than it should be).
Quote: | What happens if I have drive A with badblocks then Drive B and C fail. |
It's dead... (at least the bad blocks are gone for good in that case, if they're actually bad on the drives and not just md-believes-so; in the latter case they might just have outdated data) |
DingbatCA Guru
Joined: 07 Jul 2004 Posts: 384 Location: Portland Or
|
Posted: Tue Nov 10, 2015 11:12 pm Post subject: |
|
|
I am having what I think is the same issue. After 50+ hours of troubleshooting, I ordered in 2 new LSI controller cards. I think the problem is with the mvsas card/driver. Matt2kjones, can you give us the output of lspci? What type of drives are you using? Mine are all 3TB WD Greens.
I have 10 drives in question and they keep failing. A drive gets a sector marked as "pending" bad. From there I can use hdparm --write-sector to toggle the exact sector in question, and the drive says the sector is fine. I have gone as far as running SMART long tests and a secure erase. The drives always come back healthy; every test I can run shows they are in good health.
Code: | root@MediaNAS:~# lspci | grep SATA
00:1f.2 IDE interface: Intel Corporation 5 Series/3400 Series Chipset 4 port SATA IDE Controller (rev 05)
00:1f.5 IDE interface: Intel Corporation 5 Series/3400 Series Chipset 2 port SATA IDE Controller (rev 05)
03:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SE6440 SAS/SATA PCIe controller (rev 02)
04:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SE6440 SAS/SATA PCIe controller (rev 02)
05:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SE6440 SAS/SATA PCIe controller (rev 02)
06:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SE6440 SAS/SATA PCIe controller (rev 02) |
Yes, in this case it is pointing to one drive. But the drive and sectors seem to move around.
Code: | [ 3815.448942] md/raid:md1: read error not correctable (sector 865360712 on sdl).
[ 3815.448949] md/raid:md1: read error not correctable (sector 865360720 on sdl).
[ 3815.448952] md/raid:md1: read error not correctable (sector 865360728 on sdl).
[ 3815.448955] md/raid:md1: read error not correctable (sector 865360736 on sdl).
[ 3815.448957] md/raid:md1: read error not correctable (sector 865360744 on sdl).
[ 3815.448960] md/raid:md1: read error not correctable (sector 865360752 on sdl).
[ 3815.448963] md/raid:md1: read error not correctable (sector 865360760 on sdl).
[ 3815.448966] md/raid:md1: read error not correctable (sector 865360768 on sdl).
[ 3815.448969] md/raid:md1: read error not correctable (sector 865360776 on sdl).
[ 3815.448971] md/raid:md1: read error not correctable (sector 865360784 on sdl). |
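A non-destructive way to probe one of those sectors (sector and device taken from the dmesg above; run() only echoes, and note that --write-sector, unlike --read-sector, overwrites the sector with zeroes):

```shell
#!/bin/sh
# Read a "pending" sector directly via ATA, then re-check the SMART
# pending count. run() echoes instead of executing.
run() { echo "+ $*"; }

run hdparm --read-sector 865360712 /dev/sdl
run smartctl -A /dev/sdl   # watch Current_Pending_Sector afterwards
```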
matt2kjones Tux's lil' helper
Joined: 03 Mar 2004 Posts: 89
|
Posted: Wed Nov 11, 2015 8:54 am Post subject: |
|
|
Hi DingbatCA,
My issue seems to be that I have multiple drives with bad blocks all in the same area. If I take a drive with no badblocks out of the array, then add it back in, the badblocks from the other drives propagate to the badblocks list of the re-added drive. I'm not sure if this is meant to happen.
I have different cards to you:
lspci |grep SAS
01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
02:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
Also, on my system I'm not actually getting any errors on the hard drives themselves, only on the raid array as a whole, which makes me think that no read/write errors are actually happening on any of the drives and that the badblock list is faulty somehow.
matt2kjones Tux's lil' helper
Joined: 03 Mar 2004 Posts: 89
|
Posted: Wed Nov 11, 2015 9:30 am Post subject: |
|
|
OK, changes I have made since my last post.
I have failed, removed and re-added 3 drives, one at a time.
/dev/sdp - This had the full list of badblocks above. When it was re-added it had none; when the sync completed, the badblock list was full again.
/dev/sda - Same as above.
/dev/sdi - This drive only had two entries in the badblock list prior to removal. After a full sync it had the full list, same as sdp and sda.
I have also switched to the latest mainline kernel 4.3.0
Since I have done these two things, writes have been considerably faster, and I haven't had any dmesg errors yet (written over 400GB so far).
So I'm not sure whether taking the drives out of the array and adding them back in one at a time has fixed the issue, or whether the badblocks implementation is broken in earlier kernels and works correctly in 4.3.0.
I plan to fill all the free space (about 6TB) to see if I get any write errors. If not, I'll assume this is fixed.
matt2kjones Tux's lil' helper
Joined: 03 Mar 2004 Posts: 89
|
Posted: Wed Nov 11, 2015 12:55 pm Post subject: |
|
|
This issue seems to be resolved.
Wrote over 4TB of data to the array this morning, and finally hit an I/O error on /dev/sdd.
The drive was marked as faulty and the array degraded.
Code: | [77502.279233] sd 0:0:3:0: attempting task abort! scmd(ffff8801ef40b6c0)
[77502.279237] sd 0:0:3:0: [sdd] CDB: opcode=0x85 85 08 0e 00 d5 00 01 00 00 00 4f 00 c2 00 b0 00
[77502.279239] scsi target0:0:3: handle(0x000c), sas_address(0x4433221103000000), phy(3)
[77502.279240] scsi target0:0:3: enclosure_logical_id(0x500605b008924a60), slot(1)
[77502.279241] scsi target0:0:3: enclosure level(0x0000),connector name()
[77502.333188] sd 0:0:3:0: task abort: SUCCESS scmd(ffff8801ef40b6c0)
[77502.713979] blk_update_request: I/O error, dev sdd, sector 2064
[77502.713982] md: super_written gets error=-5
[77502.713985] md/raid:md4: Disk failure on sdd1, disabling device.
md/raid:md4: Operation continuing on 13 devices.
|
Seems to be working as expected now.
The only thing I can imagine has fixed it is removing and re-adding all the drives with badblocks. I'm guessing the array was in some sort of error state, maybe from a broken implementation of badblocks in an earlier kernel.
I also upgraded to kernel 4.3.0, rather than using the latest kernel from the portage tree, so that may have something to do with it too.
Thanks to everyone that helped, especially frostschutz who replied to every post.
Cheers!
matt2kjones Tux's lil' helper
Joined: 03 Mar 2004 Posts: 89
|
Posted: Thu Nov 12, 2015 11:39 am Post subject: |
|
|
Spoke too soon...
After writing about 6TB of data I have hit buffer I/O errors again:
Code: | [158219.456484] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955235712)
[158219.456487] Buffer I/O error on device md4, logical block 4955235584
[158219.456490] Buffer I/O error on device md4, logical block 4955235585
[158219.456491] Buffer I/O error on device md4, logical block 4955235586
[158219.456491] Buffer I/O error on device md4, logical block 4955235587
[158219.456492] Buffer I/O error on device md4, logical block 4955235588
[158219.456493] Buffer I/O error on device md4, logical block 4955235589
[158219.456494] Buffer I/O error on device md4, logical block 4955235590
[158219.456495] Buffer I/O error on device md4, logical block 4955235591
[158219.456496] Buffer I/O error on device md4, logical block 4955235592
[158219.456497] Buffer I/O error on device md4, logical block 4955235593
[158219.456580] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955235456)
[158219.456663] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955235200)
[158219.456747] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955234944)
[158219.456829] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955234688)
[158219.456912] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955234432)
[158469.158278] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 123995503 (offset 0 size 8388608 starting block 4970080384) |
What's interesting, though: if I remove a drive with no entries in the badblocks list, then add it back, once it has synced that drive will have the same badblocks list as all the others.
I now have 5 drives in the array with the same badblocks list. I am sure that if I took each drive out one by one and added them all back in, every drive would end up with the same badblocks list. Should the badblocks list be replicating like this?
I can't even remove the badblocks feature, because according to the man pages it can't be removed while the list contains any entries.
The next question I have: do the entries in the badblocks list map to locations on the physical device, or to locations within the array on the md device? If they map to locations within the array, that would explain why they propagate, and would also mean those badblocks could be passed down to ext4 to avoid using that area of the filesystem.
krinn Watchman
Joined: 02 May 2003 Posts: 7470
|
Posted: Thu Nov 12, 2015 2:50 pm Post subject: |
|
|
If you want sector 2 of drive 1 to have the same content as sector 2 of drives 2, 3..., you duplicate sector contents, and then you have no choice but to accept that if any drive has sector 3 dead, all drives must mark sector 3 dead (dead sector count == total of distinct dead sectors across all drives).
The other way is to duplicate files instead, which is more flexible (you can use compression; dead sector count == the largest per-drive total; file contents are the same on all disks, but sector contents are not), but the complexity of handling that has a great impact on performance.
With software raid you can duplicate logical sectors; with hardware raid you can only duplicate physical sectors (because to know the logical sectors, you must know the partition). So a software raid array can combine different partitions from different disks, while hardware arrays can only be made from whole disks. |
frostschutz Advocate
Joined: 22 Feb 2005 Posts: 2977 Location: Germany
|
Posted: Thu Nov 12, 2015 6:13 pm Post subject: |
|
|
If the same blocks are bad on 3+ disks, the data for those blocks is gone (or at least considered gone by mdadm), so a sync won't get the data for those blocks back.
So after syncing, the synced disks don't have valid data for those blocks either, thus they are bad in a way.
You might have to turn off the bad block log to get rid of this issue (and remove any disks that were not previously part of the RAID, as those are guaranteed to have wrong data in those blocks).
Please note my own experience with the bbl is very limited, hence I suggested the mailing list, ...
You can enable/disable the bad block log using the bbl / no-bbl options with --update on assembly; check the mdadm manpage for details. |
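A sketch of that procedure with made-up device names (the mdadm calls are commented out on purpose; per the man page, the no-bbl update is refused while a member's log still contains entries, which is the snag discussed in this thread):

```shell
# Hypothetical names: a six-member RAID6 on /dev/md4.
MD=/dev/md4
MEMBERS="/dev/sdb4 /dev/sdc4 /dev/sdd4 /dev/sde4 /dev/sdf4 /dev/sdg4"

# 1. Dump each member's bad block log first:
#      for m in $MEMBERS; do mdadm --examine-badblocks "$m"; done
# 2. Stop the array:
#      mdadm --stop "$MD"
# 3. Reassemble with the bad block log disabled:
#      mdadm --assemble "$MD" --update=no-bbl $MEMBERS
echo "plan: mdadm --assemble $MD --update=no-bbl"
```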
|
|
|
matt2kjones Tux's lil' helper
Joined: 03 Mar 2004 Posts: 89
|
Posted: Fri Nov 13, 2015 10:56 am Post subject: |
|
|
Hi,
I was going to remove the badblocks log, but according to the man page you can only remove it if there are currently no bad blocks logged on that drive.
So it seems I am stuck in an error state I can't get out of: mdadm adds bad blocks to every drive I add or remove, the bad blocks are not passed down to the filesystem level (so I can't even get ext4 to ignore them to avoid corruption), and I can't remove the badblocks list from any of the drives.
I could fail and replace each disc with a new one, one at a time, and would still have an array that is unusable.
I have posted this thread and additional information to the kernel mailing list and haven't had any replies, and as there is so little information on mdadm badblocks on the internet, I'm going to have to destroy the array and start fresh, rebuilding the data from the master server, as I can't spend all of next week on this as well (I've spent two weeks trying to get it operating so far). When I re-create the array I will leave badblocks on - I guess it got into this state due to an early broken implementation.
Thanks for all the help _________________ OSST - Formerly: The Linux Mirror Project
OSST - Open Source Software Downloads - Torrents for over 80 Distributions |
|
|
|
frostschutz Advocate
Joined: 22 Feb 2005 Posts: 2977 Location: Germany
|
Posted: Fri Nov 13, 2015 11:05 am Post subject: |
|
|
matt2kjones wrote: | you can only remove the badblocks log if there are currently no badblocks logged on that drive. |
That sucks.
You could patch that check out of the mdadm source code, though. Or edit the metadata directly, although that also involves updating the metadata checksum.
Or re-create the array in place, but that's probably the most dangerous choice of all, as it's so easy to get wrong. You can't rely on default values (defaults change over time), so if you do re-create you have to specify everything (metadata version, data offset, RAID level, chunk size, layout, disk order, ...), and do it with --assume-clean. And if you added any disks after blocks were already marked as bad (such as when you replaced the drive that was actually faulty), you should add those as 'missing' so they can sync in with the "original" data of those "bad" blocks, just in case it's relevant for anything (and I guess it is, as otherwise you'd not have hit the errors yourself). |
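To make the "specify everything" warning concrete, here is what such a re-create invocation would have to look like. Every value below is a placeholder: the real metadata version, data offset, chunk size, layout, and disk order must be read off `mdadm --examine` from each member first, and the one disk added after the failure is listed as missing. The command is commented out on purpose.

```shell
# DANGEROUS and illustrative only: all values below are placeholders
# that must match the original array exactly (take them from
# mdadm --examine on each member before even considering this).
level=6
chunk=512                 # chunk size in KB
layout=left-symmetric     # the usual default for RAID6, but verify it
#   mdadm --create /dev/md4 --assume-clean \
#         --metadata=1.2 --data-offset=131072 \
#         --level=$level --chunk=$chunk --layout=$layout \
#         --raid-devices=6 \
#         /dev/sdb4 /dev/sdc4 missing /dev/sde4 /dev/sdf4 /dev/sdg4
echo "re-create sketch: level=$level chunk=${chunk}K layout=$layout"
```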
|
|
|
matt2kjones Tux's lil' helper
Joined: 03 Mar 2004 Posts: 89
|
Posted: Fri Nov 13, 2015 11:26 am Post subject: |
|
|
This server acts as a backup that we can quickly grab files off, or a server we can switch over to if our master fails, so I am in a position where I can just destroy the array and re-create it.
Would have been nice to find a way out of this situation other than starting clean though.
I was thinking of stopping the array and then using dd to write zeros over the location of the badblocks list on each drive (I'm not worried about the data in those locations), then forcing a check on the array, but I guess I would have run into issues doing that, and it seemed like a lot of work for something that probably wouldn't have worked.
Again, thanks for all the help. Really appreciate all the help you've given. _________________ OSST - Formerly: The Linux Mirror Project
OSST - Open Source Software Downloads - Torrents for over 80 Distributions |
|
|
|
|
|
|
|