Jimini wrote: [...] (I did not configure and compile every new kernel from scratch. For new kernel versions, I reuse the old config and, after a "make syncconfig", I compile the new kernel.) [...]
Don't use "make syncconfig"; use "make oldconfig" instead. See also:
$ grep -a3 syncconfig /usr/src/linux/scripts/kconfig/Makefile
...
#
# Note:
# syncconfig has become an internal implementation detail and is now
# deprecated for external use
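A minimal sketch of the "reuse the old config" workflow done the supported way. It assumes you pass in a copy of the previous kernel's .config and the new source directory (for example /usr/src/linux); the function name and argument order are made up for this example. `make oldconfig` prompts only for symbols that are new in this kernel version. `make olddefconfig` would answer all of them with their defaults instead of prompting.

```shell
#!/usr/bin/env bash
# Sketch: carry an existing kernel config over to a newer source tree.
# reuse_kernel_config is a hypothetical helper name, not part of any tool.
reuse_kernel_config() {
    local old_config=$1   # previous kernel's .config (path is an assumption)
    local src_dir=$2      # new kernel source tree, e.g. /usr/src/linux (assumed)

    if [ ! -f "$old_config" ]; then
        echo "config file not found: $old_config" >&2
        return 1
    fi
    if [ ! -f "$src_dir/Makefile" ]; then
        echo "not a kernel source tree: $src_dir" >&2
        return 1
    fi

    cp -- "$old_config" "$src_dir/.config" || return 1
    # oldconfig asks only about symbols that have no value in the old config;
    # syncconfig is an internal target and should not be called directly.
    make -C "$src_dir" oldconfig || return 1
    make -C "$src_dir" -j"$(nproc)"
}
```

Example (paths assumed, Gentoo-style source directories): `reuse_kernel_config /usr/src/linux-6.6.74-gentoo/.config /usr/src/linux-6.12.14-gentoo`.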
# CONFIG_INTEL_IOMMU is not set

# CONFIG_X86_INTEL_LPSS is not set
... AND one of them:
# CONFIG_LPC_ICH is not set
# CONFIG_LPC_SCH is not set
# CONFIG_MFD_INTEL_LPSS_ACPI is not set
# CONFIG_MFD_INTEL_LPSS_PCI is not set

# CONFIG_INTEL_IDMA64 is not set

CONFIG_INTEL_IOATDMA=y
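To check quickly how each of these symbols is set in a given .config, a small sketch like the following could help. It mirrors the output of the kernel tree's own `scripts/config --file <file> --state <option>` (value, `n` or `undef`), but works on any saved .config without a source tree. `kconfig_state` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: report how a Kconfig symbol is set in a .config file.
# Prints the value (y, m, a number or a quoted string), "n" for
# "# CONFIG_X is not set", or "undef" if the symbol does not appear at all
# (typically because its dependencies are not met).
kconfig_state() {
    local file=$1 sym=$2
    sym=${sym#CONFIG_}   # accept both FOO and CONFIG_FOO
    awk -v s="CONFIG_$sym" '
        index($0, s "=") == 1        { print substr($0, length(s) + 2); found = 1; exit }
        $0 == "# " s " is not set"   { print "n"; found = 1; exit }
        END                          { if (!found) print "undef" }
    ' "$file"
}
```

For the config discussed here, `kconfig_state /usr/src/linux/.config INTEL_IOMMU` (path assumed) should print `n`.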
1.
CONFIG_PREEMPT_NONE=y
2.
# CONFIG_SCHED_CORE is not set
3.
CONFIG_TRACEPOINTS=y
4.
# CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS is not set
3.
CONFIG_PM_DEBUG=y
CONFIG_PM_ADVANCED_DEBUG=y
5.
# CONFIG_ACPI_PROCESSOR_AGGREGATOR is not set
6.
# CONFIG_INTEL_IDLE is not set
7.
CONFIG_IA32_EMULATION=y
8.
CONFIG_BLK_DEV_THROTTLING=y
3.
CONFIG_BLK_DEBUG_FS=y
9.
CONFIG_EXTRA_FIRMWARE=""
10.
CONFIG_I2C_ALI1535=y
CONFIG_I2C_ALI1563=y
CONFIG_I2C_ALI15X3=y
CONFIG_I2C_AMD756=y
CONFIG_I2C_AMD756_S4882=y
CONFIG_I2C_AMD8111=y
CONFIG_I2C_I801=y
CONFIG_I2C_PIIX4=y
CONFIG_I2C_NFORCE2=y
CONFIG_I2C_SIS5595=y
CONFIG_I2C_SIS630=y
CONFIG_I2C_SIS96X=y
CONFIG_I2C_VIA=y
CONFIG_I2C_VIAPRO=y
CONFIG_I2C_OCORES=y
CONFIG_I2C_SIMTEC=y
CONFIG_I2C_TAOS_EVM=y
CONFIG_I2C_TINY_USB=y
# CONFIG_PINCTRL is not set
11.
CONFIG_SECURITY_SELINUX=y
CONFIG_INTEGRITY=y
3.
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
CONFIG_BLK_DEV_IO_TRACE=y

Ralphred wrote: I know that feeling; if pietinger.net offered a "drop your lspci output and kernel .config here for expert feedback, only €5.99 a year", I'd subscribe.

top - 11:17:38 up 6 min, 2 users, load average: 2.16, 1.52, 0.69
Tasks: 180 total, 2 running, 178 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 11.9 sy, 0.0 ni, 75.0 id, 13.1 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 15373.5 total, 8416.2 free, 1284.8 used, 5847.0 buff/cache
MiB Swap: 6144.0 total, 6144.0 free, 0.0 used. 14088.7 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
124 root 20 0 0 0 0 R 100.0 0.0 4:23.74 kworker/u+
1 root 20 0 2472 1536 1536 S 0.0 0.0 0:01.00 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
[...]
2277 root 20 0 5876 2560 2432 D 0.0 0.0 0:00.00 tar

Hmm ... frankly, I don't know what the problem is. If none of the following helps, my last idea would be to use a different kernel version (it doesn't have to be the old 6.1; take 6.12, because that will soon be stable for Gentoo anyway). I would like to mention the following:
Jimini wrote: [...] ...any ideas, what else to try?
[ 10.742056] Warning: unable to open an initial console.
<=>
05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [Radeon Vega Series / Radeon Vega Mobile Series] [1002:1636] (rev da)
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [Radeon Vega Series / Radeon Vega Mobile Series] [1002:1636]
<=>
# CONFIG_DRM_AMDGPU is not set

[ 12.327137] EXT4-fs (sdh2): INFO: recovery required on readonly filesystem
[ 13.732730] md/raid:md0: not clean -- starting background reconstruction

# CONFIG_CHR_DEV_SG is not set
# CONFIG_SCSI_CONSTANTS is not set

# CONFIG_UDMABUF is not set

CONFIG_SATA_MOBILE_LPM_POLICY=0
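In the top output above, tar sits in state D (uninterruptible sleep, usually waiting on I/O) while a kworker burns 100% CPU. A small sketch for listing such blocked tasks and, where readable, their kernel stacks; it assumes procps `ps`, and /proc/<pid>/stack normally needs root. The function names are made up for this example. Alternatively, `echo w > /proc/sysrq-trigger` (as root, on a kernel with CONFIG_MAGIC_SYSRQ) dumps all blocked tasks to the kernel log, and the hung-task detector (CONFIG_DETECT_HUNG_TASK) reports tasks stuck longer than its timeout.

```shell
#!/usr/bin/env bash
# Sketch: find tasks in uninterruptible sleep (state D).
# blocked_tasks reads "PID STAT COMMAND" lines, as printed by
# `ps -eo pid,stat,comm`, on stdin and prints "PID COMMAND" for every D-state
# task (only the first word of the command name is kept).
blocked_tasks() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

# Show each blocked task together with its kernel stack, if readable.
show_blocked() {
    local pid comm
    ps -eo pid,stat,comm | blocked_tasks | while read -r pid comm; do
        echo "== $pid $comm"
        cat "/proc/$pid/stack" 2>/dev/null || echo "(stack not readable)"
    done
}
```

Running `show_blocked` as root while the extraction hangs would show where in the kernel tar is waiting.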
[ 0.000000] Linux version 6.6.74-gentoo (root@share2) (gcc (Gentoo Hardened 14.2.1_p20241221 p7) 14.2.1 20241221, GNU ld (Gentoo 2.43 p3) 2.43.1) #3 SMP PREEMPT_DYNAMIC Sun Feb 16 10:04:59 CET 2025
[ 1.901510] Memory: 15729892K/16121688K available (16384K kernel code, 847K rwdata, 3828K rodata, 11804K init, 2124K bss, 391536K reserved, 0K cma-reserved)
[ 2.100062] smpboot: CPU0: AMD Ryzen 3 PRO 4350G with Radeon Graphics (family: 0x17, model: 0x60, stepping: 0x1)
[ 2.107025] smp: Brought up 1 node, 8 CPUs
[ 2.885892] smapi::smapi_init, ERROR invalid usSmapiID
[ 2.885959] mwave: tp3780i::tp3780I_InitializeBoardData: Error: SMAPI is not available on this machine
[ 2.886051] mwave: mwavedd::mwave_init: Error: Failed to initialize board data
[ 2.886127] mwave: mwavedd::mwave_init: Error: Failed to initialize
[ 2.886557] AMD-Vi: AMD IOMMUv2 functionality not available on this system - This is not a bug.
[ 2.893957] megasas: 07.725.01.00-rc1
[ 2.894039] mpt3sas version 43.100.00.00 loaded
[ 2.894293] mpt2sas_cm0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (15730200 kB)
[ 3.019930] mpt2sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
[ 3.020039] mpt2sas_cm0: MSI-X vectors supported: 1
[ 3.020116] no of cores: 8, max_msix_vectors: -1
[ 3.020193] mpt2sas_cm0: 0 1 1
[ 3.020353] mpt2sas_cm0: High IOPs queues : disabled
[ 3.020431] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 32
[ 3.020510] mpt2sas_cm0: iomem(0x00000000fcf40000), mapped(0x000000006fcf1c8f), size(65536)
[ 3.020611] mpt2sas_cm0: ioport(0x000000000000f000), size(256)
[ 3.075080] mpt2sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
[ 3.102826] mpt2sas_cm0: scatter gather: sge_in_main_msg(1), sge_per_chain(9), sge_per_io(128), chains_per_io(15)
[ 3.103033] mpt2sas_cm0: request pool(0x00000000e7d4f7f1) - dma(0x103300000): depth(3492), frame_size(128), pool_size(436 kB)
[ 3.108544] mpt2sas_cm0: sense pool(0x00000000f8b86087) - dma(0x103a80000): depth(3367), element_size(96), pool_size (315 kB)
[ 3.108721] mpt2sas_cm0: reply pool(0x0000000058ad46ef) - dma(0x103b00000): depth(3556), frame_size(128), pool_size(444 kB)
[ 3.108827] mpt2sas_cm0: config page(0x000000004fb93642) - dma(0x103a40000): size(512)
[ 3.108908] mpt2sas_cm0: Allocated physical memory: size(7579 kB)
[ 3.108979] mpt2sas_cm0: Current Controller Queue Depth(3364),Max Controller Queue Depth(3432)
[ 3.109068] mpt2sas_cm0: Scatter Gather Elements per IO(128)
[ 3.153821] mpt2sas_cm0: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03)
[ 3.153919] mpt2sas_cm0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
[ 3.154131] scsi host0: Fusion MPT SAS Host
[ 3.154765] mpt2sas_cm0: sending port enable !!
[ 10.730207] pktgen: Packet Generator for packet performance testing. Version: 2.75
[ 10.730525] GACT probability NOT on
[ 10.730598] Mirror/redirect action on
01:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 03)
Subsystem: Dell 6Gbps SAS HBA Adapter [1028:1f1c]
Kernel driver in use: mpt3sas

[ 10.733824] microcode: microcode updated early to new patch_level=0x0860010d
[ 10.733930] microcode: CPU2: patch_level=0x0860010d
[ 10.733931] microcode: CPU0: patch_level=0x0860010d
...
Yeah, but this should not be a problem here, since the system is headless :)
pietinger wrote: 1) I'm sure you're aware of this yourself:
=> https://wiki.gentoo.org/wiki/User:Pieti ... s_Firmware (because you have a monolithic kernel)
=> https://wiki.gentoo.org/wiki/AMDGPU
[ 10.742056] Warning: unable to open an initial console.
<=>
05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [Radeon Vega Series / Radeon Vega Mobile Series] [1002:1636] (rev da)
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [Radeon Vega Series / Radeon Vega Mobile Series] [1002:1636]
<=>
# CONFIG_DRM_AMDGPU is not set

[ 12.327137] EXT4-fs (sdh2): INFO: recovery required on readonly filesystem
[ 13.732730] md/raid:md0: not clean -- starting background reconstruction
Done - unfortunately without effect.
3) Please try to enable both of these:
(This is not relevant to your problem, but should also be enabled: # CONFIG_I2C_PIIX4 is not set)
# CONFIG_CHR_DEV_SG is not set
# CONFIG_SCSI_CONSTANTS is not set
The system is running on bare metal.
4) Do you run this kernel on bare metal or in a VM? If it is a VM, then enable:
# CONFIG_UDMABUF is not set
Done - unfortunately without effect.
5) This is a shot in the dark ... change it from 0 to 1:
CONFIG_SATA_MOBILE_LPM_POLICY=0

test 1 - decompressing tar.gz on SSD:
real 0m11.678s
user 0m8.851s
sys 0m8.750s
test 2 - decompressing tar.gz on RAID6
real 2m54.919s
user 0m11.804s
sys 0m12.612s
test 1 - decompressing tar.gz on SSD:
real 0m9.814s
user 0m8.406s
sys 0m6.630s
test 2 - decompressing tar.gz on RAID6
real 0m9.547s
user 0m8.637s
sys 0m6.358s
test 1 - decompressing tar.gz on SSD:
real 0m9.641s
user 0m8.519s
sys 0m6.517s
test 2 - decompressing tar.gz on RAID6
real 0m9.412s
user 0m8.563s
sys 0m6.311s

test 1 - decompressing tar.gz on SSD:
real 0m12.832s
user 0m8.466s
sys 0m5.534s
test 2 - decompressing tar.gz on RAID6
real 0m10.045s
user 0m8.492s
sys 0m5.479s
test 1 - decompressing tar.gz on SSD:
real 0m9.347s
user 0m8.394s
sys 0m5.407s
test 2 - decompressing tar.gz on RAID6
real 0m13.264s
user 0m8.365s
sys 0m5.512s
test 1 - decompressing tar.gz on SSD:
real 0m9.314s
user 0m8.519s
sys 0m5.315s
test 2 - decompressing tar.gz on RAID6
real 0m9.812s
user 0m8.445s
sys 0m5.316s

I would be very interested to see what happens with 6.12.14 (*). Only one test would be necessary, this one:
Jimini wrote: [...] I ran my tests again, just for comparison [...]
I cannot tell exactly where the difference between "write data to an empty dir" and "overwrite existing data" lies, but perhaps I am now one small step closer to the root cause...
(Of course with the same config as before, updated with “make oldconfig”; the answers to its questions are here: https://wiki.gentoo.org/wiki/User:Pieti ... perimental)
Jimini wrote: 6.6.74 & RAID6: takes forever (I interrupted the task after ~40 mins)
kernel:
6.12.14-gentoo
test 2 - decompressing tar.gz on RAID6
real 0m11.202s
user 0m8.643s
sys 0m7.729s
test 2 - decompressing tar.gz on RAID6
real 0m14.094s
user 0m8.692s
sys 0m7.817s
test 2 - decompressing tar.gz on RAID6
real 0m19.357s
user 0m8.637s
sys 0m7.783s
test 2 - decompressing tar.gz on RAID6
real 0m18.047s
user 0m8.358s
sys 0m8.072s
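Single runs scatter quite a bit (on 6.12.14 the four RAID6 runs above range from about 11 s to 19 s), so it can help to summarise several runs per test instead of eyeballing them. A minimal sketch, assuming the results are saved in the same layout as above ("test ..." heading, then `real`/`user`/`sys` lines from bash's `time`). `summarize_runs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: mean and maximum wall-clock ("real") time per test label.
# A line starting with "test " sets the current label; every "real XmY.YYYs"
# line after it is counted towards that label.
summarize_runs() {
    awk '
        /^test / { label = $0; next }
        $1 == "real" {
            t = $2
            sub(/s$/, "", t)          # "0m11.202s" -> "0m11.202"
            split(t, p, "m")          # p[1] = minutes, p[2] = seconds
            secs = p[1] * 60 + p[2]
            n[label]++; sum[label] += secs
            if (secs > longest[label]) longest[label] = secs
            if (!(label in seen)) { seen[label] = 1; order[++k] = label }
        }
        END {
            for (i = 1; i <= k; i++) {
                l = order[i]
                printf "%s: runs=%d mean=%.2fs max=%.2fs\n", l, n[l], sum[l] / n[l], longest[l]
            }
        }
    '
}
```

For example, `summarize_runs < results.txt` (file name assumed) prints one line per test label with the number of runs, the mean and the slowest run.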
[ 13.732730] md/raid:md0: not clean -- starting background reconstruction

$ cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md127 : active raid5 sdc3[0] sda3[2] sdd3[3] sdb3[1]
23441117184 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
bitmap: 2/59 pages [8KB], 65536KB chunk

sync
echo 3 > /proc/sys/vm/drop_caches

$ dmesg | grep -i raid6
[ 10.254857] raid6: skipped pq benchmark and selected avx2x4
[ 10.254857] raid6: using avx2x2 recovery algorithm

You are absolutely right, but no process except tar was writing data to the array. And when I had to hard-reset the system, this was only because the system was stuck while writing veeeery slowly. Thus, the inconsistency was very small, so the background reconstruction of the array had already finished by the time I logged in again.
You can't do any useful speed tests with that going on.
[ 13.732730] md/raid:md0: not clean -- starting background reconstruction
Yes, I have :)
Hopefully, you have a write intent bitmap so the reconstruction time is minimised?
Notice the 'bitmap' entry. It's not always useful, but every little helps. You have space for it in the mdadm header, so it costs nothing to add.
$ cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md127 : active raid5 sdc3[0] sda3[2] sdd3[3] sdb3[1]
23441117184 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
bitmap: 2/59 pages [8KB], 65536KB chunk
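To check from a script whether each array has a write-intent bitmap, and whether a resync or recovery is still running (which, as noted above, spoils any speed test), something like the following could be used. It is a sketch that only parses /proc/mdstat text. `md_report` is a made-up name, and the list of progress keywords (resync, recovery, check, reshape) is my assumption about what to look for. Adding an internal bitmap to an existing array is done with mdadm's grow mode, e.g. `mdadm --grow --bitmap=internal /dev/md0`.

```shell
#!/usr/bin/env bash
# Sketch: one line per md array from /proc/mdstat-formatted input on stdin,
# showing whether a bitmap is present and any running resync-type activity.
md_report() {
    awk '
        /^md[0-9a-z_]* : / {
            name = $1; order[++k] = name
            bm[name] = "no"; act[name] = "idle"
            next
        }
        name != "" && /bitmap:/ { bm[name] = "yes" }
        name != "" && match($0, /(resync|recovery|check|reshape) *=/) {
            a = substr($0, RSTART, RLENGTH); sub(/ *=$/, "", a); act[name] = a
        }
        /^$/ { name = "" }   # arrays are separated by blank lines
        END {
            for (i = 1; i <= k; i++)
                printf "%s bitmap=%s activity=%s\n", order[i], bm[order[i]], act[order[i]]
        }
    '
}
```

Usage: `md_report < /proc/mdstat`. For the md127 array shown here it should report `md127 bitmap=yes activity=idle` while no resync is running.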
share2 ~ # cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid6 sdf[3] sdb[0] sdd[6] sde[1] sdg[5] sda[2] sdc[4]
78128737280 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
bitmap: 0/117 pages [0KB], 65536KB chunk

Consistency Policy : bitmap

This is a good hint, I added this to my test script.
To flush your caches, so that you compare like with like from run to run, do the following at the start of each run (this will slow the second and subsequent tests, due to the 'cold' caches):
sync
echo 3 > /proc/sys/vm/drop_caches
Please keep in mind that this array has been in use since 2023. Everything worked without problems until the middle of 2024 or so. It took some time until I noticed that changing the kernel version was a workaround.
If your raid set is on rotating rust, the head/platter data rate will change by a factor of 4 or 5 from the outer to the inner tracks. The drives are zoned: you get more sectors per track at the edge than near the spindle.
You should be able to hit that anywhere on the disk, unless it's doing a resync.
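To see how large the outer-to-inner difference actually is on these drives, you could time sequential reads at the start, middle and end of one member disk. The sketch below only prints GNU dd commands so they can be reviewed before running them as root. They are read-only: the disk is the input, /dev/null the output, and `iflag=direct` bypasses the page cache. The helper name, the 1 GiB sample size and the device path are assumptions. The disk size in MiB can be obtained from `blockdev --getsize64 /dev/sdX` divided by 1048576. Run the commands while the array is idle, since other I/O to the same disk distorts the numbers.

```shell
#!/usr/bin/env bash
# Sketch: print dd read commands for the start, middle and end of a disk.
# Arguments: device, disk size in MiB, sample size in MiB (default 1024).
zone_read_cmds() {
    local dev=$1 size_mib=$2 count_mib=${3:-1024}
    if [ "$size_mib" -le "$count_mib" ]; then
        echo "disk smaller than one sample" >&2
        return 1
    fi
    local mid=$(( (size_mib - count_mib) / 2 ))
    local last=$(( size_mib - count_mib ))
    local skip
    for skip in 0 "$mid" "$last"; do
        # skip= counts bs-sized input blocks, i.e. MiB here
        echo "dd if=$dev of=/dev/null bs=1M count=$count_mib skip=$skip iflag=direct"
    done
}
```

For example: `zone_read_cmds /dev/sdb "$(( $(blockdev --getsize64 /dev/sdb) / 1048576 ))"`. dd reports the throughput of each read when it finishes.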
Of course not! :)
Don't even think about using SMR drives in a raid set.
To me it looks as if nothing has changed between 6.1.127, 6.6.74 and 6.12.14:
Has the raid6 parity method changed between kernels?
I only use raid5, but the kernel is supposed to choose the fastest pq calculator for your CPU.
$ dmesg | grep -i raid6
[ 10.254857] raid6: skipped pq benchmark and selected avx2x4
[ 10.254857] raid6: using avx2x2 recovery algorithm
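To compare the selected RAID6 algorithms across kernels at a glance, the relevant dmesg lines can be reduced to one line per boot. This sketch assumes the two message formats seen in this thread: the benchmarked "using algorithm ..." variant and the "skipped pq benchmark and selected ..." variant, which, as far as I know, is printed when the kernel is built with CONFIG_RAID6_PQ_BENCHMARK disabled. `raid6_algos` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: extract the chosen RAID6 gen() and recovery algorithms from dmesg
# text on stdin and print them as "gen=... recovery=...".
raid6_algos() {
    awk '
        /raid6: using algorithm / {
            for (i = 1; i < NF; i++) if ($i == "algorithm") { gen = $(i + 1); break }
        }
        /raid6: skipped pq benchmark and selected / { gen = $NF }
        / recovery algorithm/ {
            for (i = 2; i <= NF; i++) if ($i == "recovery") { rec = $(i - 1); break }
        }
        END { printf "gen=%s recovery=%s\n", gen, rec }
    '
}
```

Usage: `dmesg | raid6_algos`. For each of the three kernels below it should print `gen=avx2x2 recovery=avx2x2`.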
[ 2.275772] raid6: avx2x4 gen() 36806 MB/s
[ 2.445772] raid6: avx2x2 gen() 38635 MB/s
[ 2.615772] raid6: avx2x1 gen() 30153 MB/s
[ 2.615840] raid6: using algorithm avx2x2 gen() 38635 MB/s
[ 2.785772] raid6: .... xor() 22290 MB/s, rmw enabled
[ 2.785854] raid6: using avx2x2 recovery algorithm

share2 ~ # dmesg | grep -i raid6
[ 2.283004] raid6: avx2x4 gen() 37004 MB/s
[ 2.453005] raid6: avx2x2 gen() 38942 MB/s
[ 2.623005] raid6: avx2x1 gen() 30282 MB/s
[ 2.623073] raid6: using algorithm avx2x2 gen() 38942 MB/s
[ 2.793005] raid6: .... xor() 22487 MB/s, rmw enabled
[ 2.793086] raid6: using avx2x2 recovery algorithm

[ 2.284097] raid6: avx2x4 gen() 37074 MB/s
[ 2.454096] raid6: avx2x2 gen() 38892 MB/s
[ 2.624097] raid6: avx2x1 gen() 30169 MB/s
[ 2.624165] raid6: using algorithm avx2x2 gen() 38892 MB/s
[ 2.794097] raid6: .... xor() 22496 MB/s, rmw enabled
[ 2.794178] raid6: using avx2x2 recovery algorithm

<=>
Jimini wrote: [...] This script is now doing its job, like with kernel 6.1.
Seems like it is working again. [...]
To be on the safe side, I would like to ask once again: Do I understand correctly that with 6.12 the test “Overwrite many small files” works correctly AGAIN and does NOT lead to a hang, as with 6.6?
dmpogo wrote: [...] But if 6.12 is also misbehaving, [...]

# CONFIG_IA32_EMULATION is not set
# CONFIG_SUSPEND is not set
# CONFIG_HIBERNATION is not set
# CONFIG_MODULES is not set
# CONFIG_DRM_AMDGPU is not set
# CONFIG_SOUND is not set

CONFIG_SLAB_MERGE_DEFAULT=y
# CONFIG_SLAB_FREELIST_RANDOM is not set
# CONFIG_SLAB_FREELIST_HARDENED is not set
# CONFIG_SLAB_BUCKETS is not set
# CONFIG_SLUB_STATS is not set
CONFIG_SLUB_CPU_PARTIAL=y
# CONFIG_RANDOM_KMALLOC_CACHES is not set

# CONFIG_EXPERT is not set

CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_HYGON=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_CPU_SUP_ZHAOXIN=y

# CONFIG_I2C_HID_ACPI is not set

CONFIG_HID_A4TECH=y
CONFIG_HID_BELKIN=y
CONFIG_HID_CHERRY=y
CONFIG_HID_CHICONY=y
... and many more

Well, if it is fixed in 6.12, then it is not as interesting, but perhaps just reading the kernel ChangeLogs could shed some light. Unless the fix was accidental.
pietinger wrote: [...]
If so, then I would say that there really was a kernel regression between 6.1 and 6.6 ... which was then fixed between 6.6 and 6.12. But finding that out by bisecting might be a herculean task.
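If someone does want to try it, git can bisect for the commit that fixed a problem, not only for the one that broke it, by using custom terms. This is a minimal sketch. It assumes a clone of the mainline tree, and it assumes that mainline v6.6 still shows the hang while v6.12 does not; the thread only tested the stable releases 6.6.74 and 6.12.14, so that is unverified. The function name and default tags are illustrative. Every step means building, booting and running the tar test, roughly log2(number of commits) times, which is where the herculean part comes in.

```shell
#!/usr/bin/env bash
# Sketch: bisect for the commit that FIXED the hang between two mainline tags.
# The default revisions (v6.6 broken, v6.12 fixed) are assumptions to verify first.
start_fix_bisect() {
    local repo=$1 broken_rev=${2:-v6.6} fixed_rev=${3:-v6.12}
    # With custom terms, the first revision is the "new" (fixed) one and the
    # second is the "old" (broken) one.
    git -C "$repo" bisect start --term-old=broken --term-new=fixed "$fixed_rev" "$broken_rev"
}

# After building and booting each kernel git checks out, run the tar test and mark it:
#   git bisect fixed     # the test completed normally
#   git bisect broken    # the test hung again
# When done:  git bisect reset
```

For the regression itself (6.1 good, 6.6 bad), the default terms are enough: `git bisect start v6.6 v6.1`, followed by `git bisect good` or `git bisect bad` after each test.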

Yes. I ran a few tests with bonnie++; the results for 6.1.127, 6.6.74 and 6.12.14 can be seen here: https://syncookie.de/bonnie/
pietinger wrote: To be on the safe side, I would like to ask once again: Do I understand correctly that with 6.12 the test “Overwrite many small files” works correctly AGAIN and does NOT lead to a hang, as with 6.6?
sync
echo 3 > /proc/sys/vm/drop_caches
time tar -xzpf linux.tar.gz
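Building on that script and the cache-flush advice above, here is a sketch that extracts the same tarball twice with cold caches. The first pass goes into an empty directory, the second over the files just written, to separate the two cases Jimini described. It assumes GNU tar and GNU date, and it needs root for the cache drop (set DROP_CACHES=0 to skip it, e.g. for a dry run as a normal user). The function names are hypothetical. Unlike the plain `time tar`, the timing here includes a final `sync`, so data still sitting in the page cache is counted as well.

```shell
#!/usr/bin/env bash
# Sketch: cold-cache extraction benchmark, empty directory vs. overwrite.
drop_caches() {
    sync
    if [ "${DROP_CACHES:-1}" = 1 ]; then
        # Needs root: drop page cache, dentries and inodes (value 3).
        echo 3 > /proc/sys/vm/drop_caches || return 1
    fi
}

timed_extract() {
    local tarball=$1 dest=$2 label=$3 start end
    drop_caches || return 1
    start=$(date +%s.%N)
    tar -xzpf "$tarball" -C "$dest" || return 1
    sync   # include write-back of the extracted data in the measurement
    end=$(date +%s.%N)
    awk -v l="$label" -v s="$start" -v e="$end" 'BEGIN { printf "%s: %.2fs\n", l, e - s }'
}

compare_empty_vs_overwrite() {
    local tarball=$1 dest=$2
    mkdir -p "$dest" || return 1
    if [ -n "$(ls -A "$dest")" ]; then
        echo "destination must start empty: $dest" >&2
        return 1
    fi
    timed_extract "$tarball" "$dest" "empty dir" || return 1
    timed_extract "$tarball" "$dest" "overwrite" || return 1
}
```

Example (paths assumed): `compare_empty_vs_overwrite linux.tar.gz /mnt/raid/bench`. According to the GNU tar manual, tar by default removes an existing file before extracting over it. The second pass is therefore really unlink-and-recreate rather than an in-place overwrite; `--overwrite` would change that.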