NVMe and emerge compile

Black
Apprentice
Posts: 158
Joined: Tue Dec 10, 2002 11:33 pm
Location: Québec, Canada

NVMe and emerge compile


Post by Black » Tue May 09, 2023 5:37 pm

2 years ago, I got myself a new computer, this time with an NVMe drive mounted as the root drive (/). I also have regular HDDs set up in RAID1 for my /home. I did not create a separate partition for /var, so it's on the same NVMe drive as /.

That PC is on 24/7. At some point I rebooted, and the BIOS complained that the drive was failing its SMART test. I have since set up /var/tmp as a tmpfs, taking 12GB out of the system's 32GB of RAM. I also went looking on the net to see whether letting emerge compile on an NVMe drive is bad. I came across a Reddit thread where people say it's not an issue, with one example saying he's got "38TB written in 7000 hours" on a drive with a warrantied TBW of 1200TB.
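For anyone setting up the same thing: a /var/tmp-on-tmpfs mount is a single fstab line. The entry below is only an illustration matching the 12GB mentioned above - adjust the size to your system; mode=1777 preserves the sticky, world-writable permissions /var/tmp needs.

```shell
# Hypothetical /etc/fstab entry: 12G tmpfs on /var/tmp
tmpfs   /var/tmp   tmpfs   size=12G,mode=1777,noatime   0 0
```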

In my case, the drive is a Kingston SA2000M8250G - from what I can find online, its TBW limit is 150TB. As far as I can tell, I'm waaaaay past that, at 1.03PB in almost 2 years (see below). Kingston's warranty is also apparently void, since the "percentage used" is now 100%. So yeah, that drive is a failure waiting to happen (it still runs despite the SMART test failure - I'm using this computer to post this message).

So my question is: does letting portage compile on an NVMe drive kill such drives?

Code:

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- NVM subsystem reliability has been degraded

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x04
Temperature:                        31 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    100%
Data Units Read:                    301,659 [154 GB]
Data Units Written:                 2,014,358,159 [1.03 PB]
Host Read Commands:                 11,206,812
Host Write Commands:                7,973,648,047
Controller Busy Time:               90,160
Power Cycles:                       46
Power On Hours:                     16,711
Unsafe Shutdowns:                   16
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
eccerr0r
Watchman
Posts: 10239
Joined: Thu Jul 01, 2004 6:51 pm
Location: almost Mile High in the USA


Post by eccerr0r » Tue May 09, 2023 5:54 pm

Today's TLC and QLC drives just don't have the endurance anymore, but for most uses they are fine. 1PB written, however, is a LOT - what are you doing to the disk? Using it as a BitTorrent dump?

I have machines on 24/7 too, but they're mostly idle. One of them is a PVR and it has accumulated ~65TB written since the last mkfs (it's a mechanical HDD, however), over more than 10 years. I don't constantly update it, but it definitely gets an emerge @world once in a while - the vast majority of the writes are from recording OTA TV programming.

Granted, my Gentoo boxes typically use tmpfs when I can, but being RAM-limited I cannot always do so. I do have a 180G SATA SSD with 23TB written according to SMART, against a minimum 540TBW estimated endurance.
Last edited by eccerr0r on Tue May 09, 2023 5:55 pm, edited 1 time in total.
Intel Core i7 2700K/Radeon Firepro W2100/24GB DDR3/800GB SSD
What am I supposed watching?
NeddySeagoon
Administrator
Posts: 56088
Joined: Sat Jul 05, 2003 9:37 am
Location: 56N 3W


Post by NeddySeagoon » Tue May 09, 2023 5:54 pm

Black

Code:

Available Spare:                    100% 
Data Units Read:                    301,659 [154 GB]
Data Units Written:                 2,014,358,159 [1.03 PB] 
Power On Hours:                     16,711 
I'm not sure I believe those numbers. 1.03 PB in 16,711 hours is 60GB an hour - that's about 16MB/s sustained, and Portage is not doing that.
The drive also has not used any of its spare capacity, which would be way down at end of life.

The data set is not self-consistent.
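For anyone wanting to check the arithmetic: NVMe SMART "Data Units" are 512,000 bytes each (1,000 × 512-byte blocks - the same conversion smartctl uses for the bracketed [1.03 PB] figure above), so a quick awk sanity check of the implied rate looks like this:

```shell
# Rate implied by the SMART output above:
# 2,014,358,159 data units * 512,000 bytes/unit over 16,711 power-on hours
awk 'BEGIN {
  bytes = 2014358159 * 512000
  hours = 16711
  printf "%.1f GB/hour, %.1f MB/s\n", bytes / hours / 1e9, bytes / (hours * 3600) / 1e6
}'
# prints: 61.7 GB/hour, 17.1 MB/s
```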
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

Post by eccerr0r » Tue May 09, 2023 6:00 pm

It's definitely not Gentoo doing that, but I can't say it's unbelievable - we don't know what else is using the disk. One thing that is suspicious is that the read/write ratio is oddly skewed towards writes - meaning data is written and never read back...

I found that I (accidentally) took a big chunk out of some of my SSDs by thrash-swapping to them, and with an NVMe interface this can add up fast.

I do have to say that there are firmware bugs out there that lie about usage. One of my SSDs, going by its power-on hours, says it was made when Edison built his first light bulb...

Post by Black » Tue May 09, 2023 6:19 pm

eccerr0r wrote: Today's TLC and QLC drives just don't have the endurance anymore, but for most uses they are fine. 1PB written, however, is a LOT - what are you doing to the disk? Using it as a BitTorrent dump?

I have machines on 24/7 too, but they're mostly idle. One of them is a PVR and it has accumulated ~65TB written since the last mkfs (it's a mechanical HDD, however), over more than 10 years. I don't constantly update it, but it definitely gets an emerge @world once in a while - the vast majority of the writes are from recording OTA TV programming.

Granted, my Gentoo boxes typically use tmpfs when I can, but being RAM-limited I cannot always do so. I do have a 180G SATA SSD with 23TB written according to SMART, against a minimum 540TBW estimated endurance.
No, that PC is mostly idle. It's my desktop - running 24/7, but the only server on it is Samba for my local network, and the files it serves are on /home, so not on the NVMe. Portage is definitely the most disk-intensive activity on that PC - when I run it, which is at most once a day, and not every day.

The swap partition is also there, but with 32GB of RAM it's not getting much use. In hindsight I should have put it on the HDDs, but I don't think it's a factor.

Running iotop shows mostly Google Chrome as the main I/O process, but, again, /home isn't on the NVMe. And iotop's "Current DISK WRITE" is at or close to 0, with bursts in the 300K/s range.

htop:

Code:

    0[|                         0.7%]   3[|                         0.7%]   6[||                        2.0%]   9[||                        3.3%]
    1[||                        1.3%]   4[                          0.0%]   7[||                        1.3%]  10[                          0.0%]
    2[||                        1.3%]   5[||                        2.0%]   8[                          0.0%]  11[|                         0.7%]
  Mem[|||||||||||||||||||||||||||||||||||||||||              1.83G/31.2G] Tasks: 101, 452 thr, 142 kthr; 1 running
  Swp[||                                                     7.90M/32.0G] Load average: 1.66 1.95 2.04 
                                                                          Uptime: 72 days, 20:31:53
@NeddySeagoon you're right, 60GB/hour is rather high for a PC that's mostly idle.

@eccerr0r I think you might be on to something with firmware bugs...

Post by eccerr0r » Thu May 11, 2023 6:25 am

Can these newer NVMe SSDs sustain 1GB/s written?
Writing through 2PB at that rate would take less than a month...

Post by Black » Mon May 15, 2023 3:07 am

eccerr0r wrote: One thing that is suspicious is that the read/write ratio is oddly skewed towards writes - meaning data is written and never read back...
/var/log ?
Hu
Administrator
Posts: 24395
Joined: Tue Mar 06, 2007 5:38 am


Post by Hu » Mon May 15, 2023 3:43 am

Yes, logs are written and often never read, but typical logs are nowhere near large enough for that to be noticeable at this scale.

Post by eccerr0r » Mon May 15, 2023 6:13 am

The only things that could do this are:

- backups (unless you verify)... I had one hard drive that I only wrote backups to (it also kept getting dropped from the array due to electrical problems, so it kept getting resilvered, and that's all writes)
- killer endurance testing
- sabotage...

The mystery continues...

Post by Black » Thu Jun 08, 2023 12:36 pm

I have a new NVMe "drive" on my desk and will switch in the near future, but in the meantime, here's some interesting data. I don't think it shows anything (other than that the numbers don't match). I ran smartctl twice, at a 24-hour interval. iotop was running (in accumulation mode) for that same period. md127 is a RAID1 array of spinning rust - so not the NVMe. Chrome, running under user "black", should be writing to the home folder, which is not on the NVMe (it's on md127, the spinning rust). Syncthing's folders are also on md127.

/var/tmp/portage has been on tmpfs for a month now and doesn't appear to make a difference. I have included the relevant fstab line, in case I made a newbie mistake there.

Code:

Every 2.0s: smartctl -A /dev/nvme0                           blackphoenix: Wed Jun  7 08:08:57 2023

smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.12-gentoo] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF SMART DATA SECTION ===
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x04
Temperature:                        30 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    100%
Data Units Read:                    311,084 [159 GB]
Data Units Written:                 2,127,156,742 [1.08 PB]
Host Read Commands:                 11,576,860
Host Write Commands:                8,419,516,333
Controller Busy Time:               95,656
Power Cycles:                       47
Power On Hours:                     17,402
Unsafe Shutdowns:                   17
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0


Every 2.0s: smartctl -A /dev/nvme0                           blackphoenix: Thu Jun  8 07:52:55 2023

smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.12-gentoo] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF SMART DATA SECTION ===
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x04
Temperature:                        30 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    100%
Data Units Read:                    311,797 [159 GB]
Data Units Written:                 2,131,061,845 [1.09 PB]
Host Read Commands:                 11,581,631
Host Write Commands:                8,434,917,922
Controller Busy Time:               95,837
Power Cycles:                       47
Power On Hours:                     17,426
Unsafe Shutdowns:                   17
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Code:

Total DISK READ:         0.00 B/s | Total DISK WRITE:         0.00 B/s
Current DISK READ:       0.00 B/s | Current DISK WRITE:       0.00 B/s
  PID  PRIO  USER     DISK READ DISK WRITE>  SWAPIN      IO    COMMAND                                              
 1161 be/3 root          0.00 B   1629.28 M  0.00 %  0.00 % [jbd2/md127-8]
  168 be/3 root          0.00 B    697.21 M  0.00 %  0.00 % [jbd2/nvme0n1p3-8]
22754 be/4 black       692.00 K    105.75 M  0.00 %  0.13 % chrome --profile-directory=Default --disable-async-dns
22799 be/4 black        52.00 K     56.47 M  0.00 %  0.07 % chrome --type=utility --uti~,13347491690870620486,262144
31748 ?dif syncthin      6.39 M     53.09 M  0.00 %  0.02 % syncthing -no-browser -home~ddress=http://127.0.0.1:8384
 3120 be/4 black       136.00 K     44.25 M  0.00 %  0.00 % liferea
 1567 be/4 root          0.00 B     28.40 M  0.00 %  0.20 % syslogd -F -m 0 -s -s
22811 be/4 black         4.00 K     23.26 M  0.00 %  0.05 % chrome --type=utility --uti~,13347491690870620486,262144
 3136 be/4 black       928.00 K      3.07 M  0.00 %  0.00 % WebKitNetworkProcess 7 18
22939 be/4 black         0.00 B      2.49 M  0.00 %  0.00 % chrome --type=renderer --cr~,13347491690870620486,262144
32232 be/4 black         0.00 B   1916.00 K  0.00 %  0.00 % chrome --type=renderer --cr~,13347491690870620486,262144
22882 be/4 black         0.00 B   1244.00 K  0.00 %  0.00 % chrome --type=renderer --cr~,13347491690870620486,262144
 1465 be/4 black         0.00 B    972.00 K  0.00 %  0.00 % chrome --type=renderer --cr~,13347491690870620486,262144
19559 be/4 root          0.00 B    624.00 K  0.00 %  0.00 % nmbd -D
23088 be/4 black         0.00 B    604.00 K  0.00 %  0.00 % chrome --type=renderer --cr~,13347491690870620486,262144
25183 be/4 black         0.00 B    276.00 K  0.00 %  0.00 % chrome --type=renderer --cr~,13347491690870620486,26214
Relevant line of /etc/fstab:

Code:

PARTUUID=8ca208e8-2e44-454a-b4ec-51e76d3acdab		/		ext4		noatime		0 1
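Using the same 512,000-bytes-per-data-unit convention smartctl uses for its bracketed totals, the day-over-day delta between the two snapshots above can be computed directly (just the two Data Units Written values as printed):

```shell
# Difference between the two smartctl runs, 24 hours apart
awk 'BEGIN {
  day1 = 2127156742   # Data Units Written, Wed Jun  7
  day2 = 2131061845   # Data Units Written, Thu Jun  8
  printf "%.2f TB written in 24 hours\n", (day2 - day1) * 512000 / 1e12
}'
# prints: 2.00 TB written in 24 hours
```

That pace is in the same tens-of-TB-per-month ballpark the thread has been puzzling over, while the iotop accumulation for the same day shows well under 3 GB - which is the discrepancy being discussed.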

Post by eccerr0r » Thu Jun 08, 2023 2:02 pm

You're still "writing" 100TB/month somehow!

Is anything funky showing up in your dmesg?

What happens if you mount the disk from a livecd (R/W) and wait out a similar period? Or at least an hour?

Post by Black » Thu Jun 08, 2023 3:48 pm

Filtering out UFW (Uncomplicated Firewall), which I started using 2 or 3 months ago (long after this excessive writing started), I get the output below.

I just found this page for Arch Linux stating there is an issue with this exact drive and this exact firmware revision. I don't have exactly the same symptoms - the drive does not become unresponsive after a while, and at one point I ran it for around 300 days without rebooting. I'll have to try updating the firmware, or at least passing the kernel parameter that caps the maximum power-state latency, to see if it changes anything.

Thanks for the livecd suggestion, I'll give that one a try as well.
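For reference, the commonly documented workaround for this A2000 power-management issue (the value below is the one usually quoted for this drive; I haven't verified it is right for every firmware revision) is to cap the allowed power-state latency on the kernel command line so the drive never enters its deepest sleep state:

```shell
# Append to the kernel command line (e.g. GRUB_CMDLINE_LINUX in /etc/default/grub),
# then rebuild the bootloader config and reboot:
nvme_core.default_ps_max_latency_us=5500
```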

Code:

[1494969.021227] nvme nvme0: I/O 704 (I/O Cmd) QID 4 timeout, aborting
[1494969.021256] nvme nvme0: I/O 512 (I/O Cmd) QID 6 timeout, aborting
[1494969.021274] nvme nvme0: I/O 513 (I/O Cmd) QID 6 timeout, aborting
[1494969.021283] nvme nvme0: I/O 514 (I/O Cmd) QID 6 timeout, aborting
[1494969.021290] nvme nvme0: I/O 515 (I/O Cmd) QID 6 timeout, aborting
[1494999.229229] nvme nvme0: I/O 28 QID 0 timeout, reset controller
[1494999.229272] nvme nvme0: I/O 704 QID 4 timeout, reset controller
[1495061.697461] nvme nvme0: Abort status: 0x371
[1495061.697464] nvme nvme0: Abort status: 0x371
[1495061.697465] nvme nvme0: Abort status: 0x371
[1495061.697466] nvme nvme0: Abort status: 0x371
[1495061.697466] nvme nvme0: Abort status: 0x371
[1495061.717464] nvme nvme0: 12/0/0 default/read/poll queues
[1495526.205328] nvme nvme0: I/O 0 (I/O Cmd) QID 11 timeout, aborting
[1495526.205358] nvme nvme0: I/O 1 (I/O Cmd) QID 11 timeout, aborting
[1495526.205377] nvme nvme0: I/O 2 (I/O Cmd) QID 11 timeout, aborting
[1495526.205385] nvme nvme0: I/O 3 (I/O Cmd) QID 11 timeout, aborting
[1495526.205392] nvme nvme0: I/O 4 (I/O Cmd) QID 11 timeout, aborting
[1495556.221333] nvme nvme0: I/O 0 QID 11 timeout, reset controller
[1495556.801271] nvme nvme0: I/O 29 QID 0 timeout, reset controller
[1495618.755318] nvme nvme0: Abort status: 0x371
[1495618.755328] nvme nvme0: Abort status: 0x371
[1495618.755333] nvme nvme0: Abort status: 0x371
[1495618.755337] nvme nvme0: Abort status: 0x371
[1495618.755341] nvme nvme0: Abort status: 0x371
[1495618.777427] nvme nvme0: 12/0/0 default/read/poll queues
[1495799.485337] nvme nvme0: I/O 64 (I/O Cmd) QID 3 timeout, aborting
[1495799.485366] nvme nvme0: I/O 65 (I/O Cmd) QID 3 timeout, aborting
[1495799.485385] nvme nvme0: I/O 66 (I/O Cmd) QID 3 timeout, aborting
[1495799.485393] nvme nvme0: I/O 67 (I/O Cmd) QID 3 timeout, aborting
[1495799.485401] nvme nvme0: I/O 68 (I/O Cmd) QID 3 timeout, aborting
[1495829.693302] nvme nvme0: I/O 64 QID 3 timeout, reset controller
[1495831.743273] nvme nvme0: I/O 28 QID 0 timeout, reset controller
[1495893.185661] nvme nvme0: Abort status: 0x371
[1495893.185665] nvme nvme0: Abort status: 0x371
[1495893.185667] nvme nvme0: Abort status: 0x371
[1495893.185668] nvme nvme0: Abort status: 0x371
[1495893.185672] nvme nvme0: Abort status: 0x371
[1495893.205196] nvme nvme0: 12/0/0 default/read/poll queues
[1496162.493336] nvme nvme0: I/O 768 (I/O Cmd) QID 8 timeout, aborting
[1496162.493365] nvme nvme0: I/O 769 (I/O Cmd) QID 8 timeout, aborting
[1496162.493385] nvme nvme0: I/O 770 (I/O Cmd) QID 8 timeout, aborting
[1496162.493397] nvme nvme0: I/O 771 (I/O Cmd) QID 8 timeout, aborting
[1496162.493418] nvme nvme0: I/O 772 (I/O Cmd) QID 8 timeout, aborting
[1496192.701334] nvme nvme0: I/O 768 QID 8 timeout, reset controller
[1496193.213322] nvme nvme0: I/O 28 QID 0 timeout, reset controller
[1496253.633570] nvme nvme0: Abort status: 0x371
[1496253.633573] nvme nvme0: Abort status: 0x371
[1496253.633574] nvme nvme0: Abort status: 0x371
[1496253.633575] nvme nvme0: Abort status: 0x371
[1496253.633575] nvme nvme0: Abort status: 0x371
[1496253.654974] nvme nvme0: 12/0/0 default/read/poll queues
[1496343.229376] nvme nvme0: I/O 128 (I/O Cmd) QID 3 timeout, aborting
[1496343.229404] nvme nvme0: I/O 129 (I/O Cmd) QID 3 timeout, aborting
[1496343.229424] nvme nvme0: I/O 130 (I/O Cmd) QID 3 timeout, aborting
[1496343.229433] nvme nvme0: I/O 131 (I/O Cmd) QID 3 timeout, aborting
[1496343.229440] nvme nvme0: I/O 132 (I/O Cmd) QID 3 timeout, aborting
[1496373.437379] nvme nvme0: I/O 128 QID 3 timeout, reset controller
[1496374.973334] nvme nvme0: I/O 28 QID 0 timeout, reset controller
[1496433.863355] nvme nvme0: Abort status: 0x371
[1496433.863371] nvme nvme0: Abort status: 0x371
[1496433.863378] nvme nvme0: Abort status: 0x371
[1496433.863383] nvme nvme0: Abort status: 0x371
[1496433.863389] nvme nvme0: Abort status: 0x371
[1496433.885341] nvme nvme0: 12/0/0 default/read/poll queues
[1497018.561425] nvme nvme0: I/O 896 (I/O Cmd) QID 1 timeout, aborting
[1497018.561455] nvme nvme0: I/O 897 (I/O Cmd) QID 1 timeout, aborting
[1497018.561475] nvme nvme0: I/O 898 (I/O Cmd) QID 1 timeout, aborting
[1497018.561484] nvme nvme0: I/O 899 (I/O Cmd) QID 1 timeout, aborting
[1497018.561491] nvme nvme0: I/O 900 (I/O Cmd) QID 1 timeout, aborting
[1497048.765429] nvme nvme0: I/O 13 QID 0 timeout, reset controller
[1497048.765474] nvme nvme0: I/O 896 QID 1 timeout, reset controller
[1497109.698422] nvme nvme0: Abort status: 0x371
[1497109.698434] nvme nvme0: Abort status: 0x371
[1497109.698438] nvme nvme0: Abort status: 0x371
[1497109.698442] nvme nvme0: Abort status: 0x371
[1497109.698445] nvme nvme0: Abort status: 0x371
[1497109.717281] nvme nvme0: 12/0/0 default/read/poll queues
[1498502.845492] nvme nvme0: I/O 704 (I/O Cmd) QID 2 timeout, aborting
[1498502.845523] nvme nvme0: I/O 705 (I/O Cmd) QID 2 timeout, aborting
[1498502.845542] nvme nvme0: I/O 706 (I/O Cmd) QID 2 timeout, aborting
[1498502.845551] nvme nvme0: I/O 707 (I/O Cmd) QID 2 timeout, aborting
[1498502.845558] nvme nvme0: I/O 708 (I/O Cmd) QID 2 timeout, aborting
[1498533.053493] nvme nvme0: I/O 704 QID 2 timeout, reset controller
[1498534.077495] nvme nvme0: I/O 29 QID 0 timeout, reset controller
[1498596.546319] nvme nvme0: Abort status: 0x371
[1498596.546331] nvme nvme0: Abort status: 0x371
[1498596.546335] nvme nvme0: Abort status: 0x371
[1498596.546339] nvme nvme0: Abort status: 0x371
[1498596.546347] nvme nvme0: Abort status: 0x371
[1498596.572858] nvme nvme0: 12/0/0 default/read/poll queues
Anon-E-moose
Watchman
Posts: 6566
Joined: Fri May 23, 2008 7:31 pm
Location: Dallas area

Post by Anon-E-moose » Thu Jun 08, 2023 7:35 pm

That's an insane amount of disk writes for that short a time.

I suppose it could happen if you "emerge -e" several times a day, use part of the NVMe for swap and keep running out of memory, or have several things writing to /var/log/<something> constantly.

Edit to add: from the Kingston site, about the SMART data:
For the NVM command set, logical blocks written as part of Write operations shall be included
in this value. Write Uncorrectable commands shall not impact this value.
Not sure what constitutes a logical block (as opposed to a physical block), but that might explain the high write amount if logical blocks are much larger than physical ones.
UM780 xtx, 6.18 zen kernel, gcc 15, openrc, wayland
minixforum m1-s1 max -- same software as above but used for ai learning


Zealots are gonna be zealots, just like haters are gonna be haters
toralf
Developer
Posts: 3944
Joined: Sun Feb 01, 2004 2:58 pm
Location: Hamburg

Post by toralf » Thu Jun 08, 2023 8:38 pm

I have seen similar things on the tinderbox.

Going by the smartctl values, about 1.4 PB of data was written over the last 2 years to a BTRFS filesystem spanning 2 partitions on 2 NVMe drives.
That is about 24 MiB/s. The Grafana metric node_disk_written_bytes_total (which I have been using for 2 months) tells me the same.
What is interesting is that this value dropped to 9-10 MiB/s with kernel 6.3.x, and nothing else was changed on the server.

Emerge runs with /var/tmp/portage on a tmpfs.

FWIW, the nightly housekeeping process here - which deletes about 10-100 GB of old data on that file system - shows up in node_disk_written_bytes_total too. Even so, there's a big discrepancy between the space housekept and the writes reported; the factor is still 10x or 20x.

Post by eccerr0r » Thu Jun 08, 2023 8:44 pm

Based on the 24-hour sample, the iotop written bytes and the drive's written bytes actually seem to correspond (assuming 512-byte logical blocks), but that 24-hour sample of ~700MB/day, if sustained, would only be ~22GB/month - nowhere near the 100TB that was measured.

Is your ext4 filesystem formatted with 512-byte or 4096-byte blocks? 512-byte blocks, or perhaps partition alignment problems, could cause some extraneous writes.

Did you even expect 700MB of writes that day?

I've never looked at how much I write per day...
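Checking the filesystem block size is a one-liner. The device name below is taken from the jbd2/nvme0n1p3 entry in the iotop output earlier in the thread; the second command is a rootless cross-check for whatever is mounted at /:

```shell
# The canonical ext4 check (needs root):
#   tune2fs -l /dev/nvme0n1p3 | grep -i '^Block size'
# A rootless cross-check via statfs for the filesystem mounted at /:
stat -f -c 'block size: %S bytes' /
```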

Post by Black » Thu Jun 08, 2023 8:57 pm

eccerr0r wrote: Did you even expect 700MB of writes that day?
No, all I'm doing is browsing the net and watching a few YouTube videos. Syncthing is for my LAN; not much is happening there. The PC takes backups at night, but it sends them to another PC, so that's just reading, not writing. /var/log doesn't seem to move much (if at all) when I look at it - though UFW seems to write in bursts. I just turned it off and reran smartctl; I'll check again tomorrow to see if there's a change.

Post by Anon-E-moose » Thu Jun 08, 2023 9:43 pm

I suppose you could have buggy firmware.

If you're interested, you could look up the firmware revision and search for that model and firmware version to see if there are reported problems.

Post by eccerr0r » Thu Jun 08, 2023 9:47 pm

It might be interesting to take a log snapshot every day and look for anomalous behavior, but missing a whole day of 40MB/s writes is kind of hard to make up - so it's probably not demand writes; more likely firmware or consequential writes are going on.

Post by Anon-E-moose » Thu Jun 08, 2023 9:50 pm

Do you have discard turned off, and do you run fstrim periodically?

Post by Black » Fri Jun 09, 2023 12:57 am

Anon-E-moose wrote: Do you have discard turned off, and do you run fstrim periodically?
I did not knowingly turn discard on - the "discard" option is not in my fstab. I actually didn't even know about it until about a month ago, when I made the first post of this thread.

As for the fstrim command, I ran it once as a dry run and it said it didn't have anything to do:

Code:

blackphoenix / # fstrim -n -v /
/: 0 B (dry run) trimmed
Goverp
Advocate
Posts: 2402
Joined: Wed Mar 07, 2007 6:41 pm

Post by Goverp » Fri Jun 09, 2023 9:15 am

I've sometimes wondered whether an overzealous combination of logging (or writing anything) and syncing can cause this sort of problem - the zeal being to sync after every line rather than letting the kernel do its thing. Not syncing means the writes get buffered, with the danger of losing the last buffer(s) in a power outage; but the cost of syncing on an SSD or NVMe drive is a write (= a new block allocated and written, for some value of "block", for every single line...). I presume databases and other transactional mechanisms have some way around this for their journals; alternatively, just make sure the journal is on spinning rust.

FWIW, I have a 5-disk RAID10 array that I use for /home and /var/tmp. I run emerges in a chroot in /home/packager/chroot to create binary packages, then install the binpkgs into the root filesystem on NVMe, so all the compilation happens on spinning disks.
Greybeard

Post by Black » Fri Jun 09, 2023 2:51 pm

Just another data point: I discovered the inotifywait command, so I'm running it on /tmp.

In the last 2.5 hours, Google Chrome is the only process that has written anything in /tmp - 740 events creating, modifying, or deleting files there. Most of that time Chrome was just sitting idle, as I was working on another PC. That doesn't mean it's all Chrome's fault, but it sure doesn't help. I guess I should run inotifywait on the entire partition.

I also ran inotifywait on /var/log, and only 75 writes happened in the same period.
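For anyone reproducing this, the inotify-tools invocation would look something like the following sketch. Note that recursive watching of a whole partition can exhaust the kernel's inotify watch limit (fs.inotify.max_user_watches), so it may need raising first:

```shell
# Watch /tmp recursively and log every create/modify/delete with a timestamp
inotifywait -m -r -e create,modify,delete \
    --timefmt '%F %T' --format '%T %w%f %e' /tmp
```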

Post by Anon-E-moose » Fri Jun 09, 2023 3:12 pm

Do you have nvme-cli installed? It has lots of useful options for nvme investigation.

Post by Black » Fri Jun 09, 2023 3:41 pm

Anon-E-moose wrote:Do you have nvme-cli installed? It has lots of useful options for nvme investigation.
Yes I do, but I haven't used that before. Any hint as to which commands to look at?

Thank you (and everyone else)!
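The nvme-cli subcommands usually used for this kind of investigation are the standard log and identify queries (run as root; device name as in the posts above):

```shell
# Health/endurance counters - the same data smartctl shows, read from the drive
nvme smart-log /dev/nvme0
# Controller identify data; the "fr" field is the firmware revision
nvme id-ctrl /dev/nvme0
# The drive's own error log
nvme error-log /dev/nvme0
```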

Post by eccerr0r » Fri Jun 09, 2023 3:50 pm

Searching the web, I get a lot of hits on Kingston SSDs showing this behavior...

Currently I only have Intel, Samsung (mPCIe), Patriot (mPCIe), HP, and Micron/Crucial SSDs... they don't seem to exhibit this behavior, though I accidentally swap-stormed the Samsung and ate a chunk of its life...
© 2001–2026 Gentoo Foundation, Inc.

Powered by phpBB® Forum Software © phpBB Limited
