
Interpreting nvme-cli logs

grooveman
Veteran
Posts: 1217
Joined: Mon Feb 24, 2003 5:24 pm

Post by grooveman » Thu Dec 23, 2021 12:59 pm

Hi.

I had some problems with an NVMe drive. The system kept locking up. I couldn't back up the drive because it would get about 30 gigs in, then crash. So I got a new drive and restored my last good backup to it, and my system now works perfectly. A very happy ending to a story that could have been a disaster, and certainly testimony to having regular backups running...

but...

The old drive is still under warranty, and I'm trying to determine if it is any good anymore. I ran a shred on it... and it gave no complaints. That surprised me, so I wrote zeros to it -- and to my surprise, it executed this on the entire drive without a single complaint. At this point, I begin to wonder if there really is a problem with the drive... I hook it back up, and I use nvme-cli. I do a long test, and after a couple hours, I get my results:

Code:

Device Self Test Log for NVME device:nvme0
Current operation  : 0
Current Completion : 0%
Self Test Result[0]:
  Operation Result             : 0
  Self Test Code               : 2
  Valid Diagnostic Information : 0
  Power on hours (POH)         : 0x25d0
  Vendor Specific              : 0 0
Self Test Result[1]:
  Operation Result             : 0
  Self Test Code               : 1
  Valid Diagnostic Information : 0
  Power on hours (POH)         : 0x25cf
  Vendor Specific              : 0 0
Self Test Result[2]:
  Operation Result             : 0
  Self Test Code               : 2
  Valid Diagnostic Information : 0
  Power on hours (POH)         : 0x25c6
  Vendor Specific              : 0 0
Self Test Result[3]:
  Operation Result             : 0
  Self Test Code               : 1
  Valid Diagnostic Information : 0
  Power on hours (POH)         : 0x25c2
  Vendor Specific              : 0 0
Self Test Result[4]:
  Operation Result             : 0
  Self Test Code               : 1
  Valid Diagnostic Information : 0
  Power on hours (POH)         : 0x1161
  Vendor Specific              : 0 0
Self Test Result[5]:
  Operation Result             : 0xf
Self Test Result[6]:
  Operation Result             : 0xf
Self Test Result[7]:
  Operation Result             : 0xf
Self Test Result[8]:
  Operation Result             : 0xf
Self Test Result[9]:
  Operation Result             : 0xf
Self Test Result[10]:
  Operation Result             : 0xf
Self Test Result[11]:
  Operation Result             : 0xf
Self Test Result[12]:
  Operation Result             : 0xf
Self Test Result[13]:
  Operation Result             : 0xf
Self Test Result[14]:
  Operation Result             : 0xf
Self Test Result[15]:
  Operation Result             : 0xf
Self Test Result[16]:
  Operation Result             : 0xf
Self Test Result[17]:
  Operation Result             : 0xf
Self Test Result[18]:
  Operation Result             : 0xf
Self Test Result[19]:
  Operation Result             : 0xf

But what the heck do they mean? I cannot find this documented anywhere. I was expecting something less cryptic than this, or at least some thorough documentation on how to interpret the results. What does Self Test Code 1 or 2 mean? If the drive is showing as healthy, there is no point in sending it back to Western Digital (it is an SN750, by the way). They will just throw it back in my face, and it will waste both of our time. Meanwhile, I'll have an NVMe drive that I do not trust and that is of marginal use to me.

Anyone know of any documentation on this subject? Anyone know how to interpret this?

Thanks.

G
To look without without looking within is like looking without without looking at all.
Anon-E-moose
Watchman
Posts: 6566
Joined: Fri May 23, 2008 7:31 pm
Location: Dallas area

Post by Anon-E-moose » Thu Dec 23, 2021 1:35 pm

Not sure what the tests are, but there are some things you can check/investigate:

Is there a firmware update for the NVMe drive? (Check with WD support.)
Not sure which kernel version you're running, but there's always the possibility that it needs a newer driver (later kernel).
Could be something not right between the MB and the NVMe drive.
UM780 xtx, 6.18 zen kernel, gcc 15, openrc, wayland
minixforum m1-s1 max -- same software as above but used for ai learning


Zealots are gonna be zealots, just like haters are gonna be haters
mike155
Advocate
Posts: 4438
Joined: Fri Sep 17, 2010 11:33 pm
Location: Frankfurt, Germany

Post by mike155 » Thu Dec 23, 2021 4:28 pm

grooveman wrote: "I couldn't back up the drive because it would get about 30 gigs in, then crash."
How often did you run fstrim on your old drive?
Hu
Administrator
Posts: 24392
Joined: Tue Mar 06, 2007 5:38 am

Post by Hu » Thu Dec 23, 2021 4:50 pm

Did you ever get any kernel logs from when the old drive crashed, or was the system too broken to save those? If you got them, what did the kernel print?

My guess based on your reported symptoms is that the drive had a bad area that it handled very poorly, but when you rewrote the entire drive, you forced the drive to remap that area out of existence. The remaining sectors are usable, at least for now. Whether they will remain that way is unknown.
grooveman
Veteran
Posts: 1217
Joined: Mon Feb 24, 2003 5:24 pm

Post by grooveman » Thu Jan 27, 2022 3:38 pm

I didn't think you needed to run the trim function on contemporary drives.

The thing behaves normally, so I'm not sure why it got so grumpy.

Anyway, thanks for the input.
jonas21
n00b
Posts: 1
Joined: Mon Oct 24, 2022 6:36 am

Post by jonas21 » Mon Oct 24, 2022 6:44 am

I was looking for the meaning of these cryptic results, too. It seems this is not well documented in nvme-cli. The codes are actually taken from the NVMe spec; their meaning is as follows:

The "Operation Result" field:

0h Operation completed without error
1h Operation was aborted by a Device Self-test command
2h Operation was aborted by a Controller Level Reset
3h Operation was aborted due to a removal of a namespace from the namespace inventory
4h Operation was aborted due to the processing of a Format NVM command
5h A fatal error or unknown test error occurred while the controller was executing the device self-test operation and the operation did not complete
6h Operation completed with a segment that failed and the segment that failed is not known
7h Operation completed with one or more failed segments and the first segment that failed is indicated in the Segment Number field
8h Operation was aborted for unknown reason
9h Operation was aborted due to a sanitize operation
Ah to Eh Reserved
Fh Entry not used (does not contain a test result)
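
For scanning a long log, the table above can be turned into a small lookup. This is only a sketch in Python: the name describe_operation_result is made up for illustration and is not part of nvme-cli, and the descriptions are simply the ones quoted from the spec in this post.

```python
# Descriptions copied from the "Operation Result" table quoted in this post.
OPERATION_RESULTS = {
    0x0: "Operation completed without error",
    0x1: "Operation was aborted by a Device Self-test command",
    0x2: "Operation was aborted by a Controller Level Reset",
    0x3: "Operation was aborted due to a removal of a namespace "
         "from the namespace inventory",
    0x4: "Operation was aborted due to the processing of a Format NVM command",
    0x5: "A fatal error or unknown test error occurred while the controller "
         "was executing the device self-test operation and the operation "
         "did not complete",
    0x6: "Operation completed with a segment that failed and the segment "
         "that failed is not known",
    0x7: "Operation completed with one or more failed segments and the first "
         "segment that failed is indicated in the Segment Number field",
    0x8: "Operation was aborted for unknown reason",
    0x9: "Operation was aborted due to a sanitize operation",
    0xF: "Entry not used (does not contain a test result)",
}


def describe_operation_result(value):
    """Return the description for a 4-bit Operation Result value.

    Hypothetical helper for illustration; not an nvme-cli function.
    """
    if not 0 <= value <= 0xF:
        raise ValueError(f"Operation Result is a 4-bit field, got {value:#x}")
    # Ah to Eh have no entry in the table and are reserved.
    return OPERATION_RESULTS.get(value, "Reserved")


if __name__ == "__main__":
    # The two values that appear in the log from the first post.
    for raw in ("0", "0xf"):
        print(raw, "->", describe_operation_result(int(raw, 0)))
```

Applied to the log in the first post, entries 0 to 4 report 0 (completed without error) and entries 5 to 19 report 0xf, meaning those slots are simply unused.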

"Self Test Code" field:

0h Reserved
1h Short device self-test operation
2h Extended device self-test operation
3h to Dh Reserved
Eh Vendor specific
Fh Reserved
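
To apply this to a pasted log, here is a rough Python sketch that reads text in the layout shown in the first post and labels each entry. It assumes the output looks exactly like that paste (other nvme-cli versions may print it differently), and parse_self_test_log and describe_self_test_code are names invented for this example.

```python
import re

# Descriptions copied from the "Self Test Code" table quoted in this post.
SELF_TEST_CODES = {
    0x1: "Short device self-test operation",
    0x2: "Extended device self-test operation",
    0xE: "Vendor specific",
}

# Fields whose values are single numbers (decimal or 0x-prefixed hex).
NUMERIC_FIELDS = (
    "Operation Result",
    "Self Test Code",
    "Valid Diagnostic Information",
    "Power on hours (POH)",
)

# Two entries taken from the log in the first post.
SAMPLE_LOG = """\
Self Test Result[0]:
  Operation Result             : 0
  Self Test Code               : 2
  Valid Diagnostic Information : 0
  Power on hours (POH)         : 0x25d0
  Vendor Specific              : 0 0
Self Test Result[5]:
  Operation Result             : 0xf
"""


def describe_self_test_code(code):
    """Return the description for a 4-bit Self Test Code (hypothetical helper)."""
    if not 0 <= code <= 0xF:
        raise ValueError(f"Self Test Code is a 4-bit field, got {code:#x}")
    # 0h, 3h to Dh and Fh are reserved.
    return SELF_TEST_CODES.get(code, "Reserved")


def parse_self_test_log(text):
    """Parse the pasted text into a list of dicts, one per result entry."""
    entries = []
    current = None
    for line in text.splitlines():
        header = re.match(r"\s*Self Test Result\[(\d+)\]:\s*$", line)
        if header:
            current = {"index": int(header.group(1))}
            entries.append(current)
            continue
        if current is None or ":" not in line:
            continue  # lines before the first entry, or without a field
        key, _, value = line.partition(":")
        key = key.strip()
        if key in NUMERIC_FIELDS:
            # Base 0 accepts both "2" and "0x25d0".
            current[key] = int(value.strip(), 0)
    return entries


if __name__ == "__main__":
    for entry in parse_self_test_log(SAMPLE_LOG):
        if "Self Test Code" in entry:
            print(entry["index"],
                  describe_self_test_code(entry["Self Test Code"]),
                  "at", entry["Power on hours (POH)"], "power-on hours")
```

In that log, Self Test Code 2 is the extended (long) test and 1 is the short test, and the POH value 0x25d0 is 9680 hours in decimal.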

"Segment number" field:

Segment Number: This field indicates the segment number (refer to section 8.11) where the first self-test failure occurred. If Device Self-test Status field bits [3:0] are not set to 7h, then this field should be ignored.


"Valid Diagnostic information" field:

Bits 7:4 are reserved.
Bit 3 (SC Valid): If set to ‘1’, then the contents of Status Code field is valid. If cleared to ‘0’, then the contents of Status Code field is invalid.
Bit 2 (SCT Valid): If set to ‘1’, then the contents of Status Code Type field is valid. If cleared to ‘0’, then the contents of Status Code Type field is invalid.
Bit 1 (FLBA Valid): If set to ‘1’, then the contents of Failing LBA field is valid. If cleared to ‘0’, then the contents of Failing LBA field is invalid.
Bit 0 (NSID Valid): If set to ‘1’, then the contents of Namespace Identifier field is valid. If cleared to ‘0’, then the contents of Namespace Identifier field is invalid.
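
The bit layout above can be checked with a few lines of Python. Again a sketch only: decode_valid_diagnostic_info is a made-up name, and the bit positions are the ones listed in this post.

```python
# Bit positions as listed for the "Valid Diagnostic Information" field.
VALID_DIAGNOSTIC_BITS = {
    0: "NSID Valid",
    1: "FLBA Valid",
    2: "SCT Valid",
    3: "SC Valid",
}


def decode_valid_diagnostic_info(value):
    """Map each flag name to True/False for a one-byte field value.

    Hypothetical helper for illustration; bits 7:4 are reserved and ignored.
    """
    if not 0 <= value <= 0xFF:
        raise ValueError(f"expected a one-byte value, got {value:#x}")
    return {
        name: bool((value >> bit) & 1)
        for bit, name in VALID_DIAGNOSTIC_BITS.items()
    }


if __name__ == "__main__":
    # 0 is the value shown for every completed entry in the first post.
    print(decode_valid_diagnostic_info(0))
```

Every completed entry in the log from the first post shows 0 here, so none of the Namespace Identifier, Failing LBA, Status Code Type or Status Code fields are marked as valid for those entries.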