Gentoo Forums
Issues checking a disk [Solved]
Forum: Kernel & Hardware
iandoug
l33t
l33t


Joined: 11 Feb 2005
Posts: 832
Location: Cape Town, South Africa

PostPosted: Sun Oct 10, 2021 4:10 pm    Post subject: Issues checking a disk [Solved] Reply with quote

Hi

We're suffering loadshedding again, which means frequent shutdowns and reboots.

Yesterday at boot, it found and corrected some errors on sde1

Today it got stuck on sde1 again, then said UNEXPECTED INCONSISTENCIES, run fsck manually.

Then it just sat there and looked at me.... eventually I pressed the reset button. On this reboot it got stuck there again and eventually printed a message similar to the one below, then finished booting after a while.

After booting, I ran fsck.

Code:

trooper /home/ian # fsck -y /dev/sde1
fsck from util-linux 2.37.2
e2fsck 1.46.4 (18-Aug-2021)
fsck.ext3: Attempt to read block from filesystem resulted in short read while trying to open /dev/sde1
Could this be a zero-length partition?


Is there some software that can fix this? (as in, find the back-up tables and restore)
Will shut down now and check that it's not a loose cable or something ...

Thanks, Ian
_________________
Asus X570-PRO, Ryzen 7 5800X, GeForce GTX 1650, 32 GB RAM | Asus Sabertooth P990, AMD FX-8150, GeForce GTX 560, 16GB Ram


Last edited by iandoug on Mon Nov 08, 2021 6:36 pm; edited 1 time in total
NeddySeagoon
Administrator
Administrator


Joined: 05 Jul 2003
Posts: 54237
Location: 56N 3W

PostPosted: Sun Oct 10, 2021 4:37 pm    Post subject: Reply with quote

iandoug,

Is the partition table still intact?

What does
Code:
fdisk -l /dev/sde
tell you?

What is sde1 used for?
What filesystem type does it contain?

fsck often makes a bad situation worse. If you don't have backups, make a copy of all of sde.
That will be your 'undo' if fsck just digs a deeper hole for you.
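
Taking that copy could look something like the sketch below. It assumes a Linux dd, that sde is not mounted, and that the target (a placeholder path here) has room for the whole disk; `image_disk` is a made-up helper name. Plain dd gives up at the first read error, so on a drive that is throwing errors a recovery tool such as ddrescue is the more usual choice.

```shell
# Hypothetical helper: copy SRC (a block device such as /dev/sde, or any
# file) byte for byte into the image file DST.
# dd stops at the first read error, which is the cue to switch to a
# recovery tool instead of retrying.
image_disk() {
    dd if="$1" of="$2" bs=1M
}

# Usage (as root, source unmounted; the target path is a placeholder):
#   image_disk /dev/sde /mnt/backup/sde.img
```

fsck can then be pointed at the image or at the original, with the other one kept as the 'undo'.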
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
iandoug
l33t
l33t


Joined: 11 Feb 2005
Posts: 832
Location: Cape Town, South Africa

PostPosted: Sun Oct 10, 2021 5:13 pm    Post subject: Reply with quote

Hi Neddy

fdisk finds nothing.

Forgot to mention that during boot it says that that drive was not registered with DBUS, "despite waiting 1000..000 ms".

Tried to find exact message in dmesg, it's not there, but all this is:
https://pastebin.com/hxi9bqU5

It's a 500GB Data disk (WD, FWIW), EXT4 I think. Don't think there was anything crucial on there. Have no backups, my tape drive is broken... :-(

I see one of the fans is also not running. But drive is sharing power with another drive which is ok.

Let me see if another SATA channel works ...
mike155
Advocate
Advocate


Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

PostPosted: Sun Oct 10, 2021 5:26 pm    Post subject: Reply with quote

iandoug, dmesg shows many hardware/bus errors.

Fix the hardware before running fsck - otherwise it is possible that fsck deletes your data...
iandoug
l33t
l33t


Joined: 11 Feb 2005
Posts: 832
Location: Cape Town, South Africa

PostPosted: Sun Oct 10, 2021 5:48 pm    Post subject: Reply with quote

mike155 wrote:
iandoug, dmesg shows many hardware/bus errors.

Fix the hardware before running fsck - otherwise it is possible that fsck deletes your data...


Not sure HOW to do that :-)

Possibly the auto-checks at boot may already have done the deleting for me ...

I reseated both cables to drive but it made no difference... maybe needs a different SATA channel.

Will revert, just doing some daily work first then can fiddle more.

Thanks, Ian
NeddySeagoon
Administrator
Administrator


Joined: 05 Jul 2003
Posts: 54237
Location: 56N 3W

PostPosted: Sun Oct 10, 2021 8:08 pm    Post subject: Reply with quote

iandoug,

Code:
[   52.992983] ata6.00: cmd 60/00:b0:3f:00:00/01:00:00:00:00/40 tag 22 ncq dma 131072 in
                        res 40/00:b4:3f:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error) 


Can be the SATA cable, the motherboard SATA port, or the interface on the HDD.

Change the SATA data cable as an easy first step.
If you don't have a SATA cable in your box of bits, 'wipe' the connectors at both ends by unplugging and replugging them two or three times.
If that works, it won't work for very long.

Code:
[   77.036803] ata6.00: failed to IDENTIFY (I/O error, err_mask=0x5)
means that the kernel knows nothing about the drive, so no other useful communication with the drive is possible.
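
To see whether a cable or port swap has made these messages go away, the kernel log can be filtered for just the two strings quoted above. A small sketch; `ata_errors` is a made-up name, and other libata messages are not covered by this pattern.

```shell
# Hypothetical filter: keep only kernel log lines containing one of the
# two error strings quoted in this post.
ata_errors() {
    grep -E 'ATA bus error|failed to IDENTIFY'
}

# Usage: compare before and after changing the cable.
#   dmesg | ata_errors
```

No output after a reboot suggests the link is behaving, at least for now.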
figueroa
Advocate
Advocate


Joined: 14 Aug 2005
Posts: 2963
Location: Edge of marsh USA

PostPosted: Tue Oct 12, 2021 3:40 am    Post subject: Reply with quote

Also, don't ignore the power connection. An easy quick check is to swap power connections with another SATA device. A recent disk issue on our school's business office desktop turned up a cracked power connector.

What is loadshedding?
_________________
Andy Figueroa
hp pavilion hpe h8-1260t/2AB5; spinning rust x3
i7-2600 @ 3.40GHz; 16 gb; Radeon HD 7570
amd64/23.0/split-usr/desktop (stable), OpenRC, -systemd -pulseaudio -uefi
Hu
Moderator
Moderator


Joined: 06 Mar 2007
Posts: 21633

PostPosted: Tue Oct 12, 2021 3:37 pm    Post subject: Reply with quote

figueroa wrote:
What is loadshedding?
The power company has insufficient generation capacity to cover demand, so they deliberately blackout some of their customers to get demand down to a level they can cover. When done fairly, the blackouts rotate so that no one customer stays down for the duration of the imbalance. However, that also means that powering up electronics without a UPS is a risky affair, since the blackout may rotate back into your area and bring you down again.
figueroa
Advocate
Advocate


Joined: 14 Aug 2005
Posts: 2963
Location: Edge of marsh USA

PostPosted: Tue Oct 12, 2021 3:48 pm    Post subject: Reply with quote

Thanks, Hu. When this was first posted, I tried an Internet search for the term and got nothing meaningful. Today, it's all about rolling blackouts, so I may have mistyped it. I have UPSs on every electronic device in the house out of abundance of caution, yet we enjoy just about the most stable power I could ever imagine, and yet it's a rural electric cooperative that buys power primarily from Georgia Power.

ADDED: At the school where I support the network and desktop PCs (500 miles remote from me), we also use UPSs on mission critical computers and network devices. We've been lucky and have not suffered any physical damage that can be tracked to power outages. We do, however, have a lot of seriously old (over 8 years) hard drives that periodically swan dive out of service without warning. Fortunately, backups are automatic and redundant.


Last edited by figueroa on Tue Oct 12, 2021 6:09 pm; edited 1 time in total
iandoug
l33t
l33t


Joined: 11 Feb 2005
Posts: 832
Location: Cape Town, South Africa

PostPosted: Tue Oct 12, 2021 5:27 pm    Post subject: Reply with quote

Load-shedding is the South African term.

Overseas they use rolling blackouts or brownouts.

We are currently enduring load-shedding since last Thursday, due to end this Thursday.

We're at "Level 2" but how that is implemented depends on where you are and who you get your power from.
I'm in Cape Town and we have a pumped storage hydro-electric scheme, so municipality pumps water to top dam at night, and then generates during the day, meaning that Cape Town reverts to Level 1 during day while rest of country is at level 2.

I have UPS in office and another for the NAS servers, Fibre box, etc, but they can't stay up for the 2 - 2.5 hours that we shed at a time.

So I need to keep shutting down the computers (except firewall) and restarting, which is a pain when they go off from 2AM to 4AM etc ... I have cron jobs that run in the night/early morning.

I think two of my older PCs suffered damage because neither will boot ... they were in another room sans UPS... it's the power surge when it comes on I think ... as the power network itself stabilises. Been reports of people losing fridges etc as well.

Cheers, Ian
iandoug
l33t
l33t


Joined: 11 Feb 2005
Posts: 832
Location: Cape Town, South Africa

PostPosted: Wed Oct 13, 2021 5:30 pm    Post subject: Reply with quote

Hi guys

Okay, I transplanted the problematic drive from the Trooper box to the Fractal box. The Trooper box dates from 2013 and has basically been on 24/7 since then.

I no longer get those hardware errors when booting Trooper. Possibly the drives are now running at 6 Gb/s instead of 1.5 Gb/s. The PC was "sluggish" before.

Installing in Fractal meant rearranging the internal structure, which was not without issues....

Anyway, the BIOS/Kernel/Whatever is now playing silly buggers with me... plugging in the drive breaks the boot process... either because it can't find sda3, or can't find the mdadm drives.

After much creative rearranging of cables to sockets, I have given up and decided to switch to UUID syntax in fstab. Why this is necessary is beyond me, drives should be allocated in order of hardware port number, not randomly.

Now I need some advice in fixing fstab so that it still boots and does not leave me looking for a rescue thumbdrive. :-)

I have:

Code:

# blkid
/dev/md127: UUID="04d91321-b09c-4a74-ac73-a0c5f108ac96" BLOCK_SIZE="4096" TYPE="ext4"
/dev/sdb1: UUID="0270db76-3666-5b96-d9cb-e1224a65bf2c" UUID_SUB="77b7878f-5f94-8b6a-2c5c-408fe52be6f1" LABEL="fractal:1" TYPE="linux_raid_member" PARTUUID="c9e49a2e-60e2-fa4c-bb85-0b302264b40c"
/dev/sdc1: UUID="0270db76-3666-5b96-d9cb-e1224a65bf2c" UUID_SUB="1bdcbed0-b379-d684-4d50-7ed0cd67ee04" LABEL="fractal:1" TYPE="linux_raid_member" PARTUUID="2e50ed27-0faa-234b-b2ca-674cd0328bf1"
/dev/sda2: UUID="c10d95a5-db1c-473f-87b9-c2d9c108c286" TYPE="swap" PARTUUID="e549871a-9a5d-8c4e-8459-a8dc4702dde9"
/dev/sda3: UUID="14e0225a-638b-49fd-ae9d-d2f3a807fcec" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="6ecb5d48-00d6-ef4e-9ddc-6efbd75e448f"
/dev/sda1: UUID="8569-D7A3" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="b4f42392-c85d-b64c-8919-7166ddf4db17"


And fstab is
Code:

/dev/sda1              /boot            vfat            defaults,noatime  0 2
/dev/sda2              none             swap            sw                0 0
/dev/sda3              /                ext4            noatime           0 1
/dev/md127             /home            ext4            noatime           0 3



So would this be correct?

Code:


# /dev/sda1
UUID=8569-D7A3   /boot            vfat            defaults,noatime  0 2

# /dev/sda2
UUID=c10d95a5-db1c-473f-87b9-c2d9c108c286       none             swap            sw                0 0

# /dev/sda3
UUID=14e0225a-638b-49fd-ae9d-d2f3a807fcec         /                ext4            noatime           0 1

# /dev/md127
UUID=04d91321-b09c-4a74-ac73-a0c5f108ac96             /home            ext4            noatime           0 3


And I don't need to mention sdb or sdc ?

I still don't follow how this syntax knows which is sda, sdb, sdc, sdd, etc ...

Thanks, Ian
NeddySeagoon
Administrator
Administrator


Joined: 05 Jul 2003
Posts: 54237
Location: 56N 3W

PostPosted: Wed Oct 13, 2021 7:04 pm    Post subject: Reply with quote

iandoug,

UUID is a property of a filesystem. If you make a new filesystem over an old one, the UUID is changed.
PARTUUID is a property of a partition. It will not change regardless of what happens to any filesystem that partition may hold, even if the filesystem is changed.

When you give UUID (or PARTUUID) in fstab, mount looks at all known locations until it finds the UUID you specified, returns the kernel major:minor device numbers, and says to the kernel, it's that one.

In your situation, the major:minor device numbers that a given drive ends up with keep changing.
The relationship between the sd* names and the device numbers is fixed.
Code:
ls /dev/sd* -l
brw-rw---- 1 root disk 8,   0 Sep 30 07:53 /dev/sda
brw-rw---- 1 root disk 8,   1 Sep 12 12:30 /dev/sda1
brw-rw---- 1 root disk 8,   2 Sep 12 12:30 /dev/sda2
brw-rw---- 1 root disk 8,   3 Sep 30 08:24 /dev/sda3
brw-rw---- 1 root disk 8,  16 Sep 12 12:30 /dev/sdb
brw-rw---- 1 root disk 8,  17 Sep 12 12:30 /dev/sdb1
brw-rw---- 1 root disk 8,  18 Sep 12 12:30 /dev/sdb2
brw-rw---- 1 root disk 8,  19 Sep 12 12:30 /dev/sdb3

but the kernel allocates the names to the drives in the order they are detected. That order is not constant. It varies with spin-up times, which depend on temperature and age.

Using UUID discovers the needed information at mount time. It continues to work if you move a drive to a USB enclosure, or add more drives between existing drives.
All in all, UUID is robust.

A warning though. The kernel understands device numbers, the /dev/ names and PARTUUID.
Using UUID to describe root on the kernel command line requires an initrd as the kernel does not understand UUID.
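
As a sketch of how the two identifiers can be picked out for fstab or the kernel command line, the helper below reads blkid output and prints one tag for one device. `blkid_tag` is a made-up name, and it assumes the `NAME: TAG="value" ...` line format shown in the blkid listing earlier in the thread.

```shell
# Hypothetical helper: read `blkid` output on stdin and print the value of
# one tag (UUID or PARTUUID) for one device node.
blkid_tag() {
    dev=$1
    tag=$2
    # The space in front of the tag name keeps UUID from matching inside
    # PARTUUID; UUID_SUB is excluded because "=" must follow the name.
    grep "^$dev: " | sed -n "s/.* $tag=\"\([^\"]*\)\".*/\1/p"
}

# Usage:
#   blkid | blkid_tag /dev/sda3 PARTUUID
# blkid can also do the selection itself:
#   blkid -s PARTUUID -o value /dev/sda3
```

The value printed goes after `UUID=` or `PARTUUID=` in the first fstab field, matching the tag that was asked for.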
iandoug
l33t
l33t


Joined: 11 Feb 2005
Posts: 832
Location: Cape Town, South Africa

PostPosted: Wed Oct 13, 2021 8:16 pm    Post subject: Reply with quote

Hi Neddy

System is detecting old drive first, in preference to new ones :-)

So is this better then?

Code:

# /dev/sda1
PARTUUID=b4f42392-c85d-b64c-8919-7166ddf4db1   /boot            vfat            defaults,noatime  0 2

# /dev/sda2
PARTUUID=e549871a-9a5d-8c4e-8459-a8dc4702dde9       none             swap            sw                0 0

# /dev/sda3
PARTUUID=6ecb5d48-00d6-ef4e-9ddc-6efbd75e448f         /                ext4            noatime           0 1

# /dev/md127
UUID=04d91321-b09c-4a74-ac73-a0c5f108ac96             /home            ext4            noatime           0 3


Thanks, Ian
Hu
Moderator
Moderator


Joined: 06 Mar 2007
Posts: 21633

PostPosted: Wed Oct 13, 2021 8:29 pm    Post subject: Reply with quote

For filesystems other than root, you can test this live by unmounting the relevant filesystem, then using mount /home (for example). mount is then forced to use fstab to discover that you want a UUID based mount, and in turn forced to do the UUID-driven search to find /home.

The PARTUUID for boot looks a character short. Other than that, the identifiers seem to match your earlier blkid output, and you are correctly matching UUID to UUID and PARTUUID to PARTUUID. (Giving one where the other is needed would fail, but you got this part correct.)
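
A truncated identifier like that is easy to catch mechanically. The sketch below checks for the 8-4-4-4-12 hex layout that PARTUUIDs have on a GPT disk, which is what this system uses going by the gpt1 references in its grub.cfg. `is_guid` is a made-up name.

```shell
# Hypothetical check: does the argument have the 8-4-4-4-12 hex shape of a
# GPT partition GUID? (PARTUUIDs on MBR disks are shorter and would fail,
# as would a vfat filesystem UUID such as 8569-D7A3.)
is_guid() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Usage: test a value before it goes into fstab.
#   is_guid b4f42392-c85d-b64c-8919-7166ddf4db1 || echo "malformed PARTUUID"
```

Copying the value straight out of blkid rather than retyping it avoids the problem in the first place.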
NeddySeagoon
Administrator
Administrator


Joined: 05 Jul 2003
Posts: 54237
Location: 56N 3W

PostPosted: Wed Oct 13, 2021 8:39 pm    Post subject: Reply with quote

iandoug,

In fstab, it makes no difference. To read fstab, root has to be mounted.

-- edit --

You may not be able to umount /home while a normal user is logged in, as it will be in use.
iandoug
l33t
l33t


Joined: 11 Feb 2005
Posts: 832
Location: Cape Town, South Africa

PostPosted: Wed Oct 13, 2021 8:48 pm    Post subject: Reply with quote

Hi Guys

Thanks.

Do I need to fiddle with Grub at all? Or will this "just work" ?

Thanks, Ian
NeddySeagoon
Administrator
Administrator


Joined: 05 Jul 2003
Posts: 54237
Location: 56N 3W

PostPosted: Wed Oct 13, 2021 8:53 pm    Post subject: Reply with quote

iandoug,

What is on your kernel command line in grub.cfg?
iandoug
l33t
l33t


Joined: 11 Feb 2005
Posts: 832
Location: Cape Town, South Africa

PostPosted: Wed Oct 13, 2021 9:03 pm    Post subject: Reply with quote

NeddySeagoon wrote:
iandoug,

What is on your kernel command line in grub.cfg?


# Boot with network interface renaming disabled
GRUB_CMDLINE_LINUX="net.ifnames=0"

?

This be Grub 2 which is new to me, I think what you want to see is here:

Code:

### BEGIN /etc/grub.d/10_linux ###
menuentry 'Gentoo GNU/Linux' --class gentoo --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-14e0225a-638b-49fd-ae9d-d2f3a807fcec' {
        load_video
        insmod gzio
        insmod part_gpt
        insmod fat
        set root='hd0,gpt1'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint-bios=hd0,gpt1 --hint-efi=hd0,gpt1 --hint-baremetal=ahci0,gpt1  8569-D7A3
        else
          search --no-floppy --fs-uuid --set=root 8569-D7A3
        fi
        echo    'Loading Linux 5.10.27-gentoo ...'
        linux   /vmlinuz-5.10.27-gentoo root=/dev/sda3 ro net.ifnames=0
}
submenu 'Advanced options for Gentoo GNU/Linux' $menuentry_id_option 'gnulinux-advanced-14e0225a-638b-49fd-ae9d-d2f3a807fcec' {
        menuentry 'Gentoo GNU/Linux, with Linux 5.10.27-gentoo' --class gentoo --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-5.10.27-gentoo-advanced-14e0225a-638b-49fd-ae9d-d2f3a807fcec' {
                load_video
                insmod gzio
                insmod part_gpt
                insmod fat
                set root='hd0,gpt1'
                if [ x$feature_platform_search_hint = xy ]; then
                  search --no-floppy --fs-uuid --set=root --hint-bios=hd0,gpt1 --hint-efi=hd0,gpt1 --hint-baremetal=ahci0,gpt1  8569-D7A3
                else
                  search --no-floppy --fs-uuid --set=root 8569-D7A3
                fi
                echo    'Loading Linux 5.10.27-gentoo ...'
                linux   /vmlinuz-5.10.27-gentoo root=/dev/sda3 ro net.ifnames=0
        }
        menuentry 'Gentoo GNU/Linux, with Linux 5.10.27-gentoo (recovery mode)' --class gentoo --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-5.10.27-gentoo-recovery-14e0225a-638b-49fd-ae9d-d2f3a807fcec' {
                load_video
                insmod gzio
                insmod part_gpt
                insmod fat
                set root='hd0,gpt1'
                if [ x$feature_platform_search_hint = xy ]; then
                  search --no-floppy --fs-uuid --set=root --hint-bios=hd0,gpt1 --hint-efi=hd0,gpt1 --hint-baremetal=ahci0,gpt1  8569-D7A3
                else
                  search --no-floppy --fs-uuid --set=root 8569-D7A3
                fi
                echo    'Loading Linux 5.10.27-gentoo ...'
                linux   /vmlinuz-5.10.27-gentoo root=/dev/sda3 ro single net.ifnames=0
        }
        menuentry 'Gentoo GNU/Linux, with Linux 5.10.27-gentoo.old' --class gentoo --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-5.10.27-gentoo.old-advanced-14e0225a-638b-49fd-ae9d-d2f3a807fcec' {
                load_video
                if [ "x$grub_platform" = xefi ]; then
                        set gfxpayload=keep
                fi
                insmod gzio
                insmod part_gpt
                insmod fat
                set root='hd0,gpt1'
                if [ x$feature_platform_search_hint = xy ]; then
                  search --no-floppy --fs-uuid --set=root --hint-bios=hd0,gpt1 --hint-efi=hd0,gpt1 --hint-baremetal=ahci0,gpt1  8569-D7A3
                else
                  search --no-floppy --fs-uuid --set=root 8569-D7A3
                fi
                echo    'Loading Linux 5.10.27-gentoo.old ...'
                linux   /vmlinuz-5.10.27-gentoo.old root=/dev/sda3 ro net.ifnames=0
        }
        menuentry 'Gentoo GNU/Linux, with Linux 5.10.27-gentoo.old (recovery mode)' --class gentoo --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-5.10.27-gentoo.old-recovery-14e0225a-638b-49fd-ae9d-d2f3a807fcec' {
                load_video
                if [ "x$grub_platform" = xefi ]; then
                        set gfxpayload=keep
                fi
                insmod gzio
                insmod part_gpt
                insmod fat
                set root='hd0,gpt1'
                if [ x$feature_platform_search_hint = xy ]; then
                  search --no-floppy --fs-uuid --set=root --hint-bios=hd0,gpt1 --hint-efi=hd0,gpt1 --hint-baremetal=ahci0,gpt1  8569-D7A3
                else
                  search --no-floppy --fs-uuid --set=root 8569-D7A3
                fi
                echo    'Loading Linux 5.10.27-gentoo.old ...'
                linux   /vmlinuz-5.10.27-gentoo.old root=/dev/sda3 ro single net.ifnames=0
        }
}


Thanks, Ian
NeddySeagoon
Administrator
Administrator


Joined: 05 Jul 2003
Posts: 54237
Location: 56N 3W

PostPosted: Thu Oct 14, 2021 7:31 am    Post subject: Reply with quote

iandoug,

Code:
vmlinuz-5.10.27-gentoo root=/dev/sda3
and the lack of an initrd was what we wanted to see.

You should change root=/dev/sda3 to root=PARTUUID=...
as UUID will not work without an initrd.

Then you can randomise your HDD connections almost however you want and it will still work.
I say 'almost' as there are some extra conditions for root on USB, which you don't need to know about for this.
iandoug
l33t
l33t


Joined: 11 Feb 2005
Posts: 832
Location: Cape Town, South Africa

PostPosted: Thu Oct 14, 2021 7:47 am    Post subject: Reply with quote

Hi Neddy

Thanks. I have been using old grub on all my boxes, this is first time with grub2 and so am a bit lost as to where things get specified.

Do I change this
Code:

GRUB_CMDLINE_LINUX="net.ifnames=0"


in /etc/default/grub

to

Code:

GRUB_CMDLINE_LINUX="root=PARTUUID=6ecb5d48-00d6-ef4e-9ddc-6efbd75e448f net.ifnames=0"


I see no mention of which partition to boot in /etc/default/grub. (As a noob, grub2 seems convoluted and confusing... things hidden away)

Thanks, Ian
iandoug
l33t
l33t


Joined: 11 Feb 2005
Posts: 832
Location: Cape Town, South Africa

PostPosted: Thu Oct 14, 2021 10:35 am    Post subject: Reply with quote

Sigh.

Remember, ‘GRUB_DISABLE_LINUX_PARTUUID’ and ‘GRUB_DISABLE_LINUX_UUID’ are also considered to be set to ‘false’ when they are unset.
https://www.gnu.org/software/grub/manual/grub/html_node/Root-Identifcation-Heuristics.html

vs this, which says default is true:

GRUB_DISABLE_LINUX_PARTUUID true Since version 2.04. If false, and if there is either no initramfs or GRUB_DISABLE_LINUX_UUID is set to true, ${GRUB_DEVICE_PARTUUID} is passed in the root parameter on the kernel command line.
https://wiki.gentoo.org/wiki/GRUB2/Configuration_variables

Or am I reading it wrong?

Thanks, Ian
iandoug
l33t
l33t


Joined: 11 Feb 2005
Posts: 832
Location: Cape Town, South Africa

PostPosted: Tue Oct 26, 2021 7:33 am    Post subject: Reply with quote

mike155 wrote:
iandoug, dmesg shows many hardware/bus errors.

Fix the hardware before running fsck - otherwise it is possible that fsck deletes your data...


Okay, long story short... I modified the grub config to have GRUB_DISABLE_LINUX_PARTUUID=false,
regenerated, checked and installed the config, pleaded with the deities and rebooted.

So now the boot disk and raid detection are okay.
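
For reference, the change amounts to roughly the steps below. This is a sketch: the grub.cfg path is the usual Gentoo location and is an assumption here, and `root_args` is a made-up helper for checking what ended up on the kernel command line.

```shell
# 1. In /etc/default/grub (the setting named above):
#      GRUB_DISABLE_LINUX_PARTUUID=false
# 2. Regenerate the config (output path assumed):
#      grub-mkconfig -o /boot/grub/grub.cfg

# 3. Hypothetical helper: list the distinct root= arguments found on the
#    "linux" lines of a generated grub.cfg.
root_args() {
    grep -E '^[[:space:]]*linux[[:space:]]' "$1" |
        grep -Eo 'root=[^[:space:]]+' |
        sort -u
}

# Usage:
#   root_args /boot/grub/grub.cfg
```

For the grub.cfg posted earlier this prints root=/dev/sda3; after regenerating, a single root=PARTUUID=... line is the thing to look for.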

I added another blank disk as "sanity control", that gets picked up ok.

However when I add the problem disk, the boot process hangs and the raid is not set up properly.

I then plugged it into an older PC (Manjaro), with a simpler setup and known good cables. The OS detects the drive, and fdisk is happy to work with it. Did not see any hardware errors in dmesg.

So I'm running fsck on it, it reports errors on disk, and started scanning.

Now says "Error reading block 99828 (input/output error) while getting next inode from scan. Ignore error<y>?"

I suppose I shall have to answer Yes, unless there is a better option.

I guess this means "bad sectors on disk" ?

Thanks, Ian
NeddySeagoon
Administrator
Administrator


Joined: 05 Jul 2003
Posts: 54237
Location: 56N 3W

PostPosted: Tue Oct 26, 2021 7:56 am    Post subject: Reply with quote

iandoug,

If the drive worked elsewhere, it's not the drive or the data cable in use there.
That points to the SATA port on the motherboard (in the original system) being defective, or with a low probability, the power.

Answer N to fsck. It can be a filesystem destruction tool. It's OK to look, but do not let it make changes.

Run the long test with smartclf for a low level surface scan, without data being exported over the interface.
dd the entire drive to /dev/null to do the same thing but using the SATA interface.
The filesystem is the next layer up. Do not test that unless one or both of the above tests pass.
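
The two tests could be run along these lines. A sketch only: /dev/sdX is a placeholder for the suspect drive, the self-test commands are the smartmontools ones, and `read_all` is a made-up wrapper around dd.

```shell
# Hypothetical helper: read every block of DEV and throw the data away.
# A non-zero exit status means dd hit a read error somewhere.
read_all() {
    dd if="$1" of=/dev/null bs=1M
}

# Assumed sequence (as root):
#   smartctl -t long /dev/sdX       # start the drive's internal long self-test
#   smartctl -l selftest /dev/sdX   # read the self-test log once it has finished
#   read_all /dev/sdX && echo "whole drive readable over the interface"
```

The self-test runs inside the drive, so the first command returns straight away and the result only shows up in the log later.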

Was the original drive a member of the RAID set?
It will have an old event count and mdadm won't be happy. It won't be happy with two members of the RAID set in the same slot either.
iandoug
l33t
l33t


Joined: 11 Feb 2005
Posts: 832
Location: Cape Town, South Africa

PostPosted: Tue Oct 26, 2021 9:18 am    Post subject: Reply with quote

NeddySeagoon wrote:

If the drive worked elsewhere, it's not the drive or the data cable in use there.
That points to the SATA port on the motherboard (in the original system) being defective, or with a low probability, the power.


Motherboard (this box) may be getting faulty, but think it may be drive too... the boot before All Hell Broke Loose also found and fixed errors with the drive.

NeddySeagoon wrote:
iandoug,
Answer N to fsck. It can be a filesystem destruction tool. It's OK to look, but do not let it make changes.


That aborted fsck.

NeddySeagoon wrote:

Run the long test with smartclf for a low level surface scan, without data being exported over the interface.
dd the entire drive to /dev/null to do the same thing but using the SATA interface.
The filesystem is the next layer up. Do not test that unless one or both of the above tests pass.


I assume you mean smartctl which I have not used before in this way. Was not even installed on the Manjaro box.
Short test passed, running long test, will take almost 3 hours. A "verbose" option would be nice.

I understand your next step as "copy all contents to nowhere" ... will try that later. Power is going off again shortly after the long test finishes.

NeddySeagoon wrote:

Was the original drive a member of the RAID set?
It will have an old event count and mdadm won't be happy. It won't be happy with two members of the RAID set in the same slot either.



No, it is a stand-alone drive, but its issues somehow confused RAID detection on my new box.

Thanks, Ian
NeddySeagoon
Administrator
Administrator


Joined: 05 Jul 2003
Posts: 54237
Location: 56N 3W

PostPosted: Tue Oct 26, 2021 12:33 pm    Post subject: Reply with quote

iandoug,

fsck is a last resort, after you have validated backups or a ddrescue image of the drive.
While the interface with the drive is under a cloud, you know nothing about the content of the drive with any certainty.
That being the case, it's not safe to let fsck make any changes.

The long test does a "copy all contents to nowhere" but the data never leaves the drive.
If that fails, the drive is probably scrap because it can't read its own writing but see later.

If the long test passes the internals of the drive are OK and the "copy all contents to nowhere" copies the entire contents of the drive over the interface.
That checks the interface separately from the rest of the drive.

I said above that the drive is probably scrap if it can't read its own writing. That means the failing sector remapping hasn't worked.
The idea is that drives move data from a failing sector to a spare sector when they detect read problems.
This remapping can be forced with a write to the 'failed' sector.
Either the write will succeed to the original sector, or the write will fail to the original sector and the drive will write it to a spare sector.
It appears to succeed in either case.
You can tell from the smart data. Has the Pending Sector Count or the Reallocated Sector Count changed?
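
One way to watch those two counters is to filter the attribute table that `smartctl -A` prints. A sketch, assuming the usual smartmontools attribute names (Reallocated_Sector_Ct and Current_Pending_Sector) and the raw value in the tenth column; some drives report them under other names. `sector_counts` is a made-up helper.

```shell
# Hypothetical filter: read `smartctl -A` output on stdin and print the
# raw values of the reallocated and pending sector attributes.
sector_counts() {
    awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" { print $2, $10 }'
}

# Usage (as root; /dev/sdX is a placeholder):
#   smartctl -A /dev/sdX | sector_counts
```

Running it before and after the forced write shows whether either count moved.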

Long shot to explain the odd behaviour: have you ever dd imaged the drive or any of its partitions?
That duplicates UUIDs and having duplicate UUIDs in the same system is a verybadthing.
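
A quick check for duplicates can be built on blkid output. A sketch under one assumption taken from the blkid listing earlier in the thread: RAID members legitimately share their array's UUID, so lines of type linux_raid_member are skipped. `dup_uuids` is a made-up name.

```shell
# Hypothetical filter: read `blkid` output on stdin and print every
# filesystem UUID that appears on more than one device.
dup_uuids() {
    grep -v 'TYPE="linux_raid_member"' |
        sed -n 's/.* UUID="\([^"]*\)".*/\1/p' |
        sort | uniq -d
}

# Usage (with every drive of interest plugged in):
#   blkid | dup_uuids
```

No output means no two non-RAID devices carry the same filesystem UUID.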
All times are GMT
Page 1 of 2