szatox Advocate
Joined: 27 Aug 2013 Posts: 3136
Posted: Sat Sep 09, 2017 2:14 pm Post subject: Recovering an overflowed LVM thin pool
I was testing some tricks on LVM and I let the pool hosting a thin volume and a bunch of snapshots overflow. I did it intentionally, to see what would happen, and quite frankly I had hoped for a more graceful failure mode.
Currently the pool is stuck. I tried extending it (that worked, but didn't help), deleting snapshots (failed completely), and deleting the volume (failed too).
I hoped a reboot would drop the write queue (which appeared to be the problem before), but unfortunately I'm no longer able to activate the pool.
Code: | # lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Synct
bhworker vm -wi-a----- 20.00g
bhwrk2 vm -wi-a----- 20.00g
gentoo.audio.studio vm -wi-a----- 20.00g
snap10 vm Vwi---tz-- 1.00g thinpool thin1
snap11 vm Vwi---tz-- 1.00g thinpool thin1
snap13 vm Vwi---tz-- 1.00g thinpool thin1
snap14 vm Vwi---tz-- 1.00g thinpool thin1
snap4 vm Vwi---tz-- 1.00g thinpool thin1
snap5 vm Vwi---tz-- 1.00g thinpool thin1
snap6 vm Vwi---tz-- 1.00g thinpool thin1
snap7 vm Vwi---tz-- 1.00g thinpool thin1
snap8 vm Vwi---tz-- 1.00g thinpool thin1
snap9 vm Vwi---tz-- 1.00g thinpool thin1
thin1 vm Vwi---tz-- 1.00g thinpool
thinpool vm twi---tz-- 12.00g
thinpool_meta0 vm -wi-a----- 4.00m
# lvchange vm/thinpool -ay
device-mapper: resume ioctl on (253:6) failed: No space left on device
Unable to resume vm-thinpool-tpool (253:6)
# lvremove vm/snap4
device-mapper: resume ioctl on (253:6) failed: No space left on device
Unable to resume vm-thinpool-tpool (253:6)
Failed to update pool vm/thinpool.
|
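For what it's worth, a resume failing with "No space left on device" while the pool's metadata appears to be only 4 MiB (going by the size of thinpool_meta0) suggests the pool's *metadata* LV, not just the data, filled up. The usual fix is to grow both halves; this is a sketch, the sizes are illustrative, and extending the metadata of a pool that refuses to activate may need a reasonably recent LVM:

```shell
# Grow the data part of the pool (already tried above, on its own it didn't help).
lvextend -L +2G vm/thinpool
# Grow the metadata part as well; a tiny tmeta can fill up quickly
# once a thin volume carries a pile of snapshots.
lvextend --poolmetadatasize +64M vm/thinpool
# Then retry activation and the pending removals.
lvchange -ay vm/thinpool
```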
Some of the snapshots are duplicates of each other; the actual total amount of data in the thinpool is less than 10 GB.
The volume named "thinpool_meta0" is a leftover from my attempt to repair the metadata. Too bad that didn't work either.
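For context, the stock metadata-repair path, which leaves exactly such a `_meta0` volume behind, looks like this (a sketch; the device names are the ones from this thread):

```shell
# With the pool deactivated, let LVM run thin_repair onto a fresh metadata LV.
lvconvert --repair vm/thinpool
# On success the old, damaged metadata is kept as vm/thinpool_meta0
# for inspection; once the pool activates again it can be dropped:
lvremove vm/thinpool_meta0
```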
Also, one more OOPS here:
Code: |
# lvremove vm/thinpool
Removing pool "thinpool" will remove 11 dependent volume(s). Proceed? [y/n]: y
device-mapper: resume ioctl on (253:6) failed: No space left on device
Unable to resume vm-thinpool-tpool (253:6)
Failed to update pool vm/thinpool.
|
So.... I guess removing subvolumes requires activating the pool first, which means the only obvious way left is to nuke the whole volume group from orbit.
Note: the three volumes at the top of the list are not part of the thinpool; they activate properly and I can boot the VMs installed there without any problem, even though they are in the same volume group. At least the damage is contained within the pool.
Now, this is _not_ an emergency for me; I can lose this particular machine at the cost of a little inconvenience rather than a total disaster. Still, I'd be more comfortable if I knew I could clean this up and keep going. You know, shit happens to actually important machines too, and losing everything just because you overcommitted and ran out of space is pathetic. (Yes, yes, I know, backups...)
All hints regarding less obvious ways to recover are welcome.
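For the archives: lvm.conf can auto-grow a thin pool before it overflows, provided dmeventd monitoring is enabled. The option names below are the stock ones; the threshold and percent values are just examples:

```
# /etc/lvm/lvm.conf
activation {
    # dmeventd must be monitoring the pool for autoextend to trigger
    monitoring = 1
    # grow the pool when it crosses 80% full...
    thin_pool_autoextend_threshold = 80
    # ...by 20% of its current size each time
    thin_pool_autoextend_percent = 20
}
```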
szatox Advocate
Posted: Sun Sep 10, 2017 6:40 pm Post subject:
Not a perfect solution, but here is something that can help mitigate the damage a bit:
vgcfgbackup -> open the dump in a text editor -> remove the thin pool and related objects -> vgcfgrestore -> reboot.
The thin pool will vanish, releasing its assigned resources.
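The steps above can be sketched as follows. The mini metadata dump is fabricated for illustration (a real vgcfgbackup file is much larger), manual editing works just as well since the awk only automates the brace counting, and note that newer LVM versions want --force when restoring over a VG that still contains thin-pool metadata:

```shell
# Normally the dump would come from the real VG:
#   vgcfgbackup -f /tmp/vm.cfg vm
# Fabricated stand-in for this sketch:
cat > /tmp/vm.cfg <<'EOF'
vm {
	logical_volumes {
		bhworker {
			status = ["READ", "WRITE", "VISIBLE"]
		}
		thinpool {
			segment1 {
				type = "thin-pool"
			}
		}
	}
}
EOF

# Remove one brace-balanced LV stanza by name (here: thinpool).
awk -v lv="thinpool" '
	$1 == lv && $2 == "{" { skip = 1; depth = 0 }
	skip { depth += gsub(/{/, "{") - gsub(/}/, "}"); if (depth == 0) skip = 0; next }
	{ print }
' /tmp/vm.cfg > /tmp/vm.edited.cfg

grep -c thinpool /tmp/vm.edited.cfg || true   # prints 0 (the stanza is gone)

# The edited file would then go back with something like:
#   vgcfgrestore -f /tmp/vm.edited.cfg vm    (add --force on newer LVM)
```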
Still, if anyone has an idea what I could have done to avoid losing data, don't hesitate to share it.