Gentoo Forums
How to easily recover boot failures of a headless server?

 
Gentoo Forums Forum Index » Gentoo Chat
ghutzl
Tux's lil' helper


Joined: 29 May 2005
Posts: 117
Location: Germany

Posted: Sun Mar 04, 2012 11:53 am    Post subject: How to easily recover boot failures of a headless server?

Hello!

Yesterday my Gentoo media/MythTV server failed to boot. Because the machine sits in the basement and is not easily accessible, recovery was not easy. It has no monitor or keyboard attached, so I had to unplug everything and carry it upstairs to where I had a monitor and keyboard. After booting it there, it turned out that one partition had somehow become corrupted, fsck failed on it, and as a result Gentoo refused to boot normally. Luckily the affected partition was just /var/tmp, so no important files were lost.

This machine does not have a DVD drive. I tried to use SystemRescueCd on a USB stick, but the stick does not boot on this machine; it just gives me the message "Boot error". So the only way to recover was PXE boot over the network, which, by the way, is how I installed Gentoo on it in the first place. A simple fsck of the partition and a subsequent reboot were all I needed to revive the machine.

After the whole procedure I thought about better ways of recovering the system from a boot failure in the future and came up with several options:

1. Get the SystemRescueCd USB stick to boot on this system. If I manage that, I would create a stick that boots and starts sshd so I can log in to the system remotely and recover it. Going down and plugging the stick in is totally acceptable for me. But this will probably not help in all cases: even in the present case I would not have known the reason for the boot failure, because I would not have seen the error messages on the screen during boot. Anyhow, does anyone have an idea what could be wrong with booting from the USB stick? I have enabled booting from USB-HDD, and the BIOS does try to boot from the stick, but then I just see the message "Boot error". I used the instructions in the SystemRescueCd wiki to create the stick, and it boots successfully on other systems. The motherboard is a Gigabyte GA-E350N-USB3 mini-ITX. Maybe someone has experience with USB stick booting on it? I installed the latest BIOS but still get the boot error.

2. Connect a serial console. I could also go down with a laptop and connect it to a serial connector as a serial console. I am not sure whether that is possible over USB, as this machine does not have a good old RS232 port. I searched the web a bit and have not found any information about USB serial consoles yet.

3. Connect via VNC (or any other remote display solution) to the BIOS, or at least to GRUB as long as GRUB is still working (I am using GRUB 2). I found out that only professional server motherboards can access the BIOS remotely, so this is probably not possible in my case. So what about GRUB 2? Does anyone know if I can connect to GRUB remotely over the network? My idea would be this: start the machine (I do that via WOL), then quickly connect a VNC client to it to interrupt the normal boot process (perhaps increasing the boot menu timeout to 10-15 seconds to have enough time to connect). The ultimate solution would be to see all the boot messages in the remote VNC client, so I would have seen the reason for the boot failure in my case. I would also be able to select another GRUB menu entry to boot an already prepared recovery system. Does anyone know if anything like this exists?
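The wake-then-watch part of point 3 can at least be partly automated without any help from GRUB. Below is a minimal Python sketch (standard library only) that sends a Wake-on-LAN magic packet (6 bytes of 0xFF followed by the MAC address repeated 16 times, broadcast over UDP) and then polls TCP port 22 until sshd answers. If nothing answers within the timeout, the boot has most likely failed, which is the signal to go downstairs or switch to the PXE rescue system. The function names, the UDP port 9, the broadcast address and the timeouts are illustrative assumptions, not part of any existing tool.

```python
import socket
import time


def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    digits = mac.replace(":", "").replace("-", "")
    if len(digits) != 12:
        raise ValueError(f"invalid MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(digits)  # also raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (port 9 is an assumed, common choice)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))


def wait_for_port(host: str, port: int = 22, timeout: float = 180.0,
                  interval: float = 5.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, unreachable, DNS failure, connect timeout
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

For example, send_wol("00:11:22:33:44:55") followed by wait_for_port("mythbox", 22, timeout=180) would wake the server and report whether sshd came up within three minutes; the MAC address and hostname are placeholders. This only tells you that the boot failed, not why; for that you still need some kind of console.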

I hope someone has an idea that could help here.

Thank you.
_________________
Check out this RPG: http://www.daysofdawn.com/. You can preorder it on their page. Linux needs some good RPGs and this one looks very promising!
frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2448
Location: Germany

Posted: Sun Mar 04, 2012 2:33 pm    Post subject: Re: How to easily recover boot failures of a headless server

ghutzl wrote:
2. Connect a serial console. I could also go down with a laptop and connect it to a serial connector as a serial console. I am not sure whether that is possible over USB, as this machine does not have a good old RS232 port. I searched the web a bit and have not found any information about USB serial consoles yet.


That's what I use with my Intel NAS. The case it came in didn't have an RS232 port, but there was an RS232 header on the board, so I added the port myself (with a $2 header-to-port cable on a slot bracket). Of course I only need it when the network fails or the system fails to bring up SSH (regular system) or telnet (initramfs failure). So maybe check whether there's really no way to add RS232 to it.
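For the laptop end of such a serial console the usual tools are screen or minicom, but the capture can also be sketched with nothing but Python's standard library (termios). This assumes the server's kernel is booted with console=ttyS0,115200n8 on its command line, that the laptop sees its USB-to-RS232 adapter as /dev/ttyUSB0, and that a null-modem cable connects the two; all three are assumptions about the setup, not details from this thread. The sketch puts the port into raw 8N1 mode at 115200 baud and appends everything it receives to a log file, so the boot messages survive even if nobody is watching the screen.

```python
import os
import termios
from typing import Optional


def open_serial(path: str, baud: int = termios.B115200) -> int:
    """Open a serial device for reading in raw 8N1 mode and return its fd."""
    # O_NOCTTY: don't make the port our controlling terminal.
    # O_NONBLOCK: don't hang in open() waiting for a carrier-detect signal.
    fd = os.open(path, os.O_RDONLY | os.O_NOCTTY | os.O_NONBLOCK)
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = termios.tcgetattr(fd)
    # Raw input: no break/parity marking, no CR/LF translation, no XON/XOFF.
    iflag &= ~(termios.IGNBRK | termios.BRKINT | termios.PARMRK | termios.ISTRIP
               | termios.INLCR | termios.IGNCR | termios.ICRNL | termios.IXON)
    oflag &= ~termios.OPOST
    # No echo, no line buffering, no signal characters.
    lflag &= ~(termios.ECHO | termios.ECHONL | termios.ICANON | termios.ISIG
               | termios.IEXTEN)
    # 8 data bits, no parity, 1 stop bit; enable receiver, ignore modem lines.
    cflag &= ~(termios.CSIZE | termios.PARENB | termios.CSTOPB)
    cflag |= termios.CS8 | termios.CREAD | termios.CLOCAL
    cc[termios.VMIN] = 1   # read() returns as soon as one byte is available
    cc[termios.VTIME] = 0  # no inter-byte timeout
    termios.tcsetattr(fd, termios.TCSANOW,
                      [iflag, oflag, cflag, lflag, baud, baud, cc])
    os.set_blocking(fd, True)  # from now on, read() waits for data
    return fd


def capture(fd: int, log_path: str, max_bytes: Optional[int] = None) -> int:
    """Append everything received on fd to log_path; return the byte count."""
    written = 0
    with open(log_path, "ab") as log:
        while max_bytes is None or written < max_bytes:
            chunk = os.read(fd, 4096)
            if not chunk:  # end of file: nothing more to read from the port
                break
            log.write(chunk)
            log.flush()  # keep the log file current while capturing
            written += len(chunk)
    return written
```

Something like capture(open_serial("/dev/ttyUSB0"), "server-console.log") would then run until interrupted with Ctrl-C. Opening with O_NONBLOCK and switching back to blocking mode afterwards is a common idiom to avoid hanging in open() on ports that wait for a carrier signal.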

Otherwise, if it has VGA or similar, buy an old used TFT monitor and a cheap keyboard; you can probably get something around 17" for little money if it's a crappy monitor. As long as it still works, it should be okay for debugging simple issues.

With PXE boot you could provide a rescue system (if the board can try PXE first and then fall back to USB/HDD boot when PXE does not offer anything).

If it's reasonably modern hardware, you could even boot the actual system inside the rescue system using a virtualization method like KVM or similar. This would give you a remote console/desktop as well. It's still not the same as running natively on the hardware, though.
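The KVM approach could look roughly like this from inside the PXE-booted rescue system. The sketch below only assembles a QEMU command line; /dev/sda as the server's disk, 1024 MiB of guest RAM and VNC display :0 (TCP port 5900) are placeholder assumptions, and it presumes the CPU supports hardware virtualization and the kvm module is loaded. The -snapshot option makes QEMU send all guest writes to temporary files instead of the real disk, so the experiment cannot damage anything; it also means an fsck run inside the VM is thrown away, so actual repairs are better done directly from the rescue system. The rescue system must not have the disk's filesystems mounted while the VM is using it.

```python
from typing import List


def qemu_rescue_command(disk: str = "/dev/sda", memory_mb: int = 1024,
                        vnc_display: int = 0) -> List[str]:
    """Build a QEMU/KVM command line that boots the server's own disk in a VM.

    All defaults are placeholders; adjust them to the actual machine.
    """
    return [
        "qemu-system-x86_64",
        "-enable-kvm",                        # hardware virtualization via /dev/kvm
        "-m", str(memory_mb),                 # guest RAM in MiB
        "-drive", f"file={disk},format=raw",  # the real disk, used as a raw image
        "-snapshot",                          # guest writes go to temp files, not the disk
        "-vnc", f":{vnc_display}",            # VNC display N listens on TCP port 5900 + N
    ]
```

From the rescue system one would then run something like subprocess.run(qemu_rescue_command(), check=True) and point a VNC client at the server on port 5900 to watch GRUB and the kernel messages. The guest sees QEMU's emulated hardware (for example an emulated IDE controller) rather than the real board, so the kernel needs drivers for that, and hardware-specific failures may not reproduce.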
ghutzl
Tux's lil' helper


Joined: 29 May 2005
Posts: 117
Location: Germany

Posted: Tue Mar 06, 2012 9:01 am

As far as I know my board does not have an RS232 port, not even as an option like the one you describe.

I also had the idea of enabling PXE boot by default and only enabling the PXE server when I need a rescue system. But then the boot process takes longer, because it has to wait for the PXE timeout before it continues with the normal boot. So that is certainly an option, but not my preferred one.

Your idea of booting the system inside a virtual machine is great. That is certainly the way to go if I ever have to debug boot problems on this system. Thank you for that!
All times are GMT