Ubuntu on EC2 boots into grub rescue. No way to access it. Here is how I fixed it.
I recently upgraded one of my machines from Ubuntu 18.04 Bionic to 20.04 Focal.
And of course, while the upgrade went mostly smoothly, something broke. The upgrade kept asking whether I wanted to keep my current grub conf file, and I said yes (many times).
Which was wrong.
The machine eventually rebooted and never came back. SSH was dead, and none of the connection methods offered by AWS worked, not even the serial console.
But I was able to get an instance screenshot, and it showed the machine had booted into the GRUB rescue shell. There is no way to access that remotely without modifying the grub conf file beforehand.
If you want to set up grub to be accessible via the serial console, you can follow these steps: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/grub.html
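For reference, that setup boils down to a few lines in /etc/default/grub plus regenerating the config. This is only a rough sketch with illustrative values; check the linked guide for the exact settings AWS recommends:

# /etc/default/grub -- illustrative values, adjust to your setup
GRUB_TERMINAL="console serial"
GRUB_SERIAL_COMMAND="serial --speed=115200"
GRUB_TIMEOUT=5

# regenerate the grub configuration afterwards:
update-grub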
But how can we edit that file when the machine doesn't even boot?
The idea: a rescue system
I created a quick rescue system using Ubuntu on a t3.micro instance and mounted my broken system's volume to /mnt/rescue. The idea was to look around and see what could be fixed.
And with chroot, it is even possible to re-install grub.
Before starting, I had to detach the volume of the bricked instance and attach it to the newly created rescue system. Then, as root on the rescue system, do this:
# see all available block devices:
lsblk
# For me, it showed nvme1n1 with a partition nvme1n1p1.
# Make sure you choose the correct partition!

# we will need these 2 variables:
rescuedev=/dev/nvme1n1p1
rescuemnt=/mnt/rescue/

mkdir /mnt/rescue/
mount $rescuedev $rescuemnt

# mount all the special file systems:
for i in proc sys dev run; do mount --bind /$i $rescuemnt/$i ; done

# now jump inside:
chroot $rescuemnt
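As a side note, the detach/attach step itself can also be scripted with the AWS CLI instead of clicking through the console. A minimal sketch; the volume ID, instance ID, and device name below are placeholders you would replace with your own, and the broken instance must be stopped first:

# detach the root volume from the stopped, broken instance (IDs are placeholders)
aws ec2 detach-volume --volume-id vol-0123456789abcdef0

# attach it to the rescue instance as a secondary device
aws ec2 attach-volume \
    --volume-id vol-0123456789abcdef0 \
    --instance-id i-0fedcba9876543210 \
    --device /dev/sdf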
Verify we are inside the chroot
Jumping into a chroot is unspectacular. The prompt doesn't change, and one AWS Ubuntu machine looks much like another. So I wanted a way to verify that I was inside the chroot before doing anything potentially dangerous with grub.
#!/bin/bash
if [ "$(stat -c %d:%i /)" != "$(stat -c %d:%i /proc/1/root/.)" ]; then
    echo "We are chrooted!"
else
    echo "Business as usual"
fi
I found the script snippet here: https://unix.stackexchange.com/questions/14345/how-do-i-tell-im-running-in-a-chroot
The solution
Now we are virtually inside the broken machine. I changed my grub conf file as explained in the link above, but it turned out that wasn't even necessary. The one step that did the trick was:
grub-install /dev/nvme1n1
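Before moving the volume back, it is worth leaving the chroot and unmounting everything cleanly. A quick sketch, reusing the variables set earlier on the rescue system:

# leave the chroot
exit

# back on the rescue system: unmount the bind mounts in reverse order, then the volume
for i in run dev sys proc; do umount $rescuemnt/$i ; done
umount $rescuemnt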
After I re-attached the volume to the bricked machine, it booted normally and I was able to SSH into it.