WARNING: There are steps, references, commands and/or instructions in this article that can be very dangerous to your filesystem and possibly cause irreversible damage to your data. You are responsible for maintaining your data – use these commands only after you fully understand the implications of what they do and you’re comfortable and competent. Any damage resulting from using any of these commands or guidance of this article is fully your responsibility – I can’t be responsible for your data. By continuing, you agree to release the author, the hosting service for this site, anybody posting comments to this site, your dog, etc. free from liability should any damage occur, whether accidental or intentional.
I use Ubuntu quite a bit and have recently started using LVM on an internal FTP/TFTP/SCP “drop box” of sorts. The reason for LVM is that I needed to dynamically “grow” the amount of disk space available for the server (it was full). Being a VM, it’s easy to carve out another virtual disk and assign it to the VM. LVM takes over from there…
This is all fine and dandy if your storage is stable. However, we ran into an issue where the storage was oversubscribed and the SAN filled up completely. This particular server was running on a NetApp SAN, and the SAN automatically took the volumes offline. After freeing up some space for the remaining volumes, I brought the volumes back online. Either this process itself or my shutting down the VM (I hadn’t noticed that the storage was gone at the point I shut the VM down) resulted in a corrupted filesystem.
Upon trying to boot the server back up, I was presented with a BusyBox screen and several error messages indicating that the filesystem was messed up. I received a message like “Target filesystem doesn’t have /sbin/init” on the screen (with the BusyBox prompt).
Here’s what I did to fix the filesystem and restore the system to full functionality, and a lesson I learned.
To restore the system:
- Download the SystemRescueCd ISO (www.sysresccd.org)
- Create a new virtual disk (a place to copy the data I couldn’t live without) and assign it to the VM
- Mount the ISO in the VM on startup and boot off of the CD
- Set up the networking (assign an IP address, default gateway, etc.)
- SSH into the server
- Check the logical volume for filesystem errors
  - e2fsck -n /dev/mapper/<LVM name>
  - If it reports errors, you’ll probably want to go on and fix them (if possible)
- Mount the old logical volume
  - mkdir /mnt/t (t for temp, or whatever directory name you desire)
  - mount /dev/mapper/<LVM name> /mnt/t
- Create a partition on the new virtual disk and format it
  - There are plenty of resources on this; search for the filesystem you want to use (ext2, ext3, ext4, etc.)
- Mount the virtual disk
  - mkdir /mnt/n (n for new; again, whatever you want)
  - mount /dev/sdc1 /mnt/n (substitute the device name of your new disk)
- Copy the data that I couldn’t live without from the old filesystem to the new one (so from /mnt/t to /mnt/n)
- Unmount the old logical volume and fix its errors
  - umount /dev/mapper/<LVM name>
  - e2fsck -v /dev/mapper/<LVM name>
  - I know there are ways to have e2fsck fix errors automatically, but I wanted to see and approve each fix, so I took the somewhat slow path
- Shut down the VM, then remove the SystemRescueCd ISO and the new virtual disk
- Try booting the VM
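For reference, here is the command sequence from the list above gathered into a small shell sketch. It is not a transcript of what I typed. The function names (run, recover_plan) are made up for illustration, and three things are my own additions rather than part of the steps above: the vgchange -ay call (in case the rescue environment has not activated the volume group), mounting the damaged filesystem read-only, and the blanket cp -a (in practice I only copied what mattered). By default it only prints the commands; nothing runs unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch of the recovery sequence, meant for the rescue environment (as root).
# The two arguments are placeholders you must supply; nothing is hard-coded.

# Print each command by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_plan() {
    lv="$1"        # the logical volume, e.g. /dev/mapper/<LVM name>
    new_part="$2"  # partition on the new disk, already formatted

    run vgchange -ay               # assumption: activate volume groups if needed
    run e2fsck -n "$lv"            # read-only check; answers "no" to every fix
    run mkdir -p /mnt/t /mnt/n
    run mount -o ro "$lv" /mnt/t   # assumption: read-only mount while copying
    run mount "$new_part" /mnt/n
    run cp -a /mnt/t/. /mnt/n/     # assumption: copy everything; narrow as needed
    run umount /mnt/t
    run e2fsck -v "$lv"            # interactive repair; prompts before each fix
    run umount /mnt/n
}
```

Calling recover_plan /dev/mapper/vg0-root /dev/sdc1 (both names are hypothetical) prints the nine commands, each prefixed with "+ ", so you can read through the plan before committing to anything.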
After following these steps (as best I can remember them), the system was working again. On this system, I had a single logical volume that was pretty much an “everything” partition: root, FTP/TFTP/SCP storage, etc. This leads to my lessons learned:
- On file servers, particularly VMs where it’s so easy to carve out additional virtual disks, keep your root filesystem (boot loader, kernel, etc.) on one virtual disk that’s only used for base OS functionality. Create one or more additional virtual disks for file storage, using LVM if necessary. This lets you get at the data easily by attaching the virtual disk to another “clean” Ubuntu install, bypassing the need to boot off of the SystemRescueCd ISO. In my case, everything was combined. Since it was on a SAN, I had assumed there was very little risk of filesystem corruption. However, oversubscription can cause problems.
- Use oversubscription on your storage sparingly and in a planned fashion. This bit me, and I’ve heard other IT professionals say “Oh, don’t worry about it – it won’t use it”. While it might seem unlikely that a system will consume all of its allocated space, it is possible and should be guarded against. What happens if it does consume all of the space? Logs, updates, etc. can all contribute to filesystem growth. Ensure that your core, mission-critical systems are NOT using oversubscribed storage.
- When working with LVM, don’t point your troubleshooting tools at the physical disks (/dev/sdb) and partitions (/dev/sdb1); it’ll get you nowhere. Using the pvdisplay, lvdisplay, etc. commands (for examining your LVM setup), you’ll be able to see the logical volume name, as well as the partitions that make it up. Focus on the logical volume, not the partition (at least in my case). This isn’t to say that physical issues never occur (SAN stats or SMART errors on a local disk should help here).
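To make the last point a bit more concrete, here is a small sketch for finding the name to hand to e2fsck and mount. The function names (mapper_path, lvm_overview) are hypothetical helpers of mine. The one fact it relies on is that device-mapper names a logical volume /dev/mapper/<vg>-<lv> and doubles any hyphen that appears inside the volume group or logical volume name. The overview function uses pvs, vgs and lvs, the terser cousins of pvdisplay and lvdisplay, and needs root plus the lvm2 tools.

```shell
#!/bin/sh
# Build the /dev/mapper path for a logical volume from its VG and LV names.
# device-mapper joins the two with "-" and doubles any "-" inside either name.
mapper_path() {
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '/dev/mapper/%s-%s\n' "$vg" "$lv"
}

# Read-only overview of the LVM layout (run as root; changes nothing).
lvm_overview() {
    pvs    # physical volumes and the volume group each one belongs to
    vgs    # volume groups with their total and free space
    lvs -o lv_name,vg_name,lv_size,devices   # logical volumes and backing devices
}
```

For example, mapper_path ubuntu-vg root prints /dev/mapper/ubuntu--vg-root (the names are only examples). That full path, not /dev/sdb1, is what the filesystem tools should be pointed at.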
So, your mileage may vary, but this is what I did to fix the LVM filesystem (and restore functionality of my system).
What do you consider best practices for Linux VM creation as well as general filesystem tasks? Do you have a tip or trick different from the ones I’ve mentioned above?
Until next time…