How to fix a broken machine

   1 [14:01] <cjohnston> up next is popey...
   2 [14:01] <cjohnston> popey needs a couple minutes, so we will start shortly
   4 [14:06] <popey> Hello
   5 [14:07] <popey> == Introduction ==
   6 [14:07] <popey> Hello, I am Alan Pope. I'm a member of my LoCo Team in the UK where we make a little podcast <http://podcast.ubuntu-uk.org/>. I'm also on the Ubuntu Community Council, LoCo Council and EMEA Membership Board.
   7 [14:07] <popey> I try to give people support in #ubuntu and #ubuntu-uk and on http://askubuntu.com/ when I get time.
   8 [14:08] <popey> This session is called "How to fix a broken machine".
   9 [14:08] <popey> The target audience for this session is 'People who may at some point experience a broken Ubuntu machine' - which, let's face it, is potentially everyone :)
  10 [14:08] <popey> The goal of the session is to empower people to fix their own computer when something goes wrong.
  11 [14:08] <popey> What do I mean by "goes wrong" and "broken"?
  12 [14:09] <popey> The types of issues I am thinking of are the 'potential show-stoppers' which could be described using phrases like:-
  13 [14:09] <popey> "When I turn my computer on I get..."
  14 [14:09] <popey> "... no desktop or logon screen!"
  15 [14:09] <popey> "... a black screen!"
  16 [14:09] <popey> "... some crazy text I don't understand!"
  17 [14:09] <popey> Has this ever happened to you?
  18 [14:09] <popey> (It has to me, on more occasions than I can remember!)
  19 [14:10] <popey> == What I will cover ==
  20 [14:10] <popey> * An overview of the boot process
  21 [14:10] <popey> * What goes wrong
  22 [14:10] <popey> * Toolbox contents
  23 [14:10] <popey> * Diagnosing issues
  24 [14:10] <popey> * Solving some problems that occur
  25 [14:10] <popey> I'm happy to take questions any time :D
  26 [14:10] <popey> == What I can't cover ==
  27 [14:10] <popey> * Installations of Ubuntu using Wubi (Windows-based Ubuntu Installer)
  28 [14:10] <popey>  - Whilst many bits are similar between a 'bare metal' install of Ubuntu and one done inside Windows, I've not had enough experience of it to speak about it confidently
  29 [14:11] <popey> == WARNING! ==
  30 [14:11] <popey> Despite this being a 'user day' I will be using the command line!
  31 [14:11] <popey> The reason for this is that when there are show-stopping bugs which prevent you getting a GUI up and running, the command line is the best way to fix the issue.
  32 [14:11] <popey> Note: I do not consider 'reinstall Ubuntu' as a fix, it's a work-around.
  33 [14:11] <popey> Ok, so let's start..
  34 [14:11] <popey> = An overview of the boot process =
  35 [14:11] <popey> (simplified)
  36 [14:11] <popey> (massively)
  37 [14:12] <popey> So what actually happens on a standard installation of Ubuntu on a pretty normal Intel/AMD based PC?
  38 [14:12] <popey> (the reason I say Intel/AMD based PC is because things are a little different on ARM, Phones, Macs etc)
  39 [14:12] <popey> Right after you power on, the BIOS (firmware on a chip on the motherboard) looks for hardware and runs some hardware checks called POST (Power On Self Test)
  40 [14:12] <popey> If those pass (and your BIOS is configured correctly) then the BIOS will then go looking for a device to boot from.
  41 [14:13] <popey> <http://www.howstuffworks.com/pc3.htm> describes that process in a little more detail.
  42 [14:13] <popey>  
  43 [14:13] <popey> Under normal circumstances you can boot from:-
  44 [14:13] <popey> - An internal hard disk (be that of a spinning variety or a new fangled SSD (Solid State Drive))
  45 [14:13] <popey> - An external device attached via USB
  46 [14:13] <popey> - Some kind of optical device (like a CD or DVD)
  47 [14:13] <popey> - A network server - often called PXE (Pre-boot eXecution Environment)
  48 [14:14] <popey> - Some other magic from the future that hasn't been invented yet.
  49 [14:14] <popey>  
  50 [14:14] <popey> Let's assume that it's booting from a local hard disk which happens to contain Ubuntu.
  51 [14:14] <popey>  
  52 [14:14] <popey> The BIOS will look for an MBR (Master Boot Record) which is located at the start of the disk. We call this 'Stage 1'
  53 [14:14] <popey> The MBR is pretty small and contains enough program code to find, load and execute the next part of the boot process.
  54 [14:14] <popey> The program used on Ubuntu is called GRUB (Grand Unified Bootloader), and we call this 'Stage 2'.
  55 [14:15] <popey> Optionally at this point we may get a menu displayed by GRUB. If no menu appears then it can be triggered by pressing and holding down the [SHIFT] key as GRUB loads.
  56 [14:15] <popey> (the menu is configured based on files in /boot/grub/ - specifically /boot/grub/grub.cfg)
  57 [14:15] <popey> (older versions of GRUB in previous releases of Ubuntu used a different file)
  58 [14:15] <popey> The menu contains a list of operating systems to boot from. If one is selected from the menu then it will continue the boot process..
  59 [14:15] <popey> The next step is to load the operating system selected in GRUB, this is usually a two-stage operation.
  60 [14:16] <popey> (well, technically it's a many many stage operation, but I'm simplifying remember) :)
  61 [14:16] <popey> First a small temporary filesystem is loaded which is called an 'Initial RAMdisk' (although you may see it called 'initrd' or 'initramfs').
  62 [14:16] <popey> Secondly the actual Linux Kernel which will be used by your desktop/laptop is loaded. The Kernel has drivers for many things, and will go through its initialisation process loading whatever drivers are necessary.
  63 [14:16] <popey> Some are built into the Kernel image when it's compiled by the developers.
  64 [14:16] <popey> Some are 'modules' which the Kernel can load in and run after it's started running.
  65 [14:17] <popey> (the kernel usually figures out what modules to load, but sometimes it needs a little help)
  66 [14:17] <popey> Once the Kernel is up and running it hands over to a program called 'init' (short for Initialisation). On Ubuntu 'init' is provided by a package called 'Upstart'.
  67 [14:17] <popey> init/upstart then takes care of loading all the subsequent programs required to get the desktop up and running.
  68 [14:17] <popey> (which you might see listed in places like /etc/rc2.d/ )
  69 [14:18] <popey> At some point during that last process GDM (GNOME Display Manager) will load. This will present the logon screen (or not if you chose to skip the logon screen).
  71 [14:18] <popey> (KDE and XFCE use other display managers like KDM)
  72 [14:18] <popey> Once you have logged in your desktop starts.
  73 [14:18] <popey>  
  74 [14:18] <popey> So that's BIOS -> MBR -> Stage 1 -> Stage 2 (GRUB) -> Initial RAMdisk -> Kernel -> init (Upstart) -> GDM -> Desktop
  75 [14:18] <popey> (roughly speaking)
  76 [14:19] <popey> As I said, massively simplified. If you'd like to know more about the boot process I'd recommend reading this page (which is quite technical, but interesting):-
  77 [14:19] <popey> http://www.ibm.com/developerworks/linux/library/l-linuxboot/
  78 [14:19] <popey>  
  79 [14:19] <popey> You can find out more about GRUB at:-
  80 [14:19] <popey> http://en.wikipedia.org/wiki/GNU_GRUB
  81 [14:19] <popey>  
  82 [14:19] <popey> You can find out more about Upstart at:-
  83 [14:19] <popey> http://upstart.ubuntu.com/
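The kernel and Initial RAMdisk stages described above correspond to files under /boot. As a minimal sketch, assuming Ubuntu's usual naming of vmlinuz-VERSION and initrd.img-VERSION (the function name check_boot_files is made up for illustration), this reports whether each kernel has a matching initrd next to it:

```shell
# For each kernel image in a boot directory, report whether a matching
# initial RAM disk exists alongside it.
# Assumption: Ubuntu's usual naming, vmlinuz-VERSION and initrd.img-VERSION.
# check_boot_files is a hypothetical helper name, not a standard tool.
check_boot_files() {
    bootdir="${1:-/boot}"
    for kernel in "$bootdir"/vmlinuz-*; do
        [ -e "$kernel" ] || continue        # the glob matched nothing
        version="${kernel##*/vmlinuz-}"     # strip the path and "vmlinuz-" prefix
        if [ -e "$bootdir/initrd.img-$version" ]; then
            echo "$version: kernel and initrd present"
        else
            echo "$version: initrd missing"
        fi
    done
}
```

Run it as `check_boot_files` on a working system, or pass a different directory to inspect the /boot of a disk mounted somewhere else.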
  84 [14:19] <popey>  
  85 [14:19] <popey> Any questions about that boot process?
  86 [14:20] <popey>  
  87 [14:20] <popey> = What goes wrong? =
  88 [14:20] <popey> So now you know that the boot process consists of a bunch of parts linked together, let's see where it can all fall apart!
  89 [14:20] <popey>  
  90 [14:20] <popey> == Hardware Failure ==
  91 [14:20] <popey> Commonly forgotten about when diagnosing issues with Ubuntu is that it's possible for hardware to fail. Some ways in which hardware can fail and what the result could be:-
  92 [14:20] <popey> * Failing RAM
  93 [14:20] <popey>  - Unfortunately this can cause all kinds of issues which are often hard to pinpoint
  94 [14:21] <popey>  - Machine may not boot at all, it may boot to the BIOS and no further, or it may get part way through booting and then lock-up, shutdown or reboot.
  95 [14:21] <popey>  
  96 [14:21] <popey> * Failing Disk
  97 [14:21] <popey>  - When a disk is failing it can also look like files are corrupted, so can look like a piece of software has gone wrong
  98 [14:21] <popey>  - Machine may not boot at all beyond the BIOS, or it may get part way through and (as with RAM issues) fail in sporadic or spectacular ways
  99 [14:21] <popey>  
 100 [14:21] <popey> * Failing CPU
 101 [14:21] <popey>  - In my experience least likely, but can happen. More likely is failing to cool the CPU properly, so this could mean a failed heatsink/fan or broken down thermal paste between the CPU and heatsink.
 102 [14:22] <popey>  - Usually this results in a failure to boot at all, beeps during the POST or random crashes
 103 [14:22] <popey>  
 104 [14:22] <popey> * Failing GPU (Video Card)
 105 [14:22] <popey>  - If completely failed then this can cause a machine to not boot, but beep in a cryptic but documented way when the video card is tested during boot up.
 106 [14:22] <popey>  - This often manifests itself as a blank screen or more commonly as colourful screen corruption, which may last all the way from the BIOS screen through to GDM.
 107 [14:22] <popey>  
 108 [14:22] <popey> Ok, so that's a few ways in which hardware can fail, let's move on to see other ways we can fail.
 109 [14:23] <popey>  
 110 [14:23] <popey> == BIOS ==
 111 [14:23] <popey> * Can't get past the BIOS boot screen
 112 [14:23] <popey> * BIOS boots off wrong device
 113 [14:23] <popey>  
 114 [14:23] <popey> == GRUB ==
 115 [14:23] <popey> * Can't boot past stage 1 or 2: you get a blank screen, or perhaps the GRUB menu, but can't get past it.
 116 [14:23] <popey>  
 117 [14:24] <popey> == initramdisk ==
 118 [14:24] <popey> * Get dropped to an (initramfs) prompt
 119 [14:24] <popey>  
 120 [14:24] <popey> All of which makes a user into a sad puppy :(
 121 [14:24] <popey>  
 122 [14:24] <popey> == Kernel ==
 123 [14:24] <popey> * Kernel panic on boot
 124 [14:24] <popey> * Kernel cannot find root file system
 125 [14:24] <popey> * Graphics driver not loading
 126 [14:25] <popey>  
 127 [14:25] <popey> ..and so on. So there's a large and diverse number of ways in which the system can fail to boot. Let's move on to look at some tools we can use to diagnose these issues.
 128 [14:25] <popey>  
 129 [14:25] <popey> == Toolbox Contents ==
 130 [14:26] <popey> Here's some things that are useful to have when diagnosing issues!
 131 [14:26] <popey> * Ubuntu Live CD
 132 [14:26] <popey>  - Essential because it contains the following great diagnostic utilities
 133 [14:26] <popey>   - Memtest, GRUB, GParted, a web browser in which you can go and find answers to all your questions :D -> see http://askubuntu.com/
 134 [14:26] <popey>  
 135 [14:26] <popey> * Ubuntu Live USB Key
 136 [14:26] <popey>  - Create using USB creator, UNetbootin, or by installing onto USB from CDROM (or other USB stick)
 137 [14:26] <popey>  - Useful for when you have a netbook or other machine which has no optical drive
 138 [14:27] <popey>  - You can install from CD onto USB stick which means you can boot from it and add additional tools / utilities to the stick.
 139 [14:27] <popey> I personally carry round a 32GB usb stick which has Ubuntu installed on it, and from there can diagnose issues with a hard disk based install. Very handy!
 140 [14:27] <popey> * Ethernet Cable
 141 [14:27] <popey>  - For when wireless messes up :(
 142 [14:27] <popey> * USB Hard disk
 143 [14:27] <popey>  - Useful for backing up the system
 144 [14:28] <popey> * A backup of your existing system! <- most essential!
 145 [14:28] <popey> * A backup of any passwords / encryption keys which may have been used to install the system
 146 [14:28] <popey>  
 147 [14:29] <popey> == Diagnosing Issues ==
 148 [14:29] <popey> So given we now know the steps of the boot process we can identify potentially where issues can occur. First off I'd start by asking some questions:-
 149 [14:29] <popey> * What has changed since the system worked?
 150 [14:29] <popey> often the answer from the user is "nothing" and "I never touched it"
 151 [14:30] <popey> (until you shine a light in their face and threaten them)
 152 [14:30] <popey> So this could include:-
 153 [14:30] <popey>  - Hardware added / taken away / moved ?
 154 [14:31] <popey> e.g. adding a new USB device on my desktop caused it not to boot!
 155 [14:31] <popey> (the USB device was an Android phone) :(
 156 [14:31] <popey>  - Software updates, drivers added, new packages installed, packages removed?
 157 [14:31] <popey> e.g. installing the latest and greatest applications you saw on omgubuntu! :D
 158 [14:32] <popey>  
 159 [14:32] <popey> * What has been done differently today than on other days?
 160 [14:32] <popey> Maybe today the user didn't plug in their external screen, but usually they do..
 161 [14:33] <popey> Any little change can potentially cause an issue.
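For the "software updates, new packages installed, packages removed" question, dpkg keeps a record in /var/log/dpkg.log. A rough sketch for pulling recent changes out of it, assuming the usual "DATE TIME ACTION PACKAGE ..." line layout (the function name recent_package_changes is made up for illustration):

```shell
# List packages installed, upgraded or removed according to a dpkg log.
# Assumption: the log uses the usual dpkg.log layout:
#   DATE TIME ACTION PACKAGE OLD-VERSION NEW-VERSION
# recent_package_changes is a hypothetical helper name.
recent_package_changes() {
    log="${1:-/var/log/dpkg.log}"
    # Keep only lines whose third field is install, upgrade or remove,
    # then print the date, time, action and package name.
    awk '$3 == "install" || $3 == "upgrade" || $3 == "remove" { print $1, $2, $3, $4 }' "$log"
}
```

On a system that still boots, `recent_package_changes | tail -n 20` shows the latest changes. On a system that doesn't, the same function can be pointed at the copy of the log on the broken disk once it is mounted from a Live CD/USB.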
 162 [14:33] <popey>  
 163 [14:33] <popey> The final question:-
 164 [14:33] <popey> * Do I have a backup of all my data and passwords/keys?
 165 [14:33] <popey> Because it's entirely possible that the system is irreversibly broken, and if that's the case then new hardware or a reinstall may be the _only_ option.
 166 [14:33] <popey> But I'd like to hope we can fix most things without a reinstall.
 167 [14:33] <popey>  
 168 [14:34] <popey> == Solving Problems That Occur ==
 169 [14:34] <popey>  
 170 [14:34] <popey> Ok, so it's impossible in the next 30 minutes for me to go through every single possible scenario of what might go wrong :D
 171 [14:34] <popey> But I can pick out a couple that are quite common
 172 [14:35] <popey> And I can explain how to get your environment setup so you can diagnose these issues
 173 [14:35] <popey>  
 174 [14:35] <popey> The main thing that's useful to do when you have a completely unbootable system, is to boot to another install and inspect the contents of the broken system
 175 [14:35] <popey> That can be done in at least two ways:-
 176 [14:35] <popey>  
 177 [14:36] <popey> 1. Pull the disk out of the broken machine and put it in another machine as a slave disk
 178 [14:36] <popey> (this is actually impossible in some systems like the Asus Eee 900 where the SSD 'disk' is soldered onto the motherboard)
 179 [14:37] <popey> (this is also unpleasant to do on many laptops because you have a bazillion screws to undo and little ribbon cables that tear easily)
 180 [14:37] <popey> So that's why I often advocate option 2..
 181 [14:37] <popey> 2. Boot from an Ubuntu Live CD / USB stick.
 182 [14:37] <popey> (hence me saying you should always carry one) :D
 183 [14:37] <popey>  
 184 [14:38] <popey> Now, booting from a Live CD/USB stick is great, and can let you inspect the contents of the hard disk of the broken machine
 185 [14:38] <popey> But you often need to do more than look.
 186 [14:38] <popey>  
 187 [14:38] <popey> In the same way that a Doctor might start by giving a patient a non-invasive X-Ray or MRI scan
 188 [14:39] <popey> Sometimes you need to slice people open in order to figure out what's wrong
 189 [14:39] <popey> Note: I do not advocate slicing people open randomly :)
 190 [14:39] <popey>  
 191 [14:39] <popey> So the diagnostic tool I'm going to talk about is using a Live CD/USB with a program called "chroot"
 192 [14:40] <popey> The main use for chroot here is to attach to / login to, an existing install of Ubuntu and then look, and optionally make changes where needed.
 193 [14:40] <popey>  
 194 [14:40] <popey> This is a slightly complex process which involves a lot of commands on the command line:-
 195 [14:40] <popey> Here's what I do:-
 196 [14:41] <popey> 1. Boot from a Live CD/USB
 197 [14:41] <popey> 2. Use "sudo fdisk -l" to identify which device my local hard disk is
 198 [14:42] <popey> http://paste.ubuntu.com/559909/ <- that is the result of me running that command on my PC
 199 [14:42] <popey> As you can see I have 3 disks on this computer.
 200 [14:42] <popey>  /dev/sda is a 500GB disk
 201 [14:42] <popey>  /dev/sdb is a 250GB disk
 202 [14:42] <popey>  /dev/sdg is a 32GB USB stick
 203 [14:43] <popey> Now looking closer at /dev/sda we can see multiple partitions:-
 204 [14:43] <popey> /dev/sda1   *           1       24832   199463008+   7  HPFS/NTFS
 205 [14:43] <popey> ^^ that one contains Windows XP
 206 [14:43] <popey> So I am not interested in that for this session :)
 207 [14:43] <popey> /dev/sda2           32251       60802   229338113    5  Extended
 208 [14:43] <popey> ^^ this is an extended partition, it contains the next three partitions.
 209 [14:44] <popey> (note: there is a limit of 4 primary partitions on disks, so we tend to have lots of extra partitions under an Extended one, so we can have more than 4)
 210 [14:44] <popey> /dev/sda5           32251       59639   220001280   83  Linux
 211 [14:45] <popey> that's a biggie!
 212 [14:45] <popey> That's my root partition, which contains everything on my pc.
 213 [14:45] <popey> Some people separate / and /home as separate partitions, I haven't on this PC.
 214 [14:45] <popey> /dev/sda6           59640       60802     9335808   82  Linux swap / Solaris
 215 [14:45] <popey> that's my swap space.
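Picking the Linux partitions out of that listing by eye can be sketched as a small filter. This assumes the older fdisk column layout pasted above (Device, Boot, Start, End, Blocks, Id, System); newer fdisk versions print different columns, so treat it as an illustration only. The function name linux_partitions is made up.

```shell
# Print the device names of partitions whose Id column is 83 (Linux),
# reading "fdisk -l" output on standard input.
# Assumption: the older column layout shown in this session:
#   Device Boot Start End Blocks Id System
# linux_partitions is a hypothetical helper name.
linux_partitions() {
    awk '/^\/dev\// {
        # A "*" in the Boot column shifts every later field along by one.
        id = ($2 == "*") ? $6 : $5
        if (id == "83") print $1
    }'
}
```

Typical use would be `sudo fdisk -l | linux_partitions`. Swap (Id 82), extended (Id 5) and NTFS (Id 7) partitions are skipped.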
 216 [14:46] <popey> Ok, so assuming that /dev/sda5 is my root partition and it's in some way broken.
 217 [14:46] <popey> Here's what we move on to in step 3.
 218 [14:46] <popey> 3. Open a terminal
 219 [14:46] <popey> Now this may be scary for some, but it's really the best way to diagnose these issues
 220 [14:46] <popey> 4. Issue the following commands:-
 221 [14:46] <popey> (assuming /dev/sda5 is indeed the broken root partition)
 222 [14:47] <popey> Now, remember at this point we're booted into a Live USB/CD environment.
 223 [14:47] <popey> We need to make some directories under which we will "mount" the broken installation.
 224 [14:47] <popey> mkdir ~/target
 225 [14:47] <popey> mkdir ~/target/dev
 226 [14:47] <popey> mkdir ~/target/sys
 227 [14:47] <popey> mkdir ~/target/proc
 228 [14:47] <popey> ~ means "my home directory", so all of those will be in the home directory of the Live environment user
 229 [14:47] <popey> (the above are going to be used to 'mount up' the broken install into)
 230 [14:47] <popey> Now we mount up:-
 231 [14:48] <popey> sudo mount /dev/sda5 ~/target
 232 [14:48] <popey> (the above mounts up the hard disk onto the target directory, you can see your files in there)
 233 [14:48] <popey> So you could open nautilus on the Live CD/USB and navigate to the home directory, and under 'target/' you'll see your files (hopefully)
 234 [14:48] <popey> (the next three lines mount up some special folders that may be needed by the diagnostic tools)
 235 [14:48] <popey> sudo mount -o bind /dev ~/target/dev
 236 [14:48] <popey> sudo mount -o bind /proc ~/target/proc
 237 [14:48] <popey> sudo mount -o bind /sys ~/target/sys
 238 [14:49] <popey> (At this point we have the local install mounted up in a folder)
 239 [14:49] <popey> Now we "chroot" into it..
 240 [14:49] <popey> 5. 'chroot' into the install with the following command:-
 241 [14:49] <popey> chroot ~/target
 242 [14:50] <popey> actually, sorry.
 243 [14:50] <popey> sudo chroot ~/target
 244 [14:50] <popey> (at this point we'll get a root prompt inside the 'broken' system and can start poking around looking for problems).
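The mount and chroot steps above can be collected into one function. This is a sketch under stated assumptions: the function names and the RUN variable are invented for illustration, /dev/sda5 is just the example root partition from this session, and it skips creating dev, proc and sys beforehand on the assumption that the installed system's root already contains those directories. The unmount helper is an addition that was not covered in the session.

```shell
# Mount a root partition and chroot into it, following the steps above.
# Set RUN=echo to print the commands instead of running them.
# Assumptions: enter_broken_system, leave_broken_system and RUN are
# hypothetical names; /dev/sda5 is the example partition from this session.
enter_broken_system() {
    part="${1:-/dev/sda5}"
    target="${2:-$HOME/target}"
    run="${RUN:-}"

    $run mkdir -p "$target" || return 1
    $run sudo mount "$part" "$target" || return 1
    # Bind-mount the special directories the tools inside the chroot may need.
    for dir in dev proc sys; do
        $run sudo mount -o bind "/$dir" "$target/$dir" || return 1
    done
    $run sudo chroot "$target"
}

# Undo the mounts in reverse order after exiting the chroot.
leave_broken_system() {
    target="${1:-$HOME/target}"
    run="${RUN:-}"
    for dir in sys proc dev; do
        $run sudo umount "$target/$dir"
    done
    $run sudo umount "$target"
}
```

A dry run such as `RUN=echo enter_broken_system /dev/sda5` prints the commands without touching anything, which is a sensible check before running it for real.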
 245 [14:50] <popey>  
 246 [14:50] <popey> This is the single most useful diagnostics system (in my opinion) for troubleshooting a broken system.
 247 [14:51] <popey> Now we're running low on time, so I'll go through an example of a specific issue that can be fixed once we have the system in this state.
 248 [14:51] <ClassBot> There are 10 minutes remaining in the current session.
 249 [14:51] <popey>  
 250 [14:51] <popey> === Fixing broken GRUB ===
 251 [14:52] <popey> If GRUB has become broken we can at this point check the configuration of GRUB by looking in ~/target/boot/grub/grub.cfg (or just /boot/grub/grub.cfg once chrooted in)
 252 [14:52] <popey> Here is my grub.cfg http://paste.ubuntu.com/559911/
 253 [14:52] <popey> Note at the top:- "DO NOT EDIT THIS FILE"
 254 [14:53] <popey> However it is 'interesting' to look at
 256 [14:54] <popey> So fixing all grub issues is beyond the time we have here, but now we have the system mounted up and chrooted we can perform additional steps which can help the diagnosis (or indeed resolve the issue)
 257 [14:54] <popey> so for example at this point I would head over to the following pages:-
 258 [14:54] <popey> https://help.ubuntu.com/community/GrubHowto
 259 [14:54] <popey> (or if windows had rubbed out my grub install) :-
 260 [14:54] <popey> https://help.ubuntu.com/community/RecoveringUbuntuAfterInstallingWindows
 261 [14:55] <popey>  
 262 [14:55] <popey> The massive stumbling block many people have is a chicken-and-egg problem that goes like this:-
 263 [14:55] <popey> 1. My system won't boot, grub is broken
 264 [14:56] <popey> 2. To fix grub, boot your system and run these commands..
 265 [14:56] <popey> 3. Goto 1.
 266 [14:56] <popey>  
 267 [14:56] <ClassBot> There are 5 minutes remaining in the current session.
 268 [14:56] <popey> By using the chroot method above, we can circumvent the "my system won't boot" and get logged into the system, bypassing the broken grub / kernel / video issue that's preventing bootup in the first place
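As an illustration of what "run these commands" might look like once chrooted in, here is a sketch that reinstalls GRUB to the disk and regenerates grub.cfg using grub-install and update-grub. This was not shown in the session: the choice of /dev/sda as the boot disk, the function name reinstall_grub and the RUN variable are all assumptions, and the help pages above remain the place to check what applies to a given machine.

```shell
# Reinstall GRUB to a disk and regenerate grub.cfg.
# Intended to be run as root inside the chroot. Set RUN=echo for a dry run.
# Assumptions: /dev/sda (the whole disk, not a partition) is the boot disk;
# reinstall_grub and RUN are hypothetical names used for illustration.
reinstall_grub() {
    disk="${1:-/dev/sda}"
    run="${RUN:-}"
    $run grub-install "$disk" || return 1
    # update-grub rewrites /boot/grub/grub.cfg, which is why that file
    # says it should not be edited by hand.
    $run update-grub
}
```

`RUN=echo reinstall_grub /dev/sda` prints the two commands so the target disk can be double-checked before anything is written.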
 269 [14:56] <popey>  
 270 [14:57] <popey> I'll very quickly mention another issue that this can help fix..
 271 [14:57] <popey>  
 272 [14:57] <popey> === Fixing a half-finished upgrade/update ===
 273 [14:57] <popey> If the system powered down or crashed during system update, sometimes this can render the system unusable until the updates are 'finished'. Some commands which can help here:-
 274 [14:57] <popey> (assuming you're already chrooted in)
 275 [14:57] <popey> dpkg --configure -a
 276 [14:57] <popey> (will continue processing outstanding updates)
 277 [14:57] <popey> and..
 278 [14:57] <popey> apt-get dist-upgrade
 279 [14:58] <popey> (which will be useful if some updates need to be installed)
 280 [14:58] <popey> Note: I did not use sudo for either of those commands.
 281 [14:58] <popey> The reason for that is because when you "chroot" you "become root" on the system. This is of course dangerous, so be careful.
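The two commands above can be wrapped with a check that they really are being run as root, which is the case inside the chroot. A minimal sketch; the function name finish_updates and the RUN dry-run variable are invented for illustration:

```shell
# Finish a half-completed update: configure any unpacked packages, then
# install whatever updates are still outstanding.
# Set RUN=echo to print the commands instead of running them.
# finish_updates and RUN are hypothetical names.
finish_updates() {
    run="${RUN:-}"
    # Outside a dry run, refuse to continue unless we are root.
    if [ -z "$run" ] && [ "$(id -u)" -ne 0 ]; then
        echo "finish_updates: must be run as root (for example inside the chroot)" >&2
        return 1
    fi
    $run dpkg --configure -a || return 1
    $run apt-get dist-upgrade
}
```

Stopping when `dpkg --configure -a` fails is a deliberate choice here: its error output usually names the package that is stuck, which is the next thing to investigate.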
 282 [14:58] <popey>  
 283 [14:59] <popey> Hmm, I'm getting to the end of my slot, and we don't really have any time left for questions, sorry about that.
 284 [14:59] <popey> I'm hoping to make a video about this which I'll put up online at http://ucasts.tv/
 285 [14:59] <popey> feel free to ping me on irc if you have questions later :)
 286 [15:00] <popey> Apparently the next session is blank for some reason. Free period!
 287 [15:01] <popey> thanks to everyone for watching me wibble on, and thanks to the classroom team!
 288 [15:01] <ClassBot> Logs for this session will be available at http://irclogs.ubuntu.com/2011/01/29/%23ubuntu-classroom.html
 289 [15:01] <popey> thanks ClassBot !

UserDays/01292011/How to fix a broken machine (last edited 2011-01-29 19:09:01 by ptr)