Monday, June 11, 2012

5 Basic Steps For Troubleshooting Any Computer Problem

I have been troubleshooting computers for most of my life. I went from fixing the family computer, to fixing my own computer, to building myself a new computer, to fixing other people's computers. So I think it's fair to say I've encountered a fair number of computer problems. There are still new problems popping up every day, and even someone who has been working in the computer field for a long time will encounter problems that they have never seen before (some might not admit it, though). So it's good for anyone who works with computers to learn how to troubleshoot.

While I cannot impart all of my troubleshooting knowledge to you in this one post, I can still give you some insight into how I go about troubleshooting computer problems. I have broken the basics of computer troubleshooting into five steps to give you an understanding of the process. You can use this as a basic guide for any problem a computer can throw at you.

Most professionals will tell you that five steps do not even come close to explaining the entire troubleshooting process, and they are (mostly) right. But I am not teaching you how to troubleshoot computers on a professional level. Rather, I am showing you how the troubleshooting process works so that, as you learn, you can develop your own troubleshooting method.

1. Identify The Problem

Now you might think that this is an obvious step, but sometimes it is overlooked or not done properly. First off, you place the problem into one of two categories: hardware or software. You do this by eliminating all possible causes in one category and then the other, or simply by working out what an error message says. So an error message with a path to a program file probably indicates a software problem, while a series of beeps from your computer during the POST (power-on self-test) probably indicates a hardware problem.

Note: if the error message gives you some sort of code, copy it down, then Google it or include it in a forum post asking for help.

Next you break it down further. If it is a software problem, identify the program and all possible causes that might be interfering with this program such as recently installed software, or other programs running at the same time that do the same thing as the program having (or creating) problems. If it is a hardware problem, attempt to isolate the issue by removing all peripherals one at a time and checking all of your hardware for any signs of damage, disconnected/frayed wires, or anything out of the ordinary.

2. Make A Plan

So you think you have identified the problem. You probably have, but remember, these problems might not be as straightforward as they seem. You want to create a plan for how you are going to fix it. For a simple computer problem, it can just be a little game plan in your head, nothing fancy. Make a list of the possible causes and the solutions to fix them. Then think ahead and figure out what to do if those solutions fail or if other problems arise.

3. Prepare Your Arsenal

You have a list of probable causes and possible solutions, so now you need to gather the tools and programs you are going to use to fix this problem. If you do not do this often, you will probably have to go around downloading all the tools you need. People who do this often usually carry around a USB flash drive with various tools, programs and sometimes even a couple of operating systems. Some of my favorite tools are HijackThis, Defraggler, CCleaner and Malwarebytes. I usually like to have a Linux OS on my flash drive as well, either BackTrack or a rescue OS. Don't forget to include anti-virus software if there isn't any already on the computer. Remember, system utilities are still a vital part of any troubleshooting process.

4. Attack!

Everything is now set up, and you are ready to start rescuing your computer. Get to it! Remember: go slowly and make sure you are doing everything correctly.

5. Double Check

Now that you have removed all the problems on your system, you should make sure everything works correctly. Restart several times, checking each time that everything is working properly. Sometimes you might think you've fixed a problem only to realize you broke something else in the process. Open all the programs you normally use, and keep your eyes open for any weird behavior or signs of another problem.

Sunday, June 10, 2012

Troubleshooting 101

Perhaps nothing is more essential to the execution of IT tasks than troubleshooting. You can gather requirements, design, plan, and optimize until the cows come home, but at some point something is going to go unexpectedly wrong. That's when the process of troubleshooting comes to the forefront.

Most engineers think they are good troubleshooters, applying experience, intuition, and often brute force to beat any problem into submission. But faster and less painful solutions can be found by applying a structured, intentional approach:
Gather Good Information
Find the Right Problem to Solve
Validate Your Solution

GATHER GOOD INFORMATION

You can't troubleshoot something you don't understand. This starts with RTFM (Reading the Fine Manual). Well, that's the G-rated version of the acronym, but the advice still stands. Gather as much information about the system you are troubleshooting as you can. If there is a manual, read it. All of it.

Understanding the system means that when you fix something you'll be less likely to break other things. Know how the system interacts with other systems and figure out what "normal" operation is. You can't recognize abnormal behavior if you don't know what normal is like.

Understand your toolkit. Be completely familiar with the software and hardware tools at your disposal so you aren't trying to learn their use while diagnosing a system problem. In other words, be prepared.

Interview those affected by the failure. Find out the last time the system operated properly. What changed since then that might have affected the system? Be sure to look for seemingly unrelated events.

Are you getting an error message? Do a Google search on the error message and see what others may be reporting about causes and the potential solution.

REPRODUCE THE FAILURE

The first step in troubleshooting is to try to reproduce the reported failure. In most cases, the failure will be reported to you second hand and the information may be inaccurate or misleading. There are several reasons to reproduce the failure:
First Hand Evidence - Reproduce the failure so you can see it happen. Extra points if you can make it fail at will.
Indications of Cause - Knowing the conditions under which the failure occurs will provide great insight into the possible causes.
So You Know if You Fixed It - The only way to validate that you've actually fixed the problem is to execute the steps that produce the failure and NOT to see the failure.

Write down the steps you take to create the failure, then follow those steps and make it fail again. When in doubt, start at the beginning. Reboot the system and start from a clean testing condition, but try to find conditions that lead to a reproducible failure.
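As a rough illustration of repeating the written-down steps, here is a minimal Python sketch. It assumes the reproduction steps can be wrapped in a function that raises an exception when the failure occurs. The names `reproduce` and `flaky_operation` are hypothetical, and `flaky_operation` is only a stand-in that fails on every third call.

```python
import itertools


def reproduce(steps, attempts=20):
    """Run the recorded reproduction steps repeatedly and collect the failures."""
    failures = []
    for attempt in range(1, attempts + 1):
        try:
            steps()
        except Exception as exc:  # in this sketch, any exception counts as the failure
            failures.append((attempt, repr(exc)))
    return failures


# Hypothetical stand-in for the real steps: fails on every third call.
_call_counter = itertools.count(1)


def flaky_operation():
    if next(_call_counter) % 3 == 0:
        raise RuntimeError("simulated failure")


if __name__ == "__main__":
    failed = reproduce(flaky_operation, attempts=20)
    print(f"{len(failed)} of 20 attempts failed")
    for attempt, error in failed:
        print(f"  attempt {attempt}: {error}")
```

Recording which attempts failed, not just how many, gives you something to compare against once you believe the problem is fixed.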

INTERMITTENT FAILURES

Many tough problems are intermittent. You may have seen the failure once but your attempts to reproduce the failure don't have the same starting conditions, inputs, events, or outside influences. Many times we cannot control all of the influencing factors in the system.

So, what do you do? Start by trying to catalog all of the potential conditions affecting the system you are troubleshooting. Write them all down. Control and vary those conditions one at a time to get the problem to behave differently. Hopefully one of those changes will cause the problem to occur with different frequency, intensity, or outcome and that will suggest an additional avenue of investigation.

What if it is STILL intermittent? Capture more information when the failure occurs and gather data from as many failures as possible. Analyze the data for common characteristics and conditions. And don't assume that the problem is fixed just because you haven't seen the failure in the last 20 tests. If you didn't fix it, then it isn't really fixed. Few problems are self-correcting.
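One simple way to look for common characteristics is to count how often each recorded condition appears across the failures. The sketch below assumes each failure was noted as a dictionary of condition names and values. The function name `common_conditions` and the sample records are hypothetical.

```python
from collections import Counter


def common_conditions(failure_records):
    """Rank (condition, value) pairs by the share of failures in which they appear."""
    counts = Counter()
    for record in failure_records:
        counts.update(record.items())  # each (name, value) pair is counted once per record
    total = len(failure_records)
    return [(pair, n / total) for pair, n in counts.most_common()]


if __name__ == "__main__":
    # Hypothetical notes taken at three separate failures.
    records = [
        {"network": "wifi", "load": "high", "user": "alice"},
        {"network": "wifi", "load": "low", "user": "bob"},
        {"network": "wifi", "load": "high", "user": "carol"},
    ]
    for (name, value), share in common_conditions(records):
        print(f"{name}={value}: present in {share:.0%} of failures")
```

A condition that shows up in every failure is not proof of the cause, but it is a good candidate for the next condition to control and vary.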

OBSERVE OBJECTIVELY

Most engineers jump to conclusions about the cause of a problem prematurely. Make sure you really look at the behavior of the system. Stop thinking and just observe the system in a completely objective, dispassionate, cold, robotic manner. Typically we get reports of the result of the failure but not the details of the failure itself, so try to observe the failure occurring in detail. Apply instrumentation to the system to gather more information about the conditions and behavior of the failure. Enable system notifications and logging, but be aware that the act of observing the system can alter its behavior (the observer effect, often loosely attributed to the Heisenberg uncertainty principle).
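If the system under investigation happens to be a Python program, instrumentation can be as simple as turning on the standard `logging` module around the suspect code. This is only a sketch: `process_request` and its payload format are hypothetical stand-ins for whatever operation is failing.

```python
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("troubleshooting")


def process_request(payload):
    """Hypothetical operation under investigation, with logging added."""
    log.debug("received payload: %r", payload)
    try:
        result = 100 / payload["divisor"]
    except Exception:
        # log.exception records the full traceback along with the message
        log.exception("request failed for payload %r", payload)
        raise
    log.debug("result: %r", result)
    return result


if __name__ == "__main__":
    process_request({"divisor": 4})
```

Logging the inputs as well as the error means a later failure report comes with the conditions that produced it.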

FIND THE RIGHT PROBLEM TO SOLVE

Solutions are frequently obvious. It's finding the right problem to solve that's the hard part. If your initial problem domain is the entire system you are troubleshooting, find a way to cut the system in half. Observe the behavior in each half of the system. If the problem occurs in one half of the system, cut that part in half again and repeat until you've narrowed the scope of investigation as far as possible.
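The halving strategy is essentially a binary search. The sketch below assumes exactly one component is at fault and that you have some way to test the system with only a subset of components enabled. The names `bisect_fault` and `works_with`, and the plugin list, are hypothetical.

```python
def bisect_fault(components, works_with):
    """Find the single faulty component by repeatedly halving the search space.

    `works_with(subset)` must return True when the system behaves correctly
    with only `subset` enabled. Assumes exactly one component is at fault
    and that `components` is not empty.
    """
    candidates = list(components)
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        # If the first half works on its own, the fault must be in the second half.
        candidates = second if works_with(first) else first
    return candidates[0]


if __name__ == "__main__":
    plugins = ["plugin-a", "plugin-b", "plugin-c", "plugin-d", "plugin-e"]
    # Hypothetical test: the system works unless "plugin-d" is enabled.
    culprit = bisect_fault(plugins, lambda subset: "plugin-d" not in subset)
    print("faulty component:", culprit)
```

With each test the search space halves, so even a system with a hundred components needs only about seven tests.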

CHANGE ONE THING AT A TIME

If you change multiple things at once and the problem goes away, you'll never know which change was the one that fixed the problem. The same thing applies to the tests you are using. Since the tests or instrumentation you use can affect the problem, change one test at a time. When in doubt, apply the same tests to a known good system and compare the data.
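The one-change-at-a-time rule can be sketched as a loop that always starts again from the same baseline. The function name `one_change_at_a_time`, the configuration keys, and the `passes` check below are all hypothetical.

```python
def one_change_at_a_time(baseline, changes, passes):
    """Apply each candidate change alone to the baseline and record the result."""
    results = {}
    for key, new_value in changes.items():
        trial = dict(baseline)  # fresh copy, so changes never accumulate
        trial[key] = new_value
        results[key] = passes(trial)
    return results


if __name__ == "__main__":
    baseline = {"driver": "v2", "cache": True, "timeout": 30}
    candidates = {"driver": "v1", "cache": False, "timeout": 60}
    # Hypothetical check: in this example only the older driver avoids the failure.
    outcome = one_change_at_a_time(
        baseline, candidates, lambda config: config["driver"] == "v1"
    )
    for key, fixed in outcome.items():
        print(f"changing {key!r} alone: {'fixes it' if fixed else 'no effect'}")
```

Because every trial starts from an untouched copy of the baseline, a passing result can be credited to exactly one change.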

KEEP A LOG

Write down what you did, when you did it and in what order, and what happened. Be detailed! You may need to refer back to your log for additional insight and to have data to correlate with other systems or observations. Don't trust your memory!
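A paper notebook works fine, but if you prefer a file, a few lines of Python can timestamp each entry for you. This is a minimal sketch. The function name `log_step`, the file name, and the sample entries are hypothetical.

```python
import csv
from datetime import datetime


def log_step(path, action, outcome):
    """Append one timestamped entry to a CSV troubleshooting log."""
    with open(path, "a", newline="") as handle:
        csv.writer(handle).writerow(
            [datetime.now().isoformat(timespec="seconds"), action, outcome]
        )


if __name__ == "__main__":
    log_step("troubleshooting_log.csv", "Reseated RAM", "Still beeps on boot")
    log_step("troubleshooting_log.csv", "Swapped RAM stick", "Boots normally")
```

Appending rather than overwriting preserves the order of events, which is what you need when correlating the log with other observations.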

QUESTION YOUR ASSUMPTIONS

First, you need to know what assumptions you are making. This is easier said than done! Stop and step back from the problem and try to take the place of an external observer. Have you made assumptions about the situation or system behavior without realizing it? Think divergently about all the implied assumptions that may have been made. Question each of these assumptions. Is there a test that can be performed to confirm or deny the assumption? Have you assumed that your test or tool is accurate and working properly? Can you validate your test or tool to be sure it is providing valid information?

A FRESH PERSPECTIVE

It's easy to get so dug into a problem that it is impossible to see the forest for the trees. Ask for help. Get another set of eyes to lend a fresh insight. When asking for help, report the symptoms and observations, not your theories. Be receptive to the input of others.

VALIDATE YOUR SOLUTION

If you didn't fix it, it's still broken. Don't assume that your action fixed the problem. Prove it! If you have a sequence of steps that reliably reproduces the failure, repeat those steps and validate that the problem does not occur. If you are unsure whether your fix really did address the issue, remove the fix and make the problem occur again. Then put your fix back in place and verify that the problem does not occur. If you can make the problem occur and not occur at will, you've clearly found the issue and a fix.
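The toggle test described above can be written down as a short procedure. The sketch assumes you have a reliable reproduction and a fix that can be applied and removed. The names `validate_fix`, `run_repro`, `apply_fix`, and `remove_fix` are hypothetical.

```python
def validate_fix(run_repro, apply_fix, remove_fix):
    """Confirm a fix by toggling it.

    `run_repro()` returns True when the failure occurs. The fix is validated
    only if the failure is absent with the fix, returns without it, and is
    absent again once the fix is reapplied.
    """
    apply_fix()
    fails_with_fix = run_repro()
    remove_fix()
    fails_without_fix = run_repro()
    apply_fix()  # leave the system in the fixed state
    fails_after_reapply = run_repro()
    return (not fails_with_fix) and fails_without_fix and (not fails_after_reapply)


if __name__ == "__main__":
    # Hypothetical system whose failure depends only on whether the fix is applied.
    state = {"fixed": False}
    ok = validate_fix(
        run_repro=lambda: not state["fixed"],
        apply_fix=lambda: state.update(fixed=True),
        remove_fix=lambda: state.update(fixed=False),
    )
    print("fix validated:", ok)
```

If the failure does not come back when the fix is removed, the check returns False, which guards against crediting a fix for a problem that merely stopped appearing.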

Be sure you fixed the cause of the problem and are not just masking the result. Remember, problems never just go away by themselves. You need to be sure you really did fix it.

THE BOTTOM LINE

We frequently rely on our experience and intuition when troubleshooting, but applying this structured approach can yield better quality solutions in less time and with fewer unwanted side effects.