Measuring Reliability

How do you know how reliable a computer system is? One measure is system availability: the proportion of time a system is up and running. For example, if a system is down (unavailable) for 1 hour in a 100-hour period, its availability over that period is 99%. Equipment manufacturers normally quote average availability over a long period of time. Another measure of reliability is the average length of time between system failures. This is the Mean Time Between Failures (MTBF), which is also quoted by equipment manufacturers as an indication of reliability.
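The two measures above are simple calculations. As a rough sketch (the figures in the second call are made up for illustration):

```python
# Availability: percentage of a period during which the system was up.
def availability(uptime_hours, total_hours):
    return uptime_hours / total_hours * 100

# MTBF: total operating time divided by the number of failures.
def mtbf(operating_hours, failures):
    return operating_hours / failures

# The example from the text: down for 1 hour in a 100-hour period.
print(availability(99, 100))   # → 99.0

# Illustrative figures: 4 failures over 1000 hours of operation.
print(mtbf(1000, 4))           # → 250.0
```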

Q. Make a list of system failures from research on the internet. (Remember to note your sources.) Suggest ways in which, in your opinion, these failures could have been avoided.

Protecting against system failures

If a computer system’s reliability is very important, then a company may need spare hardware in place in case of system failures. Hardware redundancy is one way to protect systems: it means having more than one of each critical component. For example, the computers in a commercial aeroplane all have backup computers that take their place if they fail.
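The idea of a redundant component can be sketched very simply: try the primary unit, and switch over to the backup if the primary fails. The function names here are illustrative, not from any real aircraft system:

```python
# A minimal sketch of hardware redundancy: use the primary component,
# falling back to the redundant backup if the primary reports a fault.
def read_value(primary, backup):
    try:
        return primary()
    except RuntimeError:
        # The primary has failed: switch over to the backup unit.
        return backup()

def faulty_primary():
    raise RuntimeError("primary computer fault")

def working_backup():
    return 42

print(read_value(faulty_primary, working_backup))  # → 42
```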

If software or hardware problems occur, it is the data that is most at risk. This data will typically be stored on a disk on a server somewhere. As well as providing redundant hardware, companies must make sure that data is backed up regularly.
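A regular backup can be as simple as copying the data file to a timestamped backup copy. This is a minimal sketch; the file and folder names are made-up examples:

```python
import datetime
import pathlib
import shutil

def back_up(data_file, backup_dir):
    """Copy data_file into backup_dir, with a timestamp in the name."""
    backup_dir = pathlib.Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{pathlib.Path(data_file).name}.{stamp}.bak"
    shutil.copy2(data_file, target)   # copy2 also preserves timestamps
    return target

# Illustrative usage with a throwaway data file.
pathlib.Path("customers.csv").write_text("id,name\n1,Smith\n")
copy = back_up("customers.csv", "backups")
print(copy.exists())  # → True
```

In practice a company would schedule this to run automatically (for example overnight) and keep copies off-site, so that the backup survives whatever destroys the original.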

Sometimes computer systems are so important to a company that it cannot function without them; a computer system failure could put the company out of business in just a few days. Disaster recovery plans are sometimes put in place so that a company can switch over to a complete new system with all its own data on it, often within hours. There are disaster recovery companies that specialise in offering this service to companies who depend on their computer systems.

Q. What is meant by the term RAID storage?