Dependable operational processes are designed so that operator errors can be detected and corrected before they have serious consequences. In designing such processes, you have to take a range of technical and human factors into account.
As I discussed in Chapter 11 (Socio-technical systems), the dependability of a software system is influenced by both technical and human factors. The people using a system do not always follow instructions on how the system should be used. They make mistakes: they forget to take some actions, present the system with unexpected inputs, or misread outputs. They may also misunderstand what the system is supposed to do or how it will behave in response to environmental events, such as a power failure.
When system failures occur, they are often triggered by some human action or by an operator failing to take some action. In those circumstances, the cause of the failure is often attributed to ‘human error’. The operator or user of the system has not behaved as specified and so is deemed to be responsible for the system failure. However, simply blaming the human is often an easy way to avoid confronting deeper, more fundamental problems in the system design. For example:
1. Why did the designers not anticipate the possibility that system users would make mistakes? After all, making mistakes is a universal human characteristic.
2. Does the system have particular characteristics that increase the chances of people making mistakes? For example, if a system presents information in a way that is hard to understand, then it is likely that users, particularly if they are working under stress, will make mistakes and read the information incorrectly.
3. Did the environment in which the system was used contribute to the ‘human error’? For example, control room systems often set off audible alarms when problems are detected. When these alarms are going off, system operators are inevitably distracted by them. If the system requires these operators to input details of the situation, mistakes are likely to occur.
Reason's Swiss Cheese model of error, which I covered in Chapter 11, suggests that some human action is often the trigger for a set of subsequent events that ultimately lead to system failure. Therefore, designing a dependable process that includes checks for potentially erroneous human actions is an important way of improving overall system dependability.
Mechanisms that can be used include confirmation, where an operator has to confirm a potentially dangerous action (although this is not foolproof, as people sometimes confirm without thinking); reasonableness checking of inputs; and ensuring that more than one operator has to provide input for critical functions.
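The three mechanisms above can be illustrated with a minimal Python sketch. It is not taken from any particular system: the names `RejectedAction`, `check_reasonable`, `confirm` and `CriticalAction` are hypothetical, as are the range limits and the idea of asking the operator to retype a specific word, which is only one assumed way of making a confirmation harder to give without thinking.

```python
from dataclasses import dataclass, field


class RejectedAction(Exception):
    """Raised when one of the checks blocks an operator action (hypothetical name)."""


def check_reasonable(value: float, low: float, high: float) -> float:
    """Reasonableness check: reject inputs outside a plausible range."""
    if not (low <= value <= high):
        raise RejectedAction(f"value {value} is outside the range [{low}, {high}]")
    return value


def confirm(prompt: str, expected: str, ask=input) -> None:
    """Confirmation: the operator must retype a specific word.

    `ask` defaults to the built-in input() and can be replaced for testing.
    """
    answer = ask(f"{prompt} Type '{expected}' to proceed: ")
    if answer.strip() != expected:
        raise RejectedAction("confirmation did not match")


@dataclass
class CriticalAction:
    """Multi-operator rule: runs only after enough distinct operators approve."""
    name: str
    required_approvals: int = 2
    approvers: set = field(default_factory=set)

    def approve(self, operator_id: str) -> None:
        # A set ignores repeated approvals from the same operator.
        self.approvers.add(operator_id)

    def execute(self) -> str:
        if len(self.approvers) < self.required_approvals:
            raise RejectedAction(
                f"{self.name} needs {self.required_approvals} distinct approvals"
            )
        return f"{self.name} executed"
```

In this sketch, a second approval from the same operator does not count, so `execute()` continues to raise `RejectedAction` until a different operator approves. A real process would also have to authenticate operators and record who approved what, which is outside the scope of the example.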