
www.hse.gov.uk info@hse.gov.uk

www.instmc.org.uk publications@instmc.org.uk

BETTER ALARM HANDLING A practical application of human factors


John Wilkinson, HM Specialist Inspector
Dr Debbie Lucas, Principal Psychologist
Human Factors Team, Health and Safety Executive, Hazardous Installations Directorate

This article was originally published in the Measure & Control Journal, Vol. 35, March 2002, The Institute of Measurement and Control, UK.

Introduction
This paper is based on a presentation given at the IBC Alarms Conference, June 2000. Its purpose is to provide practical information for designers and users of alarm systems. It provides practical examples and advice and sets alarm handling problems in a safety management system (SMS) context. Appendix 1 gives a summary of the Texaco incident and Appendix 2 reports on a case study. Human factors (HF) are often described as the thread that runs through any safety management system. In this paper we will consider:
- The continuing HF problems in alarm systems and their solutions;
- How alarms are actually used (not necessarily how designers think they are used!);
- How the competency of designers, installers and operators may be established; and
- Learning the lessons from previous accidents, incidents and near-misses.

HSE considers alarm handling to be a continuing major safety issue. There is no room for complacency, even though 8 years have passed since the Texaco incident. Incidents involving alarm systems are still occurring and there are still significant problems with alarm systems on some major hazard sites. Training, competency and user support are still key areas, and users and designers need to be aware of each other's requirements. However, solutions are available by original design or by modification, and practical guidance is also available. In 1999, HSE published the revised guidance on human factors, HSG 48 Reducing error and influencing behaviour [1]. This is HSE's core all-industry guidance and provides a simple and practical introduction to the subject. In the same way the Engineering Equipment and Materials Users Association (EEMUA) Guide [2] provides generic all-industry technical guidance. In both cases the industry sector concerned can use the guidance to get started. We expect companies to review their alarm systems from a human factors viewpoint and to seek continuous improvement.

This document is available on www.safetyusersgroup.com


HSE Alarms Strategy


In the period since the Texaco accident, HSE has restructured the high hazard, low risk parts of the organisation. The Chemicals and Hazardous Installations Division was formed, now brigaded with our Offshore and Mines Divisions as HID - the Hazardous Installations Directorate. A major part of its mission is to provide a more clearly accountable focus both within HSE, and for liaison with industry and other stakeholders, to ensure that the lessons from accidents and near misses are properly learnt. So the current alarms strategy is derived from the Texaco investigation report recommendations as modified by more recent research and local project work. HSE commissioned initial research that resulted in the Bransby & Jenkinson Report [3] (Contract Research Report - CRR - 166/1998). The report concluded that the alarm system problems identified as a result of the Texaco investigation were widespread in industry and that those problems can be solved or prevented.

Texaco

This incident and its lessons have been well documented [4], and a summary of the incident is given in Appendix 1. The key Texaco problems included:
- Alarm floods
- Too many standing alarms
- Control displays and alarms which did not aid operatives
- No clear process overview to help diagnosis
- Alarms which presented faster than they could be responded to
- 87% of the 2040 alarms displayed as high priority, despite many being informative only
- Safety critical alarms were not clearly distinguished.
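The priority figure above can be checked on any site from the alarm configuration list. The following is a minimal sketch only: the function name and the priority labels are hypothetical, and the sample data is constructed purely to reproduce the reported proportion (roughly 87% of 2040 alarms), not taken from the Texaco system.

```python
from collections import Counter


def priority_shares(priorities):
    """Return the fraction of configured alarms at each priority level."""
    counts = Counter(priorities)
    total = sum(counts.values())
    return {level: count / total for level, count in counts.items()}


# Hypothetical configuration sized to match the reported proportion:
# 2040 alarms, of which about 87% are set to high priority.
configured = ["high"] * 1775 + ["low"] * 265
shares = priority_shares(configured)
print(f"high priority share: {shares['high']:.0%}")
```

A share this large means the priority label no longer helps the operator pick out what matters, which is the point the list above makes.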

The other key lesson was that the management of the alarm system cannot be successfully dealt with in isolation from the overall safety management system (SMS) context. In other words this is not just a technical issue and both users and designers/installers need to keep this in mind. One obvious question, but one which designers have failed to take account of in the past, is how alarms are actually used by operators on site. The SMS failures identified after the Texaco incident included:
- Deficiencies in the plant modification procedure
- An inadequate instrument maintenance system
- Inadequate training and competence of operators
- A lack of clear guidance on managing unplanned events and when to initiate emergency plant shutdown
- A lack of clear authority to initiate shutdown.

Ultimately, plant safety should not depend on an operator response to an alarm. In industry the chances of an operator failing to act in such circumstances can be very much higher than one may think. In two recent LPG releases from road tankers, the operators present failed to use the tanker emergency stop buttons - there were a variety of contributory reasons in each case (not least of which was the distraction of being enveloped in a vapour cloud of unignited LPG!). Nevertheless, there are real world effects of an emergency or upset situation on human reliability that a designer or risk assessor may not always take into account.


Reliance on people or automatic systems for safety-critical functions needs to be properly defined at the design stage, and then assured in the same way that any other system - such as quality - is.

The EEMUA Guide


This guidance incorporates the lessons and conclusions from the Bransby and Jenkinson report. HSE regards the guide as being the nearest thing to a standard currently available. The guide was written for engineers and engineering managers. Its purpose is to stimulate discussion and encourage industry to develop its principles to meet specific safety applications. The Guide provides generic all-industry technical guidance. Human factors is an implicit but nevertheless key element. The recent Better Alarm Handling information sheet [5] teases out these elements and makes them explicit in a practical, simple and easily communicated form.

HSE Local Project in the North East 1999


This project arose out of the Bransby and Jenkinson report. The aim of the project was to form a clear view about how well companies are equipped to deal with human factors and to establish a baseline for further work. Specifically, via a question set, it looked at the following issues at three major hazard sites:
- Why is the operator(s) there?
- What demands are made on them?
- What support do they have?
- What standards exist against which to gauge performance?

The main findings were:
a) Installation of plant-wide control systems with central monitoring (DCS) was closely linked to a proliferation of alarms that were in fact mostly status indicators.
b) Structured assessments of what might go wrong (HAZOPs) generally added extra alarms, again often not true alarms.
c) Corporate standards were available but varied in quality. IEC 61508 was mentioned in some, but this standard contains no specific guidance on human factors or on the overall management and implementation of alarm systems.
d) The sites visited had analysed alarm rates and spurious alarms but they had not established performance standards suitable for monitoring progress. Even in normal operation one plant still experienced peaks of 64 alarms per hour.
e) Simulators were reported - and confirmed - as bringing significant benefits and savings in identifying and remedying potential operator problems.

Amongst the issues arising from the project it was recognised that there were re-engineering difficulties on existing plants (the 'I wouldn't start from here' syndrome) and associated inspection difficulties for HSE. It also confirmed that many companies would initially find it difficult to justify the adequacy of their alarm systems, where this was required by the Control of Major Accident Hazards (COMAH) Regulations [6]. There was a general lack of human factors expertise on site or otherwise available to the companies, and the lack of clear standards and benchmarks was confirmed. Overall the project showed that, for critical safety-related activity, a good risk assessment was the key starting point.


Human Factors Team Strategy


In the past, human factors development has sometimes been hampered by a research-led focus that has not always delivered a practical product or tool to the user or designer. With this in mind, and following the results of the HSE research and local project, we set out to produce simple and practical guidance for inspectors and industry, providing a link between the research and higher-level guidance, and the users. The revised HSG 48 had, in the interim, been produced, and so generic all-industry guidance on human factors was now available. Better Alarm Handling should be seen in an overall context of encouraging companies to seek the best opportunities for improvement rather than just to prospect for problems alone. The key message is that it is never going to be acceptable to conduct one review and implement the conclusions. This has to be part of a continuous process of improvement, just as it is with most other business areas such as quality. Why manage safety any differently?

Better Alarm Handling


The guidance provides a simple 3-step approach, based on that taken in HSG 48:
- Find out if you have a problem
- Decide what to do and take action
- Manage and check what you have done

Step 1: Find out if you have a problem
The advice is to take some measurements (metrics) and talk to those involved in the process, including operators and line managers. The perception may be that all is well, but are there systems in place that would reliably confirm this? It is important to remember that absence of evidence is not evidence of absence in this or any other case. Given the frequency of major accidents, a 10-20 year record of no incidents on one site is not on its own sufficient reason for complacency. And how are new alarms added or existing ones modified? Is the design to a standard that takes account of human limitations?

Step 2: Decide what to do and take action
Form a representative team to progress the issues. Implement some quick wins both to deal with the problems and to give positive feedback to those involved that action is taking place. Establish operator competency and identify training needs. Provide support (e.g. on- or off-line help including diagnostics, clear navigation routes around the screen pages) to help operators respond effectively to alarms, including in emergencies.

Step 3: Manage and check what has been done
Make the approach systematic and part of the SMS. This is not an add-on. Draw up an alarm strategy and a standard for the site. Carry out audits and review the results. Repeat the baseline measures to check progress. Get it right first time where the opportunity is offered, e.g. through the site purchase specification system and the change management procedures.
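As an illustration of the baseline measurements mentioned in Steps 1 and 3, the sketch below computes an average and a peak hourly alarm rate from a list of alarm activation times. It is a sketch under stated assumptions: the function name and the sample log are hypothetical, and a real site would read the times from its own alarm journal and choose its own metrics and targets.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_baseline(timestamps):
    """Return (average alarms per hour, peak alarms in any clock hour).

    The average is taken over every clock hour between the first and the
    last alarm, so quiet hours with no alarms still count.
    """
    if not timestamps:
        return 0.0, 0
    per_hour = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    first, last = min(per_hour), max(per_hour)
    hours_spanned = int((last - first) / timedelta(hours=1)) + 1
    return len(timestamps) / hours_spanned, max(per_hour.values())


# Hypothetical alarm log: three alarms between 08:00 and 09:00,
# none between 09:00 and 10:00, one between 10:00 and 11:00.
log = [
    datetime(2002, 3, 1, 8, 5),
    datetime(2002, 3, 1, 8, 20),
    datetime(2002, 3, 1, 8, 59),
    datetime(2002, 3, 1, 10, 30),
]
average, peak = hourly_baseline(log)
print(f"average {average:.2f} per hour, peak {peak} per hour")
```

Running the same calculation before and after a round of improvements gives the repeatable baseline that Step 3 asks for.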


HSE Alarms Strategy - Inspection and Enforcement


HSE's current approach in the chemical industry sector is to:
- Raise awareness and provide information
- Promote consideration of the issue within the whole life cycle of the system
- Promote consideration of the key role of the operator in the system

HSE will consider enforcement where there is:
- No fundamental review and/or identification of safety critical alarms; and
- High reliance on the operator to react or respond to alarms to prevent a major accident

But, as always, a key consideration will be proportionality. Any action should be proportionate to the hazards and risks in each case. In the same way we would expect companies to have targeted their own efforts on the key areas identified, for example by risk assessments including those in COMAH safety reports.

Users
HSE would expect to see:
- A policy that recognises human factors in alarm handling as a management issue
- A logical process in train which has assessed/is assessing the current situation
- A sensible action programme to deal with issues found, and
- For COMAH, a rigorous demonstration that human factors have been addressed adequately where operator response to alarms is claimed as defence against major accidents.

Users should evaluate, prioritise and modify existing alarm systems, taking account of the degree of risk to target their efforts. They should ensure new designs meet EEMUA standards and take into account human limitations. The alarm system should be managed as an integral part of the SMS, and as part of a continuous improvement programme. Remember that, no matter how well designed, no alarm system can operate effectively if the work loading and staffing levels do not take account of all foreseeable conditions (from normal, through upset, shutdown and start up, to emergency) and if operators are not competent or if their needs have not been considered in the new design or modification. Users also need to consider shift lengths and patterns, and fatigue factors - otherwise there may be no response when one is most needed.

Designers
HSE would expect designers to follow the EEMUA Guide principles, with the SMS/Safety Report context raised and considered as part of the overall solution for a proposed new or modified alarm system. Remember that the needs of installers and commissioning engineers will not be the same as those of the final users of the system, e.g. operators and maintainers.

Summary for designers and users


Dealing with alarm systems is not just about technical specification and engineering: it is part of the overall SMS, a continuous improvement culture and change management. The role of the operator must be considered throughout the whole life cycle. Human factors is not like a coat of paint which can be added later in the design process.

Some suggestions to start the process


Users - have you:
- Started your action programme?
- Considered human and not just technical factors?
- Read Better Alarm Handling and HSG 48, and circulated the key messages to all those involved in alarm system management and use?
- Considered how you assure reliance on operators, e.g. in your safety report?
- Identified one thing to improve on your return to site from this and the following sessions?

Designers - have you:
- Considered human factors and ergonomics explicitly in design and how you can better incorporate them in the design process?
- Selected a priority area for improvement in your design approach?
- Involved the users?

Example (users and designers)


Consider starting with a brainstorm session with a representative team. Identify a key area, implement the quick wins, and put a timetabled action plan in place.

(copyright HMSO 2001)

References
1. Anon. Reducing Error and Influencing Behaviour, HSG 48, HSE Books, 1999, ISBN 0 7176 2452 8
2. Anon. Alarm Systems, A Guide to Design, Management and Procurement, EEMUA Publication 191, 1999, ISBN 0 85931 076 0, available from The Engineering Equipment and Materials Users Association, 54 Beech Street, London EC2Y 8AD
3. Bransby and Jenkinson. The management of alarm systems, CRR 166, HSE Books, 1998, ISBN 0 7176 1515 4
4. Anon. The explosion and fires at the Texaco Refinery, Milford Haven, 23 July 1994, HSE Books, 1997, ISBN 0 7176 1413 1
5. Anon. Better Alarm Handling, HSE Information Sheet, Chemical sheets 6, HSE Books, also available on HSE's website at www.hse.gov.uk/pubns/chi6.pdf
6. Anon. A guide to the Control of Major Accident Hazards Regulations 1999, L111, HSE Books, 1999, ISBN 0 7176 1604 5


APPENDIX 1
Texaco Refinery explosion and fires 1994: Human factors in the incident

The event
Twenty tonnes of hydrocarbon were released and exploded when a slug of liquid was sent through the flare system pipeline, which failed. The site suffered severe damage, and UK refinery capacity was significantly affected. Only luck prevented multiple deaths. It was a Sunday, and some people had left the area just before the explosion. The following description of the event has been considerably simplified to highlight the human factors aspects of the case.

The incident involved three interconnected process vessels. A loss of feed to vessel 1 caused valve A to close to prevent the vessel being emptied. As vessel 2 emptied, valve B closed, trapping in the remaining liquid. As heat was still being applied, this liquid vaporised, and the vessel vented into the flare system, through the flare stack knock-out drum, which catches liquid to prevent it going to flare.

Meanwhile, the feed to vessel 1 had been restored, and valve A was opened. This should have caused valve B to open, but this did not occur. The operators were aware that vessel 2 was still overfilling, so they opened valve C to provide another route out of that vessel. This resulted in a high liquid level in the flare stack knock-out drum. Due to a previous modification, there was no facility to pump out the knock-out drum quickly. By this time, the operators were concentrating on the screens that showed the problems in vessels 1 and 2, and were not being helped by the flood of alarms being generated. The combination of a high liquid level in the knock-out drum, and vessel 2 venting into the flare system again, caused a slug of liquid to be carried through the knock-out drum and into the flare line, which collapsed at a weak point.

Consequences
Fatal injuries were avoided only by luck, e.g. contractors in a van were about to enter the area when the explosion happened; the concrete roof of a building fell in minutes after people had left it. The rebuilding costs were £48,000,000. Texaco and Gulf Oil were prosecuted by HSE and fined £370,000. UK refining capacity was significantly affected.

Lesson 1 - Alarm System
The control displays and alarms did not aid operatives. A process overview would have helped diagnosis. The alarms appeared faster than they could be responded to and key alarms were missed in the flood. 87% of the 2040 alarms displayed as "high" priority, despite many being informative only, and safety critical alarms were not distinguishable from the rest.

Lesson 2 - Safety Management System (SMS)
SMS failures included:
- The plant modification procedure did not prevent removal of the flare knock-out drum emptying facility
- The instrument maintenance system did not prevent 40% of instruments from being defective

Lesson 3 - Training and competence
Training should include:
- Clear guidance on how to manage unplanned events
- Clear guidance on when to initiate emergency plant shutdown
- Clear authority to initiate shutdown

Lesson 4
Ultimate plant safety must not depend on operator response.


APPENDIX 2
A case study of improvements identified and made to a specific - and complex - alarm system

Introduction and background
In this case a control room had been designed and set up to control a large number of widely located and linked units. The problem identified by HSE Inspectors was that the operators were faced with a very user-unfriendly set of displays, with a very large number of undifferentiated alarms being very poorly presented.

The designers had set out with the best intentions but, in seeking to alarm virtually anything that moved in the system, they had not considered the operators' needs in the control room and had become progressively blind to the main object of the exercise - to provide effective control. The installers and commissioning engineers did not consider this to be a problem because of their detailed familiarity with the system from first design onwards, but since the operators were not involved in the design, their different needs were not taken properly into account. Operators were faced with long lists of alarms that they had to scroll through constantly. The actual alarms were hard to pick out, being identified only by a long number string, and the safety critical alarms weren't differentiated from the rest. Many of the 'alarms' weren't true alarms, i.e. no defined operator response was required. Some of them repeated, so filling up the screen display. To add to the operators' difficulties, the system required them to both accept and clear many of the alarms.


In this situation the operators, being human, inventive and wishing to get the job done, found their own shortcuts and methods to try to cope. Given that these methods were unsystematic and resulted in un-assessed real changes, they introduced the possibility of further errors into the system. For example, operators were routinely 'shelving', or otherwise 'fixing', alarms to get them off screen so that they could focus better on what they thought were the key ones.

Some key lessons
Perhaps one key lesson here for designers and users is that the needs of those installing and commissioning such a complex system are not the same as the needs of the final users of the system. Traditionally those installing and commissioning systems like all key process stages or functions to be monitored and so they are often routinely alarmed too. However, the alarm system needs of the user (including the operators) will often be different and, if the users haven't been involved in the design process (or at least considered), further problems may arise. When the project is handed over this can create real difficulties for the operators in control when they are left with a system that is over-complex for their day-to-day production or other control needs. If the design of a new system is used as an opportunity to solve a wide range of other related or unrelated problems (the 'bandwagon' effect) then the end result could well be messy if key aims and objectives are not clearly set from the start and then implemented. In particular both human factors and human reliability need to be considered.

Identified improvements
These ranged from some very simple additions to the screen display, to wider SMS solutions. For example:

Specific
- Navigation - provision of a button to allow operators to return instantly to the top (now priority) page of the alarm list being viewed.
- Assessment of the actual hazards so that safety critical alarms could be identified and prioritised.
- Colour coding of alarms to reflect their importance and type.
- Provision of a priority filter list to allow operators to pick out key alarms in the event of an alarm flood and to allow them to shift easily between alarm categories.
- A review and subsequent reduction in the number of alarms.
- Operators could no longer 'shelve' alarms (suppress them or 'hand-dress' them, i.e. replace them with a fixed value) without going through EEMUA Guide safeguards, i.e.:
  - provision of quick and easy access to view the shelved alarms and print them off
  - unshelving by the operators is made easy
  - adequate shift handover arrangements for shelved alarms
  - operator training on shelving implications and subsequent monitoring
  - prevention of one operator being able to shelve an alarm in an area also controlled by another operator without that operator being made aware of it
- Removal of 'alarms' which in fact were status indicators only or which were not intended for action by the control room operators, i.e. does the alarm require a defined operator response or not?
- Elimination of alarm list flooding with repeating alarms - introduction of single line annunciation.
- The previous requirement to both accept all alarms and accept their later clearance was removed (except in some carefully-defined special cases) so that clearance no longer routinely required an operator response.
- Where alarms were both accepted and cleared they were now prevented from just disappearing off the alarm list until they are 'repacked' by the operator. A repack facility was introduced to avoid alarm messages moving up or down the alarm list like this without the operator requesting it, so avoiding the possibility of the operator not being able to find the alarm again.
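Two of the specific improvements above, single line annunciation of repeating alarms and a priority filter list, can be illustrated with a short sketch. This is not the system used in the case study: the class, function names, alarm tags and priority labels are all hypothetical, and a real alarm list would also track times, states and acknowledgement.

```python
from dataclasses import dataclass


@dataclass
class AlarmLine:
    """One line on the alarm list (hypothetical structure)."""
    tag: str
    priority: str
    count: int = 1  # how many times this alarm has activated


def annunciate(events):
    """Show each alarm tag on a single line, counting repeats.

    `events` is an iterable of (tag, priority) pairs in arrival order.
    A repeating alarm increments the count on its existing line instead
    of adding new lines and flooding the list.
    """
    lines = {}
    for tag, priority in events:
        if tag in lines:
            lines[tag].count += 1
        else:
            lines[tag] = AlarmLine(tag, priority)
    return list(lines.values())


def priority_filter(lines, priority):
    """Return only the alarm lines at the requested priority."""
    return [line for line in lines if line.priority == priority]


# Hypothetical alarm flood: one safety critical alarm and a repeating
# low priority alarm.
events = [
    ("LT-101 high level", "safety critical"),
    ("FT-7 low flow", "low"),
    ("FT-7 low flow", "low"),
    ("FT-7 low flow", "low"),
]
alarm_list = annunciate(events)
for line in priority_filter(alarm_list, "safety critical"):
    print(line.tag, line.count)
```

Keeping one line per alarm and filtering by priority are independent steps here, so either could be adopted on its own.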


General
- Adequate monitoring and analysis of the alarm system and operator handling was put in place so that further improvements could be made over time. Recommendations were also made for some longer-term fundamental redesign of key parts of the system.
- Competencies were reviewed and further targeted training introduced, together with suitable 'refresher' training and monitoring of performance.
- A formal change procedure was introduced which included the operators.
- Procedures were reviewed, and new ones were introduced following wide consultation and were tested and monitored for usability.
- The link back to the safety report was reviewed and the consequences of operator error where high reliance was placed on operator response were reassessed. The results were fed back into the redesign and improvement process.
- An HF 'champion' (a senior manager) was appointed to provide a focus and management drive to ensure the recommendations were implemented, with specific milestones being set.
- The basic ergonomics of the control room, and operator control over them (e.g. heat, light and ventilation as well as layout, comfort etc.), were reviewed.
- The company's project management process was reviewed to ensure that in future it worked better, e.g. some key issues were identified at early stages of the project but were not then dealt with.
- Rostering, including shift patterns and lengths, was reviewed to consider potential fatigue problems, e.g. some operators were working 7 nights in succession.
