www.instmc.org.uk publications@instmc.org.uk
Introduction
This paper is based on a presentation given at the IBC Alarms Conference, June 2000. Its purpose is to provide practical information, examples and advice for designers and users of alarm systems, and to set alarm handling problems in a safety management system (SMS) context. Appendix 1 gives a summary of the Texaco incident and Appendix 2 reports on a case study. Human factors (HF) are often described as the thread that runs through any safety management system. In this paper we will consider:
- The continuing HF problems in alarm systems and their solutions;
- How alarms are actually used (not necessarily how designers think they are used!);
- How the competency of designers, installers and operators may be established; and
- Learning the lessons from previous accidents, incidents and near-misses.
HSE considers alarm handling to be a continuing major safety issue. There is no room for complacency, even though 8 years have passed since the Texaco incident. Incidents involving alarm systems are still occurring, and there are still significant problems with alarm systems on some major hazard sites. Training, competency and user support are still key areas, and users and designers need to be aware of each other's requirements. However, solutions are available by original design or by modification, and practical guidance is also available. In 1999, HSE published its revised guidance on human factors, HSG 48 'Reducing error and influencing behaviour' [1]. This is HSE's core all-industry guidance and provides a simple and practical introduction to the subject. In the same way the Engineering Equipment and Materials Users Association (EEMUA) Guide [2] provides generic all-industry technical guidance. In both cases the industry sector concerned can use the guidance to get started. We expect companies to review their alarm systems from a human factors viewpoint and to seek continuous improvement.
www.safetyusersgroup.com
www.hse.gov.uk info@hse.gov.uk
Texaco
This incident and its lessons have been well documented [4], and a summary of the incident is given in Appendix 1. The key Texaco problems included:
- Alarm floods
- Too many standing alarms
- Control displays and alarms which did not aid operatives
- No clear process overview to help diagnosis
- Alarms which presented faster than they could be responded to
- 87% of the 2040 alarms displayed as high priority, despite many being informative only
- Safety critical alarms were not clearly distinguished.
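The priority figure quoted for Texaco is the kind of check that can be run against any alarm configuration export. The following is a minimal sketch only: the function name `priority_shares` and the two-level split into "high" and "low" are assumptions made for illustration, since the source gives only the overall count (2040) and the high-priority share (87%).

```python
from collections import Counter

def priority_shares(priorities):
    """Return the fraction of configured alarms at each priority level."""
    counts = Counter(priorities)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {level: count / total for level, count in counts.items()}

# Hypothetical configuration reproducing the Texaco proportions:
# 2040 configured alarms, 87% of them set to "high". Treating the
# remainder as a single "low" level is an assumption for illustration.
high = round(0.87 * 2040)
configured = ["high"] * high + ["low"] * (2040 - high)

shares = priority_shares(configured)
print(high)                                                   # 1775
print(f"high: {shares['high']:.0%}, low: {shares['low']:.0%}")  # high: 87%, low: 13%
```

A distribution this heavily weighted towards the top level means the priority field no longer helps the operator choose what to deal with first.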
The other key lesson was that the management of the alarm system cannot be successfully dealt with in isolation from the overall safety management system (SMS) context. In other words this is not just a technical issue and both users and designers/installers need to keep this in mind. One obvious question, but one which designers have failed to take account of in the past, is how alarms are actually used by operators on site. The SMS failures identified after the Texaco incident included:
- Deficiencies in the plant modification procedure
- An inadequate instrument maintenance system
- Inadequate training and competence of operators
- A lack of clear guidance on managing unplanned events and when to initiate emergency plant shutdown
- A lack of clear authority to initiate shutdown.
Ultimately, plant safety should not depend on an operator response to an alarm. In industry the chances of an operator failing to act in such circumstances can be very much higher than one may think. In two recent LPG releases from road tankers, the operators present failed to use the tanker emergency stop buttons - there were a variety of contributory reasons in each case (not least of which was the distraction of being enveloped in a vapour cloud of unignited LPG!). Nevertheless, there are real world effects of an emergency or upset situation on human reliability that a designer or risk assessor may not always take into account.
Reliance on people or automatic systems for safety-critical functions needs to be properly defined at the design stage, and then assured in the same way as any other system, such as quality.
The main findings were:
a) Installation of plant wide control systems with central monitoring (DCS) was closely linked to a proliferation of alarms that were in fact mostly status indicators.
b) Structured assessments of what might go wrong (HAZOPs) generally added extra alarms, again often not true alarms.
c) Corporate standards are available but varied in quality. IEC 61508 was mentioned in some, but this standard contains no specific guidance on human factors or on the overall management and implementation of alarm systems.
d) The sites visited had analysed alarm rates and spurious alarms but they had not established performance standards suitable for monitoring progress. Even in normal operation one plant still experienced peaks of 64 alarms per hour.
e) Simulators were reported - and confirmed - as bringing significant benefits and savings in identifying and remedying potential operator problems.
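Finding (d) concerns alarm rates and the absence of a performance standard to compare them against. As a rough illustration, the sketch below counts activations per clock hour from a list of timestamps and reports the peak. The function names, the sample log and the `target_per_hour` value are all hypothetical; the target in particular is a placeholder for whatever figure a site sets for itself, not a number taken from any guidance.

```python
from collections import Counter
from datetime import datetime, timedelta

def alarms_per_hour(timestamps):
    """Count alarm activations falling in each clock hour."""
    return Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)

def peak_hourly_rate(timestamps):
    """Return the highest number of alarms seen in any single clock hour."""
    return max(alarms_per_hour(timestamps).values(), default=0)

def hours_over_target(timestamps, target_per_hour):
    """List the clock hours whose alarm count exceeds a site-chosen target."""
    counts = alarms_per_hour(timestamps)
    return sorted(hour for hour, n in counts.items() if n > target_per_hour)

# Hypothetical log: 64 activations between 14:00 and 15:00, then 5 in the next hour.
start = datetime(2000, 6, 1, 14, 0, 0)
log = [start + timedelta(seconds=50 * i) for i in range(64)]
log += [start + timedelta(hours=1, minutes=10 * i) for i in range(5)]

print(peak_hourly_rate(log))                       # 64
print(hours_over_target(log, target_per_hour=30))  # only the 14:00 hour
```

Recording the peak alongside the hours that exceed the target gives a figure that can be tracked from one review to the next.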
Amongst the issues arising from the project, it was recognised that there were re-engineering difficulties on existing plants (the 'I wouldn't start from here' syndrome) and associated inspection difficulties for HSE. The project also confirmed that many companies would initially find it difficult to justify the adequacy of their alarm systems, where this was required by the Control of Major Accident Hazards (COMAH) Regulations [6]. There was a general lack of human factors expertise on site or available to the sites, and the lack of clear standards and benchmarks was confirmed. Overall the project showed that, for critical safety-related activity, a good risk assessment was the key starting point.
Step 1: Find out if you have a problem
The advice is to take some measurements (metrics) and talk to those involved in the process, including operators and line managers. The perception may be that all is well, but are there systems in place that would reliably confirm this? It is important to remember that absence of evidence is not evidence of absence in this or any other case. Given the frequency of major accidents, a 10-20 year record of no incidents on one site is not on its own sufficient reason for complacency. And how are new alarms added or existing ones modified? Is the design to a standard that takes account of human limitations?
Step 2: Decide what to do and take action
Form a representative team to progress the issues. Implement some quick wins, both to deal with the problems and to give positive feedback to those involved that action is taking place. Establish operator competency and identify training needs. Provide support (e.g. on- or off-line help including diagnostics, clear navigation routes around the screen pages) to help operators respond effectively to alarms, including in emergencies.
Step 3: Manage and check what has been done
Make the approach systematic and part of the SMS. This is not an add-on. Draw up an alarm strategy and a standard for the site. Carry out audits and review the results. Repeat the baseline measures to check progress. Get it right first time where the opportunity is offered, e.g. through the site purchase specification system and the change management procedures.
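The baseline measurement in Step 1 and the repeat measurement in Step 3 can be as simple as counting activations per alarm tag over comparable periods. The sketch below is one possible shape for this, not a prescribed method: the function names, the tag names and the counts are invented for illustration.

```python
from collections import Counter

def measure(activations, top_n=2):
    """Summarise a period of alarm activations (one tag string per activation)."""
    counts = Counter(activations)
    return {"total": sum(counts.values()), "most_frequent": counts.most_common(top_n)}

def relative_change(before, after):
    """Fractional change in total activations between two measurements."""
    if before["total"] == 0:
        raise ValueError("baseline period contains no activations")
    return (after["total"] - before["total"]) / before["total"]

# Hypothetical data: a baseline week, then the same week after some quick wins
# aimed at the most frequent alarm.
baseline = measure(["LT-101"] * 40 + ["PT-202"] * 10 + ["FT-303"] * 5)
repeat = measure(["LT-101"] * 8 + ["PT-202"] * 10 + ["FT-303"] * 4)

print(baseline["most_frequent"])                    # [('LT-101', 40), ('PT-202', 10)]
print(f"{relative_change(baseline, repeat):+.0%}")  # -60%
```

The list of most frequent tags suggests where the quick wins in Step 2 are likely to be found, and the repeat measurement shows whether they worked.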
HSE will consider enforcement where there is:
- No fundamental review and/or identification of safety critical alarms; and
- High reliance on the operator to react or respond to alarms to prevent a major accident.
But, as always, a key consideration will be proportionality. Any action should be proportionate to the hazards and risks in each case. In the same way we would expect companies to have targeted their own efforts on the key areas identified, for example by risk assessments including those in COMAH safety reports.
Users
HSE would expect to see:
- A policy that recognises human factors in alarm handling as a management issue
- A logical process in train which has assessed/is assessing the current situation
- A sensible action programme to deal with issues found, and
- For COMAH, a rigorous demonstration that human factors have been addressed adequately where operator response to alarms is claimed as defence against major accidents.
Users should evaluate, prioritise and modify existing alarm systems, taking account of the degree of risk to target their efforts. They should ensure new designs meet EEMUA standards and take into account human limitations. The alarm system should be managed as an integral part of the SMS, and as part of a continuous improvement programme. Remember that, no matter how well designed, no alarm system can operate effectively if the work loading and staffing levels do not take account of all foreseeable conditions (from normal, through upset, shutdown and start up, to emergency) and if operators are not competent or if their needs have not been considered in the new design or modification. Users also need to consider shift lengths and patterns, and fatigue factors - otherwise there may be no response when one is most needed.
Designers
HSE would expect designers to follow the EEMUA Guide principles, with the SMS/Safety Report context raised and considered as part of the overall solution for a proposed new or modified alarm system. Remember that the needs of installers and commissioning engineers will not be the same as those of the final users of the system, e.g. operators and maintainers.
Designers - have you:
- Considered human factors and ergonomics explicitly in design and how you can better incorporate them in the design process?
- Selected a priority area for improvement in your design approach?
- Involved the users?
References
1. Anon. 'Reducing Error and Influencing Behaviour', HSG 48, HSE Books, 1999, ISBN 0 7176 2452 8.
2. Anon. 'Alarm Systems, A Guide to Design, Management and Procurement', EEMUA Publication 191, 1999, ISBN 0 85931 076 0. Available from The Engineering Equipment and Materials Users Association, 54 Beech Street, London EC2Y 8AD.
3. Bransby and Jenkinson. 'The management of alarm systems', CRR 166, HSE Books, 1998, ISBN 0 7176 1515 4.
4. Anon. 'The explosion and fires at the Texaco Refinery, Milford Haven, 23 July 1994', HSE Books, 1997, ISBN 0 7176 1413 1.
5. Anon. 'Better Alarm Handling', HSE Information Sheet, Chemicals Sheet No 6, HSE Books. Also available on HSE's website at www.hse.gov.uk/pubns/chi6.pdf
6. Anon. 'A guide to the Control of Major Accident Hazards Regulations 1999', L111, HSE Books, 1999, ISBN 0 7176 1604 5.
APPENDIX 1
Texaco Refinery explosion and fires 1994: Human factors in the incident
The event
Twenty tonnes of hydrocarbon were released and exploded when a slug of liquid was sent through the flare system pipeline, which failed. The site suffered severe damage, and UK refinery capacity was significantly affected. Only luck prevented multiple deaths: it was a Sunday, and some people had left the area just before the explosion.
The following description of the event has been considerably simplified to highlight the human factors aspects of the case. The incident involved three interconnected process vessels. A loss of feed to vessel 1 caused valve A to close to prevent the vessel being emptied. As vessel 2 emptied, valve B closed, trapping the remaining liquid. As heat was still being applied, this liquid vaporised, and the vessel vented into the flare system through the flare stack knock-out drum, which catches liquid to prevent it going to flare. Meanwhile, the feed to vessel 1 had been restored, and valve A was opened. This should have caused valve B to open, but this did not occur. The operators were aware that vessel 2 was still overfilling, so they opened valve C to provide another route out of that vessel. This resulted in a high liquid level in the flare stack knock-out drum. Due to a previous modification, there was no facility to pump out the knock-out drum quickly. By this time, the operators were concentrating on the screens that showed the problems in vessels 1 and 2, and were not being helped by the flood of alarms being generated. The combination of a high liquid level in the knock-out drum and vessel 2 venting into the flare system again caused a slug of liquid to be carried through the knock-out drum and into the flare line, which collapsed at a weak point.
Consequences
Fatal injuries were avoided only by luck, e.g. contractors in a van were about to enter the area when the explosion happened; the concrete roof of a building fell in minutes after people had left it. The rebuilding costs were £48,000,000. Texaco and Gulf Oil were prosecuted by HSE and fined £370,000. UK refining capacity was significantly affected.
Lesson 1 - Alarm System
The control displays and alarms did not aid operatives. A process overview would have helped diagnosis. The alarms appeared faster than they could be responded to, and key alarms were missed in the flood. 87% of the 2040 alarms displayed as "high" priority, despite many being informative only, so safety critical alarms were not distinguishable from the rest.
Lesson 2 - Safety Management System (SMS)
SMS failures included:
- The plant modification procedure did not prevent removal of the flare knock-out drum emptying facility
- The instrument maintenance system did not prevent 40% of instruments from being defective
Lesson 3 - Training and competence
Training should include:
- Clear guidance on how to manage unplanned events
- Clear guidance on when to initiate emergency plant shutdown
- Clear authority to initiate shutdown
APPENDIX 2
A case study of improvements identified and made to a specific - and complex - alarm system
Introduction and background
In this case a control room had been designed and set up to control a large number of widely located and linked units. The problem identified by HSE Inspectors was that the operators were faced with a very user-unfriendly set of displays, with a very large number of undifferentiated alarms being very poorly presented.
The designers had set out with the best intentions but, in seeking to alarm virtually anything that moved in the system, they had not considered the operators' needs in the control room and had become progressively blind to the main object of the exercise - to provide effective control. The installers and commissioning engineers did not consider this to be a problem because of their detailed familiarity with the system from first design onwards, but since the operators were not involved in the design, their different needs were not taken properly into account. Operators were faced with long lists of alarms that they had to scroll through constantly. The actual alarms were hard to pick out, being identified only by a long number string, and the safety critical alarms weren't differentiated from the rest. Many of the 'alarms' weren't true alarms, i.e. no defined operator response was required. Some of them repeated, so filling up the screen display. To add to the operators' difficulties, the system required them to both accept and clear many of the alarms.
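The test applied in this case study, whether an 'alarm' has a defined operator response, lends itself to a simple screening pass over the alarm configuration. The sketch below assumes a hypothetical configuration format in which each entry is a dictionary with an optional `operator_response` field; the tag names and the function name are invented for illustration.

```python
def split_by_response(config):
    """Separate entries with a defined operator response from status indicators."""
    true_alarms = [entry for entry in config if entry.get("operator_response")]
    status_only = [entry for entry in config if not entry.get("operator_response")]
    return true_alarms, status_only

# Hypothetical configuration entries; a missing or empty response marks
# the entry as a candidate status indicator rather than a true alarm.
config = [
    {"tag": "LSH-12", "operator_response": "Open drain valve and check pump"},
    {"tag": "PUMP-3-RUNNING", "operator_response": None},
    {"tag": "VALVE-9-OPEN"},
]

true_alarms, status_only = split_by_response(config)
print([entry["tag"] for entry in true_alarms])  # ['LSH-12']
print([entry["tag"] for entry in status_only])  # ['PUMP-3-RUNNING', 'VALVE-9-OPEN']
```

Entries in the second list are candidates for review, not automatic removal; deciding whether each one should be removed, moved to a status display or given a proper response remains a job for the site team, including the operators.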
In this situation the operators, being human, inventive and wishing to get the job done, found their own shortcuts and methods to try to cope. Because these methods were unsystematic and resulted in real but un-assessed changes, they introduced the possibility of further errors into the system. For example, operators were routinely 'shelving', or otherwise 'fixing', alarms to get them off screen so that they could focus better on what they thought were the key ones.
Some key lessons
Perhaps one key lesson here for designers and users is that the needs of those installing and commissioning such a complex system are not the same as the needs of the final users of the system. Traditionally those installing and commissioning systems like all key process stages or functions to be monitored, and so they are often routinely alarmed too. However, the alarm system needs of the user (including the operators) will often be different and, if users haven't been involved in the design process (or at least considered), further problems may arise. When the project is handed over this can create real difficulties for the operators in control, who are left with a system that is over-complex for their day-to-day production or other control needs. If the design of a new system is used as an opportunity to solve a wide range of other related or unrelated problems (the 'bandwagon' effect), then the end result could well be messy if key aims and objectives are not clearly set from the start and then implemented. In particular both human factors and human reliability need to be considered.
Identified improvements
These ranged from some very simple additions to the screen display to wider SMS solutions. For example:
Specific
- Navigation: provision of a button to allow operators to return instantly to the top (now priority) page of the alarm list being viewed.
- Assessment of the actual hazards so that safety critical alarms could be identified and prioritised.
- Colour coding of alarms to reflect their importance and type.
- Provision of a priority filter list to allow operators to pick out key alarms in the event of an alarm flood and to allow them to shift easily between alarm categories.
- A review and subsequent reduction in the number of alarms.
- Operators could no longer 'shelve' alarms (suppress them, or 'hand-dress' them i.e. replace them with a fixed value) without going through the EEMUA Guide safeguards, i.e.
  - provision of quick and easy access to view the shelved alarms and print them off
  - unshelving by the operators is made easy
  - adequate shift handover arrangements for shelved alarms
  - operator training on shelving implications and subsequent monitoring
  - prevention of one operator being able to shelve an alarm in an area also controlled by another operator without that operator being made aware of it
- Removal of 'alarms' which in fact were status indicators only or which were not intended for action by the control room operators, i.e. does the alarm require a defined operator response or not?
- Elimination of alarm list flooding with repeating alarms by the introduction of single line annunciation.
- The previous requirement to accept all alarms and also accept their later clearance was removed (except in some carefully-defined special cases), so that clearance no longer routinely required an operator response.
- Where alarms were both accepted and cleared, they were now prevented from just disappearing off the alarm list until they were 'repacked' by the operator. A repack facility was introduced to avoid alarm messages moving up or down the alarm list without the operator requesting it, so avoiding the possibility of the operator not being able to find the alarm again.
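Two of the specific improvements above, single line annunciation and the priority filter list, can be illustrated with a short sketch. This is not how the system in the case study was implemented; the class, the function names, the priority labels and the sample tags are all assumptions chosen to show the idea.

```python
from dataclasses import dataclass

@dataclass
class AlarmLine:
    tag: str
    priority: str   # hypothetical labels such as "critical" or "low"
    message: str
    count: int = 1  # number of activations represented by this single line

def annunciate(lines, tag, priority, message):
    """Add an activation to the alarm list.

    A repeat of a tag already on the list increments that line's count
    instead of adding another line, so repeating alarms cannot flood the list.
    """
    for line in lines:
        if line.tag == tag:
            line.count += 1
            return lines
    lines.append(AlarmLine(tag, priority, message))
    return lines

def filter_by_priority(lines, priority):
    """Return only the lines at the requested priority level."""
    return [line for line in lines if line.priority == priority]

# Hypothetical activations: one low-priority alarm repeating three times
# around a single critical alarm.
alarm_list = []
annunciate(alarm_list, "LT-101", "low", "Tank level high")
annunciate(alarm_list, "LT-101", "low", "Tank level high")
annunciate(alarm_list, "LSH-7", "critical", "Knock-out drum level high")
annunciate(alarm_list, "LT-101", "low", "Tank level high")

print(len(alarm_list))                                                 # 2
print(alarm_list[0].count)                                             # 3
print([line.tag for line in filter_by_priority(alarm_list, "critical")])  # ['LSH-7']
```

With four activations the list holds only two lines, and the filter lets the operator pick out the critical one directly during a flood.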
General
- Adequate monitoring and analysis of the alarm system and operator handling was put in place so that further improvements could be made over time.
- Recommendations were also made for some longer-term fundamental redesign of key parts of the system.
- Competencies were reviewed and further targeted training introduced, together with suitable 'refresher' training and monitoring of performance.
- A formal change procedure was introduced which included the operators.
- Procedures were reviewed, and new ones were introduced following wide consultation and were tested and monitored for usability.
- The link back to the safety report was reviewed and the consequences of operator error where high reliance was placed on operator response were reassessed; the results were fed back into the redesign and improvement process.
- An HF 'champion' (a senior manager) was appointed to provide a focus and management drive to ensure the recommendations were implemented, with specific milestones being set.
- The basic ergonomics of the control room, and operator control over them (e.g. heat, light and ventilation as well as layout, comfort etc), were reviewed.
- The company's project management process was reviewed to ensure that in future it worked better, e.g. some key issues had been identified at early stages of the project but were not then dealt with.
- Rostering, including shift patterns and lengths, was reviewed to consider potential fatigue problems, e.g. some operators were working 7 nights in succession.