
Contents

Copyright March, April 1996, February 1997, January 1998, April-July 1999, June 2000, January 2001, May 2001, February 2002, March 2004, April 2004. Risk & Reliability Associates Pty Ltd, Consulting Engineers.

5th Edition cover by Peter Anderson.
5th Edition co-ordination and review by Kris Francis.
5th Edition editing by Cherilyn Tillman and Bob Browning.
Printed and bound in Australia by Imscam Pty Ltd, Melbourne.

This text is copyright. Apart from any fair dealing for the purpose of private study, research, criticism or review or as otherwise permitted under the Copyright Act, no part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means electronic, optic, mechanical, photocopying, recording or otherwise without the prior written permission from the publisher, Risk and Reliability Associates Pty Ltd.

ISBN 0-9585241-3-0
RRP AUD $298.00 (including GST). Postage and handling extra.

Published by:
Risk & Reliability Associates Pty Ltd
ACN: 072 114473
ABN: 98 072114473
Consulting Engineers
Level 2, 56 Hardware Lane
MELBOURNE AUSTRALIA 3000
e-mail: publications@r2a.com.au
web: http://www.r2a.com.au
fax: +61 3 9670 5278
voice: +61 3 9602 4747

Also in Sydney and Wellington.

This text is intended to provide general information concerning the concepts and applications of risk and reliability theory. The text is used by R2A in its training courses on risk and reliability assessment. The examples and templates are provided as illustrations of the analytical tools used in assessing and managing risk. They should not be used as a substitute for obtaining professional advice or assistance. The authors accept no responsibility for any errors or omissions in the material, or for the results of any actions taken as a result of using these examples or templates.

Risk & Reliability Associates Pty Ltd

R2A Document Control
Risk & Reliability: An Introductory Text

Edn.   Date       Section   Issue/Nature of Revision
1.0    04/96                First Edition
2.0    02/97                Second Edition
3.0    01/98                Third Edition
3.1    07/99                Third Edition, Revised
3.2    06/00                Third Edition, Second Revision
3.3    01/01                Third Edition, Third Revision
4.0    02/02                Fourth Edition
5.0    02/03/04             Fifth Edition
       15/03/04             Typos and layout
       23/03/04             Chapter 16 & Index
       04/04/04             Chapters 17 & 18
       19/04/04             Typos & Index

Prepared: RMR RMR RMR RMR GEF LS GEF, CJT, RWB RMR, KJA, CJT, RWB RMR CJT, RWB RWB RWB
Reviewed: KJA KJA RMR RMR RMR CJT, RWB KNF RMR MK, FS, RMR RMR

Contributors to earlier editions and revisions include: Teresa Alam, John Bellhouse, Keith Hart, Matthew Lambert, Simon Meiers, Paul Rees, PM Strickland.

TABLE OF CONTENTS

Preface to the 5th Edition
A Short Dictionary of Risk & Reliability Terms and Acronyms


PART 1 GENERAL PRINCIPLES

1. INTRODUCTION TO RISK AND RELIABILITY CONCEPTS
1.1 The Nature of Risk
1.2 Types of Risk
1.3 Risk Management Evolution
1.4 Historical Perspective of Risk
1.5 Reliability
1.6 Quality

2. RISK PARADIGMS & MODELS
2.1 The Rule of Law
2.2 Insurance
2.3 Asset Management
2.4 Threats and Vulnerabilities
2.5 Risk as Variance
2.6 Best Practice
2.7 Simulation
2.8 Culture
2.9 Paradigm Integration
2.10 Risk Models

3. RISK AND GOVERNANCE
3.1 Risk Management's Role in Good Governance
3.2 Corporate Governance Systems
3.3 Origins of the Good Governance Movement
3.4 The Rise of the Risk Society
3.5 Governance and Non-Financial Risk
3.6 Public Sector Governance and Risk
3.7 Risk and Corporate Citizenship
3.8 Fallout Severity
3.9 Basic Principles of Good Corporate Urban Governance

4. LIABILITY
4.1 Criminal vs Civil Standard
4.2 Common Law Criteria
4.3 On Juries and Justice
4.4 Due Diligence
4.5 Safety Cases
4.6 Adversarial Legal System Contradictions
4.7 Risk Auditing Systems

5. CAUSATION
5.1 Paradigms
5.2 Biological Metaphors
5.3 Discrete State Concepts
5.4 Time Sequence
5.5 Energy Damage
5.6 Energy Damage Models
5.7 Latent Conditions

6. RISK CRITERIA
6.1 Legal Criteria
6.2 Individual Risk Criteria
6.3 Societal Risk Criteria
6.4 Environmental Risk Criteria
6.5 Insurance Criteria
6.6 Ethical Criteria


PART 2 TECHNIQUES

7. TOP DOWN TECHNIQUES
7.1 SWOT Assessments
7.2 Upside and Downside Risk
7.3 Vulnerability Assessments
7.4 Enterprise Risk Profiling
7.5 Project Risk Profiling

8. RANKING TECHNIQUES
8.1 Risk Registers
8.2 Ranking Acute OH&S Hazards
8.3 Ranking Property Loss Prevention Hazards
8.4 Integrated Investment Ranking

9. MODELLING TECHNIQUES
9.1 Trees
9.2 Blocks
9.3 Integrated Presentation Models
9.4 Common Cause Failures
9.5 Human Error Rates
9.6 Equipment Fault Rates
9.7 System Safety Assurance

10. BOTTOM UP TECHNIQUES
10.2 RCM
10.3 HazOps
10.4 Common Mode Failures
10.5 Risk Management and the Project Life Cycle
10.6 QRA
10.7 HACCP

11. GENERATIVE TECHNIQUES
11.1 James Reason et al
11.2 Transparent Independent Rapid Risk Reporting
11.3 Generative Interview Technique
11.4 Generative Solutions Technique

12. RISK & RELIABILITY MATHEMATICS
12.1 Discrete Event Mathematics
12.2 Breakdown Failure Mathematics
12.3 State Theory Mathematics
12.4 Fractional Dead Time Mathematics


PART 3 THEMES, APPLICATIONS AND CASE STUDIES

13. PROCESS INDUSTRY MODELLING
13.1 Safety Cases
13.2 Context (Top Down)
13.3 Quantitative Risk Assessment (QRA)
13.4 Fire Modelling
13.5 Pool Fires
13.6 Jet Flames
13.7 Explosions
13.8 Toxic Gas Clouds
13.9 Fire Safety Studies
13.10 Risk Criteria Used in Australia and New Zealand

14. CRISIS MANAGEMENT
14.1 Intention
14.2 Lessons in Fallout Management
14.3 Design Stage
14.4 Case Studies
14.5 Conclusion

15. INDUSTRY BASED CASE STUDIES
15.1 Airspace Risk Assessment
15.2 Train Operations Rail Model
15.3 Fire Risk Management (in buildings)
15.4 Transmission Line Risk Management
15.5 Bushfire Risk Management
15.6 Tunnel Risk Management

16. OCCUPATIONAL HEALTH & SAFETY
16.1 Legislative Framework
16.2 OH & S Risk Assessment
16.3 Performance Indicators
16.4 Information Structures
16.5 Audit & Safety Management Systems

17. FINANCIAL RISK
17.1 to 17.5 Risk and Opportunity Terms Utility and Risk Models Market Risk Mathematics

18. SECURITY
18.1 Security and Risk Management
18.2 Security Terms
18.3 Basic Elements of Security Management
18.4 The Terrorist Threat


Preface to the 5th Edition

This is the 5th Edition of Risk and Reliability - An Introductory Text. Risk and Reliability Associates Pty Ltd published the first edition of this Text in April 1996.

Presently the Text has three parts. Parts 1-2 are based on the very successful 2-day risk management short courses presented by R2A director Richard Robinson for EEA (Engineering Education Australia). Part 3 summarises published R2A practice experience. R2A's intention is to extend the Text to four parts so as to include material based on the System Safety Assurance Course presented by R2A Director Kevin Anderson for EEA. This course presently uses the 4th Edition as background reading, but work on the 6th Edition is scheduled for later in 2004.

The evolving nature of risk and risk management in the contemporary globalising environment, sometimes described as the Risk Society, necessitates frequent revision and additions. The recent spate of high profile local and overseas corporate failures, for example, has created unprecedented interest in corporate governance. The evident vulnerabilities flowing from large-scale technology require scrutiny with respect to both accidental and deliberate actions. And liability is increasingly ubiquitous.

An integration of top down and bottom up risk management concepts and techniques, as explained in Parts 1-2, becomes necessary to cope with the widening range and severity of modern risk. Part 3 comprises technical explanations of the practical applications of these concepts and techniques. The addition of Part 4 to the planned 6th Edition will address risks resulting from the rise of computer systems, and how, in the context of human frailty, such risks can be managed.

R W Browning
Hardware Lane, Melbourne
March 2004


A Short Dictionary of Risk & Reliability Terms and Acronyms

The dictionary below defines the usage of key terms in the R2A Text. Given the multi-disciplinary nature of risk management, different specialist groups often attribute different meanings to commonly used terms, and different terms are often used for similar or near identical concepts. Items underlined are referenced as a separate entry in the R2A dictionary. For simplicity, acronyms have been included rather than giving them a separate listing. The list is adapted from an earlier list presented in a paper by R M Robinson and D B L Viner (1983).

Accountability: The property that ensures that the actions of an entity can be traced.

ALARA: As Low As Reasonably Achievable.

ALARP: As Low As Reasonably Practicable.

Algorithm: An explicit and finite step-by-step procedure for solving a problem or achieving a required end.

Asset: In engineering and commerce, usually a capital cost item. In security, insurance and loss control, usually refers to an item that if (accidentally) lost would cause a loss.

Audit: An inspection or checking of methods of doing business.

Audit Trail: Data collected and potentially used to facilitate an audit.

Availability: The ratio of the total system or entity up time to system or entity elapsed time, the latter being the sum of the total up time and down time. It is therefore a function of reliability and repair time.

Business Interruption: In insurance terms, the loss of profits over a defined period, typically a year; otherwise any production or sales stoppage.

Common Law: The unwritten law derived from the traditional law of England as developed by judicial precedence, interpretation, expansion and modification (Butterworths (1998). Concise Australian Legal Dictionary. Butterworth, Australia).

Common Mode Failure: Common Mode Failures refer to the simultaneous failure of multiple components or systems due to a single, normally external, cause such as an earthquake or fire. The term is used to distinguish these from discrete failures of individual components or systems due to a defect arising locally within that component or system. In commercial terms it refers to threats whose occurrence would simultaneously affect multiple inputs to any equation, for example, the advent of a third world war, change in interest rates, raw material sources and the like.

Consequence/s: The actual or potential degree of severity of loss or gain.


Controls: The most common term used in safety; in this context it means to hold in check or to restrain. It encompasses a large range of measures taken to reduce the likelihood and consequences of adverse outcomes. Controls can encompass both protection and precautions. For example, personal protective equipment is generally protection. The usual hierarchy of controls is:
  - Elimination, that is, removal of the hazard or risk
  - Engineering controls, that is, those that design out the hazard or reduce it
  - Substitution of a less hazardous substance or equipment or process
  - Administrative controls such as job rotation to reduce exposure time to the hazard
  - Personal protective equipment, for example, dust masks, hearing protectors, gloves etc

Critical Control Point (CCP): A point, step or procedure at which control can be applied and a food safety hazard can be prevented, eliminated or reduced to acceptable levels.

Damage Control: Procedures designed to minimise the severity of loss.

[Diversity]: The same performance of a function by two or more independent and dissimilar means (of particular reference to software) (Smith D J (1993). Reliability, Maintainability and Risk. Practical Methods for Engineers. 4th Edition. Butterworth Heinemann, Oxford).

Due Diligence: A minimum standard of behaviour involving a system which provides against contravention of relevant regulatory provisions and adequate supervision ensuring that the system is properly carried out (Butterworth (1998). Concise Australian Legal Dictionary. Butterworth, Australia). A statutory defence to a charge of causing or permitting environmental harm or pollution (Butterworth (1998). Concise Australian Legal Dictionary. Butterworth, Australia).

Engineering: Those activities devoted to changing the material world to a desired state (Robinson Richard M (1981). An Outline of the Philosophy of Engineering and its Consequences, General Engineering Transactions, Engineers Australia, Vol. GE5, No.1, July 1981 pp.35-41).

ERA: Environmental Risk Assessment.

ERRF: External Risk Reduction Facility.

Engineers Australia: The trading name of The Institution of Engineers, Australia.

Environmental Hazard: An event or continuing process, which if realised, will lead to circumstances having a potential to degrade, directly or indirectly, the quality of the environment in the short or long term (Wright N H (1993). Development of Environmental Risk Assessment (ERA) in Norway. Norske Shell Exploration and Production).

Environmental Risk: A measure of potential threats to the environment, which combines the probability that the events will cause, or lead to, degradation of the environment and the severity of that degradation (Wright N H (1993). Development of Environmental Risk Assessment (ERA) in Norway. Norske Shell Exploration and Production).

EUC: Equipment Under Control.

Event: An incident or situation, which occurs in a particular place during a particular interval of time (AS 4360:1999 Risk Management).


Event Tree Analysis: A hazard identification and frequency analysis technique, which employs inductive reasoning to translate different initiating events into possible outcomes (AS 3931:1998 Risk Analysis of Technological Systems Applications Guide). These are displayed graphically.

Failure (risk): A cessation of function that has consequences (usually meaning death, injury or damage) beyond a component or entity merely becoming unavailable to perform its function. It can also be referred to as a hazardous failure (Smith D J (1993). Reliability, Maintainability and Risk. Practical Methods for Engineers. 4th Edition. Butterworth Heinemann, Oxford).

Failure (reliability): See Fault.

Fault: The inability of an entity to perform its required function, resulting in unavailability. Non-performance to some defined performance criterion (Smith D J (1993). Reliability, Maintainability and Risk. Practical Methods for Engineers. 4th Edition. Butterworth Heinemann, Oxford). It can also be referred to as a breakdown failure.

FDT: Fractional Dead Time (a form of unavailability). The fraction of any time period that a defence or control system is dead (cannot operate correctly). It is therefore a function of audit frequency and the time to revive/restore the control system.

FMEA: Fault Modes and Effects Analysis (AS 3931:1998 Risk Analysis of Technological Systems Applications Guide).

FMECA: Fault Modes, Effects and Criticality Analysis (AS 3931:1998 Risk Analysis of Technological Systems Applications Guide).

Frequency: The rate at which something occurs per unit time.

FTA: Fault Tree Analysis. A hazard identification and frequency analysis technique, which starts with the undesired event and determines all the ways in which it could occur. These are displayed graphically (AS 3931:1998 Risk Analysis of Technological Systems Applications Guide).

Group Risk: See Societal Risk.

HACCP: Hazard and Critical Control Point analysis. An approach of identifying, evaluating and controlling safety hazards in food processes.

Hazard: A source of potentially damaging energy, which can give rise to a loss; the term is used extensively by engineers and physical scientists. To be compared to a vulnerability. A source of potential harm or a situation with a potential to cause loss (AS 3931:1998 Risk Analysis of Technological Systems Applications Guide and AS 4360:1999 Risk Management). A situation that could occur during the lifetime of a product, system or plant that has the potential for damage to the environment.

Hazard Identification: Process of recognising that a hazard exists and defining its characteristics (AS 3931:1998 Risk Analysis of Technological Systems Applications Guide).

HazOp: HAZard and OPerability study. A formal analysis of a process or plant by the application of guidewords.

HEART: Human Error Assessment and Reduction Technique.


Heuristic: Proceeding to a solution in the absence of an algorithm, by incremental exploration using conceptual devices such as ideal types, models and working hypotheses which are intended to provide solutions rather than explain facts.

HPR: Highly Protected Risk. US engineering term used to describe a level of loss control excellence.

HRA: Human Reliability Assessment.

IChemE: The Institution of Chemical Engineers (UK).

IPENZ: The Institution of Professional Engineers, New Zealand.

Incident: An event or situation, which occurs in a particular place during a particular interval of time, which should provide an alert to the risk management system. This can be a failure of a control system or a near miss.

Individual Risk: The frequency at which an individual may be expected to sustain a given level of harm from the realisation of specified hazards (Institution of Chemical Engineers (1985). Nomenclature for Hazard and Risk Assessment in the Process Industries. IChemE, Rugby, Warwickshire).

Insurance: A method of transferring risk by financial means.

Integrity: A property of an object or data that has not been modified and is fit for the purpose for which it is to be used.

IRR: Internal Rate of Return.

JSA: Job Safety Analysis.

Latent Condition: A failure which is not detected and/or annunciated when it occurs (SAE ARP 4781:1998 Guidelines and Methods for Conducting the Safety Assessment Process on Civil Airborne Systems and Equipment).

Liability: A person's present or prospective legal responsibility, duty, or obligation (Butterworth (1998). Concise Australian Legal Dictionary. Butterworth, Australia).

Life Cycle Costing: Life cycle costing provides a method for determining the total cost of a system over its entire life cycle and is used to establish the cost effectiveness of alternative asset solutions. Cost effectiveness is defined as the ratio of systems effectiveness to life cycle cost (Blanchard (1991). Systems Engineering Management. Prentice Hall; Blanchard and Fabrycky (1990). Systems Engineering and Analysis. 2nd Edition, Prentice Hall International; Aslaksen and Belcher (1992). Systems Engineering. Prentice Hall).

Likelihood: A term to describe the probability or frequency of an occurrence.

Loss: The embarrassment, harm, financial loss, legal or other damage which could occur due to a loss event. Any negative consequence, financial or otherwise (AS 4360:1999 Risk Management), including death, injury, damage, loss or breach of statute. It may lead to a claim and/or court proceedings.

Loss Event: See Occurrence.


Maintainability: The set of technical processes that apply maintainability theory to establish system maintainability requirements, allocate these requirements down to system elements, and predict and verify system maintainability performance (Blanchard and Fabrycky (1990). Systems Engineering and Analysis. 2nd Edition, Prentice Hall International).

MDT: Mean Down Time.

Mitigation: The act of reducing the severity of the potential adverse outcome. In the context of the types of controls listed above, mitigation of risk could be achieved by any bar the first, that is, elimination.

Monitor: To check, supervise, observe critically, or record the progress of an activity, action or system on a regular basis in order to identify change (AS 4360:1999 Risk Management).

Monte-Carlo Simulation: A frequency analysis technique, which uses a model of the system to evaluate variations in input conditions and assumptions (AS 3931:1998 Risk Analysis of Technological Systems Applications Guide).

MORT: Management Oversight and Risk Tree.

MTBF: Mean Time Between Failure.

MTTF: Mean Time To Failure.

MTTR: Mean Time To Repair.

Occurrence: A sequence of events leading to damage or injury.

P&ID: Process (or Piping) and Instrumentation Diagram.

Paradigm: A universally recognised knowledge system that for a time provides model problems and solutions to a community of practitioners (Kuhn T S (1970). The Structure of Scientific Revolutions. 2nd Edition, enlarged, sixth impression. University of Chicago Press).

Pathogen: In the risk context Reason (1993) has defined pathogens as analogous to latent failure in technical systems, similar to resident pathogens in the human body (Managing the Management Risk: New Approaches to Organisational Safety. Chapter 1 of Reliability and Safety in Hazardous Work Systems: Approaches to Analysis and Design).

Perceived Risk: That risk thought by an individual or group to be present in a given situation (Institution of Chemical Engineers (1985). Nomenclature for Hazard and Risk Assessment in the Process Industries. IChemE, Rugby, Warwickshire).

Precautions: Measures taken beforehand to ward off possible adverse events. In the context of risk management, precautions are the result of prudent foresight, that is, due diligence. In the context of a cause-consequence model, precautions act before the loss of control point.

Probability: The likelihood of an event occurring. A number in a scale from 0 to 1 that expresses the likelihood that one event will succeed another (Institution of Chemical Engineers (1985). Nomenclature for Hazard and Risk Assessment in the Process Industries. IChemE, Rugby, Warwickshire).


Protection: Protection has many meanings. However, in the context of risk management it is the state of being protected, or something that protects, or preservation from injury or harm. In the context of a cause-consequence model, protection usually acts after the loss of control point, as does much fire protection equipment.

QRA: Quantified Risk Assessment. The estimation of a given risk by logical and analytical modelling techniques, or using statistical information from historical data from circumstances similar to existing or planned operations.

Quality: Conformance to a set of requirements that, if met, results in an organisation, service or product that is fit for its intended purpose. Totality of characteristics of an entity that bear on its ability to satisfy stated and implied needs (AS/NZS 9000.1:1994 Model for Quality Assurance in Design, Development, Production, Installation and Servicing).

RAROC: Risk Adjusted Return On Capital.

RBD: Reliability Block Diagram. A frequency analysis technique that creates a model of the system and its redundancies to evaluate the overall system reliability (AS 3931:1998 Risk Analysis of Technological Systems Applications Guide).

RCM: Reliability Centred Maintenance.

Recovery: Restoration of a system to its desired state following a fault or failure.

Reliability: The probability that a device will satisfactorily perform a specified function, under given operating conditions, for a specified period of time (Smith David J (1993). Reliability, Maintainability and Risk. Practical Methods for Engineers. Fourth Edition. Butterworth Heinemann, Oxford).

Reliability Engineering: The set of technical processes that apply reliability theory to establish system reliability requirements, allocate these requirements down to system elements, predict and verify system reliability performance and establish reliability growth programs (US MIL-HDBK-338-1A).

Residual Risk: The remaining level of (pure) risk after risk treatment measures have been taken (AS 4360:1999 Risk Management).

Resource/s: The human, physical and financial assets of an organisation.

Risk: The chance of something happening that will have an adverse impact upon objectives. It is measured in terms of consequences and likelihood (AS 4360:1999 Risk Management).

Risk (Pure): The potential realisation of the unwanted consequences of an event from which there is no prospect of gain.

Risk (Speculative): Generally, risk deliberately undertaken for a perceived benefit.

Risk Analysis: A systematic use of available information to determine how often specified events might occur and the magnitude of their consequences (AS 4360:1999 Risk Management). The study of decisions subject to uncertain consequences.

Risk Assessment: The overall process of risk analysis and risk evaluation.

Risk Curve or Diagram: A plot of likelihood vs consequence for a series of events.


Risk Engineering: The application of engineering techniques to the risk management process.

Risk Evaluation: The process used to determine risk management priorities by comparing the level of risk against predetermined standards, target risk levels or other criteria (AS 4360:1999 Risk Management).

Risk Financing: The methods applied to fund risk treatment and the financial consequences of risk. Note: in some industries risk financing relates to the funding of the financial consequences of risk (AS 4360:1999 Risk Management).

Risk Identification: The observation and identification of new risk parameters (Rowe W D (1977). An Anatomy of Risk. Wiley Interscience, New York). The process of determining what can happen, why and how (AS 4360:1999 Risk Management).

Risk Management: The process of planning, organising, directing and controlling the resources and activities of an organisation in order to minimise the adverse effects of accidental losses to that organisation at least possible cost (Head E L (1978). The Risk Management Process. The Risk & Insurance Management Society Incorporated, New York. Page 8).

Safe: An acceptably low or tolerable level of risk. The opposite of dangerous.

SafetyMAP: Safety Management Achievement Program. Term coined by the Victorian WorkCover Authority.

Security: The combination of availability, confidentiality and integrity.

Sensitivity Analysis: Examines how the results of a calculation or model vary as individual assumptions are changed (AS 4360:1999 Risk Management).

Severity: The measure of the absolute consequences of a loss, hazard or vulnerability, ignoring likelihood. In insurance terms, the absolute magnitude of the dollars associated with a single (potential) loss event.

Societal Risk: The relationship between frequency and the number of people suffering from a specified level of harm in a given population from the realisation of specified hazards (Institution of Chemical Engineers (1985). Nomenclature for Hazard and Risk Assessment in the Process Industries. IChemE, Rugby, Warwickshire). Sometimes referred to as Group Risk.

Stakeholders: Those people and organisations who may affect, be affected by, or perceive themselves to be affected by, a decision or activity (AS 4360:1999 Risk Management).

Statute Law: Law created by legislation, that is, made by Parliament (Butterworth (1998). Concise Australian Legal Dictionary. Butterworth, Australia).

System Safety: A set of technical processes that apply risk management theory to establish system safety requirements. These requirements are allocated down to the system elements, and predict and verify system safety performance and direct actions to prevent and/or reduce unacceptable levels of identified safety hazards (Blanchard B (1991). Systems Engineering Management. Wiley Interscience).

SRS: Safety Related System.

THERP: Technique for Human Error Rate Prediction.

Threat: An action or event that might prejudice any asset.


Tolerable Risk: Risk that is not regarded as negligible or something that can be ignored, but must be kept under review and reduced further still (Health and Safety Executive (1988). The Tolerability of Risk From Nuclear Power Stations. HMSO, London).

VAR: Value At Risk. A concept similar to that of Loss Expectancy.

Vulnerability: A weakness with regard to a threat. To be compared to a hazard.

Vulnerability Analysis: A method of 'completeness' checking for a defined scenario.
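
Some dictionary entries are quantitative. As an illustration only, the Availability entry (up time divided by elapsed time, where elapsed time is the sum of up time and down time) could be sketched in Python as follows. The function name and the sample figures are assumptions made for this sketch and do not come from the Text.

```python
def availability(up_time: float, down_time: float) -> float:
    """Ratio of up time to elapsed time, where elapsed = up time + down time.

    Both arguments must use the same time unit (hours are assumed here).
    """
    if up_time < 0 or down_time < 0:
        raise ValueError("up_time and down_time must be non-negative")
    elapsed = up_time + down_time
    if elapsed == 0:
        raise ValueError("elapsed time must be greater than zero")
    return up_time / elapsed


# Hypothetical figures: 8,700 hours up and 60 hours down over one year.
example = availability(8700.0, 60.0)
print(f"Availability: {example:.4f}")
```

Because down time appears in the denominator, the same function shows why availability depends on both how often a system fails (reliability) and how long each repair takes.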


Concepts

1. Introduction to Risk and Reliability Concepts

1.1 The Nature of Risk

Risk means different things to different people at different times. However, one element that is common to all concepts of risk is the notion of uncertainty. If we knew what would happen next, there would be no risk. If immortal and omnipotent beings existed, the concept of risk would be incomprehensible to them. But in the world of finite beings, all face uncertain, possibly precarious futures. Risk, and what to do about it, are vital human concerns. Decision-making processes, whether of statutory regulators, court judges, business managers or ordinary individuals, reflect human concern to improve safety and security, and the reliability and efficacy of their endeavours, in the face of ever present uncertainty.

1.2 Types of Risk

Risk is generally divided into two broad types: Pure Risk and Speculative or Business Risk. If the likely consequences of a risk are considered to be always bad, offering no prospect of gain, it is designated pure risk. The possible events or situations that pure risk poses are treated as hazards or vulnerabilities. If the possible consequences of a risk are considered potentially desirable, that risk is designated as speculative or business risk, and is treated as an opportunity. Consequently, risk is assessed according both to its estimated likelihood or probability (how often it is likely to occur) and the value of its estimated consequences (how desirable or undesirable its impact may be).

1.3 Risk Management Evolution

USER                     OBJECTIVES                       LIMITATIONS
Insurance Broker         Maximise new clients             Affordable services only
                         Maximise profits                 Conflict of objectives
Insurance Company        Maximise underwriting profits    Conflict of objectives
Safety Manager           Maximise safety budget           Narrow approach
                         Minimise loss                    Loss reduction may not be cost effective
Risk Manager             Maximise corporate profits       Lacks knowledge of specialised disciplines
                                                          Not line management
Line Manager             Meet production objectives       May not understand contribution of risk management to results
                         Maximise profits
Investment Manager       Maximise investment returns      Risk and profit do not directly accrue to adviser
                         Minimise Risk
Auditors                 Confirm reality matches reports  Historical analysis; the past may not reflect the future
Legal Advisors/Lawyers   Manage (potential) conflicts     Disputes = prosperity
                         Win court cases                  Sign off is difficult
Board Members            Maximise corporate profits       Lacks knowledge of specialised disciplines
                         Minimise personal liability

Users of the term "Risk Management" (Adapted from Blombery, 1982)
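
Whichever user is involved, Section 1.2 states that risk is assessed on both its estimated likelihood and the value of its estimated consequences. A minimal sketch of one way to combine the two is given below. Multiplying likelihood by consequence to get an expected annual value is a common convention assumed here, not a method prescribed by the Text. The class name, the sign convention (negative for losses, positive for gains) and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RiskItem:
    name: str
    likelihood_per_year: float  # estimated occurrences per year
    consequence_value: float    # dollars; negative = loss, positive = gain

    def expected_value(self) -> float:
        """Assumed convention: likelihood multiplied by consequence."""
        return self.likelihood_per_year * self.consequence_value


def rank_by_exposure(items):
    """Order items by the size of their expected annual value, largest first."""
    return sorted(items, key=lambda r: abs(r.expected_value()), reverse=True)


# Hypothetical register mixing loss-only and potential-gain items.
items = [
    RiskItem("warehouse fire", 0.01, -2_000_000),
    RiskItem("new product launch", 0.5, 300_000),
    RiskItem("minor equipment fault", 2.0, -5_000),
]

for item in rank_by_exposure(items):
    kind = "loss" if item.consequence_value < 0 else "gain"
    print(f"{item.name}: {kind}, expected value {item.expected_value():,.0f} per year")
```

A single expected value hides the difference between rare, severe events and frequent, minor ones, which is one reason likelihood and consequence are often also reported separately.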


Several large international insurance brokers introduced both the concept and the term "risk management" into Australia in the 1970s. The move derived largely from a marketing strategy to gain new clients. Subsequently, others outside the insurance industry took up the term, using it to serve various purposes. Because the term risk management is now used in many different ways by different groups of professionals, confusion often arises as to what precisely is being referred to. Blombery (1982) suggests that the best way to avoid misinterpreting intentions is to examine what the main professional users of the term customarily imply when they refer to risk management, as shown in the table above.

NB: Recently the financial investment industry also adopted the concept, developing a new lexicon in the process. For example, VAR (Value At Risk) is a variation on the more traditional term Loss Expectancy, which historically has been used by the insurance industry (Taylor, 1996).

1.4 Historical Perspectives of Risk

What we think about risk and how we address it depends on the way we perceive that risk and what, at different times, we believe to be its cause. For example:

1.4.1 The Plague

When a society believes that the reason many are dying from the plague is because God is punishing people for their sins, it will manage the risk differently from a society that believes in viruses and bacteria. The following illustrates some early attempts to control the plague (Nohl, 1926):

SPEYER 1347: A strict prohibition against gambling in churchyards.
COUNCIL OF TOURNAI: All concubines to be expelled or married; Sundays to be strictly observed; manufacture, sale and use of dice completely suppressed. (Dice factories turned to making rosary beads.)
ROUEN (France) 1507: 'No gambling, cursing, drinking or excesses'.

1.4.2 United Kingdom - Public Health Reforms in the 1840s

A particularly interesting risk management issue arose with the control of epidemics in the UK in the 1830s and 40s (Winslow 1967). Note that at this time viruses and bacteria were not known. The then theory of contagion related to miasmas or clouds of noxious, odious gases. Chadwick's Report on the Sanitary Condition of the Labouring Population of Great Britain (1842) recognised that disease struck where there was work and urban congestion. By providing clean water, sanitation and reasonable housing, the problem would be contained, if not solved. In part, his concept was a flow on from the Crimean war and Florence Nightingale, that cleanliness is indeed next to Godliness. To quote from Chadwick's report:

"...That the expense of public drainage, of supplies of water laid on in houses, and of means of improved cleansing would be a pecuniary gain, by diminishing the existing charges attended on sickness and premature mortality."

and

"That by the combinations of all these arrangements it is probable that the full insurable period of life indicated by the Swedish tables; that is an increase of thirteen years at least, may be extended to the whole of the labouring classes."


Chadwick's arguments to justify his risk management recommendations appealed to humanitarian public interest benefits as well as cost savings over time. This did not achieve the immediate acceptance and success one might expect in today's more democratic society with greater capacity for public scrutiny, accountability, and liability. There were many with vested interests who could not see, or did not agree, that the very expensive fresh water and sewerage treatment was necessary or even effective. Today, passive smoking may be considered in this same context.

1.4.3 The 1840 North American Factory Mutual System

In the early 1800s, cotton mills were a notorious source of fire and burned down regularly. A major part of the problem was the need to extract the cotton seeds from the cotton bolls, which generated a significant amount of friction in a highly combustible medium. Zachariah Allen, a factory owner in the 1840s, decided to build a superior mill. He fire-isolated the cotton gins, provided massive construction, and taught his people how to respond to a fire appropriately, using hoses and sand buckets. He then went to his existing underwriter and asked for a discount. The underwriter responded, "No, the good pay for the bad." He then approached other owners who had built superior facilities and suggested that they pool the premiums they were paying to existing underwriters. As they should have fewer losses, they could then pay back a profit after a few years. This was a great success and was the forerunner of the Factory Mutual System and the "Highly Protected Risk" (HPR) concept.

Such an engineering-underwriter viewpoint contrasts dramatically with a wholly financial view of insurance. With the Factory Mutual concept, only those plants that meet certain minimum design and management system requirements can join the premium pool. The loss rate will therefore remain static over time with minimal influence from market forces. With a purely financial approach, a burning building can be insured if sufficient premium is paid.

1.4.4 Tripartite Risk Control Philosophies

For Health and Safety policy particularly, Australia adopted the philosophies of the United Kingdom, following from the work of the Robens Committee (Creighton, 1996). The general concept is that there are three key parties to the risk control process: those who own the industry, those who work there, and the government. Each party is of equal status. This particularly applies to the development of codes of practice and regulations.

While the tripartite concept has driven traditional approaches to OH&S risk control processes, the emerging legal environment puts increasing emphasis on a fourth party. Attention is swinging to stakeholders. Stakeholders range from consumers of products such as food or pharmaceuticals to the public and communities disaffected by industrial pollution or corporate governance failures.

1.4.5 Bipartite Philosophies

An alternative is what might be called the bipartite approach apparently adopted by Germany, arising from industry based insurance efforts started by Bismarck in the 1880s. A bipartite guild (Berufsgenossenschaft) is established for appropriate industries. The government's role is confined to ensuring that the process occurs; specifically, that the industry guild exists, that it functions to determine what the acceptable levels of risk are for that industry, and that the consequences of this target are appropriately funded by industry based insurance.


1.5 Reliability

Reliability is a risk-related concept, and a specific area of professional activity. The main concern of reliability-focussed professionals is to ensure that systems or system components work the first time they are required, and every time thereafter. The military has always had a very specific interest in this in both organisational and technological terms.

The beginnings of the 20th century arms race in Europe can be traced to the involvement of industrial technology in production of the HMS Warrior in 1861. World War I provided the impetus for the development of aircraft and armoured vehicles and the beginning of increasingly capable military equipment. World War II brought the development of electronics and a dramatic increase in the complexity of increasingly accurate and destructive weapons. Such systems often consumed enormous resources yet failed to deliver effective service to the customers. As might be expected, the sophisticated valve-based electronic systems used in the emerging fighter jet industry proved very unreliable in the 1950s.

1.5.1 Failure Modes

Until the mid 1970s, reliability-focussed professionals saw system components as exhibiting a standard failure profile consisting of three separate characteristics:

i) An infant mortality period due to quality of product failures.
ii) A useful life period with only random stress related failures.
iii) A wear out period due to increasingly rapid conditional deterioration resulting from use or environmental degradation.

These are shown in the figure below. The consequence of such beliefs was that equipment was taken out of service and maintained at particular intervals, regardless of whether it was exhibiting signs of wear or not.
[Figure: Bathtub Failure Curve. Failure Rate plotted against Time, with three regions: Infant Mortality, Useful Life, Wear Out.]

However, actuarial studies of aircraft equipment failure data conducted in the early 1970s identified a more complex relationship between age and the probability of failure, as shown in the failure rate curves below. This understanding evolved in the private airline industry, primarily through the activities of the Maintenance Steering Group of the International Air Transport Association. The final report of the Maintenance Steering Group in 1980, titled MSG-3, provided the backbone of the logic processes contained in the referenced texts and RCM analysis (Moubray, 1992).


[Figure: Failure Rate Curves. Six patterns of failure rate against age, with the percentage shown for each:
Wear-in to random to wear out: 4%
Random then wear out: 2%
Steadily increasing: 5%
Increasing during wear-in and then random: 7%
Random over measurable life: 14%
Wear-in then random: 68%
The last three patterns together account for 89%.]

Specifically, the bathtub curve was discovered to be one of the least common failure modes, and periodic maintenance was found to increase the likelihood of failure. This led to the idea that the maintenance regime ought to be based on the reliability of the components and the required level of availability of the system as a whole.
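The failure rate patterns discussed in this section are often described with a Weibull hazard function. That is a common reliability engineering convention rather than something this text specifies, so the following is only a minimal sketch. It assumes the two-parameter form h(t) = (shape/scale) x (t/scale)^(shape - 1), and every parameter value and function name is hypothetical, chosen only to reproduce the bathtub shape.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) for a two-parameter Weibull model.

    shape < 1 gives a falling rate (infant mortality),
    shape == 1 a constant rate (random failures),
    shape > 1 a rising rate (wear out).
    """
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape and scale must all be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t):
    """Illustrative bathtub curve: the sum of three Weibull hazards.

    All parameter values are hypothetical and only show the shape.
    """
    infant = weibull_hazard(t, shape=0.5, scale=200.0)
    random_failures = weibull_hazard(t, shape=1.0, scale=1000.0)
    wear_out = weibull_hazard(t, shape=5.0, scale=3000.0)
    return infant + random_failures + wear_out


if __name__ == "__main__":
    for hours in (1, 10, 100, 500, 3000, 5000):
        rate = bathtub_hazard(hours)
        print(f"t = {hours:>5} h   failure rate = {rate:.5f} per hour")
```

Dropping the infant mortality or wear out term from `bathtub_hazard` gives rough analogues of the other patterns, for example a purely random profile when only the constant term is kept.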


1.6 Quality

Davis (2001) reviews a large number of contributors to the quality movement. Although there are differences in approach, there appear to be six common principles, namely: management commitment, measurement to determine current position and goals, quality teamwork in the workforce, system based tools, prevention is better than inspection, and customer focus.

1.6.1 W Edwards Deming (US circa 1948)
Defines quality as a predictable degree of uniformity and dependability at low cost and suited to the market. The objective of his approach is to reduce variability by continuous improvement, the "PDCA Cycle" (Plan, Do, Check, Act). Management is responsible for 94% of quality problems.

1.6.2 Joseph M Juran (US)
Defines quality as fitness for use. He has a 10-step process to quality improvement. Like Deming, Juran believes that senior management are largely responsible for quality, with less than 20% of quality issues being due to workers. However, quality improvements are not free.

1.6.3 Philip B Crosby (US)
Believes that quality is conformance to requirements. He introduced the concept of zero defects within the framework of his four quality absolutes. The cost of quality is the cost incurred due to nonconformance, and therefore quality is free.

1.6.4 William E Conway (US)
Has similar beliefs to Deming and indicates that quality increases productivity and lowers costs. He has a 6-tool process for quality improvement and advocates the use of simple statistical methods to identify problems and point to solutions.

1.6.5 Kaoru Ishikawa (Japan circa 1949)
Focussed on seven basic tools for quality improvement, quality circles and company wide quality control (CWQC) from top to bottom. Cause and effect diagrams are used extensively (see section 5.4).

1.6.6 Shigeru Mizuno (Japan)
Promoted 7 tools for quality management: relations diagram, KJ or affinity diagram, systemic/tree diagram, matrix diagram, matrix data-analysis, process decision program chart, and arrow plan.

1.6.7 Masaaki Imai (Japan)
Kaizen process to develop logical systemic thinking. Has an expanded form of the PDCA cycle.

1.6.8 Genichi Taguchi (Japan)
Restates the Japanese view of investing first and not last. That is, design should be superior.

1.6.9 Shigeo Shingo (Japan)
Promoted just in time manufacturing and defects = 0 (Poka-Yoke).

1.6.10 Armand V Feigenbaum (US)
Holds that total quality management (TQM) is the way to completely manage an organisation.

1.6.11 Tom Peters (US)
Focuses on leadership and customer satisfaction rather than management. He includes tools like management by walking about (MBWA).

1.6.12 Claus Møller (Denmark)
Personal quality is a central element of total quality, with a focus on administrative improvement.

1.6.13 John Oakland (UK)
Leadership is the key to business excellence and quality.


REFERENCES

Blombery R I (1982). Risk Management Origins, Objectives and Directions. Proceedings of the Victorian Industrial Safety Convention, Vol. 1, 1982, pp. 39-48.
Chadwick E L (1842). Report on the Sanitary Condition of the Labouring Population of Great Britain. Presented to Both Houses of Parliament, London.
Creighton W B (1996). Understanding Occupational Health and Safety in Victoria. 2nd edition, Federation Press.
Davis, Dr Elwyn C (2001). The quality gurus: What have we learnt from them? Reprinted in Engineering World. December 2001 / January 2002. pp. 15-19.
Moubray, John (1992). RCM II Reliability Centred Maintenance. Butterworth Heinemann.
Nohl J (1926). The Black Death, a Chronicle of Plague. George Allen & Unwin Ltd, London.
Taylor R T and W A MacDonald (1996). The Future of Market Risk Management. Article in Financial Derivatives & Risk Management. Issue 6, June 1996. IFR Publishing.
Winslow C E A (1967). The Conquest of Epidemic Disease. The Hafner Publishing Company, New York, New York. The particularly relevant chapter is Chapter XII, the Great Sanitary Awakening.

READING

Beck Ulrich (1986). Risk Society: Towards a New Modernity. Translated. Sage Publications, London. Reprinted 1998.
Head E L (1978). The Risk Management Process. The Risk & Insurance Management Society Incorporated, New York. Page 8.
McCabe F M (1978). Risk Management and the Australian Safety Practitioner. Marsh & McLennan Pty Ltd, Melbourne, Australia.
Robinson R M, D B L Viner and M A Muspratt (1985). National and Public Risk: Risk Control Strategy Some Fundamentals. Paper presented at the ANZAAS Festival of Science, Monash University.
Smith Anthony (1993). Reliability Centred Maintenance. McGraw Hill.


2.0 Risk Paradigms and Models

Efforts to demonstrate how risk should best be managed have given rise to a number of risk management paradigms. A paradigm is a universally recognised knowledge system that for a time provides model problems and solutions to a community of practitioners (after Kuhn, 1970). New paradigms based on more comprehensive or convincing theories may supersede older ones or exist conjointly with them. The following describes a number of the most common paradigms, including some of the advantages and disadvantages of each. The paradigms are:

i) The rule of law.
ii) Traditional risk management, historically typified by the Lloyd's Insurance and the Factory Mutual Highly Protected Risk (HPR) approaches.
iii) Asset based risk management, typified by engineering based Failure Modes, Effects and Criticality Analysis (FMECA), Hazard and Operability (HazOp) and Quantified Risk Assessment (QRA) 'bottom-up' approaches.
iv) Threat-based risk management, typified by Strengths, Weaknesses, Opportunities and Threats (SWOT) and vulnerability type 'top-down' analyses.
v) The comparatively recent market based risk management, which uses the notion of the risk being equal to variance with an equivalent risk of gain as well as risk of loss.
vi) Solution-based best practice risk management rather than hazard based risk management.
vii) The development of biological, systemic mutual feedback loop paradigms, practically manifested in hyper-reality computer based simulations.
viii) The development of risk culture concepts including quality type approaches.

Many proprietary risk management systems integrate several of these approaches.

2.1 The Rule of Law

When everything else fails, the ultimate appeal is generally to the rule of law. In a very real sense, all the other paradigms represent methods of satisfying legal outcomes in the event of an adverse outcome. As a consequence, asking lawyers which paradigm is applicable to ensure due diligence generates a response that all paradigms, once they are explained, are necessary. The diagram below shows a pathogen based cause-consequence diagram in a legal context, with LOC indicating loss of control. The power of the legal approach is that it is time-tested and proven. If the judiciary is independent of political and commercial interests of the day, then an independent and potentially fair resolution of otherwise potentially catastrophic social dislocation can occur. Perhaps this is why it works: both the political and judicial systems must simultaneously fail before social breakdown occurs. The weakness of the legal approach, certainly in an adversarial legal system, is that the courts remain courts of law rather than courts of justice.


[Figure: Pathogen Cause-Consequence Model in Legal Context. Labels: Cradle, Pathogens (Whole of Life), Immune System, LOC, Event Horizon, Hit, Miss, Grave; WHAT, WRONG, WHY NOT, WHAT IF; CAUSATION, FORESEEABILITY, PREVENTABILITY, REASONABLENESS.]

In the common law tests of negligence the four key words are Causation, Foreseeability, Preventability and Reasonableness. This Rule of Law underpins the ALARP principle that risks shall be demonstrated to be As Low As Reasonably Practicable. It also provides a focus for other risk management principles including "not less safe", "continuous improvement" and "best practice".

(i) Define WHAT we are talking about: CAUSATION
(ii) Identify what could go WRONG: FORESEEABILITY
(iii) Control WHY it will not happen: PREVENTABILITY
(iv) Assess balance of Precautions to the Consequences IF it did: REASONABLENESS

Common Law is covered in more depth in Chapter 4.

2.2 Insurance Based Risk Management

The Lloyd's Insurance and the Factory Mutual Highly Protected Risk (HPR) approaches historically typify this. Both consider empirical history to be the source of wisdom. Judgements about risk can be made by looking at past incidents and losses and comparing these to existing plants and facilities. The difference is that one approach, Lloyd's, has a financial focus, whereas the Factory Mutual focus is on a target level of engineered and management excellence.

The power of the process is the very tangible nature of history, and in a sense the results represent the ultimate Darwinian 'what if' analysis. Its weakness is that in the modern, rapidly changing world empirical history has become an increasingly less certain method of predicting the future.

2.3 Asset Based Risk Management

Asset based risk management is typified by engineering based FMECA, HazOp and QRA 'bottom-up' approaches. Any bottom up method has problems with common cause or common mode failures. A detailed assessment from individual components or sub-systems such as HazOp or FMECA examines how that component or sub-system can fail under normal operating conditions. It does not examine how a catastrophic failure elsewhere might affect this component or the others around it. One attempts to address such knock on effects in HazOps by a series of general questions after the detailed review is completed, but it nevertheless remains difficult to use a HazOp to determine credible worst-case scenarios. FMECA and QRA have the same problems.


The power of bottom up techniques lies in the detailed intense scrutiny of complex systems and the provision of closely coupled solutions to identified problems. Any proposed risk control solutions are focussed and specific. They can be easily considered for cost/benefit results. The resulting risk registers are powerful decision making tools.

2.4 Threats and Vulnerabilities

Threat based risk management is typified by SWOT and vulnerability type 'top-down' analyses. These methods mostly identify areas of general strategic concern rather than solutions to particular problems. A very simple example of a Threat and Vulnerability analysis is shown in the table below. Again this focuses on areas of concern rather than precise solutions.

Threats (rows): Technical; Community; Political (change of government); Financial; Natural Events
Critical Success Factors (columns): Reputation; Operability; Staff
Scores by column, top to bottom, with blank cells omitted:
Reputation: xx, x, xxx, x
Operability: xx, x, xxx, xxx
Staff: xx, xx, x, xxx, x

Sample Vulnerability Matrix

Scores
xxx      Critical potential vulnerability that must be addressed.
xx       Moderate potential vulnerability.
x        Minor potential vulnerability.
(blank)  No noticeable vulnerability.

The intersections of a threat with a "critical success factor" or "asset" are termed vulnerabilities. The SWOT analysis, interpreted from a risk perspective, provides insight into vulnerabilities (the risk of loss) and value addeds (the risk of gain). This is shown in the figure below.

[Figure: Augmented SWOT Process. Labels: External / Internal Factors; Opportunities; Threats; Strengths; Weaknesses; Organisation; Strategy; Value Addeds; Vulnerabilities.]

Obviously the effort in this model is to ensure that ownership of the upside (value-addeds) is retained, and that ownership of the downside (liabilities) is avoided.
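The threat and critical success factor matrix described in this section can be held as a simple lookup table. The following is a minimal sketch only: the numeric scale (3 for "xxx", 2 for "xx", 1 for "x", 0 for blank) and all individual scores are hypothetical and are not the values from the sample matrix; the function names are likewise illustrative.

```python
# Scores follow the legend in the text: 3 = "xxx" (critical), 2 = "xx" (moderate),
# 1 = "x" (minor), 0 or absent = no noticeable vulnerability.
# The individual scores below are hypothetical, NOT those of the sample matrix.
SCORES = {
    ("Technical", "Operability"): 3,
    ("Technical", "Staff"): 2,
    ("Community", "Reputation"): 2,
    ("Political", "Reputation"): 3,
    ("Financial", "Staff"): 1,
    ("Natural Events", "Operability"): 3,
}

LABELS = {3: "critical", 2: "moderate", 1: "minor", 0: "none"}


def vulnerabilities(scores, minimum=1):
    """Return (threat, factor, score) tuples at or above `minimum`, worst first."""
    hits = [(t, f, s) for (t, f), s in scores.items() if s >= minimum]
    return sorted(hits, key=lambda item: (-item[2], item[0], item[1]))


def exposure_by_factor(scores):
    """Sum the scores recorded against each critical success factor."""
    totals = {}
    for (_threat, factor), score in scores.items():
        totals[factor] = totals.get(factor, 0) + score
    return totals


if __name__ == "__main__":
    for threat, factor, score in vulnerabilities(SCORES, minimum=3):
        print(f"{threat} x {factor}: {LABELS[score]}")
    print(exposure_by_factor(SCORES))
```

As the text notes, a matrix of this kind points to areas of concern; it does not by itself produce solutions.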


2.5 Risk as Variance

The comparatively recent market based risk management stems from the notion of risk being equal to variance with an equivalent risk of gain as well as risk of loss (see figure below). In finance, risk is normally assumed to be symmetric. This is not absolutely true, but by making such an assumption many of the tools of statistics become available, most notably the normal distribution, which is symmetric about its mean value. This is the principal strength of the approach.
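As a rough illustration of treating risk as variance, the sketch below computes the mean and standard deviation of a series of returns, alongside a one-sided downside measure that counts only shortfalls. The sample returns, the function names and the choice of a zero target are all assumptions made for illustration; the downside measure is a common one-sided statistic, not a method prescribed by this text.

```python
import statistics


def return_statistics(returns):
    """Mean and sample standard deviation of a series of periodic returns."""
    return statistics.mean(returns), statistics.stdev(returns)


def downside_deviation(returns, target=0.0):
    """Root-mean-square shortfall below `target` (a one-sided measure).

    Returns above the target contribute zero, so only the loss side counts.
    """
    shortfalls = [min(0.0, r - target) ** 2 for r in returns]
    return (sum(shortfalls) / len(returns)) ** 0.5


if __name__ == "__main__":
    # Hypothetical annual rates of return, for illustration only.
    returns = [0.08, -0.02, 0.12, 0.05, -0.07, 0.10]
    mean, sigma = return_statistics(returns)
    print(f"mean return        = {mean:.4f}")
    print(f"standard deviation = {sigma:.4f}  (symmetric: gain and loss)")
    print(f"downside deviation = {downside_deviation(returns):.4f}  (loss side only)")
```

The standard deviation treats an unexpectedly good year and an unexpectedly bad year alike, which is the symmetry assumption described above; the downside figure ignores the good years.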

[Figure: Standard Distribution showing the Mean and Variance. Horizontal axis: Rate of Return. Standard deviation deemed to equal risk. Labelled regions: Pure Risk, Speculative Risk.]

However, from a systems engineering perspective at least, this should really be known as the "boom/bust" model since, if everyone uses the same model, mutual feedback loops are inevitable. If pure risk only is assumed, then self-dampening effects are likely, which is the position adopted by most engineers and technologists. Business risk is usually considered to be the sum of both pure risk and speculative risk.

2.6 Best Practice

So far all paradigms considered have been hazard based, that is, looking for problems and then solutions. In health & safety, a hazard is defined as a source of potentially damaging energy, which can give rise to a loss. In more general terms a hazard is a source of potential harm or a situation with a potential to cause loss. In this sense it is analogous to vulnerability, that is, the potential impact of a threat upon an asset. Most risk systems, like the Australian/New Zealand Risk Management Standard AS/NZS 4360:1999, suggest a process of hazard identification, risk assessment, control option development and then implementation.

An alternative to this is solution based 'best practice' risk management. The best practice risk management approach simply looks at all the good ideas other people in an industry use and sees if there is any reason why such ideas ought not to be applied at your own site. In the figure below this means starting on the right rather than at the top or the left.


[Figure: risk management process. Boxes: Credible Hazards, Vulnerabilities or Pathogens; Hazard Assessment (Assess Consequences, Estimate Likelihoods); Risk; Control Options (Mitigate Consequences, Decrease Likelihood); Judgements (Statute, TLS, ALARP, Common Law Due Diligence etc); Actions and Residual Risk Allocation; Best Practice Approaches. TLS = Target Level of Safety.]

The best practice approach is particularly powerful in a common law due diligence sense. The hazard assessment approach implies that statutes may be satisfied, target levels of safety met or 'As Low As Reasonably Practicable' (ALARP) arguments fulfilled. But if there were a simple solution to a trivial problem implemented at a competitor's facility, then common law negligence could arise if something went wrong at the facility in question. A best practice process is one of the few approaches that target this difficulty. In a sense, this is confirming the view that liability arises when there are unimplemented good ideas rather than the existence of hazards or vulnerabilities in themselves.

2.7 Simulation

Biological/Computer Simulation Paradigms are derived from the application of evolutionary concepts developed in virtual reality. The most practical manifestation of biological paradigms is in computer simulations. This amounts to modelling a complex system in a virtual reality environment and playing endless 'what if' scenarios. For example, oil rigs and process plants are generally modelled in 3D before construction so that designers and operators can 'walk around' them and in many ways try them out. If every component (or at least all those containing or controlling major energy sources) is identified and has its risk and reliability properties assigned to it, then the designer can play god.

Continuing this example, suppose every vessel in the plant knows what over-temperature or overpressure it can withstand before rupture, and after having ruptured under such conditions can project and communicate its thermal and pressure energies to adjacent vessels, which then respond accordingly. If the designer then told one to explode, a chain reaction may result. This would depend on separation distances, the force of the explosion and very many other factors. But by resetting the computer simulation and exploding different vessels, an evolutionary process of plant risk design can occur. That is, the designer could explode every vessel and keep adjusting the plant in small increments until the likelihood of secondary explosions is made vanishingly small. Obviously, this requires fearsome computer power, an extensive interpretation of nature and a belief that hyper-reality can come close to reality.
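The "explode every vessel, adjust, repeat" loop described above can be sketched in a few lines. This is a toy model under strong assumptions: each vessel has a single hypothetical blast radius standing in for the thermal and pressure modelling a real simulation would need, and the layout, names and distances are invented for illustration.

```python
import math
from collections import deque


def chain_reaction(vessels, first, blast_radius):
    """Return the set of vessels that rupture if `first` explodes.

    vessels: dict mapping a vessel name to its (x, y) position in metres.
    blast_radius: dict mapping a vessel name to the distance within which
        its explosion ruptures a neighbour (a crude, hypothetical stand-in
        for real thermal and overpressure calculations).
    """
    ruptured = {first}
    queue = deque([first])
    while queue:
        source = queue.popleft()
        for name, position in vessels.items():
            if name in ruptured:
                continue
            if math.dist(vessels[source], position) <= blast_radius[source]:
                ruptured.add(name)
                queue.append(name)
    return ruptured


def worst_case(vessels, blast_radius):
    """Explode every vessel in turn; return (vessel, number of secondary ruptures)."""
    results = {v: len(chain_reaction(vessels, v, blast_radius)) - 1 for v in vessels}
    worst = max(results, key=results.get)
    return worst, results[worst]


if __name__ == "__main__":
    # Hypothetical plant layout and blast radii.
    layout = {"A": (0, 0), "B": (10, 0), "C": (25, 0), "D": (100, 0)}
    radius = {"A": 12, "B": 15, "C": 5, "D": 50}
    print(worst_case(layout, radius))

    # One small design increment: move vessel B further from vessel A.
    layout["B"] = (20, 0)
    print(worst_case(layout, radius))
```

Each design change is followed by a fresh sweep over every vessel, which is the evolutionary process the text describes, here reduced to a single separation-distance rule.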


2.8 Culture

James Reason (1997) develops a cultural paradigm model in several ways (he is a psychologist by training). He notes three types of risk culture:

Pathological Culture | Bureaucratic Culture | Generative Culture
Don't want to know | May not find out | Actively seek it
Messengers are 'shot' | Messengers are listened to if they arrive | Messengers are trained and rewarded
Responsibility is shirked | Responsibility is compartmentalised | Responsibility is shared
Failure is punished | Failures lead to local repairs | Failures lead to far reaching reforms
New ideas actively discouraged | New ideas often present new problems | New ideas are welcomed

Three Risk Cultures after Reason (1997)

To some extent, those dealing with technological risks have generally suffered a decline in influence as business risks and associated risk management techniques have come to the fore over the past ten years. However, culture has now been identified as central to effective risk management, suggesting a new focus has been emerging in the last five years, as shown in the figure below. Reason's Pathogen model is discussed in Chapter 5.

[Figure: Movement from Technological to Business to Risk Culture. Three stages: Hazards (Technological Risks); Vulnerabilities (Business Risks); Pathogens (Risk Culture).]


2.8.1 Safety Culture

An interesting application of the cultural risk paradigm arises when considering safety in Australian industry. A major study endeavouring to determine why Australia has a good commercial aviation safety record documented aspects of Australian culture that affect safety performance (Braithwaite et al, 1997). The graph below reflects the answers that staff gave to a request from their manager to help paint his house. Australians have the highest likelihood (up to 95%) of any of the interviewed nations of saying "No".
[Figure: "No" responses to the question "Would you help paint your manager's house?" Bar chart of percentage (0 to 100) by country, listed as: Australia, Netherlands, UK, West Germany, USA, Italy, Japan, Canada, Poland, Pakistan, Mexico, Hong Kong, Malaysia, Egypt, Singapore, Indonesia, Nepal, China.]

Australians tend to be individualistic and to have a low power-distance. That is, actions or instructions from others have a comparatively limited effect on the way in which they act. They perceive a relatively flat power gradient between manager and subordinate. For example, on aircraft flight decks junior crew members feel able to speak up, without loss of face to the senior crew or other repercussions, if they think an error has occurred. This facilitates initiation of effective additional checks.

In industries with different management styles, difficulties can arise. If a person being directed does not believe that the directive is either practical or safe, then that person will tend to assess the situation and do it his/her own way. The person may do so without declaring his/her intention or discussing the intended change to procedures with management.


2.9 Paradigm Integration

The figure below describes an understanding of how the different paradigms presented in this section fit within a large organisation.

[Figure: An Integrated Risk Paradigm Framework. Left of the Event Horizon (pre-event, strategic): a top box for Board and CEO (Policy), top down, with AS4360 (risk matrix, A to E by 1 to 5), Vulnerability Analyses, SWOT Analyses etc, Audits, Underwriting Assessments, Availability Assessments; a bottom box for Operations & Maintenance with QRA, HazOps, FMECA, RCM, Job Safety Analysis, Cause Consequence Modelling etc, and IEC (AS) 61508. Right of the Event Horizon (post-event, tactical): Crisis and Fallout Management; Losses, Incidents and Breakdowns; Fire Fighting, First Aid, Legal Actions, Insurance Payments; Courts.]

The top left hand box shows those paradigms that would be expected to apply strategically at the higher levels of an organisation, whilst those in the bottom left hand box could generally be applied at the operational level. On the right hand side are the tactical issues that are faced post-event. The objective of risk management is to stay on the left hand side of the event horizon, but a complete risk management framework must provide for the post-event scenarios.

There are a number of risk techniques available but only three generic methods by which organisations can proceed with strategic tasks to address the concept of risk. These are:

i) Expert knowledge provided from experts, literature and research
ii) Facilitated workshops of experts and interested parties
iii) Interviews with selected players.

Each of these methods has different strengths and weaknesses depending on the culture of the organisation and the nature of a particular task.


The best methodologies to use in the implementation of each of the paradigms are illustrated in the following table:

Risk Management Paradigm | Expert reviews | Facilitated workshops | Selective interviews
1. The rule of law | Yes (Legal opinions) | Yes (Arbitration, moot courts) | Yes (Royal Commissions)
2. Insurance approaches | Yes (Risk surveys, actuarial studies) | Yes (Risk profiling sessions) | Yes (especially moral risk)
3. Asset based, 'bottom-up' approaches | Yes (QRA, availability & reliability audits) | Yes (HazOps, FMECAs etc) | Difficult
4. Threat based 'top-down' approaches | Difficult in isolation | Yes (SWOT & vulnerability) | Yes (Interviews)
5. Business (upside AND downside) approaches | Yes (Actuarial studies) | Difficult in isolation | Yes (Fact finding tours)
6. Solution based best practice approaches | Difficult to be comprehensive | Difficult to be comprehensive | Yes (Fact finding tours)
7. Simulation | Yes (Computer simulations) | Yes (Crisis simulations) | Difficult
8. Risk culture concepts | Yes (Quality audits) | Difficult | Yes (Interviews)

Risk Management Paradigm - Technique Matrix

The concept of a Safety Case, which is logically prior to and supports the Business Case for an enterprise, is one interesting development. Those techniques and paradigms highlighted in the table could be used in developing a safety case.


2.10 Models

2.10.1 Risk and Reliability Diagrams

A particularly useful way of examining (pure) risk and reliability in an organisational sense is via a risk diagram. A risk diagram is fundamentally a plot of the likelihood of events occurring against the severity of the outcomes. This can be done in different ways depending on the industry or organisation that is being examined. The frequency (events per year, events per kilometre, events per passenger mile, or events per any other denominator) is plotted against consequence severity (down time, dollars, lives lost, working days lost, or days lost to the community).

[Figure: curve of Relative Likelihood plotted against Relative Severity of Consequence. Regions along the curve, in order of increasing severity: Maintenance, OH&S, Fire & Explosion, Catastrophic. The Service end is labelled Reliability Engineering (FMECA and RCM, defence industry driven); the Safety end is labelled Risk Engineering (HazOp and FTA, aerospace & nuclear industry driven). Example events marked on the diagram: Breakdowns, Public Criticism, Protest Pickets, Staff Complaints, Personal Injury, Product Boycott, Industrial Stoppage, and high technology and high hazard system failures, Class Actions, Market Collapse.]

Organisation Risk Diagram

In organisational terms the risk diagram describes the relationship between the different technical and commercial areas of activity and the relationship between risk and reliability. Plotted on normal axes, the curve typically takes the form of a hyperbola as shown. If the plot is likelihood against severity in dollars, then the area under the graph represents the size of the economic loss. Typically, the greatest area is at the maintenance end, then the OH&S or personal injury area, then the fire and explosion zone and lastly the catastrophic event region.

The Maintenance region, being the largest, therefore provides the greatest returns for good management and is the target of such programs as Reliability Centred Maintenance (RCM). However, the other regions, which deal with damage, injury and death, also have a legal dimension. One view is to suggest that failure to optimise the maintenance region can send an organisation broke, but failure to deal with the legal dimension can send directors to gaol. Certainly, both are important.
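The relationship between likelihood, severity and economic loss can be sketched numerically. The following is a minimal illustration only, assuming each region of the diagram can be represented by a single typical event; the frequencies and dollar values are invented for the example and are not drawn from the text.

```python
# Hypothetical figures for illustration only: each region of the risk diagram
# is represented by one (events per year, dollars per event) pair.
REGIONS = {
    "Maintenance": (200.0, 5_000.0),
    "OH&S": (10.0, 50_000.0),
    "Fire & Explosion": (0.1, 2_000_000.0),
    "Catastrophic": (0.001, 100_000_000.0),
}


def expected_annual_loss(events_per_year, dollars_per_event):
    """Likelihood multiplied by severity, in dollars per year."""
    return events_per_year * dollars_per_event


def rank_regions(regions):
    """Return (region, expected annual loss) pairs, largest loss first."""
    losses = {
        name: expected_annual_loss(frequency, severity)
        for name, (frequency, severity) in regions.items()
    }
    return sorted(losses.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, loss in rank_regions(REGIONS):
        print(f"{name}: ${loss:,.0f} per year")
```

With these assumed figures the ranking follows the order described above, with the maintenance end contributing the largest expected loss and the catastrophic region the smallest. Different figures could of course produce a different ranking.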


2.10.2 Asset Management and the Costs of Ownership

Asset management is more than ownership, accountability and demand management after the assets are in place; it is about a whole of life approach to management. Asset management is about all those actions, from the first stirrings of a need to the final recycling of the disposed asset, which ensure that an asset achieves the business objectives of:

i) Being safe for operators, users and the public.
ii) Not adversely impacting on the environment during its use, maintenance or disposal.
iii) Providing the service for which it was procured.
iv) Achieving the above at minimum cost of ownership over its life.

The cost of ownership includes at least:

a) The initial capital cost, plus
b) the whole of life cost of operation and maintenance, plus
c) the whole of life cost of risk (the cost of prevention plus the cost of loss).
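The three components can be added up in a short sketch. This is an illustration under simplifying assumptions that are not part of the text: costs are constant from year to year, no discounting is applied, and the class name, field names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OwnershipCosts:
    """Undiscounted whole of life costs for one asset (hypothetical model)."""

    capital: float                           # a) initial capital cost
    annual_operation_and_maintenance: float  # b) per year
    annual_prevention: float                 # c) cost of prevention, per year
    annual_expected_loss: float              # c) cost of loss, per year
    life_years: int

    def cost_of_risk(self) -> float:
        # Whole of life cost of risk = cost of prevention + cost of loss.
        return self.life_years * (self.annual_prevention + self.annual_expected_loss)

    def cost_of_ownership(self) -> float:
        operation = self.life_years * self.annual_operation_and_maintenance
        return self.capital + operation + self.cost_of_risk()


if __name__ == "__main__":
    asset = OwnershipCosts(
        capital=10_000_000,
        annual_operation_and_maintenance=400_000,
        annual_prevention=150_000,
        annual_expected_loss=50_000,
        life_years=30,
    )
    print(f"Whole of life cost of risk: ${asset.cost_of_risk():,.0f}")
    print(f"Cost of ownership:          ${asset.cost_of_ownership():,.0f}")
```

Keeping prevention and loss as separate fields makes the cost of risk visible as its own line item rather than buried in operating costs.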

In some cases the largest component of the cost of ownership will be the whole of life cost of risk. For public authorities especially, it is very common to have very large expenditures on risk control measures that are not identified specifically as part of the cost of ownership of the operating assets.

For example, signalling on railways is a risk control measure to prevent trains from colliding. If all trains ran exactly on time and the timetable was perfect then there would be no red signals ever occurring in a train network. This indeed was historically the case. Signalling systems were introduced because the train system eventually became sufficiently complicated that perfect achievement of the timetable was no longer possible. This meant that collisions would inevitably occur unless some interposing system was installed. The cost of the signalling system should be included as part of the cost of ownership for the railways but identified as part of the preventive aspects of the cost of risk.

This concept is reflected in market risk terms, especially in banks, as RAROC (Risk Adjusted Return On Capital).

2.10.3 Risk Management Process Model

The Risk Management Process Model is one of the most commonly used risk management models and dates from the mid seventies.
Identification: historical data (past experience), surveys, workforce, scientific literature.

Quantification: likelihood of occurrence and severity of consequence.

Evaluation: balance of advantages / disadvantages of running the risk with the advantages / disadvantages of controlling it.

Control: risk retention, risk reduction, insurance, risk transfer.

A Risk Management Process Model

The identification phase parallels the common law aspects of foreseeability (see Chapter 4). The option of risk transfer under Control has been severely curtailed in Australia in recent times. It used to be possible to sell a high risk portion of a business and then contract the service back, leaving the risk associated with that enterprise quarantined from the original business. Such a practice has been soundly rejected in Australian jurisdictions.


The model below is an overview from the Australian/New Zealand Standard, Risk Management, AS/NZS 4360:1999. It follows the process model.

[Figure: flow of steps Establish the context, Identify risks, Analyse risks, Evaluate risks and Treat risks, with Assess risks shown as a grouping label, and with Communicate & Consult and Monitor & Review running alongside the steps.]

Risk Management Overview

The main elements are in the form of an iterative process:

a) Establish the Context - This step establishes the strategic, organisational and risk management context in which the rest of the process will take place. Risk assessment criteria and structure to be used should also be defined.
b) Identify Risks - Identify what, why and how hazards arise.
c) Analyse Risks - Determine existing controls and establish the likelihood of the events and the severity of the consequence.
d) Evaluate Risks - Compare projected risk levels against criteria to determine acceptability or otherwise of each hazard and set risk priorities.
e) Treat Risks - Accept and monitor low-priority hazards. For all other hazards develop and implement a specific management plan, which includes consideration of funding.
f) Monitor and Review - Monitor and review the performance of the risk management system and changes which might affect it.
g) Communicate and Consult - Communicate and consult with both internal and external stakeholders at each stage of the risk management process and concerning the process as a whole.

For each stage of the process adequate records should be kept to satisfy an external audit.
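The analyse, evaluate and treat steps can be illustrated with a small risk register. This is a sketch only: the 1 to 5 scales, the use of likelihood multiplied by severity as the risk level, and the names `Risk` and `evaluate` are assumptions made for the example and are not prescribed by the Standard.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    description: str
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (almost certain)
    severity: int    # hypothetical scale: 1 (minor) to 5 (catastrophic)

    @property
    def level(self) -> int:
        # Analyse: combine likelihood and severity into a single risk level.
        return self.likelihood * self.severity


def evaluate(risks, acceptance_criterion):
    """Evaluate: compare each risk level against the criterion set when
    establishing the context, and rank the risks by level."""
    ranked = sorted(risks, key=lambda risk: risk.level, reverse=True)
    to_treat = [risk for risk in ranked if risk.level > acceptance_criterion]
    to_monitor = [risk for risk in ranked if risk.level <= acceptance_criterion]
    return to_treat, to_monitor


if __name__ == "__main__":
    register = [
        Risk("Pump bearing failure", likelihood=4, severity=2),
        Risk("Storage tank fire", likelihood=1, severity=5),
        Risk("Minor office trip hazard", likelihood=2, severity=1),
    ]
    to_treat, to_monitor = evaluate(register, acceptance_criterion=4)
    for risk in to_treat:
        print(f"Treat (management plan required): {risk.description}, level {risk.level}")
    for risk in to_monitor:
        print(f"Accept and monitor: {risk.description}, level {risk.level}")
```

Because the process is iterative, a register of this kind would be re-evaluated whenever the context, the controls or the criteria change, and each run would be recorded for audit purposes.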

2.10.4 An Idealised Risk Management Structure

The diagram below represents the way in which industry often establishes an idealised risk management structure. It is generally considered idealised because whilst a company manager may indeed have the title of Risk Manager, the legal responsibility for the management of risk is a line management function.
[Figure: organisation chart headed by the Risk Manager, with three groups. Pre-Event: Security Manager, Finance Manager, Risk Engineer, Public Affairs, Ergonomist, OSH&E Manager. Damage Control: Crisis Management Team (Media Relations), First Aid, Fire Team. Post-Event: Medical Staff, Insurance, Legal Advisers.]

An Idealised Risk Management Model

In practice, most persons with the title of Risk Manager are in fact internal risk advisors, on whose advice line management may choose to rely. Legally, at least, the ultimate decision makers with regard to the levels of risk an organisation can accept will be its highest level of management, namely its Board of Directors or equivalent.

2.10.5 A Facilities Management Model

The facilities management model is favoured by organisations that have large volumes of occupied space, for example universities and hotels.

[Figure: Facilities Management, comprising Risk Management, Asset Management and Space Management.]

Facilities Management Model


2.10.6 An Asset Management Model

The organisational model shown below has proved attractive to local government.

[Figure: organisation chart with the elements Asset Management, Risk Management, Resource Management, Risk Engineering, Insurance, Operation Management and Maintenance Management.]

An Asset Management Model

2.10.7 Process Model of Risk Management

This model uses an underlying time sequence basis within a legal framework.

[Figure: time sequence under the heading Risk Management, centred on an Event. Before the event: a brick, and someone about to trip over a brick, addressed by Environmental Engineering (modify the work environment, i.e. remove the brick) and Safety (teach people to work safely, i.e. lift their feet). After the event: Injury, Rehabilitation, Recovery, Insurance. The Courts (Safe = Acceptable Risk) provide the required feedback.]

Process Model of Risk Management

It also suggests that the purpose of risk management is to optimise the total costs of risk, subject to the constraint of matching legal expectations at all times.
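The idea of optimising the total cost of risk under a legal constraint can be shown as a simple selection among control options. This is a sketch under stated assumptions: the options, their costs and the `meets_legal_expectations` flag are hypothetical, and the total cost of risk is taken to be the cost of prevention plus the expected cost of loss.

```python
# Hypothetical control options for the brick example. All figures are invented.
OPTIONS = [
    {"name": "Do nothing", "prevention": 0, "expected_loss": 900,
     "meets_legal_expectations": False},
    {"name": "Warning sign only", "prevention": 10, "expected_loss": 300,
     "meets_legal_expectations": False},
    {"name": "Teach people to lift their feet", "prevention": 100, "expected_loss": 400,
     "meets_legal_expectations": True},
    {"name": "Remove the brick", "prevention": 300, "expected_loss": 50,
     "meets_legal_expectations": True},
]


def total_cost_of_risk(option):
    """Cost of prevention plus expected cost of loss."""
    return option["prevention"] + option["expected_loss"]


def best_option(options):
    """Cheapest option among those that match legal expectations."""
    acceptable = [o for o in options if o["meets_legal_expectations"]]
    if not acceptable:
        raise ValueError("no option matches legal expectations")
    return min(acceptable, key=total_cost_of_risk)


if __name__ == "__main__":
    choice = best_option(OPTIONS)
    print(f"{choice['name']}: total cost of risk {total_cost_of_risk(choice)}")
```

In this example the warning sign has the lowest total cost but is excluded because it is assumed not to match legal expectations, so the constraint, not the cost alone, determines the choice.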


2.10.8 Key Performance Areas Model

The key performance areas model is a spin-off of the recent business management focus on measuring all business activity by Key Performance Indicators (KPIs) that measure Key Performance Areas (KPAs). This can be represented in a number of ways, such as the one shown below.

[Figure: key performance areas and their elements, leading to the Outcome: World's Best Practice.
Business / External Environment: Customer, Competition, Growth, Political.
Organisation: Culture, Structure, Resources.
Competent Staff: Selection, Training, Assessment, Retraining.
Physical Configuration: Design, Procure, Construct, Modify, Audit.
Operation & Maintenance Management: Operations, Maintenance, Audits, Corrective Actions, Procedures.
Incident, Crisis & Emergency Planning: Plans, Resources, Rehabilitation, Support.]

Key Performance Areas Model

2.10.9 Risk Role Models

Different elements of society play different risk management roles. Governments, for example, are expected to have a major role in the management of public risk. This usually manifests itself as various forms of regulation over corporate risk and, if required, emergency response services. Interestingly, where organisations lie in the causal chain determines how they regard the activities of the others and therefore the role each must play.

[Figure: causal chain plotted against Time: Corporate Hazard or Pathogen, Corporate Prevention Failure, Corporate Crisis Management Failure, Public Emergency Response Failure, Government Crisis Management Failure, Loss of Public Confidence and Change of Government. The corporate stages fall under Corporate or Institutional Risk Management (Indirect Government Control, i.e. Regulation); the public stages fall under Public Risk Management (Direct Government Control).]

Risk Roles Model

From a government perspective, unmanaged corporate hazards represent a threat that must be addressed, usually by regulation and the provision of adequate emergency response and crisis management systems. From the corporations' perspective, governments and associated regulations represent disproportionate interference for possible consequences of matters that the corporations believe they have in hand already.


REFERENCES

Braithwaite G, J P E Faulkner, R E Caves (1997). Latitude or Attitude? - Airline Safety in Australia. Paper presented at the 1997 National Conference of the Risk Engineering Society, Engineers Australia. Canberra.

Kuhn T S (1970). The Structure of Scientific Revolutions, 2nd Edition, enlarged, sixth impression. University of Chicago Press.

Reason J (1997). Managing the Risks of Organizational Accidents. Ashgate Publishing Limited.

Standards Australia/Standards New Zealand (1999). Risk Management. Australian/New Zealand Standard AS/NZS 4360:1999.

READING

Standards Australia (1999). Functional Safety of electrical / electronic / programmable electronic safety related systems. Par 6.5: Examples of methods for the determination of safety integrity levels. AS 61508.5 1999 / IEC 61508.5 1998.


3. Governance and Risk

3.1 Risk Management's Role in Good Governance

Over the last decade numerous international and national inter-governmental bodies have sought to promote good corporate governance. One element that all emphasise is that risk management is an integral part of good governance. For example, the Commonwealth Heads of Government meeting in Edinburgh in 1997 issued a Declaration whose purpose was to promote excellence in corporate governance. It set up the Commonwealth Association for Corporate Governance (CACG), which issued 15 Principles it considered fundamental to a holistic approach to corporate governance.

In reference to Risk Management, CACG Principles state:

The board must identify key risk areas and key performance indicators of the business enterprise and monitor these factors. If its strategies and objectives are to have any relevance, the board must understand and fully appreciate the business risk issues and key performance indicators affecting the ability of the corporation to achieve its purpose. Generating economic profit so as to enhance shareholder value in the long term, by competing effectively, is the primary objective of a corporation and its board. The framework of good corporate governance practices in a corporation must be designed with this objective in mind, while fulfilling broader economic, social and other objectives in the environment and circumstances in which the corporation operates.

These factors - business risk and key performance indicators - should be benchmarked against industry norms and best practice, so that the corporation's performance can be effectively evaluated. Once established, the board must constantly monitor these indicators. Management must ensure that they fully and accurately report on them to the satisfaction of the board.

The board, as emphasised throughout, has a critical role to play in ensuring that the business enterprise is directed towards achieving its primary economic objectives of profit and growth. It must, therefore, fully appreciate the key performance indicators of the corporation and respond to key risk areas when it deems it necessary to assure the long-term sustainable development of the corporation.

3.2 Corporate Governance Systems

Corporate governance is the system by which an organisation is directed and controlled. Laws regulate only some aspects of corporate governance. In the main, directors and managers have only principles and guidelines to help them construct systems and maintain their currency. There is no single governance model that fits all types of organisations.

3.2.1 Governance Models

A number of generic models issued by international bodies and national standard setting councils are commonly available, for example those of the Organisation for Economic Co-operation and Development (OECD), the United Nations (UN), the Commonwealth Association for Corporate Governance (CACG), and the Council of Standards Australia. These generic models seek to enable users to appreciate and identify the wide range of concerns that good governance needs to cover.


Other relevant models, as well as regulations, codes of best practice, and government programs and policies also exist. Most focus on aspects of governance pertinent to the particular areas of authority or expertise of the issuing bodies. More often than not, these refer to financial risk, being issued by, for example, national stock exchanges, chartered accountants, auditors, company secretaries and other finance-related professional groups. For example:

ASX | ASX Corporate Governance Council: Principles of Good Corporate Governance and Best Practice Recommendations.
IFSA | IFSA Guidance Note No 2.00. Corporate Governance: A Guide for Fund Managers and Corporations.
ANAO | Australian National Audit Office 1999
AS/NZS ISO 9001 | Quality Management Systems - Requirements
AS/NZS 4269 | Complaints Handling
AS/NZS 4360 | Risk Management
AS/ISO 15489 | Records Management
AS/ISO 15489.1 | Part 1: General
AS/ISO 15489.2 | Part 2: Guidelines
AS 3806 | Compliance Programs
AS 8000 | Good Governance Principles
AS 8001 | Fraud and Corruption Control
AS 8002 | Organisational Codes of Conduct
AS 8003 | Corporate Social Responsibility
AS 8004 | Whistleblower Protection Programs for Entities

State as well as national government laws, regulations and programs can also apply. In Victoria, for example, these include the Victorian Managed Insurance Authority Act 1996, the Financial Management Act 1994, the Victoria Government's Management Reform Program, and Victorian Government policies associated with private-public sector service and infrastructure delivery such as Partnerships Victoria.

3.2.2 Key Governance Areas and Issues

The following table lists only some of the numerous issues and operational functions that a reasonably comprehensive corporate governance system should encompass:

Accountability
Transparency
Code of conduct
Good citizenship
Social responsibility
Shareholder rights
Stakeholder identification
Stakeholder liaison
Corporate ethics
Board charter
Board protocol
Authority delegation
CEO remuneration
Asset management
Quality management
Continuous improvement
Best Practice
Training
OH&S
Fraud and corruption control
Complaint handling
Compliance
Due diligence
Records management
Internal reporting
Security

3.3 Origins of the Good Governance Movement

The main challenge for those charged with designing or reviewing a corporate governance system is how to ensure the system recognises all the key aspects of the corporation's objectives, context, structure and operation. In undertaking this task, it helps first to know why the recent global emphasis on governance came about. What was the good governance movement a reaction to? What did it seek to avoid? What does it aim to achieve?

Until the 1990s little if anything was heard in business circles of the term "governance". When the term first came into business usage, many took it to be merely another of those verbal fads that pop up from time to time - a fancy way of referring to governing or government.

3.3.1 Stock Crashes and Mega-Corporation Collapses

Underpinning the changes to the business vocabulary were efforts to reorientate corporate organisation and decision-making. As the Australian Standard AS 8000-2003, Good Governance Principles (p.4) states:

"The stock market crash in 1987 and the subsequent collapse of many corporate entities around the world led to urgent calls, particularly from institutional shareholders, for the reform of corporate governance mechanisms."

After the 1990s stock market bubble burst, five of the 10 biggest corporate collapses on record pushed US corporate bankruptcies to new records for the second consecutive year. Topping the list was WorldCom, whose $104bn in assets made it the most expensive collapse in history. In 2002 alone, 186 US companies involving $368 billion in assets went bust. It beat the previous year's record of $259 billion. Australia had its counterpart when a number of high profile companies including HIH and One.Tel imploded, causing severe damage to public as well as investor perceptions of corporate governance.

The corporate failures of year 2001 were mainly the result of debt problems resulting from poor appreciation of and response to financial risk. But, in the previous year, accounting scandals were the order of the day. Many involved criminal fraud, undetected or unexposed over lengthy periods of time even by those claiming to be professional financial watchdogs in the media, banking and investment advisory and audit houses. WorldCom accounted for more than $9 billion of false profits on its balance sheet.

With the benefit of hindsight the consensus of financial media was that to have such thunderous bankruptcies, companies had both to take on a huge amount of debt and either be badly or fraudulently run. Such companies also had to have something sufficiently attractive about them that led creditors into foolishly or mistakenly extending them huge amounts of credit. These scandals inflicted severe damage on employees and pension fund holders. In a number of cases (many still proceeding), senior managers earned gaol sentences.

3.3.2 Other Contemporary Causal Factors

At the same time, global efforts were developing to refit companies to cope with other new challenges, especially those posed by: the expanding, increasingly competitive, international market economy; the de-regulatory, neo-liberal economic policies generally associated with globalisation; the increasingly complex technology creating what some called the "risk society"; increasing concern and activism over environmental and public health issues; escalating liability litigation; and, more recently, new styles and severity of international terrorism.


The concept of governance came into fashion about the same time as a number of other new terms - or at least new usages of terms. The thrust of words like deregulation, corporatisation, privatisation, globalisation, international competitiveness, continuous improvement, etc, becomes clear only against the background of the new economic policy orthodoxy, changes in the international market and the global spread of new technology. Soon, we also began to hear more of concepts like transparency, stakeholder as well as shareholder interest, social responsibility, business ethics, and corporate good citizenship. Later, environmentalist and public safety terms like sustainability and the precautionary principle joined the verbal influx.

It was a period in which liability litigation also proliferated. Concepts like duty of care, best practice, and due diligence helped swell the vocabulary of day-to-day corporate activity. These trends were part and counterpart of government deregulation, global market orientation, technology, and the public reactions all these developments engendered.

3.4 The Rise of the Risk Society

The nature and extent of risks today are a far cry from those of the "satanic mills" of the first Industrial Revolution in the Nineteenth Century. The physical pollution and social harm associated with early technology was localised, mostly confined to a limited urban area. The risks as well as the opportunities of much modern technology are limitless: nuclear fission, radio-active waste disposal, genetic engineering, techno-scientific animal husbandry, food manufacture, pharmaceuticals and numerous other new processes and systems impact populations - often for the better, sometimes for the worse - across continents and down generations.

Risk extends to the planet itself: greenhouse, ozone layer, acid rain, rain forest clearance in Brazil, forest fires in Indonesia and massive dam construction in China. But even at less than universal dispersion of risk, the economic and social impact of local incidents can be great. Consider, for example, the Longford gas disaster in Victoria, the failures of Sydney Water and Auckland Power, and the collapses of HIH Insurance and Ansett Airlines.

Some sociologists have called attention to the newly emerging conditions by using the term "Risk Society" (Beck, 1986). The term draws attention to the fact that the new globalising, highly complex, technological systems are unleashing hazards and potential threats, as well as benefits, to an extent previously unknown. Francis Fukuyama (1999) notes "it is science that drives the historical process; and we are on the cusp of an explosion in technological innovation in the life sciences and biotechnology."

Early technology was designed largely to control the risks that sprang from nature - flood, fire, disease, etc - and from scarcity - famine, low productivity, limited distribution capacity, etc. But now, as Ulrich Beck points out, risk increasingly emanates from man and his inventions. He uses the term "reflexive modernity" to warn of the catastrophic as well as the beneficial potential of the new technology. Technological progress is also enabling the world, if it will, to abolish scarcity of supply of human material needs.

Many emerging hazards are both unintended and unanticipated. Arguably, some risks may lurk so deeply in new products or processes that they may be unknowable, even to state-of-the-art science at the time innovations are implemented. Increasingly sophisticated methods of risk identification, calculation and control will therefore be demanded of risk management.

Of necessity, risk management functions will be conducted increasingly in the glare of public scrutiny. New parameters of transparency and public "risk tolerability" will be forged not in the comfortable privacy of boardrooms but on the exposed public battlefields of political controversy, legal liability and impending government regulation on default.

When the nature of risk from new technology changes so that many risks remain latent and do not manifest themselves for years, the danger is that the incentives to control them can be weakened. Competition and the profit-motive may drive some management to neglect consequences they think will not necessarily impact during their term of office.

The outbreaks of Mad Cow Disease and dioxin-contaminated food exports should be taken as just two of many warning signals that worse is to come unless risk management succeeds in keeping pace with the burgeoning risk society.

3.5 Governance and Non-Financial Risk

One effect of the risk society and the corporate governance movement that gained momentum in the 1990s was to put greater emphasis on risk management. Since then, pressure has been maintained not just for best practice but also for continuous improvement in governance risk identification and management. "The proper governance of companies will become as crucial to the world economy as the proper governing of countries," declared the President of the World Bank, James D. Wolfensohn, commenting on the good governance movement.

Nevertheless, there are still hard yards to cover. A McKinsey study of risk management practice in May 2002 covered 200 directors representing over 500 boards of major companies. Thirty-six percent of the directors believed their boards did not understand their company's major risks. Approximately 40 percent believed they could not identify, safeguard and plan for risk effectively enough. The same percentage believed that non-financial risk received only anecdotal treatment in the boardroom (Protiviti 2003).

Research published by Financial Executives International (FEI) in November 2001, for example, claimed that 65 percent of senior executives lacked high confidence in their risk management. FEI reported that doubts persisted over the extent to which existing processes could be relied upon to identify all potentially significant business risks to their enterprises (Protiviti 2003).

3.6 Public Sector Governance and Risk

3.6.1 Auditor General Victoria's Audit Report

The Auditor General Victoria's performance audit report in March 2003, Managing Risk Across the Public Sector, aimed to provide a timely assessment of risk management practices at individual agency and whole-of-government or State-sector levels. The report noted the effort to establish a formal and structured focus on risk across all industries, and the integration of business risk with other more technical or financial risk assessment, that began with the first establishment of the Australian and New Zealand Standard AS/NZS 4360 Risk Management in 1995 (since revised as AS/NZS 4360:1999).

The report found that the Victorian State public sector was increasingly applying a structured risk management approach, though not necessarily that suggested by the Standard. Key drivers in that State included the Victorian Managed Insurance Authority Act 1996, the Financial Management Act 1994, the Victoria Government's Management Reform Program and policies associated with private-public sector service and infrastructure delivery such as Partnerships Victoria.

The Auditor General found that in over three quarters of public sector organisations, Boards/CEOs and executive management were directly involved and taking leadership roles regarding risk management. Nevertheless he concluded:

- Although more than 90 percent of the State's public sector organisations examined and applied risk management processes in some part of their business and services, risk management across the State public sector was not yet an established or mature discipline.
- Nearly one third of all organisations were still not explicitly identifying and assessing their key risks. Nor were they always reporting risk information to their key internal and external stakeholders.
- Improvement was needed in the ability of organisations to identify their key State-sector risks. While various entities might have an adequate view of their own risk exposures, they did not all understand how their exposures would impact other agencies or the State as a whole. The likelihood therefore existed that significant State-sector risks were going undetected and under-managed.
- There was a lack of clarity around the responsibility for the escalation of these risks and a lack of a full understanding of State-sector risks within portfolios. Certain risk types could therefore go undetected at a State-sector level and the risk persisted that insufficient risk mitigation strategies could be implemented from a whole-of-state perspective.
- Most agencies had no existing structure to share risk management best practice across the State-sector.
- The practice was still prevalent of reviewing risk strategies and assessment as a separate annual exercise or through periodic Board presentations.

The Auditor General Victoria report advised that risk management should not be an annual or infrequent exercise, but should be embedded into usual business processes. It said that risk leadership, appetite and culture should be monitored constantly, and that there should be reliable access to demonstrated risk management good practices in other public sector organisations as well as up-to-date information on key success factors or benchmarks.

3.6.2 UK Strategy Unit Study

Britain's Prime Minister, Tony Blair, recently directed his UK Strategy Unit to conduct an in-depth study of modern risk, and how governments might better manage it. Despite improvements across government, Blair admitted that risk management in the UK had been found wanting in a number of recent policy failures and crises. What government needed to know was how to get the right balance between innovation and change on the one hand, and avoidance of shocks and crises on the other. This was now central to the business of good government. Blair instructed the Strategy Unit to draw on good practice and thinking around the world - from across government, the private sector, and other experts and commentators.

Even prior to Blair's directive to the Strategy Unit, the UK government had already made changes to its approach to risk. Blair described these as radical, and referred in particular to bodies like the UK Food Standards Agency, the Human Genetics Commission and the Monetary Policy Committee. He said these bodies illustrated the trend to more open processes, based on evidence, arguing that such processes were more effective at handling risks and winning public confidence than secrecy. He also pointed to the Civil Contingencies Secretariat, whose aim was to improve the way the UK prepares for threats of serious disruption to the nation.

One of the Unit's early conclusions was that it was not only the accelerating pace of change in science and technology and the greater connectedness of the world that was heightening the risk environment for government. Escalating risk, especially political risk, was also due to "rising public expectations... [and] declining trust in institutions, declining deference, and increased activism around specific risk issues, with messages amplified by the news media."

The report concluded that, although improved, risk management by the UK government was still inadequate to the burgeoning challenge. It needed to keep constantly under review where risk management should best sit. It should strive for continuous improvement through good judgement supported by sound processes and systems.

On the changing nature and severity of risk it referred to unforeseen events, programmes going wrong and projects going awry, including:

- manufactured risks. That is, those requiring governments and regulators to make judgements about the balance of benefit and risk across a huge range of technologies from genetically modified food and drugs, to industrial processes or cloning methods;
- direct threats. For example, those ranging from the events of September 11 to the threat of chemical and biological attack;
- risks resulting from the increasing vulnerability of citizens to distant events. For example, those ranging from economic crises on the other side of the world to attacks on IT networks, diseases carried by air travellers, or the indirect impact of civil wars and famines;
- safety risk issues. For example, those arising from BSE, the Measles, Mumps and Rubella (MMR) vaccine, and such other issues of risk to the public regarding, for example, rail safety, adventure holidays and flooding;
- imposed risks. Those imposed on the public by individuals or businesses that necessitate government regulatory intervention;
- risks of infrastructure disruption from industrial action, protest or failure of transport or IT networks;
- risks to government from the transfer of risk. For example, in capital projects and service delivery to the private sector;
- risks of damage to government's reputation in the eyes of stakeholders and the public that impact government's ability to carry out its programs.

The report recommended action in six main areas:

- systematic, explicit consideration of risk should be firmly embedded in government's core decision-making processes (covering policy making, planning and delivery);
- government should enhance its capacity to identify and handle strategic risks, with improved horizon scanning, resilience building, contingency planning and crisis management;
- risk handling should be supported by best practice, guidance and skills development organised around a risk standard;
- departments and agencies should make earning and maintaining public trust a priority in order to help them advise the public about risks they may face. There should be more openness and transparency, wider engagement of stakeholders and the public, wider availability of choice and more use of arm's-length bodies such as the Food Standards Agency to provide advice on risk decisions. Underpinning principles for handling and communicating on risk to the public should be published for consultation;
- ministers and senior officials should take a clear lead in handling risk in their departments: driving forward improvements, making key risk judgements, and setting a culture which supports well judged risk taking and innovation;
- the quality of risk handling across government should be improved through a two-year programme of change, linked to the Spending Review, and clearly set in the context of public sector reform (the Departmental Change Programme).

The report said its recommendations aimed to enable confident decision taking on both risk and innovation in order to reduce waste and inefficiency and lead to fewer unanticipated problems and crises that may undermine confidence and trust.

Guideline principles were suggested to cover difficult areas. The report noted, for example, that governments normally seek to ensure that those who impose risks on others bear the consequences. But cases arise where responsibility cannot be attributed to any specific individual or agency. The report recommended that governments aim to ensure that responsibility rests with those best placed to manage the risk. It said that this should include protecting minority interests by balancing risks between different groups.

Where the consequences of a risk are too great for any one individual or business to bear, the Unit recommended that government should intervene to provide protection or to pool the risk. Where the market cannot provide sufficient cover and the consequences are unacceptable, it believed the government should step in as insurer of last resort. Government might also need to intervene where market provision is withdrawn in response to an external shock. A case in point was the inability or unwillingness of airline companies after September 11 to bear the costs of enhanced airport and aircraft security.

When the study was completed the UK Prime Minister introduced the report with a caution against the sort of unwarranted risk avoidance that results in unnecessary loss of promising opportunities:

"All life involves some risk, and any innovation brings risk as well as reward - so the priority must be to manage risks better. We need to do more to anticipate risks, so that there are fewer unnecessary and costly crises, like BSE or failed IT contracts, and to ensure that risk management is an integral part of all delivery plans. But we also need to be sure that innovations are not blocked by red tape and risk aversion, and that there is a proper balance between the responsibilities of government and the responsibilities of the individual."

(The UK Strategy Unit's report itself is available at http://www.number10.gov.uk/SU/RISK/risk/home.html).

3.7 Risk and Corporate Citizenship

Clearly, the corporate goal of maximising returns for shareholders is no longer acceptable as the magic ethical bullet that justifies any means. The CACG states:

"Good corporate governance requires that the board must govern the corporation with integrity and enterprise in a manner which entrenches and enhances the licence it has to operate. This licence is not only regulatory but embraces the corporation's interaction with its shareholders and other stakeholders such as the communities in which it operates, bankers and other suppliers of finance and credit, customers, the media, public opinion makers and pressure groups. While the board is accountable to the owners of the corporation (shareholders) for achieving the corporate objectives, its conduct in regard to factors such as business ethics and the environment for example may have an impact on legitimate societal interests (stakeholders) and thereby influence the reputation and long-term interests of the business enterprise."

The wider social impact of corporate decisions is being recognised, and a widening sense of social responsibility is being encouraged. Obviously, this expands the area of risk that can now impact the business enterprise through its public image and civil liability.

Note the emphasis on stakeholders, not just shareholders. External as well as internal stakeholders are mentioned. That is, not only the jobs and working lives of the corporation's employees are involved, but also those members of the public outside the corporation who are affected by its activities. Note that the CACG's reference to shareholder interest is restricted to the legitimate interests of shareholders. Shareholders are described now as only one among a number of stakeholder groups.


Likewise the reference is not to vague, generalised industry standards and relevant statutory obligations but to keeping up with best business practice, a more specific and demanding term. CACG principles point to increasing recognition of the wider impact of corporate decisions on the community. Attention is focussing on how the corporation should relate to the community, including the extent of social responsibility over and above an organisation's obligation to shareholders, the law, and the bottom line.

3.8 Fallout Severity

In the contemporary climate, failures of corporate governance can result in very public fallouts with severe consequences not only for the corporation, its shareholders and stakeholders, but also for individual managers. In Australia this was most recently illustrated when, on 24 March 2004, the Australian Prudential Regulation Authority's (APRA) review of the foreign currency trading scandal at the National Australia Bank (NAB) became public. Irregular currency options trades had incurred losses of $360 million at the bank. This led to the sackings of four traders, other executive departures, and a change of chairman and chief executive. Media reports highlighted that the NAB had to halt its latest share buy-back in order to lift its capital adequacy ratio; the bank was not able to use its own internal measure of market risk capital; and its currency options desk was also halted for proprietary trading and corporate business. (For example, ABC Radio National Report, 24.3.04)

The APRA review said that:

- there were many missed opportunities to detect and close down the irregular currency options trades;
- management at the bank had turned "a blind eye" to known concerns;
- back-office procedures had significant gaps;
- executive risk committees were "particularly ineffective"; and
- the bank's board was not sufficiently pro-active on risk issues.

The regulator's report criticised what it called a culture where risk management controls were seen as "trip-wires to be negotiated rather than presenting any genuine constraint on risk-taking behaviour". The regulator said it frequently came across the phrase "profit is king" in its investigations. The chairman of the Australian Shareholders Association, John Curry, commented that it was "not sufficient for the audit committee to say that they didn't receive some of the information they should have received. The audit committee should have been out there asking questions and probing and finding out whether the systems were correct or not." (ABC Radio National, 24.3.04)


3.9 Basic Principles of Good Corporate Urban Governance

It is also worth noting that an inter-agency grouping is seeking to get the UN General Assembly to adopt the following principles for good urban governance. The campaign proposes the following concepts as goals not merely for rhetorical declarations, but for operational implementation: sustainability, subsidiarity, equity, efficiency, transparency, accountability, civic engagement, citizenship, security.

Given the efforts by the Commonwealth Heads of Government, UN agencies and numerous non-government bodies, it is reasonable to anticipate sharper, more critical public scrutiny of, and reaction to, actual or perceived corporate failure to live up to the new standards of good governance. Companies should be prepared to face rigorous public probing during the fallouts that will certainly follow any such occurrences. In regard to good governance in general, whether governmental, quasi-governmental or corporate, the Commonwealth Association for Corporate Governance (CACG) believes the following elements are essential: efficiency, probity, higher levels of conduct by professions and professionals, active and responsible capital providers, effective legal and regulatory regimes, reasonably competitive markets, a free and critical media.

REFERENCES
Fukuyama, Francis, Professor of Public Policy, George Mason University. The Independent (16/6/99).
Beck, Ulrich (1992). Risk Society - Towards a New Modernity. SAGE Publications.
Commonwealth Association for Corporate Governance (CACG). 15 Principles. www.combinet.net/governance/FinalVer/commonwe.htm
Protiviti's The Bulletin, Vol 1, Issue 7, 06/2003.
Standards Australia/Standards New Zealand (1999). Risk Management. Australian/New Zealand Standard AS/NZS 4360:1999.
READING


4. Liability

The law is much too important to be left up to lawyers.
Australian aphorism.

4.1 Statute vs Civil Law

Civil or common law is law derived from actual cases, that is, law made by or modified by the judiciary. Common law is the product of societal values over centuries and evolved in the English courts. Civil cases under common law are brought by one party claiming damages from another. In common law the case must be proved on the balance of probabilities.

Statute law is law passed by Acts of parliament. This law takes the form of Acts and Regulations made under an Act. Statute law specifies penalties for breaches. Statutory offences require that the case against the accused be proved 'beyond reasonable doubt'. This is a considerably heavier standard of proof than the civil standard. Government departments and statutory authorities are responsible for the enforcement of statute law. They determine whether to prosecute for breach of statutory duty. The paper by Gumley (2003) outlines the offences and penalties for environmental crimes in Victoria; those under OH & S legislation take a similar approach.

Australian OH & S legislation is based on the U.K. Robens type legislation and is derived from the common law duty of care concept (Creighton 1996), particularly the duty of employers towards their employees. Each Australian state has its own OH & S and Environmental legislation, and whilst all are very similar there are some subtle differences as to the extent of the duties. In the 1985 Victorian OHS Act, for example, the duties of employers are qualified by 'so far as practicable', with 'practicable' being defined as having regard to:

a) the severity of the hazard or risk in question
b) the state of knowledge about that hazard or risk and the ways of removing or mitigating that hazard or risk
c) the availability and suitability of ways to remove or mitigate that hazard or risk
d) the cost of removing or mitigating the hazard or risk.

In some cases of workplace deaths, the authorities have brought charges of manslaughter under the Crimes Act.
As yet no successful convictions have been obtained in Australia because the individuals charged must be shown to have mens rea or a guilty mind. In Victorian law the relevant mens rea for manslaughter is gross or criminal negligence. In some jurisdictions the concept of industrial manslaughter for workplace fatalities has been introduced because of the difficulty of proof beyond reasonable doubt. There are also some difficulties in determining which individuals represent the mind of the corporation.

4.2 Common Law Criteria

Common law actions as a result of workplace injury have largely been supplanted by workers' compensation systems, i.e. injured workers receive compensation for the impacts of injury without having to take action against the employer in the courts. However, apart from a much reduced number of workplace injury cases, the common law duty of care of one person to another is invoked in many aspects of modern life. For example, the organisers of any public event have a duty of care to all those involved in or potentially impacted by the event. It is the common law duty of owners and occupiers of premises to ensure they are safe for members of the public who have access. Failure to do so may be negligent, and can lead to the significant costs associated with common law claims. It can also lead to statutory penalties for responsible individuals if the responsible government authority decides to act.


To be found guilty of negligence, the answers to all four of the questions posed below, on the balance of probabilities, need to be Yes. These are termed the four common law tests of negligence.

A. CAUSATION: Did the injury or damage occur because of the 'unsafe' matter on which the claim of negligence is based?
B. FORESEEABILITY: Did you know or ought you to have known... ? Could this have been foreseen...? (Prior incidents, complaints, wide or common knowledge, or expert advice)
C. PREVENTABILITY: Is there a practical way or alternative to how things were done? (Design or removal; administration and training).
D. REASONABLENESS: Was the balance of the significance of the risk vs the effort required to reduce it reasonable?
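The "all four answers must be Yes" rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not a legal tool: the class and function names (NegligenceTests, negligence_established, failed_tests) are hypothetical, and each answer is assumed to have already been decided on the balance of probabilities.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class NegligenceTests:
    """Answers to the four common law tests; True means 'Yes'.

    Hypothetical structure for illustration only. Each answer is assumed
    to have been reached on the balance of probabilities.
    """
    causation: bool        # A: did the 'unsafe' matter cause the injury or damage?
    foreseeability: bool   # B: was it known, or ought it to have been known?
    preventability: bool   # C: was there a practical way or alternative?
    reasonableness: bool   # D: the risk-versus-effort balance question


def negligence_established(tests: NegligenceTests) -> bool:
    """Negligence requires a 'Yes' to every one of the four tests."""
    return all(asdict(tests).values())


def failed_tests(tests: NegligenceTests) -> list:
    """Names of the tests answered 'No'; one is enough to defeat the claim."""
    return [name for name, answer in asdict(tests).items() if not answer]


claim = NegligenceTests(causation=True, foreseeability=True,
                        preventability=False, reasonableness=True)
print(negligence_established(claim))  # False
print(failed_tests(claim))            # ['preventability']
```

The point of the sketch is that the tests are conjunctive: a single credible "no" answer to any one test is sufficient for a defence.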

Note that:

- approved or common practice may or may not be reasonable.
- compliance with regulations and codes of practice is a starting point, not a goal. For example BS 5760 : Part 12 : 1993 (page iii) states, "Compliance with a British Standard does not of itself confer immunity from legal obligations."
- the occupier/employer must be practically able to undertake the change.
- expense alone is not a factor, nor is practical inconvenience.
- the creation of other risks by the change needs to be considered.
- individual susceptibility needs to be considered.

Because of the considerable volume of case law available to the judiciary, the application of the common law tests of negligence also provides much of the basis for decisions relating to cases of offences under OH & S and Environmental law. These tests of negligence require expert evidence; lawyers cannot decide them. Most common law cases never reach court because the lawyers settle out of court. If there is no evidence of significance that would lead a judge or a jury to derive a "no" answer to any of the four tests above, the lawyers for the defendant can only accept defeat and settle for a relatively large sum.

4.3 On Juries and Justice

With regard to common law actions for negligence described above, a jury sometimes determines the balance of probabilities. It seems that juries can be affected by the horror of the injuries and other matters so that even if the assessment might be less than 50% in favour of the plaintiff, the jury will still find in the plaintiff's favour. But juries are complex. For an extreme example consider the following case from a sitting of the District Court, composed of the presiding Judge and Jury, in Dubbo, NSW. (As quoted by the Hon. James Muirhead QC in Discharge the Jury? Menzies School of Health Research, 1989).

The accused, a local man, was charged with cattle stealing. Apparently the evidence that he had stolen the cattle was overwhelming. The local jury, having considered their verdict, returned to court. When asked for their verdict the foreman replied, 'We find the defendant not guilty if he returns the cows.' The Judge was furious. He vigorously reminded the jury of their oaths to 'bring in a true verdict according to the evidence', declined to record their verdict and sent them back to the jury room to reconsider the verdict. The jury retired briefly and returned with a defiant air. When asked if they had reconsidered their verdict the foreman said, 'Yes, we have. We find the accused not guilty and he can keep the cows.'


There are several points about the adversarial system that need to be remembered. It is first and foremost a court of law. As Engineers Australia notes in the brochure Are You at Risk (1990): Adversarial courts are not about dispensing justice, they are about winning actions. In this context, the advocates are not concerned with presenting the court with all the information that might be relevant to the case. Quite the reverse, each seeks to exclude information considered to be unhelpful to their side's position. The idea is that the truth lies somewhere between the competing positions of the advocates.

Further, courts do not deal in facts, they deal in opinions. Again from Are You at Risk: What is a fact? Is it what actually happened between Sensible and Smart? Most emphatically not. At best, it is only what the trial court - the trial judge or jury - thinks happened. What the trial court thinks happened may, however, be hopelessly incorrect. But that does not matter - legally speaking. That is, in court, the laws of man take precedence over the laws of nature.

4.4 Due Diligence

The primary defence against negligence claims is due diligence. This really means that a reasonable person (in the eyes of the court and with the advantage of 20:20 hindsight) in the same position would have undertaken certain procedures and processes to ensure whatever it is that did happen, on the balance of probabilities, shouldn't have occurred. This is probably best represented by the diagram below (adapted from Sappideen and Stillman 1995).

[Figure: How Would a Reasonable Defendant Respond to the Foreseeable Risk? Labels: Magnitude of Risk (Probability of Occurrence; Severity of Harm) set against Expense; Difficulty and Inconvenience; Utility of Conduct.]

The overall situation is perhaps best summarised by Chief Justice Gibbs of the High Court of Australia: Where it is possible to guard against a foreseeable risk, which, though perhaps not great, nevertheless cannot be called remote or fanciful, by adopting a means, which involves little difficulty or expense, the failure to adopt such means will in general be negligent.
Turner v. The State of South Australia (1982) (High Court of Australia before Gibbs CJ, Murphy, Brennan, Deane and Dawson JJ).


The balance is the hard part. It is hard for outsiders to know the true extent of the resources (financial, administrative and/or staff) ultimately available to an organisation. This means external assessment as to the correctness of the balance is difficult and something an individual organisation must do internally. The legislated hierarchical order of risk control solutions is:

i) Elimination or Removal (100% effective)
ii) Design or engineering (typically 90% effective)
iii) Administration (typically 50% effective)
iv) Training (typically 30% effective).
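As a rough illustration of why the order of the hierarchy matters, the typical effectiveness figures above can be applied to a notional inherent risk. The sketch below is an assumption-laden simplification: it treats "x% effective" as removing x% of the inherent risk, and the names Control, residual_risk and preferred_control are hypothetical, not drawn from any legislation.

```python
from enum import Enum


class Control(Enum):
    """Risk control types in hierarchy order, with the typical
    effectiveness figures quoted in the text (as fractions)."""
    ELIMINATION = 1.00
    DESIGN = 0.90
    ADMINISTRATION = 0.50
    TRAINING = 0.30


def residual_risk(inherent_risk: float, control: Control) -> float:
    """Risk remaining after a control is applied.

    Assumption for illustration: 'x% effective' means the control
    removes x% of the inherent risk.
    """
    return inherent_risk * (1.0 - control.value)


def preferred_control(available: set) -> Control:
    """Return the highest-ranked control that is available.

    Enum members iterate in definition order, which here matches
    the hierarchy from elimination down to training.
    """
    for control in Control:
        if control in available:
            return control
    raise ValueError("no control available")


# Notional inherent risk of 10 loss events per year (hypothetical figure).
for control in Control:
    print(control.name, round(residual_risk(10.0, control), 2))

print(preferred_control({Control.TRAINING, Control.DESIGN}).name)  # DESIGN
```

On these typical figures, training leaves seven times the residual risk that a design solution does, which is consistent with the preference for controls higher in the hierarchy.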

Another way of expressing the courts' reluctance to rely on training and administrative controls is to see it in the context of a cause-consequence model. A concept diagram is shown below:
[Figure: Concept Cause-Consequence Model. Labels: Threat (falling objects on construction site); Precaution (kickrails to restrain small objects); Precaution Failure; Loss of Control (Incident); PPE (hard hat); Near Miss; Loss.]

Primary controls include kickboards on platforms to prevent objects from being dislodged and falling in the first place. Note that personal protective equipment (in this case a hard hat) improves the probability of a near miss but that the system was out of control already once an object had started to fall. This needs to be taken into consideration when assessing the balance noted above. Specifically, it is imprudent, and indeed unlawful, to rely on administrative and training solutions when a design solution, on balance, is available.
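The falling-object example can be put into numbers with a minimal sketch. All of the figures and names below (cause_consequence_rates, the threat rate and the failure probabilities) are hypothetical and chosen only to illustrate the structure: the precaution determines how often control is lost, while PPE only changes how a loss-of-control incident is split between near misses and losses.

```python
def cause_consequence_rates(threat_rate: float,
                            p_precaution_fails: float,
                            p_ppe_fails: float) -> dict:
    """Simple chain: threat -> precaution failure -> incident -> near miss or loss.

    threat_rate: threats per year (e.g. objects dislodged on a platform).
    p_precaution_fails: probability the primary control (kickrail) fails.
    p_ppe_fails: probability the PPE (hard hat) does not prevent a loss.
    Independence of the two barriers is assumed for illustration.
    """
    incident_rate = threat_rate * p_precaution_fails   # loss of control
    loss_rate = incident_rate * p_ppe_fails            # incident ends in a loss
    near_miss_rate = incident_rate - loss_rate         # incident ends in a near miss
    return {"incident": incident_rate,
            "near_miss": near_miss_rate,
            "loss": loss_rate}


# Hypothetical figures: 100 dislodged objects a year, kickrails fail 10% of the time.
without_ppe = cause_consequence_rates(100.0, 0.10, 1.0)
with_ppe = cause_consequence_rates(100.0, 0.10, 0.2)

# PPE reduces losses but leaves the number of loss-of-control incidents unchanged.
print(without_ppe)
print(with_ppe)
```

Only improving the precaution (a lower p_precaution_fails) reduces the incident rate itself, which is the sense in which the system is already out of control by the time the hard hat is relied upon.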


4.5 Safety Cases

Safety Cases provide for a very interesting perspective in the liability context. Historically they were developed to optimise safety performance. There are parallels to a Business Case, which is usually drawn up to convince a financier that a business is viable (Redmill et al., 1997). The object is to ensure that all significant factors affecting the business have been identified and that appropriate measures are in place to maximise the positive factors and minimise the negative ones. It is usually the responsibility of the highest levels of management of the organisation. Accordingly, responsibility for the failure of a business usually rests there too. A Safety Case is intended to provide the same assurance with respect to the safety of a system or complex. Again it is primarily the responsibility of the operating company, at its highest levels.
[Figure: Idealised Safety Case Structure. Labels: Board; CEO; Middle Management; Business Units; Safety Management System; Safety Audit; Business Management System; Financial Audit.]

Safety Cases are in effect reasoned (legal) arguments that all significant hazards have been identified, properly managed and are safe. Once established, a Safety Case typically manifests itself as a contract between an organisation and a regulator that permits the organisation to operate within defined limits in accordance with documented procedures. Compliance failure is a breach of contract. If damage to third parties, or death and injury, occur due to such breaches then serious liabilities arise. Because of this, the adversarial legal process seems to have converted the concept to a liability management device. This is discussed further in the next section.

Quality type processes are good to ensure compliance with the contract. However they are less effective in establishing the Safety Case initially, or in the argument for its subsequent redevelopment. Risk analysis is essential for the Safety Case's initial development and continuing validity.


4.6 Adversarial Legal System Contradictions

Arising from the above review, there appear to be some profound contradictions being created in risk control and the adversarial legal system. Firstly, the emerging view that risk control failures arise from systemic (being strategic or policy) errors (Reason, 1993) does not appear to create liabilities for the policy or strategic decision makers. Rather it imposes the responsibility to be diligent (with all the subsequent liability) on those who actually have to implement such policies. It is also interesting to note that for senior management and board members at least, liability management is identical to consequence management. Frequency and therefore risk management is not really an issue. If a serious loss event can credibly occur (in legal terms it is possible) then it must be managed. The fact that it occurs very, very rarely is not relevant. To paraphrase a judge in NSW; "What do you mean you didn't think it could happen; there are seven dead". This liability impact has had a great effect on the development of safety cases. The Victorian Major Hazards legislation, for example, indicates that the chief executive officer or the most senior officer resident in the state of Victoria shall sign off the safety case. Passing a potential safety case via two sets of lawyers in the loop shown below changes its nature from being a wholly technical statement of safety by technical persons to a liability management device, a substantial development.

[Figure: Safety Case Development Loop. Labels: Board; Policy; Corporate legal sign off; In house legal advice; Middle management assessment and attempted feedback; Requested resources, $, time & people.]

Secondly, the notion of a statute is that it represents a law that a citizen can choose to obey or not (we have free will). If it is not obeyed then a penalty will be imposed. Ignorance is no excuse. However, if a policy or system of management created the circumstances leading to the failure, meaning the individual did not really understand the risk framework being imposed, then a very difficult contradiction occurs. That individual has to have knowledge mastery of the total social/legal/technical risk control system in which he or she works so that potential problems can be demonstrated to those same policy makers in ways the policy makers cannot legally avoid. Otherwise the responsibility (and liability) cannot be restored to those higher management echelons.

The parallel is to a soldier being commanded to commit a crime. A soldier is trained to obey orders (it is part of his work culture), but he needs to have knowledge of societal law and mores as well. He has to know when to refuse an illegal command; otherwise he can find himself being charged with a crime.

4.7 Risk Auditing Systems

Risk auditing rating systems like the Victorian Government's SafetyMAP, the NSCA's 5 star system or Det Norske Veritas's International Safety Rating System are also interesting in this context. Whilst they may provide indications as to the overall health of risk control systems, they are not a direct defence against liability arising from a particular accident even if perfect scores had been consistently attained by the participating organisation.


REFERENCES
Creighton W B (1996). Understanding Occupational Health and Safety Now in Victoria. CCH Australia, Sydney.
Engineers Australia (1990). Are You at Risk? Canberra.
Gumley W (2003). Environmental Crimes: Offences and Penalties in Victoria. Corporate Misconduct ezine: http://www.lawbookco.com.au/academic/Corporate-Misconduct-ezine/html-files/articles.asp
Muirhead J (1989). Discharge the Jury? Menzies School of Health Research.
Reason J (1993). Managing the Management Risk: New Approaches to Organisation Safety, Chapter 1 of Reliability and Safety in Hazardous Work Systems: Approaches to Analysis and Design. Eds I Wilpert et al. Lawrence Erlbaum Associates Ltd, East Sussex. ISBN 0-86377-309-5.
Redmill, Felix and Jane Rajan (1997). Human Factors in Safety Critical Systems. Butterworth-Heinemann, Oxford.
Sappideen C and R H Stillman (1995). Liability for Electrical Accidents: Risk, Negligence and Tort. Engineers Australia Pty Limited, Crows Nest, Sydney.
Turner v. The State of South Australia (1982). High Court of Australia (before Gibbs CJ, Murphy, Brennan, Deane and Dawson JJ).

READING
Smith Damien J (1986). Engineers & Professional Negligence. Enterprise Care, 1st Floor, 21 Burwood Road, Hawthorn, Victoria, 3122. Reprinted 1994. ISBN 0 646 09785 7.


5. Causation

We all have our philosophies, whether or not we are aware of the fact, and our philosophies are not worth very much. But the impact of our philosophies upon our actions and our lives is often devastating. This makes it necessary to try to improve our philosophies.
(Paraphrased from Karl Popper, 1972).

The way in which we believe things occur determines how we will respond and attempt to manage them. Risk analysis in many ways is an examination of our philosophies or prejudices using processes that can withstand judicial scrutiny. It is therefore culture, time and place specific. As mentioned in Chapter 1, if one believes that people are dying from the plague because of selective retribution from God for past sins, then the way this risk is managed will be different from that for a society that believes in germ theory.

In business terms for example, the world is often regarded as a wholly commercial place with everyone acting in a self-interested manner. If this is true, then what one does to prosper and minimise risk will differ from the actions of those who believe in a more humanistic view of human behaviour and responsibility. The natural material world on the other hand tends to be considered as deterministic or probabilistic in nature and subject to natural laws.

This creates some profound contradictions. Engineers believe they can change the future using materials and systems that behave predictably. But if the engineers themselves are predictable, can their actions be similarly predetermined? Our courts have similar problems. How can someone be convicted of a crime if his or her behaviour was predetermined by his or her situation and circumstances?

5.1 Paradigms

Paradigms (Kuhn 1970), or a set of concepts shared by a community of scientists or scholars, are fundamental issues. Alternate names given to describe these different views of how things happen include worldviews, or weltanschauung. Consider a simple example by comparing the views of some insurance authorities with those of risk engineers. Suppose that a certain class of car was having more accidents than most. An engineer investigating this might conclude that it was due to malfunctioning brakes. If so, a product recall would be made and the problem fixed. A month or two might go by to ensure that the accident frequency really did go down abruptly (a step function) and if so the matter, from the engineer's perspective, would be closed. That is, the causal effect between malfunctioning brakes and accidents had been established and the problem solved. Now consider the insurers. They will have increased premiums for this class of car once the accident rate increased. Once the accident rate drops to the same as all other cars the engineer might expect an immediate drop in the premiums. However, underwriters tend to have a probabilistic view of things. The premiums will almost certainly be averaged over several years and drop progressively, not abruptly. That is, underwriters usually have a probabilistic view on the universe, not a causal one. There are some interesting management shifts occurring. Maruyama (1974) describes three simplified pure paradigms or structures of reasoning shown below. Some of the views in the table are prior to current modern views regarding community consultation especially for environmental risk assessment where community consultation is a prime source for legitimacy.


It appears that a shift has occurred from paradigm 1 to paradigm 3 in the last 25 years.

Columns: (1) Unidirectional Causal Paradigm; (2) Random Process Paradigm; (3) Mutual Causal Paradigm.

Science:
  (1) Traditional 'cause and effect' model
  (2) Thermodynamics; Shannon's information theory
  (3) Post-Shannon information theory.
Information:
  (1) Past and future inferable form
  (2) Information decays and gets lost; blueprint must contain more information than the finished product.
  (3) Information can be generated. Non-redundant complexity can be generated without pre-established blueprint.
Cosmology:
  (1) Predetermined universe
  (2) Decaying universe
  (3) Self-generating and self-organising universe
Social organisation:
  (1) Hierarchical
  (2) Individualistic
  (3) Non-hierarchical interactionist
Social policy:
  (1) Homogenistic
  (2) Decentralisation
  (3) Heterogenistic coordination
Ideology:
  (1) Authoritarian
  (2) Anarchistic
  (3) Cooperative
Philosophy:
  (1) Universalism
  (2) Nominalism
  (3) Network
Ethics:
  (1) Competitive
  (2) Isolationist
  (3) Symbiotic
Aesthetics:
  (1) Unity by similarity and repetition
  (2) Haphazard
  (3) Harmony of diversity
Religion:
  (1) Monotheism
  (2) Freedom of religion
  (3) Polytheism, harmonism
Decision process:
  (1) Dictatorship, majority rule or consensus
  (2) Do your own thing
  (3) Elimination of hardship on any individual
Logic:
  (1) Deductive, axiomatic
  (2) Inductive, empirical
  (3) Complementary
Perception:
  (1) Categorical
  (2) Atomistic
  (3) Contextual
Knowledge:
  (1) Believe in one truth. If the people are informed, they will agree.
  (2) Why bother to learn beyond one's own interest.
  (3) Polyocular; must learn different views and to take them into consideration.
Methodology:
  (1) Classificational, taxonomic
  (2) Statistical
  (3) Relational, contextual analysis, network analysis.
Research hypothesis and strategy:
  (1) Dissimilar results must have been caused by dissimilar conditions. Differences must be traced to conditions producing them.
  (2) There is a probability distribution; find out probability distribution.
  (3) Dissimilar results may come from similar conditions due to mutually amplifying network. Network analysis instead of tracing of the difference back to initial conditions in such cases.
Assessment:
  (1) 'Impact' analysis
  (2) What does it do to me?
  (3) Look for feedback loops for self-cancellation or self-reinforcement.
Analysis:
  (1) Pre-set categories used for all situations
  (2) Limited categories for own use
  (3) Changeable categories depending on situation.
Community people viewed as:
  (1) Ignorant, poorly informed, lacking expertise, limited in scope
  (2) Egocentric
  (3) Most direct source of information, articulate in their own view, essential in determining relevance
Planning:
  (1) By 'experts'; either keep community people uninformed, or inform them in such a way that they will agree
  (2) Laissez-faire
  (3) Generated by community people.

Three 'Pure' Paradigms after Maruyama (1974)


5.2 Biological Metaphors

5.2.1 Reason's Pathogens

James Reason's (1993) resident pathogen model of how things go wrong is described in the figure below. The idea is that latent failures in technical systems are analogous to resident pathogens in the human body, which combine with local triggering factors, for example, life stresses or toxic chemicals, to overcome the immune system and produce disease. Like cancers and cardiovascular disorders, accidents in defended systems do not arise from single causes. Rather, they occur as a result of the adverse conjunction of several factors, each necessary but none sufficient to breach the defences alone. And, as in the case of the human body, no technical system can ever be entirely free of pathogens.
Figure: Reason's Resident Pathogen Metaphor Model. Stages leading to the accident, with the type of failure at each stage:

Fallible decisions: latent failures (high level decision makers)
Line management deficiencies: latent failures (line management)
Preconditions for unsafe acts: latent failures (preconditions)
Unsafe acts: active failures (productive activities)
Failed or absent defences: active and latent failures (defences)
Accident

Such a model leads to a number of views about accident causation:

a) Accident likelihood is a function of the number of pathogens within the system.
b) The more complex and opaque the system, the more pathogens it will contain.
c) Simpler, less well-defended systems need fewer pathogens to bring about an accident.
d) The higher a person's position within the decision making structure of a system, the greater the opportunity to spawn pathogens.
e) Local triggers are hard to anticipate.
f) Resident pathogens can be identified pro-actively.
g) Neutralising pathogens (latent failures) is likely to have more and wider ranging safety benefits than measures directed at minimising active failures.
h) The establishment of diagnostic organisational signs will give general indications of the health of the high-hazard technical system.


5.2.2 Kauffman's Complexity

Kauffman's view (1995) is interesting in terms of organizational behavior. We may have our intentions, but we remain blind watchmakers. We are all, cells and CEOs, rather blindly climbing deforming fitness landscapes. If so, then the problem confronted by an organization - cellular, organismic, business, governmental or otherwise - living in niches created by other organizations, is preeminently how to evolve on its deforming landscape, to track the moving peaks. Tracking peaks on deforming landscapes is central to survival. Landscapes in short are part of the search for excellence - the best compromises we can attain.

5.2.3 Dawkins' NeoDarwinism

In the context of modelling complex technological systems, Richard Dawkins' computer based artificial selection (his Biomorphs) provides some fertile parallels for risk and reliability engineers (Dawkins 1986 and 1998). In practice this boils down to modelling a complex system in a virtual reality environment and playing endless 'what if' scenarios. An example was discussed in Section 2.7. Other examples, which you may be more familiar with, are flight simulators for aircraft pilot training, road traffic modelling for designing road traffic control, and simulation modelling of nuclear explosions. The last of these has been influential in convincing governments to sign nuclear test ban treaties. Obviously, these require fearsome computer power, an extensive interpretation of nature, and a belief that hyper-reality can come close to reality.

5.3 Discrete State Concepts

State models are based on the notion that any system with different ways and combinations of achieving similar outcomes can be described by a number of distinct, mutually exclusive, independent states. That is, failure or change in one state or condition is independent of the other. Once established, any of the defined operating states can be attained. The sequence by which each state is reached may not be important per se. What matters is the time the system spends in each state and the likelihood of transition between states. Block diagrams and other graphical methods can be used to illustrate the system. The one immediately below depicts a redundant accounting system to ensure that correct accounts are kept.

Figure: A Redundant Accounting System (blocks: Accounting System, Auditing System)

There are three possible states:

State (0) Both systems operating
State (1) One system operating
State (2) Both systems failed


The likelihood of failure of the second system once the first system has failed may well be different to the likelihood of failure when both systems are operating. Depending on which system has failed, the restoration rate of the other may also be different. These three states and transitions can be represented in different ways, such as in the following figure:

Figure: State Transition Diagram (transitions between the states: 1st system fails, 2nd system fails, 1st system restored)

Markov chain analysis is the most common form of state analysis technique. It assumes constant failure and restoration rates; other probability distributions can be used, with more difficulty. The conceptual problem with the technique is defining all the possible system states. It can get very complex very rapidly. Ignoring partial states can render the analysis difficult.

5.4 Time Sequence
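Before moving on, the three-state redundant system of Section 5.3 can be sketched numerically. The following is a minimal illustration only, assuming constant transition rates and assuming that restoration is possible from both degraded states. The function name, the parameter names and the rate values are hypothetical and do not come from the text.

```python
def steady_state(lam01, lam12, mu10, mu21):
    """Long-run fraction of time spent in states 0, 1 and 2.

    States: 0 = both systems operating, 1 = one operating, 2 = both failed.
    lam01, lam12: failure rates (0 -> 1 and 1 -> 2), assumed constant.
    mu10, mu21:   restoration rates (1 -> 0 and 2 -> 1), assumed constant.
    """
    # In the long run the flow between neighbouring states balances:
    # p0 * lam01 = p1 * mu10 and p1 * lam12 = p2 * mu21.
    r1 = lam01 / mu10            # p1 relative to p0
    r2 = r1 * lam12 / mu21       # p2 relative to p0
    total = 1.0 + r1 + r2        # the three probabilities must sum to one
    return (1.0 / total, r1 / total, r2 / total)


# Hypothetical rates per year. The failure rate once one system is already
# down (lam12) is deliberately different from the rate when both are up.
p0, p1, p2 = steady_state(lam01=0.2, lam12=0.5, mu10=12.0, mu21=12.0)
print(f"both operating: {p0:.6f}")
print(f"one operating:  {p1:.6f}")
print(f"both failed:    {p2:.6f}")
```

Allowing lam12 to differ from lam01 reflects the point made in Section 5.3 that the likelihood of the second failure may change once the first system has failed.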

One of the central concepts of causation is conjunction in time and space. The courts reflect this in the form of a chronology of events leading up to the "crime". For lawyers, the time sequence is defined by a list of events described in words down a page. For engineers it is usually an arrow of time going from left to right across a page. This general idea can easily be extended to most events, such as fire in a building as shown below.
Figure: Time Sequence Model of Fire (stages along the time axis: Ignition, Smoke, Flame, Flashover, Escalation, Burnout)

Having developed such a simple model it can be extended. The time sequence below for fire in a building was developed to satisfy underwriting concepts. It has the same two parts of the risk equation, how likely the fire is to develop (the inception risk) and how severe the consequences (the propagation risk) are likely to be.
Figure: Generalised Time Sequence Model for Fire. Stages along the time axis, with examples of the factors acting at each: Supporting Conditions (housekeeping, dust control, storage arrangements, construction, etc); Ignition (smoking, wiring, welding, static sparks, etc); Fire Development (rate of fire growth, combustible loading, spill systems, etc); Fire reaches significant, detectable size; Smoke Detection (alert staff, smoke detectors), after which evacuation and brigade response can commence; Thermal Detection (sprinklers and/or foam); Passive fire control (firewalls, space separation); Burnout. The sequence is divided into Inception Risk and Propagation Risk, and the loss measures Smoke Loss Expectancy, Thermal Loss Expectancy and Maximum Foreseeable Loss are also marked.


Note that the fire development or growth rate is not linear once flaming has commenced, as shown in the following figure:

Figure: Representative Fire Curve (labels: Smoke, Flame, Very Rapid Fire Growth; horizontal axis: Time)

Different analysts have developed different time sequence models for different problems at different times. Heinrich's domino model of causation (derived in the 1940s) has a particular focus (Heinrich, 1959), shown below.
Figure: Heinrich's Domino Model. Five dominoes in sequence: 1. Ancestry and Social Environment; 2. Fault of Person; 3. Unsafe act or unsafe mechanical or physical condition; 4. Accident; 5. Injury. Removal of the middle domino breaks the chain.

Such a model suggests that accidents are ultimately derived from an individual's ancestry and social environment. That is:

1. People are born with and/or are socialised to develop faulty personal characteristics such as recklessness, stubbornness, avariciousness and the like.
2. Inherited or acquired faults of a person including recklessness, violent temper, nervousness, excitability and inconsiderateness constitute proximate reasons for committing unsafe acts or permitting the existence of physical hazards.
3. Unsafe acts or performance will occur.
4. Accidents will occur (falls of persons, being hit etc).
5. Then injuries will result.

The point of his model is that if one domino in the chain can be removed (domino 3) then the chain will be broken. This supports the modern thrust in legislation and views such as those expressed by Kletz (1985) in his text, An Engineer's View of Human Error. Kletz notes that saying that accidents are the result of human failings may or may not be true, but it is certainly not helpful in risk control terms.


Rowe's Risk Estimation Model (see figure below) is directed at hazards with multiple pathways to damage situations (Rowe, 1977). This is particularly appropriate for some large chemical incidents and nuclear reactors, for example, where direct radiation, radioactive dust fallout and entrainment in the food chain can all provide cumulative doses to the exposed group. A risk agent is a person or group of persons who evaluate directly the consequences of a risk to which they are the subject. The arrows indicate that in principle multiple pathways can lead from one element to the other. Each pathway can have a probability associated with its occurrence.
Figure: Rowe's Risk Estimation Model. Elements in sequence, each linked to the next by one or more pathways:

Causative event: The causative event is the beginning in time of an activity.
Outcome(s): The final result of an activity initiated by a causative event.
Exposure(s): The condition of being vulnerable in some degree to a particular outcome of an activity, if that outcome [...]
Consequence types: The impact to a risk agent of exposure to a risky event.
Consequence values: The importance a risk agent subjectively attaches to the undesirability of a specific risk consequence.

Ishikawa (1985) 'Fishbone' diagrams, shown below, are another form of time sequence model often used by quality control advisers.
Figure: Ishikawa 'Fishbone' Diagram (cause factors on the main branches: Material, Machine, Measurement, Milieu, Man, Method, together making up the Process; effect at the right hand end: Quality characteristics)

Ishikawa also refers to them as cause and effect diagrams (Ishikawa, 1985). The effect is found at the right hand end. The words appearing at the tips of the main branches are causes or so called cause factors. The collection of these cause factors is a process. The minor branches are inputs to the cause factors or sub-causes. The object of the exercise is to improve the quality characteristics by identifying the most important cause factors and adjusting them appropriately.


5.5 Energy Damage

From an engineering perspective, it is observed that injury, damage and ill health are the result of the loss of control of damaging energies. Such a concept has a number of useful consequences. Firstly, establishing where energy can be released to affect people (conjunction in time and space) provides a simple basis for determining vulnerabilities in a complex system. Secondly, the nature of the energy release provides insight into control options. For example, kinetic energy is proportional to the square of the speed of a vehicle. Going twice as fast means that 4 times the energy would be released on impact. A list of damaging energies is shown below.

| Energy | Examples |
|---|---|
| External Energies | |
| Potential Energies | gravitational; structural strain; compressed fluids |
| Kinetic Energy | linear and rotational motion |
| 'Flowing' Mechanical Energy | mechanical power in machinery |
| Acoustic and other Vibrating Energy | noise and mechanical vibration |
| Electrical Energy | electrical potential energy (volts); electric and magnetic radiation; electrostatic charge |
| Ionising Radiation | nuclear particles and radiation |
| Thermal Radiation | solids, fluids, flames; ambient condition |
| Chemical Energy | fire, explosion; toxic effects; corrosive effects |
| Micro-biological | infections, parasites, bacteria, virus, etc |
| Muscular Energy | purposeful (attacks) and inadvertent |
| Internal Energies | |
| Whole or Part-Body Mass Energy | gravitational, potential and/or kinetic energy, for example: walking/running or swinging/moving the limbs |
| Muscular Energy | overload, overuse and postural energy levels |

Damaging Energies

5.6 Energy Damage Models

Energy damage concepts define a hazard as the source of energy. So a brick on the floor is not a hazard in itself; rather, the hazard is the potential energy of the person who trips. This sometimes seems trivial but in one expert witness case, for example, the authors considered the situation of an electrician who received an electric shock whilst on top of a ladder. He subsequently fell and hit his head on the concrete floor, resulting in serious injury. The hazard in this case was the gravitational potential energy released during the fall, which could have been controlled by wearing a hard hat. The electric shock represents only a possible reason for the fall and not the primary hazard source.

Energy damage models are particularly effective in establishing control options. Haddon (1973), for example, suggests 10 generic counter strategies:

i) Prevent marshalling of energy (don't climb to a height)
ii) Reduce energy marshalled (reduce speed)
iii) Separate in time and space (install road traffic signals)
iv) Prevent the release of energy (fit guard rails)
v) Separate by a barrier (install guards)
vi) Modify release rate of energy (reduce slope)
vii) Strengthen structure (fire proof buildings)
viii) Modify surface impact (remove sharp edges)
ix) Detect, counteract damage (fire sprinklers)
x) Optimise repair (rehabilitation)

These 10 strategies provide a hierarchy of control and an opportunity to recognise additional essential factors as shown below:

Figure: Strategy for Management of Energy Exchanges. Haddon's ten strategies are set out against successive time zones: Predisposing Conditions, Situation Normal, Moving out of control, Out of Control, Damage, Repair.

The energy damage concept can be represented in different ways. The figure below of the extended energy damage model (Viner, 1991) shows possible hazard control mechanisms in terms of recipient effects.
Figure: Extended Energy Damage Model (elements: hazard, hazard control mechanism, space transfer mechanism, recipient's boundary, recipient)

The types of risk control measures which are evident from this model are:

i) Control the existence or amount of energy.
ii) Maintain the reliability of the hazard control mechanism.
iii) Remove or reduce the need for the space transfer mechanism.
iv) Raise the damage threshold of the recipient.
v) Separate the hazard and the recipient.

This can perhaps be best explained by considering someone exposed to a noisy machine. The machine could be replaced by a less noisy device; the noise could be reduced at its source by acoustic dampening on the machine; the machine and the recipient could be separated by the installation of an acoustic hood over the machine; or the recipient's damage threshold could be artificially raised by the provision of hearing protection.


Energy damage concepts are particularly useful for constructing cause-consequence models, as they assist in determining the loss of control point. Whilst in many circumstances we may consider that an incident occurs when there is a loss, in complex systems at least the loss of control point is actually the incident. Unless such loss of control incidents are recorded and investigated the system is heading for a fall.
Figure: Concept Cause-Consequence Diagram (elements: Threat, Precaution Failure, Loss of Control, Incidents, Near Miss, Loss)

It is always better to control a hazard before the loss of control point than to respond during or after the event. Lawyers are far more inclined to sign off on a management strategy that prevents the dangerous situation than on one that relies on a rapid response. Consider the hazard of fire in a building. In this case, it would be best to eliminate the source of energy, that is, the vulnerability or hazard, by using non-combustible materials. The next best alternative would be to control the hazard, for example, by installing automatic sprinklers. The least desirable (although sometimes necessary) option is to rely on human response, which occurs after the outbreak of fire, that is, after the loss of control of the latent chemical energy stored in the structure.

5.7 Conditions and Failures

A latent failure (a failure which is not detected and/or annunciated when it occurs) will disable protective mechanisms or reduce safety margins thereby increasing the risk associated with hazards due to subsequent conditions or failures. Latent failures, by themselves, do not constitute hazards (that is, by themselves they have no effect which would make them noticeable, otherwise they would not be latent, by definition). Usually latent failures affect only functions which are not relied upon in normal operation, but which provide fail-safe coverage and/or protection against abnormal conditions. (SAE ARP 4761, Appendix D)

The notion of latent conditions has re-emerged in discussions of causation recently, largely as a result of James Reason's (1997) promotion of the concept. J L Mackie (1965) outlines a situation which can be used to explore it. Suppose that a fire has started in a house and is extinguished before it consumes the house completely. Fire investigators will investigate the cause and may conclude that it started in some wiring due to a short circuit. However, this is not a simple concept.

5.7.1 Necessary Conditions

A necessary condition is a positive condition that must be present for the incident to occur. In the example of a house fire, necessary conditions include combustible materials and an ignition source. From this definition a short circuit is not a necessary condition for a house fire, as hot oil fires on stoves and children playing with matches are other well-known domestic fire sources.

5.7.2 Sufficient Conditions

For an incident to occur there must also be sufficient conditions. For example, there has to be sufficient nearby combustibles in an appropriate configuration with an adequate supply of air (oxygen) to cause a fire.
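The distinction between necessary and sufficient conditions can be checked mechanically on a toy model of the house fire. The sketch below is illustrative only: the rule inside `fire`, the factor names and the helper functions are assumptions made for the example, not a model taken from Mackie or from fire engineering practice.

```python
from itertools import product

FACTORS = ("combustibles", "air", "short_circuit", "hot_oil", "matches")


def fire(combustibles, air, short_circuit, hot_oil, matches):
    # Toy rule (an assumption): any one ignition source will do,
    # but combustibles and air must both be present.
    ignition = short_circuit or hot_oil or matches
    return combustibles and air and ignition


def all_situations():
    # Every combination of factors being absent (False) or present (True).
    return product([False, True], repeat=len(FACTORS))


def is_necessary(name):
    """True if no fire ever occurs while this factor is absent."""
    i = FACTORS.index(name)
    return not any(fire(*s) for s in all_situations() if not s[i])


def is_sufficient(name):
    """True if fire occurs whenever this factor is present, whatever else holds."""
    i = FACTORS.index(name)
    return all(fire(*s) for s in all_situations() if s[i])


for name in FACTORS:
    print(name, "necessary:", is_necessary(name), "sufficient:", is_sufficient(name))
```

In this toy model combustibles and air come out as necessary, while a short circuit is neither necessary (other ignition sources exist) nor sufficient on its own, which mirrors the argument in the text.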


5.7.3 Negative Conditions

Negative conditions are conditions whose absence allows the fire to occur. For example:

* the absence of a correctly sized fuse (which would have prevented the short circuit in the first place), or
* the failure to enclose the cable in metal pipe to shield it from combustibles, or
* the absence of a nearby automatic sprinkler which would have minimised the fire, or
* the absence of a micro-meteorite that would have crashed through the area just as the fire was about to start.

Obviously, negative conditions are problematic because they can include a vast array of unpredictable 'what if' possibilities.

5.7.4 Controllable Conditions

What the fire investigators may be attempting to do is to describe those conditions that they believe should have been considered controllable. This is in some ways problematic since establishing all the relevant conditions can be a very difficult task, especially if it is deemed to include all aspects of human behaviour in the context of underlying cultural, social and economic circumstances. To establish what might be practicable, some form of probability test seems to be applied. The legal tests of causation appear to be relevant. If a negative condition were removed, would it have:

* controlled the situation beyond reasonable doubt? or,
* controlled the situation on the balance of probabilities?

5.7.5 Latent Conditions

The notion of latent conditions seems to rest on some form of failure that is not apparent when it occurs, similar to a hidden or concealed failure in FMECA (Failure Modes, Effects and Criticality Analysis). So, like a software error, a latent condition waits until a particular pattern of circumstances arises, enabling a catastrophe. In this sense, latent conditions would be controllable, possibly negative, and necessary but not sufficient.


REFERENCES

Dawkins Richard (1998). Climbing Mount Improbable. Penguin Books. His earlier works, The Blind Watchmaker (1986, Penguin Books) and The Selfish Gene (1976, Oxford University Press) are also worth reading.

Haddon W (1973). Energy Damage and the Ten Countermeasure Strategies. Journal of Trauma, Volume 13, Number 4, pages 321-331.

Heinrich H W (1959). Industrial Accident Prevention. 4th Ed. New York, McGraw Hill Books.

Ishikawa Kaoru (1985). What is Total Quality Control? Prentice-Hall. Translated by David J Lu.

Kauffman Stuart (1995). At Home in the Universe. The Search for Laws of Self Organisation and Complexity. Penguin Books Edition 1996. (Quote is from page 247)

Kletz T A (1985). An Engineer's View of Human Error. IChemE, London.

Kuhn T S (1970). The Structure of Scientific Revolutions, 2nd Edition, enlarged, sixth impression. University of Chicago Press.

Maruyama M (1974). Paradigmatology and its Application to Cross-Disciplinary Cross-Professional and Cross-Cultural Communications. Cybernetica, No.2, pp. 136-156.

Mackie J L (1965). Causes and Conditions. American Philosophical Quarterly, 2.4 (October 1965), pp 245-64 and 261-4. Reprinted as Chapter I of Causation and Conditionals edited by Ernest Sosa. Oxford Readings in Philosophy. Oxford University Press (1975). pp 15-38.

Popper K R (1972). Objective Knowledge: An Evolutionary Approach. Clarendon Press, Oxford. Revised Edition 1979. Paraphrase is from Chapter 2.

Reason J (1993). Managing the Management Risk: New Approaches to Organisation Safety. Chapter 1 of Reliability and Safety in Hazardous Work Systems: Approaches to Analysis and Design. Eds I Wilpert et al. Lawrence Erlbaum Associates Ltd, East Sussex. ISBN 0-86377-309-5.

Reason J (1997). Managing the Risks of Organisational Accidents. Ashgate Publishing Limited.

Rowe W D (1977). An Anatomy of Risk. Wiley Interscience, New York.

SAE ARP 4761:1996. Guidelines and Methods for Conducting the Safety Assessment Process on Civil Airborne Systems and Equipment. Society of Automotive Engineers, Aerospace Recommended Practice.

Viner D B L (1991). Accident Analysis and Risk Control. VRJ Information Systems, Melbourne. ISBN 0 646 02009 9


6. Risk Criteria

Risk criteria are used as a decision-making yardstick by governmental agencies, business and occasionally individuals to determine whether a risk is acceptable, tolerable or unacceptable.

6.1 Legal Criteria

A robust form of measurement can be devised around legal criteria as discussed in Chapter 4. For example, the number of actions taken by various environmental and occupational health and safety enforcement agencies against an organisation or perhaps the number of days directors might spend in jail might be considered.
| Row | Hazards (H): Likelihood | Severity | Risk | Incidents and occurrences (I): Likelihood | Severity | Risk | Claims (C): Likelihood | Severity | Risk | Judicial proceedings (J): Likelihood | Severity | Risk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.1 | 2 | 0.2 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 2 | 0 |
| 2 | 0.2 | 3 | 0.6 | 1 | 3 | 3 | 1 | 3 | 3 | 0 | 3 | 0 |
| 3 | 0.05 | 50 | 2.5 | 0 | 50 | 0 | 0 | 50 | 0 | 0 | 50 | 0 |
| 4 | 0.3 | 2 | 0.6 | 2 | 2 | 4 | 1 | 2 | 2 | 1 | 2 | 2 |
| 5 | 0.65 | 13 | 8.45 | 0 | 13 | 0 | 0 | 13 | 0 | 0 | 13 | 0 |
| 6 | 0.025 | 260 | 6.5 | 0 | 260 | 0 | 0 | 260 | 0 | 0 | 260 | 0 |
| 7 | 0.001 | 1500 | 1.5 | 0 | 1500 | 0 | 0 | 1500 | 0 | 0 | 1500 | 0 |
| 8 | 0.45 | 0.5 | 0.23 | 0 | 0.5 | 0 | 0 | 0.5 | 0 | 0 | 0.5 | 0 |
| 9 | 0.01 | 6 | 0.06 | 0 | 6 | 0 | 0 | 6 | 0 | 0 | 6 | 0 |
| 10 | 0.5 | 60 | 30 | 1 | 45 | 45 | 1 | 45 | 45 | 1 | 45 | 45 |
| 11 | 0.005 | 100 | 0.5 | 0 | 100 | 0 | 0 | 100 | 0 | 0 | 100 | 0 |
| : | | | | | | | | | | | | |
| i / j | 0.003 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Sum | | | 51.1 | | | 52 | | | 50 | | | 47 |

Event Horizon: <<< Pre-Event Control / Post-Event Management >>>
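The Risk entries in the hazards block of the register are consistent with likelihood multiplied by severity, summed down the column. The sketch below reproduces that arithmetic for the hazards block. The multiplication rule is inferred from the printed numbers and is not stated in the text, and the variable names are illustrative.

```python
# (label, likelihood, severity) for the hazards block of the register.
# The final row is the one labelled "Hi" in the source.
hazards = [
    ("H1", 0.1, 2), ("H2", 0.2, 3), ("H3", 0.05, 50), ("H4", 0.3, 2),
    ("H5", 0.65, 13), ("H6", 0.025, 260), ("H7", 0.001, 1500),
    ("H8", 0.45, 0.5), ("H9", 0.01, 6), ("H10", 0.5, 60),
    ("H11", 0.005, 100), ("Hi", 0.003, 1),
]

# Assumed rule: risk = likelihood x severity for each hazard.
risks = {label: likelihood * severity for label, likelihood, severity in hazards}
total_risk = sum(risks.values())

# List the hazards from largest to smallest contribution.
for label, risk in sorted(risks.items(), key=lambda item: item[1], reverse=True):
    print(f"{label:>4}  {risk:8.3f}")
print(f"total {total_risk:8.1f}")
```

Sorting by contribution shows how a single entry (H10 here) can dominate the total, which is one practical use of such a register.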

Concept Hazard Register

The above table suggests why such an approach could be considered. Over any period of time, most hazards will not result in incidents and of the incidents that do occur only a few will give rise to claims. Most of the costs will manifest in those claims that make it to court. This ought to be a small subset of the set of all hazards and incidents. However, there are obviously other dimensions to managing risk like this. Unless one is clairvoyant it is not possible to know which hazards definitely will lead to court cases and which ones will not. So only if a company was both naive and immoral would it attempt to manage risk by trying to identify and manage only those hazards which it thought might lead to incidents that could end up giving rise to prosecution or a common law claim.

6.2 Individual Risk Criteria

If a single severity of outcome is being considered then very often probability criteria can be used as the basis to benchmark risk. Many countries in the world maintain databases on the causes of death of their citizens. These can be analysed. A typical result is shown below. These tables are basically a statement of what a particular community seems to have historically accepted as reasonable. That is, what we as a society are willing to live with. Nuclear authorities usually undertake such studies. They are very interested in where nuclear risk is perceived to lie. The NSW figures were prepared by ANSTO (Australian Nuclear Science and Technology Organisation).


From such lists various authorities suggest acceptable frequencies of death for individuals in critical exposed groups. These numbers are in chances per million per year. That is, the chance, on average, of being struck and killed by lightning in NSW is one in ten million per year or alternatively, for an individual, once in every ten million years.

| Risk | Chances of fatality per million person years |
|---|---|
| Voluntary Risks (average to those who take the risk) | |
| Smoking (20 cigarettes/day): all effects | 5000 |
| Smoking (20 cigarettes/day): all cancers | 2000 |
| Smoking (20 cigarettes/day): lung cancers | 1000 |
| Drinking alcohol (average for all drinkers): all effects | 380 |
| Drinking alcohol (average for all drinkers): alcoholism and alcoholic cirrhosis | 115 |
| Swimming | 50 |
| Playing rugby football | 30 |
| Owning firearms | 30 |
| Transportation Risks (average to travellers) | |
| Travelling by motor vehicle | 145 |
| Travelling by train | 30 |
| Travelling by aeroplane: accidents | 10 |
| Risks averaged over the whole population | |
| Cancers from all causes: total | 1800 |
| Cancers from all causes: lung | 380 |
| Air pollution from burning coal to generate electricity | 0.07-300 |
| Being at home: accidents at home | 110 |
| Accident falls | 60 |
| Pedestrians being struck by motor vehicles | 35 |
| Homicide | 20 |
| Accidental poisoning: total | 18 |
| Accidental poisoning: venomous animals and plants | 0.1 |
| Fires and accidental burns | 10 |
| Electrocution (non-industrial) | 3 |
| Falling objects | 3 |
| Therapeutic use of drugs | 2 |
| Cataclysmic storms and storm floods | 0.2 |
| Lightning strikes | 0.1 |
| Meteorite strikes | 0.001 |

Risks to Individuals in New South Wales (from NSW Department of Planning, 1990)
Source: Edited from D J Higson, Risks to Individuals in NSW and Australia as a Whole, Australian Nuclear Science and Technology Organisation, July 1989
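The unit used in the table, chances per million person years, converts directly to an annual probability or to a "one in N per year" figure. A minimal sketch of the conversion follows; the function names are illustrative only, and the input values are taken from the table.

```python
def annual_probability(chances_per_million):
    """Convert chances of fatality per million person years to a yearly probability."""
    return chances_per_million / 1_000_000


def one_in(chances_per_million):
    """Express the same figure as 'one chance in N per year'."""
    return 1_000_000 / chances_per_million


# Values taken from the table of risks to individuals in NSW.
for activity, rate in [("Lightning strikes", 0.1),
                       ("Travelling by motor vehicle", 145),
                       ("Smoking, all effects", 5000)]:
    print(f"{activity}: {annual_probability(rate):.1e} per year, "
          f"about one in {round(one_in(rate)):,}")
```

For lightning this gives one chance in ten million per year, matching the figure quoted in the text.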


Such data can also be represented in a triangle type diagram, sometimes referred to as "the dagger diagram". The two key levels seem to lie around road death statistics and the chances of being struck by lightning. In simple terms, it seems that if we believe something is more dangerous than driving a car (about one chance in 10,000 per year) then the risk is unacceptable, but that if it is about as likely as being struck by lightning (about one chance in 10 million per year), then it is probably so low that we don't expect anyone to do anything about it. In the range between these two figures, cost benefit studies to reduce the risk to as low as reasonably practicable are appropriate.

Figure: Risk Levels for Individuals in a Critically Exposed Group

| Risk category | Level of risk acceptability |
|---|---|
| I | Intolerable; risk cannot be justified except in extraordinary circumstances |
| II | Undesirable; tolerable only if reduction is impractical or if cost is grossly disproportionate to the improvement gained |
| III | Tolerable if the cost of reduction would exceed the improvement gained |
| IV | Broadly acceptable; negligible risk |
| | Acceptable; trivial risk |

Typical quantification values marked on the diagram: 10^-4 per year (car accident death rate); 10^-5 per year (limit for WA EPA); 10^-6 per year (objective for NSW DoP); 10^-7 per year (lightning strike death rate; objective for Vic VWA).

Diagram (without quantification) appears in IEC 61508 as figure B1
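The two key levels described in the text, roughly the road death rate and the lightning death rate, can be turned into a simple screening rule. The sketch below is an illustration under stated assumptions: the thresholds are the approximate figures quoted in the text, while the treatment of values exactly on a boundary and the label strings are choices made for the example.

```python
INTOLERABLE_ABOVE = 1e-4   # about one chance in 10,000 per year (road death level)
NEGLIGIBLE_BELOW = 1e-7    # about one chance in 10 million per year (lightning level)


def screen_individual_risk(risk_per_year):
    """Place an individual fatality risk in one of three broad bands."""
    if risk_per_year > INTOLERABLE_ABOVE:
        return "unacceptable"
    if risk_per_year <= NEGLIGIBLE_BELOW:
        return "negligible"
    # Between the two levels, risk should be reduced as low as reasonably practicable.
    return "ALARP region"


# Inputs are chances per million person years, converted to yearly probabilities.
for label, per_million in [("Smoking, all effects", 5000),
                           ("Electrocution (non-industrial)", 3),
                           ("Meteorite strikes", 0.001)]:
    print(label, "->", screen_individual_risk(per_million / 1_000_000))
```

A rule of this kind only screens. In the intermediate band the text calls for cost benefit studies, which a threshold comparison cannot replace.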

Many organisations are now emphasising the risk criteria of tolerance rather than acceptance. To tolerate a risk means that it is not regarded as negligible, or as something that can be ignored. Rather, it must be kept under review and reduced still further to the negligible level if and when this becomes practical. The key element is the process by which it is demonstrated that all practicable measures have been taken to reduce risk levels to a minimum. The Victorian WorkCover Authority, the NSW Department of Planning and the Western Australian Environmental Protection Authority (EPA) have defined individual risk levels. Other Australian States tend to utilise one or other of these criteria when assessing individual and/or societal risk. A summary of criteria used in Australia and New Zealand is given in Chapter 13, Process Industry.


For example, the NSW Department of Planning has published an advisory paper "Risk Criteria for Land Use Safety Planning" (June 1992) that outlines the criteria by which the acceptability of risks associated with potentially hazardous developments will be assessed. The table below summarises the criteria for the individual fatality risk for new installations.
| Risk level (per annum) | Land use |
|---|---|
| 0.5 x 10^-6 | Hospitals, schools, child care facilities, old age housing |
| 1.0 x 10^-6 | Residential, hotels, motels, tourist resorts |
| 5 x 10^-6 | Commercial developments including retail centres, offices and entertainment centres |
| 10 x 10^-6 | Sporting complexes and active open spaces |
| 50 x 10^-6 | Industrial |

Individual Fatality Risk-New Installations

6.3 Societal Risk Criteria

As the severity of the event increases, we appear to become more risk averse. In particular, once the death threshold is passed, it appears the community has a much greater aversion to multiple fatality incidents. Authors such as Wiggins (1984) in the USA have noted that the dollars Congress spends per life saved for a coalmine disaster or aircraft collision are much higher than the dollars spent to save a life on the road. In many countries this seems to amount to a one hundred-fold decrease in the likelihood of the event for a ten-fold increase in the severity of the consequence measured in fatalities. This is shown in the Netherlands criteria below. Societal risk analysis combines the consequence and likelihood information with population information. This is presented as an F-N plot, which indicates the cumulative frequency (F) of events killing N or more people.
[Figure: log-log F-N plot. Vertical axis: frequency of N or more fatalities per year, 10^-3 to 10^-8. Horizontal axis: number of fatalities (N), 1 to 1000. The Netherlands unacceptable limit and the Netherlands acceptable limit bound an ALARP (as low as reasonably practicable) region.]

Societal Risk Criteria as reported by the NSW Department of Planning (1990)
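The construction of an F-N curve, and the "one hundred-fold decrease in likelihood for a ten-fold increase in fatalities" relationship, can be sketched in a few lines. This is a minimal illustration only: the scenario list, the function names and the anchor value of 10^-3 per year at N = 1 are assumptions made for the example, not values taken from any published criteria.

```python
from typing import List, Tuple


def fn_curve(scenarios: List[Tuple[float, int]]) -> List[Tuple[int, float]]:
    """Return (N, F) pairs, where F is the summed frequency per year of
    all scenarios that cause N or more fatalities."""
    levels = sorted({n for _, n in scenarios})
    return [(n, sum(f for f, k in scenarios if k >= n)) for n in levels]


def criterion_limit(n: int, f_at_one: float, slope: float = -2.0) -> float:
    """Straight limit line on a log-log plot: F = f_at_one * N**slope.
    A slope of -2 gives a 100-fold drop in F for a 10-fold rise in N."""
    return f_at_one * n ** slope


# Hypothetical scenarios: (frequency per year, number of fatalities)
scenarios = [(1e-4, 1), (1e-5, 10), (5e-8, 100)]
curve = fn_curve(scenarios)

for n, f in curve:
    limit = criterion_limit(n, f_at_one=1e-3)  # assumed anchor value
    status = "within limit" if f <= limit else "exceeds limit"
    print(f"N >= {n}: F = {f:.3e} per year, limit = {limit:.1e} ({status})")
```

Because F is cumulative, each point on the curve includes the frequency of every more severe scenario, which is why the curve can only fall or stay level as N increases.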

There also appear to be occasions dealing with very severe events where the consequence of the outcome is deemed to be so high that it is just politically unacceptable. Societal risk criteria have been proposed by a number of authorities including the Victorian WorkCover Authority and the NSW Department of Planning. Again these are described in more detail in Chapter 13.


For example, societal risk criteria for public safety relating to hazardous industries have not been formally established and publicised in Victoria. There is currently a set of draft criteria issued by the Victorian WorkCover Authority (VWA), which is used by Government Authorities involved in Land Use Planning. These criteria were used as part of the Technica Ltd Risk Sensitivity Analysis for the Altona Petrochemical Complex and Environs, October 1997. The document establishes criteria for societal risk in the form of a log-log F-N plot that results in two parallel lines defining three zones:
a) above the acceptable limit, the societal risk level is not tolerable
b) between the acceptable and negligible limits, the societal risk level is acceptable but, if the perceived benefits gained by the activity are not high enough, some risk reducing measures may be required. Risk should be "as low as reasonably practicable" (ALARP).
c) below the negligible limit, the societal risk level is acceptable, regardless of the perceived value of the activity.
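The three zones can be expressed as a simple test of a point (N, F) against two parallel limit lines. The sketch below is illustrative only: the function name, the anchor frequencies at N = 1 and the slope of -2 are assumptions chosen for the example and are not the draft VWA values.

```python
def classify_societal_risk(n: int, f: float,
                           acceptable_at_one: float,
                           negligible_at_one: float,
                           slope: float = -2.0) -> str:
    """Place the point (N, F) in one of three zones bounded by two
    parallel straight lines on a log-log F-N plot."""
    acceptable_limit = acceptable_at_one * n ** slope
    negligible_limit = negligible_at_one * n ** slope
    if f > acceptable_limit:
        return "not tolerable"
    if f > negligible_limit:
        return "acceptable, ALARP measures may be required"
    return "acceptable (negligible)"


# Hypothetical anchor values at N = 1 (assumed for illustration only)
ACCEPTABLE_AT_ONE = 1e-2
NEGLIGIBLE_AT_ONE = 1e-4

for f in (1e-3, 1e-5, 1e-8):
    zone = classify_societal_risk(10, f, ACCEPTABLE_AT_ONE, NEGLIGIBLE_AT_ONE)
    print(f"N = 10, F = {f:.0e} per year: {zone}")
```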
[Figure: log-log F-N plot. Vertical axis: frequency of N or more fatalities per year, 10^-2 to 10^-7. Horizontal axis: number of fatalities (N), 1 to 1000. Zones from top to bottom: Risk Unacceptable; Risk Acceptable but remedial measures desirable; Risk Negligible.]

Victorian Societal Risk Criteria

6.4 Environmental Risk Criteria

Unlike OH&S risk assessment, in which all evaluations have a common denominator, namely human exposure, environmental risk assessment has a much broader and more complex scope with a substantial increase in the number of uncertainty characteristics.

6.4.1 Wright's Criteria

Wright (1993) describes several factors which need to be recognised.
* ecosystems are complex, open and dynamic
* the time-scale to cause measurable impact or recovery from impacts may be longer than human life
* persistent materials which are bio-available, and have the potential to bio-accumulate should be avoided, discharge will cause irreversible net change
* the relative scale of the environmental impact must be considered in all environmental dimensions (spatial, temporal etc)
* the ecosystem has inherent or built-in variability and recoverability
* cause and effect relationships are often difficult to measure
* interdependency exists between different eco-sub-systems
* acceptability of risks to the environmental resources is dependent on human values.


There is also the problem of synergistic effects. This means, for example, that two chemicals which are individually inert in the environment may interact to cause damage. Wright also suggests that it is possible to calculate the likelihood and size of accidental or intermittent releases and then make a judgement on what the consequences of such releases would be. The table of consequences is shown below:
Consequence Type | Description
Catastrophic | Irreversible alteration to one or more eco-systems or several component levels. Effects can be transmitted, can accumulate. Loss of sustainability of most resources. Life cycle of species impaired. No recovery. Area affected 100 km2.
Very Serious | Alteration to one or more eco-systems or component levels, but not irreversible. Effects can be transmitted, can accumulate. Loss of sustainability of selected resources. Recovery in 50 years. Area affected 50 km2.
Serious | Alteration/disturbance of a component of an eco-system. Effects not transmitted, not accumulating or impairment. Loss of resources but sustainability unaffected. Recovery in 10 years.
Moderate | Temporary alteration or disturbance beyond natural viability. Effects confined < 5000 m2, not accumulating. Resources temporarily affected. Recovery < 5 years.
Not detectable | Alteration or disturbance within natural viability. Effects not transmitted, not accumulating. Resources not impaired.

Environmental Consequences

In the context of a risk diagram:

[Figure: risk diagram. Vertical axis: likelihood, frequency per year from 1 to 10^-6. Horizontal axis: consequence (Not Detectable, Moderate, Serious, Very Serious, Catastrophic). Regions: Intolerable Risk Level; "As Low As Reasonably Practicable" (ALARP) Region; Negligible Risk Level. Labels: Accidental and Intermittent Release; Design/Operation Risk Level.]

Risk Levels for Accidental Releases to the Environment


6.4.2 Inter-governmental Agreement on the Environment (Feb 1992)

The 'Precautionary Principle' has been adopted by the Inter-governmental Agreement on the Environment (1992) between the Commonwealth and the States as a cornerstone of Australian environmental policy. The principle expressed in the IGAE is:

Where there are threats of serious or irreversible environmental damage, lack of full scientific certainty should not be used as a reason for postponing measures to prevent environmental degradation. In the application of the precautionary principle, public and private decisions should be guided by:
(i) careful evaluation to avoid, wherever practicable, serious or irreversible damage to the environment; and
(ii) an assessment of the risk-weighted consequences of various options.

This principle apparently had its origins in Germany's democratic socialist movement in the 1930s and gained acceptance through the 1970s and early 1980s as a powerful corporate governance tool, significantly reducing the instances of imprudent business practices and adding strength to the world's rapidly developing securities markets. The significance of an intergovernmental agreement relates to the Australian constitution and the fact that the original six Australian states existed before federation. Unless the constitution specifically provides for powers being exercised by the federal government, the residual powers remain with the states. So in order to obtain a consistent national outcome for matters that lie outside the constitution an intergovernmental agreement must be obtained.

6.5 Insurance Criteria

Depending on the nature of the event, the insurance approach can provide certain benchmarks or criteria.

[Figure: risk diagram. Vertical axis: relative likelihood of consequence. Horizontal axis: relative severity of consequence. Regions shown: Maintenance; OH&S; Workers Compensation; Public Liability; Property Insurance; Fire & Explosion; Re-insurance; Catastrophic; Uninsured.]

Risk Diagram Showing Some Insurance Regimes


There are presently about thirteen different definitions of property loss expectancy used throughout the world, each with subtle variations. The reason for this plethora appears to derive primarily from the history of the organisations using them. Once a company has established an underwriting tradition it is difficult to change the definitions without seriously complicating the individual underwriters' attitudes and that of the re-insurers towards the underwriters. Perhaps not unnaturally, there appears to be an observable trend with loss estimates that the more conservative the underwriter, the more severe the loss estimate criteria will be. This is particularly noticeable with re-insurers' definitions. In the case of workers compensation insurance, almost all Australian jurisdictions have different criteria for claim thresholds.

6.6 Ethical Criteria

The Codes of Ethics of most professional societies contain certain performance criteria, which are supposed to apply to the members. The UK Engineering Council adopted in 1993 the following statement that Engineers Australia (1993) picked up in a more diluted form. The small print on the back of the UK brochure stated:

The Engineering Council expects registrants to adhere to good engineering practice wherever and whenever possible and considers that this code of professional practice will assist registrants in achieving this standard. Registrants should be aware that non-compliance with the provisions of this code might be relevant when considering professional disciplinary matters although adherence to this code will be regarded as demonstrating good practice, which could provide the best protection against such action. While a failure to adhere to the provision of this code by an individual registrant may not necessarily amount to negligence or a breach of an applied contractual term by that registrant, such failure may be evidence of an infringement of the Council's rules of conduct, which could lead to disciplinary proceedings.

The ten-point code on professional practice on risk issues is:
i. professional responsibility: exercise reasonable professional skill and care
ii. law: know about and comply with the law
iii. conduct: act in accordance with codes of conduct
iv. approach: take a systematic approach to risk issues
v. judgement: use professional judgement and experience
vi. communication: communicate within your organisation
vii. management: contribute effectively to corporate risk management
viii. evaluation: assess the risk implications of alternatives
ix. professional development: keep up to date by seeking education and training
x. public awareness: encourage public understanding of risk issues.

The point to note is that the law is second on the list, and that the statement in italics is quite clear that if a registrant fails to adhere to the code, then he or she is on their own.


REFERENCES

Commonwealth of Australia (1992). Intergovernmental Agreement on the Environment.
Engineers Australia (1993). Dealing with Risk. Engineers Australia, Canberra.
The Engineering Council of the United Kingdom (1993). Code of Good Practice for Dealing with Risk.
Higson D J (1989). Risks to Individuals in New South Wales and in Australia as a Whole. Nuclear Safety Bureau, Australian Nuclear Science and Technology Organisation. NSB Report 2/1989.
International Standard on Functional Safety, IEC-61508-5. Functional Safety Systems of electrical/electronic/programmable electronic safety-related systems - Part 5: Examples of methods for the determination of safety integrity levels, July 1998.
NSW Department of Planning (1990 and 1992). Risk Criteria for Land Use Safety Planning. Hazardous Industry Planning Advisory Paper No. 4.
Technica Ltd (1997). Risk Sensitivity Analysis for the Altona Petrochemical Complex and Environs.
Western Australia EPA document: Guidance for the Assessment of Environmental Factors, Risk Assessment and Management: Offsite Individual Risk from Hazardous Industrial plants, No.2 (Interim July 1988).
Wiggins J H (1984). Risk Analysis in Public Policy. Proceedings of Victoria Division, Engineers Australia, Risk Engineering Symposium 1984: Engineering to avoid Business Interruption.
Wright N H (1993). Development of Environmental Risk Assessment (ERA) in Norway. Norske Shell Exploration and Production.

READING

Engineers Australia (1990). Are You at Risk? Engineers Australia, Canberra.
Fernandes-Russell, Delia (1988). Societal Risk Estimates from Historical Data for UK and Worldwide Events. Research Report No. 3. Environmental Risk Assessment Unit, School of Environmental Sciences, University of East Anglia, Norwich, UK.
Health and Safety Commission, UK (1991). Major Hazard Aspects of the Transport of Dangerous Substances. Report and Appendices of the Advisory Committee on Dangerous Substances. London, HMSO.
Health and Safety Executive, UK (1988). The Tolerability of Risk from Nuclear Power Stations. London, HMSO.
Higson D J (1990). Nuclear Safety Assessment Criteria. Nuclear Safety, Volume 31, No. 2, April/June 1990, pp 173-186.
Lees F P (1995). Loss Prevention in the Process Industries. 2nd Edition. Butterworth-Heinemann Ltd, Oxford, UK (3 Volumes).
Muspratt M A & R M Robinson (1991). Ethics and their Environment. Proceedings of the Annual Conference, Hobart. Engineers Australia.
NSW Department of Planning, Sydney (1989). Environmental Risk Impact Assessment Guidelines. Hazardous Industry Planning Advisory Paper No. 3.
NSW Government (1993). Total Asset Management Manual - Risk Management. Public Works Department, November 1993.
Warren Centre for Advanced Engineering (1986). Major Industrial Hazards. The University of Sydney.

7.0 Top Down Techniques

This chapter focuses on the top down view of downside risk or vulnerabilities. Further discussion on the upside risk or value addeds is contained in Section 3.3, Risk and Opportunity. Ranking combinations of upside and downside risk is covered in Section 8.4, Integrated Investment Ranking. Two high level or top down techniques appear common: vulnerability techniques derived from the military intelligence community, and SWOT (Strengths, Weaknesses, Opportunities and Threats) from the commercial sector. Conceptually the two overlap as shown in the augmented diagram below.

7.1 SWOT Assessments

The SWOT analysis interpreted from a risk perspective provides insight into Liabilities as established by Vulnerabilities (the risk of loss), and Rewards identified by Value Adding (the risk of gain).

[Figure: augmented SWOT diagram. Labels: External / Internal Factors; Opportunities; Threats; Value Addeds; Strategy; Vulnerabilities; Strengths; Weaknesses; Organisation.]

Augmented SWOT Process

7.2 Upside and Downside Risk

It should be noted that many risk decisions have simultaneous upside and downside risk elements.

7.2.1 Business Risk

Market risk is an obvious form of business risk with both upside (speculative) and downside (pure) risk implications.

7.2.2 Clinical and Military Risk Decisions

Different clinical procedures can also entail a mix of risk outcomes. Take the crude example of a traumatic leg injury. Amputation will almost certainly save the life of the patient but at a price of reduced mobility. Saving the leg is possible but with an increased risk of gangrene. Which procedure should be adopted? If a downside risk assessment only were considered then the leg would almost certainly be amputated. Military decisions also have this two-sided element. The best immediate course of action (COA) might be very chancy but could reduce the conflict to days rather than years. Is it better to take the chance or to play it safe and prolong the conflict?

7.2.3 Project Risk Decisions

Project risk provides another interesting insight. In this case the upside risk is assumed in the proposal. The risk analysis generally focuses on those issues which will prevent the assumed upside benefits from being achieved. That is, it is a downside risk assessment process from an assumed upside risk position. This is discussed further in Section 7.5, Project Risk Profiling.


7.3 Vulnerability Assessments

The diagram below outlines a generic vulnerability assessment technique that is used very widely to assess and propose appropriate solutions to risks that affect most organisations. This technique is used by military intelligence, strategic planners, public affairs risk analysts and project managers as well as risk engineers. The central concept is to define the assets of the business and all the possible threats to them. The organisation's Critical Success Factors can also be considered to be the organisation's assets. The threats are then systematically matched against the assets to see which asset is vulnerable to each threat. Only the assessed vulnerabilities then have control efforts directed at them. This prevents the misapplication of resources to something that was really only a threat and not a vulnerability.
Assets (Critical Success Factors)
* Public image and confidence
* Capability to perform an organisation's function
* Physical resources and facilities
* Personnel resources
* Customer loyalty

Threats
* Smoke, fire, explosion
* Natural hazards (rain, snow, wind, earthquake etc.)
* Critical plant failure
* Failure of a major supplier
* Sabotage, acts of aggression

Vulnerabilities (Assets exposed to Threats)
* Physical (e.g. buildings vulnerable to fire, money to theft, equipment to sabotage, product to contamination)
* Personal (e.g. personnel to injury/vehicle accident, chemical exposure, discrimination, terrorism)
* Public Relations (e.g. corporate image to pollution, product fault, fraud and corruption)
* Financial (e.g. assets to currency, market or interest rate changes)

Management Strategies
* Risk Control (Design, Administration, Training)
* Risk Avoidance
* Risk Transfer
* Risk Acceptance

Generalised Vulnerability Assessment Technique

It is important that the identified threats are credible. For example, one would not list earthquake as a credible threat in a region that is not earthquake-prone. Nor would terrorism normally be a credible threat to the building of a new production facility for jam in a rural location. The power of the vulnerability technique lies in its potential to provide a completeness check. For example, if all the critical success factors for an enterprise are declared, and all the primary credible threats identified, then no unexpected vulnerabilities should impact the organisation. However, if one credible vulnerability is overlooked then an unexpected event can occur out of the blue.
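The systematic matching of threats against assets can be sketched as a walk over every asset/threat pair, which is what gives the technique its completeness check. The short lists and the set of exposed pairs below are hypothetical workshop judgements, included only to make the example run.

```python
from itertools import product

assets = ["Public image", "Physical facilities", "Personnel"]
threats = ["Fire", "Failure of a major supplier", "Sabotage"]

# Hypothetical judgement: pairs where the asset is exposed to the threat
exposed = {
    ("Physical facilities", "Fire"),
    ("Personnel", "Fire"),
    ("Public image", "Sabotage"),
}


def match_threats_to_assets(assets, threats, exposed):
    """Visit every asset/threat intersection exactly once and split the
    pairs into vulnerabilities and threats that need no control effort."""
    vulnerable, not_vulnerable = [], []
    for pair in product(assets, threats):
        (vulnerable if pair in exposed else not_vulnerable).append(pair)
    return vulnerable, not_vulnerable


vulnerable, not_vulnerable = match_threats_to_assets(assets, threats, exposed)
print(f"{len(vulnerable)} vulnerabilities out of "
      f"{len(vulnerable) + len(not_vulnerable)} intersections")
for asset, threat in vulnerable:
    print(f"  {asset} is vulnerable to {threat}")
```

Control effort is then directed only at the pairs in the first list, while the second list records that each remaining intersection was considered and set aside.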


The vulnerability process can also be shown as a simple flow chart.

Vulnerability Assessment Process

The power of the process rests on the fact that whilst there may be a large number of identified assets to be protected against a large number of threats, the actual number of critical vulnerabilities is usually quite small, typically around 10% of the intersections of a typical asset/threat matrix. Critical vulnerabilities are explained further in Section 7.3.5. The weakness of the technique is that it often identifies areas of strategic concern rather than particular risk issues and precautions.


The figure below shows the vulnerability technique as a flow chart for computer risk assessment.

[Figure: flow chart with Yes/No branches. Steps and decisions shown: Objectives of the organisation; Is the computing facility essential to the maintenance of the objectives?; What parts are essential (payroll, accounting ...)?; Can these essential services be done elsewhere?; Adequate protection against disaster possibilities essential, define disaster period; Is insurance enough to cover cost of outside operations, replacement of equipment and non-essential services?; More insurance required?; Increased insurance; Vital points identified (processors, power supplies, air conditioning ...); Threats identified (fire, water damage, power failure, sabotage ...); Vulnerable? Do threats expose vital points?; Is protection adequate and appropriate?; Is protection commensurate with insured levels?; Cost effective recommendations?; Implementation; End. Note on chart: Business Interruption Insurance largely ineffective. Should such premiums provide funds for physical protection?]

Flow Chart of the Asset and Threat Technique Applied to Computer Risk Assessment

The vulnerability approach is used in the Information Security Standard AS/NZS 4444.2:2000. Step 3, entitled Undertake a Risk Assessment, comprises the steps of:
* Threats
* Vulnerabilities
* Impacts


7.3.1 Assets

Lists are the most common way of establishing assets. For example, the Australian Risk Management Standard (AS 4360:1999) lists possible areas of impact as:
a) Asset and resource base of the organisation, including personnel
b) Revenue and entitlements
c) Costs of activities, both direct and indirect
d) People
e) Community
f) Performance
g) Timing and schedule of activities
h) The environment
i) Intangibles, such as reputation, goodwill, quality of life
j) Organisational behaviour

Dependency trees can also be used for such an assessment. The example below sets out the key assets from the viewpoint of an airline that perceives its business to be that of moving paying passengers by air.
[Figure: dependency tree. Top level: Flying Paying Passengers. Second level: Serviceable Aircraft; Trained Aircrew; Passengers; Serviceable Airports. Third level: Reservation Systems; Passenger Terminals; Trains, Taxis, Carparks. Fourth level: Computers & Software; Trained Operators.]

Dependency Tree Diagram of an Airline

Each of these sub-assets could then be examined for their vulnerability to each of the listed threats. All these approaches assume that the analyst has a clear view of what the business of the organisation actually is, something that is not always easily achieved. It is very difficult to undertake a risk analysis if the organisation concerned cannot clearly state its business at the outset. There are various ways that a vulnerability assessment can be made including desktop studies, workshops, hiring specialist consultants or combinations of these.
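A dependency tree lends itself to a simple traversal that lists every sub-asset supporting the top-level business objective. In the sketch below the node names come from the airline example, but the parent/child links are an assumption made for illustration, since only the levels of the tree are stated here.

```python
# Assumed parent -> children links (hypothetical arrangement of the nodes)
dependency_tree = {
    "Flying Paying Passengers": [
        "Serviceable Aircraft",
        "Trained Aircrew",
        "Passengers",
        "Serviceable Airports",
    ],
    "Passengers": ["Reservation Systems"],
    "Serviceable Airports": ["Passenger Terminals", "Trains, Taxis, Carparks"],
    "Reservation Systems": ["Computers & Software", "Trained Operators"],
}


def sub_assets(tree, root):
    """Depth-first list of every asset that the root depends on."""
    found = []
    for child in tree.get(root, []):
        found.append(child)
        found.extend(sub_assets(tree, child))
    return found


all_sub_assets = sub_assets(dependency_tree, "Flying Paying Passengers")
for asset in all_sub_assets:
    print(asset)
```

Each asset in the resulting list can then be taken in turn and examined against the list of credible threats.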


7.3.2 Threats

The second task, after identification and assessment of assets, is identification and assessment of threats to these assets. Threat, as used in this section, refers to any occurrence or activity that could destroy a business asset or reduce its value or business effectiveness. (Where some disciplines use the term threat in this way, others would prefer to use terms like hazard or risk.) The type and degree of protection required for different assets will depend on the nature and likelihood of the threat and how vulnerable each asset is to that threat. The security appropriate to bomb threats, for example, is obviously different to that required regarding product extortion. The issue to be considered is: what particular credible threats exist or could arise to the identified assets and which of these threats are significant? A sample Threat Checklist is shown below.

Threats to Treasury & Finance: Credit squeezes; Liquidity issues; Customer payment defaults; Exchange fluctuations; Funding sources failure; Interest rate fluctuations
Threats to Assets: Fire; Earthquake; Flood; Explosion; Critical plant failure; Malicious damage
Threats of Business Interruption: Industrial action; Political/Civil upheaval; Picketing/Demonstrations/Boycott; Bomb Threat; Bomb "Hoax"; Malicious Damage/Sabotage
Threats to Information: Industrial Espionage; Takeover; Sabotage of data
Threats to Company Reputation: Scandal (eg, frauds, business or political); Product Fault or Contamination; Environmental pollution
Threats to Company's Competitive Edge: Professional incompetence; Failure to best practice; Failure to continuously improve; Poor public image
Threats to Product: Product Extortion; Collusive Theft; Pilferage; Contamination
Threats to Staff: Discrimination; OH&S injury; Harassment
Threats from Staff: Pilferage; Theft; Fraud; Malicious Damage
Threats to Cash: Robbery; Burglary
Military Threats: Sniper fire; Small arms fire; Machine gun fire; RPG or mortar attack; Artillery attack; Missile attack; Thermonuclear

A Sample Threat Checklist


7.3.3 Vulnerabilities

A vulnerability is a weakness with respect to a threat. This weakness may be intrinsic in the asset. For example, a US multinational company is probably more vulnerable to politically motivated attacks than a Swiss company. Product is more vulnerable to theft and fraud if the stock control and accounting systems are dominated by the requirements of the sales department to the detriment of accurate and timely accounting.

Or the weakness may be due to the location of the asset. For example, an Australian company in the Middle East is more vulnerable to terrorism than one in Iceland. Confidential information on a meeting room blackboard in an office with some public access is more vulnerable than when it is in a locked cabinet in a manager's private office or a secure registry.

Or the weakness may be due to inadequate or inappropriate risk management. For example, a company with no contingency planning for crisis management, public relations fallouts, or disaster recovery is more vulnerable to adverse business impact if certain threats materialise.

NB: Vulnerability is used alternatively to refer to the extent of exposure of business or asset to a risk.

7.3.4 Business Impact

Business impact is a form of risk characterisation particularly persuasive in assessing commercial risk. It is the overall cost to the company if threats succeed. Proper assessment of potential business impact is essential in determining the cost-benefit of proposed counter-measures. Commercial vulnerabilities are often characterised by an inability to purchase insurance against them. The quantification of commercial vulnerabilities is necessarily less scientific as human nature appears to have much greater significance.

Many organisations create a Group Risk Profile. This is discussed further in Section 3.3, Risk and Opportunity. This provides consideration of the major recognised balance sheet, off balance sheet, strategy performance and operational performance risks together with procedures for their day-to-day management.

The key issues are to establish the nature of the perceived vulnerability quantified in terms of possible dollar impact and return period. How much would the counter-measure cost to implement and maintain? How much risk reduction would this achieve? How does this compare with the maximum foreseeable loss that could result if the measure was not introduced and threats succeeded?

Such an approach can direct attention to revenue concentration, for example. Any business that obtains more than 25% of its income from a single source or contract can be subject to major profit fluctuations if that source abruptly stops. To ensure a steady dividend stream it may be desirable to retain profits to offset against the possible loss of income, or to use such retained funds to diversify the income stream.

Business impact should include human cost, that is, suffering, anguish, anxiety, stress, and the like, which staff, members of the public, and associated families would experience, not just loss measurable in dollars. Good corporate citizens and managers should be motivated by normal human values, not just "economic rationalism". (Although in legal cases where injured parties sue for damages due to negligence, dollar values will be put on such things.)

It is necessary also to consider consequential or indirect costs as well as direct costs. For example, it may only cost thousands of dollars to replace a contaminated product, even less if it is covered by insurance. But the loss of market share and business reputation may be far more important. Consequential damage includes such things as:
* business interruption
* loss of market share or competitive edge
* fines due to incidental pollution resulting from fire, explosion, or malicious damage
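The cost-benefit questions raised earlier in this section, and the 25% revenue concentration test, can be illustrated with some simple arithmetic. The sketch assumes that a vulnerability expressed as a dollar impact and a return period can be annualised by dividing one by the other; that simplification, the function names and all of the dollar figures are assumptions for the example.

```python
def annualised_loss(impact_dollars: float, return_period_years: float) -> float:
    """Expected loss per year for an impact with the given return period."""
    return impact_dollars / return_period_years


def countermeasure_net_benefit(impact_dollars: float,
                               return_period_before: float,
                               return_period_after: float,
                               annual_cost: float) -> float:
    """Annual risk reduction achieved by a counter-measure, less the
    annual cost of implementing and maintaining it."""
    reduction = (annualised_loss(impact_dollars, return_period_before)
                 - annualised_loss(impact_dollars, return_period_after))
    return reduction - annual_cost


def concentrated_sources(income_by_source: dict, threshold: float = 0.25) -> list:
    """Income sources providing more than `threshold` of total income."""
    total = sum(income_by_source.values())
    return [s for s, v in income_by_source.items() if v / total > threshold]


# Hypothetical figures: a $2m loss, once in 20 years, reduced to once in
# 100 years by a counter-measure costing $30,000 a year.
net = countermeasure_net_benefit(2_000_000, 20, 100, 30_000)
print(f"Net annual benefit: ${net:,.0f}")

income = {"Contract A": 40, "Contract B": 35, "Other": 25}
print("Concentrated sources:", concentrated_sources(income))
```

A calculation of this kind covers only the dollar-measurable part of business impact. Human cost and consequential damage still have to be weighed alongside it.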


Consequential damage can result also if a breach of security causes such things as:
* strikes
* legal liability
* government regulation
* deterioration in relations with staff, unions, neighbourhood, government, media / public

Sometimes security itself can be the cause of poor staff and union relations if it is inappropriate, or insensitively implemented. A common example is the inept use of baggage inspections or searches as a counter-measure against pilferage.

Assessing business impact is a collective task. A manager cannot do it effectively without the assistance of other managers of specialist functions. Virtually all other functions are involved in assessing business impact in relation to one or other of the company's assets. Obviously insurance and finance/accounting departments need to be involved, but so too, in many cases, do production/operations, marketing, personnel, industrial relations, public and media relations, and legal departments.

7.3.5 Control

Only when risk has been identified and prioritised by assessing assets, threats, vulnerabilities, and potential business impact, can appropriate control options be identified and appraised.

7.3.6 Workshops

One of the most successful methods of obtaining consensus on the relative importance of vulnerabilities, characterising risk, establishing control options and creating an action list is to use an asset and threat matrix in a workshop with relevant managers. There are various possibilities but a common approach is a two-stage workshop shown below.

[Figure: two-stage workshop process. Stage 1: Asset ID and Credible Threats lead to Credible Vulnerabilities. Stage 2: Criticality Assessment; Critical Credible Vulnerabilities; Possible Precautions; Risk Analysis; Recommendations and/or Residual Risk Allocation. Criteria shown: Statutory and Regulatory Compliance; Common Law "Due Diligence"; Investment Payback Criteria; Insurance Criteria.]

Vulnerability Workshop Process

As discussed in the Liability chapter (Chapter 4), senior decision makers and the courts require a demonstration that all practicable reasonable precautions are in place. The underlying issue is that if something untoward occurs the courts immediately look to establish (with the advantage of 20:20 hindsight) which precautions that should have been implemented were not. Risk is not strictly relevant since, after the event, likelihood is not relevant. It has happened. As an Australian judge has been reported as noting to the engineers after a recent train incident: "What do you mean you did not think it could happen, there are seven dead."


Hence the notion of risk is really only used to test the value of the precaution it is claimed ought to have been in place. How risky a situation is before the event is not germane.

7.3.7 Criticality Assessment

One of the simplest ways to address this is to undertake a preliminary criticality analysis. Prior to the Stage 2 workshop, the assets and threats of concern to the organisation are developed into a matrix form. A preliminary criticality determination is made using the values in the table below.

xxx | Critical potential vulnerability that must be (seen to be) addressed
xx | Moderate potential vulnerability
x | Minor potential vulnerability
(blank) | No detectable change
va | Possible value adding

Criticality Scoring System

If this is correctly done then around 10% or so of the cells will have three x's. This is the Pareto principle. Typically 80% to 90% of the risk comes from 10% to 20% of the vulnerabilities. Dealing with this 10% to 20% is the primary purpose of the analysis. A very simple example result from a first stage is shown below.

Threats (rows): Technical Failure; Community Issues; Political (change of government); Credit Squeeze; Flood
Scores by asset column, top to bottom, blank cells omitted:
Reputation: xx, x, xxx, x
Operability: xx, x, xxx, xxx
Staff: xx, xx, x, xx, xx

Sample Vulnerability Matrix

Many analyses in fact stop at the criticality stage. Provided there are cogent arguments explaining why all critical vulnerabilities are being managed, then further analysis is often not required, at least from a liability perspective. In a sense the critical vulnerabilities are the top consequence scores in a risk characterisation matrix as shown below. The next section considers risk characterisation in greater detail.

[Figure: 5 x 5 risk matrix. Likelihood rows: Almost Certain; Likely; Possible; Unlikely; Rare. Consequence columns: Insignificant; Minor; Moderate; Major; Catastrophic. Cells are rated E, H, M or L, with the xxx criticality marks shown against the highest consequence values.]

5 x 5 Risk Characterisation Matrix Showing xxx Criticality Consequence Values
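The expectation that around 10% of the cells in an asset/threat matrix will score xxx can be checked mechanically once the matrix has been filled in. The matrix below is entirely hypothetical (generic threat and asset names, invented scores, with "-" standing for no detectable change) and simply shows the counting step.

```python
# Hypothetical criticality matrix: rows are threats, columns are assets.
# "-" is used here for "no detectable change".
assets = ["Asset 1", "Asset 2", "Asset 3", "Asset 4"]
matrix = {
    "Threat A": ["xx", "x", "-", "x"],
    "Threat B": ["x", "xxx", "x", "-"],
    "Threat C": ["-", "x", "xx", "x"],
    "Threat D": ["xxx", "-", "x", "va"],
    "Threat E": ["x", "xx", "-", "x"],
}


def critical_cells(matrix, assets):
    """Return the (threat, asset) pairs scored xxx."""
    return [(threat, assets[i])
            for threat, row in matrix.items()
            for i, score in enumerate(row)
            if score == "xxx"]


critical = critical_cells(matrix, assets)
total_cells = sum(len(row) for row in matrix.values())
critical_share = len(critical) / total_cells

print(f"{len(critical)} of {total_cells} cells are critical "
      f"({critical_share:.0%})")
for threat, asset in critical:
    print(f"  {asset} exposed to {threat}")
```

If the share comes out far above the 10% to 20% range, that may be a prompt to revisit the scoring before the Stage 2 workshop.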

However, critical vulnerabilities (xxx) can be analysed further in a number of ways, depending on the nature of the analysis. Profiling enterprise risk using the risk matrix approach is very popular and is described further in ensuing sections, but other techniques can be used depending on the nature of the issue. A sample vulnerability matrix for a business is shown below. In this case scores are out of 10. The sum of the scores in each column indicates the best collective belief of that organisation as to the key assets that are most susceptible to possible threats. The sum of the scores in each row indicates the belief as to the most serious threats the organisation faces. The highest individual scores represent critical areas of vulnerability that should be addressed.
THREATS | Reputation | Comp Edge | Staff | Operability | Public | Env/Habitat | Information | Bldg/Facility | Totals
Chemical (fire, explosion, poisons) | 5 | 8 | 5 | 9 | 2 | 8 | 0 | 0 | 37
Bomb | 6 | 5 | 5 | 4 | 5 | 4 | 3 | 4 | 36
Statutory non-compliance | 9 | 10 | 5 | 10 | 2 | 0 | 0 | 0 | 36
Pollution (oil spills, fires, dang. goods releases) | 9 | 6 | 3 | 5 | 2 | 10 | 0 | 0 | 35
Spill | 9 | 5 | 3 | 4 | 2 | 10 | 0 | 0 | 33
Malicious damage and contamination | 7 | 6 | 3 | 4 | 2 | 0 | 5 | 2 | 29
Biomechanical (incl personal injury) | 9 | 10 | 7 | 0 | 2 | 0 | 0 | 0 | 28
Scandal (eg, frauds, political involvements) | 8 | 6 | 5 | 7 | 1 | 0 | 0 | 0 | 27
Extortion | 9 | 10 | 5 | 0 | 2 | 0 | 0 | 0 | 26
Picketing/demonstrations | 9 | 10 | 5 | 0 | 2 | 0 | 0 | 0 | 26
Pilferage and Theft | 9 | 10 | 5 | 0 | 2 | 0 | 0 | 0 | 26
Industrial espionage | 6 | 9 | 1 | 0 | 1 | 0 | 8 | 0 | 25
Storm (wind, hail, lightning, floods) | 3 | 2 | 4 | 4 | 2 | 2 | 2 | 6 | 25
Contamination | 10 | 0 | 0 | 0 | 6 | 8 | 0 | 0 | 24
Harassment | 9 | 5 | 5 | 0 | 2 | 0 | 0 | 0 | 21
Alcohol/drugs | 8 | 2 | 3 | 5 | 1 | 1 | 0 | 0 | 20
Suborning of staff for fraud or collusive theft | 4 | 5 | 3 | 0 | 2 | 0 | 4 | 0 | 18
Bomb (threats) and hoaxes | 2 | 5 | 1 | 5 | 1 | 0 | 2 | 0 | 16
Gravitational (falls, falling objects, landslides) | 2 | 2 | 6 | 0 | 2 | 0 | 0 | 0 | 12
Discrimination | 0 | 0 | 4 | 0 | 6 | 0 | 0 | 0 | 10
Electrical | 3 | 1 | 4 | 0 | 2 | 0 | 0 | 0 | 10
Assault | 0 | 0 | 5 | 0 | 4 | 0 | 0 | 0 | 9
Noise and Vibration | 2 | 2 | 3 | 0 | 0 | 0 | 0 | 0 | 7
Defamation | 0 | 0 | 2 | 0 | 2 | 0 | 0 | 0 | 4
Totals | 138 | 119 | 92 | 57 | 55 | 43 | 24 | 12 | 540

Sample Vulnerability Matrix of a Business
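The row and column sums described above can be checked with a few lines of code. The following is only a minimal sketch in Python, using the first three threat rows of the sample matrix. The names `matrix`, `threat_totals`, `asset_totals` and the threshold `CRITICAL_SCORE` are illustrative assumptions, not part of the original method.

```python
# Asset columns, in the order used by the sample matrix.
assets = ["Reputation", "Comp Edge", "Staff", "Operability",
          "Public", "Env/Habitat", "Information", "Bldg/Facility"]

# Scores out of 10 for the first three threats of the sample matrix.
matrix = {
    "Chemical (fire, explosion, poisons)": [5, 8, 5, 9, 2, 8, 0, 0],
    "Bomb": [6, 5, 5, 4, 5, 4, 3, 4],
    "Statutory non-compliance": [9, 10, 5, 10, 2, 0, 0, 0],
}

# Assumed cut-off for a "critical" individual score (hypothetical value).
CRITICAL_SCORE = 9

# Row sums: how serious each threat is believed to be.
threat_totals = {threat: sum(scores) for threat, scores in matrix.items()}

# Column sums: how susceptible each asset is believed to be.
asset_totals = {
    asset: sum(scores[i] for scores in matrix.values())
    for i, asset in enumerate(assets)
}

# Highest individual scores: candidate critical vulnerabilities.
critical = [
    (threat, assets[i], score)
    for threat, scores in matrix.items()
    for i, score in enumerate(scores)
    if score >= CRITICAL_SCORE
]

for threat, total in threat_totals.items():
    print(f"{threat}: {total}")
```

With all 24 threats entered, the same sums would reproduce the Totals row and column of the sample matrix.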

7.3.8 Risk Characterisation

A risk characterisation matrix is a very common framework. One example is described in Appendix E of the Risk Management standard (AS/NZS 4360:1999) and shown below. This appears to have been adapted from earlier military work (U.K. Ministry of Defence, 1996 and U.S. Department of Defense, 2000, both revised versions of earlier standards). Such a matrix can be larger or smaller than 5x5 on either scale: 7x5 is common for very large organisations and 4x3 or 2x2 for small projects. Other systems use a 1 to 5 category for both likelihood and consequence.
LIKELIHOOD           | CONSEQUENCE
                     | Insignificant | Minor | Moderate | Major | Catastrophic
A (Almost Certain)   | H | H | E | E | E
B (Likely)           | M | H | H | E | E
C (Possible)         | L | M | H | E | E
D (Unlikely)         | L | L | M | H | E
E (Rare)             | L | L | M | H | H

E = extreme risk; immediate attention required
H = high risk; senior management attention required
M = moderate risk; management responsibility must be specified
L = low risk; manage by routine procedures

Example of Risk Definition and Classification (after AS 4360:1999)

Three methods of risk presentation are possible and are shown below. The first is a linear risk profile concept. The second is hyperbolic in nature, with the product of the two values being used. The third in effect sums the numbers as logarithms. That is, each number represents an order of magnitude change if both scales are logarithmic.

[Figure: Risk Assessment Charts. Three 5 x 5 charts of Likelihood (1 to 5) against Consequence Severity (1 to 5): a linear chart with cells numbered uniquely from 1 to 25, a hyperbolic chart with cells scored by the product of the two values (1 to 25), and a logarithmic chart with cells scored from 1 to 9.]

Risk Assessment Charts

The hyperbolic ranking system provides for a much greater scatter between identified vulnerabilities. That is, the top score is 25 for both systems, but a vulnerability with a 3, 3 value scores 13 in the linear system and only 9 in the hyperbolic system, thereby deeming it much less important and therefore demanding less organisational control effort. That is, less time and money will produce substantially greater results. However, the linear system indicates exactly where the risk lies, since a unique number describes each point on the chart. The use of logarithmic scales seems to resolve a number of issues, since it ensures that lines of constant risk are created, which makes such presentation tools more intuitive and user friendly. This may make the third method the most mathematically pure, but it appears to be the least common. The Australian Standard matrix shown earlier does not appear to be based on either the hyperbolic or the linear system.
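The three scoring schemes can be compared with a short sketch. The Python below is illustrative only. The linear numbering rule (cells ranked along diagonals of equal likelihood plus consequence, then by increasing likelihood) and the logarithmic rule (likelihood + consequence - 1) are assumptions inferred from the chart values; the function names are hypothetical.

```python
def linear_score(likelihood, consequence, size=5):
    """Unique rank 1..size*size for each cell.

    Assumed rule: order cells by (likelihood + consequence), then by
    likelihood within each diagonal.
    """
    cells = [(l, c) for l in range(1, size + 1) for c in range(1, size + 1)]
    cells.sort(key=lambda cell: (cell[0] + cell[1], cell[0]))
    return cells.index((likelihood, consequence)) + 1


def hyperbolic_score(likelihood, consequence):
    """Product of the two ratings (1 to 25 on a 5 x 5 chart)."""
    return likelihood * consequence


def logarithmic_score(likelihood, consequence):
    """Sum of the two ratings treated as orders of magnitude (1 to 9)."""
    return likelihood + consequence - 1


for cell in [(1, 1), (3, 3), (5, 1), (5, 5)]:
    print(cell, linear_score(*cell), hyperbolic_score(*cell),
          logarithmic_score(*cell))
```

For the 3, 3 cell this gives 13 in the linear system and 9 in the hyperbolic system, matching the comparison made in the text.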

7.4 Enterprise Risk Profiling

Ultimately there must be an enterprise view of how identified risk issues should be characterised. This appears necessary when there are competing risk agendas and limited capital available. For example, underwriting requirements, environmental issues, RCM requirements and OH&S issues can compete for scarce capital. How can an organisation come to grips with such issues without an overall top down risk framework?
[Figure: Enterprise Risk Framework. Levels run from Enterprise Risk Management and Business Context (top down), through System and Sub-system Context (low level top down or high level bottom up), to Assembly and Component (bottom up: FMECA, HazOp, JSA, QRA etc.).]

Enterprise Risk Framework

The above enterprise risk framework diagram describes one understanding. When activities are undertaken bottom up, each specialist group comes to an internalised understanding of what is important to the organisation. However, when the risk assessment of the environmental group competes with the risk assessment of the HazOp group and the JSA group for resources, a very difficult situation can arise. A high level business risk framework can normalise the value systems of the competing groups, saving considerable time and much frustration.

7.4.1 Determining Risk Matrix Values

One simple method for developing the consequence values of the matrix is to consider a loss that would prove catastrophic to the organisation and then step back in order of magnitude changes from catastrophic to noticeable. The table should reflect the full range of loss values, not just directly measurable items. An example of a consequence table is shown below. The loss values can vary for different organisations. The critical aspect is the range of the consequences, which differs between organisations. Catastrophic may be $1 billion for some companies whereas a $100,000 loss is probably devastating for most domestic situations. Note also that loss of reputation and other intangibles account for the vast majority of loss.

Consequence Rating: 1 Noticeable, 2 Important, 3 Serious, 4 Major, 5 Catastrophic

Critical Success Factor: Reputation & Competitive Edge
1 Noticeable    Magistrate's Court action; serious complaint
2 Important     Local press; County Court action; adverse ministerial comment in State Parliament
3 Serious       State press; Supreme Court action; adverse ministerial comment in Federal Parliament
4 Major         National press; Court of Appeals of a Supreme Court or Federal Court action
5 Catastrophic  International press; High Court action

Critical Success Factor: Financial Performance
1 Noticeable    $10,000
2 Important     $100,000
3 Serious       $1 M
4 Major         $10 M
5 Catastrophic  $100 M

Critical Success Factor: Compliance, Corporate Governance & Information
2 Important     Breach of statutory EPA regulations
3 Serious       Isolated release of private information; isolated database hacking; EPA fine
4 Major         Successful prosecution for breach of privacy
5 Catastrophic  Widespread access to confidential records; breach of statutory, regulatory or contractual obligations; ongoing and extensive database hacking or fraud

Critical Success Factor: Occupational Health & Safety and Environment
1 Noticeable    Minor injury
2 Important     Temporary serious injury
3 Serious       Permanent serious injury or disability; ongoing staff harassment or abuse; minor structural damage
4 Major         1 death; significant structural damage due to fire etc
5 Catastrophic  10 deaths; massive industrial disputes; loss of a major infrastructure facility due to earthquake, etc

Sample Consequence Table

Likelihood for an organisation is usually expressed on a frequency basis, for example:

Almost Certain   Once per year
Likely           Once in 10 years
Some Chance      Once in 100 years
Unlikely         Once in 1,000 years
Rare             Once in 10,000 years

Typical Likelihood Values for an Organisation

The use of logarithmic values on both scales provides for lines of constant risk.
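A short sketch shows why logarithmic scales give lines of constant risk. It pairs the typical likelihood values above with the Financial Performance values of the sample consequence table ($10,000 to $100 M). Working in powers of ten is an illustrative choice, and the names `frequency_exponent`, `loss_exponent` and `annual_loss` are hypothetical.

```python
# Frequency per year expressed as a power of ten (10**0 = once per year).
frequency_exponent = {
    "Almost Certain": 0,
    "Likely": -1,
    "Some Chance": -2,
    "Unlikely": -3,
    "Rare": -4,
}

# Financial loss in dollars expressed as a power of ten (10**4 = $10,000).
loss_exponent = {
    "Noticeable": 4,
    "Important": 5,
    "Serious": 6,
    "Major": 7,
    "Catastrophic": 8,
}


def annual_loss(likelihood, consequence):
    """Expected loss in dollars per year for one cell of the matrix."""
    return 10 ** (frequency_exponent[likelihood] + loss_exponent[consequence])


# Cells on the same diagonal carry the same expected annual loss.
print(annual_loss("Almost Certain", "Noticeable"))  # once a year, $10,000
print(annual_loss("Some Chance", "Serious"))        # once in 100 years, $1 M
print(annual_loss("Rare", "Catastrophic"))          # once in 10,000 years, $100 M
```

Each step down the likelihood scale is offset by one step up the consequence scale, so every diagonal of the matrix is a line of constant risk.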

Each critical vulnerability can then be placed on the risk matrix as shown below. The summary of all the dots on the matrix is in fact the unmitigated risk profile for the subject organisation. As noted in section 4.4, Due Diligence, the final decision for action is individual to an organisation. But a process like this makes the decision transparent to anyone who wishes to know, whether shareholders, judge or jury, or regulator.
[Figure: Sample Risk Profile. A 5 x 5 matrix of Likelihood (A Almost Certain, B Likely, C Some Chance, D Unlikely, E Rare) against Consequence (Noticeable, Important, Serious, Major, Catastrophic), with cells rated E, H, M or L and the critical vulnerabilities plotted as numbered dots.]

Sample Risk Profile

One of the simplest approaches is to place a dot on the current risk position as shown above and another on the revised location after the proposed risk control is in place, as shown below. An immediate payback can then be seen visually.
[Figure: Sample Residual Risk Profile. The same 5 x 5 likelihood and consequence matrix, with the residual positions of the numbered vulnerabilities marked 1R to 7R.]

Sample Residual Risk Profile

Residual risks (those that remain after risk mitigation) can and should be classified. The categories given in AS(IEC) 61508:2000 are instructive.

Class I    Intolerable except in extraordinary circumstances
Class II   Undesirable unless risk reduction is impracticable or the cost of reduction would exceed the improvement gained
Class III  Tolerable if the cost of risk reduction would exceed the improvement gained
Class IV   Broadly acceptable negligible risk
Class V    Acceptable trivial risk

Class I is broadly equivalent to the Extreme category in the Australian Standard. Class II is broadly equivalent to the High category. Class III is equivalent to the Moderate category, whilst the remaining two classes equate to the Low category. Hence all residual risks should be Class III, IV or V, that is, Moderate or Low using the Australian Standard risk terminology.

7.5 Project Risk Profiling

Projects tend to have an interesting conceptual risk profile. The upside risk position is assumed in the proposal. The risk analysis generally focuses on those issues which will prevent the assumed upside benefits from being achieved. That is, it is a downside risk assessment process from an assumed upside risk position. Again the vulnerability approach can be used as shown below.

Flow Chart for Project Vulnerability Assessment

The analysis can be done at any stage in the project's life cycle, depending on the project's nature. Such a life cycle is shown below.

STAGES OF THE PLC | ROLES FOR RISK ANALYSIS
Conceive | Identifying stakeholders and their expectations; identifying appropriate performance objectives
Design   | Setting performance criteria; assessing the likely cost of a design
Plan     | Identifying and allowing for regulatory constraints; determining appropriate levels of contingency funds and resources
Allocate | Evaluating alternative procurement strategies; determining appropriate risk sharing arrangements
Execute  | Identifying remaining execution risks; assessing implications of changes to design or plan
Deliver  | Identifying risks to delivery; assessing feasibility of meeting performance criteria
Review   | Assessing effectiveness of risk management strategies; identifying realised risks and effective responses
Support  | Identifying extent of failure liabilities; assessing profitability of the project

Applications of Risk Management in the Project Life Cycle


(adapted from Project Risk Management, Chapman and Ward, 1997)

If done on a 5x5 matrix, risk characterisation requires further consideration. For a project, likelihood is usually done on a probability rather than a frequency basis since the likelihood is related to the project which may extend over many years, for example:
Almost Certain   100% chance of occurrence during the project
Likely           30% chance of occurrence during the project
Some Chance      10% chance of occurrence during the project
Unlikely         3% chance of occurrence during the project
Rare             1% chance of occurrence during the project

Typical Likelihood Values for a Project

To ensure lines of constant risk, the consequence scale thus also needs to be (semi) logarithmic.

Consequence Rating | Project Delivery | Financial Performance | Occupational Health & Safety | Environmental
1 Noticeable   | 1% time over-run   | 1% budget over-run   | Minor injury                           |
2 Important    | 3% time over-run   | 3% budget over-run   | Temporary serious injury               | EPA reportable incident
3 Serious      | 10% time over-run  | 10% budget over-run  | Permanent serious injury or disability |
4 Major        | 30% time over-run  | 30% budget over-run  | 1 death                                |
5 Catastrophic | 100% time over-run | 100% budget over-run | Multiple deaths                        | Major spill or bushfire

Typical Consequence Values for a Project

If the project delays and costs can be usefully characterised then contingency sums and delays can be estimated. This can be done simply by calculating the loss expectancy of each residual risk and then summing these. For example, wet weather is estimated at a 50% chance of 6 days. The average wet weather loss expectancy is then 3 days for the project. Such an approach assumes that each risk being considered is discrete. That is, the loss events do not overlap.
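The contingency estimate described above can be sketched in a few lines of Python. Only the wet weather entry comes from the text; the second risk, and the names `residual_risks`, `loss_expectancy` and `contingency_days`, are hypothetical and included purely to show the summation.

```python
# Each residual risk: (description, probability during the project, delay in days).
residual_risks = [
    ("Wet weather", 0.50, 6.0),                          # example from the text
    ("Late delivery of plant (hypothetical)", 0.10, 20.0),  # assumed for illustration
]


def loss_expectancy(probability, delay_days):
    """Expected delay in days contributed by one discrete risk."""
    return probability * delay_days


# Summing is only valid if the loss events are discrete and do not overlap.
contingency_days = sum(
    loss_expectancy(probability, delay)
    for _, probability, delay in residual_risks
)

for name, probability, delay in residual_risks:
    print(f"{name}: {loss_expectancy(probability, delay):.1f} days")
print(f"Contingency allowance: {contingency_days:.1f} days")
```

The same calculation applies to cost contingency if the delay in days is replaced by a dollar loss.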

REFERENCES

Chapman C and Ward S (1997). Project Risk Management. John Wiley and Sons, Chichester, U.K. Table is from page 27.

Department of Defense (USA). Standard Practice for System Safety, MIL-STD-882D, 10 February 2000.

Ministry of Defence (UK). Safety Management Requirements for Defence Systems, Part 1: Requirements, Defence Standard 00-56(PART 1)/Issue 2, 13 December 1997.

Standards Australia. Australian Risk Management Standard (AS 4360:1999).

Standards Australia/International Electrotechnical Commission. AS/IEC 61508:2000. Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems.

Standards Australia/Standards New Zealand (1999). Risk Management. Australian/New Zealand Standard AS/NZS 4360:1999.

Standards Australia/Standards New Zealand (2000). Information Security Management. Australian/New Zealand Standard AS/NZS 4444.2:2000.

READING

Grey Stephen (1995). Practical Risk Assessment for Project Management. John Wiley & Sons, Chichester, UK.

Robinson Richard M, Gaye E Francis, Kevin J Anderson (2003). Lessons from Cause-Consequence Modelling for Tunnel Emergency Planning. Proceedings of the Fifth International Conference on Safety in Road and Rail Tunnels. University of Dundee. pp 149-158. ISBN 1 901808 22 X.

8. Ranking Techniques

8.1 Risk Registers

A Risk Register is an action list of identified problems ranked by risk criteria. The nature of a register varies according to the techniques by which the problems were identified and the manner of the risk characterisation. Common risk registers include Vulnerability, HazOp, Hazard (OH&S), FMECA and Property Loss Prevention. They all have a common purpose: to establish tactical and strategic weaknesses so that they can be managed before they manifest themselves as real pain to an organisation. Accordingly, they have many similarities, especially in the methods of risk characterisation.

8.1.1 Vulnerability Registers

A Vulnerability Register is derived from a top down process. It is described in detail in Chapter 7.3, Vulnerability Assessments. In summary, the process requires that critical success factors be identified for an enterprise (the assets). A list of potential threats is then developed. Assets that are vulnerable to threats can have a risk characterisation (business impact assessment) made to establish priorities. The primary benefit of such a process is that real resources are only spent on vulnerabilities rather than threats. The primary weakness of such an approach is that the identified vulnerabilities can be merely areas of concern and insufficiently precise to ensure that action can be targeted effectively.

8.1.2 HazOp Risk Registers

A HazOp (Hazard and Operability) risk register is derived from a bottom up process. It is described in detail in Chapter 10, Bottom Up Techniques. In summary, the process requires that a detailed functional statement of a contract, project or process be available. Each functional element is examined using a series of predetermined guidewords to see if its failure will cause problems. If so, action is proposed. The principal benefit of a HazOp process is that it is very specific, and the benefits of corrective action can be easily seen. The primary weakness is that it may fail to spot problems that result from simultaneous failures, so called "common cause" or common mode failures, which can have serious liability implications.

8.1.3 FMECA Registers

A FMECA (Failure Modes, Effects and Criticality Analysis) is another form of bottom up risk assessment, very similar to HazOp but directed at reliability rather than risk issues, although in practice HazOp and FMECA seem to be largely interchangeable. This process is also described in detail in Chapter 10, Bottom Up Techniques.

8.1.4 Hazard (OH&S) Registers

The focus of such studies is human safety, and they can incorporate a number of the Vulnerability and HazOp techniques.

8.1.5 Property Loss Prevention Registers

Property Loss Prevention Registers are also described in Section 8.3 of this chapter. These focus on property loss matters, typically based around assessments as they would be conducted by the insurance industry.

8.2 Ranking Acute OH&S Hazards

Organisationally, risk normally follows a hyperbolic profile. Such a view is consistent with the accident risk triangle espoused by Bird and Heinrich. On average it does appear that for a tenfold decrease in likelihood there is a tenfold increase in severity for pure risk events. On log-log paper this is a line of constant risk. This is based on the notion that risk is a function of both severity and frequency and, all other aspects being equal, can be expressed as the product of the two. This means that if the injury severity can be decreased by a factor of ten then its likelihood can be increased by a similar factor, and vice versa, without changing the overall risk. This concept is shown on a log-log graph as a line at 45 degrees and is represented below.

8.2.1 Lines of Constant Risk
[Figure: Lines of Constant Risk. Log-log plot of Likelihood of Occurrence (1x10^-3 to 1x10^-6) against Severity of Consequence (0.1 to 100), with 45 degree lines of constant risk running between the Higher Risk (Dangerous) and Lower Risk (Safe) regions.]

Lines of Constant Risk

If such a concept is adopted then a simple spreadsheet risk assessment and solution ranking method can be developed. This requires that for each identified hazard an appropriate recommendation is made and the following parameters determined:

* the likelihood of the event occurring;
* the most probable severity of outcome for that occurrence;
* the probable risk control effectiveness of the proposed recommendation; and
* an estimate of its cost.

8.2.2 Spreadsheet based Acute Hazard Quantification

Provided an assessment of the likelihood and consequence severity of a hazard can be made then a simple spreadsheet risk calculator can be devised as shown below.
Proposed Measure        Provide foam padding
Likelihood per year     1
Consequence Severity    25
Control Effectiveness   90%
Control Cost            $100
Risk Reduction Rating   22.5

Spreadsheet Risk Calculator

Absolute severity is the greatest expected measure of consequence for a particular hazard in whatever units are being used. The product of the likelihood of the event occurring per annum and the expected severity of the outcome measures absolute risk. Greatest risk reduction per dollar spent is calculated by the formula:

(Likelihood x Severity x Percentage risk control) / Total capital cost of recommendation

If historical data on injury frequencies and severity is not available then a risk estimate can still be made for any hazard by developing exposure data. That is:

Likelihood = Exposure x Probability of injury

where Exposure is the number of trials per time period and Probability is a number between 0 and 1.
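The calculator can be sketched in Python as follows. This is a minimal illustration using the foam padding figures; the function names are hypothetical. Note that the rating of 22.5 shown in the calculator equals Likelihood x Severity x Control Effectiveness before any division by cost, so the sketch reports both that figure and the per dollar figure given by the formula.

```python
def risk_reduction(likelihood_pa, severity, effectiveness):
    """Risk removed per year: likelihood x severity x fraction controlled."""
    return likelihood_pa * severity * effectiveness


def risk_reduction_per_dollar(likelihood_pa, severity, effectiveness, cost):
    """Risk reduction divided by the total capital cost of the recommendation."""
    return risk_reduction(likelihood_pa, severity, effectiveness) / cost


# Foam padding example: likelihood 1 p.a., severity 25, 90% effective, $100.
rating = risk_reduction(1, 25, 0.90)
per_dollar = risk_reduction_per_dollar(1, 25, 0.90, 100)

print(f"Risk reduction rating: {rating:.1f}")
print(f"Risk reduction per dollar: {per_dollar:.3f}")
```

Because only the relative ranking matters, the per dollar figure can be scaled (for example per $100 or per $1,000) without changing the order of recommendations.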

For example, consider a tripping hazard due to wrinkled carpet. How many times in a working day does a typical employee step over the carpet? How many employees typically do this? How many days does a typical employee work? The product of all these numbers will give a first approximation as to the number of trials per annum. This can also be done in a spreadsheet form.
Trials per time unit per person   2
Time units pa                     240
People per shift                  10
Shifts                            2
Trials pa                         9,600
Probability of injury per trial   1 x 10^-4
Likelihood of injury pa           1

Quantifying Exposure and Likelihood
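The exposure calculation for the wrinkled carpet example can be written out directly. The sketch below is illustrative Python using the figures from the table; the variable names are assumptions. The unrounded result is 0.96 injuries per annum, which the table rounds to 1.

```python
# Exposure data for the tripping hazard example.
trials_per_day_per_person = 2
working_days_pa = 240
people_per_shift = 10
shifts = 2
probability_per_trial = 1e-4

# Exposure: number of trials per annum.
trials_pa = (trials_per_day_per_person * working_days_pa
             * people_per_shift * shifts)

# Likelihood = Exposure x Probability of injury.
likelihood_pa = trials_pa * probability_per_trial

print(f"Trials p.a.: {trials_pa}")
print(f"Likelihood of injury p.a.: {likelihood_pa:.2f}")
```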

Item No.
Proposed control measure           Provide foam for head bump potential
Trials per time unit per person    2
Time units p.a.                    240
People per shift                   10
Shifts                             2
Exposure (trials p.a.)             9600
Probability per trial              0.0001
Injury frequency (p.a.)            0.96
Severity rating (days lost)        25
Risk (days lost p.a.)              24
Control effectiveness (%)          90
Control cost ($)                   100
Payback score                      21.6
Rank Priority Order                1

Sample Spreadsheet Hazard Register

A table of helpful figures is provided to facilitate risk ranking.


Exposure (Time Units p.a.)
Constant (every 5 working minutes)   24000 per year
Hours (typical working hours)        2000 per year
Days (working days per year)         240 per year
Weeks (typical working weeks)        48 per year
Months                               11 per year
Years                                1 per year

Reasonable Severity Potential (after Viner 1991)                        Days Lost
Medical and Temporary Partial Incapacity (Hit thumb with a hammer)      0.5
Temporary Total Incapacity (Unconscious)                                25
Permanent Partial Incapacity (Maiming)                                  275
Permanent Total Incapacity/Death                                        6000
Multiple (typical 3) Deaths                                             18000

Probability of Injury per Trial
Certain          10^-0 = 1        = 1/1
Imminent         10^-1 = 0.1      = 1/10
Probable         10^-2 = 0.01     = 1/100
Likely           10^-3 = 0.001    = 1/1,000
Unexpected       10^-4 = 0.0001   = 1/10,000
Remote           10^-5 = 0.00001  = 1/100,000
1 in a million   10^-6 = 0.000001 = 1/1,000,000

Recommendation Effectiveness (Anticipated Risk Reduction)
Total removal    100%
Design           90%
Administration   50%
Training         30%

Recommendation Cost
Maintenance Budget Item   $100
Annual Budget Item        $1,000
Capital Works Item        $10,000+

Helpful Ranking Figures

These figures can be extended to the two tables below. The first provides for a rapid calculation of expected accident frequency.

Probability | Exposure: 24,000 p.a. | 2,000 p.a. | 240 p.a. | 48 p.a. | 11 p.a. | 1 p.a.
1/1,000     | 24 p.a.    | 2 p.a.     | 0.24 p.a.    | 0.048 p.a.    | 0.011 p.a.    | 0.001 p.a.
1/10,000    | 2.4 p.a.   | 0.2 p.a.   | 0.024 p.a.   | 0.0048 p.a.   | 0.0011 p.a.   | 0.0001 p.a.
1/100,000   | 0.24 p.a.  | 0.02 p.a.  | 0.0024 p.a.  | 0.00048 p.a.  | 0.00011 p.a.  | 0.00001 p.a.
1/1,000,000 | 0.024 p.a. | 0.002 p.a. | 0.00024 p.a. | 0.000048 p.a. | 0.000011 p.a. | 0.000001 p.a.

Figures to Calculate Expected Accident Frequency

The second table provides for a typical first order correlation between injury severity, loss expectancy and public response in the form of environmental, regulatory and media impact.
Severity     | OH&S (days lost)   | Property (dollars)
Noticeable   | 0.5                | $1,000
Important    | 25                 | $10,000
Serious      | 275                | $100,000
Severe       | 6,000 (1 death)    | $1,000,000
Critical     | 18,000 (3 deaths)  | $10,000,000
Catastrophic | 18,000+ (3+ deaths) | $100,000,000

Environmental Regulatory/Media impact, in increasing order of severity: Local media (non-metropolitan); Local media (metropolitan); National media, local regulation; National media & regulation; Intl media & national regulation.

Estimated Expected Severity

8.2.3 Precautionary Ranking Note

Care should be used when selecting a point on a line of constant risk as a system of risk characterisation. Whilst such lines may be true on average for all risks, they are not true for individual risks. For example, tripping on a footpath is more likely to cause injury than death, whereas falling off a high rise building is far more likely to cause death than injury. That is, there is a unique risk curve for each hazard. It is almost certainly not a line of constant risk. It is therefore prudent to characterise the most probable consequence severity first and then to characterise the likelihood of the occurrence of that consequence severity. The object is to ensure that the worst point on the risk curve for an individual risk is chosen for characterisation.

A sample list of possible risk curves follows. They are very subjective risk curves based on the experience of the authors. They are drawn as though on log-log graph paper so that a 45 degree line would represent a line of constant risk. Consequence is represented as days lost. Likelihood is on a probability basis and would need to be multiplied by the number of trials to obtain the expected number of injuries. Adapting the figures above to the nearest order of magnitude provides the following scales.

1 day lost         Medical and Temporary Partial Incapacity
10 days lost       Temporary Total Incapacity (Unconscious)
100 days lost      Permanent Partial Incapacity (Maiming)
1,000 days lost    Permanent Total Incapacity/Death
10,000 days lost   Multiple Deaths

Consequence Scale

1 in 100         Probable
1 in 1,000       Likely
1 in 10,000      Unexpected
1 in 100,000     Remote
1 in 1,000,000   1 in a million

Likelihood Scale

[Figures: Risk Curve for Manual Handling Hazard; Risk Curve for Trip on Paving Hazard; Risk Curve for High Voltage Electrocution Hazard. Each panel plots likelihood (1 in 100 Probable to 1 in 1,000,000) against consequence in days lost (Medical and Temporary Partial Incapacity, then 10 Temporary Total Incapacity, 100 Permanent Partial Incapacity, 1,000 Permanent Total Incapacity/Death, 10,000 Multiple Deaths).]

Sample Possible Risk Curves of Particular Hazards

In the authors' experience the most appropriate order in which to consider such matters is:

* absolute severity
* absolute risk
* greatest risk reduction per dollar spent

Absolute severity reflects the need to ensure that anything with (multiple) death potentials has been seriously considered. The spreadsheet calculator described will score badly risk control solutions that are expensive and/or inefficient. So if an expensive solution has been proposed when a cheaper one was available then due diligence may not have been satisfied. The results of such work can be represented by tabular outputs such as that shown below.
Item 1
Statement of Risk: Head bump potential exists at the end of the conveyor.
Controls: Provide foam padding in addition to the stripe indicating surface.
Risk 23 | Severity 25 | Cost 100 | Payback 207

Item 2
Statement of Risk: Use of blow down gun on conveyor provides for embolisms, mechanical damage potential and eye injury to personnel.
Controls: Discontinue the use of the blow down gun in favour of a suitable vacuum cleaner. Remove flexible air hose.
Risk 143 | Severity 6000 | Cost 1000 | Payback 143

Item 3
Statement of Risk: The dock should be guarded against fall potentials when not actually in use. This is difficult to effectively achieve.
Controls: Minimum options are: 1. Paint the edge brightly. 2. Mark a "no walking" area. 3. Provide a small raised wooden edging.
Risk 264 | Severity 275 | Cost 1000 | Payback 132

Item 4
Statement of Risk: Jumping out of truck holding goods. This imposes severe back strain problems.
Controls: 1. Provide a large non-slip step down. 2. Provide induction and training.
Risk 1031 | Severity 275 | Cost 5000 | Payback 103

Item 5
Statement of Risk: The platforms by the discharge chute do not have kick boards, a proper access ladder or complete hand railing.
Controls: This really requires a redesign of the loading operation in this area to conform to AS1657.
Risk 2639 | Severity 275 | Cost 20000 | Payback 92

Item 6
Statement of Risk: The ramp safety chain fastening appears inadequate. Any slack in the chain would enable the trailer/ramp to separate.
Controls: Provide a welded stanchion down to bumper level so that the chain is horizontal and slack is minimised.
Risk 26 | Severity 275 | Cost 1000 | Payback 13

Item 7
Statement of Risk: The stairway of Building 1 has slippery surfaces.
Controls: Resurface the stairs with a non slip surface (coefficient of friction 0.4 min., desirably 0.5, for all foreseeable conditions).
Risk - | Severity 25 | Cost 1000 | Payback -

Sample Hazard Register
Sorted by Greatest Risk Reduction per Dollar spent (Payback Score)
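The ranking step can be sketched as below. This is an illustrative Python example only. The Risk and Cost figures are taken from the sample register, but the control effectiveness values and the scaling of the payback score per $1,000 of cost are assumptions chosen because they reproduce the published payback scores; the names `register` and `payback_score` are hypothetical.

```python
def payback_score(risk, effectiveness, cost, per_dollars=1000):
    """Risk reduction per `per_dollars` spent (the scaling is an assumption)."""
    return risk * effectiveness / cost * per_dollars


# (hazard, risk in days lost p.a., assumed effectiveness, cost in dollars)
register = [
    ("Ramp safety chain", 26, 0.5, 1000),
    ("Jumping out of truck", 1031, 0.5, 5000),
    ("Head bump at conveyor", 23, 0.9, 100),
    ("Unguarded dock edge", 264, 0.5, 1000),
    ("Blow down gun", 143, 1.0, 1000),
]

# Sort by greatest risk reduction per dollar spent.
ranked = sorted(
    register,
    key=lambda row: payback_score(row[1], row[2], row[3]),
    reverse=True,
)

for rank, (hazard, risk, effectiveness, cost) in enumerate(ranked, start=1):
    score = round(payback_score(risk, effectiveness, cost))
    print(f"{rank}. {hazard}: payback {score}")
```

A cheap, effective control on a modest risk (the foam padding) outranks an expensive control on a larger risk, which is the behaviour the text describes.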

8.2.4 Process Review

Reviewing the process:

i)   Simultaneously identify the hazard and a possible solution.
ii)  Select a realistic maximum injury severity for that hazard.
iii) Assess exposure and probability per trial to determine the frequency.
iv)  Conduct a reality check: "Is that frequency sensible for that consequence severity?"
v)   Select the solution's control cost and risk control effectiveness.
vi)  Calculate risk, risk reduction and ranking.

8.2.5 Risk Control Measures

There are five general categories for risk control:

Category                                             Effectiveness
1. Removal or Elimination                            100%
2. Design or Physical Control (engineering)          90%
3. Administrative Control (procedural)               50%
4. Training (Work Method Controls) (personnel)       30%
5. PPE (Personal Protective Equipment)               20%

Some examples of the above categories are given in the table below:
Occurrence Type: CHEMICAL EXPOSURE - toxic properties
Engineering Controls: 1. Design for containment. 2. Ventilation systems. 3. Change rooms and showers etc.
Procedural Controls: 1. Chemical purchasing procedure. 2. Chemical register. 3. Provision of personal protective equipment. 4. Medical monitoring programs. 5. Transportation, handling and storage practices. 6. Maintenance of equipment. 7. Emergency procedures.
Personnel Controls: 1. Training in the selection, use and care of personal protective equipment. 2. Information on toxic properties and routes of ingestion.

Occurrence Type: CHEMICAL EXPOSURE - corrosive properties
Engineering Controls: 1. Splash and leak proof containers. 2. Provision of showers and eye washes.
Procedural Controls: 1. Chemical purchasing procedure. 2. Chemical register. 3. Provision of personal protective equipment. 4. Transportation, handling and storage practices. 5. Maintenance of equipment. 6. Emergency procedures.
Personnel Controls: 1. Training in the selection, use and care of personal protective equipment. 2. Information on toxic properties and routes of ingestion.

Occurrence Type: CHEMICAL EXPOSURE - fire and explosion effects
Engineering Controls: 1. Provision of storage facilities. 2. Provision of containers.
Procedural Controls: 1. Chemical purchasing procedure. 2. Chemical register. 3. Provision of personal protective equipment. 4. Transportation, handling and storage practices. 5. Maintenance of equipment. 6. Emergency procedures.
Personnel Controls: 1. Awareness of hazardous properties. 2. Awareness of emergency procedures.

Occurrence Type: CHEMICAL EXPOSURE - asphyxiant properties
Engineering Controls: 1. Atmosphere assessment equipment. 2. Ventilation equipment. 3. Harnesses and air supply equipment.
Procedural Controls: 1. Work permit systems. 2. Equipment maintenance.
Personnel Controls: 1. Awareness of hazardous properties. 2. Awareness of emergency procedures.

Examples of Risk Control Measures

8.3 Ranking Property Loss Prevention Hazards

An example of a property hazard risk calculator is given in the figure below.

Property Loss Prevention Program


Major Recommendation Register Recommendation No:1 Date: Monday 15 March 1996

Recommendation: Installing in-rack sprinklers in the multiple row racks in the raw materials warehouse and under the finished goods conveyor is required to make the sprinkler protection effective. The existing sprinkler system was designed for solid pile storage. It is inadequate for multiple row rack storage and in-rack sprinklers or a very serious increase in overhead sprinklers protection would be required. The new conveyor system shields the overhead sprinklers and a new row is required under it.

Backg'd Event Freq. (pa)            0.01
Hot Spot Freq. (pa)                 -
Total Event Freq. (pa)              0.01
Years Between Events                100
Asset Damage $                      2,000,000
Business Interruption $             1,000,000
Severity (PD + BI) $                3,000,000

Rec. Capital Cost $                 100,000
Rec. Maint. Cost $ pa               100
Rec. Effectiveness %                90
Pre-Rec. Loss Expectancy            30,000
Post-Rec. Loss Expectancy           3,000
Annual Loss Expectancy              27,000
Payback Period (years)              3.7

Property Loss Payback Calculator

The definitions for each of the items above are given in Section 8.3.1. The key concept is the total cost of risk. For property damage the product of the severity of the loss event and its expected frequency is the annual loss expectancy. That is how much money would need to be put aside each year to pay for the cost of loss if no insurance were purchased. It is a direct measure of the risk of the event. For example, if the projected cost of the event is $1m and it occurs once every 10 years then $100,000 per year should be set aside to pay for the cost of loss. This follows from the Loss Rate Concept (Browning R L 1980).

However, if a risk control option can be implemented then it should reduce either the likelihood or the severity of the loss event substantially, perhaps by 90%. In the example above this reduces the annual loss expectancy by 90%, from $30,000 per year to $3,000 per year. That is, there will be a saving in the cost of ownership of $27,000 per year. Thus if the cost of the improvement is $100,000 it will nominally take 3.7 years to pay back. The formula is:

\[ \text{Payback Period (years)} = \frac{\text{Recommendation Cost}}{\text{Annual Loss Expectancy} - \text{Maintenance Cost p.a.}} \]

So in the above case:

\[ \text{Payback Period (years)} = \frac{\$100{,}000}{\$27{,}000\ \text{p.a.} - \$100\ \text{p.a.}} = 3.7\ \text{years} \]
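The calculator arithmetic can be sketched in a few lines of Python. This is a minimal illustration only: the function name `property_loss_payback` and its parameter names are hypothetical, and effectiveness is assumed to be expressed as a fraction (0.90 for 90%). The inputs are the figures from the sprinkler recommendation above.

```python
def property_loss_payback(total_event_freq_pa, severity, effectiveness,
                          capital_cost, maintenance_cost_pa):
    """Return (pre-rec loss expectancy, post-rec loss expectancy,
    annual saving, payback period in years) for one recommendation."""
    pre_rec = total_event_freq_pa * severity       # $ p.a. before the recommendation
    post_rec = pre_rec * (1.0 - effectiveness)     # $ p.a. after the recommendation
    annual_saving = pre_rec - post_rec             # reduction in loss expectancy, $ p.a.
    net_saving = annual_saving - maintenance_cost_pa
    if net_saving <= 0:
        raise ValueError("net annual saving is not positive; no payback")
    return pre_rec, post_rec, annual_saving, capital_cost / net_saving


# Figures from the in-rack sprinkler example.
pre_rec, post_rec, annual_saving, payback_years = property_loss_payback(
    total_event_freq_pa=0.01, severity=3_000_000, effectiveness=0.90,
    capital_cost=100_000, maintenance_cost_pa=100)

print(f"Pre-rec loss expectancy:  ${pre_rec:,.0f} p.a.")
print(f"Post-rec loss expectancy: ${post_rec:,.0f} p.a.")
print(f"Annual saving:            ${annual_saving:,.0f} p.a.")
print(f"Payback period:           {payback_years:.1f} years")
```

As in the text, the sketch ignores discounted cash flow.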


8.3.1 Property Loss Calculator - Definition of Terms

Backg'd Event Freq. (p.a.)
  This is the expected fire frequency associated with the event. For example, a fire in a warehouse.

Hot Spot Freq. (p.a.)
  This is an assessment of unusual items which add a particular event frequency beyond the normal, background frequency. For example, an internal petrol bowser.

Total Event Freq. (p.a.)
  The sum of Background and Hot Spot Frequency.

Years Between Events
  The reciprocal of Total Event Frequency.

Asset Damage
  An estimate of the expected property damage.

Business Interruption
  An estimate of the expected loss of profits.

Severity (PD + BI)
  The sum of Asset Damage and Business Interruption.

Rec. Maint. Cost $
  An estimate of the cost of maintaining the recommendation per year. This needs to include any potential losses associated with the proposed solutions. For example, in-rack sprinklers might be struck once a year by forklifts causing $10,000 damage on each occasion.

Rec. Effectiveness %
  An estimate of the control effectiveness of the proposed risk control solution. It can be either a frequency reduction or a severity reduction or both.

Pre-Rec. Loss Expectancy
  This is the annual loss expectancy and is the product of the Total Event Frequency and the Severity.

Post-Rec. Loss Expectancy
  This is the revised annual loss expectancy after the recommendation has been implemented. It is the Pre-Recommendation Loss Expectancy reduced by the Recommendation Effectiveness.

Annual Loss Expectancy
  This is the difference between the Pre- and Post-Recommendation Loss Expectancy.

Payback Period (years)
  This is equal to:
  \[ \frac{\text{Recommendation Cost}}{\text{Annual Loss Expectancy} - \text{Maintenance Cost p.a.}} \]
  This excludes any discounted cash flow considerations, which do not seem to be important for projects that have a payback of 3 years or less.

8.4 Integrated Investment Ranking

Capital investment proposals are often focused on new projects or schemes. But projects which improve reliability or reduce risk can be superior investments. To properly assess and compare different capital works projects, an integrated assessment process is needed. Such a payback assessment system should also:
- Establish a balanced investment program.
- Rank projects to provide the maximum rate of return.
- Assess the cost of providing a specified level of service.

A concept model is shown below.


[Figure: The benefits arising from a solution to a perceived problem are: Savings in Maintenance Costs; Commercial Benefits; PR, Image, Morale Value; Reduction in Risk (Loss Expectancy). These are calculated as an Investment Ratio or Years Payback Value.]

Benefit Model

Of the four forms of benefit identified in the figure above, determining dollar values for Commercial Benefits and Maintenance Savings is relatively straightforward. However, determining dollar benefits for the issues of Public Relations, Corporate Morale, Image and Reduction in Loss Expectancy is more complex. Results of any investment assessment must be presented in ways that senior management can understand, that is, financially based and on clean crisp pieces of paper. In our experience, senior managers and directors do not respond to computer screens. A spreadsheet example of a possible layout is included at the conclusion of this chapter.

New Company Pty Ltd
Project Investment Summary

Years Payback: 1.59 yrs

Project Description
A problem exists with a lack of oil traps on the storm water drains. This means that any transformer that leaks will release oil directly to the creek. The proposal is to install oil traps on each sub-station.

Investment Overview

Cost                                  Return (summary over)
Design          $12,500               Commercial Return      $0 pa
Labour          $86,000               Maintenance Saving     ($1,000) pa
Materials       $45,000               PR Benefit             $500 pa
Contingency     $14,350               Risk Saving            $99,954 pa
Total Cost      $157,850              Total Return           $99,454 pa

(Space for a photograph)

Commercial Return: $0 p.a.
No commercial prospects noted.

Maintenance Saving: ($1,000) p.a.
Will cost $1,000 per year to maintain.

Public Relations Benefits: $500 p.a.
Small benefit to locals but no real positives.

Risk Saving: $99,954 p.a.

Risk Saving Calculation
Event Frequency per year            2.00
Years between events                0.50
Consequence Severity
  Asset Damage                      $0
  Business Interruptions            $0
  Clean Up Cost                     $5,000
  Legal Cost                        $3,000
  Fines                             $2,000
  Management Stress Cost            $10,000
  Public Relations Damage           $30,000
  Total Severity                    $50,000
Project Effectiveness               99.95%

Comments
Total cost to the organisation is 2 x $50,000 pa or $100,000 pa. PR Damage equals the cost to restore the Organisation's real name. Effectiveness: The only time when the oil traps won't work is during a raging storm, say 10 hours out of 8760 hours per year.
Project Effectiveness = (8760 - 10) / 8760 = 99.95%
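The roll-up on the summary sheet can be reproduced with a short sketch. The dictionary names below are hypothetical; the dollar figures are those listed in the Investment Overview, and the Risk Saving is taken as given from the sheet rather than recalculated.

```python
# Cost items from the Investment Overview ($).
costs = {
    "Design": 12_500,
    "Labour": 86_000,
    "Materials": 45_000,
    "Contingency": 14_350,
}

# Annual return items ($ p.a.); a negative value is a cost, shown in brackets on the sheet.
returns_pa = {
    "Commercial Return": 0,
    "Maintenance Saving": -1_000,
    "PR Benefit": 500,
    "Risk Saving": 99_954,   # taken directly from the summary sheet
}

total_cost = sum(costs.values())
total_return_pa = sum(returns_pa.values())
years_payback = total_cost / total_return_pa

print(f"Total cost:    ${total_cost:,}")
print(f"Total return:  ${total_return_pa:,} p.a.")
print(f"Years payback: {years_payback:.2f}")
```

Keeping the four benefit types as separate entries mirrors the Benefit Model, so that a reviewer can see which benefit dominates the ranking.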

READING

Anderson K J, Robinson R M and D Hyland (1992). Ranking of Infrastructure Renewals Taking into Account the Business Requirements of the Railway. CompRail 92 Conference. Washington.

Browning R L (1980). The Loss Rate Concept in Safety Engineering. Marcel Dekker, USA.

Lees F P (1995). Loss Prevention in the Process Industries. 2nd Edition. Butterworth-Heinemann Ltd, Oxford, UK. (3 Volumes).

Robinson R M, J R Kennedy and T Beattie (1995). Risk Based Investment Ranking.

Viner D B L (1991). Accident Analysis and Risk Control. VRJ Information Systems, Melbourne. ISBN 0 646 02009 9. Table is on page 132.

Item No.                                  -
Proposed control measure                  Provide foam for head bump potential
Trials per time unit                      2
Time units per shift                      240
Shifts per person p.a.                    10
People                                    2
Exposure (trials p.a.)                    9600
Probability per trial                     0.0001
Injury frequency (p.a.)                   0.96
Severity rating (days lost)               25
Risk (days lost p.a.)                     24
Control effectiveness (%)                 90
Control cost ($)                          100
Payback score                             21.6
Rank / Priority Order                     -
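The arithmetic behind this worksheet row can be checked with a short sketch. The variable names are hypothetical, and the assumption that exposure is the product of the four exposure factors is inferred from the figures (2 x 240 x 10 x 2 = 9600) rather than stated in the worksheet.

```python
# Exposure factors from the worksheet row (their product gives trials per annum).
trials_per_time_unit = 2
time_units_per_shift = 240
shifts_per_person_pa = 10
people = 2

probability_per_trial = 0.0001
severity_days_lost = 25
control_effectiveness = 0.90   # 90% expressed as a fraction

exposure_pa = (trials_per_time_unit * time_units_per_shift
               * shifts_per_person_pa * people)
injury_frequency_pa = exposure_pa * probability_per_trial
risk_days_lost_pa = injury_frequency_pa * severity_days_lost
risk_reduction_days_pa = risk_days_lost_pa * control_effectiveness

print(f"Exposure:         {exposure_pa} trials p.a.")
print(f"Injury frequency: {injury_frequency_pa:.2f} p.a.")
print(f"Risk:             {risk_days_lost_pa:.1f} days lost p.a.")
print(f"Risk reduction:   {risk_reduction_days_pa:.1f} days lost p.a.")
```

The computed risk reduction agrees with the 21.6 shown at the end of the worksheet row.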

9. Modelling Techniques

There is a variety of analytical methods for risk and reliability modelling of the pure risk of technical systems, documented in a range of standards and codes. They are especially applicable to the analysis of computer systems (functional safety assessment), which now appear to be a substantive component of any significant infrastructure control system. The ones that the authors have used successfully are shown below and will be discussed in this chapter.

Trees
- Fault Trees
- Success Trees
- Event Trees (Consequence Trees)
- Dependency Trees

Blocks
- Reliability Block Diagrams
- Dependence Block Diagrams
- Blocks vs Trees

Integrated Presentation Diagrams
- Cause-Consequence Diagrams
- Threat-Barrier Diagrams
- Venn (Swiss Cheese) Diagrams

List of Modelling Techniques and Presentation Methods

The choice mostly relates to the nature of the problem under investigation and the requirements of the audience to whom the analysis is being addressed. The integrated presentation diagrams, as the name suggests, are generally more palatable to the public and the courts as they provide the most pictorial representation of the subject. However, analytical technical people generally prefer to use trees and block diagrams, for the initial analysis at least. A summary of the mathematics required to support these pure risk-modelling techniques is contained in Chapter 12. That chapter also contains a summary of the mathematics used for modelling market (speculative) risk.

9.1 Trees

The heart of decision trees is the assumption that truly independent variables contribute to occurrences and outcomes. That is, what independent things must conspire together to bring about an event, and having occurred, what are the possible outcomes? The general structure of such models was established in 1975 with the publication of the US Reactor Safety Study known as WASH-1400 and formally entitled: An Assessment of Accident Risks in US Commercial Nuclear Power Plants (Reason, 1990). The basic steps are:
i) Identify sources of potential hazard.
ii) Identify the events that could initiate such a hazard occurring (fault trees).
iii) Establish the possible sequence of events that could result from such occurrences (event trees).
iv) Quantify in probability and frequency terms the likelihood of ii) and iii).
v) Determine the overall risk by aggregating all the known quantified hazards.

The difficulty is in determining the input numbers and ensuring that there are no common inputs or processes that are affected simultaneously by one external factor.

9.1.1 Fault Trees

The time sequence concept can be extended in several different ways using probabilistic concepts. A fault tree is effectively a statement of what events have to conspire together to bring about an undesired outcome. Traditionally these have been drawn top-down and therefore the undesired event is known as the "top event". Because of the logical hierarchy of the items, it can be seen as a form of time sequence going from the bottom towards the top of the page.

Light Fails                 3.002 p.a.       (OR)
  Power Failure             1 p.a.
  Bulb Burnt Out            2 p.a.
  Fuse Failure              2 x 10^-3 p.a.   (OR)
    Incorrectly Set         1 x 10^-3 p.a.
    Power Surge             1 x 10^-3 p.a.

A 'Fault' Tree

The fault tree leads to the conclusion that to minimise the likelihood of light failure, minimising the likelihood of bulb burnout provides the greatest contribution. The success tree in Section 9.1.2 indicates that to maximise light availability it is more effective to improve bulb operability than any other aspect. That is, both trees lead to the same general conclusion if the top event is similarly defined.

From the risk engineer's perspective, the reliability engineer has a distinct advantage: the outcome of the success tree, the "top event", is defined in terms of what makes the system operate to its specification, perhaps its availability; its success objective. The failures are all grouped together and contained in the idea of "unavailability" irrespective of whether the failure is due to a breakdown failure (in the vast majority of instances) or failure (risk).
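The light-failure fault tree can be evaluated with a small sketch. It assumes, as the figures in the tree imply, that the output frequency of an OR gate is the sum of its input frequencies (the rare-event approximation). The nested-dictionary representation and the function name `gate_frequency` are hypothetical.

```python
# Each dict is an OR gate; each number is a basic event frequency (events p.a.).
light_fails = {
    "Power Failure": 1.0,
    "Bulb Burnt Out": 2.0,
    "Fuse Failure": {
        "Incorrectly Set": 1e-3,
        "Power Surge": 1e-3,
    },
}


def gate_frequency(node):
    """Frequency of an OR gate as the sum of its inputs (rare-event approximation)."""
    if isinstance(node, dict):
        return sum(gate_frequency(child) for child in node.values())
    return float(node)


top_frequency = gate_frequency(light_fails)

# Fraction of the top event frequency contributed by each first-level input.
contributions = {name: gate_frequency(child) / top_frequency
                 for name, child in light_fails.items()}
largest = max(contributions, key=contributions.get)

print(f"Light fails: {top_frequency:.3f} p.a.")
for name, share in contributions.items():
    print(f"  {name}: {share:.1%}")
print(f"Largest contributor: {largest}")
```

Ranking the contributions in this way is what supports the conclusion that bulb burnout is the first thing to address.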

9.1.2 Success Trees

It seems that because of this, reliability engineers conceptually prefer "success" tree analysis to "fault" tree analysis. The concept is similar but the Boolean mathematics in the construction of the tree is reversed ('or' gates become 'and' gates) because of this focus on availability (the desired outcome) rather than the fault (the undesired outcome). Reconsidering the light bulb fault tree example:
Light Available             0.9488   (&)
  Power Available           0.999
  Bulb Operational          0.95
  Fuse Operational          0.9998   (&)
    Correctly Set           0.9999
    Fuse Available          0.9999

A 'Success' Tree

9.1.3 Event Trees (Consequence Trees)

An event tree is a similar device except that it answers the questions associated with a particular event occurring with several possible outcomes. These traditionally have also been drawn top-down although in this case the time arrow would be moving from the top of the page towards the bottom of the page as shown below.

Fire Start Frequency        Sprinklers Effective?       Outcome Frequencies

100 fires p.a.              Yes (0.95)                  95 controlled fires p.a.
                            No  (0.05)                  5 large fires p.a.

An 'Event' (or 'Outcome') Tree
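The sprinkler event tree amounts to multiplying the initiating frequency by the probability of each branch. A minimal sketch, with hypothetical variable names:

```python
fire_start_frequency_pa = 100.0      # initiating event frequency
p_sprinklers_effective = 0.95        # probability of the "Yes" branch

# Each outcome frequency is the initiating frequency times its branch probability.
outcomes_pa = {
    "controlled fires": fire_start_frequency_pa * p_sprinklers_effective,
    "large fires": fire_start_frequency_pa * (1.0 - p_sprinklers_effective),
}

for outcome, frequency in outcomes_pa.items():
    print(f"{outcome}: {frequency:.0f} p.a.")

# The outcome frequencies must add back to the initiating frequency.
check_sum = sum(outcomes_pa.values())
```

The check sum is a useful habit: because the branches at each node are exhaustive, the outcome frequencies should always total the initiating frequency.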

9.1.4 Dependency Trees

The block diagram technique is powerful because it agglomerates all the detailed failure or reliability data into a single communicative overview at a system level, something most of the other techniques fail to achieve. A dependency tree for an airline business is shown below. The likelihood of achieving the top objective could be assessed from the reliability of simultaneously achieving each of the sub-objectives.

[Figure: dependency tree with the top objective "Flying paying passengers" and the dependencies: Serviceable Aircraft; Trained Aircrew; Passengers; Serviceable Airports; Reservations Systems; Passenger Terminals; Trains, taxis, carparks; Computers & Software; Trained Operators.]

Airline Dependency Tree

Such dependency trees appear to be particularly useful for critical infrastructure assessments using the threat and vulnerability technique (Chapter 7.3).

9.2 Blocks

9.2.1 Reliability Block Diagrams

Block diagrams are a simple way of representing complex systems diagrammatically. They can be used for both risk and reliability studies. The key concept is to divide the system or process under consideration into sub-systems that are independent of each other and which all the interested parties can pictorially see and agree represents the system as a whole. (This is definitely art and not science). It is absolutely critical that as many interested parties as possible participate and sign off the block diagram, as any modelling done is on the basis that the block diagram is an accurate representation of reality for the particular study.

For reliability work the representation will depend on the definition of success or failure (usually in terms of availability) adopted for the system. If it has multiple definitions (usually associated with alternate operating modes) separate diagrams may be required for each. There are four basic configurations (BS 5760: Part 2:1994), namely series, parallel (active redundant), m out of n units and cold standby. These are shown below:

[Figure: the four basic configurations, each leading to an Output: Series System (block A); Parallel or Active Redundant System (blocks S and T); Two Out of Three System (blocks X, Y and Z); Cold Standby (blocks P and Q).]

Each block could be further reduced to other block diagrams. The block diagram technique is powerful because it agglomerates all the detailed reliability data into a single communicative overview at the system level, something most of the other techniques fail to achieve.
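For independent blocks, three of the four configurations reduce to simple formulas. The sketch below is illustrative only: the function names are hypothetical, the block reliabilities are made-up values rather than data from the text, and cold standby is omitted because it also depends on switching and time-dependent behaviour.

```python
from math import comb, prod


def series(reliabilities):
    """All blocks must work: multiply the reliabilities."""
    return prod(reliabilities)


def parallel(reliabilities):
    """Active redundancy: the system fails only if every block fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


def m_out_of_n(m, n, r):
    """At least m of n identical, independent blocks (each of reliability r) must work."""
    return sum(comb(n, k) * r**k * (1.0 - r)**(n - k) for k in range(m, n + 1))


# Illustrative values only.
series_example = series([0.99, 0.99])
parallel_example = parallel([0.99, 0.99])
two_of_three_example = m_out_of_n(2, 3, 0.9)

print(f"Two 99% blocks in series:    {series_example:.4f}")
print(f"Two 99% blocks in parallel:  {parallel_example:.4f}")
print(f"2 out of 3 blocks at 90%:    {two_of_three_example:.3f}")
```

All three formulas rely on the blocks being independent, which is why agreement on the block diagram itself matters so much.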

9.2.2 Dependence Block Diagrams

A reliability block diagram is, in fact, a success block diagram. It describes what elements have to work in order to get a successful output. Just like fault trees have a logical opposite in success trees, there are also fault block diagrams, generally known as dependence diagrams (SAE ARP 4761). The figure below shows the equivalent dependence diagram for the RBD in section 9.2.3 with all relevant failure paths.

[Figure: dependence diagram showing three alternative failure paths: Failure A with Failure B; Failure C; Failure D with Failure E.]

Sample Dependence Diagram

Dependence diagrams are particularly useful for analysing fault trees and checking both the logic and mathematics since they can easily be drawn on a spreadsheet. In fact, the dependence diagram represents the cut sets of a fault tree, the cut sets being the set of all ways in which the top event in the fault tree will be true.

9.2.3 Blocks vs Trees

Block diagrams and success trees (and therefore fault trees) are interchangeable mathematically. The choice between the two techniques (or the use of both) depends on the scope of the analysis and presentation needs. The advantage of block diagrams is the simplicity of high-level presentations. The advantage of fault trees is the mathematical convenience of modelling a large number of inputs using, for example, spreadsheets.

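[Figure: reliability block diagram in which Success A and Success B are in parallel, followed in series by Success C, followed in series by Success D and Success E in parallel, leading to the Outcome.]

Sample Reliability Block Diagram

This can be redrawn as a fault tree.

Failure H (Failure F, C or G)     (OR)
  Failure F (Failure A & B)       (&)
    Failure A
    Failure B
  Failure C
  Failure G (Failure D & E)       (&)
    Failure D
    Failure E

Sample Fault Tree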
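The interchangeability of the two representations can be checked numerically. The sketch below assumes independent blocks and uses made-up block reliabilities of 0.9 (not data from the text); the variable names are hypothetical.

```python
# Assumed reliability (probability of success) for each block; illustrative only.
success = {"A": 0.9, "B": 0.9, "C": 0.9, "D": 0.9, "E": 0.9}
failure = {name: 1.0 - r for name, r in success.items()}

# Block diagram view: (A parallel B) in series with C in series with (D parallel E).
r_ab = 1.0 - failure["A"] * failure["B"]
r_de = 1.0 - failure["D"] * failure["E"]
system_reliability = r_ab * success["C"] * r_de

# Fault tree view: F = A & B, G = D & E, H = F or C or G.
p_f = failure["A"] * failure["B"]
p_g = failure["D"] * failure["E"]
p_h = 1.0 - (1.0 - p_f) * (1.0 - failure["C"]) * (1.0 - p_g)

print(f"System reliability (block diagram): {system_reliability:.5f}")
print(f"Top event probability (fault tree): {p_h:.5f}")
print(f"Sum: {system_reliability + p_h:.5f}")
```

The two results are complements of each other, which is the sense in which the block diagram and the fault tree carry the same mathematics.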

9.3 Integrated Presentation Models

9.3.1 Cause-Consequence Models

Fault and event trees can be put together as shown below as a combined fault and event tree or, more elegantly, a cause-consequence diagram (Lees, 1995).

[Figure: fault trees for the Manifest Threat and the Failed Precaution combine (&) at the Loss of Control point; event trees then lead, depending on Vulnerability, to a Hit or a Miss.]

Concept 'Cause Consequence' Diagram

In a complex situation a major difficulty is usually encountered in selecting the precise point of the loss of control event in such a cause consequence diagram. In theory at least, it could be anywhere along the chain. A useful solution to this difficulty for a risk engineer is to use an energy damage model approach (Viner, 1991) and to say that the event is the point at which control of the potentially damaging energy is lost.

As emphasised in Chapter 4.4, Due Diligence, the loss of control point is very important legally. It is always better to prevent the problem, either by eliminating the threat or enhancing the precautions, than to try to recover the situation after control is lost. This has been tested with numerous lawyers by R2A on many occasions.

For example, with regard to airspace collision risk it is the point at which the two aircraft collision envelopes overlap. That is, they become so close that the pilots cannot avoid each other; they have lost control of their kinetic energy (Chapter 15.1). It does not mean that they will collide. In fact the collision envelope is large compared to the aircraft. It is just that the pilots have lost control over the outcome.

The loss of control point is not always totally obvious. For example, in an analysis for an electrical authority with high voltage transmission lines the point of loss of control of energy was when someone or something penetrated the flashover envelope of the high voltage conductor (Chapter 15.4). That is, despite having entered this region with a fishing pole on the back of a vehicle, the flashover may not occur with fatal results to the occupants. It is possible they might be insulated from the road or it may be a very dry day and the actual envelope is a little smaller than usual.

The loss of control point for fire in a tunnel appears to be that fire size which overwhelms the usual air handling system (Chapter 15.6). There are several arguments for this. The simplest, legally, probably revolves around confined spaces. The tunnels should only have sweet, decent air whenever they are occupied, even during a fire/smoke incident. Otherwise they would be considered a confined space. Emergency ventilation to prevent a situation becoming a confined space is an attempt to restore control and acts after the event.

For level crossings it is the point at which the vehicle approaching the level crossing has inadequate stopping distance. An example of a cause-consequence diagram for an inadequate stopping distance for a level crossing can be seen below. To fully describe a cause-consequence model requires three parameters: threat likelihood, precaution failure probability and the hit and miss balance (degree of vulnerability).

In terms of due diligence, the lawyers/courts always focus on the prevention side first. Trying to restore control after the event is always difficult. This actually parallels the OHS hierarchy of controls: elimination/engineering, administration and PPE (personal protective equipment). The latter can only be adopted if the other options are not viable. Viable in this sense seems to mean the common law test of negligence. That is, the balance of the significance of the risk versus the effort required to reduce it.

Cause-consequence models invariably demonstrate that control before the loss of control point is the only way to reliably prevent large scale multiple life loss scenarios when large energies and many people are involved. In practice, in ensuring no loss of control, at least three assessment levels of precautions need to be considered:
i) Not less safe - comparison with the current situation.
ii) Best practice - what other organisations and comparable industries do to manage similar threats.
iii) As low as reasonably practicable - the balance of the significance of an additional precaution of defined safety integrity level versus its cost (a legally difficult process).

Causes (fault tree)
  Stopping distance inadequate (Loss of Control)    2.10E-05    (or)
    Failure to apply brakes                         1.10E-05    (or)
      Failure to detect train                       1.00E-06    (and)
        Train not heard                             1.00E-01
        Train not seen                              1.00E-02
        Crossing lights not seen                    1.00E-03
      Car driver dysfunctional                      1.00E-05
    Road/braking system fails                       1.00E-05

Consequences (event tree)
  Collision?     No  (0.9)     Near miss            1.89E-05
                 Yes (0.1)     Hit                  2.10E-06
  Severe?        No  (0.1)     Injury/damage        2.10E-07
                 Yes (0.9)     Vehicle deaths?      1.89E-06
  Extension?     No  (0.99)    Vehicle deaths       1.87E-06
                 Yes (0.01)    Train deaths         1.89E-08
  Check sum                                         2.10E-05

Conditional Cause-Consequence diagram for an inadequate stopping distance for a level crossing

[Figure: Advance crossing warning failure, Train detection failure, Driver fails to actuate brakes and Stopping system fails lead to LOC (stopping distance inadequate), followed by Scrunch, Deaths/injury/damage and Coroner's inquiry.]

Cause Consequence Diagram of a Level Crossing

One of the primary advantages of cause-consequence models is that they can readily be prepared on spreadsheets with the border tool drawing the lines. (It is necessary to include four cells for a particular item so that the line can come from the centre). Spreadsheets have become ubiquitous. Everyone can use them and share the model.
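The same spreadsheet logic can be written as a short script. This is a sketch under stated assumptions: the gate structure and branch labels are inferred from the numbers in the level-crossing diagram, "and" gates are taken as products of independent probabilities, and "or" gates as sums (rare-event approximation). All variable names are hypothetical.

```python
from math import isclose

# Cause side (fault tree).
train_not_heard = 1.0e-1
train_not_seen = 1.0e-2
crossing_lights_not_seen = 1.0e-3
car_driver_dysfunctional = 1.0e-5
road_braking_system_fails = 1.0e-5

failure_to_detect_train = train_not_heard * train_not_seen * crossing_lights_not_seen
failure_to_apply_brakes = failure_to_detect_train + car_driver_dysfunctional
loss_of_control = failure_to_apply_brakes + road_braking_system_fails

# Consequence side (event tree); branch probabilities as read from the diagram.
p_collision, p_severe, p_extension = 0.1, 0.9, 0.01

near_miss = loss_of_control * (1.0 - p_collision)
hit = loss_of_control * p_collision
injury_damage = hit * (1.0 - p_severe)
severe = hit * p_severe
vehicle_deaths = severe * (1.0 - p_extension)
train_deaths = severe * p_extension

# The end states should add back to the loss of control value.
check_sum = near_miss + injury_damage + vehicle_deaths + train_deaths

print(f"Loss of control: {loss_of_control:.2e}")
print(f"Near miss:       {near_miss:.2e}")
print(f"Injury/damage:   {injury_damage:.2e}")
print(f"Vehicle deaths:  {vehicle_deaths:.2e}")
print(f"Train deaths:    {train_deaths:.2e}")
print(f"Check sum:       {check_sum:.2e}")
```

Carrying a check sum, as the diagram does, is a quick way to confirm that no branch of the event tree has been dropped.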

9.3.2 Threat Barrier Diagrams

From the authors' perspective, threat barrier diagrams are another representation of cause-consequence models, as drawn on a drafting package. They can be particularly useful in showing barriers that have effects on multiple threats such as that shown for the tunnel case study (Chapter 15.6) below.

[Figure: threats: Fire in Heavy Commercial Vehicle; Fire in Car; DG Fire. Loss of Control: fire in vehicle in stalled traffic greater than 5 MW. Barriers: Traffic Congestion Control; Prohibited vehicle enforcement; Manual Fire Control; Auto Deluge System; Emergency Ventilation; Emergency Evacuation. Outcome: Deaths, injury and damage.]

Sample Threat Barrier Diagram for Fire in a Road Tunnel

9.3.3 Venn (Swiss Cheese) Diagrams

Venn diagram models are graphical representations of AND and OR gates. These are expanded in more detail in Chapter 12, Mathematics. James Reason's use of this model type has provided the name Swiss Cheese.

[Figure: overlapping regions labelled Traffic Density; Radar Option; Separation/Segregation; See and Avoid; with outcomes Near Miss and Mid Air Collision.]

Venn Diagram Model of the Series of Failures Required for a Mid-Air Collision

9.4 Common Mode and Cause Failures

The validity of any of these models rests on the independence of the inputs and failure mechanisms. Each must be completely independent of all the others. If a single outside process can affect two inputs simultaneously then the model is compromised by what is termed a common mode or cause failure. Smith D (1993) makes a distinction between the two, which can be important, especially with diverse redundant systems. Common mode usually refers to a fire or power outage that can simultaneously damage both systems. Common cause refers to matters like a misspecification for software. The hardware may be diverse and the software written by different contractors using alternate software. But the built-in error will be reliably repeated by both systems, a common cause failure.
[Figure: Inputs feed both an Accounting System and an Auditing System, which produce the Outputs. Common Cause Failures act through the inputs; Common Mode Failures act on both systems.]

A Redundant System

A Common Cause Failure is when both the systems fail because of a flawed input that each of the diverse systems processes incorrectly. A Common Mode Failure occurs because of a simultaneous failure of both systems due to an external agency, for example, a fire or corruption.

9.5 Human Error Rates

Key references in the field of human reliability assessment (HRA) include the seminal US Nuclear Reactor Safety Study (1975), Lees (1995) and Swain (1983). Numerous techniques including HEART (Human Error Assessment and Reduction Technique) and THERP (Technique for Human Error Rate Prediction) are described by Villemeur (1992) and Kirwan (1994), and recent publications by Leveson (1995), Storey (1996) and Redmill (1997) also draw attention to the subject.

The following figures for the failure rate of humans performing different tasks stem from the 1975 US Nuclear Reactor Safety Study. There are differences between errors of commission and errors of omission but the figures below have proven remarkably robust for work undertaken by R2A. This includes air and sea pilots, car and train drivers and industrial situations generally.

Type of Activity                                            Probability of Error per Task
Critical Routine Task (tank isolation)                      0.001
Non-Critical Routine Task (misreading temperature data)     0.003
Non Routine Operations (start up, maintenance)              0.01
Check List Inspection                                       0.1
Walk Around Inspection                                      0.5
High Stress Operations; Responding after major accident
  - first five minutes                                      1
  - after five minutes                                      0.9
  - after thirty minutes                                    0.1
  - after several hours                                     0.01

Human Error Rates
(Source: US Atomic Energy Commission Reactor Safety Study, 1975)

Smith D (1993) summarises various sources. The following is an extract from this reference.

Type of Activity                                            Probability of Error per Task
Simplest Possible Task
  Overfill bath                                             0.00001
  Fail to isolate supply (electrical work)                  0.0001
  Fail to notice major cross roads                          0.0005
Routine Simple Task
  Read checklist or digital display wrongly                 0.001
  Set switch (multiposition) wrongly                        0.001
Routine Task with Care Needed
  Fail to reset valve after some related task               0.01
  Dial 10 digits wrongly                                    0.06
Complicated Non-routine Task
  Fail to recognise incorrect status in roving inspection   0.1
  Fail to notice wrong position on valves                   0.5

Human Error Rates
(Source: Smith DJ 1993)

A coarse summary has it that human errors in trained tasks occur typically at the rate of 1 in 100 per demand, checklist errors are notorious (1 in 10) and even critical tasks can evince error rates of 1 in 1000. For example, recent Watchdog monitoring of several thousand train orders found a handful of mistakes, not in themselves critical, but suggesting a human error probability of 2 in 1000. Based on successful testing of some 529 combinations of the software interlocking rules, according to Annex L of IEC 61508, at 95% confidence, the failure probability per demand is no more than about 3/529, or roughly 5.7 in 1000.
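The 3/529 figure has the form of the widely used "rule of three": after n failure-free demands, the 95% upper confidence bound on the failure probability per demand is approximately 3/n. The sketch below compares that approximation with the exact binomial bound. Treating the standard's calculation as equivalent to this rule is an assumption made here for illustration, and the function names are hypothetical.

```python
def rule_of_three(n_successes):
    """Approximate 95% upper bound on failure probability after n failure-free trials."""
    return 3.0 / n_successes


def exact_upper_bound(n_successes, confidence=0.95):
    """Exact bound: the largest p for which n successes in a row is still
    plausible, i.e. (1 - p)**n = 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_successes)


n_tests = 529
approx_bound = rule_of_three(n_tests)
exact_bound = exact_upper_bound(n_tests)

print(f"Rule of three: {approx_bound * 1000:.2f} in 1000")
print(f"Exact bound:   {exact_bound * 1000:.2f} in 1000")
```

The two bounds differ only in the second decimal place here, so the simple 3/n form is adequate for this kind of coarse argument.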

9.6 Equipment Fault (Breakdown Failure) Rates

The following table provides a list of typical breakdown failure rates for mechanical parts from work done by the authors. It is emphasised that the data can vary according to operating environments, system interactions and maintenance regimes.

Item          MTBF, Mean Time Between Failures (Hrs)    F/Million Hrs    Life (yrs)
Motor         100,000                                   10               11.42
Gearbox       100,000                                   10               11.42
Clutch        100,000                                   10               11.42
Bearings      250,000                                   4                28.54
Belts         125,000                                   8                14.27
Tensioners    100,000                                   10               11.42

Typical Component Breakdown Failure Rates

Smith D (1993) summarises various sources of failure rates. The following is an extract from this reference. He provides up to three figures. If there is only one figure it means his sources are in good agreement. Two or three numbers means a scatter.

                                 Failure Rates per million hours
Item                             Lower      Most       Upper
Alarm Siren                      1          6          20
Alternator                       1          -          9
Computer-PLC                     20         -          50
Detectors-smoke-ionisation       2          -          6
Motor-electrical-ac              1          5          20
Transformers->415V               0.4        1          7
VDU                              10         200        500

General Breakdown Failure Rates
(Source: Smith DJ, 1993)
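The three columns of the component breakdown table earlier in this section are different expressions of the same number. The sketch below assumes a constant failure rate and continuous operation (8760 hours per year), which is what the tabulated figures imply; the function names are hypothetical.

```python
HOURS_PER_YEAR = 8760


def failures_per_million_hours(mtbf_hours):
    """Constant failure rate expressed per million operating hours."""
    return 1_000_000 / mtbf_hours


def mtbf_in_years(mtbf_hours):
    """MTBF expressed in years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


mtbf_table = {"Motor": 100_000, "Bearings": 250_000, "Belts": 125_000}

converted = {item: (failures_per_million_hours(hours), round(mtbf_in_years(hours), 2))
             for item, hours in mtbf_table.items()}

for item, (rate, years) in converted.items():
    print(f"{item}: {rate:.0f} failures per million hours, {years} years")
```

Equipment that runs for only part of the year would reach the same number of operating hours over a proportionally longer calendar life.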

9.7 Generic Failure Rates

Generic failure rates are useful for various forms of preliminary analysis. For example:

Item                    Failure Rate
People                  10^-2 per operation
Mechanical systems      10^-3 per operation
Electrical systems      10^-4 per operation

Generic Failure Rates

9.8 System Safety Assurance

System safety assurance is a large domain and the subject of separate R2A writings and courses. Nevertheless, certain elements are presented for introductory purposes. Much of the modelling described above is used for functional safety assessment pursuant to IEC61508:1998 (aka AS61508:2000).

9.8.1 Nines

The table below summarises the different terminology sometimes used to describe availability.

Downtime                          Availability
up to 30 secs downtime pa         99.999905% availability pa    or 6 nines
up to 1 min downtime pa           99.999810% availability pa
up to 5 mins downtime pa          99.999049% availability pa    or 5 nines
up to 10 mins downtime pa         99.998097% availability pa
up to 30 mins downtime pa         99.994292% availability pa
up to 45 mins downtime pa         99.991438% availability pa    or 4 nines
up to 1 hr downtime pa            99.988584% availability pa
up to 2 hrs downtime pa           99.977169% availability pa    or 3 nines
up to 10 hrs downtime pa          99.885845% availability pa

Summary of Availability Numbers

9.8.2 SIL (Safety Integrity Level)

SIL is a measure of the probability that the safety related system will fail dangerous. The value of SIL ranges from 1 (the lowest) to 4 (the highest). The table below is adapted from IEC 61508-1:7.6.2.9, via Factory Mutual.

Safety       Low demand mode of operation        High demand or continuous mode
integrity    (average probability of failure     of operation (probability of a
level        to perform its designed function    dangerous failure per hour)
             on demand)
4            >= 10^-5 to < 10^-4                 >= 10^-9 to < 10^-8
3            >= 10^-4 to < 10^-3                 >= 10^-8 to < 10^-7
2            >= 10^-3 to < 10^-2                 >= 10^-7 to < 10^-6
1            >= 10^-2 to < 10^-1                 >= 10^-6 to < 10^-5

Table of SIL Values

9.8.3 COTS & SOUP

High reliability is most simply and economically achieved by parallel low reliability systems. A very simple example is shown in the figure below.
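The arithmetic behind this claim can be sketched as follows. The sketch assumes that the parallel units fail independently of each other, which is the assumption that common mode failures undermine; the function name is illustrative only.

```python
from math import prod


def parallel_reliability(unit_reliabilities):
    """Reliability of active redundant units, assuming independent failures.

    The system fails only if every unit fails, so the system unreliability
    is the product of the unit unreliabilities.
    """
    return 1.0 - prod(1.0 - r for r in unit_reliabilities)


# Two 99% reliable units in parallel.
print(f"{parallel_reliability([0.99, 0.99]):.4%}")  # 99.9900%
```

Each additional independent 99% unit multiplies the residual unreliability by a further 0.01.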

[Figure: two units, X and Y, each 99% reliable, arranged in parallel to give 99.99% overall.]

Parallel Active Redundant Systems

As a result, the commercial and military industrial approaches are no longer distinct. For years the military has had its advocates for the use of commercial off-the-shelf (COTS) equipment, non-developmental items (NDI), and software of unknown pedigree (SOUP), but now military use of commercial designs is required. For example, in June 1994, a memorandum from the US Secretary of Defence (William Perry) officially changed the way the military develops and acquires systems. Military standards and specifications are out (except with a waiver) and commercial practices are in.

REFERENCES

British Standards Institution (1994). Reliability of Systems, Equipment and Components, Part 2: Guide to the Assessment of Reliability (BS 5760: Part 2).
International Electrotechnical Commission (1998). Functional Safety of Electronic/Programmable Electronic Safety Related Systems. Also known as AS61508:2000.
Kirwan Barry (1994). A Guide to Practical Human Reliability Assessment. Taylor & Francis, London.
Lees F P (1995). Loss Prevention in the Process Industries. 2nd Edition. Butterworth-Heinemann Ltd, Oxford, UK (3 Volumes).
Leveson Nancy G (1995). Safeware - System Safety and Computers. Addison-Wesley.
Perry William, as quoted by Preston R. MacDiarmid and John J. Bart in Reliability Toolkit: Commercial Practices Edition. Reliability Analysis Center and Rome Laboratory, NY.
Reason J (1990). Human Error. Cambridge University Press.
Redmill Felix and Jane Rajan (editors, 1997). Human Factors in Safety-Critical Systems. Butterworth Heinemann.
Smith David J (1993). Reliability, Maintainability and Risk. Practical Methods for Engineers. Fourth Edition. Butterworth Heinemann, Oxford.
Society of Automotive Engineers (1995). Guidelines and Methods for Conducting the Safety Assessment Process on Civil Airborne Systems and Equipment (SAE ARP 4761).
Storey Neil (1996). Safety-Critical Computer Systems. Addison-Wesley.
Swain Alan D and Bell Barbara Jean (1983). A Procedure for Conducting a Human Reliability Analysis for Nuclear Power Plants.
US Atomic Energy Commission (1975). Reactor Safety Study.
Villemeur Alain (1992). Reliability, Maintainability and Safety Assessment. John Wiley & Sons.
Viner Derek (1991). Accident Analysis and Risk Control. VRJ Delphi.

READING

Department of Defence (USA) (1984). Electronic Reliability Design Handbook (MIL-HDBK-338-1A), Washington DC.
Department of Defence (USA). Reliability Prediction of Electronic Equipment (MIL-HDBK-217), Washington DC.
Department of Defence (USA) (1986). Reliability Centred Maintenance Requirements of Naval Aircraft, Weapons Systems and Support Equipment (MIL-STD-2173AS), Washington DC.
Factory Mutual Research Approval Guide (2001). Chapter 4, Functional Safety of Safety Related Systems and Components.
Moubray, John (1992). RCM II Reliability Centred Maintenance. Butterworth Heinemann.
Participating OREDA Companies. Off-shore Reliability Handbook (OREDA). Hovik, Norway: DNV Technica.
Standards Australia/Standards New Zealand (1998). Risk Analysis of Technological Systems Applications Guide. Australian/New Zealand Standard AS/NZS 3931:1998.

10. Bottom Up Techniques

Generically, bottom up techniques examine how an element can fail and then assess the impact of this on the system as a whole. Different bottom up techniques divide the system under consideration differently and may consider different failure types depending on the purpose of the analysis. The most common approach is to gather relevant experts in a room and use a process to obtain group consensus as to the seriousness of a problem and what should be done about it. The general layout for such an assessment is sketched below.

[Figure: room layout showing the whiteboard, facilitator, analysts, technical secretary, laptop, computer projector and overhead screen.]

Typical Analysis Facility Layout

The analysts are usually the designers and the (proposed) operators or maintainers, that is, those who have to live on a day-to-day basis with the plant or process. The facilitator and secretary are usually external to both these groups, often outside consultants. This is to minimise potential bias.

The facility, process or contract is then examined in a structured manner, one piece at a time. Problems identified by the group are discussed and consensus achieved as to the significance and the best solution. Action is documented on the spot by the technical secretary, with all those present signing off on it at that time.

10.1 FMEA, FMECA and RCM

10.1.1 FMEA and FMECA

Fault (failure) modes and effects analysis (FMEA) and fault modes, effects and criticality analysis (FMECA) are similar in nature, except that in FMECA the criticality of a failure mode is used as a ranking tool for each failure mode. The process is divided into four key parts as shown below.

[Figure: four sequential steps: System Description & Block Diagram; Fault Modes; Effects (and Criticality); Conclusion & Recommendation.]

Fault Modes, Effects and Criticality Approach

The detail of the analysis depends on the level to which the system is reduced in the System Description and Block Diagram. If the plant is considered as several large subsystems then the results will be quite coarse. However, if the System Description is done to an individual component level, extraordinarily detailed analysis will ensue. Typically the systems breakdown for most reliability analysis is to four levels as shown below:

[Figure: four-level hierarchy: System; Sub Systems; Assemblies; Components (Parts).]

Typical System Breakdown

Several authorities provide lists of failure modes to be considered for each component or subsystem. For example, MIL-STD-1629A (US Military Standard, pages 101-105) lists:

* premature operation
* failure to operate at a prescribed time
* intermittent operation
* failure to cease operation at the prescribed time
* loss of output or failure during operation
* degraded output or operational capability
* other unique failure condition based on system characteristics and operational requirements or constraints

A more typical list is:

* Delayed operation
* Erratic operation
* Erroneous indication
* Erroneous input
* Erroneous output
* External leakage
* Fails closed
* Fails open
* Fails to close
* Fails to open
* Fails to start
* Fails to stop
* Fails to switch
* False actuation
* Inadvertent operation
* Intermittent operation
* Internal leakage
* Leakage (electrical)
* Loss of input
* Loss of output
* Open circuit
* Out of tolerance (high)
* Out of tolerance (low)
* Physical binding or jamming
* Premature operation
* Restricted flow
* Short circuit
* Structural failure
* Vibration

Generic Fault Modes for FMEA and FMECA

The failure effect of each fault mode of each component or sub-system is then considered, especially if the effect will be concealed or hidden from the operators. This is common with redundant systems, where the loss of one unit could remain undetected until the second fails. It is of particular concern with protective devices that do not fail safe. In terms of establishing criticality, the effects are usually considered as being in four categories, whose priorities are in the listed order:

* safety (fault mode with possible death or injury effects)
* environmental (fault mode with unacceptable environmental effects)
* service (fault mode with operational effects such as production interruptions, product quality variations, customer service implications)
* economic (fault mode with increased costs only)

By considering each component or sub-assembly and how it might achieve the fault mode described, and the consequences of such fault, a detailed understanding of the system can be achieved.
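The category ordering described above lends itself to a simple ranking of fault modes. The following is a minimal sketch only: the components and fault modes are hypothetical, and placing concealed faults ahead of evident ones within a category is an illustrative assumption rather than a rule stated in the text.

```python
from dataclasses import dataclass

# Priority order of effect categories as listed in the text (0 = highest).
CATEGORY_PRIORITY = {"safety": 0, "environmental": 1, "service": 2, "economic": 3}


@dataclass
class FaultMode:
    component: str
    mode: str
    category: str    # one of the keys of CATEGORY_PRIORITY
    concealed: bool  # True if the effect is hidden from the operators


def rank(fault_modes):
    """Sort by category priority; concealed faults first within a category
    (the tie-break is an assumption made for this sketch)."""
    return sorted(
        fault_modes,
        key=lambda f: (CATEGORY_PRIORITY[f.category], not f.concealed),
    )


# Hypothetical entries for illustration only.
modes = [
    FaultMode("Duty pump", "Fails to start", "service", False),
    FaultMode("Relief valve", "Fails to open", "safety", True),
    FaultMode("Storage tank", "External leakage", "environmental", False),
    FaultMode("Standby pump", "Fails to start", "service", True),
]

for f in rank(modes):
    flag = "concealed" if f.concealed else "evident"
    print(f"{f.category:<14}{f.component:<14}{f.mode:<18}{flag}")
```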

A summary of the sort of results obtainable from such a study is shown in the table below.

Component (item, functional group) | Fault (failure) mode | Possible causes | Effects on system (and criticality if desired)
Push-Button (PB) | The PB is stuck | Primary (mechanical) fault | Loss of system function: the motor does not operate
Push-Button (PB) | The PB contact remains stuck | Primary (mechanical) fault; the operator fails to release the PB (human error) | The motor operates too long: hence a motor short circuit, which leads to a high electric current and to a melting of the fuse
Relay | The relay contact remains open | Primary (mechanical) fault | Loss of system function: the motor does not operate
Relay | The relay contact remains stuck | A high current passes through the contact | The motor operates too long: hence a motor short circuit, which leads to a high electric current and a melting of the fuse
Fuse | The fuse does not melt | The operator overrated the fuse (human error) | In the case of a short circuit, the fuse will not open the circuit

FMEA Table of Results

FMEA and FMECA are normally bottom up processes that look at how component parts can affect the larger systems as defined in the system description and block diagram. They can therefore be particularly detailed and are normally applied to very high valued systems where failure (breakdown) causes major difficulties, such as aircraft and military combat equipment.

10.2 RCM

The purpose of Reliability Centred Maintenance (RCM) is to establish the nature and frequency of maintenance tasks to ensure a target (optimum) level of reliability at best cost. It evolved in the private airline industry primarily through the activities of the Maintenance Steering Group of the International Air Transport Association. The final report of the Maintenance Steering Group in 1980, titled MSG-3, provided the backbone of the logic processes contained in the referenced texts and RCM analysis (Moubray 1992). The RCM process asks seven basic questions:

i) which assets (significant items) are to be subject to the analysis process.
ii) what are the functions and associated performance criteria (accept/reject boundaries) of each asset in its operating context.
iii) in what manner does it cease to fulfil its listed functions (fault mode).
iv) what failure mechanism causes each loss of function (failure cause or fault).
v) what is the outcome and impact (criticality) of each fault (effect).
vi) what maintenance tasks can be applied to prevent each fault (preventive maintenance).
vii) what action should be taken if effective tasks cannot be identified.

The main point of the RCM analysis is to select which maintenance regime is most appropriate.

Until the mid 1970s items were seen as exhibiting a standard fault profile consisting of three separate characteristics:

* An infant mortality period due to quality of product faults.
* A useful life period with only random stress related faults.
* A wear out period due to increasingly rapid conditional deterioration resulting from use or environmental degradation.

This is shown in the figure below. The consequence of such beliefs was that equipment was taken out of service and maintained at particular intervals, whether it was exhibiting signs of wear or not.

[Figure: failure rate against time, showing the infant mortality, useful life and wear out periods.]

Bathtub Fault Rate

However, actuarial studies of aircraft equipment fault data conducted in the early 1970s identified a more complex relationship between age and the probability of fault (Moubray 1992).

[Figure: six fault rate patterns against age, with the proportion of items exhibiting each.]

Pattern                                        Proportion
Wear-in to Random to Wear Out                  4%
Random then Wear Out                           2%
Steadily Increasing                            5%
Increasing during Wear-in and then Random      7%
Random over measurable life                    14%
Wear-in then Random                            68%

The last three patterns together account for 89%.

Fault Rate Curve

Specifically, the bathtub curve was discovered to be one of the least common fault profiles, and periodic maintenance was found to increase the likelihood of fault. This led to the idea that the maintenance regime ought to be based on the reliability of the components and the required level of availability of the system as a whole.
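As a quick check on the figures quoted in the fault rate figure, the six proportions can be totalled. This is a small sketch using only the percentages shown; grouping the last three patterns to obtain the 89% figure is an inference from the arithmetic.

```python
# Proportion of items showing each fault pattern, as quoted in the figure.
patterns = {
    "Wear-in to Random to Wear Out": 4,
    "Random then Wear Out": 2,
    "Steadily Increasing": 5,
    "Increasing during Wear-in and then Random": 7,
    "Random over measurable life": 14,
    "Wear-in then Random": 68,
}

shares = list(patterns.values())
total = sum(shares)
ending_in_random = sum(shares[3:])  # the last three patterns listed

print(f"All patterns: {total}%")
print(f"Patterns that settle to a random fault rate: {ending_in_random}%")
```

On these figures the pattern with all three bathtub phases applies to only 4% of items.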

The figure below indicates the overall process:

[Figure: flow chart with the following steps: collect system information; present data; select component/assembly/sub-system; identify function; identify failure modes and effects; assess criticality (concealed or evident; safety, environmental, service or economic); decide whether redesign is required (a Yes branch is shown); maintenance plan.]

RCM Analysis Flow Chart

Note that a concealed fault mode is of major significance when assessing criticality. As can be seen, the process is really a FMECA with a focus on maintenance outcomes.

10.3 HazOps

The Hazard and Operability (HazOp) Study technique was originally pioneered in the chemical industry (Tweeddale 1992). It has since been adapted into a wide range of industries. The essential features of a HazOp study are:

* It is systematic and detailed. A series of guidewords is repeatedly used to ensure consistency and repeatability.
* It is conducted by a team of those who know most about the project or facility, typically those who designed it and those who must operate it.
* It concentrates on exploring the consequences of deviations from the usual operating conditions.
* It is an audit of the completed part of a design.

Traditionally the HazOp procedure examines process equipment on a system-by-system basis, reviewing the process parameters using a checklist of guidewords, which suggest deviations from the normal operating conditions. The consequences of a deviation are assessed, as are the circumstances that might bring it about. If the deviation is deemed to be of concern then it is addressed by the workshop on the spot and a solution proposed for action. The technique seems to work because the key parties to the process are present: the designers and operators, the builders and maintainers or the contractor and contractee.

The guidewords are tailored to suit the particular industry. They can be defined using the following conceptual deviations (Tweeddale 1992):

* too much of ... (speed, load, level, elapsed time, distance, vibration etc).
* not enough of ... (speed, load, level, elapsed time, distance, vibration etc).
* none of ... (speed, load, level, elapsed time, distance, vibration etc).
* part of ... (wrong composition, wrong component).
* opposite of ... (reverse direction).
* wrong timing of ... (starting or stopping too early or too late, wrong sequence).
* wrong direction of ... (to left or right, wrong setting of points etc).
* wrong location of ... (too high or low, too far or too short).
* poor performance of ... (normal duty, testing etc).
* other than ... (whatever else can happen apart from normal operation, such as start-up, shut down, uprating, low rate operation, alternative mode of operation, maintenance etc).
[Figure: flow chart, summarised in the steps below.]

1. Select a line.
2. Select a deviation, e.g. more flow.
3. Is more flow possible? If no, move on to the next deviation.
4. Is it hazardous or does it prevent efficient operation? If no, consider other causes of more flow.
5. Will the operator know there is more flow? If no, consider and specify mechanisms for identification of the deviation.
6. What change in plant or methods will prevent the deviation or make it less likely or protect against consequences?
7. Is the change likely to be cost effective? If no, consider other changes or agree to accept the hazard.
8. Agree change(s) and who is responsible for action.
9. Follow up to see action has been taken.

HazOp Flow Process
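The flow process above can be sketched as a small decision function. This is one reading of the flow chart, offered as an illustration only: the branch order is an assumption where the chart is ambiguous, and the class and function names are hypothetical. In a real workshop the answers come from the group consensus, not from code.

```python
from dataclasses import dataclass


@dataclass
class WorkshopAnswers:
    """Consensus answers for one deviation (field names are hypothetical)."""
    possible: bool
    hazardous_or_inefficient: bool
    operator_would_know: bool
    change_cost_effective: bool


def assess_deviation(deviation, answers):
    """Walk one deviation through the flow and return the actions to record."""
    if not answers.possible:
        return ["move on to next deviation"]
    if not answers.hazardous_or_inefficient:
        return [f"consider other causes of {deviation}"]
    actions = []
    if not answers.operator_would_know:
        actions.append("specify mechanisms for identification of deviation")
    actions.append("identify change in plant or methods")
    if answers.change_cost_effective:
        actions.append("agree change(s) and who is responsible for action")
        actions.append("follow up to see action has been taken")
    else:
        actions.append("consider other changes or agree to accept hazard")
    return actions


# Example: a hazardous deviation the operator would not notice.
for step in assess_deviation("more flow", WorkshopAnswers(True, True, False, True)):
    print("-", step)
```

Keeping the answers in a record mirrors the practice of documenting each item on the spot, so that the reasoning behind each action can be reviewed later.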

10.3.1 Process Industry HazOps

The chemical process industry usually focuses on the process and instrumentation drawings (P&IDs). The typical guidewords used are:

Flow:                  leak, high, low, reverse, phase
Level:                 high, low
Temperature:           high, low
Pressure:              high, low, and vacuum
Reaction rate:         fast, slow
Quality:               concentration, impurities, cross-contamination, side reactions, inspection and testing
Physical Damage:       impact, dropping, vibration
Control, Protection:   response speed, independence, testing

After these key deviations have been applied to the P&IDs, a further list of overview guidewords can then be applied. These include: Materials of Construction (corrosion, erosion etc), Services Needed (compressed air and the like), Commissioning, Start-up, Shutdown, Breakdown, Electrical Safety, Fire & Explosion, Toxicity, Environmental Control, Access, Testing, Safety Equipment, Output or Throughput and Efficiency.

10.3.2 HazOps Applied to Contracts

Most breakdowns in a contracting out relationship arise from a lack of understanding of what elements of the relationship were truly important and susceptible to unrecognised threats. Such hazards, however, can be determined before the contract is entered into using a modified HazOp technique. Those who have watched various contracts coming unravelled will have noted the oft expressed sentiment that, "Gee, I wish we had thought of that before we got into this thing."

Obviously the HazOp technique described here may not predict all possible problems, but it has proved itself superior to one or two individuals from the contracting organisations sitting in different rooms trying to crystal ball the future and include it in the contract conditions, especially for a project that is large or unique in nature. It also has the added effect of ensuring the win-win nature of any contract, as both parties to the contract are assessing the potential difficulties and mutually agreeing on solutions. This reduces the likelihood of subsequent accusations and conspiracy theories.

Actual assessment figures can be included on a HazOp Item Data Sheet. Such data can be exported to spreadsheet reports for listing and ranking.

In flow chart terms, the process is shown below with a sample HazOp Item Data Sheet following.

[Figure: flow chart, summarised in the steps below.]

1. Select a key contract function.
2. Select a threat, e.g. key contractor staff absence.
3. Can it occur? If no, move on to the next deviation.
4. Is it hazardous or does it prevent efficient operation? If no, consider other critical contract staff absence.
5. Will the company know the absence has occurred? If no, will change in contract advise of this?
6. What change in contractor methods will prevent the deviation or make it less likely or protect against consequences?
7. Is the cost of the change justified? If no, consider other changes or agree to accept the hazard.
8. Agree to change(s) and agree who is responsible for action.
9. Follow up to see action has been taken.

HazOp Procedure Applied to Contracts

R2A Hazard Item Data Sheet

Identified Problem Location
  Client:           New Company
  Project:          VIP Product Line
  Location:         3 stand press
  Drawing number:   736.67, Rev 4, 12/05/00
  Title:            Hazard Item No 23

Present (14 March 1996, 2.35pm)
  Design Engineer
  Maintenance Engineer
  Contractor
  Scribe/Secretary:       Fred Gatt, R2A
  Facilitator/Chairman:   Richard Robinson, R2A

Nature of Problem
  Guide word:         Production line maintenance
  Threat deviation:   Key maintenance contractor staff unavailability
  Possible causes:    Illness
  Consequences:       Production interruptions due to slow inexperienced maintenance staff

Preliminary Solution
  Action: Review contractor backup staff arrangements. Price back up machine. Choose between increasing contract price to have stand-by staff available or buy new parallel production equipment.
  Payback assessment as for back up machine.

Payback Assessment
  Event frequency:          0.5 pa
  Consequence severity:     $10,000
  Solution effectiveness:   99%
  Risk saving:              $4,950 pa

  Proposal cost:               $5,000
  Commercial return:           $1,000 pa
  Maintenance saving:          ($1,000) pa
  PR/Morale benefit:           $500 pa
  Risk saving:                 $4,950 pa
  Total:                       $5,450 pa
  Investment payback period:   0.92 yrs

Sign Off
  Responsible person:   Design Engineer
  Follow up action:     Price B/U machine, request contractor price to guarantee staff availability
  Date:                 15 March 1996
  By:                   Maintenance Engineer
  Status:
  Comments:             Further work required

HazOp Item Data Sheet
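The payback assessment on the sample data sheet can be reproduced with a few lines of arithmetic. This is a minimal sketch using the figures from the sheet; the function names are illustrative, and treating the bracketed maintenance saving as a negative amount is an assumption based on the usual accounting convention. Frequency multiplied by severity and effectiveness gives a risk saving of $4,950 pa, which is the value consistent with the sheet's $5,450 pa total.

```python
def risk_saving(event_frequency_pa, consequence_severity, solution_effectiveness):
    """Expected annual loss avoided by the proposed solution."""
    return event_frequency_pa * consequence_severity * solution_effectiveness


def payback_years(proposal_cost, annual_benefits):
    """Simple (undiscounted) payback period in years."""
    return proposal_cost / sum(annual_benefits)


saving = risk_saving(0.5, 10_000, 0.99)

# Commercial return, maintenance saving (shown in brackets on the sheet,
# taken here as negative), PR/morale benefit and risk saving, all per annum.
benefits = [1_000, -1_000, 500, saving]

print(f"Risk saving: ${saving:,.0f} pa")
print(f"Total:       ${sum(benefits):,.0f} pa")
print(f"Payback:     {payback_years(5_000, benefits):.2f} yrs")
```

Such a calculation makes it straightforward to rank hazard items by payback once the data sheets have been exported to a spreadsheet or similar report.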

10.4 Common Mode Failures

Bottom up techniques have difficulties with common cause and common mode failures. This arises because the process is bottom up rather than top down. A detailed assessment from individual components or subsystems, such as HazOp or FMECA, examines how that component or sub-system can fail under normal operating conditions. It does not examine how a catastrophic failure elsewhere might affect this component or the others around it. HazOps attempt to address such knock-on effects through a series of general questions after the detailed review is completed, but it nevertheless remains difficult to use a HazOp to determine credible worst-case scenarios.

Examining systems designed to deal with common mode failures with RCM techniques is difficult too. An automatic sprinkler system, for example, will only be called upon to operate quite rarely, perhaps once in a hundred years. But when it is required, a massive common mode failure for all the equipment in the fire-affected area will be occurring. Sprinkler systems are therefore quite tough. An RCM analysis will suggest that such a system requires little or no maintenance to remain in an effective operating condition. Nevertheless, sprinklers are checked regularly to ensure that the fractional dead time (the time the system is out of service) becomes trivial. Sprinklers are in fact subject to latent failures such as stones in the piping or a restriction in the water supply. Unless tested, such a condition may well remain hidden until the sprinklers are called upon to act during a fire. This is obviously the worst possible time to discover the fault.

Reliability analysis is conceptually focused on minimising breakdown failures, the 5% section shown in the diagram below. That is, what should be done to plant and equipment to ensure optimal availability and service at best cost. Risk analysis is targeted at minimising damage, injury and death and consequential problems including legal implications, that is, the 0.0001% (10^-4) region in the diagram. Applying reliability analysis to failure (risk) problems can be a difficult concept since the intellectual focus of the group is different. In a sense, this is why reliability people are optimists and risk people pessimists.

[Figure: 95% existing availability; 5% RCM reliability focus; 0.00001% (10E-5) risk focus.]

Reliability vs Risk

For example, a critical facility was recently built with two power grid connections, a gas turbine generator and several diesel generators, any one of which was capable of running the entire plant. Power supply reliability was very high from a breakdown failure perspective, as the reliability designer intended. However, all this gear was put in a single machine hall and was thus subject to a single fire event. This provided for a common mode risk failure. If a risk engineer had been involved in the design process, the different power supply devices would have been fire isolated from each other so that a fire in one or a gas explosion in the hall could not expose the others and knock out all power supplies.

In the context of outsourcing, lawyers represent a most interesting form of common mode failure. The diagram below shows two arrangements. The first represents the lawyers acting as advocates, whilst in the second the two parties are communicating directly and the lawyers are documenters. From observation of the difficulties associated with a number of outsourcing contracts, it appears difficult for Party 1 and Party 2 to have a clear and complete understanding of each other's position when lawyers act as advocates, in effect passing pieces of paper under the door to each other. The second diagram indicates the approach that seems to be much more effective.

[Figure: Party 1, Lawyer 1, Lawyer 2 and Party 2 in a chain, with the parties communicating through the lawyers.]

Lawyers acting as advocates

[Figure: Party 1 and Party 2 communicating directly, with Lawyer 1 and Lawyer 2 alongside.]

Lawyers acting as documenters

10.5 Risk Management and the Project Life Cycle

The role and way in which risk management is considered in a project life cycle varies depending on the stage it is at. This can be represented by the figure below.

[Figure: project life cycle stages (pre-planning; functional definition/specification; contract management; roll-out, transition or project management; commissioning; operation and maintenance), with top down analysis (vulnerability assessments) and bottom up analysis (HazOps, FMECAs, QRA, JSA etc) marked against them.]

Risk Techniques in Project Management

A pre-planning approach uses top down analysis such as vulnerability assessments to identify possible risks facing a project and/or the organisation in general. Vulnerabilities identified (assets coinciding with a threat) are documented and addressed appropriately with a risk reduction solution in mind. This process can be conducted before a project is commenced as a form of completeness check.

Once the project has been commissioned, risk management forms part of the project management process. Bottom up analysis techniques such as Quantified Risk Assessment (QRA), Job Safety Assessment (JSA) and HazOp studies can be used to identify specific project risks. It is here that engineering, procurement and construction solutions can be implemented.

Risk management processes should be ongoing to be effective. Once the project is completed, risk management is incorporated into the project's operation and maintenance procedures. Periodic assessments of the project need to be conducted to keep the risk management status current and up-to-date. This can be done using either top down or bottom up methods or a combination of the two.

10.6 Hazard and Critical Control Point (HACCP) Analysis

HACCP is a systematic, organised approach to identifying, evaluating and controlling safety hazards in a food process. It is used to develop and maintain a system which minimises the risk of contaminants. It was apparently developed by NASA in the 1960s to help prevent food poisoning in astronauts. In many ways it appears as a top down vulnerability technique applied at a very low level, in the sense that it identifies who is to be protected and from what. It then goes on to establish how.

A critical control point is defined as any point or procedure in a specific food system where loss of control may result in an unacceptable health risk, whereas a control point is a point where loss of control may result in failure to meet (non-critical) quality specifications.

HACCP can be used as both a corrective and a preventative risk management option. Risks are identified and a management option is selected and implemented to control the risk. However, the aim is to prevent hazards at the earliest possible point in the food chain. HACCP involves the identification of acceptable risk standards appropriate to different types of food hazards and the procedures to ensure that the risks are kept within the limits set by those standards.

Food safety risk can be divided into the following three categories:

Microbiological Risks
* Escherichia coli
* Salmonella
* Listeria monocytogenes
* Staphylococcus
* Clostridium botulinum

Chemical Risks
* Pesticide and herbicide residues
* Cleaning chemicals
* Heavy metal residues
* Allergens

Physical Risks
* Glass
* Plastic
* Metal
* Wood etc

There are seven principles to the HACCP technique:

1. Identify hazards
2. Determine the critical control points
3. Determine the critical limits for each control point
4. Monitor the critical limits
5. Identify corrective action procedures (corrective action requests or CARs)
6. Establish records and control sheets
7. Verify the HACCP plan
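Principles 3 to 6 above (critical limits, monitoring, corrective action and records) can be sketched as a simple monitoring check. This is an illustration only: the control point, the temperature limits and the corrective action are hypothetical and are not taken from any standard or from the text.

```python
from dataclasses import dataclass


@dataclass
class CriticalControlPoint:
    name: str
    hazard: str
    lower_limit: float       # critical limits (principle 3)
    upper_limit: float
    corrective_action: str   # corrective action procedure (principle 5)


def monitor(ccp, reading):
    """Check one reading against the critical limits (principle 4) and
    return a record suitable for a control sheet (principle 6)."""
    within = ccp.lower_limit <= reading <= ccp.upper_limit
    return {
        "ccp": ccp.name,
        "hazard": ccp.hazard,
        "reading": reading,
        "within_limits": within,
        "action": None if within else ccp.corrective_action,
    }


# Hypothetical control point and limits, for illustration only.
cooking = CriticalControlPoint(
    name="Cooking step (core temperature, deg C)",
    hazard="Salmonella",
    lower_limit=75.0,
    upper_limit=100.0,
    corrective_action="Raise CAR: hold batch and re-cook",
)

records = [monitor(cooking, reading) for reading in (82.0, 71.5)]
for record in records:
    print(record)
```

Keeping each check as a record supports principle 7, since verification of the HACCP plan relies on the monitoring history being available.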

REFERENCES

Department of Defence (USA). A Procedure for a Failure Mode, Effects and Criticality Analysis (MIL-STD-1629A), Washington DC.
Moubray, John (1992). RCM II Reliability Centred Maintenance. Butterworth Heinemann.
Tweeddale Mark (2003). Managing Risk and Reliability of Process Plants. Gulf Professional Publishing, which is an imprint of Elsevier Science (USA).

READING

British Standards Institution (1994). Reliability of Systems, Equipment and Components, Part 2: Guide to the Assessment of Reliability (BS 5760: Part 2).
Blanchard B (1991). Systems Engineering Management. Wiley Interscience.
Blanchard and Fabrycky (1990). Systems Engineering and Analysis, 2nd Edition, Prentice Hall International.
Chemical Industries Association (1977). A Guide to Hazard and Operability Studies.
Department of Urban Affairs and Planning (1995). Hazardous Industry Planning Advisory Paper No. 8 Hazard and Operability Studies. HAZOP Guidelines.
Department of Defense (USA) (1984). Electronic Reliability Design Handbook (MIL-HDBK-338-1A), Washington DC.
Department of Defense (USA). Reliability Prediction of Electronic Equipment (MIL-HDBK-217), Washington DC.
Department of Defense (USA) (1986). Reliability Centred Maintenance Requirements of Naval Aircraft, Weapons Systems and Support Equipment (MIL-STD-2173AS), Washington DC.
Kletz T A (1985). An Engineer's View of Human Error. IChemE, London.
Kletz T A (1986). HAZOP & HAZAN Notes on the Identification and Assessment of Hazards. IChemE, London.
Kletz T A (1985). Cheaper, Safer Plants or Wealth and Safety at Work. IChemE, London.
Lees F P (1995). Loss Prevention in the Process Industries. 2nd Edition. Butterworth-Heinemann Ltd, Oxford, UK (3 Volumes).
Reason J (1990). Human Error. Cambridge University Press.
Smith Anthony (1993). Reliability Centred Maintenance. McGraw Hill.
Smith David J (1993). Reliability, Maintainability and Risk. Practical Methods for Engineers. Fourth Edition. Butterworth Heinemann, Oxford.
Villemeur Alain (1992). Reliability, Maintainability and Safety Assessment. John Wiley & Sons.

11. Generative Risk Techniques

Generative technique is a term adopted from James Reason's work in the risk area (Reason, 1993). In terms of the paradigm model in this text (Section 2.9) it generally refers to the selective interview column. It has much to do with morale and the willingness of people to constructively speak up and for the organisation to respond positively. In a legal sense it provides assurance after the event that no one can say, "I knew that but nobody listened."

11.1 James Reason et al

James Reason is an English psychologist who has written extensively on risk. In 1993 he suggested a 7-point rating scale for overall organizational risk control:

i) Pathological: barest minimum industry safety practices
ii) Pathological / low reactivity: one step ahead of regulators, some concern re adverse trends
iii) Worried / reactive: anxious about a run of incidents or accidents
iv) Repair / routine: sensitive to events, safety data collection/analysis but local repair only
v) Repair / some proactivity: wide range of auditing but "technocratic" remedial measures
vi) Reform / generative: aware that engineering, selection, training not enough, looking for better
vii) Truly generative: proactive measures in place, safety measures under continuous review, range of diagnostic/remedial measures being considered, not complacent or self-congratulatory, still afraid of the hazards.

Reason (1997) noted three types of risk models:

The Person Model
The Person Model is exemplified by the traditional occupational safety approach. The main emphases are upon individual unsafe acts and personal injury accidents. It is usually policed by safety departments. The most widely used countermeasures are 'fear appeal', unsafe act auditing, new procedures, training and selection.

The Engineering Model
The Engineering Model is system based and quantified where possible. Countermeasures are engineered into the system using devices such as HazOps, FMECAs etc. Measures include quantified individual risk and societal risk.

The Organisational Model
The Organisational Model is allied to crisis management. Human error is a consequence and not a cause. Countermeasures aim at an 'informed culture'. Safety may be measured as quality.

Audit systems can often be seen to favour one or more of these models.

Risk & Reliability Associates Pty Ltd


Reason also notes three types of culture, each having particular characteristics:

Pathological Culture
- Don't want to know
- Messengers are 'shot' on arrival
- Responsibility is shirked
- Failure is punished or concealed
- New ideas actively discouraged

Bureaucratic Culture
- May not find out
- Messengers are listened to if they arrive
- Responsibility is compartmentalised
- Failures lead to local repairs
- New ideas often present new problems

Generative Culture
- Actively seek it
- Messengers are trained and rewarded
- Responsibility is shared
- Failures lead to far reaching reforms
- New ideas are welcomed

For Reason, an informed culture = a safety culture. It has the following components: a reporting culture, a just culture, a flexible culture and a learning culture.

A Reporting Culture

Disincentives
- Extra work
- Scepticism that anything constructive will be done to prevent it
- A desire to forget all about it
- Lack of trust and fear of reprisals

Incentives
- Indemnity against disciplinary proceedings
- Confidentiality or de-identification
- The separation of the agency or department collecting and analysing reports from those bodies with the authority to institute disciplinary proceedings and impose sanctions
- Rapid, useful, accessible and intelligible feedback to the reporting community
- Ease of making a report

A Just Culture
[Figure: A decision tree for determining the culpability of unsafe acts. Yes/No questions asked in the tree: Were the actions as intended? Were the consequences as intended? Were safe operating procedures knowingly violated? Were procedures available, workable, intelligible and correct? Was adequate training, selection processes and expertise available and present? Is there a history of unsafe acts? Outcomes, in order of diminishing culpability: sabotage, malevolent damage etc.; reckless violation; system induced violation; negligent error; system induced error; blameless error.]


A Flexible Culture
- A culture that favours face-to-face communication
- Work groups made up of divergent people (with shared values and assumptions)
- Able to shift from centralised control to a decentralised mode in which the guidance of local operations depends largely on the professionalism of the first-line supervisors

A Learning Culture
- Observing (noticing, attending, heeding, tracking)
- Reflecting (analysing, interpreting, diagnosing)
- Creating (imagining, designing, planning)
- Acting (implementing, doing, testing)

Reason is not the only author to notice the importance of culture. Charles Hampden-Turner (1990) has a notion of virtuous and vicious circles, shown below.

[Figure: The Vicious Circle. The culture promotes an extreme formality, and an increasing centralization of authority, thereby precipitating considerable informal resistance and dissent, and a tendency for units to decentralize and deviate, with the result that... (the circle repeats).]

[Figure: The Virtuous Circle. The culture carefully notes what informal activity among the decentralized units is of most value to customers, and formalizes these into its regular operations, ensuring... that a centralized information system encourages (the circle repeats).]


11.2 Transparent Independent Rapid Risk Reporting

A number of organisations have developed transparent, independent-of-line-management rapid risk reporting systems. Such systems have two prime aims:

i) To enable rapid reporting of matters like critical near misses that give individual employees a chill. A number of organisations have noted that just before something really serious happens someone somewhere in the organisation develops a premonition which, if promptly reported, can prevent a disaster; and

ii) To deal with issues that normal, day to day, line management systems have repeatedly failed to address. For example, remote monitoring systems that persistently fail despite the IT department's recurring efforts to sustain them. Rather than let frustrated employees develop hidden independent fixes outside the ken of line management, which can easily create latent conditions, one last risk communication system can be invoked.

One common approach is a weekly Red, Amber, Green (RAG) report. All employees should be able to access the RAG report to flag emergency risks, near miss / unsafe conditions and systemic failure. Typically this is by email to a central coordinator. The report is sent electronically to all managers and board members weekly. If a critical issue is identified that requires immediate attention then it is entered into the RAG report and identified as a 'red' risk. A review and/or investigation is then conducted to examine the extent of the problem, resulting in the problem being actioned and moved to either the Amber (under review) or Green (fixed) section. Once it is Green it is deleted. If the emerging risk is ongoing, then the risk should be transferred from the RAG report to the usual risk register database for ongoing monitoring.

Such a process is peculiarly open and powerful since it routinely steps outside normal day-to-day line management decision-making and real alerts are gratefully acknowledged. It does not appear to be abused since false alarms are personally damaging and not repeated.

11.3 Generative Interview Techniques

This is a top down enquiry and judgement of unique organisations rather than a bottom up audit for deficiencies and castigation of variations for like organisations. The object is to delve sufficiently until evidence to sustain a judgement is transparently available to those who are concerned. (Enquiries should be positive and indicate future directions whereas audits are usually negative and suggest what ought not to be done). The diagram below shows a stylised picture of the corporate soup. Individuals have different levels of responsibility. For example, some are firmly grounded with direct responsibility for production and maintenance. Others work at the community interface surface with responsibilities that extend deep into the organisation as well as high into the community.

[Figure: Stylised picture of the corporate ocean. Labels: Community Interface Surface, Corporate Ocean, Pathogens, Vulnerabilities, Hazards, Grass Roots, Interview Depth.]


The idea is that a team interviews recognised 'good players' at each level of the organisation. If a commonality of problems and, more particularly, solutions is identified consistently by individuals at all levels then adopting such solutions would be fast and reliable. Other positive feedback loops should be created too. The process should be stimulating, educational and constructive. Good ideas from other parts of the organisation ought to be explained and views as to the desirability of implementation in other places sought. The following questionnaire has been used as a general basis for such an interview process.

SAMPLE GENERATIVE INTERVIEW GUIDE OVERVIEW

A. WHAT IS UNDERSTOOD BY RISK AND RISK MANAGEMENT?
The purpose of the section is to obtain the interviewee's initial perception on risk management in the organisation.
A.1 What is risk? (pure/business/speculative).
A.2 What is risk management? (AS 4360 vs other concepts like assurance, quality etc)
A.3 What risks are relevant to you? (Types, concerns etc).
A.4 What risk management approaches do you currently use?
A.5 How effective do you believe your risk management systems are?
A.6 Are you familiar with the requirements of AS/NZS 4360?

B. WHAT RISK/DEPENDABILITY/ASSURANCE MEASURES AND TECHNIQUES ARE IN USE?
This section tests knowledge of formal risk related processes.
B.1 What specific risk skills have you and/or your people been trained in?
B.2 What makes you believe that when a (potential) emergency occurs your people will respond well?
B.3 Have you or others attended courses in risk management?
B.4 Do you have knowledge of the following techniques?
B.5 Do you have knowledge of the following codes and standards?
B.6 Do you have access to and does your staff use the library of past incidents?

C. WHAT IS THE PRESENT RISK/SAFETY CULTURE?
This section reflects the issue that systems must match cultures for optimum results.
C.1 Is your culture risk pro-active?
C.2 Does your section have a clear understanding of the organisation's aims?
C.3 Do your people have a clear understanding of your section's aims?
C.4 Do you feel there is a good active knowledge of past organisation risk failures?
C.5 What are your measures of risk performance?
C.6 Do you receive management feedback on risk performance?

D. WHAT RISK INFORMATION SYSTEMS ARE IN PLACE?
This is to test not only the types of risk information collected, but also how it is used and the overall integration of these systems.
D.1 What are your claims management/insurance/legal response systems?
D.2 How does your OSH&E function operate?
D.3 How does the internal audit system function?
D.4 Is the whole of life cost of risk available in the organisation information and planning systems?

E. WHAT CHANGES WOULD YOU SUGGEST FOR RISK MANAGEMENT?
This section is particularly focussed on what positive things could be done to enhance risk management in the subject organisation. This is really a best practice focus, identifying success factors (what is being done well) and how these can be extended. It also embodies the recognition that all organisations are unique and that there are different ways of achieving success.


11.4 Generative Solutions Technique

Hazard based approaches to risk focus on identifying problems, and how they should be controlled. Concepts such as ALARP (as low as reasonably practicable) are often used. Another approach is just to put up solutions, try them and see which work.

Such an approach was used to develop the best way forward for Silver Fern Shipping (Kneller et al, 2002). A top down threat and vulnerability approach was initially adopted to determine primary issues with regard to potential fires in unmanned engine rooms for the Taiko and Kakariki, following fires in the Westralia and Helix. Such a review concluded (amongst other matters) that stopping all fires from starting is very difficult indeed. But it was also noted that fires in manned engine rooms were generally detected early and managed quickly. Such detection occurred via human sensory detection. In addition to sight and smell, a change in the sound pattern or altered vibrations can also provide early alert. That is, early detection was achieved by more than just typical fire detection systems. The engine room staff actually acted as environmental monitoring devices.

This prompted speculation as to the best early detection system. No crisp answer was available. Much expensive research could be undertaken, but this would commit the organisation to an endless series of irresolvable 'what if' problems and possibly an untested technology, thereby sapping organisational resources and enthusiasm generally. It was also noted that the ships' (marine) engineers received the greatest respect and pleasure from fixing problems and that if they had spare time at sea, there seemed to be an uncontrollable urge to 'fiddle' with things.

In view of this a generative solutions approach was recommended. Basically the two ships' chief engineers were each given a budget to buy detection equipment. This potentially included sniffers, cameras (thermal imaging and others), vibration monitors (torsional and longitudinal), sound and noise analysers and the like. For the next few months they fiddled and then returned to advise what worked well on their ship. This was seen to be cheaper than hiring engineering consultants or researchers to attempt to determine a solution, which might or might not operate in a harsh marine environment. It was also constructive, agreeable and interesting for the crew.

REFERENCES

Hampden-Turner C (1990). Corporate Culture, From Vicious to Virtuous Circles. Hutchinson Business Books Limited, Great Britain.
Kneller A, R Robinson and D McCann (2002). A Fire Risk Assessment. Paper presented at the Pacific 2002 Conference. Darling Harbour, Sydney.
Reason J (1993). Managing the Management Risk: New Approaches to Organisation Safety. Chapter 1 of Reliability and Safety in Hazardous Work Systems: Approaches to Analysis and Design. Eds I Wilpert et al. Lawrence Erlbaum Associates Ltd, East Sussex. ISBN 0-86377-309-5.
Reason J (1997). Managing the Risks of Organizational Accidents. Ashgate Publishing Limited.

READING

Reason J (1990). Human Error. Cambridge University Press.


12. Risk and Reliability Mathematics

This section is devoted to pure risk mathematics as used for technical and safety risk. Financial (market risk) mathematics, which has both upside and downside components, is described in Chapter 17.5.

12.1 Discrete Event Mathematics

Both risk and reliability engineers require an appreciation of how the probabilistic outcome of different independent events can be added together. This can be shown in several ways. The overall probability of at least one of two mutually independent systems operating successfully for a particular period of time can be shown as a form of a block diagram, that is:

[Figure: Active Redundant System Block Diagram, with units Pr(A) and Pr(B) in parallel. That is, both units are operating but only one needs to operate for success.]

Pr(A or B) = Pr(A) + Pr(B) - Pr(A) x Pr(B)

Note that a probability is a pure number between 0 and 1. So for the example above, if each unit has a 50% chance of operating in the next hour then there will be a 75% chance of at least one operating in the next hour. That is:

if Pr(A) = Pr(B) = 0.5 (or 50%) then Pr(A or B) = 0.5 + 0.5 - 0.25 = 0.75 (or 75%)

This can also be shown as a Venn diagram, below.

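[Figure: Probability of Occurrence of at Least One of Two Independent Events. Venn diagram of two overlapping circles Pr(A) and Pr(B), with the overlap labelled Pr(A) x Pr(B).]

The total probability of at least one of the two independent events occurring equals the combined area of the overlapping circles, that is, Pr(A) plus Pr(B) less Pr(A) x Pr(B). The overlap is subtracted once because it would otherwise be counted twice.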
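The addition rule above can be checked numerically. The following is a minimal Python sketch, assuming the two events are independent; the function name `prob_at_least_one` is illustrative only and does not come from the text.

```python
def prob_at_least_one(p_a: float, p_b: float) -> float:
    """Probability that at least one of two independent events occurs."""
    for p in (p_a, p_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("a probability must lie between 0 and 1")
    # Pr(A or B) = Pr(A) + Pr(B) - Pr(A) x Pr(B)
    return p_a + p_b - p_a * p_b


# Two units, each with a 50% chance of operating in the next hour.
print(prob_at_least_one(0.5, 0.5))  # 0.75
```

The same result follows from 1 - (1 - Pr(A)) x (1 - Pr(B)), that is, one minus the probability that both units fail.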


12.1.1 Systems in Series

For the series block diagram shown below, the probability of system success, which requires all three independent components to operate, can be shown as:

[Figure: Probability of Success of a Series Operating, with blocks Pr(A), Pr(B) and Pr(C) in series.]

Pr(Success) = Prs(A) x Prs(B) x Prs(C)

where Prs(x) is the probability of success of the component.

The probability of failure of this system can then be described as:

Pr(Failure) = 1 - Pr(Success)
Pr(Failure) = 1 - {(1 - Prf(A)) x (1 - Prf(B)) x (1 - Prf(C))}

where Prf(x) is the probability of failure of the component.

12.1.2 Systems in Parallel

For the parallel block diagram shown below, the probability of system success, which requires at least one of the three independent components to operate, can be shown as:

[Figure: Probability of Success of a Parallel Operating, with blocks Pr(A) Success, Pr(B) Success and Pr(C) Success in parallel.]

Pr(Success) = 1 - {[1 - Prs(A)] x [1 - Prs(B)] x [1 - Prs(C)]}

Again, the probability of failure of this system can then be described as:

Pr(Failure) = 1 - Pr(Success)
Pr(Failure) = Prf(A) x Prf(B) x Prf(C)

The mathematical equivalence of these formulae should be noted.
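The series and parallel formulae can be sketched in a few lines of Python. This is an illustration only, assuming independent components; the function names and the 0.9 component success probability are hypothetical and not taken from the text.

```python
from math import prod


def series_success(p_success):
    """All components must operate: product of the success probabilities."""
    return prod(p_success)


def parallel_success(p_success):
    """At least one component must operate: 1 minus the product of the failure probabilities."""
    return 1.0 - prod(1.0 - p for p in p_success)


components = [0.9, 0.9, 0.9]  # assumed Prs(A), Prs(B), Prs(C)
print(round(series_success(components), 3))    # 0.729
print(round(parallel_success(components), 3))  # 0.999
```

The example shows the symmetry noted above: series failure and parallel success take the same "one minus a product" form.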


12.1.3 Fault Trees & Block Diagrams

Most risk and reliability analysis is done on an events per period (usually a month or a year) basis, that is, a frequency basis. For project management, this may not be so. Usually the problem in question applies to a particular project. This means it has a "probability" (a pure number between 0 and 1) of occurrence for that project rather than any time basis. To get around this, the term "likelihood" is used as a general term in this text to describe a probability or a frequency or a combination of both. The relationships between probability of failure and success for OR and AND systems are shown below.

[Figure: Venn, Fault Tree & Block Diagram Comparisons. Top row: a fault tree OR gate (A Fails OR B Fails), its "Swiss Cheese" representation, and the equivalent series block diagram (Pr(A) Success, Pr(B) Success). Bottom row: a fault tree AND gate (A Fails & B Fails), its "Swiss Cheese" representation, and the equivalent parallel blocks (Pr(A) Success, Pr(B) Success).]

[Figure: Series of Failures Required for a Mid Air Collision to Occur (after Reason). Labels: Traffic Density, Radar Option, Separation/Segregation, See and Avoid, Near Miss, Mid Air Collision.]


12.2 Breakdown Failure Mathematics

Reliability is inextricably entwined with availability. If availability is thought of in terms of a repairable system being up and down then a number of concepts and terms can be simply defined.

[Figure: Two State Availability Concept, showing a system alternating between the up state (acceptable) and the down state (unacceptable, the down time) over a time interval t.]

The time in the up state is related to reliability, and the time in the down state to the time to repair.

MDT or Mean Down Time, that is, the average time the system is in a down state.
MTTR or Mean Time To Repair, that is, the average time to restore the system to the up state.
MTBF or Mean Time Between Failure, that is, the average up time.

For a system where the breakdown failure rate is constant with respect to time (or random), the calculation of reliability is:

$R = e^{-\lambda t} = e^{-t/\mathrm{MTBF}}$

where
R is reliability
t is mission time in hours
$\lambda = 1/\mathrm{MTBF}$ and is the (average) failure rate per hour
MTBF is mean time between breakdown failures in hours
e = 2.71828 (a constant)

For example, if $\lambda$ = 0.01 per hour (1 per 100 hours) and t = 10 hours then $R = e^{-0.1} \approx 0.9$. That is, it has a 90% chance of operating continuously for that 10 hour period. Where the mission time equals the MTBF, the reliability formula reduces to $R = e^{-1} = 0.368$. This predicts that 37% of the population will survive until the MTBF.

Note that unreliability = $1 - e^{-\lambda t}$. For t = 1 and $\lambda$ very small (around $10^{-6}$ to $10^{-7}$) then:

$1 - e^{-\lambda t} \approx \lambda$

This is the point at which the reliability engineer's fault rate becomes equivalent to the risk engineer's failure frequency. Mathematically at least, it suggests that risk is a simplification of reliability.

Unreliability (1 year) $\approx \lambda$ per year
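The constant failure rate model can be explored with a short Python sketch. It assumes the exponential form given above; the function names are illustrative, not from the text.

```python
from math import exp


def reliability(failure_rate: float, mission_time: float) -> float:
    """Chance of surviving mission_time at a constant failure rate (same time units)."""
    return exp(-failure_rate * mission_time)


def unreliability(failure_rate: float, mission_time: float) -> float:
    """Chance of at least one failure during mission_time."""
    return 1.0 - reliability(failure_rate, mission_time)


# Failure rate of 0.01 per hour over a 10 hour mission: roughly 0.9.
print(round(reliability(0.01, 10), 3))
# Mission time equal to the MTBF (here 100 hours): roughly 0.368.
print(round(reliability(0.01, 100), 3))
# For a very small rate and unit time, unreliability is close to the rate itself.
print(unreliability(1e-6, 1))
```

The last line illustrates the approximation in the text: for a failure rate near one in a million per period, the unreliability over one period and the failure frequency are numerically almost identical.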


The table below summarises the different terminology sometimes used to describe availability.

Downtime pa        Availability pa    Also known as
up to 30 secs      99.999905%         "6 nines"
up to 1 min        99.999810%
up to 5 mins       99.999049%         "5 nines"
up to 10 mins      99.998097%
up to 30 mins      99.994292%
up to 45 mins      99.991438%         "4 nines"
up to 1 hr         99.988584%
up to 2 hrs        99.977169%
up to 10 hrs       99.885845%         "3 nines"

Summary of Availability Numbers

12.3 State Theory Mathematics

State theory analysis considers the states in which a system can exist. An example of a multi-state system is shown below. This system can be in one of three possible states at any given time:

S1 - Both units A & B are operating
S2 - One unit, A or B, has ceased to operate but the other is still functioning
S3 - Both units A & B cease to operate.

[Figure: Multi State System.]

One reason for this type of modelling is to take into account the decrease in reliability due to solo operation. That is, the load on the second unit may be greater than when both are operating, implying that the breakdown rate of the system is higher once the first unit has failed.

Breakdown and repair times can follow exponential, log normal and Weibull distributions, each with respectively increasing analysis complexity. The simplest type is Markov analysis, which assumes that these systems have constant breakdown and repair rates. Monte Carlo simulation techniques are often necessary for models using the other breakdown failure distributions.

The modelling is done by considering the system in its perfect state (S1) and defining all the other states in between (S2 to Sn-1) to failure (Sn). These states can include degradation, maintenance and repair. The diagrams can be represented in different ways too. Consider the two units, A and B below, which are identical and have the same failure rate and repair rate.


[Figure: Multi State System State Diagram, plotting the system state against time: S1 (A & B Operating), S2 (A or B Failed) and S3 (A & B Failed).]

12.3.1 Markov Analysis

Consider a single unit which has three states such as a ball bearing:

S1 - Bearing is in good working order
S2 - Bearing is degenerating (increased vibration)
S3 - Bearing has failed

[Figure: Component State Diagram, showing states S1, S2 and S3.]

The last two states (S2, S3) can be reached in various ways: either by wearing out normally and leading to failure if not replaced (S1 to S2 to S3) or by a catastrophic breakdown of the bearing due to the propagation of a hairline crack (S1 to S3).

The system MTTF for an active redundancy system of two identical units is shown in the equation below:

$\mathrm{MTTF} = \frac{3\lambda + \mu}{2\lambda^2}$, or $\mathrm{MTTF} \approx \frac{\mu}{2\lambda^2}$ if $\mu \gg \lambda$

where
$\lambda = 1/\mathrm{MTBF}$ and is the (average) failure rate per hour
$\mu = 1/\mathrm{MTTR}$ and is the (average) repair rate per hour

This equation is reached by doing an analysis of the system considering the probability of each state at any given time and then developing and solving a set of differential equations.

Using the above multi state system, assume the units are electrical generators with failure rates of $1 \times 10^{-4}$ per hr (100 failures per million hours) and repair rates of $2 \times 10^{-2}$ repairs per hr (50 hrs (ave) per repair). If this system was in active redundancy then:

$\mathrm{MTTF} = \frac{3\lambda + \mu}{2\lambda^2} = \frac{3 \times 10^{-4} + 2 \times 10^{-2}}{2 \times (10^{-4})^2} = \frac{0.0203}{0.02 \times 10^{-6}} = 1{,}015{,}000$ hrs
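The generator example can be reproduced with a minimal Python sketch of the active redundancy formula. The function name is hypothetical; the rates are those quoted in the example above.

```python
def mttf_active_redundancy(failure_rate: float, repair_rate: float) -> float:
    """MTTF of two identical repairable units in active redundancy:
    (3 * lambda + mu) / (2 * lambda ** 2)."""
    return (3 * failure_rate + repair_rate) / (2 * failure_rate ** 2)


lam = 1e-4  # failures per hour (100 failures per million hours)
mu = 2e-2   # repairs per hour (50 hours average per repair)

print(round(mttf_active_redundancy(lam, mu)))  # about 1,015,000 hours
print(round(mu / (2 * lam ** 2)))              # approximation for mu >> lambda, about 1,000,000 hours
```

The approximation differs from the full expression by about 1.5% here, which indicates how little the 3λ term matters when repair is much faster than failure.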


12.4 Fractional Dead Time Mathematics

Fractional Dead Time (FDT) is the fraction of time that the equipment is dead (cannot operate properly). It is referred to as FDT because the failure of the equipment itself does not pose a threat until there is a realisation of another hazard, such as fire. The probability of the uncontrolled hazard (hence, the overall failure rate) can be determined through a simple AND gate argument:
[Figure: AND Gate Argument. HAZARD (chances p.a.) AND CONTROL DEAD (FDT) gives UNCONTROLLED HAZARD.]

For example, a fire detection system that is checked weekly and takes one hour to repair has a maximum dead time of one week and one hour. Assuming one equipment failure on average per year gives a maximum FDT of 0.01928 (169 hours per 8,760 hours). Similarly, equipment averaging 2 failures per year has a FDT of 0.03856. If the building typically experiences a fire once every 10 years (or 0.1 chances p.a.), then the probability of an undetected fire is:

0.1 x 0.01928 = 0.001928 chances p.a.

The occurrence of equipment failure can be estimated as the Mean Time Between Failure (MTBF). MTBF is the reciprocal of the equipment failure rate, with the above example having a MTBF of 1 year. It should be noted that the MTBF is characteristic of the equipment item, and is independent of the frequency of testing. Analogous to this is the Mean Time Between Hazard (MTBH), which is the reciprocal of the probability of the overall hazard (fire with no detection). In our example, the MTBH is 518 years (1/0.001928). If the system was checked once every year, it would have a MTBH of 10 years. These examples show the importance of checking equipment regularly, as the time between checks is usually much greater than the time required to repair the equipment.

Given the reciprocal relationship between MTBH and FDT, using the worst-case scenario for FDT produces a minimum MTBH. In practice however, if the system fails randomly then, as an average, we could say it fails mid term between testing periods. For the above example, the FDT would be one half week plus one hour (85 hours per 8,760 hours), or 0.0097.

Mean Time Between Hazard = 1/(0.1 x 0.0097) = 1,030 years

Which calculation is a closer approximation to reality depends on the failure curve after testing. That is, if failure is most likely to occur immediately after the equipment goes on line after testing (often the case) rather than randomly, then the minimum Mean Time Between Hazard is probably the prudent design assumption.
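The fractional dead time argument can be sketched in Python as follows. This is an illustration under the assumptions of the example above (weekly testing, one hour to repair, one equipment failure per year, a fire once every 10 years); the function names, the `worst_case` switch and the cap of the FDT at 1 are choices made for this sketch rather than anything prescribed in the text.

```python
HOURS_PER_YEAR = 8760


def fractional_dead_time(failures_pa, test_interval_h, repair_h, worst_case=True):
    """Fraction of the year the control is dead.

    worst_case=True assumes failure straight after a test (full interval undetected);
    worst_case=False assumes failure on average mid way between tests.
    """
    undetected_h = test_interval_h if worst_case else test_interval_h / 2
    fdt = failures_pa * (undetected_h + repair_h) / HOURS_PER_YEAR
    return min(fdt, 1.0)  # a fraction of time cannot exceed 1


def mean_time_between_hazard(hazard_pa, fdt):
    """Reciprocal of the uncontrolled hazard frequency (hazard AND control dead)."""
    return 1.0 / (hazard_pa * fdt)


week_h = 7 * 24
fdt_max = fractional_dead_time(1, week_h, 1)                    # 169 / 8760
fdt_avg = fractional_dead_time(1, week_h, 1, worst_case=False)  # 85 / 8760

print(round(fdt_max, 4), round(mean_time_between_hazard(0.1, fdt_max)))
print(round(fdt_avg, 4), round(mean_time_between_hazard(0.1, fdt_avg)))
```

With unrounded inputs the sketch gives values close to the 518 and 1,030 years quoted above; small differences arise only from rounding of the FDT in the hand calculation.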


READINGS

Finucane, Pinkney etc (1989). Reliability of Fire Protection and Detection Systems. Fire Safety Engineering - Proceedings of the 2nd Conference International Conference.
Kirwan Barry (1994). A Guide to Practical Human Reliability Assessment. Taylor & Francis, London.
Lees F P (1995). Loss Prevention in the Process Industries. 2nd Edition. Butterworth-Heinemann Ltd, Oxford, UK. (3 Volumes).
Moubray, John (1992). RCM II Reliability Centred Maintenance. Butterworth Heinemann.
Sherwin & Bossche (1993). The Reliability, Availability & Productiveness of Systems. Chapman & Hall, London.
Smith David J (1993). Reliability, Maintainability and Risk. Practical Methods for Engineers. Fourth Edition. Butterworth Heinemann, Oxford.
Smith Anthony (1993). Reliability Centred Maintenance. McGraw Hill.
Villemeur Alain (1992). Reliability, Maintainability and Safety Assessment. John Wiley & Sons.
Vinogradov Oleg (1991). Introduction to Mechanical Reliability: A Designer's Approach. Hemisphere Publishing Corporation, New York.


13. Process Industry Modelling

13.1 Safety Cases

With large and complex plants, the process of managing safety, health and environmental issues requires a formal management system. The formal approach adopted is usually referred to as a safety management system (SMS). An argument or case that the operation of a facility is performed with acceptable risks is often termed a safety case. There are parallels to business cases, which are usually drawn up to convince a financier that a business is viable (Redmill). The object of a business case is to ensure that all significant factors affecting the business have been identified and that appropriate measures are in place to maximise the positive factors and minimise the negative ones. It is usually the responsibility of the highest levels of management within the organisation. Accordingly, responsibility for failure of the business usually rests there too. A safety case is intended to provide the same assurance with respect to the safety of a system or facility. Again it is primarily the responsibility of the operating company, at its highest levels. The Victorian major hazards legislation, for example, requires that the CEO or the most senior company officer resident in the State of Victoria sign off the safety case.

[Figure: Idealised Safety Case Structure. Labels: Board, CEO, Middle Management, Business Units, Safety Management System, Safety Audit, Business Management System, Financial Audit.]

Once established, a safety case effectively manifests itself as a contract between an organisation and a regulator that permits the organisation to operate within defined limits in accordance with documented procedures. Compliance failure is a breach of contract. If damage to third parties, or death and injury, occur due to such breaches then serious liabilities arise. Because of this, it appears to the authors that the legal system has converted the safety case concept to a liability management device. This means that an overriding consideration is that any safety case work be to the satisfaction of legal counsel. This is difficult if the safety task is assigned to technical 'experts' in isolation. An initial context definition is essential.

What constitutes a safety case varies from industry to industry. The paradigm discussion from Chapter 2 is relevant. Based on a number of presentations made to various lawyers, those techniques and paradigms highlighted in the following table at least can be used in developing a safety case.


Risk Management Paradigm | Expert reviews | Facilitated workshops | Selective interviews
0. The rule of law | Yes (Legal opinions) | Yes (Arbitration, moot courts) | Yes (Royal Commissions)
1. Insurance approaches | Yes (Risk surveys, actuarial studies) | Yes (Risk profiling sessions) | Yes (especially moral risk)
2. Asset based, 'bottom-up' approaches | Yes (QRA, availability & reliability audits) | Yes (HazOps, FMECAs etc) | Difficult
3. Threat based 'top-down' approaches | Difficult in isolation | Yes (SWOT & vulnerability) | Yes (Interviews)
4. Business (upside AND downside) approaches | Yes (Actuarial studies) | Difficult in isolation | Yes (Fact finding tours)
5. Solution based best practice approaches | Difficult to be comprehensive | Difficult to be comprehensive | Yes (Fact finding tours)
6. Biological, systemic mutual feedback loop paradigms | Yes (Computer simulations) | Yes (Crisis simulations) | Difficult
7. Risk culture concepts | Yes (Quality audits) | Difficult | Yes (Interviews)

Risk Management Paradigm - Technique Matrix

Each of the approaches in the cells above has particular strengths and weaknesses. They can be combined in different ways. This chapter will consider the ways in which safety case arguments are developed in the process industries, especially with regard to quantitative risk assessment (QRA), the mainstay of safety cases in the process industry to date.

It is also important to note that various pieces of state legislation call up the term 'safety case', especially in regard to major hazard industries. Because of this, smaller facilities may choose to undertake a similar process but use a different term to avoid legal entanglements.

13.2 Context (Top Down)

There are a number of methods by which the context and the depth of the technical study required can be assessed and explained.

13.2.1 Vulnerability Workshops

Top down techniques are generically described in Chapter 7. They include asset and threat assessments such as those used by military intelligence and other authorities. The key hazards are identified using a consequence assessment based on an Asset and Threat (or vulnerability) technique in a workshop with key design team personnel. This tends to focus on the worst possible outcomes irrespective of the cause or relative likelihood of such problems.


Basically, the main assets are defined and then all the possible threats to them identified. The concept is that a hazard exists when an asset is actually vulnerable to a threat. The main asset groups usually include:

- People (especially off site persons like pedestrians, people in vehicles, neighbours, the wider community, and visitors, employees and contractors, emergency services)
- Environment (habitat)
- Operability (business continuity)
- Property (third party and company property)

Threats are typically energy based in the first instance, for example:

- Chemical Energy (including fire, explosion, BLEVE, toxic cloud, vapour cloud explosion)
- Kinetic Energy (including impact of cars, trucks, projectiles due to exploding 200 L drums etc)
- Potential Energy (including landslides, collapsing structure, falling objects, dam failure)
- Environmental (including storm, wind, hail, lightning, floods)

[Figure: Threat and Vulnerability Approach. Phase 1, Context Definition & Legal Sign Off: Vulnerability (Context) Workshop (group session, best available knowledge, completeness check, representative scenario identification, corporate legal sign off). Phase 2, Technical Study: Safety Case (WorkCover Regulations); Fire Safety Study (NSW Dept. of Planning HIPAP 2); OH&S (manual handling, machine guarding). Technical study elements for high consequence, low frequency events: model scenario impacts, preventative measures, protection measures, control measures, capability of resources, safety management system, emergency planning.]

Depending on the outcome, the need for further detailed studies can be decided.

13.2.2 Tiered Approach

A three-stage process, consistent with the National Code of Practice for the Control of Major Hazard Facilities (NOHSC:2016:1996), provides a tiered approach for a risk review. It suggests that the following types and combinations of risk assessments be considered:

* A broad qualitative hazard analysis;
* A semi-quantitative hazard consequence evaluation to determine hazard effects; or
* A quantitative risk assessment.


The National Code of Practice for the Control of Major Hazard Facilities gives an example of the Multilevel Risk Review Process used by Dow Chemical Limited (adapted by R2A), which is shown below. A similar approach is followed in the New South Wales Department of Urban Affairs and Planning's "Multi-Level Risk Assessment" guidelines (1997). The tiered approach of the multilevel risk review is structured so that if the preliminary studies do not find that there are significant offsite risks, then detailed studies such as quantitative risk assessments may not be necessary.

[Figure: Multilevel Risk Review. Flowchart: define the scope of the risk review (determine the proposed risk review process, methodology and criteria for Levels I, II and III with the relevant public authority); conduct preliminary hazard analysis (risk review methods: checklists, 'what if' analysis, reactive chemicals review, consolidated audit, technology review, insurance inspections); if Level I criteria are exceeded, conduct risk evaluation (HazOps, FMECAs, zonal vulnerability analyses, consequence analysis, consequence classification, qualitative likelihood assessment); if Level II criteria are exceeded, conduct quantitative risk assessment (QRA, quantitative cause consequence modelling, escalation and propagation scenario assessment); if Level III criteria are exceeded, high level review of the activity with the relevant public authority. Other boxes: identify opportunities to reduce risk and revise system; manage residual risk (IEC 61508 criteria).]


13.3 Quantitative Risk Assessment (QRA)

13.3.1 Concept

The figure below summarises an individual risk plotting process. This is a preliminary individual risk diagram of an LPG tank at a service station. Known hazards include a relief valve fire on the tank itself, a relief valve fire on the truck that fills it, major leak fires and a tank rupture with resulting vapour cloud explosion. Each has a different likelihood of occurrence and a different consequence severity as well as a different location and hazard radius.

[Figure: Individual risk plot for an LPG tank (plan is a 10m grid). Events and frequencies: Tank Relief Valve Fire (17 x 10^-6 pa), Tanker Relief Valve Fire (10 x 10^-6 pa), Major Leak Fire (7 x 10^-6 pa), Tank Rupture Explosion (3 x 10^-6 pa). The overlapping hazard circles give regions with Risk = 3 x 10^-6 pa, 10 x 10^-6 pa, 20 x 10^-6 pa and 37 x 10^-6 pa (scale in chances in a million per year, 0 to 40), shown relative to the site boundary.]

The likelihood of each event occurring is shown in chances per year. Each circle represents the region in which an unprotected standing person is likely to be killed if a particular event eventuates. So if the sum of all the event frequencies per year is calculated at a point, the likelihood of killing an individual standing at that spot continuously for one year is known. Having added up the cumulative risk at different locations, it is then possible to plot iso-risk contours and compare these to the land use planning criteria described later in this chapter to determine the acceptability of the facility or operation in question.

Individual risk is the risk that an individual would face from a facility if they remained fixed at one spot 24 hours a day, 365.25 days per year, the so-called tethered person. This effectively relates to an individual such as a toddler or elderly adult who has limited mobility and may be expected to be present at a residential location for much of the time.
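The summation described above can be sketched in a few lines of Python. This is a minimal illustration only: the event frequencies are those quoted for the LPG tank example, but the hazard circle centres and radii are hypothetical placeholders (all circles are assumed concentric on the tank), and the function name `individual_risk` is introduced here for illustration.

```python
from math import hypot

# Frequencies (per year) are those quoted for the LPG tank example.
# Centres (x, y) and hazard radii in metres are HYPOTHETICAL placeholders;
# a real study would take them from consequence modelling.
EVENTS = [
    # name, frequency per year, centre (m), hazard radius (m)
    ("Tank relief valve fire",   17e-6, (0.0, 0.0), 10.0),
    ("Tanker relief valve fire", 10e-6, (0.0, 0.0), 20.0),
    ("Major leak fire",           7e-6, (0.0, 0.0), 30.0),
    ("Tank rupture explosion",    3e-6, (0.0, 0.0), 40.0),
]


def individual_risk(x, y, events=EVENTS):
    """Sum the frequencies of every event whose hazard circle covers (x, y)."""
    total = 0.0
    for _name, frequency, (cx, cy), radius in events:
        if hypot(x - cx, y - cy) <= radius:
            total += frequency
    return total


if __name__ == "__main__":
    for distance in (5.0, 15.0, 25.0, 35.0, 45.0):
        risk = individual_risk(distance, 0.0)
        print(f"{distance:4.0f} m from tank: "
              f"{risk * 1e6:.0f} chances in a million per year")
```

Evaluating the function over a grid of points and joining points of equal value would give the iso-risk contours. With the assumed concentric radii, the values step down from 37 to 20, 10 and 3 chances in a million per year with distance.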


The generic steps of the QRA procedure for the risk assessment of process hazards are:

a) Context and Scope
b) Credible Threat (Hazard) Identification
c) Likelihood Assessment
d) Consequence Assessment
e) Risk Assessment (combining c & d)

The five key stages of the QRA process are expanded in the following sections.

13.3.2 Credible Threat (Hazard) Identification

Credible threat (hazard) identification is the stage where materials, equipment and operations that have the potential to do harm are identified. Threats can include the storage or processing of hazardous substances and operations where error can result in the release of hazardous material or damaging energy.

There are a number of generic techniques that can be used to perform a well documented and systematic threat (hazard) identification. Some of these techniques are discussed in Chapters 7, 9 and 10. Chief amongst these are:

Top Down
* Threat and Vulnerability Assessments (can be done on a geographic or zonal basis).
* Tiered Approach (Section 13.2.2)

Bottom Up
* Failure Mode Effects & Criticality Analyses (FMECA)
* Hazard and Operability Studies (HazOps)

13.3.3 Likelihood Assessment

When all threats (hazards) have been identified, the frequency of their occurrence is estimated, usually by consideration of relevant historical data. For the process industries the initial incident usually involves a loss of containment of some sort, typically a leak. Hence the most common failure modes are various hole sizes producing different sized leaks. R2A like to use the term Hazardous Event for the initial incident, as at this point there is the chance that no harm will eventuate.

Hazards can have a variable number of potential failure modes. For example, piping sections have an infinite spectrum of potential hole sizes and resultant release rates. In order to deal with this, the failure modes (hole sizes) of the equipment making up the hazard are broadly categorised into a number of discrete groups, such as pinhole, hole and rupture. The number of discrete groups used to classify potential releases is dependent on the sensitivity of the overall risk results to this grouping, the nature of available historical failure rate data, and the need to constrain the analysis from becoming overly complex.

With the failure modes of a hazard categorised, all components contributing to each failure mode are identified. The process of how the failure rates of the various components are aggregated into an overall failure rate is shown in the next figure.
[Figure: Fault tree showing logical combination of component failures. Potential failure components (piping, flanges, valves etc) combine through an OR gate into the hazardous event failure mode (minor leak), drawn against a time axis.]
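As an illustration of the OR gate aggregation, the sketch below sums component failure rates for one pipe section by hole size class. It assumes the component failures are independent and rare, so that the frequency at the output of the OR gate is approximately the sum of the input frequencies. All failure rate values, item names and the function name `section_failure_rates` are hypothetical and are not taken from any failure rate database.

```python
# HYPOTHETICAL failure rates (failures per item-year, or per metre-year
# for pipe) for each hole size class. Placeholder values only.
FAILURE_RATES = {
    "pipe_per_metre": {"pinhole": 1e-5, "hole": 1e-6, "rupture": 1e-7},
    "flange":         {"pinhole": 5e-5, "hole": 5e-6, "rupture": 0.0},
    "valve":          {"pinhole": 1e-4, "hole": 1e-5, "rupture": 1e-6},
}


def section_failure_rates(inventory, rates=FAILURE_RATES):
    """Aggregate component rates into one rate per failure mode.

    inventory maps an item type to its quantity in the section
    (metres of pipe, number of flanges, number of valves).
    An OR gate over rare independent failures is approximated
    by adding the individual frequencies.
    """
    totals = {}
    for item, quantity in inventory.items():
        for mode, rate in rates[item].items():
            totals[mode] = totals.get(mode, 0.0) + quantity * rate
    return totals


if __name__ == "__main__":
    section = {"pipe_per_metre": 50, "flange": 10, "valve": 4}
    for mode, rate in section_failure_rates(section).items():
        print(f"{mode:8s} {rate:.2e} per year")
```

Each aggregated rate then becomes the frequency of the corresponding hazardous event (failure mode) for that section.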

The process described here has been systematically expressed as the R2A computer based system of work as follows:

* Process and instrumentation diagrams (P&IDs) are imported as images into the R2A system.
* Identified hazards are separated into isolatable sections containing common failure modes (pipes or vessels).
* Intelligent computer 'objects' representing all valves, flanges, vessels, pumps, pipework etc are overlaid on the P&ID.
* These (potential) failure items are linked to a failure rate database.

Each isolatable section is aware of the failure items associated with it. Thus the failure rates for the range of hole sizes deemed appropriate for the section can be aggregated. Up to 4 hole sizes are selected to represent the spectrum of failure hole sizes possible for the process section under consideration.

13.3.4 Consequence Assessment

Incident Outcome Determination

Having established the range of failure modes to be considered for each hazard, the next stage of the analysis is to determine the range of possible outcomes for each failure mode. This is dependent on the existence and implementation of mitigation measures (automatic or manual detection & isolation), and on the potential for event escalation (for example, ignition of flammable material).

A useful method for representing the time sequence of events and the possible outcomes following a release is an event (outcome) tree analysis. The event tree starts at the hazardous event, which is one of the failure modes of the hazard in question. The tree then branches, with each branch point representing an intermediate event such as early ignition of a flammable release. Each branch is assigned a probability, with the ends of the tree representing the probabilistic distribution of all potential outcomes.

The figure below shows an extension of the fault tree shown in Section 13.3.3 Likelihood Assessment. It includes the fault tree as well as an event tree and hence becomes a cause-consequence diagram.
[Figure: Cause Consequence Diagram. Threat (hazard) components (piping, flanges, valves etc) combine through an OR gate into the hazardous event failure mode (loss of control: minor leak). Along the time axis, the intermediate events "Rapid Isolation?" and "Delayed Isolation?" branch Yes/No to the outcomes small release, medium release and large release.]

The intermediate events that can cause a permutation of outcomes can be release intervention strategies such as:

* automatic detection and isolation equipment,
* manual detection and isolation equipment,

or factors affecting the nature of a release such as:

* early or delayed ignition of flammable releases,
* rainout of a two phase release,
* high obstacle density providing the potential for the detonation of a release,
* presence of bunding or drainage for liquid releases.
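The event tree in the cause-consequence diagram can be evaluated numerically as sketched below. The structure (rapid isolation, then delayed isolation, leading to small, medium or large releases) follows the diagram; the initiating frequency and branch probabilities are hypothetical values chosen only to show the arithmetic, and the function name `event_tree` is an assumption of this sketch.

```python
def event_tree(initiating_frequency, p_rapid_isolation, p_delayed_isolation):
    """Return outcome frequencies (per year) for a two-branch event tree.

    Each outcome frequency is the initiating (hazardous event) frequency
    multiplied by the conditional probabilities along its path.
    """
    f = initiating_frequency
    p_no_rapid = 1.0 - p_rapid_isolation
    return {
        "Small release":  f * p_rapid_isolation,
        "Medium release": f * p_no_rapid * p_delayed_isolation,
        "Large release":  f * p_no_rapid * (1.0 - p_delayed_isolation),
    }


if __name__ == "__main__":
    # HYPOTHETICAL inputs: minor leak at 1e-3 per year, 90% chance of
    # rapid isolation, 80% chance that delayed isolation then succeeds.
    outcomes = event_tree(1e-3, 0.9, 0.8)
    for name, frequency in outcomes.items():
        print(f"{name:15s} {frequency:.1e} per year")
```

Because the branches at each node are exhaustive, the outcome frequencies always add back up to the initiating frequency, which is a useful check on any event tree.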


Each of the intermediate events is predetermined to occur at a nominated time, and in a specific time order, with changes to the time order influencing the potential outcomes. As timing can also affect the size of a release, the analysis can also demonstrate how the performance of mitigation and control equipment will affect the overall risk result.

The conditional probability of intervention strategies can be determined from reliability data of the components making up the system. For intervention and detection equipment that fails in a hidden manner, fractional dead time analysis can provide conditional probabilities that the equipment is in a failed state when called upon (refer Section 12.4). Fractional dead time is dependent on the testing period of the equipment, which means another performance measure can be included in the risk model.

Using event trees to show the time order of potential intermediate events following an initial release is a useful way of exploring the range of possible outcomes. For a simple plant where the number of possible intermediate events will be small, choosing a fixed time order is reasonable. For a complex and congested plant, the number of intermediate events will be large, and determining the time order of these events with certainty becomes impossible. In these cases more complex models are required which consider all possible permutations of the time order of intermediate events.

Impact Quantification

Event trees establish the size of potential releases and their probabilistic consequence scenarios. Scenarios resulting from a flammable release that have an impact include:

* Fireballs or BLEVEs (Boiling Liquid Expanding Vapour Explosions)
* Flash Fires
* Vapour Cloud Explosions
* Pool Fires
* Jet Fires
* Projectiles (especially 200 L drums of flammable liquid).

Releases of toxic materials can have wide ranging impacts as toxic clouds. The severity of impact that can result from these consequence scenarios can be quantified in terms of:

* Heat Radiation for Fireballs, Pool Fires and Jet Fires;
* Explosion Overpressure for Vapour Cloud Explosions;
* Flammable Concentrations for Flash Fires; and
* Toxic Load or dose for Toxic clouds.

In order to determine the extent of the impact of the consequence scenarios, a model or combination of models is required for each type of consequence. The modelling of the impact of accidental releases of hazardous materials is an extensive subject, discussed briefly in this chapter.

Probit Equations

To quantify the risk of fatality or injury following a hazardous release, a dose response relationship is required. Probit equations are particularly useful for heat radiation or toxic releases, where a sustained low level exposure can be as fatal as an instantaneous high level exposure. Probit equations are usually written in the form:

$$Y = A + B \ln(\text{hazardous load})$$

The probit, Y, is a random variable with a mean of 5 and a variance of 1 (for example, Y=5 corresponds to a 50% chance of fatality). Probit equations for exposure to thermal radiation and toxic gas are expanded later in the chapter.
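Since the probit is normally distributed with mean 5 and variance 1, a probit value can be converted to a probability of fatality using the standard normal cumulative distribution evaluated at Y - 5. The sketch below shows this conversion using only the Python standard library; the function names are illustrative, and the constants A and B must come from a published probit equation for the hazard in question.

```python
from math import erf, log, sqrt


def probit(a, b, hazardous_load):
    """Evaluate Y = A + B ln(hazardous load)."""
    return a + b * log(hazardous_load)


def probit_to_probability(y):
    """Probability corresponding to probit Y (normal, mean 5, variance 1)."""
    return 0.5 * (1.0 + erf((y - 5.0) / sqrt(2.0)))


if __name__ == "__main__":
    for y in (3.0, 4.0, 5.0, 6.0, 7.0):
        print(f"Y = {y:.1f} -> probability {probit_to_probability(y):.3f}")
```

A probit of 5 gives a probability of exactly 0.5, and each unit above or below 5 corresponds to one standard deviation of the normal distribution.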


13.3.5 Risk Assessment

Risks to the life and safety of people on and off site can be measured in a number of ways. Some of the more common are:

* Individual Risk,
* Societal or Group Risk,
* Potential Loss of Life (PLL), and
* Other criteria, for example TLS (Target Level of Safety) for rare maintenance events.

Individual risk and societal risk are discussed in Chapter 6. Individual risk is the risk that an individual would face from a facility if they remained fixed at one spot 24 hours a day 365.25 days per year. Its value is a frequency of fatality, usually chances per million per year, and it is displayed as a 2 dimensional plot over a locality plan as contours of iso-risk. The fact that the values are for fixed targets is not always made clear, as it may be assumed that some individuals have the potential to only be present periodically. The figure below shows a simplified example of an individual risk plot.
[Figure: Simplified Individual Risk Plot (numbers are fatality frequency per year). Iso-risk contours of 1 x 10^-5, 1 x 10^-6 and 1 x 10^-7 shown relative to the site boundary.]

Societal Risk is a measure of the frequency (F) of fatalities of various numbers (N) of the community for a particular hazard. This is represented as a curve on log axes, which is called an FN curve. The curve is cumulative in terms of frequency, as if there have been 10 fatalities there have also been 9, 8, 7 etc. Societal risk is designed to display how risks vary with changing levels of severity. For example, a hazard may have an acceptable level of risk for just one fatality, but may be at an unacceptable level for 10 fatalities. The figure below shows a simplified example of a societal risk plot.
[Figure: Societal Risk Plot (or FN Curve). Log axes: frequency of N or more fatalities per year (10^-8 to 10^-3) against number of fatalities N (1 to 1000), with the Netherlands unacceptable limit and Netherlands acceptable limit lines.]
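The cumulative nature of the FN curve can be illustrated with a short sketch. Starting from a list of scenarios, each with a frequency and a number of fatalities, the code below builds the "N or more" curve and also sums frequency times fatalities, which gives the expected number of fatalities per year (the Potential Loss of Life measure listed above). The scenario values and function names are hypothetical. Note that the expected-fatalities sum uses the frequency of each individual scenario, not the cumulative frequency plotted on the curve.

```python
def fn_curve(scenarios):
    """Return [(N, frequency of N or more fatalities per year), ...].

    scenarios is a list of (frequency per year, number of fatalities).
    """
    levels = sorted({n for _f, n in scenarios})
    return [(n, sum(f for f, m in scenarios if m >= n)) for n in levels]


def potential_loss_of_life(scenarios):
    """Expected fatalities per year: sum of frequency x fatalities."""
    return sum(f * n for f, n in scenarios)


if __name__ == "__main__":
    # HYPOTHETICAL scenarios: (frequency per year, fatalities)
    scenarios = [(1e-4, 1), (1e-5, 10), (1e-6, 100)]
    for n, cumulative in fn_curve(scenarios):
        print(f"N >= {n:3d}: {cumulative:.2e} per year")
    print(f"PLL = {potential_loss_of_life(scenarios):.1e} fatalities per year")
```

Each point of the curve could then be compared against a criterion line such as the limits shown on the societal risk plot.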


The data from a societal risk plot can also be used to determine the PLL (potential loss of life). This is basically the sum of the product of each FN pair. The result is a single number, which represents the expected number of fatalities per year.

Whereas individual risk uses the "tethered person" approach, societal risk (and hence potential loss of life) is more flexible in terms of the habits of the population. Factors such as variable population densities during the day and protective measures installed can be taken into account when determining the number of fatalities.

Traditionally QRA for the petroleum and chemical industry is required to produce results as both individual risk and societal risk plots. This allows a comparison against regulatory risk criteria and facilitates the assessment of available risk control options.

Typically a QRA uses a facility's stable, year to year operating mode. However, construction and commissioning may carry increased risk for the period in which they occur. Annualising these risks in the QRA may not be wholly relevant since the precautions that are taken during normal operation may be expected to be different during construction.

In practice, some form of Not Less Safe (NLS) or common law criterion is often applied. The NLS criterion is essentially a question of the form, "What should be done during these potentially higher risk periods to ensure that the risk to people (the public and workers) remains not greater than the risk during normal operation?" The QRA and the application of the Individual and Societal Risk criteria then become the base case to which any special process such as construction may be compared.

The common law criteria are final arbiters, which extend beyond all of the above and directly address causation, foreseeability, preventability and reasonableness. They really consider the question, "Is there any practicable good precaution which should be applied?" This tests to see if there is a simple risk control available at minimal cost that should be applied irrespective of any formal QRA type criteria.

13.3.6 QRA Difficulties

Unreality

Quantitative risk analysis is all about finding out what things must conspire together to bring about a serious problem, assessing which of these has the greatest importance in the hazard, and suggesting that such items be the primary focus of risk management. It often deals with absurdly small numbers and statistics, which can lead observers to question the validity of the approach. One important factor in the outcome is the failure data used. Often an analyst is forced to use failure data for 30 year old facilities simply because it is widely accepted in the field as being the most reliable, whereas more modern data is less certain. A possible answer is that whilst QRA is not an exact description of reality, it can be the best available to date, so that until a better method is developed it should be used to demonstrate due diligence.

Not Reproducible

There are arguments that the results of QRA are best used to compare the relative safety of different systems and not to look at the absolute magnitude of the risk in relation to risk criteria. Whilst relative risk may be useful for designers to choose an optimum design, it does not address the public's, and hence the regulators', concern about the level of risk a facility presents beyond its site boundary. However, the use of alternative failure rate data and consequence models can also provide different results for analyses conducted on the same plant. Standardised failure data and methodologies would also address some of the differences between QRA results that can arise between studies carried out by different analysts on similar facilities.

A major limitation of quantitative risk assessment (QRA) is that it relies on the application of generic data where no specific data is available, in particular for pipeline failure rates. This does not take into consideration improvements in manufacturing and monitoring standards, or the possibility that local systems are superior to world standard. Failure rates also do not take into account land use. For example, third party pipeline damage is far more likely in a rural area than in a major city street.


QRA is a methodology widely used in the process industry, where risk is localised and can often be contained within the site boundaries. "Black box" QRA approaches contain value judgements that are not made explicit, and the wide range of parameters is beset by uncertainty. A more transparent approach seeks to set out the source, range and application of assumptions, so as to provide decision makers with the best possible information at the time the decision is made.

Expense

The expense of QRA is also of concern. Multilevel risk reduction ideas are being used, as previously described in Section 13.2.2. Regulatory authorities are increasingly adopting these to reduce the cost burden on industry.

13.4 Fire Modelling

13.4.1 Finite Element Modelling

Thermal impacts are quantified in terms of radiative heat flux (kW/m²), which is the main form of damaging energy, provided there is no direct flame impingement. The models used to calculate heat flux represent the flame as a solid surface that is treated as a grey body radiator. An average radiative heat flux is assigned to the surface (the surface emissive power, SEP), with view factors (F) and atmospheric transmissivity (τ) used to determine the proportion of the heat incident at a specific location:

$$I = \tau \, F \, SEP$$

Finite element analysis breaks the surface of a flame and the target down into a number of planar surfaces, and aggregates the heat flux contribution from all fire elements on all receptor elements.

[Figure: Finite Element Calculation of a Tank on Fire]


13.4.2 View Factors

For each element receiving radiation, a line can be drawn to each element emitting radiation. The normal vectors to the elements and the line form angles θ1 and θ2. If either of these angles is equal to, or greater than, 90°, the elements cannot see one another, and the view factor is zero. The view factor between two differential elements, per unit area of the emitting element, can be expressed as:

$$F_{d1 \to d2} = \frac{\cos\theta_1 \cos\theta_2}{\pi r^2}$$

where r is the length of the line joining the two elements.
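The element-by-element summation can be sketched as follows. The code assumes the standard differential view factor form, cos θ1 cos θ2 / (π r²) multiplied by the area of the emitting element, and combines it with the surface emissive power and atmospheric transmissivity to give the incident flux. Function names, the element representation (centre, unit normal, area) and the numerical values in the usage example are all hypothetical.

```python
from math import pi, sqrt


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def element_view_factor(p1, n1, p2, n2, area2):
    """View factor from a receiving element at p1 to an emitting element at p2.

    p1, p2 are element centres (x, y, z); n1, n2 are UNIT normal vectors;
    area2 is the area of the emitting element (m2).
    Returns zero when either element faces away from the other.
    """
    line = [b - a for a, b in zip(p1, p2)]   # receiver -> emitter
    r = sqrt(_dot(line, line))
    cos1 = _dot(n1, line) / r                # angle at the receiver
    cos2 = -_dot(n2, line) / r               # angle at the emitter
    if cos1 <= 0.0 or cos2 <= 0.0:
        return 0.0
    return cos1 * cos2 * area2 / (pi * r * r)


def incident_flux(receiver, normal, flame_elements, sep, transmissivity=1.0):
    """Incident heat flux (same units as sep) summed over flame elements.

    flame_elements is a list of (centre, unit normal, area) tuples.
    """
    total_view_factor = sum(
        element_view_factor(receiver, normal, centre, n, area)
        for centre, n, area in flame_elements
    )
    return transmissivity * total_view_factor * sep


if __name__ == "__main__":
    # HYPOTHETICAL case: one 1 m2 flame element 10 m away, facing the receiver,
    # with an assumed SEP of 100 kW/m2 and transmissivity of 1.
    flame = [((10.0, 0.0, 0.0), (-1.0, 0.0, 0.0), 1.0)]
    flux = incident_flux((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), flame, sep=100.0)
    print(f"Incident flux: {flux:.3f} kW/m2")
```

The approximation is only reasonable when each element is small compared with its distance from the receiver, which is why the flame surface is divided into many elements.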

[Figure: View Factors]

13.4.3 Effects of Thermal Radiation

In order to predict the number of fatalities resulting from jet fire impacts, a relationship is required between heat radiation and fatalities. Morbid statistics for lethality resulting from heat radiation do exist, primarily coming from measurements from WW2 and military research. A compilation of the significant levels of heat radiation follows, according to the sources quoted by Lees (1996):

Heat Radiation   Effect
1.2 kW/m²        Received from the sun at noon in summer
2.1 kW/m²        Minimum to cause pain after 1 minute
4.7 kW/m²        Will cause pain in 15-20 seconds and injury after 30 seconds exposure
12.6 kW/m²       * Significant chance of fatality for extended exposure.
                 * Thin steel with insulation on the side away from the fire may reach thermal stress level high enough to cause structural failure.
23 kW/m²         * Likely fatality for extended exposure and chance of fatality for instantaneous exposure.
                 * Spontaneous ignition of wood after long exposure.
                 * Unprotected steel will reach thermal stress temperatures, which can cause failure.
35 kW/m²         * Cellulosic material will pilot ignite within one minute's exposure.
                 * Significant chance of fatality for people exposed instantaneously.

Heat Radiation Values (after HIPAP No 4:1992)

13.4.4 Thermal Radiation Fatality Probits

Thermal dose is typically expressed as a combination of the thermal radiation intensity I (W/m²) and the exposure time t (seconds). The model proposed by Eisenberg, Lynch and Breeding (see Lees 1996) determines a probit value, which is a normally distributed variable with mean 5 and variance 1 (so a value of 5 represents a 50% chance of fatality). The model proposed by Lees relates thermal load to burn depth, and then uses a correlation between burn depth and mortality determined by Hymes (see Lees 1996). Typically, 90 seconds exposure to a heat flux level of 12.6kW/m² results in a fatality probability of around 50%.


13.5 Pool Fires

In order to calculate the heat radiated from a fire, it is first necessary to determine the size and shape of the flame. For pool fires the flame can be represented as a tilted cylinder. The parameters used to define the flame shape for the case of a tilted cylinder are presented in the figure below.

[Figure: Parameters Defining Pool Fire Shape. A tilted cylindrical flame labelled with flame tilt, flame length, pool diameter and dragged diameter.]

13.5.1 Flame Dimensions

The physical dimensions of pool fires, including flame tilt, dragged pool diameter and flame length, are dependent on the properties of the material (mass burning rate, vapour density) and on environmental factors (wind speed, air temperature, humidity). Pool diameter is often based on physical constraints such as bund dimensions. Flame height is only constrained in particular scenarios such as tunnel fires. Numerous models are available based on experimental observations for a large range of materials and pool sizes. R2A use the following correlations available from the SFPE Handbook of Fire Protection Engineering (1995):

* Flame Tilt: American Gas Association
* Dragged Diameter: Welker & Sliepcevich
* Flame Length: Thomas Equation

13.5.2 Surface Emissive Power

The surface emissive power depends on the pool diameter and the physical properties of the burning product. Experimental data indicates that larger pool fires have a lower surface emissive power, due in part to a loss in combustion efficiency in larger fires. Smoke and soot particles also reduce the surface emissive power of pool fires, with soot having a SEP of around 20kW/m² and clean flame around 140kW/m². Typical averaged SEPs are in the order of 25-90kW/m².


13.6 Jet Flames

Jet fires can liberate large amounts of energy. According to Chamberlain, a release rate of 100kg/s over a few seconds would produce a flame about 65m long in moderate winds and release some 5000MW of combustive power, which is more than two and a half times the output of Loy Yang power station. The model developed by G.A. Chamberlain (1987) of Shell assumes that the surface of the flame can be treated as a frustum for the purpose of calculating the Surface Emissive Power (SEP). The dimensions of the flame can be defined in terms of the flame lift-off, tilt, length, frustum length, base width and tip width:

[Figure: Jet Flame Frustum]

13.6.1 Release Rates

Gaseous release rates are calculated using an analytical solution assuming adiabatic flow of gas leaving an orifice. Different relationships are used if the flow is "choked" (critical) or "un-choked" (subcritical). Under choked flow, the gas exits the pipeline at greater than atmospheric pressure, and continues to expand downstream of the release. For full bore ruptures, choked flow occurs when sonic velocity is achieved, which is the maximum possible velocity in the pipe.

The calculation used gives a good estimation for the release rate of a gas leaving an orifice, but as hole sizes approach the pipeline diameter the calculation begins to over predict the release rates. This makes the analysis somewhat conservative. The following graph shows how the release rate drops as a function of pipeline length for a 100mm diameter pipeline rupture at transmission pressure:

[Figure: Release rate (kg/s, 0 to 30) against distance along pipeline (m, 0 to 1000).]
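The choked and un-choked orifice relationships can be sketched as below. This uses the standard ideal gas isentropic orifice equations as an assumption; it is not necessarily the exact calculation used by R2A, and it does not model the pressure drop along a pipeline shown in the graph. The function name, the discharge coefficient default and the example gas properties are hypothetical.

```python
from math import pi, sqrt

R_GAS = 8.314  # universal gas constant, J/(mol K)


def gas_release_rate(p0, area, temperature, molar_mass, gamma,
                     cd=1.0, p_atm=101325.0):
    """Mass release rate (kg/s) of an ideal gas through an orifice.

    p0: upstream absolute pressure (Pa); area: hole area (m2);
    temperature: upstream temperature (K); molar_mass: kg/mol;
    gamma: ratio of specific heats; cd: discharge coefficient.
    Returns (rate, choked) where choked is True for critical flow.
    """
    if p0 <= p_atm:
        return 0.0, False
    ratio = p_atm / p0
    critical_ratio = (2.0 / (gamma + 1.0)) ** (gamma / (gamma - 1.0))
    density_term = molar_mass / (R_GAS * temperature)
    if ratio <= critical_ratio:
        # Choked (critical) flow: rate depends only on upstream conditions.
        flow = gamma * (2.0 / (gamma + 1.0)) ** ((gamma + 1.0) / (gamma - 1.0))
        choked = True
    else:
        # Un-choked (subcritical) flow: rate depends on the pressure ratio.
        flow = (2.0 * gamma / (gamma - 1.0)) * (
            ratio ** (2.0 / gamma) - ratio ** ((gamma + 1.0) / gamma)
        )
        choked = False
    return cd * area * p0 * sqrt(density_term * flow), choked


if __name__ == "__main__":
    # HYPOTHETICAL case: methane-like gas (M = 0.016 kg/mol, gamma = 1.31)
    # at 5 MPa and 288 K through a 25 mm diameter hole.
    hole_area = pi * 0.0125 ** 2
    rate, choked = gas_release_rate(5.0e6, hole_area, 288.0, 0.016, 1.31)
    print(f"Release rate {rate:.2f} kg/s, choked = {choked}")
```

Under choked flow the release rate is proportional to the upstream pressure, so halving the pressure halves the release rate for the same hole.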


13.6.2 Surface Emissive Power

The net heat release rate of a flame, Q (kW), is simply the product of the heat of combustion (Hc) of the gas (kJ/kg) and the rate of gas release (kg/s). Jet flames have a much higher surface emissive power than pool fires, owing to the more efficient combustion as a result of turbulent gas flow. The fraction of the total heat that is radiated (Fr) is a function of the gas jet velocity (u), and can be determined from the following expression:

$$F_r = 0.21 \exp(-0.00323\,u) + 0.11$$

Typically the surface emissive powers of jet flames are in the order of 100-400kW/m².

13.7 Explosions

The energy released in an explosion is normally due to stored chemical energy, fluid expansion energy or vessel strain energy. For all explosion types, the energy released is equal to the work done by the expansion of gas from its initial to its final state:

$$W = \int_1^2 P \, dV$$

The effects of an explosion are determined using a scaling law, and an equivalent number of tonnes of TNT (W). For a particular criterion, the scaled distance (z) is determined, which can then be used to find the actual distance (r) to the overpressure using the following formula:

$$r = z W^{1/3}$$

13.7.1 Scaled Distance

The scaling is a function of the overpressure, and is usually determined from a graph based on empirical studies. The following chart for vapour cloud explosions is based on the equation in "Major Industrial Hazards technical papers" from the Warren Centre, University of Sydney, sourced from the 2nd report of the UK Advisory Committee on Major Hazards:
[Figure: Scaled distance (0 to 900) against overpressure (kPa, 0 to 70) for vapour cloud explosions.]
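The scaling law can be applied as sketched below. The first function applies r = z W^(1/3) directly. The second estimates the equivalent TNT mass using the 4600kJ/kg figure and yield factor discussed in Section 13.7.2. The function names, the fuel properties and the scaled distance in the usage example are hypothetical; in practice the scaled distance would be read from a chart such as the one above for the overpressure of interest.

```python
TNT_ENERGY_KJ_PER_KG = 4600.0  # figure quoted in Section 13.7.2


def tnt_equivalent_tonnes(fuel_mass_kg, heat_of_combustion_kj_per_kg,
                          yield_fraction):
    """Equivalent TNT mass (tonnes) for a vapour cloud explosion.

    yield_fraction is the fraction of the available combustion energy
    assumed to appear in the blast wave.
    """
    energy_kj = yield_fraction * fuel_mass_kg * heat_of_combustion_kj_per_kg
    return energy_kj / TNT_ENERGY_KJ_PER_KG / 1000.0


def distance_to_overpressure(scaled_distance, tnt_tonnes):
    """Actual distance r = z * W^(1/3) for scaled distance z and TNT mass W."""
    return scaled_distance * tnt_tonnes ** (1.0 / 3.0)


if __name__ == "__main__":
    # HYPOTHETICAL inputs: 10 tonnes of fuel with a heat of combustion of
    # 46,000 kJ/kg, a 4% yield, and a scaled distance of 100 read from a chart.
    w = tnt_equivalent_tonnes(10_000.0, 46_000.0, 0.04)
    r = distance_to_overpressure(100.0, w)
    print(f"Equivalent TNT: {w:.1f} tonnes, distance: {r:.0f} m")
```

Because distance scales with the cube root of the TNT mass, an eightfold increase in equivalent mass only doubles the distance to a given overpressure.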


13.7.2 TNT Equivalence

The equivalent quantity of TNT is calculated based on a heat of combustion of 4600kJ/kg for TNT. For vapour cloud explosions, energy release is based on complete combustion of the explosive material. In determining the equivalent mass of TNT, a yield factor is applied. Energy in the blast wave of an explosion is generally a small fraction of that theoretically available, with the remainder going to kinetic energy of shrapnel, potential energy in products and residual energy in air. Typically, 1-10% of the available energy of an explosion is in the blast wave. The yield of the Flixborough explosion, in which 30-40 metric tonnes of cyclohexane were released, was estimated to be 4-5%.

13.7.3 Effects of Explosive Overpressure

The following table outlines the typical observable effects of explosive overpressures.

Explosion Overpressure   Effect
3.5 kPa (0.5 psi)        * 90% glass breakage.
                         * No fatality and very low probability of injury.
7 kPa (1 psi)            * Damage to internal partitions and joinery can be repaired.
                         * Probability of injury is 10%. No fatality.
14 kPa (2 psi)           * House uninhabitable and badly cracked.
21 kPa (3 psi)           * Reinforced structures distort.
                         * 20% chance of fatality to a person in a building.
35 kPa (5 psi)           * 50% chance of fatality for a person in a building and 15% chance of fatality for a person in the open.
70 kPa (10 psi)          * Threshold for lung damage.
                         * 100% chance of fatality for a person in a building or in the open.
                         * Complete demolition of house.

Some Effects of Explosion Overpressure (after HIPAP No 4:1992)

13.8 Toxic Gas Clouds

Many calculation intensive computer programs exist to determine the toxic "footprint" as a function of time in the event of a release of a heavier than air toxic gas. Major factors affecting the impact of such releases are discussed below.

13.8.1 Release Type

The manner in which a material is released will have a large bearing on the toxic cloud footprint. Sudden releases of liquefied gas tend to result in a large initial cloud due to aerosol particles and flashing liquid, which will rapidly drop back to a steady state size. Continuous releases will take longer to achieve a maximum cloud size, which is often the same size as the steady state cloud formed by a sudden release. The steady state cloud size is limited by the rate of mass transport from the liquid pool. This is influenced by factors such as heat transfer from the ground, solar radiation levels, and the surface area of the pool (which can be limited by bunding).

For gaseous releases, high pressure causes forced mixing of air and gas, resulting in a long narrow plume. Lower pressure releases tend to be wider as natural dispersion is more influential. For an equivalent release rate, low pressure scenarios are likely to have more far reaching impacts.

13.8.2 Meteorological Data

Atmospheric stability characterises the conditions of convective heat and mass transfer within the atmospheric boundary layer. This will influence both the rate at which liquid chlorine will evaporate from a pool and the rate at which the resulting toxic gas cloud will disperse. Pasquill stability is determined based on the wind speed and solar radiation levels (or at night, cloud cover). The table below outlines factors used to determine atmospheric stability:


Wind Speed    Day: Solar Radiation             Night: Cloud Cover Fraction
(m/s)         Strong    Moderate    Slight     <0.5    0.5-0.8    >0.8
<2            A         A-B         B          F       E          D-E
2-3           A-B       B-C         C          F       E          D-E
3-5           B         B-C         C          E       D          D
5-6           C         C-D         D          D       D          D
>6            C         D           D          D       D          D
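The table above can be treated as a simple lookup. The following is a minimal sketch only, transcribing the table values as read here; the function names are hypothetical, and the handling of values that fall exactly on a band boundary (for example a wind speed of exactly 2 m/s) is an assumption, since the table does not state it.

```python
import bisect

# Upper edges of the first four wind speed bands (m/s): <2, 2-3, 3-5, 5-6, >6.
# Assumption: a speed exactly on a boundary is assigned to the higher band.
WIND_BAND_UPPER = [2.0, 3.0, 5.0, 6.0]

# Stability class by wind band, transcribed column by column from the table.
DAY = {
    "strong":   ["A", "A-B", "B", "C", "C"],
    "moderate": ["A-B", "B-C", "B-C", "C-D", "D"],
    "slight":   ["B", "C", "C", "D", "D"],
}
NIGHT = {
    "<0.5":    ["F", "F", "E", "D", "D"],
    "0.5-0.8": ["E", "E", "D", "D", "D"],
    ">0.8":    ["D-E", "D-E", "D", "D", "D"],
}


def wind_band(wind_speed_ms):
    """Return the row index (0 to 4) of the wind speed band."""
    return bisect.bisect_right(WIND_BAND_UPPER, wind_speed_ms)


def stability_day(wind_speed_ms, solar_radiation):
    """Daytime class; solar_radiation is 'strong', 'moderate' or 'slight'."""
    return DAY[solar_radiation][wind_band(wind_speed_ms)]


def stability_night(wind_speed_ms, cloud_fraction):
    """Night-time class from the cloud cover fraction (0 to 1)."""
    if cloud_fraction < 0.5:
        key = "<0.5"
    elif cloud_fraction <= 0.8:
        key = "0.5-0.8"
    else:
        key = ">0.8"
    return NIGHT[key][wind_band(wind_speed_ms)]
```

A dispersion study would normally take the stability class from site meteorological records; a lookup of this kind is only useful for checking individual cases against the table.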

Atmospheric Stability

13.8.3 Surface Roughness

Effective surface roughness (in metres) characterises the ground conditions over which a plume will travel. Surface roughness generally varies between 0.005 and 1.5 m, with the lower end representing a surface such as a spill over water, and the upper end forested or built-up urban areas. Increased surface roughness reduces the impact area of toxic clouds.

13.8.4 Probit Relationships

Probit equations for toxic exposure take the same form as that for heat radiation exposure used by Eisenberg, Lynch and Breeding:

Y = A + B ln(toxic load)

Toxic load and dose are interchangeable terms for the integral over time (t) of the concentration of a toxic substance (C) raised to a power termed the dose exponent (n).

$\text{toxic load} = \int C^{n}\, dt$

The dose exponent has the effect of placing greater emphasis on acute exposures (high concentration over a short time) than chronic exposure (low concentration over a sustained period). Toxic load is expressed in terms of concentration (in ppm) with respect to time (minutes). Typical probit equation constants for chlorine exposure (sourced from Lees) are:

Probit Equation                            A         B       n
Eisenberg, Breeding & Lynch                -17.1     1.69    2.75
Perry & Articola                           -36.45    3.13    2.64
Rijnmond                                   -11.4     0.82    2.75
ten Berge & van Heemst                     5.04      0.5     2.75
Withers & Lees (Standard Population)       -8.29     0.92    2
Withers & Lees (Vulnerable Population)     -6.61     0.92    2
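
The probit calculation described above can be sketched in a few lines. This is an illustrative sketch, not a validated consequence model: the function names are hypothetical, the toxic load integral is approximated with the trapezoidal rule, and the conversion from probit to probability uses the usual convention that the probit is a standard normal deviate offset by 5, which is not stated in this section.

```python
import math


def toxic_load(times_min, concs_ppm, n):
    """Approximate the integral of C**n dt with the trapezoidal rule.

    times_min: sample times in minutes (ascending)
    concs_ppm: concentrations in ppm at those times
    n: dose exponent
    """
    load = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        load += 0.5 * (concs_ppm[i] ** n + concs_ppm[i - 1] ** n) * dt
    return load


def probit(a, b, load):
    """Y = A + B ln(toxic load)."""
    return a + b * math.log(load)


def probit_to_probability(y):
    """Assumed convention: probability = Phi(Y - 5), Phi being the standard
    normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf((y - 5.0) / math.sqrt(2.0)))


# Example: constant 100 ppm for 10 minutes, using the Withers & Lees
# (Standard Population) constants from the table: A = -8.29, B = 0.92, n = 2.
load = toxic_load([0.0, 10.0], [100.0, 100.0], 2)
y = probit(-8.29, 0.92, load)
p = probit_to_probability(y)
```

The different constant sets in the table give quite different results for the same exposure, so any such calculation should state which probit equation was used.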

Representative Probit Equation Constants

13.9 Fire Safety Studies

A fire safety study is a useful tool for a systematic review of an existing or planned fire prevention and protection system. It represents what would be done in the event that the risk prevention system breaks down and contingency plans are invoked. In the sense of the risk management matrix, it is a combination of best practice and simulation. From the point of view of an attending fire brigade, it does not have a likelihood component, in the sense that the event would be assumed to be happening. That is, the brigades would only be required to attend because the undesired event is underway.


The structure used to perform Fire Safety Studies is often that adopted by the NSW Department of Planning in its Advisory Paper No 2, Fire Safety Study Guidelines, namely:

* identify fire hazards (this stage may already have been completed if a top down context study has been completed)
* determine the credible fire scenarios from identified hazards
* determine preventive measures to minimise the possibility of fire
* model the potential impacts of identified scenarios
* quantify the fire protection resources required to manage the identified scenarios
* model the capability of proposed or installed fire protection systems to provide these resources

This approach is performance based, although relevant codes and standards are still used for guidance. Adopted references include the NFPA (National Fire Protection Association of the USA) Codes, and Australian Standards including AS 1940 Storage and Handling of Flammable and Combustible Liquids.

A range of fire models can be used to estimate flame impacts, usually pool fire and jet fire models. These include the use of finite element 3D modelling. An example of an R2A model used to determine the radiation impact from a high pressure gas line fire in a city is shown in the figure below. This is available for viewing on the R2A website (www.r2a.com.au).

Once the consequences of a fire have been determined, the level of protection required for adjacent facilities and the requirements to extinguish the fire can be ascertained. This is typically done using a combination of thermal response models, code requirements and experience.

3D View of a High Pressure Gas Jet Fire in a City Block

13.10 Risk Criteria used in Australia and New Zealand

Individual and societal risk criteria have been defined by the Victorian WorkCover Authority, the NSW Department of Planning and the Western Australian Environmental Protection Authority (EPA). Other Australian States and New Zealand authorities tend to utilise a combination of these criteria when assessing individual and/or societal risk. It is important to note that such regulatory compliance does not appear to satisfy common law criteria. Even if the risk is in the acceptable region, any cost-effective precaution that reduces risk further needs to be considered. This issue is expanded further in Chapter 4, Liability.


13.10.1 Victorian Risk Criteria

Individual and societal risk criteria for public safety relating to hazardous industries have not been formally established and publicised in Victoria. There is currently a set of draft criteria issued by the Victorian WorkCover Authority (VWA), which is used by Government Authorities involved in Land Use Planning. These criteria were used as part of the Technica Ltd Risk Sensitivity Analysis for the Altona Petrochemical Complex and Environs, October 1997. The following tables outline the risk criteria for individual fatality risk for both new and existing installations.

Risk Level            Actions
>10^-5 pa             Must not be exceeded at the plant boundary.
10^-5 to 10^-7 pa     All practicable risk reduction measures to be taken. No residential development applicable to new developments.
<10^-7 pa             Acceptable.

Individual Fatality Risk - New Installations
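
As a rough illustration, the three bands in the table above can be expressed as a simple classifier. This is a sketch only: the function name and band labels are hypothetical, and the treatment of values exactly on a band limit is an assumption, since the draft criteria as quoted do not specify it.

```python
# Band limits for individual fatality risk (per annum) from the table above.
UPPER_LIMIT_PA = 1e-5   # must not be exceeded at the plant boundary
LOWER_LIMIT_PA = 1e-7   # below this the risk is regarded as acceptable


def classify_individual_risk(risk_pa):
    """Return a short label for the band in which an individual risk falls.

    Assumption: a value exactly equal to a limit is placed in the middle band.
    """
    if risk_pa > UPPER_LIMIT_PA:
        return "exceeds limit"
    if risk_pa >= LOWER_LIMIT_PA:
        return "reduce as far as practicable"
    return "acceptable"
```

For example, a calculated risk of 1 x 10^-6 pa at the plant boundary falls in the middle band, where all practicable risk reduction measures are to be taken.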


Risk Level            Actions
>10^-5 pa             Must not be exceeded at the plant boundary.
10^-5 to 10^-7 pa     All practicable risk reduction measures to be taken but restrictions on residential development applicable to new developments.
<10^-7 pa             Acceptable.

Individual Fatality Risk - Existing Installations

The document also establishes criteria for societal risk. Societal risk analysis combines the consequence and likelihood information with population information. This is presented as an F-N plot, which indicates the cumulative frequency (F) of killing N or more people. A log-log F-N plot results in two parallel lines which define three zones:

a) above the acceptable limit, the societal risk level is not tolerable;

b) between the acceptable and negligible limits, the societal risk level is acceptable, but if the perceived benefits gained by the activity are not high enough, some risk reducing measures may be required. Risk should be "as low as reasonably practicable" (ALARP);

c) below the negligible limit, the societal risk level is acceptable, regardless of the perceived value of the activity.

[Figure: log-log F-N plot. Vertical axis: frequency of N or more fatalities per year (10^-2 to 10^-7). Horizontal axis: number of fatalities, N (1 to 1000). Zones, from top to bottom: Risk Unacceptable; Risk Acceptable but remedial measures desirable; Risk Negligible.]

Victorian Societal Risk Criteria


13.10.2 NSW Department of Planning

The NSW Department of Planning has published an advisory paper, "Risk Criteria for Land Use Safety Planning" (June 1992), that outlines the criteria by which the acceptability of risks associated with potentially hazardous developments will be assessed. The table below summarises the criteria for individual fatality risk for new installations.

Risk Level          Land Use
0.5 x 10^-6 pa      Hospitals, schools, child care facilities, old age housing
1.0 x 10^-6 pa      Residential, hotels, motels, tourist resorts
5 x 10^-6 pa        Commercial developments including retail centres, offices and entertainment centres
10 x 10^-6 pa       Sporting complexes and active open spaces
50 x 10^-6 pa       Industrial

Individual Fatality Risk - New Installations

The NSW Department of Planning also puts forward risk criteria for property damage and inter-plant propagation. It recommends that a risk no greater than 5 x 10^-5 pa should be experienced at an adjacent site for levels of:

* 23 kW/m^2 of radiative heat flux; and
* 14 kPa of explosive overpressure.
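
The individual fatality risk criteria above lend themselves to a simple lookup. The sketch below assumes each criterion is read as the maximum individual fatality risk considered acceptable at a location for that land use; the dictionary keys and the function name are hypothetical shorthand, not terms from the advisory paper.

```python
# Individual fatality risk criteria (per annum) for new installations,
# keyed by a shorthand label for each land use category in the table above.
NSW_CRITERIA_PA = {
    "sensitive": 0.5e-6,           # hospitals, schools, child care, old age housing
    "residential": 1.0e-6,         # residential, hotels, motels, tourist resorts
    "commercial": 5e-6,            # retail centres, offices, entertainment centres
    "active_open_space": 10e-6,    # sporting complexes and active open spaces
    "industrial": 50e-6,
}


def land_uses_meeting_criteria(risk_pa):
    """List the land use categories whose criterion is not exceeded by risk_pa."""
    return [use for use, limit in NSW_CRITERIA_PA.items() if risk_pa <= limit]
```

For instance, at a location where the calculated individual fatality risk is 3 x 10^-6 pa, only the commercial, active open space and industrial categories meet their criteria under this reading.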

The advisory paper also addresses societal risk. It outlines two components of the societal risk concept: the number of people exposed to the levels of risk, and the fact that society is more averse to incidents that involve multiple fatalities or injuries than to the same number of deaths or injuries occurring through a large number of smaller incidents.

[Figure: log-log F-N plot. Vertical axis: frequency of N or more fatalities per year (10^-3 to 10^-8). Horizontal axis: number of fatalities, N (1 to 1000). Lines shown: Netherlands Unacceptable Limit; Netherlands Acceptable Limit.]

NSW (Netherlands) F-N Curve

The department then explains that societal risk criteria F-N curves should be used cautiously. This is also the R2A experience. They provide insight into the matter under investigation and a view as to the effectiveness of proposed precautions. But as noted at the commencement of this section, compliance is not sufficient to satisfy common law criteria. Even if the risk is in the acceptable region, any cost-effective precaution that reduces risk further will need to be considered.


13.10.3 Western Australia EPA Criteria

In the document "Guidance for the Assessment of Environmental Factors, Risk Assessment and Management: Offsite Individual Risk from Hazardous Industrial Plants, No. 2 (Interim July 1998)", the Western Australia EPA has set out the following criteria for individual fatality risk.

a) A risk level in residential zones of one in a million per year or less is so small as to be acceptable to the EPA.

b) A risk level in "sensitive developments", such as hospitals, schools, child care facilities and aged care housing developments, of one half in a million per year or less is so small as to be acceptable to the EPA. In the case of risk generators within the grounds of the "sensitive developments" necessary for the amenity of the residents, the risk level can exceed the risk level of one half in a million per year up to a maximum of one in a million per year, for areas that are intermittently occupied, such as garden areas and car parks.

c) Risk levels from industrial facilities should not exceed a target of fifty in a million per year at the site boundary for each individual industry, and the cumulative risk level imposed upon an industry should not exceed a target of one hundred in a million per year.

d) A risk level for any non-industrial activity located in buffer zones between industrial facilities and residential zones of ten in a million per year or less is so small as to be acceptable to the Environmental Protection Authority.

e) A risk level for commercial developments, including offices, retail centres and showrooms located in buffer zones between industrial facilities and residential zones, of five in a million per year or less is so small as to be acceptable to the Environmental Protection Authority.

13.10.4 Risk Criteria in New Zealand

The risk criteria used in New Zealand (Auckland City Council, NZ 1998) for land use safety planning appear to be the same as the New South Wales risk criteria for land use (Department of Planning, Sydney 1990), which are listed in Section 13.10.2.


REFERENCES

Auckland City Council, New Zealand (1998). Auckland Western Reclamation Area Land Use Safety Study.

Chamberlain G.A. (1987). Developments in Design Methods for Predicting Thermal Radiation from Flares. Chem Eng Res Des, Vol. 65, July 1987.

DNV Technica Ltd (1997). Risk Sensitivity Analysis for the Altona Petrochemical Complex and Environs. Prepared for ACC and Victorian Government, October 1997.

Lees F.P. (1996). Loss Prevention in the Process Industries: Hazard Identification, Assessment and Control.

NSW Department of Urban Affairs and Planning (1997). Multi-Level Risk Assessment guidelines.

NSW Department of Planning (1993). Fire Safety Study Guidelines. Hazardous Industry Planning Advisory Paper No.2.

NSW Department of Planning (1993). Risk Assessment. Hazardous Industry Planning Advisory Paper No.3, Environmental Risk Impact Assessment Guidelines.

NSW Department of Planning (1992). Risk Criteria for Land Use Safety Planning. Hazardous Industry Planning Advisory Paper No.4.

NSW Department of Planning (1992). Guidelines for Hazard Analysis. Hazardous Industry Planning Advisory Paper No.6.

Redmill, Felix and Jane Rajan (1997). Human Factors in Safety Critical Systems. Butterworth-Heinemann, Oxford.

Society of Fire Protection Engineers (1995). SFPE Handbook of Fire Protection Engineering.

Standards Australia. Storage and Handling of Flammable and Combustible Liquids. Australian Standard AS 1940:1993.

Western Australia EPA (July 1998). Guidance for the Assessment of Environmental Factors, Risk Assessment and Management: Offsite Individual Risk from Hazardous Industrial Plants, No. 2 (Interim July 1998).

Worksafe Australia (1996). The National Code of Practice for the Control of Major Hazard Facilities [NOHSC:2016] 1996.


READING

Australian Standard 2885.1-1997 "Pipelines - Gas and liquid petroleum, Part 1: Design and Construction".

Australian Standards HB105-1998 "Guide to pipeline risk assessment in accordance with AS 2885.1".

Barry Thomas F (1995). An Introduction to Quantitative Risk Assessment in Chemical Process Industries, Section 5 Chapter 12, SFPE Handbook of Fire Protection Engineering, 2nd Edition, 1995.

Chamberlain GA (1987). Developments in Design Methods for Predicting Thermal Radiation from Flares. Chem Eng Res Des, Vol. 65.

Chen, Richardson & Saville (1992). Numerical Simulation of Full Bore Ruptures of Pipelines Containing Perfect Gases. Trans IChemE, Vol 70, Part B, May 1992.

Det Norske Veritas (USA) Inc (1999). "API Committee on Refinery Equipment BRD on Risk Based Inspection", Revision 04.

E & P Forum (1996). "Risk Assessment Data Directory".

European Gas Pipeline Incident Data Group (EGPIDG). Gas pipeline incidents: 1970 - 1992. Pipes & Pipelines International, July-August 1995, as quoted in "E&P Forum QRA Data Directory", Section 9. Report No 11.8/250, October 1996.

Johnson AD, Brightwell HM, and Cresley AJ (1994). A Model for Predicting the Thermal Radiation Hazards from Large-scale Horizontally Released Natural Gas Jet Fires. Trans IChemE, 72(B3).

Kletz T A (1986). HAZOP & HAZAN: Notes on the Identification and Assessment of Hazards. IChemE, London.

Lees F P (1995). Loss Prevention in the Process Industries. 2nd Edition. Butterworth-Heinemann Ltd, Oxford, UK. (3 Volumes)

Miller Peter (1996). Difficulties with Quantifying Risk. Miller's Tales. Engineers Australia, May 1996.

Pipeline Operators Group Database (1971-1995).

Pipes & Pipelines International, July-August 1995. "Gas pipeline incidents: 1970-1992, A report of the European Gas Pipeline Incident Data Group".

The Longford Royal Commission Report (1999). The Esso Longford Gas Plant Accident. Sir Daryl Dawson, Chairman and Brian J Brooks, Commissioner. Published by the Government Printer for the State of Victoria, June 1999.

Tweeddale Mark (2003). Managing Risk and Reliability of Process Plants. Gulf Professional Publishing, an imprint of Elsevier Science (USA).


14. Crisis Management

Ideally, every aspect of a company's activity that could expose the enterprise to significant risk should be known, assessed and managed to best effect. This is not the easiest thing to do. Total success may be more an aspiration than a reality. One thing, however, is certain. Efforts made to achieve this goal rarely fail to pay off. Effective risk management not only avoids or reduces losses, but it sharpens the competitive edge. Good governance requires that risk identification be comprehensive, and risk management as optimal as best practice will allow.

Spotting and assessing risks that may result in crises with legal, political or public relations fallout is only one aspect of risk management, but it is one of the most important. Fallouts refer to the various sorts of external reactions and business consequences that may follow corporate decisions and activities. Risk management is mainly concerned with those negative, sometimes unexpected reactions that can threaten the corporate image, more so if not well handled. Fallout crises may be triggered by events such as:

* accidents, engineering failures, and other such events that have adverse health, safety or environmental effects;
* cost overruns or bankruptcy due to inadequate financial management or criminal fraud;
* challenges to a corporate project at the proposal stage, or responses to a negative, that is, a decision not to do something or a failure to perform as expected;
* perceptions by influential elements in the community that a corporation's priorities or values clash with the public interest.

Fallouts can be fought out in the media, in industrial disputes, in protest actions, and/or in the courts and the legislature. In other words, the range and possibilities of fallouts are almost limitless, both in respect of their causes and the way they unfold.

14.1 Intention

As this chapter mostly addresses those already involved and experienced in many aspects of coping with risk, the intention is to skip the customary introductory points about the need for contingency planning for crisis or incident management. Instead, this section will make a number of assertions regarding crisis management, backed by illustrative examples and case studies.

The aim is to help readers check whether their current thinking on risk - that is, how to identify risks and prepare to manage them, hopefully before they eventuate - is comprehensive enough. The range of relevant risks matches the wide variety of fallout possibilities. The assertions also aim to prompt consideration of whether existing systems for managing risk within readers' organisations are appropriate to cope with the rapidly changing social, legal, political and international public environments in which both corporations and government, public and private bodies, now have to operate.

The first of these assertions refers to the most important guideline in managing fallout. One thing that managing fallout is always about, and will always remain about, is securing and maintaining public trust and confidence. By trust is meant public acceptance that:

* corporate management is acting in good faith;
* the public can accept management claims without undue suspicion, doubt, or cynicism;
* it can accept the corporation's stated agendas as its real agendas, and
* these agendas are public interest as well as special interest agendas.


For example, when it comes to changes (say) in health, education, law enforcement, or some other essential public service, is the dominant and only aim to improve the quality and availability of services to the public, or is it to cut cost in line with some political agenda, or to cater only for some special interest?

Trust relates to corporate ethics and social responsibility. Confidence, on the other hand, refers to ability: public confidence in the competence, diligence, and professionalism of management to get jobs done properly, i.e. efficiently, safely, on budget, on time, and with the end product delivering what earlier promotion led the public and/or consumers to expect.

A common temptation in managing fallout is to fall back on accusing the media, political parties, or activist groups of unwarranted interference and trouble-making. Often this involves saying or implying that critics are acting for reasons of special interest, ideological opposition, or out of sheer agin-the-government bloody-mindedness.

Whether or not this is the case (as it sometimes is), the battle is still about re-assuring or re-earning public trust and confidence. Managing fallout always remains a battle for public credibility between managements and their critics. In the end the outcome will boil down essentially to what the facts are. What is the truth, or at least, who can be most believed?

Rough and tumble working out of public accountability is an essential feature of the democratic process. It will remain an ever-present occupational hazard of management - especially when public services or key infrastructure development or redevelopment are involved, or when products are perceived to adversely affect public health, safety or the environment.

More often than not, fallouts involve legitimate and desirable public interest probing into how matters affecting the public, or some significant section of the public, are being conducted. Public suspicions may sometimes be unfounded, or mischievously provoked. Criticism may be unnecessarily abrasive. Nevertheless, this does not reduce the importance of accountability and transparency in a democracy.

14.2 Lessons in Fallout Management

Given the current political and business environment, we have little excuse for not all becoming experts on the risk of fallout and its management. Local and international media deliver almost daily free and detailed lessons in fallout management and mismanagement.

From September 11 to Bali, from Enron and Arthur Andersen to OneTel, HIH, Ansett, and National Australia Bank, from the Tampa and children overboard controversies to the challenging of church managements over the handling of child abuse cases and, most recently of all, to the accuracy and political use of intelligence to justify the pre-emptive war on Iraq, daily news headlines have been on little else. Every day brings a fresh instalment, another tutorial in the dos and don'ts of fallout management or mismanagement.

Throughout all this coverage, stress is clearly on trust and confidence - that is, on the credibility and competence, the professional integrity and proficiency of managements. Obviously some fallout questioning is motivated by partisan political opportunism, and/or by special interest and ideological preferences of one sort or another. Quite often the manner of questioning is unnecessarily shrill and abrasive. Presumption of innocence until proven guilty is not a prominent feature of media trials. But this does not undermine the basic democratic legitimacy of these public interrogations and their social utility.

Most people are more confident in those managerial spokespersons who manage to respond in a workmanlike, unresentful, fact-focussed, up-front way to media questioning, even when that questioning is at its most deliberately provocative, accusatory and insulting.


The ability of some spokespersons to do this seems to come when they accept that public accountability is an essential and proper part of their job. Not the most jolly and comfortable part, but an inevitable part.

The fallout cases selected for mention so far have been particularly dramatic, occurring at the highest national and international levels. But this enhances rather than limits the lessons they carry for smaller scale enterprises. The dynamics exposed so starkly in these examples highlight the basics of risk management: i.e. the importance of prior risk identification, the origin of fallouts, the sorts of issues they raise, and the techniques that succeed or fail to manage them effectively.

One clear lesson is that what management does, or can do, during the fallout is obviously limited by what they did or didn't do before the fallout. Trying to re-write, re-interpret or shred history after the event has limited success. As Arthur Andersen discovered, it is more likely to superheat the frying fat.

14.3 Design Stage

Managing fallout effectively obviously begins at the design stage. The design stage is when the risks of proposed policies and projects should be comprehensively explored, not only in terms of the likely and possible impact on the public in general, but also on particular community, political and special interest groups, many of which, as we well know, are more than capable of vigorous and practised response.

It is at the design stage that we should explore how proposed decisions might be misunderstood or challenged, and how projects can be unambiguously explained and communicated to watchful and potentially critical elements of the general public. This exploration should include how, if necessary, decisions can be justified later, after the fact, when unintended consequences may have manifested themselves. Even statements and directions that seem perfectly clear and simple can be badly misunderstood.

14.4 Case Studies

Before proceeding to two specific case studies of contrasting fallout technique, it is useful to repeat a little more bluntly the key point that has been made so far. Fallout management, public interrogation and response, is a legitimate and desirable aspect of democratic accountability. Even when the process is misused and Rafferty's Rules apply, more benefit accrues overall to the corporate image through frank and willing engagement with the public than through resentful reluctance, avoidance or obfuscation.

In these times when political and PR minders and spin doctors seem to abound, there is a danger that managers will be tempted to think cynically of the fallout process. Some see fallout management largely in terms of merely training spokesmen in PR and political minder-style techniques for media appearance. Training is seen too much in terms of preparation to do battle with hostile, unfair and tricky adversaries only - the negative approach rather than a positive bid to win public trust and confidence.

Should combat or communication, openness or secrecy, fact or spin, explanation or avoidance, be foremost in one's approach to fallout management? The following two case studies may help readers decide between the two approaches. Both cases involved global product recalls, but the lessons they carry are widely applicable. They are classical illustrations of good and woeful fallout management.


Perrier

One recall was by Perrier of its mineral water in 1990. The other was the recall by Johnson & Johnson in 1986 of its pain-relieving Tylenol tablets.

Perrier's fallout was triggered when a US health authority, using advanced technology, detected benzene in a shipment of Perrier water from France. The concentration was allowable under World Health Organisation standards but not under US standards.

Different Perrier spokesmen in the US and France started making factual assertions to the media before the company had established the facts. One claim was that contamination was confined to the US shipment. Another was that the benzene came from bottle cleaning. These claims were later shown by the media to be false or mistaken. This immediately set back the company's credibility and aroused the media's blood scent.

Media focus on the Perrier story intensified. The investigative spotlight extended to Perrier's promotional marketing. The value of the mineral water product lay in lifestyle image. Perrier claimed that the mineral water was pure at its natural source, that it was naturally sparkling, and that it was calorie and sodium free. Perrier's image was built around promotional slogans like "It's Perfect, it's Perrier" and words like "Natural", "Pure" and "Health". Many consumers obviously saw Perrier as the fashionable drink of choice for those wishing to display a health-conscious, organic sort of lifestyle image.

During the course of the fallout it was revealed not only that the benzene came from the natural spring source in France, but also that the water at source was not naturally sparkling as it was in the bottled product. Neither was it calorie or sodium free. The company got no credit for finally admitting these facts. Its eventual retractions were regarded as forced confessions. Credibility went to the media for dragging the facts out of the Perrier spokespersons.

The company's image was also adversely affected by the fact that at no time during the fallout did the company apologise to its customers or express concern. Brand loyalty of its consumers was decimated. Company spokesmen gave the impression that they regarded public questioning as something of an unwarranted impertinence. Among the public, many thought the company should have known, perhaps did know, what was in its product.

There was obviously little anticipatory risk management and little or no coordination between risk management and Perrier's marketing consultants. By overlooking, ignoring or concealing so many potentially explosive risk factors in its marketing, Perrier was inviting disaster. As a result, 160 million bottles of Perrier eventually had to be withdrawn and disposed of. Ultimate stock market and other business losses exceeded their value many times over.

By the end of the fallout and for a long time afterwards, few saw the corporation's image in terms of competence, transparency, credibility, and good governance. Re-establishing the corporate image was a slow and painful task.

Tylenol

Now let's contrast the Perrier outcome with Johnson & Johnson's handling of the Tylenol incident. Note that in the Perrier case no one was injured and probably no one's health was really damaged. Benzene is considered potentially carcinogenic, but the concentrations involved were small and borderline: above US but below WHO standards. When a criminal extortionist poisoned Johnson & Johnson's Tylenol tablets, however, several consumers died.

When the extortion threat was first brought to the company's attention through the media, Johnson & Johnson had previously assessed the risk and had contingency plans in place - plans thought out in the calm times before any crisis occurred.

It was the corporation's CEO, not a low ranking staffer, who immediately appeared as spokesperson. He went straight on the nation's leading media interview show. First, he declared the company's concern for its customers. He said the company's first priority was public safety. He said a global recall was already in motion. He announced that 24-hour, toll free, multiple phone-in lines had been set up to handle all inquiries and problems. All phone-in staff were fully trained and kept up-to-date by progress briefings.

The CEO admitted that in hindsight the product would have been more secure if it had had tamper-evident packaging. This would be corrected. He was the company's only media spokesman on the issue. He refused to hazard guesses when asked factual questions about which he was uncertain. He explained why the facts were not yet clear and what was being done to establish them.

Willing transparency, demonstrable competence and public interest priorities were the qualities that won the corporation the public's trust and confidence. In contrast to Perrier, Johnson & Johnson emerged from its fallout with an enhanced rather than a splattered public image. In a short time, the value of its shares and its product market leadership went back to pre-incident levels.

14.5 Conclusion

Unlike the positive focus required of the enterprising movers and shakers vital to corporate energy and achievement, risk management has the job of looking at the grey sky scenarios - not just the sunny blue ones. It is right and proper for mainstream managers to keep their hearts and eyes on new fields of conquest. It is the risk management function to spot and mention the minefields that may slow or even prevent them reaching their goal. The current state of corporate credibility with much of the public is somewhat damaged. This has occurred at the very time that governments and others are putting pressure on corporations to expand good governance. These two trends comprise a sort of pincer movement, creating a fairly rugged environment in which to manage.


REFERENCES

When the Bubble Burst. The Economist, 3 August 1991. Article on the Perrier incident.

Gideon Haigh (1991). The Business of Managing Crises. The Age, 15 August 1991. Summary article including a review of the Tylenol incident.

Gideon Haigh (1991). Ignorance is not Bliss in Crisis Management. The Age, 16 August 1991. Article on the Perrier incident.

READING

David Elias (1997). Arnotts Agenda. Textbook case was the template for food threat. The Age, 22 February 1997, p A20.

Murray Mottram (1995). Going, Going, Gone. The Sunday Age, 6 August 1995. Article on the Iron Baron incident.


15. Industry Based Case Studies

15.1 Airspace Risk Assessment

An Airspace Risk Model (ARM) was developed for Airservices Australia to address the risks of various airspace classifications, in particular those in isolated areas (Jones et al 1995). Initially this model was used to determine the level of risk for both the current and proposed methods of operating in Australian airspace.

The critical event as defined by the airspace risk model is the near miss. This is considered to have occurred when two or more aircraft come within the defined horizontal (1 Nm) and vertical (500 feet) limits without being aware of each other's presence. Defining this as the critical event assumes that loss of control of the situation occurs at the point at which movement of the control surfaces of an aircraft at risk would no longer have any significant effect by the time the collision point was passed; that is, whatever the pilots did at this point, the result would be ruled by luck. This is deemed to have occurred 12 seconds before any near miss / collision.

The cause/consequence diagram is centred on this critical event, from which the consequences flow from left to right. Time is also considered as always progressing from left to right across the page. The figure on the next page shows this cause/consequence diagram.

Event diagrams were developed to show the sequences of events that lead to the critical event in the cause/consequence diagram:

* Traffic Alert not received
* Aircraft cannot receive call
* Considered action fails
* Evasive action fails.

The event diagram for Traffic Alert not received is shown below. Event diagrams were also developed for the other three events.

[Figure: event diagram in which the top event "Traffic Alert not received" is built from the events "Aircraft cannot receive call", "ATS alert fails", "Traffic alert not provided" and "No alert from other aircraft" through OR ("or") and AND ("&") gates]

Event Diagram for Traffic Alert not received

Once all these event diagrams had been developed and verified, the model needed to be quantified by the panel of operational research personnel (who also referred to various surveys and publications). Once this was done, the values were inserted into the model and solved using the methods outlined in Chapter 9 of this text. The results showed that the model was quite sensitive in some areas, which required further investigation.

This quantified risk analysis approach (cause-consequence modelling) can be calibrated to give an assessment of the existing risk of the particular system under study. Testing such models against both the available data and the experience of senior management and technical personnel in the industry concerned ensures that the model accurately reflects the best available information and knowledge at the time it is used to make decisions regarding risk acceptance and, if required, risk reduction.

[Figure: Cause-Consequence Model for Enroute Airspace Collision Risk. Values as printed.

Causes, in the columns ATC Separation, Considered action (5 minutes) and Evasion action (1 minute):
* ATC separation inapplicable: 1.00 E+00
* 1st aircraft, 5 minute response, considered action fails (event diagram 2A): 2.00 E-03
* 1st aircraft, 1 minute response, evasive action fails (event diagram 2B): 1.00 E-03
* 1st aircraft fails to avoid 2nd aircraft: 2.00 E-06
* 2nd aircraft, 5 minute response, considered action fails (event diagram 3A): 2.00 E-03
* 2nd aircraft, 1 minute response, evasive action fails (event diagram 3B): 1.00 E-03
* 2nd aircraft fails to avoid 1st aircraft: 2.00 E-06

Critical loss of control event (0 seconds):
* Loss of control of aircraft energy, envelopes overlap: 4.00 E-12

Consequences:
* Collision? (immediate outcome, +10 seconds): Yes 0.01, collision 4.00 E-14; No, fly away (null)
* Aircraft loss? (+30 seconds): Yes 0.90, aircraft loss 3.60 E-14; No 0.10, slight damage 4.0 E-15
* Populous area? (collateral damage, +3 minutes): Yes 0.01, aircraft loss & collateral damage 3.60 E-16; No 0.99, aircraft loss only 3.56 E-15]

Cause-Consequence Model for Enroute Airspace Collision Risk
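The arithmetic behind the cause-consequence model for enroute airspace collision risk can be sketched in a few lines. The sketch below is illustrative only: it assumes that the "&" gates multiply independent branch values and that each Yes/No question splits the incoming value by the printed fraction. The variable and function names are not from the original model. Under these assumptions the product for "aircraft loss only" comes out at about 3.56 E-14, whereas the figure text reads 3.56 E-15, which may be a misprint.

```python
import math

# Branch values as printed in the cause-consequence model figure.
P_ATC_SEPARATION_INAPPLICABLE = 1.00
P_CONSIDERED_ACTION_FAILS = 2.00e-3  # 5 minute response
P_EVASIVE_ACTION_FAILS = 1.00e-3     # 1 minute response
P_COLLISION = 0.01                   # given that the envelopes overlap
P_AIRCRAFT_LOSS = 0.90               # given a collision
P_POPULOUS_AREA = 0.01               # given an aircraft loss


def and_gate(*values):
    """Combine inputs of an '&' gate, assuming the events are independent."""
    return math.prod(values)


# One aircraft fails to avoid the other only if both of its responses fail.
p_one_aircraft_fails = and_gate(
    P_ATC_SEPARATION_INAPPLICABLE,
    P_CONSIDERED_ACTION_FAILS,
    P_EVASIVE_ACTION_FAILS,
)

# Critical event: both aircraft fail to avoid each other.
p_loss_of_control = and_gate(p_one_aircraft_fails, p_one_aircraft_fails)

# Consequence side: each question splits the value that reaches it.
p_collision = p_loss_of_control * P_COLLISION
p_aircraft_loss = p_collision * P_AIRCRAFT_LOSS
outcomes = {
    "fly away (null)": p_loss_of_control * (1 - P_COLLISION),
    "slight damage": p_collision * (1 - P_AIRCRAFT_LOSS),
    "aircraft loss only": p_aircraft_loss * (1 - P_POPULOUS_AREA),
    "aircraft loss and collateral damage": p_aircraft_loss * P_POPULOUS_AREA,
}

for name, value in outcomes.items():
    print(f"{name}: {value:.2e}")
```

Because every branch is mutually exclusive, the four outcome values add back up to the value of the critical event, which is a convenient consistency check on a model of this kind.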

15.2 Train Operations Rail Model

Risk analysis is being used by NSW CityRail to rank infrastructure renewals in a way that ensures that the work done will have a significant impact on the business (Anderson et al 1992). This is done by obtaining specifications for the acceptable ranges of quality of service from assets, the lower bounds of which are considered unambiguously safe for all possible levels of operation. Management can then identify and eliminate safety and service risks through specific projects. This also allows management to assess the cost of providing a specific level of service and safety or, alternatively, the levels that can be provided with the funding available.

The information required is that which enables estimation of the likely frequency of occurrence of a train collision on a particular section of track. The data sheet designed for this input data is shown below.
[Figure: data entry sheet for one section of one line (here the Main line, section "Fingerme to Hurtledown", model case "Run 1 w/o ATP"). The sheet records: line hazards and controls (wrong side failure probability, visibility failure probability, automatic warning system (AWS), mechanical trainstop/trip (ATS), electronic transponder (ATP)); section details (points, interlocking, train stops, length); train controls for train cases A to D (Driver 2, DeadMan, Vigilance, AWS, ATS, ATP); exposure for each train case (% late, upWk, dnWk); conditional probabilities of head on and rear on collision; and the resulting section occurrence frequency per annum (head on 0.1472, rear on 0.6611, total 0.8084)]

Section Based Data Sheet for the Estimation of Railway Collision Risk Data
(illustrative purposes only)

The data in this sheet is then used in the Fault and Event Tree (cause-consequence model) for the Loss of Train Energy that calculates the probability of the possible outcomes. This is shown in the figure below.


This data sheet is in the background of the layout below.

[Figure: network layout showing the stations Emu Plains, Lapstone, Glenbrook, Blaxland, Warrimoo, Valley Heights, Springwood, Faulconbridge, Linden, Woodford, Hazelbrook, Lawson, Bullaburra, Wentworth Falls, Leura, Katoomba, Medlow Bath, Blackheath, Mt Victoria, Bell, Newnes Junction, Zig Zag Tunnel, Edgecombe, Oakley Park, Lithgow and Bowenfels]

A Computer based Network Layout to which the Section Data is linked

This allows the entire system to be managed on a single sheet with a juxtaposition of data that is highly relevant to the task of determining the relative importance of different line and train controls.

15.3 Fire Risk Management (in buildings)

Monash University owns or occupies many different types of buildings, from multi storey high-rise buildings to low-level sprawling buildings of varying ages. Each one of these has a different level of fire protection. The authors provided advice regarding the establishment and ongoing use of a Fire Risk Management Information System, which would make accurate assessments of deficiencies, corrective costs, work priorities and work completed available to management. This method would also ensure the limited pool of funds was used effectively.

An initial assessment of almost half the campus building floor areas was done, concentrating on the adequacy of the following systems:

* emergency procedures
* alert and communication systems
* exits
* exit signs and emergency lighting
* smoke control systems
* air handling systems
* fire penetrations
* inspections, testing and maintenance
* fire detection and control systems

This assessment revealed considerable life safety problems, which would require large amounts of funds to correct. Because of this, the question of the cost of fire risk management was translated into the optimisation of the total cost of risk, that is, the maximisation of the risk reduction per dollar spent. This was done with respect to safety, an acceptable level of risk and duty of care as defined by the Victorian Occupiers Liability Act (1983) and the Victorian Occupational Health and Safety Act (1985).


An unacceptable level of risk is reached when the risk of fatality is assessed to be too high. An acceptable level of risk can be determined by analysing existing risks that are familiar to and accepted by the public. A summary of hazards and risks resulting from voluntary and involuntary activities is shown and discussed in Chapter 9 of this text. For this project the level of acceptable risk was defined as one or more fatalities with a frequency of one (or less) in a million per year. If the calculated risk were greater than this value then risk reduction measures would be deemed necessary.

A time sequence fire model was used to analyse the event/consequence model. This model was used to emphasise the time of occurrence of various conditions and to relate them to the risk control measures. A smaller fire, which can be put out with an extinguisher and does not require fire brigade response, is not considered in this type of analysis. The frequency of larger fires is then determined and any parameters that would aid in early detection are considered. These include:

* smoke detection
* occupant response
* fire rating of doors, walls etc.
* sprinkler system operation (where installed)

A fault tree was then developed to describe the system; this tree describes the failures or faults that have to occur before the top event of the tree eventuates. This type of modelling is described in greater detail in Chapter 9 of this text.

To add to the complexity of the analysis, the buildings were also classed as one of four different occupancies:

1. Residential Occupancy
2. Office Occupancy
3. Public Occupancy
4. Laboratory Occupancy

Each of these occupancies has different parameters, which affect the result of the fault tree.

A user-friendly interface that has all the relevant calculators in the background was developed on a Macintosh computer using SuperCard software. This allows the user to look at any of the building categories listed above. Particular buildings can also be viewed in their current state of life safety risk, which relates directly to the level of success of escape from a burning building. Applicable financial data (including the cost of maintenance, inspection and testing) is also included.

There are fifteen factors that affect the probability of escape (shown in the fault tree). Each of these factors has three possible probabilities, which relate to the item being:

* not installed
* installed but not maintained
* installed and maintained

Data files containing the above probabilities are used to calculate each building's risk of multiple deaths. A fire risk optimisation model was then used to rank the buildings in descending order of risk. This routine provides a hierarchical list of risk reduction measures to be undertaken and the corresponding reduction in risk, which can be used to achieve specific risk levels.

Overall the following steps were followed.

i) Analysis of current life safety equipment.
ii) Definition of the level of acceptable risk and the number of fire starts per year.
iii) Develop a fault tree for the system.
iv) Calculate the building's risk of multiple deaths with the system in its current state.
v) Rank the buildings in descending order of risk.
vi) Produce hierarchical list of risk reduction measures to be undertaken and the corresponding reduction in risk.
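The final ranking step, producing a list of risk reduction measures ordered by risk reduction per dollar, can be illustrated with a short sketch. It is not the SuperCard model described above. The measure names, costs and risk reductions are hypothetical, and the sketch assumes, for simplicity, that the risk reductions of individual measures are additive. Only the acceptable risk criterion of one in a million per year is taken from the text.

```python
from dataclasses import dataclass

ACCEPTABLE_RISK_PA = 1e-6  # one in a million per year, as defined in the text


@dataclass(frozen=True)
class Measure:
    name: str
    cost: float            # dollars (hypothetical)
    risk_reduction: float  # reduction in fatality frequency per year (hypothetical)

    @property
    def reduction_per_dollar(self) -> float:
        return self.risk_reduction / self.cost


def rank_measures(measures):
    """Order measures by risk reduction per dollar, best value first."""
    return sorted(measures, key=lambda m: m.reduction_per_dollar, reverse=True)


def select_measures(current_risk_pa, measures, target_pa=ACCEPTABLE_RISK_PA):
    """Apply measures in ranked order until the target risk is reached.

    Assumes risk reductions are additive, which is a simplification.
    """
    selected = []
    for measure in rank_measures(measures):
        if current_risk_pa <= target_pa:
            break
        current_risk_pa -= measure.risk_reduction
        selected.append(measure)
    return selected, current_risk_pa


# Hypothetical measures for a single building.
measures = [
    Measure("install sprinklers", 50_000.0, 2.5e-6),
    Measure("maintain smoke detection", 2_000.0, 2.0e-6),
    Measure("upgrade exit signs", 1_000.0, 5.0e-7),
]

selected, residual_risk_pa = select_measures(5.5e-6, measures)
for measure in selected:
    print(f"{measure.name}: {measure.reduction_per_dollar:.1e} per dollar")
print(f"residual risk: {residual_risk_pa:.1e} per year")
```

Ranking by reduction per dollar rather than by absolute reduction is what lets a limited pool of funds buy the largest fall in risk first.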


15.4 Transmission Line Risk Management

Over 30% of the transmission lines used in Tasmania are 50 years old or more. These lines were built before the establishment of industry based guidelines. As a result, many of them do not meet the clearance requirements outlined by the Electricity Supply Association of Australia (ESAA). A number of these lines were also built across what were then remote areas, to which new roads and off road vehicles have since given the general public greater access. Greater clearances are therefore required to comply with the ESAA guidelines, and in their current state the lines may pose a danger to the community and environment.

Transend Networks Pty Ltd, Tasmania has adopted Risk Management techniques as an essential part of the Asset Management of the transmission system (Houbaer and Seddon 1995). This method was chosen to assist the Company in obtaining the greatest risk reduction per dollar spent, to reduce overall expenditure, and to optimise operations whilst also limiting its legal liability. This was done with the use of a risk-ranking model, which ranked the lines according to the severity of the breach of the statutory minimum clearance obligation of 5.5 m above roads (this is greater for other categories). The other techniques used to rank hazards and solutions were to quantify the level of risk exposure to people, equipment and environment, to determine the consequences of these hazards, and to compare these risk levels with acceptable risk exposure levels documented by legislation, best industry practice or guidelines.

This model also aided management in deciding whether expenditure on refurbishment or development projects could be minimised or deferred. In the case of deferral the likelihood of an unwanted event occurring is increased, but the risk is deemed acceptable provided the appropriate preventative control measures are put in place. This is preferable, as fixing all infringements would cost tens of millions in capital expenditure over a number of years.

The four main parameters of the risk model were:

* identification of critically exposed groups
* classification of credible hazards
* development of cause/consequence diagrams to determine what events have conspired together to cause loss of control of the conductor energy under consideration, and
* determining acceptable risk criteria

15.4.1 Risk Criteria

The rationale for ground to conductor clearances prescribed by the ESAA Guidelines could not be established, so the following analysis was made. Vehicles over 4.3 m are over dimensioned and require statutory approval before movement can commence. The Australian bridge overpass design height is 5.5 m. Flashover distance for 110 kV is 0.25 m and flashover distance for 220 kV is 0.55 m. Flashover distance for lightning strike (about 500 kV) is 1.2 m. Thus for traversable areas, the following conductor risk thresholds can be established.

A1   4.3 m + 0.25 m = 4.55 m   110 kV flashover threshold for maximum dimensioned vehicles
A2   4.3 m + 0.55 m = 4.85 m   220 kV flashover threshold for maximum dimensioned vehicles
B0   4.3 m + 1.2 m = 5.5 m     Lightning (500 kV) flashover threshold for maximum dimensioned vehicles (ESAA Guideline for 110 kV non traversable)
B1   5.5 m + 0.25 m = 5.75 m   110 kV flashover threshold for over dimensioned vehicles that fit under bridges
B2   5.5 m + 0.55 m = 6.05 m   220 kV flashover threshold for over dimensioned vehicles that fit under bridges (ESAA Guideline for 220 kV non traversable)
C1   5.5 m + 1.2 m = 6.7 m     Lightning (500 kV) flashover threshold for over dimensioned vehicles that fit under bridges (ESAA Guideline for 110 kV traversable)
C2   7.5 m                     (ESAA Guideline for 220 kV traversable)

C1 is the same as the ESAA Guidelines for 110 kV.

15.4.2 Process

The Transmission Line Risk Management System (TLRMS) is a PC based desktop colour publishing solution for assessing and managing span-based hazards. The prime focus is on direct flashover hazards to the public. The process was developed with the support of the HEC solicitor. The operational steps are:

15.4.2.1 Preliminary PC Based Risk Assessment using Original Design Data

A preliminary computer based assessment is made using the original data used in the design of the transmission line. Some field verification of the original design data occurs as required. The original design data is transferred to an Excel spreadsheet format on disk. This information is then transferred to the TLRMS PC and analysed. Infringing spans (generally less than 6.7 m for 110 kV conductors and 7.6 m for 220 kV conductors, or 9.5 m over public roads) for alternative conductor core temperatures (typically 49°C and 75°C) are determined. A Register of Offending Spans is then printed by Transmission Number Line and Core Temperature.

15.4.2.2 Desk Top Risk Assessment

This considers each span, taking into account factors like land use, road and rail crossings, conductor crossings and the various infringements determined above. It must be done by someone familiar with the conductor and its environs.

15.4.2.3 Field Inspection - Best Available Data Established

Based on the results of the above, suspect spans are inspected in the field using specially developed single A4 Register pages. If the ground or conductor profiles are incorrect then the TLCAD data needs to be corrected and the above two stages repeated.

15.4.2.4 Final Assessment and Action

a) If the span data is correct then the proposed (infringement) control option/s need to be selected, costed and marked on the register page.
b) Special hazards related to special critical groups (for example, scenic views encouraging low flying pilots) need to be assessed and noted on the register page. The single 'help' page lists the possible hazards considered by the original expert team, but the field operative should consider whether any other hazards to any other exposed group exist, for example, hang gliders, abseiling and others.
c) The control option data is then inserted into the TLRMS PC and the risk and control data options exported to an Excel spreadsheet.
d) This data is then ranked by:
   * Worst Electricity Supply Association of Australia (CB1) infringement per span
   * Worst Electricity Supply Association of Australia (CB1) infringement per linear metre
   * Greatest hazard reduction per dollar spent for design controls
e) Action budgets are formulated and plans made.
f) If a physical change is implemented then the design data needs to be altered and the TLRMS item re-run. If a procedural solution is adopted (for rare excessive conductor sags during extreme weather/load conditions) then this needs to be formally documented and implemented. Regular training and/or drills will be required.
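The conductor risk thresholds of Section 15.4.1 are simple sums of a vehicle (or bridge) height and a flashover distance, so they can be reproduced and checked mechanically. The sketch below is illustrative only. The helper `infringed_thresholds` and its example clearance are hypothetical and are not part of the TLRMS; the heights and flashover distances are those quoted in the text. Threshold C2 (7.5 m) is omitted because the text gives it as an ESAA guideline value rather than as a sum.

```python
# Heights and flashover distances quoted in the text, in metres.
MAX_VEHICLE_HEIGHT_M = 4.3      # above this a vehicle is over dimensioned
BRIDGE_DESIGN_HEIGHT_M = 5.5    # Australian bridge overpass design height
FLASHOVER_110KV_M = 0.25
FLASHOVER_220KV_M = 0.55
FLASHOVER_LIGHTNING_M = 1.2     # lightning strike, about 500 kV

# Threshold label -> (height, flashover distance), following Section 15.4.1.
THRESHOLD_INPUTS = {
    "A1": (MAX_VEHICLE_HEIGHT_M, FLASHOVER_110KV_M),
    "A2": (MAX_VEHICLE_HEIGHT_M, FLASHOVER_220KV_M),
    "B0": (MAX_VEHICLE_HEIGHT_M, FLASHOVER_LIGHTNING_M),
    "B1": (BRIDGE_DESIGN_HEIGHT_M, FLASHOVER_110KV_M),
    "B2": (BRIDGE_DESIGN_HEIGHT_M, FLASHOVER_220KV_M),
    "C1": (BRIDGE_DESIGN_HEIGHT_M, FLASHOVER_LIGHTNING_M),
}

# Round to centimetres to avoid floating point noise in the sums.
THRESHOLDS_M = {
    label: round(height + flashover, 2)
    for label, (height, flashover) in THRESHOLD_INPUTS.items()
}


def infringed_thresholds(clearance_m):
    """Return the labels of all thresholds that a measured clearance fails to meet.

    Hypothetical helper for illustration; not part of the original system.
    """
    return [label for label, limit in THRESHOLDS_M.items() if clearance_m < limit]


for label, limit in THRESHOLDS_M.items():
    print(f"{label}: {limit:.2f} m")

# Example: a span with 5.6 m of ground clearance (hypothetical value).
print(infringed_thresholds(5.6))
```

A span can then be graded by the lowest threshold it fails to meet, which is one way of expressing the severity of a clearance breach.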

15.5 Bushfire Risk Management

The need for risk management, or loss reduction, in bushfire prone areas is discussed at length in a paper written by the authors after the devastating Ash Wednesday bushfires (Anderson and Robinson 1984). The main objectives of such a bushfire risk management system would include:

i) Relating the costs of various bushfire protection methods to the vulnerability of threatened assets; in particular lives, property and areas of particular environmental/habitat significance.
ii) Documenting methods of environmental management towards an optimum level of bushfire prevention and safety. This will include both active and passive management items such as the application of planning controls and standards for road access and water supply reticulation reliability.
iii) Determining an appropriate balance between environmental conservation and fire hazard reduction practices.
iv) Concentrating on prevention measures within the ambit of local councils.

15.5.1 Assets

The main assets that need to be protected are lives, property (residential, commercial and municipal) and areas of high environmental/habitat quality. Identifying assets also identifies where fire protection needs to be concentrated.

15.5.2 Threat Assessment

To assess the threat to an asset, an estimate of the type, severity and frequency of hazards needs to be made. This would be based on history and Rural Land Mapping (which includes an assessment of fire hazard). Any information available on past incidents will be useful to this assessment.

15.5.3 Asset Exposure

As the fire reaches and grows beyond the controllable stage, the options for fire retardation decrease and the losses increase significantly. The probability of fire is dependent upon the supporting environmental conditions such as wind, temperature and combustible loading. There are generally three stages of fire growth that can be directly related to asset loss:

* The fire inception phase, in which there is very little loss.
* The minimum loss situation (also referred to as the Normal Loss Expectancy), defined as the largest loss expected under normal circumstances, which assumes no loss of life and minimal loss of property.
* The maximum loss situation (also referred to as the Maximum Foreseeable Loss), which is the worst-case scenario.

Exposure increases as housing density decreases and where clearings, adequate water supplies and access roads are lacking.

15.5.4 Protection Measures

Protection measures, both passive and active, were then proposed for both the Normal and Maximum Foreseeable Loss expectancies. Some examples are shown in the table below.

Prevention Measures Applicable At Various Stages of A Bushfire

[Table: rows are the fire stages; columns are the assets, with the headings Lives, Property, Urban, Rural, Significant Habitat/Environment, and Critical Facilities & Services. Entries are grouped below by fire stage where the stage can be identified.]

In the initial stages of the fire:
* Not at Risk
* Control burning off through enforcement. Clear roadsides as part of regular works program.

Once the fire has started to develop:
* Not at Risk
* Enforce the clearing around houses and in gutters, and ensure fire fighting equipment is kept for households in isolated areas. Make cleared areas available.

Once the fire is at a size at which it can be easily detected:
* Infra red fire towers minimise detection time. Increased surveillance, particularly on days of high fire danger (e.g. use a helicopter on days of total fire ban).

Once the fire has been detected:
* Not at Risk
* Rural residents' evacuation to towns (elderly, children); some may remain to protect houses.

Fire developed to the uncontrolled stage:
* Urban buildings protected as part of protection of critical areas; urban fringe buildings outside of protected area may be at risk.
* Complete evacuation to town, and clean up groups may be sent out after critical period to save houses. Some loss of houses, no loss of life.

Remaining entries (Lives: Rural, Significant Habitat/Environment, and Critical Facilities & Services), in the order printed:
* Not at Risk
* Not at Risk
* Control public access to habitat areas. Experienced fire crews to do maintenance/clearing in significant areas. Patrol areas in high fire risk areas.
* Firebreaks around habitat if minimal disturbance to occur within areas.
* Water supplies in habitat area or nearby areas, or provide an area for animal evacuation.
* Infra red fire towers.
* Not at Risk
* Population density ensures detection in which case they should be made fire resistant.
* If public in isolated areas, public knowledge of fire danger days important.
* Population density ensures detection in towns.
* Water pumps etc. should be fire proof to an appropriate level.
* Fire crews maintaining communication links to HQs and obtaining information on water supplies etc.
* Overall community info. system (e.g. siren) to alert to evacuation to town, or radio time for warning through fire danger period.
* Evacuation of residents to town; some loss of life to fire fighters but minimised with better equipment, knowledge of fire situation and behaviour.
* Fire crews in fire resistant tankers.
* Emergency water supplies operated by diesel pumps (say underground tanks).
* No one way roads; loop roads for 2 way access to all areas. Emergency water supplies.
* If possible ground crews dispatched to work on most significant areas (say sufficient).
* Protected by golf course on the north side of the township; this would act as firebreak. Critical buildings could be placed in park area; in the case where only minimal areas could be cleared the evacuation centre should be double bricked and sprinklered via underground piping, or alternatively the evacuation could be underground.
* No further protection possible.

15.6 Tunnel Risk Management

The following is summarised from Robinson, Francis & Anderson (2003). An initial vulnerability assessment was conducted as a completeness check to test for issues to be addressed. A very reduced sample for a tunnel is shown in the table below.
[Table: each threat (row) is rated x, xx or xxx against each asset (column).

Assets (columns):
* Travelling Public (including disabled, elderly, small children, people who behave erratically)
* Operator Staff (including contractors, breakdown services)
* Emergency Services (fire brigade, ambulance & police)
* Local Residents (air quality)
* Habitat/Environment
* Infrastructure & Third Party

Threats (rows):
* Motorcycle breakdown
* Passenger car breakdown
* Bus breakdown
* HCV load fire: stationary vehicle in free flowing traffic
* HCV vehicle fire: burning vehicle in stationary traffic
* Injury/entrapment accident - all lanes blocked
* Fatal accident - all lanes blocked
* Pedestrians in tunnel on walkway
* Cyclist in tunnel]

Sample Vulnerability Table

HCV (heavy commercial vehicle) fire, especially in stationary traffic, appears as critical (xxx) for three exposed groups and is analysed further. The figure below shows a preliminary cause-consequence model for a fire in a heavy commercial vehicle (HCV) in stalled traffic in a long two-tunnel system using longitudinal emergency ventilation (jet fans).

15.6.1 Loss of Control Point

The loss of control point appears to be that fire which overwhelms the usual air handling system. There are several arguments for this. The simplest, legally, probably revolves around confined spaces. The tunnels should only have sweet, decent air whenever they are occupied, even during a fire/smoke incident. Otherwise they would be considered a confined space. Emergency ventilation to prevent a situation becoming a confined space is an attempt to restore control and acts after the event.

On an open freeway a fire is mostly an isolated event, since the heat and smoke go up and exposed persons (beyond those trapped in the vehicle/s) basically stay away from the inferno until the brigade arrives or the fire burns out. In a tunnel this is potentially far more problematic because of the contained environment. Even an unmanaged 5 MW fire can create substantial problems for persons remote from the fire unless special precautions are taken. This means that it is the change of the tunnel environment by the fire that creates the loss of control.


[Figure: cause-consequence model, read from left to right. Values as printed.

* Threat controls: dangerous goods restrictions; non combustible vehicles
* Threat: fire in HCV in stalled traffic, 0.01 pa
* Precautions: usual ventilation/air handling; early automatic fire control including sprinklers/deluge systems; storm drainage deals with spilt fuel fire etc. Automatic fire control: 0.01
* Loss of control (manifest threat): smoke/fire overwhelms usual air handling systems (5+ MW fire?), 0.0001 pa
* Vulnerability controls: stalled traffic minimisation; manual efforts, deluge systems; fire brigade response; emergency evacuation systems; jet fans
* Hit (0.5): potential injuries and deaths, 0.00005 pa
* Near miss (0.5): null outcome, 0.00005 pa]

Preliminary Cause-Consequence Model for HCV Fire in a Tunnel in Stalled Traffic

Another way to think of this relates to different size fires in the tunnel. Suppose that a car engine catches fire, the driver pulls over and a passing truck driver stops and extinguishes the fire with a fire extinguisher. Other than the lane restriction and the possibility of collision, from the point of view of the tunnel environment there has been no loss of control, since the smoke and heat will have been dissipated in the overall tunnel air movement (piston effect of cars, the jet fans etc). However, there is a certain size of fire that will disrupt the air flow, place remote persons at risk and thus bring about the need to impose emergency measures, including an emergency ventilation system and the like. This appears to be the loss of control point.
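The numbers in the preliminary cause-consequence model for the HCV fire follow from multiplying the threat frequency by the successive failure fractions. The sketch below reproduces them; it is illustrative only, assumes the stages are independent, and uses function and variable names that are not from the original model.

```python
import math

# Values as printed in the preliminary cause-consequence model.
THREAT_FREQUENCY_PA = 0.01         # fire in an HCV in stalled traffic, per annum
P_AUTOMATIC_CONTROL_FAILS = 0.01   # automatic fire control does not contain the fire
P_HIT_GIVEN_LOSS_OF_CONTROL = 0.5  # vulnerability controls do not prevent harm


def cause_consequence(threat_pa, p_precaution_fails, p_hit):
    """Return (loss of control, hit, near miss) frequencies per annum.

    Assumes the precaution and vulnerability stages act independently.
    """
    loss_of_control_pa = threat_pa * p_precaution_fails
    hit_pa = loss_of_control_pa * p_hit
    near_miss_pa = loss_of_control_pa * (1.0 - p_hit)
    return loss_of_control_pa, hit_pa, near_miss_pa


loss_of_control_pa, hit_pa, near_miss_pa = cause_consequence(
    THREAT_FREQUENCY_PA,
    P_AUTOMATIC_CONTROL_FAILS,
    P_HIT_GIVEN_LOSS_OF_CONTROL,
)

print(f"loss of control: {loss_of_control_pa:.5f} pa")
print(f"hit:             {hit_pa:.5f} pa")
print(f"near miss:       {near_miss_pa:.5f} pa")
```

Written this way, the effect of a more or less reliable precaution can be explored by changing a single argument, since the hit frequency scales directly with the failure fraction of the automatic fire control.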

[Figure: sloping tunnel with the labels "Buoyancy Effect of Hot Combustion Gases" and "Jet Fans and Piston Effect"]

Fire in Downward Facing Tunnel

Since tunnels can slope, cars travel in different directions and hot air rises, the fire loss of control point for two tunnels is potentially different. It is likely to be more severe in the tunnel where vehicles travel downhill. As suggested in the diagram above, fire in the down tunnel is far more likely to produce turbulence and mixing.


There are three primary risk control regions.

15.6.2 Threat Reduction

Firstly, threat reduction; in this case reduce the source of fire, for example, combustible trucks with large combustible loads. Small fires in any vehicle may occur once every two months, in a heavy commercial vehicle say once per 10 years, and in stalled traffic say once in 100 years.

15.6.3 Precautions

Secondly, precautions such as deluge systems that can control fire before the normal air handling system is overloaded (small fires are safe fires). A further consideration is the size of the uncontrolled fires. Suppose the environment can be designed to manage, say, a 5 MW fire and the proposed deluge system can be relied upon to control the fire on 99% of the occasions on which it is called upon to act. Automatic activation is probably required to achieve such reliability. In legal terms, might this be considered to be beyond reasonable doubt?

15.6.4 Vulnerability Reduction

And thirdly, reduce vulnerability by ensuring no one is present during a fire (minimal stalled cars) and by the provision of emergency response, ventilation and evacuation systems. The critical scenario is high congestion with stalled traffic, meaning there are stopped vehicles both before and after the fire. This makes the use of the longitudinal (jet fan) emergency mode problematic, since it would blow smoke over one column of stopped traffic, hampering evacuation. That is, with stalled traffic and longitudinal emergency ventilation, a heavy commercial vehicle fire will expose a large number of people who would have to evacuate through a smoky environment on foot. To reliably achieve this is very, very difficult.

The lawyers (and regulators to whom such arguments have been presented) have always confirmed that precautions implemented before the loss of control point are the best place for the precautionary dollar. Complex, expensive, hard to model and unpredictable emergency measures, invoked after the loss of control point in an attempt to bring a situation back under control, are legally difficult to defend, especially when a sensible pre-loss of control point precaution was available.

It is of course necessary to acknowledge and verify the reliability of the actual automatic systems that are proposed. Complex systems require commensurate safety assurance, such as through obtaining a Safety Integrity Level (SIL) pursuant to the Functional Safety Standard IEC (AS) 61508.

REFERENCES

Anderson K J and R M Robinson (1984). A Proposal for the Development of a Strategy Role in Bushfire Loss Reduction. Engineers Australia Local Government Conference, Melbourne.

Anderson K J, R M Robinson and D J Hyland (1992). Ranking of Infrastructure Renewals Taking into Account the Business Requirements of the Railway. CompRail 92 Conference, Washington.

Houbaer R and M Seddon (1995). Risk Management of Transmission Line Clearances in the Hydro-Electric Commission of Tasmania. Hydro-Electric Commission, Tasmania.

Jarman M, C Tillman and R Robinson (1989). Management of Building Fire Risks through Quantified Risk Assessment Techniques - A case study at Monash University. NSCA Convention, Monash Univ.

Jones K, K Anderson, W Ely and R Phillips (1995). Application of Risk Analysis to Airspace Planning. ICAO Review of the General Concept of Separation Panel (RGCSP), Gold Coast, Australia.

Robinson Richard M, Gaye E Francis, Kevin J Anderson (2003). Lessons from Cause-Consequence Modelling for Tunnel Emergency Planning. Proceedings of the Fifth International Conference on Safety in Road and Rail Tunnels. University of Dundee. pp 149-158. ISBN 1 901808 22 X.

Victorian Occupiers Liability Act (1983). Now incorporated as Part IIA of the Wrongs Act (1958) as amended in 1989. (Reprint No. 6. 15 January 1992).

Victorian Occupational Health & Safety Act (1985). Act No. 10190/1985 (Reprint No. 5. 17 November 1998).

16. Occupational Health & Safety

16.1 Legislative Framework

16.1.1 History

Early Occupational Health & Safety legislation followed on the heels of the industrial revolution. It was generally very prescriptive and detailed, and was largely aimed at factories and shops. In the 1960s it was becoming increasingly obvious that prescriptive legislation could not keep pace with social, economic and technological change. Attempts at doing so had resulted in a huge volume of sometimes complex and rigid regulations. Consequently, in the early 1970s the British Government established a Committee of Inquiry, chaired by Lord Robens, to review OH & S in the UK. The report of this review (Robens, 1972), which came to be known as the Robens Report, was extremely influential in the reform of OH & S, not only in the UK but also in Australia, Canada and many other countries. All Australian States and Territories followed in the footsteps of the UK during the 1970s and 80s in a total overhaul of their OH & S legislation and regulatory framework.

The other development during the last century was the establishment in some countries, including Australia, of Workers Compensation systems and laws. These had their origin not in the UK but in Germany in the 19th century. Prior to the establishment of Workers Compensation schemes in Australia, the only avenue for injured workers to recover costs associated with their injury was to sue their employer under Common Law. This meant the injured employee had to prove that, on the balance of probabilities, the employer had been negligent. For many injured employees, taking legal action was beyond their financial means, and even if they could afford it they risked having court costs awarded against them if they failed to prove negligence. Hence it was often not worth taking this risk if the amount of potential damages was not much greater than the court costs.

The other problem with the Common Law system, which is why some States have removed or reduced the rights of workers to sue under Common Law, is that it takes many years for a Common Law claim to be decided. In the meantime there is an incentive, in the form of increased damages, for workers to remain injured; that is, it is counterproductive in terms of rehabilitation. Because of this, Workers Compensation legislation in Australia places an emphasis on the rehabilitation of injured workers.

16.1.2 Acts, Regulations and Codes of Practice

In Australia, Occupational Health & Safety is regulated by the States and Territories. In other words, they have the responsibility of making and enforcing the OH & S laws in the form of Acts and Regulations. Each State and Territory has an OH & S Act which sets out the general requirements for ensuring safe and healthy workplaces. These Acts establish the structure and define the responsibilities for achieving this goal. They define the government bodies responsible for OH & S and specify the duty of care required of employers, employees and others who may have an impact, by their acts or omissions, on workplace health & safety. Such people may be contractors, designers, suppliers or manufacturers.

The main objectives of OH & S legislation are to ensure the safety, health and welfare of people at work and to eliminate risks to health and safety from the workplace. However, many of the OH & S Acts extend the duty to persons at the workplace other than the employees. Hence retailers have a duty towards customers on their premises and educational institutions have a duty to their students. This is simply a reinforcement of the Common Law Duty of Care.

Regulations can be made to support the OH & S Act. In some States and Territories there are OH & S Regulations which deal with a large number of hazards and issues, whereas in some jurisdictions the regulations are hazard specific, eg Noise regulations, Asbestos regulations, Plant regulations. The Regulations specify in more detail the steps that must be taken to control specific hazards, and by whom.

In some States, Regulations may be supported by Codes of Practice. These are basically practical "how to comply" documents with a lot of useful advice on assessment and control.


16.1.3 Standards and Guidance Documents

The National Occupational Health & Safety Commission (NOHSC) draws up National Standards in consultation with State/Territory Health & Safety Authorities, employee unions and employer organisations. These are adopted into their legislation by the States/Territories or called up by them, as is the case for the National Standards for Atmospheric Contaminants in the Occupational Environment (NOHSC, 1995).

Standards produced by Standards Australia and other organisations provide technical and design advice. Some are safety related, such as those dealing with fire safety and emergency standards, and many others contain some health & safety provisions. There are also many other codes, standards and guidance notes in the public domain, some produced by authorities such as NOHSC and others by bodies such as professional and industry associations.

The legal framework is represented in the figure below:

[Figure: Legal Framework]

16.1.4 Compliance

Compliance with Acts and Regulations is mandatory, whereas compliance with all the other types of document mentioned above is generally not mandatory unless the document is called up by an Act or a Regulation. However, Codes and Australian Standards can be used as evidence in court to demonstrate what could have been done, that is, as a form of best practice. Compliance is desirable unless another solution or precaution achieves an equal or better outcome.


16.1.5 Extent of General Duties

The wording of the General Duties of Care in Australian OH & S legislation varies between jurisdictions. For example, in Victoria employers must provide a safe and healthy work environment "so far as is practicable", whereas in South Australia the extent of the duties is "so far as is reasonably practicable". In some States there is no such qualification, so that the duty imposed is absolute. However, to date there is no evidence that these differences have led to a higher compliance standard being enforced in one State than in another.

Practicable is defined (Occupational Health & Safety Act, 1985) as having regard to:
(a) the severity of the hazard or risk in question;
(b) the state of knowledge about that hazard or risk and any ways of removing or mitigating that hazard or risk;
(c) the availability and suitability of ways to remove or mitigate the hazard or risk; and
(d) the cost of removing or mitigating that hazard or risk.

In general the extent of the duties appears to correspond to the Common Law Duty of Care in all Australian jurisdictions. However, there are significant differences between jurisdictions when it comes to regulations, and this can cause added complexity for companies operating across borders.

16.1.6 Penalties and Interventions

Breaches of OH & S legislation can result in fines being imposed, generally through proceedings in a Magistrates Court. The legislation also provides for inspectors and, in some States, other parties such as Health & Safety Representatives, to issue Improvement Notices or Prohibition Notices. An Improvement Notice requires an employer to take specified actions within a stipulated time period. A Prohibition Notice requires work to cease until specified remedies have been implemented. It is important to be aware of the rights and powers conferred on certain types of individual under OH & S legislation, as hindering these people or failing to respond to notices is usually also an offence.

16.1.7 Definition of Employer

Whilst employees are in no doubt as to their status as employees under OH & S legislation, the question of which employees, if any, could also be deemed to be the employer generally causes more anxiety. The interpretation that is now generally applied is that anyone in a management or supervisory role, that is, anyone who is involved in the management of others, could be an employer. There have not been many cases where middle or lower managers or supervisors have been prosecuted for OH & S breaches. It would appear that for this to occur the manager must have knowingly issued instructions, or omitted to take action, in a way that s/he knew was in violation of company policy or OH & S requirements; in other words, s/he knowingly, by act or omission, put others at risk.

16.2 OH & S Risk Assessment

Most Australian legislation specifies that a process of hazard identification, risk assessment and risk control must be undertaken. In most instances the risk assessment methodology used is the risk matrix approach from the Australian Risk Management Standard although this Standard presents the matrix as one of several methods that can be used. The matrix approach has already been described in Chapter 7. In the OH & S context hazards are usually categorised using the energy-based classification described in Section 5.5. In our experience risk assessments are often worthless or worse, lead to efforts and expenditure being targeted inappropriately, because the hazard or vulnerability has not been properly defined. Furthermore the estimation of consequence or likelihood is often attempted using the qualitative scales given in the Standard and this then becomes a very subjective process.


Sometimes, where there are several vulnerabilities from the one hazard, a critical vulnerability can be overlooked. For example, a risk assessment of liquid nitrogen use in a laboratory dealt with the risk of liquid nitrogen burns but did not deal with the risk of asphyxiation, presumably because the controls were believed to be more than adequate as they were of best practice standard (good ventilation, backup ventilation, 24hr monitoring of ventilation, oxygen monitoring and alarms). In the event, all the controls failed to one degree or another and a worker died. In hindsight it would have been better to focus risk management resources on those vulnerabilities that had the potential for greatest consequence.

The legislation requires that risk control must be based upon the Hierarchy of Controls, which is defined in Victoria as being, in order of most to least preferred:
1. Elimination
2. Substitution
3. Engineering controls
4. Administrative controls
5. Personal protective equipment and clothing
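The ordering in the Hierarchy of Controls can be sketched as a simple ranking of candidate controls. This is an illustrative sketch only: the names `ControlType` and `rank_controls` and the candidate controls listed are hypothetical and are not drawn from any legislation or from the case described above.

```python
from enum import IntEnum


class ControlType(IntEnum):
    """Hierarchy of Controls as listed for Victoria; lower value = more preferred."""
    ELIMINATION = 1
    SUBSTITUTION = 2
    ENGINEERING = 3
    ADMINISTRATIVE = 4
    PPE = 5  # personal protective equipment and clothing


def rank_controls(candidates):
    """Sort (description, ControlType) pairs from most to least preferred."""
    return sorted(candidates, key=lambda item: item[1])


# Hypothetical candidate controls for a cryogenic liquid handling task.
candidates = [
    ("Oxygen monitoring and alarms", ControlType.ENGINEERING),
    ("Insulated gloves and face shield", ControlType.PPE),
    ("Written handling procedure and training", ControlType.ADMINISTRATIVE),
    ("Use a mechanical freezer instead of the cryogenic liquid", ControlType.SUBSTITUTION),
]

ranked = rank_controls(candidates)
for description, control_type in ranked:
    print(f"{control_type.value}. {control_type.name.title()}: {description}")
```

The ranking only orders the options by type; it says nothing about whether a given control is practicable or reliable, which still has to be judged separately.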

There are small variations to this in other states/territories.

The legislative requirement to carry out risk assessments, which must be documented to prove that they have been done, can result in an extremely large list of controls that need to be implemented. The authors' belief is that the best use of resources is frequently obtained by ignoring the risk assessment stage and going straight to the identification of risk mitigating controls/precautions. It is interesting to note that this concept has now been adopted in the UK and elsewhere with respect to substances where inhalation exposure is one of the main risks (IOHA, 2002). The concept of control banding is an attempt at shifting the emphasis onto controls rather than risk assessment by simplifying the risk assessment, which for inhalation exposures amounts to exposure assessment.

16.3 Performance Indicators

There are a number of possible performance measures available to assess risk and reliability. Commonly used ones are based around:
- Fatalities (total number and frequency of occurrence).
- Injuries (total number, severity and frequency of occurrence).
- Statutory breaches (number and severity).
- Days gained or delayed (especially for projects and contracts).
- Dollars (gained or lost).
- Availability (% time operating).

These measures can be per period (per day, week, month or year) for an organisation or for a particular contract or project. A number of the more commonly used formulations follow.

16.3.1 Fatality Risk

A common form of assessing fatality risk is:

$$\text{Fatality risk from an activity} = \frac{\text{Number of deaths per annum from that activity}}{\text{Exposed population}}$$
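As a minimal sketch of the calculation, the ratio can be computed directly. The function name `fatality_risk` and the figures used (3 deaths per annum in an exposed population of 150,000) are hypothetical and chosen only to illustrate the arithmetic.

```python
def fatality_risk(deaths_per_annum: float, exposed_population: int) -> float:
    """Individual fatality risk per annum = deaths per annum / exposed population."""
    if exposed_population <= 0:
        raise ValueError("exposed population must be positive")
    return deaths_per_annum / exposed_population


# Hypothetical figures, for illustration only.
risk = fatality_risk(deaths_per_annum=3, exposed_population=150_000)
print(f"Fatality risk: {risk:.1e} per person per annum")
```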

Obviously, this is only statistically significant if the exposed population is of a reasonable size. Sometimes if the number of lives at risk can be assessed, attempts are made to assess the value of human life in financial terms. Ramachandran (1995) summarises the five methods his research shows are used to value human life.


i) Gross Output. This examines the gross output, based on goods and services, that a person can produce if not deprived, by death, of the opportunity to do so. This gives a relatively small value to a human life.

ii) Livelihood Approach. This is not altogether different from the output approach and assigns value in direct proportion to income. This also gives a relatively small value to a human life. It favours the higher paid over the lower paid.

iii) Insurance Method. This uses the value of life insurance policies purchased by individuals. This is a form of self valuation but has constraints in that what one person thinks their life is worth and what they can actually afford may be quite different.

iv) Court Awards. This involves the awards given to the heirs of the deceased person.

v) Willingness to Pay. This approach to valuing life rests on the principle that living is generally an enjoyable experience for which people are willing to sacrifice other activities such as consumption. That is, how much people are willing to pay to feel safe. It reflects the notion of consumer sovereignty.

16.3.2 Lost Time Injury Frequency Rates

There have been attempts to reduce injury statistics to single numbers to compare the performance of organisations. This does not seem to have been hugely successful. Consider, for example, the use of a measure called the Lost Time Injury Frequency Rate (LTIFR) for OH & S performance described in the Australian Standard AS 1885.1-1990. The LTIFR is calculated as the number of incidents in a given period in which more than a day was lost, per million hours worked. The Lost Time Injury Rate (LTIR) is defined as the occurrence of lost time injuries per 100 workers.

Even if actual days lost (per million hours worked) is used as a measure of risk, care needs to be taken with a cash flow view compared to an accruals view. The figure below represents four work injuries that occurred over three years. Each has a different duration as shown. The diagram indicates that there were three incidents in the year 2002/03, with the days lost being shown in the light grey hatching. Incident 2 was carried over from 2001/02, and incident 4 was carried over into 2003/04 and extended the whole year and beyond.

Schematic of Four Injuries that Occurred over Three Years

A consequence of the cash flow approach is that a death is measured as a loss of one man-year, which is regarded as a ridiculously low value. For example, this compares to a debilitating back injury extending over several years (which would be brought to account each year) and from which a complete recovery was made, something like incident 4. This means that the focus of companies that use a concept like the Lost Time Injury Frequency Rate, or any cash flow basis of risk accounting, would be on high frequency, low severity events, rather than on high severity (fatality), low frequency events, which are the primary focus of regulators and the courts.
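The following sketch illustrates the LTIFR calculation and the difference between booking days lost to the year in which they fall (cash flow) and booking them all to the year in which the injury occurred (the accruals alternative that the text goes on to discuss). The injury dates and durations are hypothetical and are not the four injuries in the figure. The sketch also assumes a July to June financial year and counts calendar days rather than working days.

```python
from datetime import date, timedelta


def ltifr(lost_time_injuries: int, hours_worked: float) -> float:
    """Lost time injuries per million hours worked."""
    return lost_time_injuries * 1_000_000 / hours_worked


def financial_year(d: date) -> str:
    """Label of the July-to-June financial year containing d, e.g. '2002/03'."""
    start = d.year if d.month >= 7 else d.year - 1
    return f"{start}/{(start + 1) % 100:02d}"


def days_lost_by_year(injuries, basis: str) -> dict:
    """Total days lost per financial year.

    injuries: iterable of (start_date, days_lost) pairs.
    basis: 'cash flow' books each day to the year in which it falls;
           'accruals' books all days to the year in which the injury occurred.
    """
    totals = {}
    for start, days in injuries:
        if basis == "accruals":
            fy = financial_year(start)
            totals[fy] = totals.get(fy, 0) + days
        elif basis == "cash flow":
            for offset in range(days):
                fy = financial_year(start + timedelta(days=offset))
                totals[fy] = totals.get(fy, 0) + 1
        else:
            raise ValueError("basis must be 'cash flow' or 'accruals'")
    return totals


# Hypothetical injuries: (date of injury, calendar days lost).
injuries = [
    (date(2002, 6, 1), 60),   # spans the end of 2001/02
    (date(2003, 5, 1), 400),  # runs through all of 2003/04
]

cash_flow = days_lost_by_year(injuries, "cash flow")
accruals = days_lost_by_year(injuries, "accruals")
print("Cash flow:", cash_flow)
print("Accruals: ", accruals)
print("LTIFR for 3 injuries in 500,000 hours:", ltifr(3, 500_000))
```

On the cash flow basis the long injury shows up in two different years, while on the accruals basis the whole of it is attributed to the year in which it occurred.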


An alternative proposal is for an accruals basis using days lost. So in the case above the 2002/03 year has only two incidents that actually occurred in it (labelled 3 and 4). However, the days incurred extend into 2004. The whole of this amount would be brought to account in the 2002/03 year, and 2003/04 would be deemed to have no injuries. Since an accruals basis of accounting is the one most organisations use, and the one which the whole organisation is usually trained to understand, using a cash flow basis for injury measurement seems curious.

A detailed discussion of this sort of problem and other difficulties associated with the use of existing injury indicators is contained in the WorkSafe Australia (1994) documents entitled Positive Performance Indicators, Beyond Lost Time Injuries.

16.4 Information Structures

This section actually addresses a larger risk management domain than OH & S, but this seems to be the context in which it is most frequently raised.

16.4.1 Hazards (Vulnerabilities), Incidents and Risk

The relationship between hazards (vulnerabilities) and incidents requires clarification. From a control viewpoint, it is always better to focus on preventing hazards rather than managing incidents. This can perhaps best be explained as follows.

$H_i$ = particular or specific hazard (for $i = 1$ to $n$ hazards)
$\{H_i\}$ = set of all known hazards
$I_j$ = particular or specific incident (for $j = 1$ to $m$ incidents)
$\{I_j\}$ = set of all known incidents

$n$ is much larger than $m$, hence $|\{H_i\}| \gg |\{I_j\}|$. For every $I_j$ there is a particular $H_i$, but not vice versa.

[Figure: Relationship of Incidents to Hazards or Vulnerabilities. Risk curve with axes Frequency and Severity, showing Ij and Hi.]

A pictorial representation on a risk curve is shown in the figure above. Note that the focus of risk management should be on the set of all hazards. The set of all possible incidents is in fact identical to the set of all hazards, except that over a particular time period most have a null rather than an actual outcome.


For example, if a company were exposed to i hazards in a defined period, say a year, then the set of hazards {Hi} would be represented as {H1, H2, H3, H4, H5, ...}. If we then look at data for a particular year there might have been only three actual incidents. These could be represented by {Ij} = {I1, I2, I3, I4, I5, ...}. However, since only three of the incidents do not have a null outcome, it would be better represented as {Ij} = {I2, I4, I10}.

The risk associated with each hazard and incident is the product of likelihood and severity. That is, how likely it is to occur and how many days are lost, for example, if it occurs. In the case of an incident, the likelihood of occurrence will be 1 for an incident that has occurred (or the number of occurrences, where it has occurred more than once) and 0 for one that has not occurred. The table below sets this out using some hypothetical figures. Note that the null incidents are also shown.
       HAZARDS (H)            INCIDENTS (I)       CLAIMS (C)          JUDICIAL PROCEEDINGS (J)
No.    Lik.    Sev.   Risk    Lik.  Sev.   Risk   Lik.  Sev.   Risk   Lik.  Sev.   Risk
1      0.1     2      0.2     0     2      0      0     2      0      0     2      0
2      0.2     3      0.6     1     3      3      1     3      3      0     3      0
3      0.05    50     2.5     0     50     0      0     50     0      0     50     0
4      0.3     2      0.6     2     2      4      1     2      2      1     2      2
5      0.65    13     8.45    0     13     0      0     13     0      0     13     0
6      0.025   260    6.5     0     260    0      0     260    0      0     260    0
7      0.001   1500   1.5     0     1500   0      0     1500   0      0     1500   0
8      0.45    0.5    0.23    0     0.5    0      0     0.5    0      0     0.5    0
9      0.01    6      0.06    0     6      0      0     6      0      0     6      0
10     0.5     60     30      1     45     45     1     45     45     1     45     45
11     0.005   100    0.5     0     100    0      0     100    0      0     100    0
:
i, j   0.003   1      0       0     0      0      0     0      0      0     0      0
Total                 51.1                 52                  50                  47

Lik. = likelihood (for incidents and occurrences, claims and judicial proceedings, the number of occurrences); Sev. = severity; Risk = likelihood x severity. Rows are numbered H1 to H11 for hazards, I1 to I11 for incidents and occurrences, C1 to C11 for claims and J1 to J11 for judicial proceedings.

Event Horizon: <<< Pre-Event Control (hazards) / Post-Event Management (incidents, claims, judicial proceedings) >>>

Concept Hazard (or Vulnerability) Register

In this particular example the total risk due to the hazards is 51.1, which represents a theoretical loss of fifty one days. The incidents that are recorded show that fifty two days were lost, although only three of the potential hazards actually caused incidents. In fact, if there is a statistically large enough number of hazards, then the sum of the probabilised outcomes of the hazard set should be equal to the sum of the actual incidents experienced. This means that with a large amount of data over a long period of time it is possible to determine the probable risk loss, based on the following formula:

$$\sum_{i} \text{Risk}\{H_i\} = \sum_{j} \text{Risk}\{I_j\}$$

The focus is then on reducing the probable risk amount, which in turn will reduce the actual risk loss due to incidents occurring.
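The comparison between the probabilised hazard total and the actual incident total can be sketched using the hypothetical figures from the register above. The dictionary layout and the function name `total_risk` are illustrative assumptions; null incidents (likelihood 0) are omitted because they contribute nothing to the sum.

```python
# Hazard register: name -> (likelihood in the period, severity in days lost).
hazards = {
    "H1": (0.1, 2), "H2": (0.2, 3), "H3": (0.05, 50), "H4": (0.3, 2),
    "H5": (0.65, 13), "H6": (0.025, 260), "H7": (0.001, 1500),
    "H8": (0.45, 0.5), "H9": (0.01, 6), "H10": (0.5, 60), "H11": (0.005, 100),
}

# Incidents that actually occurred: name -> (occurrences, days lost per occurrence).
incidents = {"I2": (1, 3), "I4": (2, 2), "I10": (1, 45)}


def total_risk(register: dict) -> float:
    """Sum of likelihood (or occurrences) x severity over every entry."""
    return sum(likelihood * severity for likelihood, severity in register.values())


expected_loss = total_risk(hazards)   # probabilised loss from the hazard set
actual_loss = total_risk(incidents)   # loss actually experienced

print(f"Probable risk loss: {expected_loss:.1f} days")
print(f"Actual loss:        {actual_loss} days")
```

With only eleven hazards and a single period the two totals need not agree closely; the equality in the formula is a long-run expectation over a large data set.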


16.4.2 Coordinated Information

To ensure that the information regarding losses, incidents, near misses and control system failures is effectively recorded, a coordinated risk information system needs to be available. This may be part of other information systems, but its definition needs to be independently developed to support the predicted credible loss scenarios (especially legal and insurance details) and to identify in a timely manner any emerging, unpredicted hazards or vulnerabilities.

In terms of a strategic risk management control system, names need to be given to different parts so that it is clear to everyone what is being discussed. There are several possible good solutions to the naming issues, so the principle of Ockham's Razor (footnote 1) has been applied. Broadly this means: choose the simplest answer unless a reason to select a more complex one is discovered. In information terms for risk management this can mean:
- Hazards
- Incidents
- Loss of Control (near misses, no loss)
- Losses (death, injury, medical cost, damage, statutory breach)
- Claims (insurable losses)
- Courts

[Figure: Hazard (Risk) Information Framework. The categories above are arranged along a time axis. Other labels: Hazards, Control System Failures, Risk Control Management Efforts, Risk Management Information System.]

There can be discussion about the desirability of including control system failures in Incidents, especially if there were other parallel control systems in place which prevented the loss of control so that a near miss occurred, or if the hazard did not occur whilst the control system was not operational. The authors believe any control system failures ought to be recorded, as a significant increase in these is indicative of the health or otherwise of the control system. In cause-consequence terms it means incidents are all those items shown in the larger shaded area below.

[Figure: Concept Cause-Consequence Diagram for Information Framework. Labels: Hazards, Control System Failure, Loss of Control, Near Miss, Loss, Incidents.]

In practice the documentation process outlined in the following figure is needed.

Footnote 1: Ockham's Razor. The usual formulation of the principle of ontological economy attributed to William of Ockham is "Entia non sunt multiplicanda praeter necessitatem", or "Entities are not to be multiplied beyond necessity".



[Figure: Strategic Information System. Labels: Incident Pro Forma (date, event type, damage, location); Event Lookup Tables (Exchange, Treasury, Collision, Production, Dang. Goods, Shipping, OHS&E); Co-ordination analysis; Vulnerability Register; Loss Calculator; Cause-consequence Models incorporating energy-damage and time-sequence analysis concepts (Fire Model, Collision Model etc); Review by Period (likelihood p.a. against consequence in '000$); Feedback by Event Type (hazard, control, incident, loss, null); Reporting by Region per Period; outputs: summary report, recommendation, advice.]

A key element is the need for the co-ordinator(s) to assess the significance of each incident. For example, the authors have noted many cases where an incident such as a broken rail is given the same rating irrespective of the location, be it a remote siding or a busy main line with many high speed passenger trains. Obviously a sudden increase in main line breaks is of considerably greater concern than a similar increase on rarely used sidings. To obtain this understanding requires a co-ordination review that reclassifies, on the basis of current operation, the risk associated with each event. Then, and only then, can a review by period have meaning.

16.4.3 An Integrated Concept of Risk & Reliability Information Management

The figure below describes an understanding of how the different processes and techniques described in this text fit within a large organisation, and how the information flows occur.
[Figure: An Integrated Concept of Risk & Reliability Information Management. Labels: Board and CEO (Policy); Top Down; Vulnerability Analysis, SWOT Analysis, Underwriting Assessment, Availability Assessment; Crisis Management; Review; Co-ordination; Operations & Maintenance; Feedback; Control; QRA, Hazops, RCM, Job Safety Analysis (JSA); Detectability, Reliability, Maintainability; Cause-Consequence Modelling etc; Reporting; Losses, Incidents & Breakdowns, Fire Fighting, First Aid, Judicial Actions, Insurance Payments; Bottom Up; Pre-event (Strategic), Event (Tactical), Post-Event.]

Interestingly, the process of risk related information management does seem to need the loop shown in the circle above. Note that this does not exclude one-off studies over any of the boundaries, which can be done at any time.


16.5 Audit & Safety Management Systems

There has been a continuing desire to develop systems that can provide advice as to the overall effectiveness of risk control systems. These have manifested themselves in various auditing and scoring systems.

16.5.1 SafetyMAP

The Victorian WorkCover Authority has developed a health and safety audit system whose purpose is to enable an organisation to:
a) Measure the performance of its health and safety program
b) Implement a cycle of continual improvement
c) Introduce recognised bench marking standards for health and safety
d) Gain recognition for its health & safety management standards.

It has five elements:
1. Health and safety policy
2. Planning
3. Implementation
4. Measurement and evaluation
5. Management review

Initial Level Certification requires an organisation to satisfy the requirements of 82 SafetyMAP audit criteria. The Victorian WorkCover Authority states that these criteria have been selected as encompassing the building blocks for an effective, integrated health and safety management system. Advanced Level Certification requires all 125 applicable SafetyMAP audit criteria to be in place.

Interestingly, this system is based on the concept of ensuring that the process (the presence and effectiveness of management systems) is sound and that therefore the proper results will follow. However, as the Victorian WorkCover Authority notes: "However conformance to SafetyMAP criteria, whether recognised by formal certification or other means, does not assure compliance with statutory obligations nor does it preclude any action by a statutory body."

The danger with such a system is that OH & S resources become focussed on preparing documentation rather than on action and prevention.


16.5.2 ISRS (International Safety Rating System)

This has been developed in various guises in different parts of the world. The manifestation described here is that by Det Norske Veritas (UK, 1996), who appear to have purchased the program of Frank E Bird, Jr's (1976) Atlanta-based International Loss Control Institute. The program is based on several key propositions:
1. Safety is good for business and profits.
2. Proactively managing loss is much better than reacting to events.
3. Losses are ultimately due to a lack of effective management systems.
4. An audit system can indicate the health of the proactive loss control management systems.

The following figure shows the time sequence model adopted.

[Figure: The DNV Loss Causation Model. Sequence: Lack of Control (1. Inadequate programme; 2. Inadequate programme standards; 3. Inadequate compliance standards) > Basic Causes (personal factors, job factors) > Immediate Causes (substandard acts and conditions) > Incident (contact with energy or substance) > Loss (people, property, process, environment, quality).]

The key program elements and points score/weighting are given in the table below. Recognition levels are scored out of 10.

ISRS Program Element                            Points
1.  Leadership and administration               1310
2.  Leadership training                         700
3.  Planned inspections and maintenance         690
4.  Critical task analysis and maintenance      650
5.  Accident/incident investigation             605
6.  Task observation                            450
7.  Emergency preparedness                      700
8.  Rules and work permits                      615
9.  Accident/incident analysis                  550
10. Knowledge and skill training                700
11. Personal protective equipment               380
12. Health and hygiene control                  700
13. System evaluation                           700
14. Engineering and change management           670
15. Personal communications                     490
16. Group communications                        450
17. General promotion                           380
18. Hiring and placement                        405
19. Materials and services management           615
20. Off-the-job safety                          240

ISRS Program Elements

Like the other audit systems, scoring a perfect 10 out of 10 does not mean that all legal duties have been met.
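To see how the points act as weightings, each element's share of the total can be computed from the table. This is only an arithmetic sketch: the variable names are illustrative, and the text does not state how points are converted into the recognition levels scored out of 10, so no such conversion is attempted here.

```python
# Points per ISRS program element, as listed in the table above.
isrs_points = {
    "Leadership and administration": 1310,
    "Leadership training": 700,
    "Planned inspections and maintenance": 690,
    "Critical task analysis and maintenance": 650,
    "Accident/incident investigation": 605,
    "Task observation": 450,
    "Emergency preparedness": 700,
    "Rules and work permits": 615,
    "Accident/incident analysis": 550,
    "Knowledge and skill training": 700,
    "Personal protective equipment": 380,
    "Health and hygiene control": 700,
    "System evaluation": 700,
    "Engineering and change management": 670,
    "Personal communications": 490,
    "Group communications": 450,
    "General promotion": 380,
    "Hiring and placement": 405,
    "Materials and services management": 615,
    "Off-the-job safety": 240,
}

total_points = sum(isrs_points.values())

# Relative weighting of each element as a fraction of all available points.
weights = {name: points / total_points for name, points in isrs_points.items()}

print(f"Total points available: {total_points}")
for name, weight in sorted(weights.items(), key=lambda item: item[1], reverse=True)[:3]:
    print(f"{name}: {weight:.1%}")
```

On these figures, leadership and administration carries nearly twice the weight of any other single element.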


16.5.3 The DuPont Safety Training Observation Program (STOP) System

STOP was developed by DuPont to provide a behaviour based observation program that may be used to improve safety in any organisation. This system is designed to be used by management at all levels. It is not really an audit system as such, although the authors have observed it being used in this capacity. STOP is based on a series of Safety Principles noted below:
- All injuries and occupational illnesses can be prevented.
- Safety is everyone's responsibility.
- Management is directly accountable for preventing injuries and occupational illnesses.
- Safety is a condition of employment.
- Training is an essential element for safe workplaces.
- Safety audits must be conducted.
- Safe work practices should be reinforced and all unsafe acts and unsafe conditions must be corrected promptly.
- It is essential to investigate injuries and occupational illnesses, as well as incidents with the potential for injury.
- Safety off the job is an important element of the overall safety effort.
- Preventing injuries and occupational illnesses is good business.
- People are the most critical element in the success of a safety and health program.

[Figure: Safety Observation Cycle (STOP for Safety): Decide, Stop, Observe, Act, Report.]

The procedure to be used can be seen in the Safety Observation Cycle in the figure above. This shows the path of action, which starts with a manager deciding to observe an employee. The manager must then stop and watch the employee carry out their job, particularly noting how the employee does or does not adhere to safe working practices. The manager then needs to approach the employee and discuss their working practices, reinforcing the safe ones as well as addressing the unsafe. The manager then needs to report the situation appropriately to their superiors.

In an Australian cultural context, the system has various degrees of success attributed to it. Certainly, the authors have noted that if it becomes known as the 'dob-a-mate' technique then it seems to be a cultural anathema and a failure. Conversely, if it becomes a 'look-after-your-mate' process then it seems to have a good chance of being effective.


16.5.4 NSCA 5-Star Health & Safety Management System

The NSCA 5-Star Health & Safety Management System was developed by the National Safety Council of Australia to identify the elements of a complete OH & S program. It also provides organisations with a framework for improvement and quantitative measurement of OH & S performance. The system uses 60 Key Elements considered to be a comprehensive and exhaustive set of risk management components for any organisation in any aspect of business. These 60 Key Elements are grouped into 5 categories in the NSCA system. The categories are as follows:
1. Policy, Organisation & Program Management
2. Management of Health & Safety Risks
3. Control of Specific Work Risks
4. Working Environment
5. Emergency Preparedness & Management

There are six star gradings, from zero to five. Grading Audits are conducted within an organisation on an annual basis and assessed according to the Key Elements. A star grading is awarded after each annual grading audit to record an organisation's standard of achievement in implementing best practice levels of risk management. A One Star grading means that the organisation's OHS system is better than approximately 50% of other organisations. Correspondingly, a Five Star grading means that the organisation is in the top 2-5%. The Key Element Score (KES) and the Injury & Illness Statistics Index (IISI) are then used to assess the current state of an organisation and allocate a Star Grading.

Star Grading    KES%
0 Star          00-49
1 Star          50-59
2 Star          60-69
3 Star          70-79
4 Star          80-89
5 Star          90-100
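The KES bands listed above can be read as a simple lookup. The sketch below is illustrative only: the function name is hypothetical, it ignores the IISI (which the NSCA system also uses when allocating a grading), and it assumes that a fractional score falls into the band whose lower bound it has reached.

```python
def star_grading_from_kes(kes_percent: float) -> int:
    """Return the star grading (0 to 5) for a Key Element Score in percent.

    Assumption: only the KES bands are applied; the IISI is not modelled.
    """
    if not 0 <= kes_percent <= 100:
        raise ValueError("KES must be between 0 and 100 percent")
    # Lower bounds of the KES bands for 1, 2, 3, 4 and 5 stars.
    lower_bounds = [50, 60, 70, 80, 90]
    # Each lower bound reached adds one star.
    return sum(kes_percent >= bound for bound in lower_bounds)


if __name__ == "__main__":
    for score in (45, 50, 72, 95):
        print(score, star_grading_from_kes(score))
```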

The benefits of using the NSCA 5-Star Health & Safety Management System are described as:

1. A better measurement of performance
2. Independent assessment
3. International recognition
4. Improved management skills and communication
5. Improved employee involvement

The NSCA 5-Star System states the following in terms of legal obligations: "The organisation's standards of health, safety and environmental risk management are normally based on continuous improvement above the legal statutory minimum obligations up to international "best practice". Where national/international standards are incomplete, or unacceptably low or non existent, NSCA 5-Star System (Version 2) assists an organisation define its own standards based on its corporate structure."


REFERENCES

Bird, Frank E, Jr (1974). Management Guide to Loss Control. International Loss Control Institute, Georgia, USA.
DuPont Safety and Environmental Management Services. DuPont STOP for safety system (supervision). 1986, Revised 1992 and 1995.
IOHA (2002). Report of the International Control Banding Workshop. International Occupational Hygiene Association, London 2002.
National Safety Council of Australia. 5-Star Health and Safety Management System. Version 2 (1995).
NOHSC (1995). National Standards for Atmospheric Contaminants in the Occupational Environment, NOHSC:3008. National Occupational Health and Safety Commission, Canberra.
Ramachandran G (1995). Value of Human Life. Society of Fire Protection Engineers Handbook (1995). Section 5, Chapter 8. Society of Fire Protection Engineers, Boston.
Robens, Lord (1972). Committee on Health and Safety at Work, Report. HMSO, London.
Standards Australia/Standards New Zealand (1999). Risk Management. Australian/New Zealand Standard AS/NZS 4360:1999.
WorkSafe Australia (National Health and Safety Commission) (1994). Positive Performance Indicators, Beyond Lost Time Injuries, Part 1 - Issues. ISBN 0 644 35266 3. The Commonwealth of Australia.
WorkSafe Australia (National Health and Safety Commission) (1994). Positive Performance Indicators, Beyond Lost Time Injuries, Part 2 - Practical Approaches. ISBN 0 644 35267 1. The Commonwealth of Australia.
Victorian Occupational Health & Safety Act (1985). Act No. 10190/1985 (Reprint No. 5, November 1998).
Victorian WorkCover Authority (2002). A Guide to Occupational Health and Safety Management Systems. SafetyMAP (4th Edition).

READING

The NOHSC web site (http://www.nohsc.gov.au/) provides a lot of useful information as well as links to all the State/Territory Authority web sites.


17. Financial Risk

The good news is that risk can have its speculative as well as negative aspects. It can offer business opportunities. The more successful companies become at identifying and managing risk, the bigger the comparative advantage they gain.

17.1 Terms

Banks and large-scale financial institutions have only comparatively recently started to focus on risk-adjusted return measures on capital rather than purely a return on asset or book equity (Smithson 1997). In doing so, they appear to be creating a new lexicon. Terms currently used include:

VAR       Value at Risk
EAR       Earnings at Risk
Raroc     Risk adjusted return on capital
Rorac     Return on risk adjusted capital
Rarorac   Risk adjusted return on risk-adjusted capital

Economic Capital = credit risk capital + market risk capital + operational risk capital.

These are obviously designed to take the costs-of-risk into account in terms of the financial institution's business. In part they try to answer the question: How much has been earned for the risks that have been taken?

The problem of advisors and managers taking extreme chances with someone else's money is always real. If everything goes well, everyone profits from such extreme risk taking (blue sky), but if matters sour it is the shareholders and investors that lose money, not the advisors or managers.

Based on perusal of US magazines like Risk and Financial Derivatives and Risk Management, there are remarkable pockets of extraordinarily sophisticated statistical modelling occurring. But whether this is truly cost effective is difficult for an outsider to know. There are editorials reporting that some managers feel that using derivatives destroys shareholder value through the costs of dealing, monitoring the transactions, and management time (Cooper, 1997). That is, the costs of managing risk can exceed the reduction in the costs-of-risk. Further, such activities impose risk on others.

17.2 Hedge Funds

In September 1998 the world came within a whisker of meltdown (David Thomas 1999). It arose because Long Term Capital Management (LTCM), a US-based hedge fund, went to the wall. US hedge funds reportedly manage up to $1 trillion US dollars, most (up to 90%) of it borrowed. They insert this in various markets, acquiring around $10 trillion worth of exposures. To put this in perspective, the GNP of the US is reported to be about $7 trillion.

These hedge funds are secretive things. Provided there are fewer than 100 investors, they do not have to report to the US government who these investors are, how much money they raise and how they invest it (Browning, 1998). Basically, it seems that despite (or perhaps because of) having Nobel Prize winning economists on staff, LTCM punted and lost. By September 1998, it had lost about 90% of its capital. However, rather than let the company remain subject to market forces and go belly up, the US financial community and Wall Street authorities, with the US Federal Reserve leading, provided enough capital (US$3.65 billion) for the hedge fund to be salvaged. The idea was to prevent a domino effect that might fatally destabilise a weakened global market.

This approach contrasts vigorously with the approach taken by the IMF and the US with Asian and Latin American debtor countries. Essentially what is happening is that the profits associated with the hedge funds are retained by the funds but the risks associated with their operation are being shared by the global community. Obviously, this has not gone unnoticed.


For example, the Australian Treasurer, Peter Costello, said in a speech in March 1999 that the global overhaul in finance had to, amongst other things, address the need for better supervision of highly leveraged international investors such as hedge funds. To quote the Treasurer, "Vested interests in the international financial sector who benefit from the international community's sharing of their risks (but not their profits) will resist the necessary evolution in the international financial architecture." As the journalist Alan Wood noted at the time (The Australian), for "vested interests", read "the interests of Wall Street".

This may not have changed much in recent times. Tim Colebatch (The Age), reviewing the Treasurer's presentation to the Asia Pacific Economic Summit (Sept 2000), notes that the Treasurer states that there is still no agreement on reforms such as requiring hedge funds in capital markets to disclose their operations.

As many articles point out (Smithson 1997), the major challenge will be in accounting. The dominant methodology begins with book-keeping records and subjects these to a series of adjustments governed by precise rules. This looks backward at a stable past. Looking forward to an uncertain future (the essence of risk), to the performance of the market, and to market valuations of derivatives and the risks they are used to manage, is very difficult. And how should such general uncertainty in accounts be portrayed? How can all this be made transparent to investors and customers?

17.3 Utility and Risk

The financial economics literature always starts by discussing the concept of utility. In common parlance an individual's utility is the gain or usefulness one obtains from a certain course of action, relative to its cost, which in real life is often not quantified or even quantifiable at all. Of course, in finance things are quantified, and so certain simplifying assumptions must be made. These assumptions are extremely important to bear in mind, since the ultimate conclusions one comes to are highly influenced by them.

Individual preferences are very diverse, and individuals are often characterised as being either risk averse or risk takers. Those who gamble, say at a casino or in Tattslotto, are willing to lose a small (or, cumulatively, often not so small) amount of money in the hope of making a large gain. Rationally, they are playing what is called in the statistics literature a negative sum game. They are certain to lose in the long run, otherwise the casino could not pay all its operational costs and return a profit to its owners. The risk function of the gamblers is not symmetric, since they accept a small loss in the hope of a large gain. But most models assume that financial risk is symmetric.

17.4 Models

Markets go up and down. So risk in market terms can be adverse (pure risk) or beneficial (speculative risk). From observation and experience it would seem most investors have a greater preference for not losing money rather than gaining it, that is, given equal probabilities risk averseness is the norm. However, in finance risk is normally assumed to be symmetric. This is not absolutely true, but by making such an assumption many of the tools of statistics become available, most notably the normal distribution, which is symmetric about its mean value.

[Figure: Standard deviation deemed to equal risk. Labels: Pure Risk, Speculative Risk. Axis: Rate of Return.]

So for practical reasons it is assumed that mean and standard deviation are the appropriate measures for the return and risk respectively, that is, risk is assumed symmetric and investors are risk neutral. This enables some formal definitions.

i. Mean = 'average' = a measure of central tendency or average return on the asset.
ii. Standard deviation or its volatility. (The terms risk, volatility and standard deviation of returns tend to be used synonymously.)
iii. Distribution. For example, stock prices are log-normally distributed.

In the finance sector there are a wide number of uses to which the above principles can be put:

Life Insurance: matching of assets and liabilities and solvency margins.
General Insurance: business risk, catastrophes, re-insurance and claims reserving.
Superannuation and Funds Management: asset allocation, returns to members, guaranteeing minimum returns.
Banking: risk assessment, accumulations of risk, value at risk (VAR) across the business, derivatives.

The general approach to dealing with portfolios of assets or liabilities is the same. Since financial market returns are ultimately dependent upon the economy, events will tend to affect different assets in often similar ways. For example, if interest rates rise a bank will find problems in all aspects of its book: real estate, business or other loans. Given that returns to assets are clearly not independent, statistical principles are again used. This is covered in further detail in section 17.5 Market Risk Mathematics.

17.4.1 Diversification: Systematic and Unsystematic Risk

Within an asset class most securities are highly correlated. Hence there is a limit to the reduction in variance which can be achieved in practice. The risk of being in the market per se cannot be eliminated (indeed it is the source of the reward). The index for an asset class and its standard deviation (= risk) is effectively the minimum risk for that asset class. In practice, this can be achieved with a relatively small number of securities: 25-30 is usually more than adequate and even 10 may not be far off. That element of risk which can be eliminated by diversification is called diversifiable or unsystematic. That component which remains (the core risk for being in the market) is called systematic.

17.4.2 Asset Allocation

Securities can be categorised into asset classes (in an intuitive sense) which have like characteristics, for example, fixed interest securities, Australian equities, international equities and so on. (There are, of course, securities which are hybrid or intermediate in nature.) Indices are used to represent price movements in the asset classes as a whole. In Australia we use:

Asset class               Index
Australian Equities       All Ordinaries Index
Fixed Interest            Commonwealth Bank Bond Index (All Maturities)
International Equities    Morgan Stanley Capital International Index

The above principles are used to build suitable portfolios of assets, that is, by knowing the correlations between asset classes, optimal portfolios can be built. That is, the appropriate asset mix that gives the best possible return can be determined for a given level of risk. A plot of these points is known as the efficient frontier.

17.4.3 Value at Risk (VAR)

The risk in any business can be assessed in just the same way, not just for funds management. Thus by analysing a bank into its component assets and liabilities one can derive a single estimate of how much a firm could lose due to the price volatility of the instruments it holds. This methodology, introduced by J P Morgan, requires just such a system of correlations and matrices as described above. (Of course, there are competing methodologies of risk assessment, described in the Risk magazine special supplement.)
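To make the idea of a single loss estimate concrete, the following is a minimal sketch of a generic variance-covariance calculation. It is not a reproduction of the J P Morgan methodology: the function names, position sizes, volatilities and correlation are all hypothetical, returns are assumed to be normally distributed with zero mean over the horizon, and the confidence level is a free parameter.

```python
from math import sqrt
from statistics import NormalDist


def portfolio_std(positions, vols, corr):
    """Standard deviation of the change in portfolio value.

    positions: dollar amounts held in each instrument (hypothetical)
    vols: standard deviation of each instrument's return over the horizon
    corr: correlation matrix of the returns (list of lists)
    """
    n = len(positions)
    variance = 0.0
    for i in range(n):
        for j in range(n):
            # Each cell is a covariance of dollar changes: rho * sigma_i * sigma_j.
            variance += (positions[i] * vols[i]) * corr[i][j] * (positions[j] * vols[j])
    return sqrt(variance)


def parametric_var(positions, vols, corr, confidence=0.95):
    """Loss not expected to be exceeded with the given confidence (normal, zero mean)."""
    z = NormalDist().inv_cdf(confidence)  # one-sided normal quantile
    return z * portfolio_std(positions, vols, corr)


if __name__ == "__main__":
    positions = [1_000_000, 500_000]      # hypothetical holdings
    vols = [0.02, 0.01]                   # hypothetical return volatilities
    corr = [[1.0, 0.3], [0.3, 1.0]]       # hypothetical correlation
    print(round(parametric_var(positions, vols, corr)))
```

Because the correlation here is below one, the combined figure is smaller than the sum of the two stand-alone figures, which is the diversification effect discussed in 17.4.1.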


17.4.4 Solvency Risk

Both general and life insurance companies need to maintain prudent levels of reserves to cope with fluctuations in the business. They set their solvency levels so as to be able to meet all eventualities to a certain level of probability. By analysing their assets and liabilities they can assess this particular measure of risk because the resulting portfolio of assets and liabilities (by assumption) follows a normal distribution.

17.4.5 Claims Reserving

The process of measuring outstanding liabilities is called claims reserving. A company needs to estimate current levels of profit, but leave behind sufficient reserves to meet obligations as they arise. These obligations may not arise for many years, as in diseases like asbestosis. By putting together the risks from the separate lines of business one can assess risk for the company as a whole.

17.5 Market Risk Mathematics

In finance, risk is normally assumed to be symmetric. In taking such a position, market risk analysts are defining risk as a simultaneous combination of pure and speculative risk. That is, the likelihood of loss is the same as the likelihood of gain, an interesting and perhaps optimistic assumption. It may not be true, but by making such an assumption many of the tools of statistics become available, most notably the normal distribution, which is symmetric about its mean value. So for practical reasons it is assumed that mean and standard deviation are the appropriate measures for the return and risk respectively, that is, risk is assumed symmetric and investors are risk neutral. This enables some formal definitions.
[Figure: Standard Deviation Showing the Mean and Variance. Labels: standard deviation deemed to equal risk; Pure Risk; Speculative Risk. Axis: Average Rate of Return.]

i. Mean = average = $\bar{r} = \sum_{i=1}^{n} r_i \, p_i$, where $p_i = \mathrm{prob}(r_i)$ is the probability of occurrence of return $r_i$; this is a measure of central tendency or average return on the asset.

ii. Standard deviation $S = \left[ \sum_{i=1}^{n} (r_i - \bar{r})^2 \, p_i \right]^{1/2}$ is a measure of the risk of an investment or its volatility. (The terms risk, volatility and standard deviation of returns tend to be used synonymously.) Also of use are the skewness and kurtosis (3rd and 4th moments about the mean), which are measures of the symmetry of the distribution and its peakedness, respectively. For the standard normal, the mean, variance, skewness and kurtosis are 0, 1, 0 and 3 respectively.

iii. Distribution. For example, stock prices are log-normally distributed, that is:

$$\ln p_t \sim N(\mu, \sigma^2)$$

a normal distribution with mean $\mu$ and variance $\sigma^2$. Similarly:

$$\% \Delta p_t \sim N(\mu, \sigma^2)$$

which is the more usual way of expressing this fact (that is, $\ln\left(\dfrac{p_t}{p_{t-1}}\right) = \ln p_t - \ln p_{t-1}$).
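The definitions of the mean, the standard deviation and the log return can be checked numerically. The sketch below is illustrative only; the function names and the sample returns, probabilities and prices are hypothetical.

```python
from math import log, sqrt


def expected_return(returns, probs):
    """Probability-weighted mean of the possible returns."""
    return sum(r * p for r, p in zip(returns, probs))


def return_std(returns, probs):
    """Probability-weighted standard deviation of the returns about their mean."""
    mean = expected_return(returns, probs)
    return sqrt(sum((r - mean) ** 2 * p for r, p in zip(returns, probs)))


def log_returns(prices):
    """Period-by-period log returns: ln(p_t / p_{t-1}) = ln p_t - ln p_{t-1}."""
    return [log(later / earlier) for earlier, later in zip(prices, prices[1:])]


if __name__ == "__main__":
    returns = [-0.10, 0.05, 0.20]   # hypothetical outcomes
    probs = [0.25, 0.50, 0.25]      # hypothetical probabilities, summing to 1
    print(expected_return(returns, probs), return_std(returns, probs))
    print(log_returns([100.0, 110.0, 99.0]))
```

One convenient property of log returns is that they add across periods, so the sum over the series equals the log return from the first price to the last.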

In the finance sector there are a wide number of uses to which the above principles can be put. Since financial market returns are ultimately dependent upon the economy, events will tend to affect different assets in often similar ways. For example, if interest rates rise a bank will find problems in all aspects of its book: real estate, business or other loans. Given that returns to assets are clearly not independent, statistical principles are again used.

Correlation and the Correlation Coefficient

Given any two series $X = \{x_1, \ldots, x_n\}$ and $Y = \{y_1, \ldots, y_n\}$, it is of considerable interest to estimate any linkages between the two time series. A measure of this is the covariance, or the degree to which the series rise or fall together, and it is defined to be:

$$\mathrm{cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = E\big((X - \mu_X)(Y - \mu_Y)\big)$$

Note that:

$$\mathrm{var}(X) = \mathrm{cov}(X, X) = E(X - \mu_X)^2 = \sigma_X^2$$

The correlation coefficient between two series is the standardised variate, which has a value between +1 (perfect correlation) and -1 (perfect inverse correlation):

$$\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}$$

17.5.1 The Two Variable Case

Since market risk analysts use the standard deviation as a measure of risk, the need arises to consider what happens when two securities or assets (X and Y) are combined in a simple portfolio. In general it is assumed that the securities are not independent and that the price changes will in fact be correlated. Securities are after all only financial claims on assets in the real economy. Thus:

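$$
\begin{aligned}
\mathrm{var}(X + Y) &= E\big((X + Y) - (\mu_X + \mu_Y)\big)^2 = E\big[(X - \mu_X) + (Y - \mu_Y)\big]^2 \\
&= E\big[(X - \mu_X)^2 + 2 (X - \mu_X)(Y - \mu_Y) + (Y - \mu_Y)^2\big] \\
&= E(X - \mu_X)^2 + E(Y - \mu_Y)^2 + 2 E\big((X - \mu_X)(Y - \mu_Y)\big) \\
&= \mathrm{var}(X) + \mathrm{var}(Y) + 2\,\mathrm{cov}(X, Y) \\
&= \mathrm{var}(X) + \mathrm{var}(Y) + 2 \rho_{X,Y} \sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)} \\
&= \sigma_X^2 + \sigma_Y^2 + 2 \rho_{X,Y}\, \sigma_X \sigma_Y
\end{aligned}
$$

In general:

$$\mathrm{var}(aX + bY) = a^2 \sigma_X^2 + b^2 \sigma_Y^2 + 2 a b\, \rho_{X,Y}\, \sigma_X \sigma_Y$$

and

$$E(aX + bY) = a E(X) + b E(Y)$$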
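The two-variable result can be verified numerically by computing the variance of a weighted combination directly and comparing it with the formula built from the standard deviations and the correlation coefficient. This is an illustrative sketch: the function names, the return series and the weights are hypothetical, and the covariance uses the 1/n definition given earlier.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Covariance using the 1/n definition."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def corr(xs, ys):
    """Correlation coefficient: covariance divided by the two standard deviations."""
    return cov(xs, ys) / sqrt(cov(xs, xs) * cov(ys, ys))


def var_combination(a, b, xs, ys):
    """var(aX + bY) from the standard deviations and the correlation coefficient."""
    sx, sy = sqrt(cov(xs, xs)), sqrt(cov(ys, ys))
    return a * a * sx * sx + b * b * sy * sy + 2 * a * b * corr(xs, ys) * sx * sy


if __name__ == "__main__":
    x = [0.01, 0.03, -0.02, 0.04]   # hypothetical returns on security X
    y = [0.02, 0.01, -0.01, 0.03]   # hypothetical returns on security Y
    a, b = 0.6, 0.4                 # hypothetical portfolio weights
    combined = [a * xi + b * yi for xi, yi in zip(x, y)]
    print(cov(combined, combined), var_combination(a, b, x, y))
```

The two printed values agree to within floating-point rounding, since the formula is an algebraic identity rather than an approximation.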


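17.5.2 Extension to n securities: Real Portfolios, Real Assets

Real world portfolios consist of many securities (within an asset class) and indeed many potential asset classes (each with n securities). The above may be extended by noting:

If $S_n = X_1 + \cdots + X_n = \sum_{i=1}^{n} X_i$, with the $X_i$ not independent and $\mathrm{var}(X_i) = \sigma_i^2$, then

$$\mathrm{var}(S_n) = \sum_{i=1}^{n} \sigma_i^2 + 2 \sum_{i<j} \mathrm{cov}(X_i, X_j)$$

the 2nd term being all possible combinations of $X_i, X_j$, $i \neq j$. Noting that $\mathrm{cov}(X_i, X_j) = \mathrm{cov}(X_j, X_i)$, there are

$$\binom{n}{2} = \frac{n!}{(n-2)!\,2!} = \frac{n(n-1)}{2}$$

pairs. This can be put in matrix form (the variance-covariance matrix), with rows and columns indexed by $X_1, X_2, \ldots, X_n$:

$$
\begin{pmatrix}
\sigma^2_{1,1} & \sigma^2_{1,2} & \cdots & \sigma^2_{1,n} \\
\sigma^2_{2,1} & \sigma^2_{2,2} & \cdots & \sigma^2_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma^2_{n,1} & \sigma^2_{n,2} & \cdots & \sigma^2_{n,n}
\end{pmatrix}
$$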

Note: the leading diagonal holds the variances, and the matrix itself is symmetrical about the leading diagonal. The above process may then be used to combine assets in such a way as to achieve a minimum variance or risk, for example by choosing assets that have a low or negative correlation with each other. This process is known as mean-variance optimisation. User friendly computer packages exist to remove the heavy computations.

In optimising the risk of a portfolio of securities or assets, it becomes apparent from the above matrix that the covariances far outnumber the variances. To simplify matters, let us assume we are dealing with a portfolio of N assets or securities. The proportion invested in each asset is 1/N. In each variance cell in the matrix we have $(1/N)^2 \times$ variance and in each covariance cell we have $(1/N)^2 \times$ covariance.

Portfolio variance = $N \times (1/N)^2 \times$ average variance + $(N^2 - N) \times (1/N)^2 \times$ average covariance
= $(1/N) \times$ average variance + $(1 - 1/N) \times$ average covariance

As N increases, the portfolio variance approaches the average covariance. If the average covariance is zero, this means that every asset or security behaves independently of the others and it is possible to eliminate all risk. However, this rarely occurs in a given market or industry as assets or securities are affected by similar factors. The average covariance is the lowest level of risk that can be achieved by diversification. This residual is the market risk.
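The limit described above can be illustrated with a short calculation. The sketch below computes the portfolio variance cell by cell from a variance-covariance matrix and compares it with the equal-weight shortcut; the function names and the figures used (an average variance of 0.04 and an average covariance of 0.01) are hypothetical.

```python
def portfolio_variance(weights, cov_matrix):
    """Sum of w_i * w_j * cov_ij over every cell of the variance-covariance matrix."""
    n = len(weights)
    return sum(
        weights[i] * weights[j] * cov_matrix[i][j]
        for i in range(n)
        for j in range(n)
    )


def equal_weight_variance(n_assets, avg_variance, avg_covariance):
    """(1/N) * average variance + (1 - 1/N) * average covariance."""
    return avg_variance / n_assets + (1 - 1 / n_assets) * avg_covariance


def uniform_cov_matrix(n_assets, variance, covariance):
    """Matrix with the same variance on the diagonal and the same covariance elsewhere."""
    return [
        [variance if i == j else covariance for j in range(n_assets)]
        for i in range(n_assets)
    ]


if __name__ == "__main__":
    for n in (1, 5, 25, 100):
        matrix = uniform_cov_matrix(n, 0.04, 0.01)
        weights = [1 / n] * n
        print(n, portfolio_variance(weights, matrix), equal_weight_variance(n, 0.04, 0.01))
```

With these figures the variance falls quickly over the first couple of dozen securities and then flattens towards 0.01, the average covariance, which is consistent with the earlier remark that 25-30 securities are usually more than adequate.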


REFERENCES

Fukuyama, Francis, Professor of Public Policy, George Mason University. The Independent (16/6/99).
Browning, Bob (1998). Hedge Fund Fears Come Years too Late. Article in News Weekly, October 17, 1998, pages 6 and 7.
Colebatch, Tim. The Age, September 27, 1999.
Cooper, Graham. Editorial, Risk Magazine, Volume 10/No 6/June 1997.
Costello, Peter (Australian Treasurer) (1999). Reform. As quoted in two articles in the Weekend Australian, March 27-28, page 5, one each by Ian Henderson and Alan Wood.
Radcliffe, Robert C. (1994). Investment: Concepts Analysis Strategy, 4th Edition, Harper Collins, New York. See p.170 for a discussion of alternatives both symmetric and asymmetric.
Smithson, Charles, Tyrone Po and John Rozario (1997). Capital Budgeting. Article in Risk Magazine, Volume 10/No 6/June 1997.
Thomas, David (1999). Nightmare on Wall Street. The Age - Good Weekend, February 6, 1999.

READING

Francis, Jack Clark (1991). Investments: Analysis and Management, 5th Edition, McGraw Hill, New York.
Hensel, Chris R.; Ezra, D. Don and Ilkiw, John H. (July-August 1991). The Importance of the Asset Allocation Decision, Financial Analysts Journal 65-72.
Paul-Choudhury, Sumit et al (July 1996). Firmwide Risk Management - A Special Supplement to Risk, Risk Magazine, London.


18.0 Security

The international reach and severity of contemporary terrorism, the increasingly sophisticated modus operandi of much modern crime, especially white collar crime, and the way in which criminal networks are globalising, have raised the importance of security within risk management and good governance. Security is obviously more relevant to some enterprises than others. Generally the most relevant factors in assessing the vulnerability of companies to terrorist and/or criminal threats are the location, national identity and political profile of companies, together with the nature of their operations and products. However, few if any enterprises are immune from security risk of some sort. Even companies that are not the direct targets of terrorist or criminal intentions can be indirectly affected by attacks on others, principally by the way threat environments raise costs and impact stock and product markets. For example, the tourist industry has been directly affected by attacks on hotels and resorts. But it has also been indirectly affected by the attacks on airlines on which the tourist market depends. The cost of exporting certain goods to US markets has been affected by delays and costs caused by stringent new border crossing custom requirements. Public infrastructure management, in particular, is currently beset with the need to reassess security in the light of new terrorist threats. Meeting the costs of sometimes substantial enhancement of security impacts the user-payers as well as the owners of infrastructure. One or more of the new security threats is affecting businesses across the board. Threats range from mega-corporate bankruptcies as a result of management-auditor malfeasance, to industrial espionage, to electronic and credit card fraud, computer hacking and viruses, to petty vandalism. 
Whether directly attacked or not, public as well as corporate enterprises need to exercise security cognisance and apply the appropriate type and degree of security risk management in regard to this widening range of threats.

18.1 Security and Risk Management

The security function is required to cope with aspects of risk that differentiate it from other risk management functions. The chief of these is that security threats spring from deliberate intention rather than from accidental, natural, or dysfunctional systemic causes. Persons - not systems, nor components, nor acts of god - create security threats. Persons are not only capable of acts of ill will, but also have intelligence. Intelligence enables persons to discern what protective systems are in place and devise ways to defeat them. Consequently, the first priority and unavoidable task in the security process is to assess the threat. Does a threat actually exist? Do any politically or criminally motivated actions pose a significant risk to the enterprise in question? If so, what are the likely methods of attack? Once the threat assessment is made, then most of the regular processes and techniques of risk management kick in. The key steps after the threat assessment are common to both security and general risk management functions. Most if not all of the above steps that follow the threat assessment will or should have been performed in the course of previous risk management. Those assets vital to the conduct and success of the enterprise will already have been identified. System vulnerability to failure of systems due to explosion, fire, flood, mis-operation, and loss will also have been identified, as will resilience, appropriate crisis management, recovery plans, business impact costs, and so on. The cause of damaging events may be different in the security context, but the effects and responses are mainly replicated in the other areas of risk management. In most business and other organisations security is separated and often isolated as a management function. This occurs mainly for reasons of confidentiality. 
Nevertheless, security remains in essence a risk management function requiring coordination and integration into the overall management system and a key consideration in good governance.


18.2 Security Terms

Because security personnel use certain terms differently to other risk management professionals, it is appropriate to begin with a definition of three terms basic to security management.

Security Management
This refers to managing the risk of deliberate intention and attempts to cause harm and/or inflict loss. Security risk emanates from individuals or agencies with will and intelligence. Consequently, it involves the potential to detect and defeat controls designed to prevent loss, dysfunction, or harm by natural, accidental or deliberate causes.

Security Threats
This refers to a generic risk or hazard of a security nature, as used, for example, in the sentences "The threat of terrorism is being taken more seriously in Europe after the carnage in Madrid"; or, "The threat of burglary is a constant concern of many householders." Security references are generally to "the threat of...", not to "a threat" (as in "Company X received a bomb threat").

Security Vulnerability
This refers to a weakness or susceptibility of something (a potential target) to a security threat, as used, for example, in the sentences "The inadequately trained and equipped Iraqi police are particularly vulnerable to terrorist attack"; or, "Democracies provide countless soft targets for terrorists. Shopping centres, railway stations, and other crowded locations, for example, are especially vulnerable as they are largely unprotectable." Non-security risk professionals often use the term vulnerability to indicate the extent of exposure of an organisation to some risk, rather than its susceptibility to that risk. For example, "The firm's vulnerability to currency fluctuations could be in the order of millions of dollars", compared with "The firm is vulnerable to currency fluctuations."

18.3 Basic Elements of Security Management

The central consideration in the design or review of a security system is to identify and assess the following elements: Assets, Threats, Vulnerabilities, Business Impact and Counter Measures. The choice of elements is determined by the logic of the flow chart below:

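[Flow chart: ASSET, followed by five decisions in sequence: Valuable? Threatened? Vulnerable? Adverse business impact? Cost effective counter measures? A "Yes" leads on to the next decision and a "No" at any decision leads to END. A "Yes" at the final decision leads to ACTION.]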


No security risk exists, nor is expense on counter-measures warranted, unless the organisation in question has valuable assets; those assets are threatened and are vulnerable to those threats; significant business impact would result if the threats eventuated; and cost-effective, appropriate counter-measure options can be identified.
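The chain of conditions described above amounts to a sequence of yes/no tests that must all be passed before action is warranted. The following is a minimal sketch of that logic; the class, field and function names are hypothetical, and in practice each answer would come from an assessment rather than a simple flag.

```python
from dataclasses import dataclass


@dataclass
class AssetAssessment:
    """Answers to the five questions asked about an asset (hypothetical structure)."""
    valuable: bool
    threatened: bool
    vulnerable: bool
    adverse_business_impact: bool
    cost_effective_counter_measures: bool


# The order in which the questions are asked.
QUESTIONS = (
    "valuable",
    "threatened",
    "vulnerable",
    "adverse_business_impact",
    "cost_effective_counter_measures",
)


def security_decision(assessment: AssetAssessment) -> str:
    """Return "ACTION" only if every question is answered yes, otherwise "END"."""
    for question in QUESTIONS:
        if not getattr(assessment, question):
            return "END"  # the first "No" ends the process for this asset
    return "ACTION"


if __name__ == "__main__":
    customer_lists = AssetAssessment(True, True, True, True, True)
    spare_stationery = AssetAssessment(False, False, False, False, False)
    print(security_decision(customer_lists), security_decision(spare_stationery))
```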

Proposal Model

The following suggests the basic elements of a generic model for a risk control proposal:

Risk control measure A is proposed. It is designed to protect assets B and C, which are at risk from threats D, E and F, which are assessed as having the likelihood of occurrence G and H due to existing vulnerabilities I and J. The business impact (severity), if these threats eventuate, is estimated to be in the region $K-$L. There are also the human factors M and N to consider. The cost of implementation of the measure is $P, maintenance approximately $Q per year. The risk reduction from counter-measure A will produce an estimated cost-benefit in the order of $......

Assessment Status

The assessment is that the above threats exist, are sufficiently likely to occur, and have significant business impact to warrant the above responses. Business impact assessment was made and/or checked out with functional managers: production, marketing, finance, legal, industrial relations, health and safety, insurance and other relevant personnel.

A Risk Control Format

18.3.1 Assets

The first task in the security management process is to identify comprehensively all the significant assets of the organisation. This includes identifying the relative importance of various types of asset to the viability and success of the organisation. Not all assets are material assets such as capital, plant, equipment, products, etc.; many of the more important are non-material assets. The chart below includes a number of asset categories as a partial guide to asset identification. Naturally every organisation's list will be somewhat different and more comprehensive.

company reputation with consumers, the public, government, regulatory agencies, etc;
morale, loyalty, retention, motivation of staff;
industrial relations;
electronic data in transmission;
information in the possession of staff;
credit rating;
competitive edge-comparative advantage;
intellectual property;
market sensitive information;
accounting and auditing integrity;
good governance;
state of OH&S;
position regarding legal liability.


The three charts below indicate ways of ensuring that the asset survey is complete, and that no assets are overlooked that would cause the organisation significant harm if lost or impaired.

[Chart: Asset Survey by Workflow. Stages shown: Inward Goods; Raw Material Storage; Production; Finished Goods; Product Warehouse; Outward Goods; Wholesaler/retailer; Consumer; Public Arena. Annotations shown against the stages: orders, goods (quantity & quality), accounts, payments, stock control; continuity, waste, quality control, formulae; unaccountable, desirable; stock control; accounting and stock control; reputation, market share, liability, extortion, regulation.]


Staff Security: safe workplace; assault; harassment; discrimination; traffic control; car parks; change rooms
Consumer Security: product liability; contamination; product extortion
Public Security: pollution; toxic emissions; fires; explosions

Asset Survey by Legal Issues


Competitive: marketing; customer lists; formulae; processes
Price Sensitive: property buying; takeovers
Personnel: medical records; salaries
Form of Information: hardcopy; electronic; email; mail; voice
Location of Information: IT centre; laptops; desktops; board reports; consultants; government; sales staff

Asset Survey by Information



18.3.2 Threats

The second task, after identification and assessment of assets, is identification and assessment of threats to these assets. The type and degree of protection required for different assets will depend on the nature, likelihood, and severity of the threat. The security appropriate to bomb threats, for example, is obviously different from that appropriate to product extortion or industrial espionage.

The issue to be considered at this stage is: What particular threats, if any, exist to the identified assets, and which are significant? A sample Threat Checklist is shown below.
Threats to Treasury & Finance: credit squeezes; liquidity issues; customer payment defaults; exchange fluctuations; funding sources failure; interest rate fluctuations
Threats to Assets: fire; earthquake; flood; explosion; critical plant failure; malicious damage
Threats of Business Interruption: industrial action; political/civil upheaval; picketing/demonstrations/boycott; bomb threat; bomb "hoax"; malicious damage/sabotage
Threats to Information: industrial espionage; takeover; sabotage of data
Threats to Company Reputation: scandal (eg, frauds, business or political); product fault or contamination; environmental pollution; non-compliance
Threats to Company's Competitive Edge: professional incompetence; failure to best practice; failure to continuously improve; poor public image
Threats to Product: product extortion; collusive theft; pilferage; contamination
Threats to Staff: discrimination; OH&S injury; harassment
Threats from Staff: pilferage; theft; fraud; malicious damage
Threats to Equipment, Cash: robbery; burglary; drug abuse, gambling
Sovereign Risk: nationalisation
Military Threats: coups; civil disturbance; civil war

A Sample Threat Checklist

It is important to check and review assessments. Consultation with at least functional managers and staff (for example, financial, legal, personnel, industrial relations, public relations, security, safety, warehouse and stock control), in addition to specialist police services (for example, bomb, crime prevention, armed robbery and fraud squads), is desirable. Relevant private services (financial auditors, risk engineers, liability lawyers, etc.) might also be consulted. Remember that it is futile to include threats that are not credible. Consultation with others is particularly important in this regard.
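As an illustration only, the screening step described above (discard threats that are not credible, then weigh the rest by likelihood and severity) might be sketched as follows. The 1 to 5 scales, the likelihood x severity score and every name and figure in the example are assumptions made for this sketch; the text does not prescribe a scoring scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    name: str
    asset: str
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (almost certain)
    severity: int    # hypothetical scale: 1 (minor) to 5 (catastrophic)
    credible: bool   # outcome of consultation with managers and specialists


def rank_threats(threats):
    """Drop non-credible threats, then sort by likelihood x severity, highest first."""
    credible = [t for t in threats if t.credible]
    return sorted(credible, key=lambda t: t.likelihood * t.severity, reverse=True)


# Illustrative entries only; the scores are invented for the example.
checklist = [
    Threat("Product extortion", "Product", 2, 5, True),
    Threat("Pilferage", "Product", 4, 2, True),
    Threat("Civil war", "All sites", 1, 5, False),
    Threat("Sabotage of data", "Information", 3, 4, True),
]

for t in rank_threats(checklist):
    print(f"{t.name:20s} {t.asset:12s} score={t.likelihood * t.severity}")
```

A simple product of two ordinal scores is a coarse ranking aid, not a measurement; the ranking should still be reviewed with the functional managers and specialists listed above.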


18.3.3 Vulnerability

Vulnerability is a weakness or susceptibility of an asset with respect to a threat.

This weakness may be intrinsic to the asset. For example, a US multinational company is more vulnerable to politically motivated attacks than a Swiss company. A company with a Board practicing inadequate or inappropriate corporate governance is more vulnerable to costly scandal than one maintaining best practice and continuous improvement. A financial company is more vulnerable to theft and fraud if the accounting, investment and audit systems are dominated by the requirements of the sales and marketing department to the detriment of accurate and timely accounting, audit and risk management.

Or the weakness may be due to the location of the asset. For example, a multinational company in the Middle East may be more vulnerable to terrorism than one in Iceland. Confidential information on a meeting room blackboard in an office with some public access is more vulnerable than when it is in a locked cabinet in a manager's private office or a secure registry.

Or the weakness may be due to inadequate or inappropriate protection against known threats. For example, a plant with poor personnel, industrial, and public relations may be more vulnerable to malicious damage than one with good relations. A company with no contingency planning for serious security and other incidents, and with no pre-prepared disaster recovery plan/guidelines, may be more vulnerable to adverse business impact if certain threats materialise.

A sample list of vulnerabilities is shown below.

Business Continuity
Production dependent on on-going supplies of raw materials, which could be stopped by picketing?
Cash flow interruptions through product recall due to contamination or extortion could prove financially difficult for the company?

Business Reputation
Removal of product from sales for a period could affect long-term market share? (That is, people try other brands and change brand loyalties.)

Information
Price and competition sensitive information exists?
Competitors exist?
Unscrupulous competitors exist?
Environmentalist or consumerist critics exist?
Political and/or industrial militant critics exist?
Data is inadequately backed up?
Some managers refuse to take risk seriously and manage it professionally.

Plant
Production equipment, which is easily damaged and slow to be replaced?
Inadequate access control?
Inappropriate intruder detection?

Staff
Inadequate personnel selection, checking, and training procedures?
Poor personnel relations / supervision?
Disgruntled employees, ex-employees, contractors?
Isolated female staff working at night?
Badly lit car parks?

Product
Stock control system will not warn reliably and in good time that a loss trend has emerged?
Product loss is put down to unexplained "shrinkage" or inaccurate stocktaking or accounting?
Product is small, highly desirable, easily disposable, subject to access during night shifts, and employees' car park is unlit and close to rear doors of product warehouse which is poorly supervised.

Table of Vulnerabilities


Vital or Key Points

The concept of vital points (sometimes also referred to as key points) is important to vulnerability assessment and prioritisation. A vital or key point of any asset, from the security viewpoint, is any part or feature of an asset (for example, plant, equipment, communications or information system) that is essential to its continuing operation or integrity. If this vital point is easily damaged (due to accessibility or fragility), and would be difficult, for any reason, to restore to proper operation, it becomes all the more vital to reduce its vulnerability.

18.3.4 Business Impact

Having identified and assessed an organisation's assets, the significant threats to them and whether they are vulnerable to those threats, the fourth task is to assess the business impact if various threats were to eventuate. Only when the four elements are identified, assessed, and related can the appropriate priorities of a security system be correctly determined.

Business impact is the overall consequences for an organisation if threats succeed. Business impact assessments are similar to, but not identical with, risk management severity measurements. Business impact should include human cost, that is, suffering, anguish, anxiety, stress, and the like, which staff, members of the public, and associated families would experience, not just loss measurable in dollars. Good corporate citizens and managers are motivated by normal human values, not just the bottom line or "Profit is King" attitudes.

It is necessary also to consider consequential or indirect costs as well as direct costs. For example, it may only cost thousands of dollars to replace a contaminated product, even less if it is covered by insurance. But the loss of market share, brand loyalty, and business reputation may be far more important.

Consequential damage includes such things as:
business interruption
loss of market share or competitive edge
fines due to incidental pollution resulting from fire, explosion, or malicious damage

Consequential damage can result also if a breach of security causes such things as:
strikes
legal liability
government regulation
deterioration in relations with staff, unions, neighbourhood, government, media / public

Sometimes security itself can be the cause of poor staff and union relations if it is inappropriate, or insensitively implemented. A common example is the inept use of baggage inspections or searches as a counter-measure against terrorism.

Assessing business impact is a collective task. A manager cannot do it effectively without the assistance of other managers of specialist functions. Virtually all other functions are involved in assessing business impact in relation to one or other of the company's assets. Obviously insurance and finance/accounting departments need to be involved, but so too, in many cases, do production/operations, marketing, personnel, industrial relations, public and media relations, and legal departments.

Business impact is a form of risk characterisation particularly persuasive in assessing commercial risk. It is the overall cost to the company if threats succeed. Proper assessment of potential business impact is essential in determining the cost-benefit of proposed counter-measures. The key issue is to establish the nature of the perceived vulnerability, quantified in terms of possible dollar impact and return period. How much would the counter-measure cost to implement and maintain? How much risk reduction would this achieve? How does this compare with the maximum foreseeable loss that could result if the measure was not introduced and threats succeeded?
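The cost-benefit questions above can be given a rough arithmetic form. The sketch below is one possible reading, not a method stated in the text: it assumes that the annualised loss is the dollar impact divided by the return period, that the counter-measure works by lengthening the return period, and that the implementation cost is spread evenly over an assumed service life with no discounting. All function names, parameters and figures are hypothetical.

```python
def annualised_loss(impact_dollars, return_period_years):
    """Expected loss per year for an event of this size and return period (assumed model)."""
    return impact_dollars / return_period_years


def counter_measure_case(impact, return_period, reduced_return_period,
                         implementation_cost, annual_maintenance, service_life_years):
    """Compare the annual risk reduction with the annualised cost of a counter-measure."""
    before = annualised_loss(impact, return_period)
    after = annualised_loss(impact, reduced_return_period)
    annual_benefit = before - after
    # Assumption: straight-line spread of the implementation cost, no discounting.
    annual_cost = implementation_cost / service_life_years + annual_maintenance
    return {
        "annual_benefit": annual_benefit,
        "annual_cost": annual_cost,
        "net_annual_benefit": annual_benefit - annual_cost,
        "benefit_cost_ratio": annual_benefit / annual_cost,
    }


# Invented figures: a $500,000 loss expected once in 10 years, reduced to once in
# 50 years by a measure costing $60,000 to implement and $4,000 a year to maintain.
case = counter_measure_case(500_000, 10, 50, 60_000, 4_000, 10)
for key, value in case.items():
    print(f"{key}: {value:,.2f}")
```

A calculation of this kind captures only the dollar side of business impact. The human costs and the consequential damage discussed above still have to be weighed separately, and the result should be compared with the maximum foreseeable loss rather than with the annualised figure alone.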


18.3.5 Counter Measures

When identification and assessment of assets, threats, vulnerabilities, and potential business impact is complete, it is possible to consider what cost-effective counter-measure options exist to avoid or reduce the cost of risk. Counter-measures to avoid or deter security threats, lessen vulnerability and reduce potential business impact comprise both material and non-material measures.

Material measures or physical security include such things as:
access control systems
intruder detection and alarm systems
perimeter fences, locks, safes, and other physical barriers
signage
guards, patrols

Many possible control options are non-physical. For example:
credible threat intelligence (that is, pre-warning of crime, terror or other relevant trends)
accounting and inventory control techniques (that is, capacity to get timely warning that losses are occurring through theft, pilferage, fraud, etc.)
personnel, industrial, and public relations techniques (that is, reducing risk from disgruntled staff, unionists, neighbours, activist groups)
training (that is, raising security consciousness and motivation)
contingency planning
crisis management (damage/business impact control)
avoidance (giving up activities if they are too risky compared with the possible profit; relocating activities to safer areas)
transference of risk (that is, insurance, contracting out)
secure back-up (for example, data and equipment back-up and offsite secure storage)
payroll techniques (for example, payment by cheque or bank deposit)
law enforcement and security liaison arrangements
effective monitoring performance indicators for timely warning of loss trends

Any selection of physical and non-material measures does not constitute an effective security system unless these measures are coordinated so as to complement each other in the furtherance of the organisation's goals and objectives. It is important that the security function should not be compartmentalised so as to allow demarcation gaps, contrasting security arrangements, or haphazard variations in security standards within the one organisation. Zoning of security standards and control levels can, however, be appropriate when it is a considered, deliberate and coordinated measure applied to vital points within an organisation, for example research departments or departments that store or process confidential information.

Testing protective security

How well a vital point is protected can be highlighted for review by applying what some call the "onion test". This is illustrated in the onion diagram below. The security principle illustrated by the onion test is that the degree of protection is indicated by the number of protective layers that surround a vital point. In high security situations, an initial barrier and intruder detection should operate at the external perimeter so as to warn security monitors in time to respond before an intruder reaches the core of the concentric circles surrounding the vital point. Inner barriers aim to delay the intruder to facilitate timely intervention.
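The timing principle behind the onion test can be sketched in a few lines. This is an illustration under stated assumptions rather than a method given in the text: each layer is given an assumed delay time, the alarm is assumed to be raised as the intruder begins to attack the first layer fitted with detection, and the response is treated as timely if the delay remaining after that alarm exceeds the response time. Layer names, delays and the response time are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    delay_seconds: float  # assumed time an intruder needs to defeat this layer
    detects: bool         # True if attacking this layer raises an alarm


def onion_test(layers, response_seconds):
    """Layers are listed from the outer perimeter inwards.

    Returns (layer_count, delay_after_detection, timely).
    """
    delay_after_detection = 0.0
    detected = False
    for layer in layers:
        if layer.detects:
            detected = True
        if detected:
            # Assumption: the alarm sounds as the attack on the detecting layer
            # starts, so that layer's own delay counts towards the response window.
            delay_after_detection += layer.delay_seconds
    timely = detected and delay_after_detection > response_seconds
    return len(layers), delay_after_detection, timely


# Same three layers, with detection either at the perimeter or only at the core.
detect_at_perimeter = [
    Layer("Perimeter fence", 60, True),
    Layer("Door and window locks", 120, False),
    Layer("Secure room", 300, False),
]
detect_at_core = [
    Layer("Perimeter fence", 60, False),
    Layer("Door and window locks", 120, False),
    Layer("Secure room", 300, True),
]

print(onion_test(detect_at_perimeter, response_seconds=400))
print(onion_test(detect_at_core, response_seconds=400))
```

With these invented figures the same three layers pass the test when detection sits at the perimeter and fail it when the first alarm is raised only at the secure room, which is the point made above about detecting early and delaying afterwards.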

[Diagram: concentric protective layers surrounding a valuable asset]
PHYSICAL SECURITY: patrols; fencing; security lighting; door, window locks; intruder detection; secure room; security monitor
MANAGERIAL SECURITY: contingency plan; security management; staff security awareness; accounting system; inventory system; personnel selection
Centre: valuable asset

Onion Test of Vulnerability

Resources can be used more effectively if it is possible to concentrate protection around vital points within an establishment, rather than seek to protect everything within a location equally by often futile efforts to seal off the whole establishment at the outer perimeter.

18.4 The Terrorist Threat

Contemporary terrorism has put increased emphasis on the security function in general and on certain elements of that function in particular:

18.4.1 Severity

The severity of the terrorist threat has increased, as exemplified by the World Trade Centre, Bali and Madrid incidents. Currently favoured targets are highly vulnerable crowded public areas such as transport stations, entertainment and tourist hotel areas.

18.4.2 New Modus Operandi

Terrorists can combine primary and secondary targets. For example, airliners were hijacked and used as missiles to attack the primary targets, the Twin Towers and the Pentagon.

18.4.3 Range and Applicability

The threat now has a global reach, with attacks ranging from Moscow to Bali, from New York to Madrid. Although most business operations will never become the primary or secondary targets of the new terrorism, few if any will avoid being affected indirectly. Terrorism has already increased, and will continue to increase, certain costs of business, including:
compliance costs in regard to increasing anti-terrorist regulation (for example, container export to the US market)
possible delays and uncertainties affecting just-in-time manufacturing delivery systems
delays and interference with executive travel
accidental involvement in counter-terrorist investigations (for example, unwitting involvement in terrorist money laundering and funding operations)
unanticipated economic and/or market fluctuations in various parts of the world due to terrorist incidents, war, and civil disturbances.


