
Data-Centric Safety: Challenges, Approaches, and Incident Investigation


1,126 pages
9 hours
27 May 2020


Data-Centric Safety presents core concepts and principles of system safety management, and then guides the reader through the application of these techniques and measures to Data-Centric Systems (DCS). The authors have distilled their decades of experience in industry and academia into guidance on the management of safety risk. Data safety has become increasingly important, as many solutions depend on data for their correct and safe operation and assurance. The book covers the definition and use of data, recognising that data is frequently the basis of operational decisions and that DCS are often used to reduce user oversight. This data is often invisible and hidden. DCS analysis is based on a Data Safety Model (DSM), which provides the basis for a toolkit leading to improvement recommendations. The book also discusses the operation and oversight of DCS and of the organisations that use them, covers incident management, providing an outline for incident response, and explores incident investigation, addressing evidence collection and management.

Current standards do not adequately address how to manage data (and the errors it may contain), and this leads to incidents, possibly with loss of life. The DSM toolset is based on Interface Agreements that create soft boundaries, helping engineers to carry out proportionate analysis, rationalisation and management of data safety. Data-Centric Safety is ideal for engineers working in the field of data safety management.

This book will help developers and safety engineers to:

  • Determine what data can be used in safety systems, and what it can be used for
  • Verify that the data being used is appropriate and has the right characteristics, illustrated through a set of application areas
  • Engineer their systems to ensure they are robust to data errors and failures

About the Author

Dr. Alastair Faulkner is a Consultant Engineer at Abbeymeade Limited. He has more than 30 years of experience in senior management and has specialist knowledge of data-centric systems. He specialises in system safety and systems engineering. He supports clients with business planning, execution, delivery, risk assessment and management.


Book Preview

Data-Centric Safety - Alastair Faulkner

Data-Centric Safety

Challenges, Approaches, and Incident Investigation

First edition

Alastair Faulkner

Mark Nicholson

Table of Contents

Cover image

Title page




Directed Reading


It's Monday Morning …



List of Figures

List of Tables

Part I: Data-Centric Safety

1: Introduction


1.1. Logic and Rationality

1.2. Data

1.3. Data, Information, Knowledge and Wisdom

1.4. Systems Reliant on Data

1.5. Data becomes the Dominant Systems Component


2: System Safety Management


2.1. Safety Management Systems

2.2. Hazard, Opportunity, Incident

2.3. Decision, Confidence and Uncertainty

2.4. Errors, Faults, Failures and Anomalies

2.5. 4Plus1 Safety Assurance Principles

2.6. Risk Management Model

2.7. Safety Justification

2.8. Maturity Modelling for Data-centric Systems

2.9. Safety Management Paradigms


3: Challenges to Systems Engineering


3.1. Systems Science

3.2. Systems Engineering

3.3. Cyber Security Management

3.4. Identity Model

3.5. Information Systems

3.6. Emerging Disciplines

3.7. The Accidental System

3.8. Change in the Systems Domain


Part II: Data-Centric Fundamentals

4: Data Fundamentals


4.1. Data Quality

4.2. Value (Economics) of Data

4.3. High-Integrity Data


5: Data-Centric Systems


5.1. Classification of Data

5.2. Decision Model

5.3. Uncertainty

5.4. Autonomy and Perception

5.5. Safety Management of Adaptive Systems


6: System Context


6.1. Mature Context

6.2. Multiple Contexts

6.3. Context Switch

6.4. Learning, Adaptive and Autonomous

6.5. Indeterminate Context

6.6. Summary


7: System Definition


7.1. Requirements

7.2. Requirements Management

7.3. Data Definition Languages (DDL)

7.4. Supervisory Model

7.5. Service Provision

7.6. Rely-Guarantee

7.7. Performance

7.8. Metamodels and Metadata

7.9. Safety-Related Application Conditions

7.10. Security Requirements

7.11. Summary


Part III: Data-Centric Design

8: Data-Centric Architecture


8.1. Computational Models

8.2. Diversity

8.3. Architecture Styles and Patterns

8.4. Interfaces and Interface Agreements (IA)

8.5. Critical Control Points

8.6. Metamodel Architectures

8.7. Metadata for IA

8.8. Data Paths

8.9. Summary


9: Development


9.1. Operational Context

9.2. Architecture and the Operational Context

9.3. Project Management

9.4. Life Cycle Models

9.5. Configuration Management

9.6. Data Path Implementation

9.7. Analysis

9.8. Threat Identification


10: Acceptance and Approval


10.1. Policy, Strategy and Planning

10.2. Assessment of Design and Implementation

10.3. Assessment against 4Plus1 Principles

10.4. Evaluation of Risk Assessment of Design

10.5. Assessment of Implementation

10.6. Assessment of Safety Management System


Part IV: Operational Management and Maintenance

11: Operational Matters


11.1. Business Model and Data Metamodel

11.2. Data-Centric Operational Organisation

11.3. Business Management

11.4. Organisational Metamodel

11.5. Self-consistent Organisation

11.6. Operational Modes

11.7. Emergency Preparedness


12: Live Management and Control


12.1. Data Management Plans

12.2. Business Continuity

12.3. Safety-related System Continuity

12.4. Data Integration

12.5. Managing Data Change

12.6. Operational Safety Management

12.7. Authentication

12.8. Competency

12.9. Maintenance of Data as a (Virtual) Asset

12.10. Data Obsolescence and Destruction


Part V: Incident Investigation

13: Major Incident Response


13.1. Incident Response

13.2. Effective Response and Recovery

13.3. Immediate Aftermath


14: Investigation Management


14.1. Planning

14.2. Strategy

14.3. Execution


15: DCI Investigation Methodologies


15.1. Derivation of an Incident Model

15.2. Classification

15.3. AcciMap

15.4. Systems-Theoretic Accident Model and Processes

15.5. Functional Resonance Analysis Method (FRAM)

15.6. Network Theory

15.7. Systems Dynamics

15.8. Applying DSM to Incident Investigation


16: Incident Investigation


16.1. Investigation Planning

16.2. Validation of the System Context

16.3. Validation of the System Definition

16.4. Analysability

16.5. Access, Security and Authorities

16.6. Ongoing Data Safety Incidents

16.7. Data Safety Incident Investigation


17: Investigation Methodology Maturity


17.1. Validation

17.2. Investigation Repeatability

17.3. Education and Training Requirements


18: Analysis as Part of a DCI


18.1. Evidence Directed Analysis

18.2. Root Cause Analysis (RCA)

18.3. Incident Model Validation

18.4. Replicating the Incident


19: Incident Report


19.1. Evidence Navigation

19.2. Incident Report

19.3. Escalation and Resolution


Part VI: Data Safety Model

20: Data Safety Model


20.1. Model Elements

20.2. Transformation Model (T-axis)

20.3. Abstraction Model (A-axis)

20.4. Product, Installation and Maintenance (P-axis)

20.5. Interface Agreements (IA)

20.6. Critical Control Points

20.7. Metadata and Metamodels

20.8. Data-Centric Decisions

20.9. Identity and Identity Management

20.10. Implementing Permit to Work

20.11. Triplet Relationships

20.12. Time, Change and Maintenance


21: Using the DSM


21.1. Initial TAP Identification

21.2. Analysis of P-TAP

21.3. Confidence in Risk Assessment over DSM

21.4. Impact of Change on the DSM (Brownfield Sites)


22: Validation


22.1. AcciMap

22.2. STAMP

22.3. FRAM

22.4. Network Theory

22.5. System Dynamics

22.6. Weinberg's Categorisation of System Complexity

22.7. Resilience Engineering

22.8. Data Security

22.9. Explanation and Communication


Part VII: Application Areas

23: Autonomous Flight


23.1. Introduction

23.2. System Description

23.3. Normal Operation

23.4. An Airspace Described in Data

23.5. Applying DSM

23.6. Metamodel

23.7. Metadata

23.8. Incident Investigation

23.9. Expressing the Supervisory Model in Metadata


24: Enterprise


24.1. Introduction

24.2. System Description

24.3. Normal Operation

24.4. Multi-layer Error Management

24.5. Acquisition and Merger

24.6. Divestment

24.7. Permit-to-Work Failure

24.8. Safe-Method-of-Work Failure

24.9. Emergency Response


25: Healthcare


25.1. Introduction

25.2. System Definition

25.3. Metamodel Integration

25.4. Vertical Integration

25.5. Horizontal Integration

25.6. ‘Product Line’ Integration

25.7. Cyber Physical System Threats

25.8. Healthcare Incident

25.9. Summary


Part VIII: References










Radarweg 29, PO Box 211, 1000 AE Amsterdam, Netherlands

The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

Copyright © 2020 Alastair Faulkner and Mark Nicholson. Published by Elsevier Ltd. All rights reserved.


No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).


Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data

A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library

ISBN: 978-0-12-820790-1

For information on all Elsevier publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Susan Dennis

Acquisitions Editor: Anita Koch

Editorial Project Manager: Kelsey Connors

Production Project Manager: Poulouse Joseph

Designer: Victoria Pearson

Typeset by VTeX


Technology evolves, shaped by its use, often in unexpected ways. Products once constrained by ‘air gaps' are enabled by communications-based infrastructural technologies and data ecosystems. Yet data is too broad a term as it does not address its many roles within systems. It is true that historically in most systems, data is merely consumed, processed and some action performed based on predetermined criteria. In this case data is passive and inert, and needs to be consumed to participate in or to direct actions or activities. However, data often exhibits many degrees of freedom that include the description of functionality, performance, capability, capacity and constraint. Data may also include temporal (sequence or order) or time-based (time, rate or calendar) properties. Data has a mercurial property. It is challenging to manage and control; it has a habit of being consumed by systems that it was not produced for, by omission or by design, perhaps without the awareness of the system designer. It is common for it to pass (often unchecked or even unwittingly) across system and organisational boundaries.

Data (in all its forms) is often unchallenged, unverified, ubiquitous, unrecorded and invisible. Yet this data increasingly determines the behaviour of systems and through this behaviour our access to products (goods and services). Data may be internal or fed to systems with a safety responsibility. As a result, data error or omission may go undetected with potentially hazardous or catastrophic consequences. There may also be consequent damage to assets. Failure of such systems may also contribute to harm indirectly through incorrect decisions made by actors (human or computer) who rely on, or trust, these systems and the data they supply. How should safety justifications reason about Data-Centric Systems (DCS) so that our reliance on, or trust in their correct operation can be justified?

In using the term DCS we acknowledge the ever-increasing volumes of data. Data may be structured or unstructured. However, not all data has value to us; not all data is fit to be used in a system with safety implications or as part of the assurance of such systems. So how should we determine what data can be used, and what it can be used for? How do we assure ourselves that the data used is appropriate and has the right characteristics? How do we engineer our systems to ensure they are robust and resilient to data errors and failures?

As these DCS grow, they experience a change of scale, consuming (and potentially producing) vast quantities of data. As a result, automated methods are required to ensure and assure the contribution of data to system safety in such systems. Furthermore, how do we ensure that actors using the data generated by such systems do so in the intended way and with the appropriate level of criticality?

Currently, no mature methods exist to address these issues, and guidance in this area is very immature. Careful development of data-intensive systems will improve an organisation's ability to ensure and assure the safety of systems. We address these issues in this book.

System Safety Engineering (SSE) is applicable across the entire life cycle of a product, from concept to disposal. In this book we address data safety issues relating to both physical goods and service elements of a product within an SSE framework. However, the emerging field of data safety means that in this first edition there are aspects of data safety that we do not address.

WARNING: This book contains Scary Monsters

Where the authors have paused for thought, tea and/or discussion …

A scary monster is used to identify open research questions, open certification issues, a requirement for an in-depth discussion to take an issue further than it is explored in this text, or a Key Safe Behaviour (KSB) deficiency in current SSE / Safety Management System (SMS) practice.


System Safety Engineering (SSE) has its origins in very high-impact activities, such as controlling nuclear power plants or aircraft. As the use of Computer-Based Technologies (CBTs) to control systems and provide services has spread, so has the range of systems that need to consider system safety. Communications-enabled CBTs combine to create infrastructural technologies. Infrastructural technologies are one foundation of the data ecosystems that deliver the performance and low latency required to handle extensive, complex data (1.2.1). They make it possible to run applications on systems with thousands of nodes, involving vast quantities of data. The emergence of data as a determinant of system behaviour has followed the spread of these technologies. This book is therefore relevant for practitioners in the classical system safety industries, but also for an ever-increasing set of providers of products (goods and services) outside these industries. It is also an area that has not been adequately addressed by the academic community.

As a result, the audience for this book includes, but is not exclusive to, the following communities:


•  Post-graduate students undertaking or wishing to undertake research into Safety Management of data-centric and data-intensive organisations and systems.

•  Safety Engineers / Professionals studying the development, operation or oversight of data-centric or data-intensive systems and organisations as part of Continuing Professional Development (CPD).

Industrial practitioners

•  Incident Investigators who need to address data issues in their causal analysis and improvement recommendations.

•  Safety Engineers / Professionals working in the development, operation or oversight of data-centric or data-intensive systems and organisations.

•  Software Engineers who have to develop software that interfaces with and uses data to determine the set of services provided and the results of the services provided by data-centric and data-intensive systems.

•  Data scientists who have a role in safety-related information systems or data-intensive control systems development or operation.

•  Enterprise Architects who wish to determine how data-centric and data-intensive their architecture should be. There are implications for enterprise architectures arising from the move towards data-centric and data-intensive activities.

•  Enterprise Architects whose data has the potential to be used for functions beyond their original intention. Boundaries for the data use may be defined or additional efforts made to improve integrity, etc., where such limits cannot be imposed.

Corporate and management practitioners

•  Operational Managers who have System Safety Management responsibility for data-centric and data-intensive systems and organisations.

•  Information managers of data and metadata interested in the link between data and the trust and reliance that can, or should, be placed on the information extracted from it.

•  Enterprise Architects whose data, metadata and metamodels have the potential to describe and shape an organisation or enterprise to be safe by design and to ensure that boundaries are enforced.

•  Project Managers who wish to set the competencies of staff engaged in data safety activities, ideally before the start of such activities. Furthermore, they will be interested in identifying, understanding and controlling the risks that exist with data, and the risks associated with data safety and the current status of the identification and control of those risks.

•  Data and Commercial Managers who are interested in the impact of the safety ensurance and assurance work on cost, value, timeliness, logistics and disposal / retention issues relating to safety.

•  Data Asset Managers who are interested in the through-life management of data.

•  Lawyers addressing liability issues implied by the use of data (and faulty states induced by data and data errors) in data-centric and data-intensive systems. Liability accrues proportionally to the contribution of the activity / element to safety risk. Data crosses boundaries, which makes intellectual property, copyright and theft issues relevant.

•  Training and education departments within data-centric and data-intensive organisations who have to provide, or commission, training towards competence.

•  Corporate Managers interested in acquisition or divestment. Especially the legal and corporate issues associated with the management of data, metadata and particularly metamodels as these contain Intellectual Property (IP).

Societal Guardians

•  Regulators and approval bodies. What should they be asking for, and how will they know that applicants have addressed data safety appropriately in their safety and compliance cases?

•  Policymakers who are interested in updating existing regulations to incorporate data contributions to safety. Typically, this will have international and national contributions.

Directed Reading

The discipline of data safety is immature and needs to be improved as a matter of urgency. In this book, we attempt to raise awareness of data (1.2.1), metadata (7.8.3), metamodels (7.8.2), the social influences and impacts of data, and Interface Agreements (IA) (8.4.3) (or their absence and enforcement) in Data-Centric Systems (DCS) (1.5.9). This book provides a structure within which proposed solutions can be analysed. At the time of writing, little experience is available as to the effectiveness of these approaches beyond individual system exemplars. Where experience is available, it is highlighted.

This book could not, and should not, be read in isolation. We have deliberately built on existing material (and the concepts it contains); therefore, there are many references to external sources. While recognising that the application domain is continuously changing, the core safety concepts, techniques and measures are incorporated into mature Safety Management Systems (SMS) (2.1.1) (typically arranged as sequences of processes), which are to be adapted to support DCS in their operational contexts. Situations where current SMS practices may no longer be applicable or sufficient are the subject of increasing research activity. For example, systems may employ Machine Learning and, as a result, be highly dynamic in the evolution of their safety characteristics. Such systems use data as a critical enabler. The primary challenge for the developer is to understand the learning methodology and the level of integrity / criticality that such methods can assure.

The Reader is reminded that established practices and SMS apply equally to all components of the system (hardware, software, people, process and data (including metadata and metamodels)). We note that many established standards offer little guidance explicitly addressing data. Data's absence from standards (and guidance notes) does not provide a basis for credible claims that data lies outside the confines of safety management, or that few safety resources, if any, need be allocated to it. This text focuses on data (including metadata and metamodels) as the emerging, and soon-to-be-dominant, system safety component.

System Safety Practitioners

Section 2.9 (Safety Management Paradigms) expresses the evolution of system safety management and the challenges that lie ahead. These challenges are explored through headline issues.

•  Boundaries: Large datasets obscure boundaries (2.1.12), and without clear boundaries hazard (2.1.10) management is problematic. Section 8.4 (Interface Agreements) (IA) provides one means of managing and controlling real and virtual boundaries.

•  Identity: Increases in the number of elements give rise to identity (1.0.4) and identity management requirements. Section 3.4 (Identity Model) provides one means of expressing issues associated with identity.

•  Safe Method of Work: Existing SMS requires high integrity implementation of Permit-to-Work (24.7.1). The increased span of control requires that these practices be reinforced in the DCS. Section 24.7 (Safe Method of Work) addresses these issues.

•  Data Safety Model (DSM): The interconnected nature of DCS requires a way to express the data element of a product (1.5.4), the operational process and organisational hierarchy. These issues are expressed in Section 20 (Data Safety Model).

•  Using the DSM: In a complex context using the DSM becomes challenging. Section 21 (Using the DSM) provides initial guidance on its application.

•  Data, Metadata and Metamodels: It is becoming clearer that managing data through content (1.3.3) (Data Quality) is no longer enough. A dependency on data and its ever-growing volume inevitably draw comparisons with machine code and the use of abstraction in software engineering. Data should be abstracted into metadata and metadata abstracted into metamodels.

•  Autonomy and Automation: A growing reliance on data requires transparency; the influence of data, and the errors it may contain, must be visible.

•  Incident Investigation: Finally, data-centric systems will fail; this failure will lead to harm (1.0.3). Part V (Incident Investigation) provides one way to investigate data incidents.
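The abstraction chain noted in the bullets above (data abstracted into metadata, and metadata into metamodels) can be pictured with a toy sketch. This is an illustrative assumption, not the book's notation: the class names and fields are invented, and the "metamodel" is reduced to a set of metadata fields every data item must populate before it may cross an interface.

```python
# Illustrative sketch (invented names): data is abstracted into metadata,
# and metadata is checked for conformance against a metamodel.
from dataclasses import dataclass

@dataclass
class Metadata:
    name: str    # identifier of the data item
    dtype: str   # declared type, e.g. "float"
    units: str   # engineering units, e.g. "km/h"
    source: str  # provenance: the producing system

# The metamodel, here, is simply the set of metadata fields that every
# data item must supply before it may cross an interface.
METAMODEL_REQUIRED = {"name", "dtype", "units", "source"}

def conforms(md: Metadata) -> bool:
    """True if the metadata populates every field the metamodel requires."""
    return all(getattr(md, f) for f in METAMODEL_REQUIRED)

speed = Metadata(name="train_speed", dtype="float", units="km/h", source="odometry")
unlabelled = Metadata(name="limit", dtype="float", units="", source="timetable")

print(conforms(speed))       # True: all required fields populated
print(conforms(unlabelled))  # False: units missing, so the item is rejected
```

Even this toy version shows the point of the abstraction: a consumer can reject an ill-described data item at the boundary without inspecting its content.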

System Safety Acceptance and Approvals

Independent review is one of the cornerstones of System Safety practice. The Safety Assessor will be a System Safety Practitioner; therefore, the guidance in the preceding paragraph applies. In addition, the Assessors need to be satisfied that the element is suitably and sufficiently described in its context and that its features, functions, dependencies and failure modes are understood well enough to manage safety risk.

The use of Autonomy and Automation presents particular difficulties as to the nature and form of the safety case. Safety I (S.2.9.1), in which the set of hazards (2.1.10) is sufficiently well known, represents the current footprint for independent assessment and review. Products are complete; their failure mechanisms are known. Required mitigations and barriers to escalation (2.2.1) are also known.

Autonomy, the use of Artificial Intelligence (AI) and Machine Learning (ML) will result in unfinished elements. At the point they go operational, they learn and adapt their behaviours, and in doing so give rise to new hazards and new combinations of hazards. The Safety Assessor will be expected to express a professional opinion as to the safety risks involved in such systems (see Section 10 (Acceptance and Approval)).

Incident Management and Investigation

The Safety Investigator will be a System Safety Practitioner; therefore, the guidance in the preceding two paragraphs applies. Evidence, in the form of witness marks on physical components and eyewitness statements, has been pivotal in determining the root causes of many fatal accidents (13.0.2). A reliance on data may mean a reduction in the availability of physical evidence, to the extent that the absence of physical evidence becomes an important feature. For example, an incident (13.0.1) involving an Autonomous Vehicle (AV) may leave no skid marks that would indicate a failure to brake.

As reliance on data increases, the probability of systematic data failure increases. Therefore, rather than single incidents at single locations and points in time, multiple incidents may manifest at many times and locations. Investigating the underlying data causes of a set of complex situations is challenging. Section 15 describes a range of incident investigation methodologies. The investigation methodology should be documented to ensure repeatability and audit. This methodology may be a combination of existing approaches, a hybrid, or something new.
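One way to picture why systematic data failure differs from independent component failure is to group incidents by the data item they implicate. The incident records below are invented for illustration; the grouping technique itself is the point.

```python
# Sketch: a systematic data failure shows up as several incidents, at
# different times and locations, that trace back to one shared data item.
from collections import defaultdict

incidents = [  # (incident id, location, data item implicated) -- invented
    ("I1", "junction A", "speed_limit_table_v7"),
    ("I2", "junction B", "speed_limit_table_v7"),
    ("I3", "depot C",    "crew_roster"),
    ("I4", "junction D", "speed_limit_table_v7"),
]

by_data_item = defaultdict(list)
for inc_id, location, data_item in incidents:
    by_data_item[data_item].append(inc_id)

# A data item implicated in several otherwise-unrelated incidents is a
# candidate systematic cause, not a set of independent failures.
suspects = {d: ids for d, ids in by_data_item.items() if len(ids) > 1}
print(suspects)  # {'speed_limit_table_v7': ['I1', 'I2', 'I4']}
```

In practice the hard part is the mapping from incident to implicated data item, which is exactly what the investigation methodologies of Section 15 aim to make repeatable.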

Corporate and Management Practitioners

Autonomy and the use of AI and ML require the application of system safety to evolve, recognising that treating only product-based hazards may not be enough. This places additional responsibilities on operational and corporate management and requires operational managers to become Duty Holders (2.0.4).

•  Safety I (S.2.9.1): This is a conventional view, represented in many system safety standards, where all hazards are known, managed, mitigated or removed such that the residual risk is at least tolerable (2.0.5). Products and systems (1.5.5) are ‘finished’ and are supported by operational processes faithfully executed by competent, trained and experienced users. Section 2.9 (Safety Management Paradigms) expresses the evolution of SSM and the challenges that lie ahead. Highly configurable data systems present significant management challenges.

•  Safety II (S.2.9.2): Hollnagel [302,299] recognises that safety systems are not perfect and that users play an important role in the resilience of the safety system. One extension of resilience is the implementation of products that are unfinished at the point they are set to work. This recognises a shift in emphasis towards adaptive requirements placed on operations (Section 11 (Operational Matters)) and maintenance (Section 12 (Live Management and Control)). Who will be liable for incidents involving these unfinished products?

•  Safety II+ (S.2.9.3): Reduced oversight and an increased span of control require tasks to be automated. To what degree should these tasks be automated, and how is this automation to be supported by autonomous systems? What contribution can data assurance make to the assurance of AS?

•  Safety III (S.2.9.4): This is an area for academic research. The use of Safety III implies that autonomous behaviours also have input to the SMS. Current implementations of autonomy are changing safety practice. Which other SMS elements (philosophy, policy, procedure, practice) or responses should we permit autonomy to change? (see Figure 2.1)


The scope for further academic work is extensive. Solution constraints formerly imposed by hardware, software and limited communications infrastructures are significantly diminished. As a result, highly connected and adaptive systems are emerging, as embodied in technologies such as the Internet of Things (IoT). Several fundamental building blocks are incomplete and require academic research.

•  Scary Monsters: This text contains many ‘Scary Monsters’. They represent the unasked and unanswered questions; where possible, we try to isolate them to formulate problem descriptions for academic consideration.

•  Teaching and Training: DCS offer an unprecedented opportunity to refresh and revise curricula. SSE has to evolve to encompass DCS. This text is a reference work, collating material from many sources.

New to System Safety?

We hope that you find our writing style readable. While we do include introductory material, beginning with Section 1 and reading to the end will present you with a substantial learning curve. Before you apply any of the concepts contained in this text, we recommend you consult a System Safety Practitioner familiar with DCS and its application domain.


[299] Erik Hollnagel, Safety-I and Safety-II. Routledge; 2014 978-1472423085.

[302] Erik Hollnagel, Jean Paries, John Wreathall, Resilience Engineering in Practice: A Guidebook, Ashgate Studies in Resilience Engineering. CRC Press; 2013.

It's Monday Morning …

You have read the book (hopefully you found it interesting) and arrived at work. You're in a data-centric organisation (DCO) (1.5.3) with many data-centric systems (DCS) (1.5.9). You've got a data-centric problem … where do you start?

This problem is enormous … big enough for you to reopen this book …

There is no easy answer; much depends on the industry sector (regulated or unregulated), the nature of the safety problem (in its operational context (9.1.1) and its position within the Data Safety Model (DSM) (20.0.1) and one or more TAP points). It would be unreasonable of us to be prescriptive …

What we can do is outline a process, a place to start, and to issue a stern warning: you must adapt this process to your data-centric problem; we cannot do this for you.

Develop a Remit

It is important that you establish what it is that you want from this investigation. Data Safety (DS) assurance and associated investigations have a propensity to consume resources, not because data (1.2.1) is more complex than other system components, but simply because of its potentially extensive technical footprint. It is all too easy for data, metadata (7.8.3) or an element of the metamodel (7.8.2) to be shared by multiple DCSs and DCOs. Some of these uses will be explicit and some implicit; hopefully, only rarely will they be ‘unintended', accessible through sneak circuits (15.8.2).

It is essential to set a boundary (2.1.12) on your remit and the ‘area of interest'. The identification of context (6.0.2) matters because the data may not be valid outside the context and uses for which it was created. It is common for a system (1.5.5) to be within a hierarchy. We can no longer assume that the user will be human. To reflect the increased use of automation, the term ‘user' is replaced by actor (1.0.1).

Existing Safety Management

All operational domains contain risk (2.0.2). Regulated domains include at least one Duty Holder (2.0.4) and Designer (2.0.6), each with identified roles and responsibilities. Systems in these regulated domains are associated with one or more Safety Cases (2.7.4), addressing their use by competent and trained actors. Therefore, your context may contain some or all of the following existing safety documents:

1.  Safety Management System (SMS) (2.1.1)

2.  Safety Management Manual (SMM) (2.1.2)

3.  Safety Management Plan (SMP) (2.1.3)

4.  one or more existing Safety Cases

Figure 2.1 illustrates the relationships between these documents. Your ‘area of interest' may be associated with a Hazard Log (2.1.11) to track all hazards (2.1.10), hazard analysis, risk assessment and risk reduction activities for the ‘whole-of-life' of the safety-related system (SRS) (2.7.7) for any conditions that can potentially lead to harm (1.0.3), including identification of those at risk.
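A Hazard Log of this kind is, at heart, a set of records tracked for the whole-of-life of the SRS. The sketch below is a minimal illustration in Python; the field names and example entry are our own invention, not the book's template:

```python
from dataclasses import dataclass, field

@dataclass
class HazardRecord:
    """One Hazard Log entry: a condition that can potentially lead to harm.

    Field names are illustrative only, not taken from any standard template.
    """
    hazard_id: str
    description: str
    those_at_risk: list                              # identification of those at risk
    analyses: list = field(default_factory=list)     # hazard analysis references
    mitigations: list = field(default_factory=list)  # risk reduction activities
    status: str = "open"                             # tracked whole-of-life

class HazardLog:
    def __init__(self):
        self._records = {}

    def add(self, record: HazardRecord):
        self._records[record.hazard_id] = record

    def open_hazards(self):
        return [r for r in self._records.values() if r.status == "open"]

# Hypothetical example entry
log = HazardLog()
log.add(HazardRecord("H-001", "Stale position data used in a routing decision",
                     those_at_risk=["passengers"]))
```

Even a structure this simple supports the ‘whole-of-life' discipline: records are never deleted, only moved through states, so the log remains an audit trail.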

Enabling Works

The remit is extended and elaborated to identify the infrastructural technologies (4.0.2). These are the underlying, often ignored, communications systems that form the foundation of DCS. This examination is to confirm that the topology and configuration contain no errors (2.4.1) that might permit sneak circuits and hence unintended (rogue) data paths (8.8.1). Use Network Theory (S.15.6) to construct the initial network representation of your ‘area of interest'.
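The topology check can be sketched directly from the network representation. In the fragment below (system names and the documented-interface set are invented for illustration), connections present in the topology but absent from the documentation are flagged as candidate sneak paths:

```python
from collections import deque

# Hypothetical physical/logical connectivity of the 'area of interest'.
connections = {
    "SensorNet": ["Gateway"],
    "Gateway": ["Historian", "MaintenanceLAN"],
    "Historian": ["ControlRoom"],
    "MaintenanceLAN": ["ControlRoom"],   # an undocumented maintenance link
    "ControlRoom": [],
}

# Interfaces that appear in the documentation / Interface Agreements.
documented = {("SensorNet", "Gateway"), ("Gateway", "Historian"),
              ("Historian", "ControlRoom")}

def reachable(src, graph):
    """Breadth-first reachability over the connectivity graph."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def candidate_sneak_edges(graph, documented):
    """Edges present in the topology but absent from the documentation:
    starting points for sneak circuit / rogue data path investigation."""
    return {(a, b) for a, nbrs in graph.items() for b in nbrs
            if (a, b) not in documented}

print(candidate_sneak_edges(connections, documented))
# flags ('Gateway', 'MaintenanceLAN') and ('MaintenanceLAN', 'ControlRoom')
```

The flagged edges are not necessarily hazards; they are simply where the documented description and the actual topology disagree, and therefore where examination should begin.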


Stepwise decomposition of the ‘area of interest' is used to refine and create one or more hierarchies based on the A-axes (of the DSM). Each of these hierarchies will contain one or more systems and the actors that use them. Choose the hierarchies carefully: further decomposition simply reinforces the choices you have made, and therefore increases the cost of any rework. It is good practice to create several (say three) first-level decompositions so that you can evaluate them and choose the ‘best fit' for further decomposition. Develop a context and boundary for each of the systems identified. Create the initial System Definition (7.0.1) for the ‘area of interest'.

Enterprises and Organisations

For your chosen hierarchy, identify the enterprises (1.5.1) and organisations (1.5.2) (their respective boundaries within the context). This provides demarcation between the Duty Holders and Designers, and between their respective roles and responsibilities and any Safety Cases.

This is the process at the systems level. Now identify and locate any potential or actual ‘incident harm' within the context. These may already be described in the list of top-level hazards for the ‘area of interest' as part of the SRM. Refine the initial System Definition.

Constituent Systems

For the chosen hierarchy, use stepwise refinement to decompose it into its constituent systems. Create a System Definition for each of the constituent systems. Identify the systems directly associated with ‘incident harm', the top-level hazards and the hazard records. The goal is to provide a basis for the identification of interfaces (8.4.1) that will be used in the next step.


For each interface, identify the ‘Owner' [System] and the connected systems. Identify, describe and document the Interface Agreements (IA) (8.4.3). Examination of the interfaces provides a check on the system and its description. Therefore, if necessary, refine the top-level System Definition, its network representation (see ‘Enabling Works' above) and the System Definitions for each of the constituent systems.
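An IA record can be kept as structured data so that the ‘which IAs touch this system?' question is answerable mechanically. The sketch below uses invented field names and example systems; it is not the book's IA template (Table 8.3):

```python
from dataclasses import dataclass

@dataclass
class InterfaceAgreement:
    """Skeleton of an IA record; fields are illustrative only."""
    ia_id: str
    owner_system: str
    connected_systems: list
    data_items: list            # data, metadata or metamodel elements crossing
    direction: str = "bidirectional"

def interfaces_touching(ias, system):
    """All IAs in which a system participates, as Owner or connected party."""
    return [ia for ia in ias
            if ia.owner_system == system or system in ia.connected_systems]

# Hypothetical IAs for the examples used earlier
ias = [
    InterfaceAgreement("IA-01", "Gateway", ["SensorNet"], ["raw telemetry"]),
    InterfaceAgreement("IA-02", "Historian", ["Gateway"], ["timestamped telemetry"]),
]
```

Cross-checking `interfaces_touching` against the network representation is one way to perform the "check on the system and its description" mentioned above: any edge without an IA, or IA without an edge, warrants refinement.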

Actors, Identities and Authentication

Consider how you might gain access to a computer system. Typically, you would log on at a keyboard with a ‘username' and ‘password'. In this example, the ‘username' is your identity (1.0.4) and the ‘password' provides a means of authentication (3.3.4). In DCSs and DCOs, identity applies more widely: each system, subsystem, product (1.5.4), interface and IA will also have an identity.

For each interface, establish the actors, their identities, their authentication and the authorities (3.3.3) used with that interface. From these lists construct the following:

1.  initial Identity Model (3.4.2);

2.  initial Security Model (3.3.2).

It cannot be assumed that the identity model and security model will be homogeneous, that is, uniform and applied across the whole ‘area of interest'. A ‘triplet' [35] access strategy can be used to access Information Systems (3.5.1), including legacy systems, with a minimum of intervention and change to those legacy systems. Therefore, part of this process is to identify and document these ‘triplet' systems. The initial security model should address the following processes:

1.  Ensuring that all connected systems are supported by, and protected by, a suitable security model (the capability of the security model is to be supported by a suitable and sufficient risk and threat assessment);

2.  Determining access requirements;

3.  Identifying the types of searches (to develop an index [for ‘triplet' hops across intermediate systems to destination retrieval system(s)]);

4.  Identifying the types of access (read only; read and update; read, write, create and delete);

5.  Specifying the unique identity (R.3.4.1) of the ‘triplet' access agent (for security and logging, and to support subsequent audit requirements).
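Items 4 and 5 above can be illustrated with a small sketch. The access-level names, the ordering and the agent-identity scheme below are all our assumptions, chosen only to show the shape of an auditable access check:

```python
import itertools

# Item 4's access types, simplified and ordered by privilege (names invented).
ACCESS_LEVELS = {"read": 1, "read-update": 2, "crud": 3}

_agent_ids = itertools.count(1)

def new_agent_identity(prefix="triplet-agent"):
    """A unique identity for the 'triplet' access agent (item 5), so that
    every access can be logged and later audited."""
    return f"{prefix}-{next(_agent_ids):04d}"

audit_log = []

def authorised(granted, requested):
    """A request is allowed only if the granted level covers the requested one."""
    return ACCESS_LEVELS[granted] >= ACCESS_LEVELS[requested]

def access(agent_id, system, granted, requested):
    ok = authorised(granted, requested)
    # Log every attempt, granted or denied, against the unique agent identity.
    audit_log.append((agent_id, system, requested, "granted" if ok else "denied"))
    return ok

agent = new_agent_identity()
access(agent, "LegacyDB", granted="read", requested="read")   # allowed
access(agent, "LegacyDB", granted="read", requested="crud")   # refused, but logged
```

The point of the sketch is the shape, not the detail: denied requests are recorded too, because the subsequent audit requirement in item 5 needs both outcomes.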

Data, metadata and elements of the metamodel

Each interface is examined to determine what data, metadata and elements of the metamodel flow within the context and its hierarchy. This may require ‘recursion', that is, stepping along the interfaces until the source is determined (the ‘stopping condition'). It may also involve many subsidiary data paths as different ‘threads' are combined. In this way the documented description of data, metadata and elements of the metamodel is created.
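The recursion is easy to see in miniature. The fragment below assumes a simple provenance map (one upstream feed per item; real data paths may branch and combine threads) with `None` marking an original source, the ‘stopping condition'. All item names are invented:

```python
# Hypothetical provenance map: each data item -> the item it was derived from
# across an interface; None marks a source (the 'stopping condition').
upstream = {
    "display_speed": "fused_speed",
    "fused_speed": "wheel_sensor_feed",
    "wheel_sensor_feed": None,           # original source: stop here
}

def trace_to_source(item, upstream, path=None):
    """Step 'upstream' along the interfaces until the source is reached."""
    path = (path or []) + [item]
    parent = upstream.get(item)
    if parent is None:                   # stopping condition met
        return path
    return trace_to_source(parent, upstream, path)

print(trace_to_source("display_speed", upstream))
# ['display_speed', 'fused_speed', 'wheel_sensor_feed']
```

The returned path is exactly the documented description sought here: the chain of interfaces a data item crossed before it was consumed.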

Infrastructural technologies, which are often associated with data ecosystems (4.0.1), enable the creation of architectures that employ highly adaptive applications. If these data ecosystems are beyond the boundary of the ‘area of interest' then the ‘stopping condition' is the IA at that boundary.

It is now possible to use Root Cause Analysis (RCA) (S.18.2) to trace the information used in the causal chains (2.1.16). The analysis should consider errors, faults (2.4.3) and failures (2.4.5) and security issues such as authentication failure, all of which can prevent access. The analysis should also consider employing a form of ‘Reverse Engineering' and Sneak Circuit (15.8.2) analysis.

This process step has done the following:

1.  Established what systems are involved in the ‘area of interest' by identifying

(a)  data, metadata and elements of the metamodel

(b)  candidate ‘triplet' relationships (S.20.11)

(c)  IA

(d)  the identities used to access data, metadata and elements of the metamodel

2.  Outlined the steps required

(a)  to select the minimum set of relevant data via navigation of an appropriate set of ‘triplets'

(b)  to identify relevant TAP point(s) on DSM

i.  characterising the data requirements over each TAP point

ii.  selecting an appropriate ‘triplet' interface point to an adjoining TAP point

iii.  navigating outward through ‘triplet' set(s) until reaching stopping criteria

iv.  repeating for each relevant ‘triplet' set interfacing directly with TAP point

v.  collecting back to central location or running of applications remotely on data
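The detail of the ‘triplet' scheme [35] is unpublished, so the navigation in step 2 can only be sketched under assumptions. The fragment below assumes a triplet is simply a (source, relation, target) record and shows hop-by-hop outward navigation from a TAP point until a stopping criterion is reached (step 2(b)iii); all names are invented:

```python
# Assumed triplet form: (source_system, relation, target_system). This is our
# reading of [35], which is unpublished, and is illustrative only.
triplets = [
    ("TAP-P1", "indexed-by", "IndexSvc"),
    ("IndexSvc", "routes-to", "ArchiveA"),
    ("IndexSvc", "routes-to", "ArchiveB"),
    ("ArchiveA", "mirrors", "OffsiteStore"),
]

def navigate(start, triplets, stop=lambda s: s.startswith("Archive")):
    """Walk outward from a TAP point, hop by hop, until the stopping
    criterion is met on each branch."""
    reached, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for src, _rel, dst in triplets:
            if src == node and dst not in reached:
                reached.add(dst)
                if not stop(dst):        # keep hopping unless stopped
                    frontier.append(dst)
    return reached

print(navigate("TAP-P1", triplets))
# {'IndexSvc', 'ArchiveA', 'ArchiveB'}
```

Note that OffsiteStore is never reached: the stopping criterion halted the walk at the archives, which is the behaviour wanted when selecting the minimum set of relevant data.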

What's Next

This is a starting point. With the results of this process you are in a position to analyse the effects of proposed changes, to look at the issues associated with corporate acquisition and divestment, and to have a firm basis from which to participate in the discussion about the impact of automation. One of the possible uses for this process is incident investigation.


[35] Gerard Askew, Triangulation: Navigation of Information Contexts Using Triplet Relationships. [UNPUBLISHED] 2016.


Alastair Faulkner

This book could not have been completed without the support and patience of my wife Cheryl, and my children Eamon and Grace.

I would like to thank my colleague Ron Pierce for his patience and understanding. Ron is my industrial mentor, initially from my doctorate, and has extensive experience of systems, software and safety issues. I would also like to thank Andy Harrison who has witnessed the journey, recognised its importance and offered help and assistance.

Mark Nicholson

Writing books of this scope is a long haul, rather than a sprint. Thank you to Rachel for her input and her entreaties to get on with it. I would like to thank my colleagues, those who have talked to me at meetings, conferences and the odd bar, for their patience, helpful discussions and polite pointers as to the errors in my approach. Robust but helpful scepticism is the lifeblood of these endeavours.

This book marks the start of a journey as the horizon for this work expands to include the industrialisation of Autonomous Systems, and the assurance thereof. I would therefore like to thank my colleagues in the Assurance of Autonomy Programme a priori for their patience, discussions and robust scepticism as the journey to the second edition of this book unfolds.

Why write this book

Safety management must evolve to address the challenges posed by a reliance on data, enabled by infrastructural technologies, data ecosystems and autonomy. An awareness of this growing gap gives rise to a chronic unease where data-centric autonomous agents are used in safety systems.

To our proofreaders

Developing in the abstract is one thing; writing it down concisely and unambiguously so that the text communicates the intent is another. We would like to thank our proofreaders:

List of Figures

1.1  Bow-tie Diagram 5

1.2  Liew (2013) DIKIW Elements and Linking Statements 13

1.3  Surface and Deep Learning in the DIKIW 14

1.4  DIKIW and Human Centred Competence 14

1.5  Broad Comparison of DIKIW and Semiotic Model 17

1.6  An Intelligent (Expert) System based on Symbolic AI 17

1.7  A Cyber-Physical System based on Smart AI 18

2.1  Safety Management System – Basics 26

2.2  Hazard – Incident Sequence 29

2.3  Hazards in a Systems Hierarchy 30

2.4  Data Error Traversing a Data Path 33

2.5  Extending Villemeur: Primary-Secondary-Command-Decision Failures 35

2.6  Decomposition of Safety Requirements 38

2.7  Safety Triumvirate 40

2.8  Fragment of Decision Making Pattern Based on an Information System 42

2.9  Dynamic Safety Assurance Process 43

2.10  TRL versus IRL versus SRL 46

2.11  Safety Related Information System 52

3.1  A Communications Mesh – Physical and Logical Address 60

3.2  Example Interface Agreement Implementation 63

3.3  Safety Related Information System in Safety Decision Context 66

3.4  Safety Management via Operational Hazard Logs 68

3.5  Accidental System 70

4.1  Data Sources and the IoT 83

4.2  Deep Learning Modelling Life cycle 83

4.3  DS Integrity Resolution within Safety Case Regime 91

5.1  Veracity Challenges 100

7.1  Service Provision Actors 122

7.2  Service Specification 123

7.3  A Model Constructed and Interpreted 131

8.1  DSM and the TAP Axis 140

8.2  Implementation Model for Interface Agreements 150

8.3  Example Interface Agreement Implementation 151

8.4  Interface Agreement Service Provision 152

8.5  An Array of Processes with a Hierarchy 154

8.6  A Simple Linear Metamodel Architecture 156

8.7  Maritime Data Path 159

9.1  A Set of Operational Contexts for Nested Systems 162

9.2  Development Milestone Terms 163

9.3  Architectures in the Development Context 167

9.4  Project Management System – Basics 169

9.5  Data Path Layer Model 177

9.6  Identify the Data Origins 177

9.7  Identify the Boundaries 178

9.8  Identify the Transformations and Processing of the Datasets 178

9.9  Apportion the Integrity Requirements 179

9.10  Identify Evidence Requirements 179

9.11  Specify Corrective Action Process 180

9.12  Completed Data Path 180

11.1  DCO Operating Canvas 198

12.1  System or Organisation's Resilience 212

12.2  Motor as a Line Replaceable Unit 212

12.3  Data (Asset) Management System – Basics 213

12.4  Data (Asset) at TAP (t, a, p) 214

12.5  Use of ETL to create virtual schemas 217

12.6  Near Real-time Monitoring of Operational Safety Management 221

14.1  Evidence Management in Context 243

15.1  A Small Network with Both Multi-edges and Self-edges 259

15.2  System and Subsystem as a Directed Network 260

15.3  SoI Viewpoints 268

16.1  Incident Investigation Management in Context 272

16.2  Incident Footprint in Time, Space, Complexity and Severity 273

18.1  Incident Analysis in Context 282

19.1  Incident Report in Context 287

20.1  TAP axes of the DSM 292

20.2  Computer-Based Technology and / or Human Systems 294

20.3  A Layered Model for a Hierarchy of Systems 295

20.4  System Interfaces 297

20.5  System Boundary Issues 298

21.1  Generation of an Initial List of TAP Points 308

21.2  Illustration of Relationships between PoI and TAP points 310

21.3  Illustration of Generation of TAP Critical Control Points 311

21.4  Safety Argument over TAPs and 4plus1 Principles 319

23.1  qCopter (Physical) Context 332

23.2  Entity-relationships for the qCopter System 333

23.3  Example of qSpace Airspace Segments 334

23.4  Fragment of an Entity-relationship Diagram of a Flight Plan 335

23.5  Example Airspace – Physical Data 338

23.6  Initial P-TAP DSM Representation of qPilot Flight Data 343

23.7  V-TAPs Identified for P-TAP P-Airspace 344

23.8  D-TAPs Identified for P-TAP P-Airspace V-TAPs 345

23.9  qPilot with an Initial Set of CCPs 346

23.10  Initial DSM Representation of qPilot 347

23.11  qCopter Incident 352

23.12  Initial Network Representation of the qCopter Incident System 354

24.1  Well-formed Enterprise 359

24.2  Logical and Physical Production 360

24.3  A Manufacturing Cell 364

24.4  A DSM Hierarchy of Cells 364

24.5  A DSM Hierarchy of Cells 365

24.6  Vertical and Horizontal Integration 371

24.7  Ring-fencing an Acquired Organisation 372

24.8  Transformation using Critical Control Points 374

24.9  Vertical and Horizontal Divestment 375

24.10  Divestment Threat Assessment Process 378

24.11  Implementation of the Partition Barrier 380

24.12  A timeline for the use of Permit to Work 381

24.13  Interface Agreement in Normal Operation 383

24.14  Permit to Work Interface Agreement 383

25.1  A Broad Categorisation of Healthcare Provision 391

25.2  High-level Context for UK Healthcare Provision 394

25.3  A Simplified Supplier, Secondary and Tertiary Delivery, and Primary Provision Model 404

25.4  Mass Casualty Event 409

List of Tables

1  Key to use of italics, referencing and hyperlinks viii

1.1  Argument Terminology for Logic 9

1.2  13 Types of Knowledge Based on Source 11

1.3  State of Knowledge 12

1.4  Human Centred Competence 14

1.5  Types of Actor Exposed to DCS 15

2.1  DMMi Risk Management Support Function 47

2.2  Data-centric Organisation Data Safety Risk Management 48

2.3  Strategies for Controlling Safety Risk 49

2.4  Situation Awareness as Product and Process 51

3.1  Minimum Set of Elements of an Identity Model 61

3.2  Interface Agreement Acronyms 62

3.3  Four Types of SoS 72

5.1  Ethics of Uncertainty 101

7.1  Characteristics of Good Requirements 115

7.2  Additional Characteristics of Good Requirements 116

7.3  Desirable Properties of DCS Performance 130

7.4  Metamodel Category Descriptions 132

7.5  Metadata Category Descriptions 134

7.6  Typical SRACs Address 135

8.1  Selection of Common Definitions of Architectures 140

8.2  Characteristics of Storage Types 144

8.3  IA Template 151

8.4  IA – Sample Implementations 153

8.5  An Incomplete Selection of ISO Standards Relevant to Metadata and Metamodels 157

9.1  Data Path Symbols 175

9.2  Data Path Layers 176

10.1  Description of Assessment Life Cycle Model Phases 187

10.2  4Plus1 Data Safety Principles 188

10.3  Risk Assessment Maturity Model Categories 189

11.1  Components of the Business Model Canvas 202

11.2  Business Policy Features 203

11.3  Value Map 204

11.4  Customer Profile 205

11.5  Operating Model Canvas Components 205

11.6  Operational Modes 207

11.7  Non-operational Modes 208

11.8  Emergency Preparedness 209

12.1  Change Management in Adaptive Systems 219

12.2  Start-up Mode After Data Modification 219

12.3  Areas for Discussion for Adaptive SMS 221

13.1  An Overview of Incident Types 233

13.2  Principles of Effective Response and Recovery 236

15.1  Partial Classification of Incident Models 248

15.5  Healthcare Epidemiological Models 263

15.6  Additional Data-Centric Systems Dynamics Terms 264

15.7  Annotation of the Application of the DSM 267

18.1  Root Cause Analysis – Sample Questions 283

20.1  DSM Metadata Category Descriptions 300

20.2  DSM Metamodel Category Descriptions 301

21.1  Model Categories 306

21.2  Generalised Process Frame for the DSM 307

21.4  Key to Identities Used in TAP Identification Outline 309

22.1  Partial Classification of Model Viewpoints 323

23.1  qCopter System Entities Descriptions 333

23.2  qCopter System Organisations 335

23.3  Summary of Autonomous Flight Roles 335

23.4  Typical Air Traffic Control – Top Level Hazards 337

23.5  An Initial Abstract Hierarchy (A-axis) 339

23.6  qPilot Interfaces 340

23.7  Initial qPilot Operational Context 341

23.8  qPilot: Steps a ‘day-in-the-life’ for a qCopter Flight Use Case 341

23.9  qPilot: Data Instantiations within P-axis Entities for Element of Flight 342

23.10  qPilot: Flight Steps 8 to 11 and Associated Data Instantiations 342

23.11  qPilot P-TAP Entities Required to Execute Autonomous Flight 343

23.12  V-TAPs identified for P-TAP P-Airspace 344

23.13  D-TAPs Identified for P-TAP P-Airspace 345

23.14  Airspace Enterprise Metamodel 348

23.15  Extract of the Airspace Segment Metamodel 349

23.16  Extract from the Flight Controller Metamodel 350

23.17  Extract from the Airspace Metadata 350

23.18  Extract from the Airspace Metadata 351

23.19  Extract from qCopter #QC007 and #QC901 Flight Data 352

23.20  Application of the DSM to Autonomous Flight Incident 353

23.21  qPilot Interface Agreements 355

23.22  Actor and Identity 356

24.1  Typical People-related Safety Hazards 360

24.2  Typical Top Level Hazards for Autonomous Asset Movements 361

24.3  Shop Floor Value Proposition 366

24.4  Organisational Metamodel Category Descriptions 368

24.5  Examples of Metamodel Errors 370

24.6  Examples of Metamodel Errors Exposed by ‘Change’ and Scale 370

24.7  Examples of Errors in Acquired and Merged Systems 373

24.8  Divestment Areas for Consideration 376

24.9  Assessment of Divestment Threats 379

24.10  Enterprise – Emergency Preparedness 385

25.1  Domains of Healthcare Provision 391

25.2  Desirable Properties of Healthcare Performance 392

25.3  Healthcare Management Metamodels 393

25.4  Initial Healthcare ‘Enterprise’ Metamodel 394

25.5  Initial Healthcare ‘Organisational Unit’ Metamodel 395

25.6  Typical Issues that can arise in the Healthcare Context 397

25.7  ICP expressed in the Healthcare TAP DSM 403

25.8  NHS Incident Classification 407

25.9  NHS Incident Level 407

Part I: Data-Centric Safety


1. Introduction

2. System Safety Management

3. Challenges to Systems Engineering

1: Introduction


This chapter provides an introduction to the overall content of the book. Many of the core terms employed in the book are defined, and the importance of data is explored. Data is employed extensively by systems, enterprises and organisations. Data is used to configure and characterise. Data is produced, passed across interfaces, stored, processed, transformed and consumed. Improvements in communications technologies allow the interconnection of systems into ever greater Systems of Systems. In these large-scale System-of-Systems domains, we manage scale by creating a series of abstractions, such as metadata and metamodels. The chapter also identifies the core use of data as an element in an information processing chain: data is collected, transformed into information, and internalised as knowledge that can be employed to interact effectively within a given context. The Semiotic and DIKW models are provided as examples.


Definitions; Data centric organisations; Metadata; Human learning; Machine learning

Data is an increasingly important element of modern life.

Data is at the core of science. – Neil deGrasse Tyson

Data is important: the contributions to (safety) risk associated with the use of data (1.2.1) are real [227,229] and are beginning to be recognised [238,139,168]. What is more problematic is the lack of consensus, within the safety community, regarding the design, management and treatment of data consumed and produced by systems (1.5.5) with potential safety consequences.

We acknowledge that data (like software) cannot directly harm (1.0.3) you, without an operational environment. We note that data requires an actor (1.0.1) to interpret and act on the data. However, access denial, delay or misdirected data services may contribute to harm (particularly where time is a factor). In a data-centric world, how might data constrain or enable individual actors to interact safely with their operational environment and how will omissions or data errors (2.4.2) influence safety?

Definition 1.0.1


an individual, entity, or combination of product (1.5.4), people and process.

The role of actor (1.0.1) will increasingly be undertaken by one or more Autonomous Agents (1.0.2).

Definition 1.0.2

Autonomous Agent

(AA) an entity operating on the owner's behalf, as an actor (1.0.1) without interference from the ownership entity. Typically, these are products (1.5.4) that incorporate varying degrees of Artificial Intelligence (AI) and Machine Learning (ML) [241,391].

AAs are often the controlling entity in Autonomous Systems (AS) (3.2.2) reliant on data products (3.6.2).

Where's the harm?

Definition 1.0.3

Harm

physical injury or damage to the health of people or damage to property or the environment [319].

Technology continues to evolve, enabled primarily through infrastructural technologies (4.0.2) connecting the service-dominated Information Systems domain. Data is the common factor in the provision of these integrated services. Commercial entities have become dependent on this data, as without it they could not operate. The provision of data services has also evolved from client-server and multi-tier systems, through data warehousing, to enterprise and organisational data architectures. Data Safety (DS) continues to evolve as decision-making (autonomous) technologies mature. All these changes are taking place while data volumes are rising exponentially, in turn giving rise to the new disciplines of Data Science (3.6.1) and Data Engineering [638].

Infrastructural technologies provide a way to share resources and information about resources. Shared resources require the use of identity (1.0.4) and therefore an identity model (3.4.2). It is desirable that these identities are unique (R.3.4.1) (within specified boundaries (2.1.12) and, where appropriate, the entire system). The use of identity requires consideration of access. Access should be controlled to create privileged areas, functions, applications, systems and groups of systems (including the data (1.2.1), metadata (7.8.3) and metamodels (7.8.2) they may contain). Access Control applies to all elements. As a result, identity and Access Control are a critical interface with Cyber Security (3.3.1) Management.

Definition 1.0.4

Identity

A unique labelling of attributes of the object (system resource) being accessed and of the actor (1.0.1) requesting access in a given context (6.0.2).
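Uniqueness within a specified boundary can be enforced mechanically. The sketch below is a hypothetical scheme of our own, not a model prescribed by the book: identities are registered per boundary, the same label may recur in a different boundary, and the qualified (boundary, label) pair is unique system-wide:

```python
class IdentityRegistry:
    """Identities unique within a stated boundary (a hypothetical scheme)."""

    def __init__(self):
        self._by_boundary = {}

    def register(self, boundary, label, attributes):
        names = self._by_boundary.setdefault(boundary, {})
        if label in names:
            raise ValueError(f"identity {label!r} already in use within {boundary!r}")
        names[label] = attributes
        # The qualified identity is unique across the entire system.
        return (boundary, label)

reg = IdentityRegistry()
reg.register("plant-A", "PLC-7", {"role": "controller"})
reg.register("plant-B", "PLC-7", {"role": "controller"})   # same label, different boundary
```

Rejecting duplicates at registration time, rather than detecting clashes later, is what makes the identity model usable as a foundation for Access Control.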

Communications-enabled technologies also change organisational structures and the enterprises (1.5.1) that use them. Internet-based services are perhaps the most recognisable of these changes with the rise of online shopping. The execution of a retail website transaction includes an array of services on the website, from payment, goods selection and dispatch to confirmation of delivery, often across several commercial entities. Data is the common factor in the provision of these integrated services. These business entities have become data-centric, as without this (correct) data they could not operate. As a result, data errors and failures (2.4.5) become a significant feature of incidents (13.0.1). Therefore, data errors may be a part of a direct causal chain (2.1.16) and contribute to harm.

Threat Identification and Risk Management

A combination of factors and circumstances will be required to give rise to an incident with data as a contributing cause. One simple model to represent this is the bow-tie diagram [119]. Figure 1.1 represents the contribution of data errors, failures and malicious threat events to incidents. The left-hand side of the bow-tie diagram shows data errors, failures or malicious threat events. Furthermore, the diagram can represent the impact of data errors, failures or malicious threat events on the effectiveness of mitigation (2.2.3). Incident sequences (13.0.3), and therefore safety risk management, are significantly affected by data.

Figure 1.1 Bow-tie Diagram
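A bow-tie's structure lends itself to a simple data model. In the sketch below (threat, barrier and consequence names are invented for illustration) barriers whose effectiveness has been degraded, for example by a data error, can leave a threat with no intact mitigation:

```python
# A minimal bow-tie: threats on the left, a top event at the knot,
# consequences on the right, each guarded by barriers. Names are illustrative.
bowtie = {
    "threats": {
        "stale sensor data": ["input validation", "cross-channel comparison"],
        "malicious injection": ["authentication", "input validation"],
    },
    "top_event": "incorrect value presented to the actor",
    "consequences": {
        "overspeed": ["automatic brake demand"],
    },
}

def unmitigated_threats(bowtie, degraded_barriers):
    """Threats whose every barrier is degraded, e.g. by a data error."""
    return [t for t, barriers in bowtie["threats"].items()
            if all(b in degraded_barriers for b in barriers)]

print(unmitigated_threats(bowtie, {"input validation", "cross-channel comparison"}))
# ['stale sensor data']
```

This captures the double role of data in the figure: a data error can appear as a left-hand threat event, and the same error class can simultaneously degrade the barriers meant to contain it.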

System Safety Principles are incorporated into mature Safety Management Systems (SMS) (2.1.1). Similar risk management systems are used to address disciplines from asset management to enterprise and organisational (1.5.2) risk management. A greater reliance on data affects all these risk management frameworks.

Where does all this data come from?

Data has always been present. The volumes of data used in protection systems have been limited to data generated by dedicated sensors and exchanged through limited interfaces (8.4.1). Larger volumes of data are common in Air Traffic Control for the management of navigation data, flight planning and operations. Typically, these are closed systems (1.5.8).

Infrastructural technologies and data ecosystems (4.0.1) enable the creation of architectures that employ highly adaptive applications that are data-dependent, if not data-centric. In parallel, developments in hardware and operating systems have allowed the creation of low-cost platforms that form the Internet of Things (IoT) (3.8.1). The ubiquity of the IoT has the potential to produce vast quantities of data in open systems (1.5.7) and environments. Often IoT devices exploit remote cloud data storage and Fog Computing [70].

This storage and computing capability changes the nature and possible uses of data, as well as the potential impact of data error. Data growth is exponential. Many datasets make reference to other data, often across organisational domains and system boundaries. Data volume growth provides a multiplier for the data references that it contains.

What does this mean for Safety Management?

In addressing Data-Centric Systems (DCS) (1.5.9), the Safety Principles embodied in many mature SMS remain unchanged, but it will be more challenging to marshal and control the resources required to create and maintain the integrity of systems. This is especially true where the capability to evolve rapidly using ML exists. The system becomes highly dynamic.

The introduction of new technology is often associated with step change. Over time these technologies mature and the processes related to them become normalised, requiring the standardisation of components. Economic factors drive component inventories to minimum levels; such pressures encourage re-use and give rise to the requirements to reduce costs associated with change [230].

The sheer range of applications of Computer Based Technologies (CBT) creates issues of scope and applicability. System safety management has its foundation in protection systems, typically fast-acting rule-based technologies. The implementation of configurable CBTs changes the risk profiles associated with this established domain. This is in part due to changes in Supervision, Optimisation and Control (SOC), creating more complex requirements for vertical and
