
European Journal of Operational Research 73 (1994) 407-422 North-Holland


Invited Review

Review of delay-time OR modelling of engineering aspects of maintenance


R.D. Baker and A.H. Christer Centre for Operational Research and Applied Statistics, Department of Mathematics and Computer Science, University of Salford, Salford M5 4WT, UK
Received March 1993; revised September 1993

Abstract: This review paper discusses the development of delay-time analysis as a means of modelling engineering aspects of maintenance problems. The main concern of the paper is with the philosophy, underlying principles, assumptions and experiences in practice, with references being given to more detailed accounts elsewhere. The current state of knowledge and research in the area will be outlined, and future trends in modelling applications and research predicted. A question that will be addressed is whether the application of delay-time analysis could and should be de-skilled to the extent that it could be effectively utilised by engineers in the absence of a modelling analyst.

Keywords: Repairable machinery; Maintenance; Mathematical modelling; Delay-time; Review; Subjective data

1. Introduction

General critique of maintenance models


Decisions associated with managing the maintenance function could be divided into operational decisions and engineering decisions. The former relate to the operations aspect of implementing decided engineering actions, and typically embrace manpower modelling, logistics and inventory control. Engineering decisions relate to the choice of what engineering actions to actually take and when to take them. Whereas operations decisions influence the efficiency of implementing a maintenance concept, engineering decisions determine the maintenance concept itself. Experience shows that operational research modelling has much to offer both types of problem.

We are concerned with a machine or system liable to (costly) failure. Failure is taken here to mean a breakdown or catastrophic event, after which the system is unusable until repaired or replaced. It may also be simply a deterioration to a state such that the repair can no longer be postponed. Preventive maintenance is some activity carried out at intervals, with the intention of reducing or eliminating the number of failures occurring, or of reducing the consequences of failure in terms of, say, downtime or operating cost. There are a great many models of preventive maintenance in the literature, and the Delay-Time Model (DTM) along with others is reviewed by



Valdez-Flores and Feldman [33] and Thomas et al. [32]. What is striking about many of these models is the seemingly arbitrary nature of their assumptions, and the lack of evident conviction in applicability to real-world situations manifest by

1. no indication of how the values of model parameters can be determined,
2. no mention of model validation, i.e. assessing the quality of the 'fit' of the model to data, and
3. no examples of actual applications or case-studies or post-modelling analysis.

We naturally strive to avoid such criticism in presenting the DTM, which is an attempt to capture the essence of actual engineering perception, experience and practice. In attempting to apply models of engineering decision in real-world situations, the immediate problem is often not that the model does not fit the data, but that there are no data to be fitted. In such data-starved situations, one wishes to initiate collection of any available data as quickly as possible. Subjective information from engineers is of value, and can be obtained by administering questionnaires. Much thought needs to be given to data, its quality, its cost, its acquisition and its use in modelling. Suffice it to say that it is as useful to produce elaborate models that require data that cannot be obtained as it would be to manufacture a car to run on an as-yet undiscovered fuel. Here we are more concerned with introducing the concept of and real-world applications of DTM than with the beauties of mathematical derivation. A rigorous mathematical and theoretical underpinning is important, and this can be found elsewhere in the references. Work to date on the DTM has in the main been characterised by an emphasis on using any available data, seeking to model engineering practice, and seeking to validate the final model. In doing this, statistical as well as probabilistic methods are required.

The Delay Time Model in more detail


The objective of maintenance modelling is to present the output measures of interest to management as functions of decision variables. For example, if preventive maintenance activity were performed every T time units, and downtime were of key concern, the task would be to establish a measure of the expected downtime per unit time as a function of T. It might also be a function of the quality of the maintenance activity. Before attempting maintenance modelling, it is necessary to take a broadbrush view of the situation and identify whether or not the level of defects and other adverse effects is really unavoidable, and exactly what is the cause. It could be that engineering solutions exist, or a revised maintenance concept in the sense of Gits [25] will assist. At any maintenance intervention there is a wealth of information potentially available, both objective - what has failed, and what is being replaced, and subjective - what caused the defect, and is the occurrence preventable. It is argued that much of such information could only be collected live over a survey period. A systematic method of doing this called snapshot modelling has been proposed by Christer and Whitelaw [23]. It has been found that the analysis of such data can have a direct input to engineers' decisions, and aids recognition of maintenance problems for what they are (lack of manpower, training deficiency, over maintenance, faulty components, inadequate inspection, etc.). Some form of problem-recognition analysis is necessary if the risk of developing a solution to the wrong problem is to be avoided. In the rest of this paper, it will be assumed that such an analysis has been conducted and the correct problem has been recognised. The concept of failure delay time or simply delay time is central to our model of preventive

Figure 1. How inspections prevent failures in the component-tracking model


maintenance. Failure is regarded as a two-stage process. First, at some time u a component of the system becomes recognisable as defective, and the defective component subsequently fails after some further interval h. Preventive maintenance is assumed to consist primarily of an inspection resulting in the replacement or repair of defective components. Figure 1 shows how inspections prevent failures in the component-tracking model. The open circles represent the origination of defects, the closed circles represent failures, the vertical lines inspections. The third defect has originated but has been detected at inspection and so has not caused a failure. Figure 2 shows how inspections prevent failures for the pooled-components model. The open circles represent the origination of defects, the closed circles represent failures, the vertical lines inspections. With periodic inspections, as in the lower part of the figure, the second, fourth, fifth and eighth defects have now been detected and have hence not caused failures. Figures 1 and 2 demonstrate the fundamental role of the delay-time concept in modelling the inspection aspect of a maintenance policy. Operational definitions of failure and defectiveness used in the plant concerned are adopted for modelling purposes. The judgement that a component is 'defective' is made by the maintenance technician or engineer. It is clear that with this paradigm of the effect of maintenance, there will in general be an optimum frequency of preventive maintenance. If maintenance is rarely done, expensive failures will result. If maintenance is frequent, defective

components are replaced before they cause an expensive failure, but the maintenance activity itself is costly. To date, most DTM development has been with the two-phase model, OK to fault, and fault to failure. This extension of the conventional reliability modelling of time to failure has been found to be necessary for modelling maintenance problems. Modelling additional stages of deterioration of the state of components is a refinement that is possible, but there is little point unless a decision hangs on it, such as the decision to inspect a slightly defective component more frequently, as is envisaged by Kander [27]. This type of refinement is necessary in modelling concrete structures, where the states are: new to cracking, cracking to spalling, and spalling to failure. Along with this simple notion of the function of maintenance goes a simple purpose of modelling it: to choose a maintenance concept that minimises some measure of cost, such as downtime, or actual financial cost. Experience shows that more elaborate decision rules are unlikely to be adopted by management, and therefore the class of maintenance concepts optimised over needs to be pragmatic. A great variety of models can be built using the philosophy outlined above, and they can be dichotomised according to whether or not they track individual components. A system or machine may have a number of components, and some later delay-time models approximate the system by a few key components [3,4]. Each component may be replaced many times, and the time u of a defect first becoming visible is measured from the time of the latest renewal. A defect

Figure 2. How inspections prevent failures for the pooled-components model



arises with pdf g(u). In the simplest model, failure follows a time h later with pdf f(h). Component replacement or complete repair occurs at failure. The alternative approach, suitable to complex machines where individual components cannot be tracked, is simply to pool defects from all components. As the number of components becomes large and the probability of any given component becoming defective approaches zero, defects arise in general in a nonhomogeneous Poisson process (NHPP), and the pdf f(h) is the pdf for any random defect causing a failure after an interval h. It may well be that inspections are so frequent in comparison with the service life of the system that the NHPP can be replaced by a HPP, so that defect origin times are uniformly distributed. The derivation of this model from the component-tracking model is discussed in [2].

Modelling assumptions

Given the delay-time concept as outlined, it is possible to derive very many mathematical models. All of them include the following general assumptions, which characterise the delay-time concept:

Set A.
1. Failure is detectable as soon as it occurs and without the need for inspection.
2. A failed system must be repaired before it is again usable.
3. Before failure occurs, a component passes through one or more impaired or defective states.
4. Whether or not a component is in a defective state can only be determined by inspection, i.e. a defective component appears to otherwise function normally.

Typical additional assumptions for relatively simple models are:

Set B.
1. The only effect of preventive maintenance on the system is the replacement of defective components, and maintenance has no other beneficial or hazardous effect.
2. Inspection and the repair and replacement of defective components (preventive maintenance) are undertaken jointly.
3. Inspections occur at equally spaced intervals.
4. All identified defects are repaired.
5. Inspections and repairs take negligible time.
6. There are no false positives, i.e. if a defect is not present one will not be identified.
7. Every defect has the same probability β < 1 of being detected at an inspection, and this probability does not vary with time since the defect first became visible.
8. The delay time h of a fault is independent of its time of origin u.
9. All costs or surrogate measures such as downtime are fixed quantities, i.e. they are not stochastic.

Typical additional assumptions for a simple model that tracks key components are:

Set C.
1. Each component has only one failure mode.
2. f and g are modelled as exponential or Weibull distributions.
3. The age of the system, as distinct from the age of the component, does not influence the distributions g and f.
4. Repairs are taken as replacements, so that the faulty component is restored to an 'as-new' condition.
5. The key components of a machine are assumed independent, i.e. the failure of one will not affect the subsequent functioning of another.
6. If more than one machine in a set is modelled, machines are assumed to behave identically and to have uniform usage.

Additional model assumptions for a simple model where individual components are not tracked are:
1. The number of components is very large, and the probability of any given component becoming defective is very small, so that defects arise in a NHPP.
2. Defects are repaired sufficiently well that the probability of any given repaired component again becoming defective is infinitesimally small. This assumption is required in order not to jeopardise the NHPP of defect arrival times. (For example, imperfect repair would cause a clustering of defect arrival times.)

The first set of model assumptions (set A) cannot be changed without ceasing to have a recognisable 'delay-time' model. Subsequent and more specific model assumptions (sets B and C) can of course be relaxed or varied to suit the problem at hand. Given a model constructed according to these assumptions, the maintenance activity is under-

stood well enough to calculate optimum maintenance policies. This will often mean simply finding the optimum frequency of maintenance. The development will again differ according to the criterion chosen for optimisation, e.g. minimum cost, minimum downtime or maximum output. It is possible to devise an optimum policy for component-tracking models by which maintenance occurs at irregularly-spaced epochs after renewal of a component, and this is discussed in [16]. It would also be possible in theory to evaluate the utility of three other policies: failure-based replacement only, age-based replacement without inspection, and allowing the component to continue in use for some specific time after a defect had been detected at inspection. The first is a special case of the general policy, in which the interval between inspections approaches infinity. The third, if it proved cheapest, would suggest that engineers had an overly stringent criterion for replacing defective components. Rather than keeping defective components in use, one might liaise with engineers to suggest the adoption of a less stringent criterion. Some of these policies are discussed in [8], in an application to the coal-mining industry, where inspection was continuous, but maintenance could only occur at certain times.
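To make the shape of such a calculation concrete, the sketch below computes, for the pooled-defect model with an HPP of defect arrivals and an exponential delay-time distribution, the probability b(T) that a defect arises as a breakdown when inspections are T time units apart, and minimises a simple downtime-per-unit-time objective over T. All numerical values are invented for illustration, and the downtime expression is one plausible form of such an objective rather than the formula of any particular case study.

```python
# Minimal numerical sketch of the pooled-defect delay-time calculation.
# Assumptions (not from the paper's case studies): defects arise in an HPP at
# rate k per unit time, the delay time h is exponential with rate lam, an
# inspection takes d_i time units of downtime and a breakdown repair d_b.
import numpy as np

k = 2.0      # defect arrival rate (defects per unit time)   -- assumed
lam = 1.2    # rate of the exponential delay-time distribution -- assumed
d_i = 0.05   # downtime per inspection                         -- assumed
d_b = 0.50   # downtime per breakdown repair                   -- assumed

def b(T, lam=lam):
    """Probability that a defect arising in an inspection interval of length T
    causes a breakdown before the next inspection.  With defect origins uniform
    over (0, T) and an exponential delay time,
    b(T) = (1/T) * integral_0^T F(h) dh = 1 - (1 - exp(-lam*T)) / (lam*T)."""
    return 1.0 - (1.0 - np.exp(-lam * T)) / (lam * T)

def downtime_per_unit_time(T):
    """Expected downtime per unit time with inspections every T time units:
    breakdowns contribute k*T*b(T)*d_b per cycle, the inspection itself d_i,
    and a cycle lasts T + d_i (a simple form of the pooled-defect objective)."""
    return (k * T * b(T) * d_b + d_i) / (T + d_i)

# Grid search for the inspection interval minimising expected downtime.
grid = np.linspace(0.01, 5.0, 2000)
D = downtime_per_unit_time(grid)
T_opt = grid[np.argmin(D)]
print(f"optimal inspection interval T* ~ {T_opt:.3f}, "
      f"downtime per unit time ~ {D.min():.4f}, b(T*) ~ {b(T_opt):.3f}")
```

With the values shown, the optimum balances the fixed downtime of frequent inspections against the breakdown downtime k·T·b(T)·d_b accumulated over longer intervals.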

2. Brief history of the development of the DTM

Before going further, it is necessary to introduce some definitions and notation. The language of industrial equipment maintenance is used throughout, although the DTM has also been used in other contexts. We therefore speak of the failure of systems (machines) composed of components which can fail. This 'failure' need not be a sudden event, and could be a deterioration such that repair could no longer be delayed. In general, one wishes the model to be consistent with objective data and subjective data. The objective data to be fitted would be the results of inspections - number of defects found, and times of failures. The term 'subjective data' usually refers to data acquired by administering a questionnaire [23] to engineers, when a defect is found at inspection or a component fails. The key questions are: 'How long ago could a fault have first been noticed by an inspection or operator (HLA)?', and 'If the repair were not carried out, how much longer could it be delayed before a repair is essential (HML)?'. At an inspection, the subjective estimate of delay-time is h = HLA + HML, and similarly for a failure, when of course HML = 0. If the inspection identifying a defect is made at time t, then u = t - HLA.

Work to date since the genesis of the basic model divides into two categories. The first is model development, to include effects that seem likely to be important in practice, such as imperfection of inspection. Consequence variables such as asymptotic cost per unit time have sometimes been derived. Insight has been gained by exploring the consequences of such effects through their mathematical modelling. The other category is that of fitting DTMs to data in case studies, with emphasis on parameter estimation and model validation. This type of work suggests useful model developments, and so fuels work in the more theoretical category.

The DTM was introduced in 1982 [11] in the context of building maintenance, following the first mention of the concept in 1976 in the appendix to [10]. The model was of the ('pooled components') type which grouped defects from individual components and assumed 'overall' defects; defects arose in a HPP. Subjective and objective information were both to be used. In 1984 the DTM was applied to problems of industrial plant maintenance. In [19] the DTM was extended to cater for imperfect inspection, a NHPP of defect origination epochs over the interval between inspections, and two cost models (maintenance performed simultaneously for all defects or sequentially). In a related case-study paper [21] the DTM and snapshot analysis were used to derive an optimum-cost maintenance policy at the Pedigree Petfoods canning line, which was subsequently adopted by management. It is interesting that the distribution of h was observed to be approximately exponential, but perhaps with a longer tail. In another case-study [20], snapshot analysis and the DTM were applied to modelling preventive maintenance for a vehicle fleet of tractor units operated by Hiram Walker Ltd. Again man-


agement adopted the recommended decrease in frequency of maintenance. This study produced some peculiarities of practice which the model was extended to cope with, e.g. some defects were found by drivers, who returned the vehicle for repair at once, and the next scheduled maintenance was brought forward to coincide with the repair. No purely theoretical model can be expected to cope with all such details of practice. This paper also first mentions the observation that repair times (and hence cost) and delay times h may be positively correlated, a model extension not yet developed at the time of writing. A general account of the DTM is given in [12]. In these applications, a key quantity in modelling has been found to be the fraction of defects resulting in failures, that is the probability of a defect arising as a breakdown under a maintenance concept. This probability can be expressed as a function of the maintenance interval after determining the values of unknown model parameter(s). The probability of a defect resulting in a failure calculated from the subjective and objective data should agree with the observed fraction for the maintenance interval actually employed. In the earlier development, when agreement was not within a few percent, detailed estimates given by engineers were reassessed, and revised estimates obtained. It was found that, with care,

subjective estimates could be obtained which led to models capable of capturing existing practice to acceptable accuracy. There was no formal statistical model-fitting procedure, and indeed as will be seen, an elaborate statistical apparatus is needed if this is to be attempted. After the formulation of the DTM for the 'pooled components' case, and its early applications to real-world problems, a period of more theoretical development started. In 1987 a perfect-inspection model of the component-tracking type appeared [13] and component reliability as a function of inspection interval was calculated using a recursive formula. Note that this assumes that the cycle of regularly-spaced inspections commences at component renewal, and a modified formula would be needed when the inspection cycle is independent of component renewal times. Cerone [7] later calculated an approximate reliability measure using a simplified method. Pellegrin [30] derived a graphical procedure for finding the optimum interval between inspections under a DTM, which allows the various factors relevant to decision-making to be emphasized. A further version of the pooled-component model applicable to the building industry followed in 1988 [14]. Here a DTM was developed in which the probability p(y) of detection of a defect at time y from the defect origin time u

Figure 3. Adjustment of subjective measurements of probability b(T) of a defect arising as a failure at inspection interval T. The model curve, here shown for an exponential delay-time distribution, must be adjusted to be consistent with the point (T*, b*), which is known from current practice

increased from zero at y = 0 to unity at y = h. Repair cost now varied over the delay time as a deterministic function C(y, h). Developments of this work are ongoing in the form of a major collaborative research project with the Concrete Research Group at QMC, London, into the inspection and repair modelling of concrete bridges and high-rise structures. Later papers [17,18] also on the pooled-component model considered more formal methods for revising subjective estimates of delay-time and prior forms of models so that the fraction of defects ending in failure would agree with the observed value for the maintenance policy actually in use. That is, the modelling is calibrated with status quo observation. The problem is presented in Figure 3. Suppose the decision variable is the inspection period T, and the probability of a defect arising as a breakdown conditional upon T is b(T). The lower curve in Figure 3 indicates the estimate of this curve which is a function of g(u) and f(h). This curve should pass through the observed point b* corresponding to the current practice period T*. The problem is to revise f(h), perhaps g(u) and maybe the efficacy of the inspection process, measured by β, so that the subjectively derived curve b(T) passes through the status quo point (T*, b*). The simplest method proposed for such revision was a scaling of the subjective estimates of h. It was pointed out in [17] that estimates of h derived from subjective estimates of HLA at failure tended to be biased downwards, as smaller delay-times are more likely to cause a failure before being picked up at the next inspection. Similarly, estimates of h from HLA + HML at inspection tend to overestimate h. The pooled distribution from failures and inspections is unbiased. Theoretical distributions for observational bias in h were developed in [17], and techniques for removing the bias proposed. A later paper [18] considered scaling and other methods of revising subjective estimates in the context of a reanalysis of older case studies. Whilst the theoretical tools for formally revising the delay-time estimate were developed in anticipation of the problem, only recently has a problem arisen in connection with a bus fleet [24], that requires their use. Initial estimates based on carefully collected data have previously proved to be close


enough to the status quo point not to require revision of the prior distributions or model.

Condition-monitoring DTM

There is considerable interest nowadays in condition-monitoring, with the advent of hi-tech methods that can detect abnormal vibration frequencies, high concentrations of trace metals in oil, and other correlates of wear or damage. The DTM is currently based on the simplest measurement of condition possible - OK or defective [22]. The coal-mining equipment case-study [8] has already been mentioned, and a general paper [15] gave a non-mathematical summary of the DTM. In [16] the DTM for the component-tracking case was discussed from the viewpoint of 0-1 condition-monitoring, and the asymptotic cost per unit time of irregular inspection policies was derived. This cost was used in [22], where a DTM was derived for a 0-1 condition-monitoring model with regularly spaced inspections for a linear pattern of wear characteristic of some plant in the steel industry. In this DTM, a positive correlation between u and h is induced by variability in a population of components.

All case-related models developed to this point had been based substantially upon subjective data. In [3] a component-tracking delay-time model was fitted to purely objective data culled from records kept by engineers maintaining several items of medical equipment. The model was extended to cope with tracking more than one component, and the maximum-likelihood method used to fit it to data. The Akaike Information Criterion (AIC) was used to choose the best parameterisation for the model, and statistical and graphical measures of goodness of model fit were derived. This study showed that it was possible to estimate DTM parameters avoiding the use of subjective data. In a later paper [4] some model extensions were derived and used in fitting data. In [2] the use of maximum-likelihood methods for the pooled-component model was discussed. The ML approach was shown to give the most efficient estimator of the parameters of the distribution of h, and the bias of the ML estimate was derived and shown to be small. Sample sizes needed for cost-effective maintenance policies to


be found were shown to be smallish (a few hundred faults). The relation between the component-tracking and pooled-component models was shown. A discussion of recent work on the DTM will be found in Wang [34]. To date the DTM has undergone much theoretical analysis and development, and has been successfully applied in several case studies.
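As a concrete illustration of the calibration step discussed above in connection with Figure 3, the following sketch (under the simplifying assumptions of an HPP of defect arrivals and an exponential delay-time distribution, with invented numbers) chooses the delay-time rate so that the model curve b(T) passes through the observed status quo point (T*, b*), and reports the implied scaling of the subjective delay-time estimates.

```python
# Sketch of the 'status quo' calibration of Figure 3: choose the rate of an
# assumed exponential delay-time distribution so that the model's predicted
# probability b(T) of a defect arising as a breakdown passes through the
# observed point (T*, b*) under current practice.  The numbers below
# (T_star, b_star, subjective mean delay) are invented for illustration.
import numpy as np
from scipy.optimize import brentq

def b(T, lam):
    """b(T) for an exponential delay-time distribution with rate lam."""
    return 1.0 - (1.0 - np.exp(-lam * T)) / (lam * T)

T_star, b_star = 1.0, 0.45     # current inspection interval and observed fraction -- assumed
h_subjective_mean = 0.5        # mean delay time implied by the HLA/HML estimates  -- assumed

# Solve b(T_star, lam) = b_star for lam (b is increasing in lam, so bracket widely).
lam_cal = brentq(lambda lam: b(T_star, lam) - b_star, 1e-6, 1e3)

# Scaling every subjective delay-time estimate by c changes the rate from lam to lam/c,
# so the scale factor that reproduces the status quo point is:
scale = (1.0 / lam_cal) / h_subjective_mean
print(f"calibrated delay-time rate {lam_cal:.3f}, "
      f"i.e. scale subjective h estimates by {scale:.2f}")
```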

3. Relationship of the DTM to other models of preventive maintenance


The classification of models introduced in [33] is into inspection models, minimal-repair models, shock models, and miscellaneous. Minimal-repair models are not relevant to maintenance as understood here, and pertain to, e.g. age-based replacement. Many inspection models follow the premise of Barlow [5] that after 'failure' a component still functions, but at an increased running cost. Inspection can detect such 'failed' components. Hence these models are not related to the delay-time model. However, the semi-Markov model of Luss [28] is related, and assumes a probability q that a failure is detected at once. It remains undetected until the next inspection with probability 1 - q. The inspection models of Chou [9] and Chou and Butler [6] are also essentially delay-time models. They classify a component into functioning, functioning but impaired, and failed states. Inspection may cause immediate failure or increase the hazard of failure. Theirs is a discrete-time Markov chain model, so that exponential distributions for g and f are assumed. The DTM is perhaps more closely related to shock models. Here, typically a system undergoes a series of shocks, causing increasing damage or 'wear'. The system is judged to have failed when the wear exceeds some threshold, and the function of maintenance is to inspect and replace the system when the wear exceeds some lower threshold. The required calculation is usually that of the optimum level of wear at which one should replace under continuous inspection. In a DTM this level of 'wear' is assumed to be already decided by engineers, and so the only remaining problem is how often to inspect.

One could thus regard a DTM as a special case of a wear model in which 'defective' and 'failed' levels of wear are known, and the distributions g and f are fitted heuristically to available data rather than being derived, albeit with unknown parameters, from a stochastic wear process. However, the DTM can be used even when there is no underlying wear process; it does not make assumptions about the cause of defects. This relationship does offer the possibility of deriving the distributions g and f by assuming an underlying wear model. An underlying wear model also appears to justify the assumption that the delay time h does not depend on u, the time of origination of the defect. This is because of the Markov nature of a wear process. However, it will be seen later that correlations between u and h can still be induced, e.g. by component variability [22]. A less extreme form of maintenance than the replacement of defective components is that of their rejuvenation. Maintenance can restore a deteriorating component so that its hazard of failure returns to an earlier and lower value. Nakagawa [29] discusses such a model, and this paradigm can be included in the DTM, so that a component which is not defective and has not been replaced has its age effectively reduced (beneficial maintenance) or increased (hazardous maintenance) [3]. On the subject of placing the DTM into its theoretical context, there is an interesting relation between the DTM and queuing theory: the pooled-components model with a HPP of defect arrival times is an example of an M/G/∞ queue [31]. Finally, there is a close analogy with medical screening. For example, X-rays that may cause the cancer they attempt to detect are a medical example of hazardous inspection. Some references to work in this area are given in [16]. However, although the analogy is close, the situations seem different enough for useful models to diverge considerably. In the medical case the emphasis is rather on minimising the delay between a defect arising and being detected than on minimising the cost of 'failure', i.e. presumably death or an advanced state of disease. 'Failure' as a sudden event is replaced by a progressive deterioration in quality of life. Also, false positives are common, the population can be stratified into


different risk groups, some diseases are contagious, and so on.

4. Extensions to the basic model

Extensions made to date have been responses to specific cases, and not all variants have yet been explored because to date there has been no need. For example, inspection and repair times were small in the Pedigree Petfoods study [21] and the hospital equipment study [3,4], and only crude cost information was available. Hence there was no pressing case to extend the model to allow for finite repair times, or to add stochastic costs. Assumptions A and B listed earlier for a DTM are not quite the simplest possible, e.g. inspection is allowed to be imperfect. As will appear later, the need for some further model extensions could be indicated by the data, such as long repair times, whereas others could only be discovered by fitting more complex models, and remarking the improvement in model fit. For example, a correlation between u and h is not evident initially if subjective data are not available, as the objective data are a complex function of g and f. Hence the philosophy has been to fit a range of more complex models, if only to ensure the adequacy of simpler models. Hence some of these latter extensions that have been developed have not yet proved necessary, while others have clearly made an improvement. The requirement that maintenance occurs at equally spaced intervals can be relaxed. Baker and Wang [3,4] fit a DTM to data where maintenance was carried out irregularly, and Christer [16] discusses optimal policies where the spacing of inspections changes with component age. The requirement that inspection and maintenance occur together is also easily relaxed, and Chilcott and Christer [8] consider a condition-monitoring case where inspection was continuous but maintenance could only occur at given times. The assumption of only two unfailed states (normal and defective) can be changed when engineers wish to classify the state of a component more precisely. Work is in progress in a major collaborative study [31] to develop a DTM for the inspection of concrete structures, where there are three phases to 'failure', new to cracking, cracking to spalling, and spalling to essential repair.

Here more complex decisions can be made than in the simpler case. The assumption that all identified defects are repaired is a natural one; after all, why inspect at all if this is not the intention? Baker and Wang [3] found that infusion pump batteries were sometimes noted as unreliable but not replaced at once; this point was not developed, but could be modelled with three unfailed states, 'as-new', 'suspect', and defective. In [3] however, in practice no clearly formulated policy (such as more frequent inspection) appeared to follow the identification of a suspect component. In [4] several model extensions were derived and the extended model fitted to medical (infusion pump) data. These extensions were:

1. The IFR (Increasing Failure Rate) Weibull distribution has the unfortunate property that the hazard of failure is zero at time zero. Left-truncated Weibull distributions were used to parameterise g and f, where the addition of a truncation-time parameter allows the hazard to be initially nonzero.
2. The functions g(u) and f(h), hitherto parameterised as Weibull distributions, were allowed to depend on the machine age a_m; this was achieved by letting the scale parameter of the distributions be a function of the age a_m at the moment the component was renewed, e.g. by taking the scale factor a·e^{Aa_m}.
3. Inspection may have a beneficial or adverse effect on a component's performance. It was assumed that the inspection exerted this influence by subtracting a period A from the effective age of the component. A was estimated along with the other model parameters through the maximum-likelihood method.
4. It could happen that machines from which the data were collected had different usages and ages, and should not be treated as identical. The maximum likelihood principle was extended to cope with a population of machines, via the 'Empirical Bayes' method.
5. Two mechanisms which could induce correlations between the periods u and h were discussed. One mechanism, which gives rise to positive correlations, invokes a population of components. The other, which gives rise to a negative correlation, requires a two-stage failure process as before, but with an additional delay after the completion of the first stage before a fault be-


comes visible. An alternative parameterisation is given in [14]. The AIC was used to decide whether an extra model parameter should be retained or discarded. Of these extensions, several were retained. In one application, the hazard of a defect arising was found to double over the lifetime of the system (machine). As might be expected, a new component may be less reliable if plugged into an old machine. The 3-parameter distribution also gave an improved fit in one application, while the correlation induced between u and h by virtue of a population of components with varying 'frailties' was also significant in one application. This was the fitting of some data from Stone, cited in [26]. He measured time to appearance of a defect, and subsequent time to failure for a sample of cable insulation. The Takahasi-Burr distribution [26] was used to fit Stone's data. Here the scale factors of the Weibull distributions for g and f are assumed distributed as a Gamma random variate. The long hyperexponential tails of the marginal distributions of u and h were successfully reproduced, as was the large correlation between u and h (Spearman correlation ρ_S = 0.58). The coefficient of variation of the Weibull scale factors was as high as 1.6. Clearly such correlations can be too large to neglect. However, in the medical equipment application, no significant variation was found. There was also no significant variation in 'intrinsic frailty' between systems (machines), and the delay between a defect arising and becoming visible, which induces a negative correlation between u and h, also did not improve the model fit. The age A removed from a component at maintenance was also not significant, showing that inspection was neither beneficial nor hazardous. However, it seems unlikely that this will usually be so.
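As a small illustration of the first extension listed above, the sketch below contrasts the hazard of an ordinary IFR Weibull distribution, which is zero at time zero, with that of a left-truncated Weibull, which is not. The shape, scale and truncation values are illustrative only, not fitted values from [4].

```python
# Sketch of a left-truncated Weibull as a delay-time / defect-arrival
# distribution: the ordinary IFR Weibull has zero hazard at time zero, whereas
# shifting the origin to a truncation time tau > 0 gives a nonzero initial hazard.
import numpy as np

shape, scale, tau = 2.0, 1.0, 0.3   # Weibull shape/scale and truncation time -- assumed

def weibull_hazard(t):
    """Hazard of an ordinary Weibull(shape, scale) distribution."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

def trunc_weibull_hazard(x):
    """Hazard of the left-truncated Weibull, i.e. of (W - tau | W > tau):
    h_trunc(x) = h_W(x + tau), which is nonzero at x = 0 when tau > 0."""
    return weibull_hazard(x + tau)

def trunc_weibull_pdf(x):
    """pdf of the left-truncated Weibull: f(x + tau) / S(tau)."""
    t = x + tau
    f = (shape / scale) * (t / scale) ** (shape - 1.0) * np.exp(-(t / scale) ** shape)
    S_tau = np.exp(-(tau / scale) ** shape)
    return f / S_tau

print("truncated-Weibull hazard at x=0:", trunc_weibull_hazard(0.0))  # nonzero, since tau > 0
print("ordinary Weibull hazard at t=0:", weibull_hazard(0.0))         # zero for shape > 1
```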

5. Parameter estimation and model validation

Parameter estimation and model validation are areas often neglected in the theoretical developments, but which must be dealt with in any case-study. The mix of subjective and objective data is a further complication here, and all DTM modelling to date in which subjective data have been used has recognised that subjective estimates of u and h may be biased. In this paper a statistical approach to the problem is outlined.

General methodology

The methodology for model development for any particular case-study would be:

1. Liaise with engineers and identify the characteristic features of the maintenance procedure, perhaps utilising snapshot modelling [23].
2. Examine available objective data and decide whether this must be supplemented with subjective and objective data sampling - if so, design a questionnaire and commence sampling. It may be necessary to use methods given in [2] to estimate an adequate sample size. Unfortunately extensive computer records stretching back a long time may be of little value, as vital facts have often been omitted from the record.
3. Formulate the simplest model that could be adequate.
4. Fit the model to available data by likelihood maximisation. Use the AIC to decide how many parameters are needed in the model.
5. Assess model adequacy graphically where possible, and also by goodness-of-fit tests. Extend the parameterisation as necessary to obtain an adequate fit to data. Liaise with engineers when doing this if appropriate.

The final stage, given the model, is the calculation of an optimum policy, which may be carried out analytically if possible, or else numerically or by simulation. The likelihood maximisation method is attractive because (i) it produces efficient estimators of the model parameters, (ii) error bars can be calculated on these estimators by standard methods, (iii) likelihood-ratio tests of model fit can be derived and (iv) the method can cope with missing data. For example, in the medical case-study in [3], a machine with two key components was considered. At failure of either component, an inspection was made of both, and the other also replaced if it was faulty. Sometimes it was not clear from the records which component had failed and which was defective. However, it was only necessary to sum the likelihoods for each possibility, to obtain the pdf of observing one unspecified failure and one unspecified preventive replacement. It must be admitted that the complexity of the


likelihood function can look intimidating. There is a need for graphical methods wherever possible to suggest good parameterisations and starting values for the iterative likelihood maximisation. Graphical and statistical methods of assessing model fit are given in [3]. The AIC is also discussed in that paper as a useful method of penalising the use of large numbers of model parameters, and thus selecting models with the best predictive power, based on available data. Note that subjective data should not be thought of as a Bayesian prior. HLA and HML are derived in specific situations where a defect is present and are as much measurements as is failure time, and so should appear in the likelihood function. A second type of subjective data could be collected if the investigator were called in before the system began operation. Engineers could be questioned about likely defects and their delay-times, and an optimum maintenance policy devised, although this has never been done. Once collected, the likelihood derived from such data could form a Bayes-type prior probability which would multiply the likelihood derived from actual data when the system began operation. In the remainder of this section, we present the suggested method for fitting parameters and models. We demonstrate this in the context of a single component system under perfect inspection and with a basic component tracking model, when subjective data are available. Our task is to examine how the subjective information might be incorporated into the likelihood formulation. Estimation of objective and subjective model parameters via the maximisation of such a likelihood for a large case-study with reliable objective data could be a means of assessing the bias and error of subjective estimates of HLA and HML, as will be shown.
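The following sketch illustrates steps 4 and 5 of the methodology in miniature: two candidate delay-time distributions are fitted by maximum likelihood and compared with the AIC. For simplicity the 'data' here are directly observed (simulated) delay times; in the case studies of [3,4] the likelihood is instead built up from inspection and failure records, but the AIC comparison proceeds in the same way.

```python
# Sketch: fit two candidate delay-time distributions by maximum likelihood and
# compare them with the AIC.  The delay-time sample below is simulated and
# purely illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
h_obs = rng.weibull(1.5, size=200) * 2.0   # simulated delay times -- illustrative only

def aic(loglike, n_params):
    return 2 * n_params - 2 * loglike

# Exponential fit (1 parameter).
loc0, scale_exp = stats.expon.fit(h_obs, floc=0)
ll_exp = stats.expon.logpdf(h_obs, loc=0, scale=scale_exp).sum()

# Weibull fit (2 parameters: shape and scale).
shape_w, loc_w, scale_w = stats.weibull_min.fit(h_obs, floc=0)
ll_wei = stats.weibull_min.logpdf(h_obs, shape_w, loc=0, scale=scale_w).sum()

print(f"exponential: AIC = {aic(ll_exp, 1):.1f}")
print(f"Weibull:     AIC = {aic(ll_wei, 2):.1f}  (smaller AIC preferred)")
```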

Likelihood function for delay-time problems with subjective data

The likelihood in [3] is built up as a product of probabilities and pdfs. For example, a series of inspections at which no defect is found, with the last occurring at time t_n from latest component renewal, and culminating in a failure at time t, would contribute a factor to the likelihood of

P_f(t_n, t) = ∫_{t_n}^{t} g(u) f(t - u) du.   (1)

Here h = t - u, and all possible (unknown) times of defect origin are integrated over. The problem is how to extend this pdf to include the observation of a subjective measurement HLA. We write

P_f(t_n, t, HLA) = ∫_{t_n}^{t} g(u) f(t - u) q(HLA | t, u) du   (2)

as the pdf of (1), but now including the pdf of observing the subjective measurement HLA. The pdf q(HLA | t, u) is the pdf of observing HLA given a defect origin time u and a failure at time t. The correct value for HLA would be h = t - u. We assume that the pdf of observing HLA depends only on the true value h being subjectively assessed, so that q(HLA | t, u) = q(HLA | t - u). Assume further that HLA is a biased estimator, so that

E(HLA) = γ_q h,

where the parameter γ_q need not be unity. A suitable candidate distribution for q is the Gamma density

q(x | h) = [r_q / (Γ(r_q) γ_q h)] (r_q x / (γ_q h))^{r_q - 1} e^{-r_q x / (γ_q h)},   (3)

where the scale constant has been chosen so that E(x) = γ_q h as required. The standard deviation is

σ = γ_q h r_q^{-1/2},

which approaches zero as r_q → ∞, in which case

q(HLA | t, u) → δ(HLA - γ_q(t - u)),

the Dirac delta function. Writing this as

δ(HLA - γ_q(t - u)) = δ(γ_q u - (γ_q t - HLA)),

it is clear that as r_q → ∞ and the random error on the subjective estimate HLA tends to zero, the integral in (2) collapses so that

P_f(t_n, t, HLA) → γ_q^{-1} g(t - HLA/γ_q) f(HLA/γ_q),   (4)

as expected; this is just the pdf of observing a delay-time h = HLA/γ_q and a defect origin time u = t - h, with γ_q^{-1} as a Jacobian. Turning to another likelihood factor, the probability of finding a defect at an inspection at time t, subsequent to a series of inspections at which no defect was found, the last of these being at time t_n < t, is

P_i(t_n, t) = ∫_{t_n}^{t} g(u) du ∫_{t-u}^{∞} f(h) dh.   (5)

When in addition HLA and HML are measured subjectively, the factor becomes

P_i(t_n, t, HLA, HML) = ∫_{t_n}^{t} g(u) du ∫_{t-u}^{∞} f(h) p(HML | u, h) q(HLA | u, h) dh,   (6)

where p(HML | u, h) is the pdf of observing HML given a defect origin time u and a delay-time h. As before, assume that HML and HLA are stochastic functions only of their true values, h + u - t and t - u respectively. Then

P_i(t_n, t, HLA, HML) = ∫_{t_n}^{t} g(u) du ∫_{t-u}^{∞} f(h) p(HML | h + u - t) q(HLA | t - u) dh.   (7)

Assume that E(HLA) = γ_q(t - u) and E(HML) = γ_p(h + u - t), and again (3) (with γ_q replaced by γ_p, r_q by r_p and h by h + u - t) is a candidate pdf for p(x | h + u - t). It is evident that as r_p → ∞ and r_q → ∞, then HLA and HML become precise measurements with no random error, and the two delta-functions resulting cause the double integral to collapse to

P_i(t_n, t, HLA, HML) → γ_p^{-1} γ_q^{-1} g(t - HLA/γ_q) f(HLA/γ_q + HML/γ_p).   (8)

Given a large dataset from a case-study, it would be possible to estimate γ_p and γ_q, and r_p and r_q, giving bias and error of the subjective estimates, and to estimate by how much the standard deviation of the optimum inspection interval had decreased through use of subjective data. The parameterisation here is only crude, and a bivariate distribution for HLA and HML should really be fitted, with a correlation between them, and a nonlinear bias. However, this parameterisation is already quite complex enough for a beginning. If subjective measurements of HLA or HML were reported as zero, this would cause a problem, as the whole likelihood function would then reduce to either zero or infinity (depending on the values of γ_p, γ_q). This difficulty could be overcome by integrating the Gamma pdf for HLA or HML from zero to some small positive value ε, to obtain a mixed distribution. 'Zero' would then be taken to mean 'less than or equal to ε'.
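As a numerical sketch of how a likelihood factor such as (2) can be evaluated in practice, the code below integrates g(u) f(t - u) q(HLA | t - u) over the unknown defect origin time, using the Gamma error model (3) for q. The Weibull and exponential forms chosen for g and f, and all parameter values, are illustrative assumptions rather than values from any case study.

```python
# Numerical sketch of the likelihood factor (2) for a failure at time t with a
# subjective estimate HLA, using the Gamma error model (3) for q.
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Illustrative defect-arrival and delay-time densities g(u) and f(h) -- assumed forms.
g = stats.weibull_min(c=1.5, scale=3.0).pdf
f = stats.expon(scale=1.0).pdf

gamma_q, r_q = 1.0, 5.0   # bias and precision of the subjective HLA estimate -- assumed

def q(x, h):
    """Gamma pdf (3) for the subjective estimate x of a true delay time h:
    shape r_q and mean gamma_q * h (so scale = gamma_q * h / r_q)."""
    if h <= 0:
        return 0.0
    return stats.gamma.pdf(x, a=r_q, scale=gamma_q * h / r_q)

def P_f(t_n, t, HLA):
    """Equation (2): integrate over the unknown defect origin time u in (t_n, t)."""
    integrand = lambda u: g(u) * f(t - u) * q(HLA, t - u)
    value, _ = quad(integrand, t_n, t)
    return value

# Example: last clear inspection at t_n = 2, failure at t = 4, engineer reports HLA = 1.5.
print(P_f(t_n=2.0, t=4.0, HLA=1.5))
```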

6. Further research possibilities

Having already summarised previous work on the DTM, and its present status, we seek to identify useful areas of development for the next few years. The DTM is clearly a useful tool for improving maintenance policies. Although many variations in maintenance practice are possible, the situation where an engineer decides at inspection that a component is defective, and replaces it, is very common. Hence there will continue to be a place for models of this practice. The DTM, in common with seemingly all models of maintenance, has at its core the notion of maintenance as restoration to an earlier and more reliable state. What is unusual here is the emphasis that the model must be consistent with available data on current practice. This is to some extent built-in to the model, by formulating it in terms of two unknown distributions, g and f, rather than specifying the form more closely. For example, Luss [28] developed a semi-Markov model with exponentially distributed state-sojourn times; this parameterisation may be a little restrictive. What we suggest is required to further establish and develop modelling is:

1. The derivation and testing of suitable basic or core models for various applications, such as
building, vehicle fleet, and industrial plant maintenance, in which all major effects (such as imperfection of inspection) which experience suggests are important are included. Analytic calculation of consequence variables such as cost and downtime should also be taken as far as possible for such core models.
2. Sensitivity analysis be used to establish the relative suboptimality in decision variables associated with error in model parameters and distributions.
3. The identification of minor effects which may also need to be included in models in some cases, and their best parameterisation.
4. The development of a well-tried and tested methodology, to collect adequate data to enable the calculation of optimum maintenance policies from the DTM.
5. The development of a set of best methods for identifying the required model extensions, fitting models to data, and validating fitted models. In addition to the normal validation, post-implementation assessments are required.
6. Ultimately, a near automatic procedure for calculating optimum maintenance policies.

Regarding the first point, modelling proceeds iteratively by formulating what seems a plausible model, and refining it on experience of its performance in case studies. It is not a priori clear whether, for example, the exponential/Weibull distribution is a good choice for g and f, or whether another family of distributions would be preferable. After more experience with case studies, it may be possible to produce standard parametric forms for g and f which are generally adequate, or at least, one family of distributions per type of application. It would be worthwhile assuming an underlying wear model in some cases, to see if this parameterisation can improve model fit over the ad hoc distributions used to date. Analytic calculations of consequence variables are worthwhile and desirable but not crucial, as at a pinch simulation can often be used to find the optimum policy. However, the second point concerning sensitivity analysis is important in establishing the significance of parameter estimation. The third point is that there will always be special considerations caused by some particular vagary of maintenance practice, which even a general basic model will not accommodate. It is

desirable to recognise these and be prepared for them as far as possible. Some areas of interest are where:

- There is a correlation between u and h. The probability of defect detection may for example depend on time since defect origin.
- The condition-monitoring situation where some measure of wear is monitored is different from the case where some defect such as a crack originates and may or may not be noticed.
- The cost of repair depends on time since defect origin and may be a deterministic or stochastic function of it.
- There is component dependency, e.g. failure of one component could make others also fail or become aged.

Under the fourth head of data collection, work has already been carried out on eliciting subjective data from engineers. Under this head also come such humble but time-consuming problems as that of extracting and validating data from records held on unfamiliar computer systems. Such practical problems must be addressed, experience gained in solving them, and perhaps a handbook for OR practitioners prepared. Under the fifth heading, if we accept that this is the domain of Statistics, parameter estimation by maximum-likelihood and use of the AIC to select the model with best predictive power are the recommended methods to adopt. Some case-studies with a large amount of data should be carried out, to enable the usefulness of subjective data to be evaluated. It is important to establish how valuable subjective data is, and to produce good methods of analysing it. Such analysis may well be often necessary, as the quality of so-called objective data logged with information systems is often low. Eliciting subjective data is not easy, and will require considerable liaison with engineers. If such data can be dispensed with for some applications, one would like to do so, or to only collect subjective data that usefully complemented any objective data that might exist. Otherwise, the fifth heading includes the development of more aids for assessing model fit and suggesting ways to improve it. Graphical aids are probably most useful. Establishing that the nonlinear likelihood maximisation program has converged to the true maximum is essential. Theoretical proofs of a unique 'hill' to be climbed are best, but failing that, adequate numerical meth-


ods are needed. At present, the only such result is the general theoretical result that for asymptotically large sample sizes, the likelihood function has a unique maximum. Finally, we have seen that the process of model construction, fitting, and validation, followed by calculation of consequence variables and the production of an optimum policy, is a complex one. How automatic can this be made? Could there be for example an expert system that someone (e.g. an engineer) with no mathematical knowledge could use with limited knowledge and, therefore, understanding, and which would produce a 'correct' answer? Failing that, could a handbook of instructions and a computer program be provided that such a person could successfully use? There seem to be two steps that must be taken before this would be possible. The first is the formulation of a set of generic models which are adequate to all situations. Considerable experience will be needed before we can confidently say we have such general models. Otherwise, blind application of models will neglect crucial divergences between the model and the reality. For example, in the hospital study, it was discovered that infusion pumps were often examined by engineers in response to what later transpired to be user-errors. There were many such false alarms. However, at each such incident, some maintenance activity was carried out. To neglect this activity when deriving an optimum policy would have led to the recommendation of an unduly short inspection interval. Having such a generic model, it would also be necessary to ensure that the user clearly understood how to specify the options available. For example, in a multicomponent model, either all components may be inspected when any one of them fails, as was the case in [3], or only the failed component replaced, with no examination of other components. The user might have to specify which practice prevailed, and a wrong answer would lead ultimately to an incorrect maintenance schedule. Second, model-fitting is a nonlinear process, which currently requires some TLC (Tender Loving Care). Iteration may be begun from inappropriate starting values and converge to an obviously wrong result, or it may diverge slowly to infinity. It is in principle possible to construct a robust function-minimiser that could detect all

errors in data, and be guaranteed to yield a sensible result, but this work would not be trivial. Having listed all these problems, it is hard to see that an expert system would be other than premature if constructed in the next few years. Even then, it would need to be developed for a specific application area, such as, for example, vehicles or buildings. The analogous domain of Statistics Packages is one where a set of related techniques (t-test, correlation, factor analysis, etc.) unfamiliar to most users of the package is provided, with an accompanying manual. The end-user is expected to master enough statistical knowledge and computing ability to successfully use the package to solve his or her problem. If we exclude really 'advanced' statistical methods, this is perhaps a task of comparable complexity to deriving a near optimum maintenance regime by applying the DTM. The analogy assumes that considerable work has gone to package a set of DTM programs for easy use, and to write a handbook for users. One of the authors has had long (and bitter) experience of advising such end-users. Some undoubtedly produce useful and correct results. Many muddle through, and produce results which while not quite correct, are more or less adequate. Some have no idea at all what they are doing, and produce results (if at all) which are arguably worse than useless. Although Statistics Packages have been in use for decades, and friendly online help commands have replaced obscure error messages, there is as yet no successful package that provides 'intelligent help'. By this is meant help that shows some understanding by the program of whether the user is attempting a task appropriate to the data. A fortiori there is no intelligent system that can replace a Statistics Package, online help, and a set of manuals. It is concluded that the provision of a delay-time modelling package and accompanying textbook is a task that could be reasonably attempted after a few years more experience with the model, but that the justifiable provision of an expert system is not, perhaps, on the immediate horizon.

7. Conclusions

One of the reasons the DTM has met with initial success in applied studies is that the con-


cept is fundamental to engineering experience, and, therefore, readily embraced by engineers. This does not mean it is simple to implement at present. Studies have been effective in that they caused reasoned changes to take place in management practice which were subsequently vindicated. Any modelling requires close collaboration between the analyst and customer, but in the case of the DTM, the association needs to be particularly close and continuous if the correct data are to be properly collected, and appropriate models constructed. Some of the necessary steps required to expand the applicability of delay-time analysis to enable engineering aspects of maintenance to be modelled for a wider class of problems have been presented and discussed. It is concluded that effective intelligent software support for the DTM is possible, but needs to wait until the techniques of DTM are further developed and substantiated for defined classes of

problems.

Acknowledgement

Research reported here has been supported in part by the SERC grant Nos. GR/HZ3351 and GR/H81535.

References
[1] Abdel-Hameed, M., "Inspection and maintenance policies of devices subject to deterioration", Advances in Applied Probability 10 (1987) 917-931.
[2] Baker, R.D., "Estimating optimum inspection intervals for repairable machinery by fitting a Delay-Time model", Salford University Mathematics Dept. Technical Report MCS-92-08, 1992.
[3] Baker, R.D., and Wang, W., "Estimating the delay-time distribution of faults in repairable machinery from failure data", IMA Journal of Mathematics Applied in Business and Industry 3 (1991) 259-281.
[4] Baker, R.D., and Wang, W., "Developing and testing the delay-time model", Salford University Mathematics Dept. Technical Report MCS-92-09, 1992, and Journal of the Operational Research Society 44 (1993) 361-374.
[5] Barlow, R.E., and Hunter, L.C., "Optimum preventive maintenance policies", Operations Research 8 (1960) 90-100.
[6] Butler, D.A., "A hazardous inspection model", Management Science 25 (1979) 79-89.
[7] Cerone, P., "On a simplified delay-time model of reliability of equipment subject to inspection monitoring", Journal of the Operational Research Society 42 (1991) 505-511.
[8] Chilcott, J.B., and Christer, A.H., "Modelling of condition-based maintenance at the coal face", International Journal of Production Economics 22 (1991) 1-11.
[9] Chou, C.K., and Butler, D.A., "Assessment of hazardous inspection policies", Naval Research Logistics Quarterly 30 (1983) 171-177.
[10] Christer, A.H., "Innovatory decision making", in: K. Bowen and D.J. White (eds.), Proc. NATO Conference on Role and Effectiveness of Decision Theory in Practice, 1976.
[11] Christer, A.H., "Modelling inspection policies for building maintenance", Journal of the Operational Research Society 33 (1982) 723-732.
[12] Christer, A.H., "Operational Research applied to industrial maintenance and replacement", in: Eglese and Rand (eds.), Developments in Operational Research, Pergamon Press, Oxford, 1984, 31-58.
[13] Christer, A.H., "Delay-time model of reliability of equipment subject to inspection monitoring", Journal of the Operational Research Society 38 (1987) 329-334.
[14] Christer, A.H., "Condition-based inspection models of major civil-engineering structures", Journal of the Operational Research Society 39 (1988) 71-82.
[15] Christer, A.H., "Modelling for control of maintenance for production", in: Onderhoud en Logistiek (Op Weg naar Integrale Beheersing), Samsom/Nive, 1991.
[16] Christer, A.H., "Prototype modelling of irregular condition monitoring of production plant", IMA Journal of Mathematics Applied in Business and Industry 3 (1991) 219-232.
[17] Christer, A.H., and Redmond, D.F., "A recent mathematical development in maintenance theory", IMA Journal of Mathematics Applied in Business and Industry 2 (1990) 97-108.
[18] Christer, A.H., and Redmond, D.F., "Revising models of maintenance and inspection", International Journal of Production Economics 24 (1992) 227-234.
[19] Christer, A.H., and Waller, W.M., "Delay time models of industrial maintenance problems", Journal of the Operational Research Society 35 (1984) 401-406.
[20] Christer, A.H., and Waller, W.M., "An operational research approach to planned maintenance: Modelling PM for a vehicle fleet", Journal of the Operational Research Society 35 (1984) 967-984.
[21] Christer, A.H., and Waller, W.M., "Reducing production downtime using delay-time analysis", Journal of the Operational Research Society 35 (1984) 499-512.
[22] Christer, A.H., and Wang, W., "A model of condition monitoring of a production plant", International Journal of Production Research 9 (1992) 2199-2211.
[23] Christer, A.H., and Whitelaw, J., "An OR approach to breakdown maintenance problem recognition", Journal of the Operational Research Society 34 (1983) 1041-1052.
[24] Desa, I.M., and Christer, A.H., "Maintenance availability modelling of bus transport in Malaysia: Issues and problems", International Conference on Operational Research for Development, Ahmedabad, India, 1992.
[25] Gits, C.W., "On the maintenance concept for a technical system", Maintenance Management International 6 (1987) 223-237.
[26] Hutchinson, T.P., and Lai, C.D., Continuous Bivariate Distributions, Emphasising Applications, Rumsby Scientific Publishing, Adelaide, SA, 1990.
[27] Kander, Z., "Inspection policies for equipment characterised by N quality levels", Naval Research Logistics Quarterly 25 (1978) 243-255.
[28] Luss, H., "Maintenance policies when deterioration can be observed by inspection", Operations Research 24 (1976) 359-366.
[29] Nakagawa, T., "Periodic inspection policy with preventive maintenance", Naval Research Logistics Quarterly 31 (1984) 33-40.
[30] Pellegrin, C., "A graphical procedure for an on-condition maintenance policy: Imperfect-inspection model and interpretation", IMA Journal of Mathematics Applied in Business and Industry 3 (1991) 177-191.
[31] Redmond, D.F., Personal communication.
[32] Thomas, L.C., Gaver, D.P., and Jacobs, P.A., "Inspection models and their application", IMA Journal of Mathematics Applied in Business and Industry 3 (1991) 283-303.
[33] Valdez-Flores, C., and Feldman, R.M., "A survey of preventive maintenance models for stochastically deteriorating single-unit systems", Naval Research Logistics Quarterly 36 (1989) 419-446.
[34] Wang, W., "Modelling condition monitoring inspection using the delay-time concept", Ph.D. Thesis, Dept. of Mathematics and Computer Science, University of Salford, 1992.
