reliability


Ernesto Gutierrez-Miravete

Spring 2007

1 Introduction

1.1 Probability

Events and sample space are fundamental concepts in probability. A sample space $S$ is the set of all possible outcomes of an experiment whose outcome cannot be determined in advance, while an event $E$ is a subset of $S$. The probability of the event $E$, $P(E)$, is a number satisfying the following axioms

$$0 \le P(E) \le 1$$

$$P(S) = 1$$

$$P\left(\bigcup_{i} E_i\right) = \sum_{i} P(E_i) \quad \text{for mutually exclusive events } E_i$$

One can associate a numerical value with each outcome in $S$. A random variable $X$ is a function assigning a real number to each member of $S$. Random variables can take discrete values or continuous values.

1.2 Discrete Random Variables

If $X$ is a random variable (i.e. a function defined over the elements of a sample space) with a finite or countably infinite number of possible values $x_i \in R_X$ with $i = 1, 2, \ldots$, where $R_X$ is the range of values of the random variable, then it is a discrete random variable.

The probability of $X$ having a specific value $x_i$, $p(x_i) = P(X = x_i)$, is a number such that

$$p(x_i) \ge 0$$

for every $i = 1, 2, \ldots$, and

$$\sum_{i=1}^{\infty} p(x_i) = 1$$

The collection of pairs $(x_i, p(x_i))$, $i = 1, 2, \ldots$ is called the probability distribution of $X$, and $p(x_i)$ is the probability mass function of $X$.

Two examples of discrete random variables are:

Number of jobs arriving at a job shop each week.

Outcome of tossing a loaded die.

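1.3 Continuous Random Variables

If the range $R_X$ of the random variable $X$ is an interval or a collection of intervals, $X$ is a continuous random variable. The probability that $X \in [a, b]$ is

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

where $f(x)$ is the probability density function of $X$, which satisfies

$$f(x) \ge 0$$

$$\int_{R_X} f(x)\,dx = 1$$

and, if $x \notin R_X$,

$$f(x) = 0$$

Two examples of continuous random variables are:

The life of a device.

Temperature readings in a turbulent flow field.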

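1.4 Cumulative Distribution Function

The cumulative distribution function (CDF) of a random variable $X$ is $F(x) = P(X \le x)$. The CDF is defined as

$$F(x) = \sum_{x_i \le x} p(x_i)$$

for discrete $X$ and as

$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$

for continuous $X$.

Note that if $a < b$ then $F(a) \le F(b)$, $\lim_{x \to \infty} F(x) = 1$, $\lim_{x \to -\infty} F(x) = 0$ and $P(a < X \le b) = F(b) - F(a)$.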

Exercise. Determine the probabilities of various outcomes in tossing a loaded die and

also the probability that a device has a certain life.

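1.5 Expectation and Moments

The expectation of a random variable $X$ is

$$E(X) = \sum_{i} x_i\, p(x_i)$$

for discrete $X$ and

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$

for continuous $X$. $E(X)$ is also called the mean or the first moment of $X$. Generalizing, the $n$th moment of $X$ is

$$E(X^n) = \sum_{i} x_i^n\, p(x_i)$$

for discrete $X$ and

$$E(X^n) = \int_{-\infty}^{\infty} x^n f(x)\,dx$$

for continuous $X$.

A moment generating function of a random variable $X$ can be defined as

$$\phi(t) = E(e^{tX}) = \int e^{tx}\,dF(x)$$

Moments of all orders for $X$ are obtained from the derivatives of $\phi$ evaluated at $t = 0$. When the moment generating function exists, it uniquely determines the distribution of $X$.

The variance of $X$, $V(X) = \mathrm{var}(X) = \sigma^2$, is

$$\sigma^2 = E((X - E(X))^2) = E(X^2) - (E(X))^2$$

The third and fourth moments of $X$ are associated with its skewness and its kurtosis, respectively.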

Exercises. Determine the expectation of the outcome of tossing a loaded die and that of the life of a certain device.
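The discrete formulas for $E(X)$ and $V(X)$ above can be checked numerically. The following is a minimal sketch, assuming a purely hypothetical loaded die whose face 6 comes up half of the time; the function name `mean_and_variance` and the probabilities are illustrative and do not come from the notes.

```python
def mean_and_variance(pmf):
    """Return (E(X), V(X)) for a discrete pmf given as {value: probability}."""
    total = sum(pmf.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in pmf.items())
    second_moment = sum(x * x * p for x, p in pmf.items())
    # V(X) = E(X^2) - (E(X))^2
    return mean, second_moment - mean ** 2


# Hypothetical loaded die: faces 1..5 have probability 0.1 each, face 6 has 0.5.
loaded_die = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}

if __name__ == "__main__":
    mean, variance = mean_and_variance(loaded_die)
    print(f"E(X) = {mean:.4f}, V(X) = {variance:.4f}")
```

For a fair die the same function gives a mean of 3.5, so the shift produced by the loading is easy to see.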

Another important statistic is the covariance of two random variables $X$ and $Y$, $\mathrm{Cov}(X, Y)$. This is defined as

$$\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y)$$

If $\mathrm{Cov}(X, Y) = 0$ the variables are said to be uncorrelated. Further, the correlation coefficient $\rho(X, Y)$ is defined as

$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{(\mathrm{var}(X)\,\mathrm{var}(Y))^{1/2}}$$

The conditional probability gives the probability that a random variable $X = x$ given that $Y = y$ and is defined as

$$P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}$$

Exercise. In a population of N people NA are color blind, NH are female and NAH are

color blind females. If a person chosen at random turns out to be a female, what is the

probability that she will also be color blind?

1.6 Limit Theorems

The following limit theorems are of fundamental and practical importance. They are given here without proof.

The strong law of large numbers states that if the random variables $X_1, X_2, \ldots, X_n$ are independent and identically distributed (iid) with mean $\mu$ then

$$\lim_{n \to \infty} \frac{\sum_{i=1}^{n} X_i}{n} = \lim_{n \to \infty} \bar{X} = \mu$$

with probability $P = 1$.

Furthermore, if the variance of the distribution of the $X_i$ above is $\sigma^2$, the central limit theorem states that

$$\lim_{n \to \infty} P\left[\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le a\right] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-x^2/2}\,dx$$

In words, the theorem states that the distribution of the normalized random variable $(\bar{X} - \mu)/(\sigma/\sqrt{n})$ approaches the standard normal distribution of mean 0 and standard deviation 1.

2 Discrete Distributions

2.1 Bernoulli distribution

Consider an experiment consisting of $n$ independent trials, each with two possible outcomes, namely success and failure. If $X_j = 1$ for a success and $X_j = 0$ for a failure, and the probability of success remains constant from trial to trial, the probability of success at the $j$th trial is given by the Bernoulli distribution as follows

$$p_j(x_j) = p(x_j) = \begin{cases} p & x_j = 1,\ j = 1, 2, \ldots, n \\ 1 - p = q & x_j = 0,\ j = 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases}$$

Note that $E(X_j) = p$ and $V(X_j) = pq$.

The outcome of tossing a fair coin $n$ times can be represented by a Bernoulli distribution with $p = q = \frac{1}{2}$.

2.2 Binomial distribution

The number of successes in $n$ Bernoulli trials is a random variable $X$ with the binomial distribution $p(x)$

$$p(x) = \begin{cases} \binom{n}{x} p^x q^{n-x} & x = 0, 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases}$$

where

$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$

Consider as an example the following situation from quality control in chip manufacture, where the probability of finding more than 2 nonconforming chips in a sample of 50 is

$$P(X > 2) = 1 - P(X \le 2) = 1 - \sum_{x=0}^{2} \binom{50}{x} p^x q^{50-x}$$
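The chip example can be evaluated directly from the binomial formula. This is a small sketch, assuming a hypothetical nonconforming fraction of $p = 0.02$ (the notes do not give a value); the function names are illustrative.

```python
from math import comb


def binomial_pmf(x, n, p):
    """p(x) = C(n, x) p^x q^(n - x) with q = 1 - p."""
    return comb(n, x) * p ** x * (1.0 - p) ** (n - x)


def prob_more_than(k, n, p):
    """P(X > k) = 1 - sum_{x=0}^{k} p(x)."""
    return 1.0 - sum(binomial_pmf(x, n, p) for x in range(k + 1))


if __name__ == "__main__":
    p_nonconforming = 0.02  # hypothetical value, for illustration only
    tail = prob_more_than(2, 50, p_nonconforming)
    print(f"P(X > 2) = {tail:.4f}")
```

Summing the complement keeps the computation to three terms instead of the forty-eight terms of the upper tail.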

2.3 Geometric distribution

The number of trials required to achieve the first success is a random variable $X$ with the geometric distribution $p(x)$

$$p(x) = \begin{cases} q^{x-1} p & x = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$

Exercise. In acceptance sampling one must determine, for example, the probability that

the first acceptable item found is the third one inspected given that 40% of items are rejected

during inspection. Determine the values of x and q and find p(x).

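2.4 Poisson distribution

A random variable $X$ has a Poisson distribution with parameter $\alpha > 0$ if its probability mass function is

$$p(x) = \begin{cases} \dfrac{e^{-\alpha} \alpha^x}{x!} & x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$

The cumulative distribution function is

$$F(x) = \sum_{i=0}^{x} \frac{e^{-\alpha} \alpha^i}{i!}$$

Examples of Poisson distributed random variables include:

The number of customers arriving at a bank.

Beeper calls to an on-call service person.

Lead time demand in inventory systems.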

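3 Continuous Distributions

3.1 Uniform distribution

For a random variable $X$ which is uniformly distributed in $[a, b]$ the uniform probability density function is

$$f(x) = \begin{cases} \dfrac{1}{b - a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$

The cumulative distribution function is

$$F(x) = \begin{cases} 0 & x < a \\ \dfrac{x - a}{b - a} & a \le x < b \\ 1 & x \ge b \end{cases}$$

The probability that $X$ falls between $x_1$ and $x_2$, with $a \le x_1 < x_2 \le b$, is

$$P(x_1 < X < x_2) = F(x_2) - F(x_1) = \frac{x_2 - x_1}{b - a}$$

The mean and variance are $E(X) = \frac{a + b}{2}$ and $V(X) = \frac{(b - a)^2}{12}$.

Examples of uniformly distributed random variables could be:

Inter arrival time for calls seeking a forklift in warehouse operations.

Five minute wait probability for passenger at a bus stop.

Readings from a table of random numbers.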

3.2 Exponential distribution

A random variable $X$ is exponentially distributed with parameter $\lambda > 0$ if its probability density function is

$$f(x) = \begin{cases} \lambda \exp(-\lambda x) & x \ge 0 \\ 0 & \text{elsewhere} \end{cases}$$

The cumulative distribution function is

$$F(x) = \begin{cases} 0 & x < 0 \\ \int_0^x \lambda \exp(-\lambda t)\,dt = 1 - \exp(-\lambda x) & x \ge 0 \end{cases}$$

Examples of exponentially distributed random variables include:

Inter arrival times of commercial aircraft at an airport.

Life of a device.

The exponential distribution possesses the memoryless property, i.e. if $s \ge 0$ and $t \ge 0$ then $P(X > s + t \mid X > s) = P(X > t)$. Clearly, unless there is agreement beforehand, the time one person arrives at the bank is independent of the arrival time of the next person. Another example is that of the life of a used component which is as good as new. In the discrete case the geometric distribution also possesses the memoryless property.

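3.3 Gamma distribution

The gamma function is defined as

$$\Gamma(\beta) = \int_0^{\infty} x^{\beta - 1} \exp(-x)\,dx$$

A random variable $X$ has a gamma probability density function with shape parameter $\beta$ and scale parameter $\theta$ if

$$f(x) = \begin{cases} \dfrac{\beta\theta}{\Gamma(\beta)} (\beta\theta x)^{\beta - 1} \exp(-\beta\theta x) & x > 0 \\ 0 & \text{otherwise} \end{cases}$$

The cumulative distribution function is

$$F(x) = \begin{cases} 1 - \int_x^{\infty} \dfrac{\beta\theta}{\Gamma(\beta)} (\beta\theta t)^{\beta - 1} \exp(-\beta\theta t)\,dt & x > 0 \\ 0 & x \le 0 \end{cases}$$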

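3.4 Erlang distribution

When the shape parameter of the gamma distribution is an integer, $\beta = k$, one has the Erlang distribution of order $k$. Its cumulative distribution function is

$$F(x) = \begin{cases} 1 - \sum_{i=0}^{k-1} \dfrac{\exp(-k\theta x)\,(k\theta x)^i}{i!} & x > 0 \\ 0 & x \le 0 \end{cases}$$

Examples of gamma distributions occur for random variables associated with the reliability function and in the probability that a process consisting of several steps will have a given duration.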

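3.5 Normal distribution

A random variable $X$ with mean $\mu$ and variance $\sigma^2$ has a normal distribution $N(\mu, \sigma^2)$ if its probability density function in $x \in (-\infty, \infty)$ is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$

The cumulative distribution function is

$$F(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{t - \mu}{\sigma}\right)^2} dt$$

The standard normal distribution has a mean of 0 and a standard deviation of 1. Its probability density function is:

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$

and its cumulative distribution function is

$$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt$$

Examples of normally distributed random variables include:

Time to perform a task.

Time waiting in a queue.

Lead time demand for an item.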

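3.6 Lognormal distribution

A random variable $X$ has a lognormal distribution if its probability density function for $x > \theta$ is given by

$$f(x) = \frac{1}{(x - \theta)\,\sigma\sqrt{2\pi}}\, \exp\left[-\frac{\left(\ln\frac{x - \theta}{m}\right)^2}{2\sigma^2}\right]$$

where $\theta$ is the location parameter (often $\theta = 0$) and $m$ is the scale parameter. When $\theta = 0$ and $m = 1$ one has the standard lognormal distribution.

The cumulative distribution function of the standard lognormal distribution is

$$F(x) = P(X \le x) = \Phi\left(\frac{\ln x}{\sigma}\right)$$

Because of its relation with the normal distribution of mean $\mu$ and variance $\sigma^2$, the probability density function of the lognormal distribution with location parameter $\theta = 0$ is sometimes expressed as

$$f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}}\, \exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right]$$

Here, $\mu$ and $\sigma$ are the mean and standard deviation of the random variable's logarithm. The expected value (mean) of the lognormally distributed random variable is

$$E(X) = \exp(\mu + \sigma^2/2)$$

and the variance is

$$\mathrm{var}(X) = \exp(2\mu + \sigma^2)\,(\exp(\sigma^2) - 1)$$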

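3.7 Weibull distribution

A random variable $X$ associated with the three parameters $-\infty < \nu < \infty$ (location), $\alpha > 0$ (scale) and $\beta > 0$ (shape), has a Weibull distribution if its probability density function is

$$f(x) = \begin{cases} \dfrac{\beta}{\alpha} \left(\dfrac{x - \nu}{\alpha}\right)^{\beta - 1} \exp\left(-\left(\dfrac{x - \nu}{\alpha}\right)^{\beta}\right) & x \ge \nu \\ 0 & \text{otherwise} \end{cases}$$

The cumulative distribution function is

$$F(x) = \begin{cases} 0 & x < \nu \\ 1 - \exp\left(-\left(\dfrac{x - \nu}{\alpha}\right)^{\beta}\right) & \text{otherwise} \end{cases}$$

When $\nu = 0$ these become

$$f(x) = \begin{cases} \dfrac{\beta}{\alpha} \left(\dfrac{x}{\alpha}\right)^{\beta - 1} \exp\left(-\left(\dfrac{x}{\alpha}\right)^{\beta}\right) & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

$$F(x) = \begin{cases} 0 & x < 0 \\ 1 - \exp\left(-\left(\dfrac{x}{\alpha}\right)^{\beta}\right) & \text{otherwise} \end{cases}$$

When, in addition, $\beta = 1$ the Weibull distribution reduces to the exponential distribution with parameter $\lambda = 1/\alpha$

$$f(x) = \begin{cases} \dfrac{1}{\alpha} \exp\left(-\dfrac{x}{\alpha}\right) & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

The mean and variance of the Weibull distribution are, respectively, $E(X) = \nu + \alpha\,\Gamma\left(\frac{1}{\beta} + 1\right)$ and $V(X) = \alpha^2 \left(\Gamma\left(\frac{2}{\beta} + 1\right) - \Gamma\left(\frac{1}{\beta} + 1\right)^2\right)$.

Examples of Weibull distributed random variables include:

Mean time to failure of flat panel screens.

Probability of clearing an airport runway within a given time.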

3.8 Gumbel distribution

A random variable $X$ has a Gumbel distribution if its probability density function in $x \in (-\infty, \infty)$ is given by

$$f(x) = \frac{1}{\beta}\, e^{\frac{x - \mu}{\beta}}\, e^{-e^{\frac{x - \mu}{\beta}}}$$

where $\mu$ is the location parameter and $\beta$ is the scale parameter. When $\mu = 0$ and $\beta = 1$ one has the standard Gumbel distribution.

The cumulative distribution function of the standard Gumbel distribution is given by

$$F(x) = P(X \le x) = 1 - e^{-e^{x}}$$

This distribution has been found useful for the description of extreme events such as floods or earthquakes.

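3.9 Triangular distribution

A random variable $X$ with minimum $a$, mode $b$ and maximum $c$ has a triangular distribution if its probability density function is

$$f(x) = \begin{cases} \dfrac{2(x - a)}{(b - a)(c - a)} & a \le x \le b \\ \dfrac{2(c - x)}{(c - b)(c - a)} & b < x \le c \\ 0 & \text{otherwise} \end{cases}$$

The cumulative distribution function is

$$F(x) = \begin{cases} 0 & x \le a \\ \dfrac{(x - a)^2}{(b - a)(c - a)} & a < x \le b \\ 1 - \dfrac{(c - x)^2}{(c - b)(c - a)} & b < x \le c \\ 1 & x > c \end{cases}$$

The mean is $E(X) = (a + b + c)/3$ and the mode is $M = b$. The median is obtained by setting $F(x) = 0.5$ and solving for $x$.

The triangular distribution is useful when the only information available about the random variable is its extreme values and its most likely value.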

Empirical Distributions

If the distribution function of a random variable can not be specified in terms of a known

distribution and field data is available, one can use an empirical distribution. Empirical

distributions can be discrete or continuous.

of a certain population using only information obtained from a random sample extracted

from such population. Inference is an aid in making decisions confronted with uncertainty

and it is the foundation of modern decision theory.


of the population using the sample data. Confidence intervals with a specified degree of

confidence are used in interval estimation.

Sometimes, rather than in the value of a parameter one is interested in the validity of

a certain statement (hypothesis testing). In such cases one can encounter the following

situations:

Accept the statement, it being true (No error is involved).

Reject the statement, it being true (Type I error).

Accept the statement, it being false (Type II error).

Reject the statement, it being false (No error is involved).

One is then interested in the probabilities of incurring Type I and Type II errors (respectively, $\alpha$ and $\beta$).

Two commonly used statistical inference tests in simulation modeling are the Chi-squared and Kolmogorov-Smirnov tests.

Exercise. Do some research and find out how the Chi-squared and the Kolmogorov-Smirnov tests are performed.

6.1

Stochastic Processes

A stochastic process takes place in a system when the state of the system changes with

time in a random manner. Many if not most natural and/or human-made processes are

stochastic processes, although in some cases the random aspects can be neglected.

6.2

Poisson Process

Often one is interested in the number of events which occur over a certain interval of time, i.e. a counting process $\{N(t), t \ge 0\}$. A counting process is a Poisson process if it involves

One arrival at a time.

Random arrivals without rush or slack periods (stationary increments).

Independent increments.

Under these circumstances, the probability that $N(t) = n$ for $t \ge 0$ and $n = 0, 1, 2, \ldots$ is

$$P(N(t) = n) = \frac{\exp(-\lambda t)\,(\lambda t)^n}{n!}$$

This means that $N(t)$ has a Poisson distribution with parameter $\alpha = \lambda t$. Its mean and variance are $E(N(t)) = V(N(t)) = \alpha = \lambda t$. It can be shown that if the number of arrivals has a Poisson distribution, the inter arrival times have an exponential distribution.

The random splitting property of Poisson processes states that if $N(t) = N_1(t) + N_2(t)$ is Poisson with rate $\lambda$, then $N_1$ and $N_2$ are independent Poisson processes with rates $\lambda p$ and $\lambda(1 - p)$, where $p$ and $1 - p$ are the probabilities of the branches $N_1$ and $N_2$. Conversely, if $N_1(t)$ and $N_2(t)$ are independent Poisson processes with rates $\lambda_1$ and $\lambda_2$, then $N(t) = N_1(t) + N_2(t)$ is a Poisson process with rate $\lambda_1 + \lambda_2$ (random pooling property).

6.3

place depend only on the state of the system at the current time, one has a Markov process

or chain. The effect of the past on the future is contained in the present state of the system.

As a simple example of a Markov chain consider a machine that works until it fails (randomly) and then resumes work once it is repaired. There are two states for this system, namely

The machine is busy ($S_0$)

The machine is being repaired ($S_1$)

The system moves from state $S_0$ to $S_1$ at a rate $\lambda$ and from $S_1$ back to $S_0$ at a rate $\mu$.

Exercise. Make a graph representing the above Markov chain.

As a second example consider now a facility where two machines A and B perform an

operation. The machines fail randomly but resume work once they are repaired. The four

possible states of this system are

Both machines are busy (S0 )

Machine A is being repaired while B is busy (S1 )

Machine B is being repaired while A is busy (S2 )

Both machines are being repaired (S3 )

Now $\lambda_1$ and $\lambda_2$ are, respectively, the failure rates of machines A and B while $\mu_1$ and $\mu_2$ are the corresponding repair rates.

Exercise. Make a graph representing the above Markov chain.

The Kolmogorov Balance Equations are differential equations relating the probabilities of the various states involved in a Markov chain, $P_0$, $P_1$, $P_2$ and $P_3$. They are obtained by a probability balance on the states. For the second example above they are

$$\frac{dP_0}{dt} = \mu_1 P_1 + \mu_2 P_2 - (\lambda_1 + \lambda_2) P_0$$

$$\frac{dP_1}{dt} = \lambda_1 P_0 + \mu_2 P_3 - (\mu_1 + \lambda_2) P_1$$

$$\frac{dP_2}{dt} = \lambda_2 P_0 + \mu_1 P_3 - (\lambda_1 + \mu_2) P_2$$

$$\frac{dP_3}{dt} = \lambda_2 P_1 + \lambda_1 P_2 - (\mu_1 + \mu_2) P_3$$

Under steady state or equilibrium conditions the time derivatives are zero and the probabilities are then related by a system of simultaneous linear algebraic equations.
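The steady state system for the two-machine example can be solved numerically. The sketch below sets the time derivatives of the balance equations for $S_0$, $S_1$ and $S_2$ to zero and replaces the fourth (redundant) balance with the normalization $P_0 + P_1 + P_2 + P_3 = 1$. The rate values and the helper names are illustrative assumptions, and the small Gaussian elimination routine is only there to keep the example free of external libraries.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def two_machine_steady_state(lam1, lam2, mu1, mu2):
    """Return [P0, P1, P2, P3] for the two-machine chain at steady state."""
    a = [
        [-(lam1 + lam2), mu1, mu2, 0.0],   # balance on S0
        [lam1, -(mu1 + lam2), 0.0, mu2],   # balance on S1
        [lam2, 0.0, -(lam1 + mu2), mu1],   # balance on S2
        [1.0, 1.0, 1.0, 1.0],              # probabilities sum to one
    ]
    b = [0.0, 0.0, 0.0, 1.0]
    return solve_linear(a, b)


if __name__ == "__main__":
    # Hypothetical failure and repair rates (per hour), for illustration only.
    probs = two_machine_steady_state(lam1=0.1, lam2=0.2, mu1=1.0, mu2=2.0)
    for state, prob in enumerate(probs):
        print(f"P{state} = {prob:.4f}")
```

Because the two machines fail and are repaired independently in this model, $P_0$ should equal the product of the individual availabilities $\mu_1/(\lambda_1 + \mu_1)$ and $\mu_2/(\lambda_2 + \mu_2)$, which gives a quick check of the solution.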

6.4

A queueing system involves one or more servers which provide some service to customers

who arrive, line up and wait for service at a queue when all the servers are busy. Typically,

both arrival and service times are random variables. The single server queue consists of a

single server and a single queue. If the inter arrival times of customers and the service times

are exponentially distributed the resulting queue is known as the M/M/1 queue.

Inter arrival and service times in queues are often modeled probabilistically. Two

examples of queueing systems are:

Inter arrival times of mechanics at a centralized tool crib.

Number of mechanics arriving at a centralized tool crib per time period.

Random inter arrival and service times are often simulated using exponential distributions. However, sometimes a normal distribution or a truncated normal distribution may

be more appropriate. Gamma and Weibull distributions are also used.

An important parameter of the queueing system is the server utilization given by

$$\rho = \frac{\lambda}{\mu}$$

where $\lambda$ is the mean arrival rate of customers from the outside world into the queueing system and $\mu$ is the mean service rate.

The single server queue can also be regarded as a Markov chain in which the various states are distinguished only by the number of customers in the system. Let us call the corresponding states $S_0, S_1, \ldots, S_n$. The system can then move into state $S_i$ either from $S_{i-1}$ (if a new customer arrives before service is completed for the customer being served) or from $S_{i+1}$ (if service is completed and the next customer in line begins service before any new arrival). Let $\lambda_{i,j}$ be the rate at which the system transitions from state $S_i$ to state $S_j$.

Exercise. Make a graph representing the Markov chain for the single server queue.

If the queue is at steady state, the Kolmogorov equations yield

$$P_n = \frac{\lambda_{n-1,n} \cdots \lambda_{1,2}\,\lambda_{0,1}}{\lambda_{n,n-1} \cdots \lambda_{2,1}\,\lambda_{1,0}}\, P_0$$

where $P_n$ is the probability of encountering $n$ customers in the system and $\lambda_{0,1} = \lambda$.

Exercise. Derive the above expression.

In investigating queueing systems one is interested in performance measures such as the expected number of customers in the system $L$, the expected number of customers in the queue $L_q$, the expected wait time of customers in the system $W$, and the expected wait time of customers in the queue $W_q$. These expectations are related by Little's formula. The formula simply states that

$$L = \lambda W$$

or that

$$L_q = \lambda W_q$$

Exercise. Derive the above expression relating $L$ and $W$.

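A number of queueing problems have been solved yielding closed form expressions for the above performance parameters. For instance, for the M/M/1 queue at steady state the results are as follows

$$L = \frac{\rho}{1 - \rho} = \frac{\lambda}{\mu - \lambda}$$

$$W = \frac{1}{\mu(1 - \rho)} = \frac{1}{\mu - \lambda}$$

$$L_q = \frac{\rho^2}{1 - \rho} = \frac{\lambda^2}{\mu(\mu - \lambda)}$$

$$W_q = \frac{\rho}{\mu(1 - \rho)} = \frac{\lambda}{\mu(\mu - \lambda)}$$

$$P_n = (1 - \rho)\rho^n = \left(1 - \frac{\lambda}{\mu}\right)\left(\frac{\lambda}{\mu}\right)^n$$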
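The M/M/1 results above are simple enough to wrap in a helper. The following is a minimal sketch; the function names and the example rates are assumptions made for illustration.

```python
def mm1_measures(lam, mu):
    """Steady-state measures of the M/M/1 queue (requires lam < mu)."""
    if lam <= 0 or mu <= 0:
        raise ValueError("rates must be positive")
    rho = lam / mu
    if rho >= 1.0:
        raise ValueError("steady state requires lam < mu")
    return {
        "rho": rho,
        "L": rho / (1.0 - rho),
        "W": 1.0 / (mu - lam),
        "Lq": rho ** 2 / (1.0 - rho),
        "Wq": rho / (mu * (1.0 - rho)),
    }


def mm1_prob_n(lam, mu, n):
    """P_n = (1 - rho) rho^n, the probability of n customers in the system."""
    rho = lam / mu
    return (1.0 - rho) * rho ** n


if __name__ == "__main__":
    # Hypothetical rates: 2 arrivals and 5 service completions per unit time.
    for name, value in mm1_measures(2.0, 5.0).items():
        print(f"{name} = {value:.4f}")
```

A useful sanity check is Little's formula: the returned values should satisfy $L = \lambda W$ and $L_q = \lambda W_q$.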

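Further, for the M/G/1 queue in which the service times have a mean of $1/\mu$ and a variance $\sigma^2$ the corresponding results are

$$L = \rho + \frac{\rho^2 (1 + \sigma^2 \mu^2)}{2(1 - \rho)} = \rho + \frac{\lambda^2 (1/\mu^2 + \sigma^2)}{2(1 - \rho)}$$

$$W = \frac{1}{\mu} + \frac{\lambda (1/\mu^2 + \sigma^2)}{2(1 - \rho)}$$

$$L_q = \frac{\rho^2 (1 + \sigma^2 \mu^2)}{2(1 - \rho)} = \frac{\lambda^2 (1/\mu^2 + \sigma^2)}{2(1 - \rho)}$$

$$W_q = \frac{\lambda (1/\mu^2 + \sigma^2)}{2(1 - \rho)}$$

$$P_0 = 1 - \rho$$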
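The M/G/1 expressions can be evaluated in the same way. In this sketch the function name `mg1_measures` and the sample values are illustrative assumptions; setting $\sigma^2 = 1/\mu^2$ (exponential service) should recover the M/M/1 results, and $\sigma^2 = 0$ gives the constant service time case.

```python
def mg1_measures(lam, mu, sigma2):
    """Steady-state measures of the M/G/1 queue.

    lam: mean arrival rate, mu: mean service rate,
    sigma2: variance of the service time.
    """
    rho = lam / mu
    if not 0.0 < rho < 1.0:
        raise ValueError("steady state requires 0 < lam < mu")
    second_moment = 1.0 / mu ** 2 + sigma2  # E(S^2) of the service time
    wq = lam * second_moment / (2.0 * (1.0 - rho))
    return {
        "L": rho + lam * wq,
        "W": 1.0 / mu + wq,
        "Lq": lam * wq,
        "Wq": wq,
        "P0": 1.0 - rho,
    }


if __name__ == "__main__":
    lam, mu = 2.0, 5.0  # hypothetical rates
    print("exponential service:", mg1_measures(lam, mu, 1.0 / mu ** 2))
    print("constant service:   ", mg1_measures(lam, mu, 0.0))
```

Comparing the two printed cases shows how service time variability alone lengthens the queue even when the utilization is unchanged.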


For the sake of computational convenience in reliability analysis and modeling, raw data

which is known to consist of independent and identically distributed (i.i.d.) entries are fitted

to a theoretical distribution function. This is nowadays easily done using programs such as

Stat::Fit or Expert.Fit.

Time to failure of complex components is usually represented with the Weibull distribution. If failures are completely random, the exponential distribution is used but if failure

times fluctuate equally around a mean, the normal distribution may be more appropriate.

The lognormal distribution can also be used. For incomplete data uniform, triangular

and beta distributions are used.

Data must then be tested for independence. Some useful tools are:

Scatter Plots. Contiguous values in a string of values of a random variable are plotted on an x-y plane. The resulting pattern of points is characteristic of the distribution.

Autocorrelation Plots. The covariance of values separated by a specified lag in a string

of values of a random variable is plotted as a function of the number of data points.

Runs Tests. This searches for peculiar patterns in substrings of numbers from a larger

stream.

Data must also be tested to see if they are Identically Distributed (Homogeneity Tests).

Some useful tools are:

Histograms.

Distribution Plots.

Quantile-Quantile Plots.

Kolmogorov-Smirnov.

Chi-squared.

Time Dependency of Distributions.

ANOVA.

The collected data typically consist of a limited number of data values. Simulation modeling requires large numbers of samples; therefore the data must be converted to a frequency distribution.

Raw data can be used as input for the simulation project but this is usually not recommended except in special cases. More commonly, once data have been tested for independence and correlation they are converted to a form suitable for use in the simulation model. This is done by fitting them to some distribution. Once a distribution fitting the data has been determined, input for the simulation program is produced as random variates sampled from the fitted distribution. The frequency distribution selected can be empirical or theoretical, discrete or continuous. Discrete distributions are rarely used directly. Instead, numerical values of discrete probabilities are used directly. In practice, continuous theoretical distributions are almost always employed.

Of the many available theoretical distributions 12 or so are commonly used in simulation modeling. Data are fitted to theoretical distributions by identifying the theoretical distribution which best represents the data. Stat::Fit provides a ranking of distributions fitting a particular data set together with a goodness of fit (Chi-squared or Kolmogorov-Smirnov) diagnostic. Note also that if the fitted distribution is unbounded, values for simulation should rather be taken from a truncated version of the selected distribution in order to avoid unrealistic extreme values.

7.1

Each statistical distribution function has a physical basis. An understanding of this basis

is useful in determining candidate distributions to represent field data. Following is a brief

summary of the physical basis of selected distributions.

Binomial. This represents the distribution of a random variable giving the number of

successes in n independent trials each yielding either success or failure with probabilities

p and 1 p, respectively.

Geometric. This represents the distribution of a random variable giving the number of independent trials required in an experiment until the first success is achieved.

Poisson. This represents the distribution of a random variable giving the number of

independent events occurring within a fixed amount of time.

Normal. This represents the distribution of a random variable which is itself the result

of the sum of component processes.

Lognormal. This represents the distribution of a random variable which is itself the

result of the product of component processes.

Exponential. This represents the distribution of a random variable giving the time

interval between independent events.

Gamma. A distribution of broad applicability restricted to non-negative random

variables.

Beta. A distribution of broad applicability restricted to bounded random variables.


Erlang. This represents the distribution of a random variable which is itself the result

of the sum of exponential component processes.

Weibull. This represents the distribution of a random variable giving the time to

failure of a component.

Uniform. This represents the distribution of a random variable whose values are

completely uncertain.

Triangular. This represents the distribution of a random variable for which only

minimum, most likely and maximum values are known.

7.2

Representations of Collected Data

Input data for DES models must often be created according to a specific statistical distribution. The required distribution must be identified based on how well it represents the

collected data. Following is a brief summary of the real-life situations where the distributions

mentioned above are likely to be encountered.

Binomial. Useful when there are only two possible outcomes of an experiment which

is repeated multiple times.

Geometric. Useful also when there are only two possible outcomes of an experiment

which is repeated multiple times.

Poisson. Useful to represent the number of incoming customers or requests into a

system.

Normal. Useful to represent the distribution of errors of all kinds.

Lognormal. Useful for representation of times required to perform a given task or

accomplish a certain goal.

Exponential. Useful to represent inter arrival times in all kinds of situations.

Gamma. Useful also for representation of times required to perform a given task or

accomplish a certain goal but more general.

Beta. Useful as a rough model under situation of ignorance and/or to represent the

proportion of non-conforming items in a set.

Erlang. Useful to represent systems making simultaneous request for attention from

a server.

Uniform. Useful when one knows nothing about the system.

Triangular. Useful when one knows little about the system.

7.3

In DES modeling the collected input data is often replaced by computer generated random

variate values which accurately represent the original data. Typically, several distributions

will be considered good candidates. A histogram of the data can provide a first inkling

about the family of distribution function(s) which can well represent the data.

Another simple test which can be used to quickly determine whether a given set of data are

adequately represented by a specific distribution is the construction of quantile-quantile

plots.

Assume that $X$ is a random variable whose cumulative distribution function is $F$. The $q$-quantile of $X$ is that value $\gamma$ of the random variable which satisfies the equation

$$F(\gamma) = P(X < \gamma) = q$$

If $n$ data values are arranged in increasing order then the $\frac{j(n+1)}{k}$-th value will be denoted by $g_{j/k}$. Therefore, the median is $g_{1/2}$, i.e. the $\frac{n+1}{2}$-th value, or the value half-way through the data set.

A common application of this concept is in investigating the distribution of income in a population where the total number of households is divided into five quintiles ($q$ = 0.2, 0.4, 0.6, 0.8 and 1.0) by increasing values of income. Specifically, here in the USA, if your household income is more than about $\gamma$ = 80,000 dollars per year then you belong in the top quintile. One in five households is in that quintile.

Consider a collection of $n$ values of the random variable $X$, $x_i$ for $i = 1, 2, \ldots, n$. If the values are arranged in increasing order of magnitude a new string of values is obtained which we call $y_j$ with $j = 1, 2, \ldots, n$. The value $y_j$ is then an estimate of the $(j - \frac{1}{2})/n$ quantile of $X$, i.e.

$$y_j \approx F^{-1}\left(\frac{j - \frac{1}{2}}{n}\right)$$
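The quantile estimate above is what a quantile-quantile plot displays: each ordered value $y_j$ is paired with $F^{-1}((j - \frac{1}{2})/n)$ for the candidate distribution. As a minimal sketch, the code below builds those pairs for a candidate exponential distribution, whose inverse CDF is $F^{-1}(q) = -\ln(1 - q)/\lambda$. The function name, the sample data and the rate are hypothetical.

```python
import math


def qq_points_exponential(data, rate):
    """Pair each ordered data value y_j with F^{-1}((j - 1/2)/n).

    Returns a list of (theoretical quantile, observed value) tuples for a
    candidate exponential distribution with the given rate.
    """
    ordered = sorted(data)
    n = len(ordered)
    points = []
    for j, y in enumerate(ordered, start=1):
        q = (j - 0.5) / n
        theoretical = -math.log(1.0 - q) / rate  # inverse exponential CDF
        points.append((theoretical, y))
    return points


if __name__ == "__main__":
    sample = [0.3, 1.2, 0.1, 2.5]  # hypothetical observations
    for theoretical, observed in qq_points_exponential(sample, rate=1.0):
        print(f"{theoretical:.4f}  {observed:.4f}")
```

If the candidate distribution is adequate, the plotted pairs fall close to a straight line through the origin with unit slope.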

Once an appropriate family of distributions has been selected one proceeds to determine the various distribution parameters using the collected data values. Following is a summary of distribution parameters and their estimators for three commonly used distributions.

Poisson. Parameter: $\alpha$, with $E(X) = \alpha$. Estimator: sample mean.

Exponential. Parameter: $\lambda$, with $E(X) = 1/\lambda$. Estimator: reciprocal of the sample mean.

Normal. Parameters: $\mu$ and $\sigma^2$. Estimators: sample mean and variance.

7.4

To determine the appropriateness of a given distribution in a particular situation goodness-of-fit tests are required. The tests verify the validity of the null hypothesis $H_0$ which states that the random variable $X$ follows a specific distribution.

In this section we examine two commonly used tests for this purpose, the Chi-square test (applicable to large samples) and the Kolmogorov-Smirnov test (applicable to small samples and restricted to continuous distributions).

For the Chi-square test the $n$ data points are arranged into a desired number of cells ($k$). The expected number of points to fall inside the $i$-th cell, $E_i$, is then given by

$$E_i = n p_i$$

where $p_i$ is the probability associated with that interval and is obtained from the specified distribution.

For instance, consider the case of reliability data consisting of a total of $n_f$ failures, binned into cells representing number of failures within time intervals of uniform duration $\Delta t_i = t_{i+1} - t_i$. Assume that direct calculation of the failure rate (hazard function value) yields a reasonably constant trend and that an average value $\bar{\lambda}$ is computed. Introduce then the null hypothesis that the data is exponentially distributed with constant failure rate estimated as the average failure rate computed from the data. The expected number of failures inside each time bin is then given by

$$E_i = n_f \left[\exp(-\bar{\lambda} t_i) - \exp(-\bar{\lambda} t_{i+1})\right]$$

Next, using the actual number of data points contained in each cell, $O_i$, one computes the statistic

$$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

To test the null hypothesis, the critical value of the statistic is then determined as $\chi^2_{\alpha, k-s-1}$ where $\alpha$ is the significance level, $k - s - 1$ is the number of degrees of freedom and $s$ is the number of parameters estimated for the candidate distribution ($s = 1$ in the case of the exponential distribution). Finally, if $\chi_0^2 > \chi^2_{\alpha, k-s-1}$ then $H_0$ is rejected but if $\chi_0^2 < \chi^2_{\alpha, k-s-1}$ the hypothesis cannot be rejected at the given significance level.
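The Chi-square computation for the exponential reliability example can be sketched as follows. The bin edges, observed counts and rate are hypothetical, and the critical value is deliberately left to a table or a statistics package since only the standard library is used here.

```python
import math


def expected_counts_exponential(n_failures, rate, edges):
    """E_i = n_f [exp(-rate t_i) - exp(-rate t_{i+1})] for consecutive edges."""
    return [
        n_failures * (math.exp(-rate * t0) - math.exp(-rate * t1))
        for t0, t1 in zip(edges[:-1], edges[1:])
    ]


def chi_square_statistic(observed, expected):
    """chi_0^2 = sum over cells of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    # Hypothetical data: 100 failures in bins [0,1), [1,2), [2,3), [3,inf).
    edges = [0.0, 1.0, 2.0, 3.0, math.inf]
    observed = [42, 22, 15, 21]
    expected = expected_counts_exponential(100, 0.5, edges)
    chi0 = chi_square_statistic(observed, expected)
    print(f"chi_0^2 = {chi0:.3f} with {len(observed) - 1 - 1} degrees of freedom")
```

Making the last bin open-ended (upper edge at infinity) is a choice made here so that the expected counts add up to $n_f$; with a finite last edge some probability mass would be left out.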

If the null hypothesis cannot be rejected one can then calculate a confidence interval for the distribution parameters. For instance, considering again the case of the exponentially distributed reliability data above, one can show that a $100(1 - \alpha)\%$ confidence interval for the value of $\lambda$ is given by

$$\left[\frac{\bar{\lambda}\,\chi^2_{1-\alpha/2,\,2n_f}}{2n_f},\ \frac{\bar{\lambda}\,\chi^2_{\alpha/2,\,2n_f}}{2n_f}\right]$$

For the Kolmogorov-Smirnov test the $n$ data points are also arranged in increasing order. If possible the data are made dimensionless by dividing each value by the largest value in the set. Then, one calculates the statistics

$$D^+ = \max_{1 \le i \le n} \left(\frac{i}{n} - R_i\right)$$

$$D^- = \max_{1 \le i \le n} \left(R_i - \frac{i - 1}{n}\right)$$

and

$$D = \max(D^+, D^-)$$

and compares $D$ against the critical value $D_c$. When $D < D_c$ the null hypothesis $H_0$ cannot be rejected.
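The Kolmogorov-Smirnov statistics can be computed in a few lines. In the sketch below $R_i$ is taken to be the hypothesized CDF evaluated at the $i$-th ordered data value, which is an assumption about the notation; for a test against the uniform distribution on [0, 1] this is just the ordered value itself. The critical value $D_c$ still has to come from a table.

```python
def ks_statistic(data, cdf):
    """Return D = max(D+, D-) for the data against a hypothesized CDF.

    Assumption: R_i = cdf(y_i), where y_i is the i-th smallest data value.
    """
    ordered = sorted(data)
    n = len(ordered)
    r = [cdf(y) for y in ordered]
    d_plus = max(i / n - r_i for i, r_i in enumerate(r, start=1))
    d_minus = max(r_i - (i - 1) / n for i, r_i in enumerate(r, start=1))
    return max(d_plus, d_minus)


if __name__ == "__main__":
    # Hypothetical numbers tested against the uniform distribution on [0, 1].
    sample = [0.44, 0.81, 0.14, 0.05, 0.93]
    print(f"D = {ks_statistic(sample, lambda x: x):.4f}")
```

Passing the CDF as a function keeps the same routine usable for other continuous candidates, such as a fitted exponential.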

7.5

Sometimes input data for DES models is just not easily available. In those instances one must

rely on related engineering data, expert opinion and physical or other limitations to produce

reasonable input values for the model. A few data values combined with the assumption of

uniform, triangular or beta distribution can provide a solid starting point for research.

7.6

In some situations various inputs may be related to each other or the same input quantity

may exhibit autocorrelation over time. Typical examples are in inventory modeling where

demand data affect lead time data and in stock trading where buy and sell orders called to

the broker tend to arrive in bursts.

When two correlated input variables $X_1$ and $X_2$ are involved one uses their covariance $\mathrm{Cov}(X_1, X_2)$ or their correlation

$$\rho = \frac{\mathrm{Cov}(X_1, X_2)}{\sigma_1 \sigma_2}$$

If the collected data for the two variables yield $\rho \gg 0$ then one needs to generate correlated variates.

The following algorithm generates two correlated random variates with normal distributions with parameters $\mu_1$, $\sigma_1$ and $\mu_2$, $\sigma_2$, respectively.

Generate two independent standard normal variates $Z_1$ and $Z_2$.

Set $X_1 = \mu_1 + \sigma_1 Z_1$

Set $X_2 = \mu_2 + \sigma_2 \left(\rho Z_1 + \sqrt{1 - \rho^2}\, Z_2\right)$
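The three steps above translate directly into code. This is a minimal sketch using Python's standard `random` module; the function name and the parameter values in the usage example are illustrative assumptions.

```python
import math
import random


def correlated_normals(mu1, sigma1, mu2, sigma2, rho, rng=random):
    """Return one pair (X1, X2) of normal variates with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    x1 = mu1 + sigma1 * z1
    x2 = mu2 + sigma2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return x1, x2


if __name__ == "__main__":
    rng = random.Random(2007)  # fixed seed so that runs can be replicated
    # Hypothetical parameters for two correlated inputs.
    for _ in range(5):
        x1, x2 = correlated_normals(10.0, 2.0, 4.0, 0.5, rho=0.8, rng=rng)
        print(f"{x1:.3f}  {x2:.3f}")
```

Passing a seeded `random.Random` instance rather than using the module-level generator makes it easier to reproduce a simulation run.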

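When an input quantity is a time series $X_1, X_2, \ldots$ whose values are drawn from the same distribution then one uses the lag-$h$ autocovariance $\mathrm{Cov}(X_i, X_{i+h})$ or the lag-$h$ correlation

$$\rho_h = \frac{\mathrm{Cov}(X_i, X_{i+h})}{\sigma_i \sigma_{i+h}}$$

The AR(1) and EAR(1) models are commonly used to generate autocorrelated time series.

The algorithm for the AR(1) model is as follows

Using the collected data, determine the values of the parameters $\hat{\mu} = \bar{X}$, $\hat{\phi} = \mathrm{Cov}(X_t, X_{t+1})/S^2$ (lag-1 autocorrelation) and $\hat{\sigma}_\varepsilon^2 = S^2 (1 - \hat{\phi}^2)$.

Generate $X_1$ from a normal distribution with mean $\mu$ and variance $\sigma_\varepsilon^2/(1 - \phi^2)$.

Generate $\varepsilon_t$ from a normal distribution with mean 0 and variance $\sigma_\varepsilon^2$.

Set $X_t = \mu + \phi (X_{t-1} - \mu) + \varepsilon_t$.

Repeat.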
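A sketch of the AR(1) generation steps is given below. It assumes that the parameters have already been estimated from the data; the function name and the parameter values in the usage example are hypothetical.

```python
import math
import random


def ar1_series(mu, phi, sigma_eps2, length, rng=random):
    """Generate an AR(1) series X_t = mu + phi (X_{t-1} - mu) + eps_t."""
    if not -1.0 < phi < 1.0:
        raise ValueError("a stationary AR(1) model requires |phi| < 1")
    # X_1 is drawn from the stationary distribution of the process.
    x = rng.gauss(mu, math.sqrt(sigma_eps2 / (1.0 - phi ** 2)))
    series = [x]
    for _ in range(length - 1):
        eps = rng.gauss(0.0, math.sqrt(sigma_eps2))
        x = mu + phi * (x - mu) + eps
        series.append(x)
    return series


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical parameter estimates.
    print([round(v, 3) for v in ar1_series(10.0, 0.7, 1.0, 8, rng)])
```

Drawing $X_1$ with variance $\sigma_\varepsilon^2/(1 - \phi^2)$ means the series starts in its long-run regime, so no warm-up values need to be discarded.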

The algorithm for the EAR(1) model is as follows

Using the collected data, determine the values of the parameters $\hat{\lambda} = 1/\bar{X}$ and $\hat{\phi} = \mathrm{Cov}(X_t, X_{t+1})/S^2$ (lag-1 autocorrelation).

Generate $X_1$ from an exponential with mean $1/\lambda$.

Generate $U$ from a uniform [0,1].

If $U \le \phi$ set $X_t = \phi X_{t-1}$.

If $U > \phi$ generate $\varepsilon_t$ from an exponential with mean $1/\lambda$ and set $X_t = \phi X_{t-1} + \varepsilon_t$.

Repeat.
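The EAR(1) steps can be sketched in the same style. The function name and the parameter values are hypothetical, and `random.expovariate` takes the rate $\lambda$ (not the mean) as its argument.

```python
import random


def ear1_series(lam, phi, length, rng=random):
    """Generate an EAR(1) series with exponential marginals of mean 1/lam."""
    if not 0.0 <= phi < 1.0:
        raise ValueError("the EAR(1) model requires 0 <= phi < 1")
    x = rng.expovariate(lam)
    series = [x]
    for _ in range(length - 1):
        u = rng.random()
        if u <= phi:
            x = phi * x
        else:
            x = phi * x + rng.expovariate(lam)
        series.append(x)
    return series


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical parameter estimates: mean 0.5, lag-1 autocorrelation 0.5.
    print([round(v, 3) for v in ear1_series(2.0, 0.5, 8, rng)])
```

Unlike the AR(1) model, every generated value stays non-negative, which is why this model suits inter arrival or service times.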


Numbers

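Recall that for a random variable $X$ which is uniformly distributed in $[0, 1]$ the uniform probability density function is

$$f(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$

and the cumulative distribution function is

$$F(x) = \begin{cases} 0, & x < 0 \\ x, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases}$$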

A random number (RN) stream is a collection of uniformly distributed random variables.

A truly random stream of numbers has the following characteristics:

Uniformly distributed.

Continuous-valued.

$E(R) = \frac{1}{2}$.

$\sigma^2 = \frac{1}{12}$.

No runs.

In practice one always works with streams of pseudo random numbers (PRN). These

have approximately the same characteristics as RNs. PRNs are generated with a computer

using a numerical algorithm embedded in a computer program or routine. The requirements

of a good PRNG routine are:

Fast.

Portable.

Long Cycle.

Replicability.

Produce PRN with the desired characteristics.


8.1

The established algorithm for PRN generation is the linear congruential method (LCM). More sophisticated approaches still use this method as their foundation. The fundamental relationship of the LCM is

$$X_{i+1} = (a X_i + c) \bmod m$$

This means that the value of $X_{i+1}$ is the remainder left from integer division of $a X_i + c$ by $m$. Note that the random numbers $R_i = X_i/m$ obtained from the LCM are from the set $I = \{0, 1/m, 2/m, \ldots, (m-1)/m\}$.

One key feature of the method is its period $P$ (the number of values generated before the sequence starts to repeat). The maximum achievable period, attained only for suitable choices of the multiplier $a$ and the seed $X_0$, is related to the values of $m$ and $c$ as follows:

If $m = 2^b$ and $c \neq 0$, $P = m = 2^b$.

If $m = 2^b$ and $c = 0$, $P = m/4 = 2^{b-2}$.

If $m$ is prime and $c = 0$, $P = m - 1$.
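The three period rules can be checked numerically on very small generators. The sketch below is illustrative only: the function names and the tiny parameter sets are chosen here for the example and are not taken from the notes.

```python
def lcg_stream(a, c, m, seed, n):
    """Return the first n random numbers R_i = X_i / m of an LCG."""
    x = seed
    out = []
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x / m)
    return out


def lcg_period(a, c, m, seed):
    """Count how many values are generated before the cycle repeats."""
    seen = {}
    x = seed
    step = 0
    while x not in seen:
        seen[x] = step
        x = (a * x + c) % m
        step += 1
    return step - seen[x]


# m = 2^4, c != 0: full period m = 16 (here a = 5, c = 3).
# m = 2^4, c == 0: period m/4 = 4 (here a = 5, odd seed).
# m = 7 (prime), c == 0: period m - 1 = 6 (here a = 3).
```

With these toy parameters `lcg_period(5, 3, 16, 0)`, `lcg_period(5, 0, 16, 1)` and `lcg_period(3, 0, 7, 1)` return 16, 4 and 6. A poor multiplier gives a shorter cycle, which is why the rules above are upper bounds.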

8.2

Large simulations require large collections of PRNs and there is a need for still longer periods.

These can be obtained by the use of combined linear congruential methods (CLCM).

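The fundamental theorem associated with CLCM is L'Ecuyer's.

If $W_{i,1}, W_{i,2}, \ldots, W_{i,k}$ are independent discrete-valued random variables with at least one of them (say $W_{i,1}$) being uniformly distributed between $0$ and $m_1 - 2$, then

$$W_i = \left( \sum_{j=1}^{k} W_{i,j} \right) \bmod (m_1 - 1)$$

is also uniformly distributed between $0$ and $m_1 - 2$.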

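More specifically, consider the following algorithm, which combines the outputs $X_{i,1}, X_{i,2}, \ldots, X_{i,k}$ of $k$ generators with moduli $m_1, m_2, \ldots, m_k$

$$X_i = \left( \sum_{j=1}^{k} (-1)^{j-1} X_{i,j} \right) \bmod (m_1 - 1)$$

$$R_i = \begin{cases} \dfrac{X_i}{m_1}, & X_i > 0 \\[2ex] \dfrac{m_1 - 1}{m_1}, & X_i = 0 \end{cases}$$

It can be shown that the maximum period obtained with this algorithm is

$$P = \frac{(m_1 - 1)(m_2 - 1) \cdots (m_k - 1)}{2^{k-1}}$$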

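For example, the two generators

$$X_{1,j+1} = 40014\, X_{1,j} \bmod 2147483563$$

$$X_{2,j+1} = 40692\, X_{2,j} \bmod 2147483399$$

produce the combined PRNG

$$X_{j+1} = (X_{1,j+1} - X_{2,j+1}) \bmod 2147483562$$

to yield

$$R_{j+1} = \begin{cases} \dfrac{X_{j+1}}{2147483563}, & X_{j+1} > 0 \\[2ex] \dfrac{2147483562}{2147483563}, & X_{j+1} = 0 \end{cases}$$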
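The combined generator with the two multipliers and moduli quoted above can be sketched as follows. The function name `combined_lcg` and the seed arguments are assumptions of this example; the seeds should lie in $[1, m_1 - 1]$ and $[1, m_2 - 1]$ respectively.

```python
M1, A1 = 2147483563, 40014   # first component generator
M2, A2 = 2147483399, 40692   # second component generator


def combined_lcg(seed1, seed2, n):
    """Return n random numbers from the combined linear congruential generator."""
    x1, x2 = seed1, seed2
    out = []
    for _ in range(n):
        x1 = (A1 * x1) % M1
        x2 = (A2 * x2) % M2
        # Python's % always returns a non-negative remainder here.
        x = (x1 - x2) % (M1 - 1)
        out.append(x / M1 if x > 0 else (M1 - 1) / M1)
    return out


# Maximum period (m1 - 1)(m2 - 1) / 2^(k-1) with k = 2, roughly 2.3e18.
MAX_PERIOD = (M1 - 1) * (M2 - 1) // 2
```

Because of the special case for $X_{j+1} = 0$, every value returned lies strictly between 0 and 1.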

Since one always works in practice with PRN streams it is necessary to check how close their characteristics are to those of real RN streams. Assume a stream containing $N$ PRNs has been produced. To verify their characteristics the stream is subjected to various tests. In all cases, one states a hypothesis about a given characteristic of the stream and then rejects it or fails to reject it at a given level of significance $\alpha$, where

$$\alpha = P(\text{rejecting } H_0 \mid H_0 \text{ is true})$$

(i.e. the probability of a Type I error).

In testing for uniformity the null hypothesis $H_0$ is

$$R_i \sim U[0, 1]$$

while the alternative hypothesis $H_1$ is

$$R_i \not\sim U[0, 1]$$

In testing for independence the null hypothesis $H_0$ is

$$R_i \sim \text{independent}$$

while the alternative hypothesis $H_1$ is

$$R_i \not\sim \text{independent}$$


9.1

For the Kolmogorov-Smirnov (K-S) test the numbers are first arranged in increasing order

$$R_1 < R_2 < \cdots < R_N$$

The test makes use of the new variables

$$D^+ = \max_{1 \le i \le N} \left( \frac{i}{N} - R_i \right)$$

$$D^- = \max_{1 \le i \le N} \left( R_i - \frac{i-1}{N} \right)$$

and

$$D = \max(D^+, D^-)$$

Once $D$ has been computed, a critical value $D_c$ is obtained from the K-S statistical table for the desired $\alpha$ and the given $N$. Finally

If $D > D_c$, $H_0$ is rejected ($H_1$ is accepted).

If $D \le D_c$, $H_0$ is not rejected (i.e. the test detects no departure from uniformity).
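The K-S statistic is easy to compute directly from its definition. The sketch below assumes a hypothetical helper name `ks_statistic`. The critical value $D_c$ still has to be read from a K-S table and is not computed here.

```python
def ks_statistic(numbers):
    """Return (D_plus, D_minus, D) for a sample assumed to lie in [0, 1]."""
    r = sorted(numbers)                  # R_1 <= R_2 <= ... <= R_N
    n = len(r)
    # Python indices start at 0, so position i in the text is index i - 1.
    d_plus = max((i + 1) / n - r[i] for i in range(n))
    d_minus = max(r[i] - i / n for i in range(n))
    return d_plus, d_minus, max(d_plus, d_minus)
```

For the made-up sample `[0.44, 0.81, 0.14, 0.05, 0.93]` this gives $D^+ = 0.26$, $D^- = 0.21$ and $D = 0.26$, up to floating-point rounding.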

9.2

In the Chi-square test the numbers are arranged into $n$ classes by subdividing the range $[0, 1]$ into $n$ subintervals and determining how many of the numbers end up in each class $i$ ($O_i$).

The test uses the statistic

$$\chi_0^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

where $E_i = N/n$ is the expected count in each class for a uniform distribution.

Once $\chi_0^2$ has been computed, a critical value $\chi^2_{\alpha, n-1}$ is obtained from the Chi-square statistical table. Finally

If $\chi_0^2 > \chi^2_{\alpha, n-1}$, $H_0$ is rejected ($H_1$ is accepted).

If $\chi_0^2 \le \chi^2_{\alpha, n-1}$, $H_0$ is not rejected (i.e. the test detects no departure from uniformity).
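A short sketch of the Chi-square statistic for equal-width classes follows. The function name `chi_square_statistic` is an assumption of this example, and the critical value is again left to a statistical table.

```python
def chi_square_statistic(numbers, n_classes):
    """Return (observed counts, chi-square statistic) for equal-width classes on [0, 1]."""
    observed = [0] * n_classes
    for r in numbers:
        # Class index of r; the value r = 1.0 is placed in the last class.
        k = min(int(r * n_classes), n_classes - 1)
        observed[k] += 1
    expected = len(numbers) / n_classes          # E_i = N / n
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    return observed, chi2
```

A perfectly even sample gives a statistic of zero, while ten numbers all falling in the first of five classes give $8^2/2 + 4 \cdot 2^2/2 = 40$.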


9.3

Runs Test

This test aims to detect whether there are patterns in substrings of the stream. One examines the stream and checks whether each number is followed by a larger (+) or a smaller (-) number. Runs are the resulting unbroken sequences of +s or of -s. In a truly random sequence the mean and variance of the number of up and down runs $a$ are given by

$$\mu_a = \frac{2N - 1}{3}$$

and

$$\sigma_a^2 = \frac{16N - 29}{90}$$

The test statistic is

$$Z_0 = \frac{a - \mu_a}{\sigma_a}$$

which, for $N > 20$, approximately has the normal distribution of mean zero and unit standard deviation ($N(0, 1)$).

Once $Z_0$ has been computed a critical value $z_{\alpha/2}$ is obtained from the normal statistical table. Finally

If $Z_0 < -z_{\alpha/2}$ or $Z_0 > z_{\alpha/2}$, $H_0$ is rejected ($H_1$ is accepted).

If $-z_{\alpha/2} \le Z_0 \le z_{\alpha/2}$, $H_0$ is not rejected (i.e. the test detects no departure from independence).
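Counting runs up and down and forming $Z_0$ can be sketched as below. The function name `runs_up_down` is hypothetical, and treating a tie between neighbours as a "down" step is a simplification chosen for this example.

```python
import math


def runs_up_down(numbers):
    """Return (number of runs a, test statistic Z0) for the runs up and down test."""
    # True means the next number is larger (+), False means smaller (-).
    signs = [b > a for a, b in zip(numbers, numbers[1:])]
    # A new run starts whenever the sign changes.
    runs = 1 + sum(1 for s, t in zip(signs, signs[1:]) if s != t)
    n = len(numbers)
    mu = (2 * n - 1) / 3
    sigma = math.sqrt((16 * n - 29) / 90)
    return runs, (runs - mu) / sigma
```

The sequence `[0.1, 0.2, 0.3, 0.2, 0.1, 0.5]` has the sign pattern `+ + - - +` and therefore three runs. Such a short sequence only illustrates the counting, since the normal approximation needs a much longer stream.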

Other types of runs tests are also possible, for instance runs above and below the mean and run lengths. For runs above and below the mean a test similar to the one above is used but with the values of mean and variance for the number of runs $b$

$$\mu_b = \frac{2 n_1 n_2}{N} + \frac{1}{2}$$

and

$$\sigma_b^2 = \frac{2 n_1 n_2 (2 n_1 n_2 - N)}{N^2 (N - 1)}$$

where $n_1$ and $n_2$ are, respectively, the numbers of observations above and below the mean.

For run lengths one uses the Chi-square test to compare the observed number of runs of given lengths against the expected number obtained in a truly independent stream.


9.4

Autocorrelation Test

This test aims to detect correlation among numbers in the stream separated by a specific number of positions (the lag). Consider the autocorrelation test for a lag $m$ starting at the $i$-th number. One then investigates the behavior of the numbers $R_i, R_{i+m}, R_{i+2m}, \ldots, R_{i+(M+1)m}$. If the autocorrelation $\rho_{im} > 0$ there is positive correlation (i.e. high numbers tend to follow high numbers and low numbers tend to follow low numbers) and if $\rho_{im} < 0$ one has negative correlation. The autocorrelation is estimated by

$$\hat{\rho}_{im} = \frac{1}{M + 1} \left[ \sum_{k=0}^{M} R_{i+km} R_{i+(k+1)m} \right] - 0.25$$

where $M$ is the largest integer satisfying $i + (M + 1)m \le N$. The test statistic is in this case given by

$$Z_0 = \frac{\hat{\rho}_{im}}{\sigma_{\hat{\rho}_{im}}}$$

where

$$\sigma_{\hat{\rho}_{im}} = \frac{\sqrt{13M + 7}}{12(M + 1)}$$

Once $Z_0$ has been computed a critical value $z_{\alpha/2}$ is obtained from the normal statistical table. Finally

If $Z_0 < -z_{\alpha/2}$ or $Z_0 > z_{\alpha/2}$, $H_0$ is rejected ($H_1$ is accepted).

If $-z_{\alpha/2} \le Z_0 \le z_{\alpha/2}$, $H_0$ is not rejected (i.e. the test detects no departure from independence).
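The autocorrelation statistic can be sketched as follows. The function name `autocorrelation_test` is an assumption of this example, and `i` is taken as a 1-based position to match the notation of the text.

```python
import math


def autocorrelation_test(numbers, i, m):
    """Return (M, rho_hat, Z0) for start position i (1-based) and lag m."""
    n = len(numbers)
    big_m = (n - i) // m - 1             # largest M with i + (M + 1) m <= N
    if big_m < 0:
        raise ValueError("stream too short for this start position and lag")
    total = sum(
        numbers[i - 1 + k * m] * numbers[i - 1 + (k + 1) * m]
        for k in range(big_m + 1)
    )
    rho_hat = total / (big_m + 1) - 0.25
    sigma = math.sqrt(13 * big_m + 7) / (12 * (big_m + 1))
    return big_m, rho_hat, rho_hat / sigma
```

For a stream of $N = 30$ numbers with $i = 3$ and $m = 5$ the helper uses $M = 4$, that is, the products of the numbers in positions 3, 8, 13, 18, 23 and 28 taken in consecutive pairs.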

9.5

Gap Test

This test checks for independence by examining the lengths of the gaps between successive occurrences of a given digit in the stream. The test is performed using the Kolmogorov-Smirnov scheme.

9.6

Poker Test

This test checks for independence based on the repetition of certain digits in the sequence.

The test is performed using the Chi-square scheme.

10

Discrete event simulation models require as inputs the values of random variables with

specified probability distributions. Such random variables are called random variates.


Input data for DES models are collected from the field and/or produced from best available

estimates. However, the amount of data collected is rarely enough to run simulation models

and one must use the data to create PRN streams with statistical characteristics similar to

those of the original data.

So, on the one hand one needs to identify the statistical characteristics of the original

data and on the other one must be able to produce large collections of random variates

with statistical characteristics similar to those of the original data. Here we focus on the

second aspect, namely once we have determined the probability distribution applicable to

our data we proceed to generate random variate streams for use in the simulation. This is

accomplished by the inverse transform method.

10.1

Determine the cumulative distribution function of $X$, $F(X)$.

Set $F(X) = R$.

Solve the equation $F(X) = R$ for $X$ in terms of $R$, i.e. $X = F^{-1}(R)$.

Repeat the above for the stream of random (or pseudo-random) numbers R1 , R2 , ..., Rn

to obtain the stream of random variates X1 , X2 , ..., Xn .

Next, the formulae obtained by the inverse transform method for several commonly used

random variates are given.

10.2

Following are the specific steps required to obtain exponentially distributed random variates with rate $\lambda$ (mean $1/\lambda$) from a random number stream using the inverse transform method.

$F(x) = 1 - e^{-\lambda x}$.

Set $F(X) = 1 - e^{-\lambda X} = R$.

$X = -\frac{1}{\lambda} \ln(1 - R)$.

For $i = 1, 2, \ldots, n$, compute $X_i = -\frac{1}{\lambda} \ln(1 - R_i)$
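The exponential recipe translates almost literally into code. In this sketch the function name `exponential_variates` is hypothetical and `rs` stands for any stream of random numbers in $[0, 1)$.

```python
import math


def exponential_variates(lam, rs):
    """Map random numbers R_i in [0, 1) to exponential variates with mean 1/lam."""
    # X_i = -(1/lam) ln(1 - R_i); R_i = 1 must be excluded to avoid log(0).
    return [-math.log(1.0 - r) / lam for r in rs]
```

For example, with `lam = 2.0` the random number $R = 1 - e^{-1}$ maps to $X = 0.5$.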


10.3

Following are the specific steps required to obtain uniformly distributed random variates between $a$ and $b$ from a random number stream using the inverse transform method.

$F(x) = \frac{x - a}{b - a}$.

Set $F(X) = \frac{X - a}{b - a} = R$.

$X = a + (b - a)R$.

For $i = 1, 2, \ldots, n$, compute $X_i = a + (b - a)R_i$

10.4

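Following are the specific steps required to obtain Weibull distributed random variates with scale parameter $\alpha$ and shape parameter $\beta$ from a random number stream using the inverse transform method.

$F(x) = 1 - e^{-(x/\alpha)^{\beta}}$.

$X = \alpha [-\ln(1 - R)]^{1/\beta}$.

For $i = 1, 2, \ldots, n$, compute $X_i = \alpha [-\ln(1 - R_i)]^{1/\beta}$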
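A minimal sketch of the Weibull inverse transform is shown below, assuming a two-parameter Weibull with scale `alpha` and shape `beta`. The function name `weibull_variate` is hypothetical.

```python
import math


def weibull_variate(r, alpha, beta):
    """Inverse transform for a Weibull with scale alpha and shape beta."""
    # X = alpha * [-ln(1 - R)]^(1/beta); requires 0 <= r < 1.
    return alpha * (-math.log(1.0 - r)) ** (1.0 / beta)
```

With `beta = 1` the expression reduces to the exponential case with mean `alpha`, which gives a simple consistency check.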

10.5

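Following are the specific steps required to obtain random variates with triangular distribution between 0 and 2 with mode 1 from a random number stream using the inverse transform method.

$$F(x) = \begin{cases} 0, & x \le 0 \\ \dfrac{x^2}{2}, & 0 < x \le 1 \\[1.5ex] 1 - \dfrac{(2 - x)^2}{2}, & 1 < x \le 2 \\ 1, & x > 2 \end{cases}$$

$$X_i = \begin{cases} \sqrt{2 R_i}, & 0 \le R_i \le \frac{1}{2} \\ 2 - \sqrt{2 (1 - R_i)}, & \frac{1}{2} < R_i \le 1 \end{cases}$$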
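The two branches of the triangular inverse can be sketched as a small function. The name `triangular_variate` is an assumption of this example and applies only to the triangular distribution on $[0, 2]$ with mode 1 discussed here.

```python
import math


def triangular_variate(r):
    """Inverse transform for the triangular distribution on [0, 2] with mode 1."""
    if r <= 0.5:
        return math.sqrt(2.0 * r)              # left branch, X in [0, 1]
    return 2.0 - math.sqrt(2.0 * (1.0 - r))    # right branch, X in (1, 2]
```

The random numbers 0, 0.125, 0.5, 0.875 and 1 map to 0, 0.5, 1, 1.5 and 2, so the two branches meet at the mode.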


10.6

If no appropriate distribution can be found for the data one can resort to resampling the data. This creates an empirical distribution. A simple empirical distribution can be produced from given data by piecewise linear approximation.

Assume the available data points (observations) are arranged in increasing order $x_1, x_2, \ldots, x_n$. Assume also that a probability is assigned to each resulting interval $x_{j-1} \le x \le x_j$ such that the cumulative probability of the first $j$ intervals is $c_j$. The associated random variate is obtained as

$$X_i = x_{j-1} + \frac{x_j - x_{j-1}}{c_j - c_{j-1}} (R_i - c_{j-1})$$

where $j$ is the interval for which $c_{j-1} < R_i \le c_j$.
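The piecewise linear inverse can be sketched with a binary search over the cumulative probabilities. This example assumes that the breakpoints are stored as `xs[0..n]`, with `xs[0]` a lower endpoint supplied by the user, and that `cs[0] = 0` and `cs[n] = 1`. These conventions and the name `empirical_variate` are not from the notes.

```python
import bisect


def empirical_variate(r, xs, cs):
    """Piecewise linear inverse of an empirical CDF.

    xs: increasing breakpoints x_0 .. x_n
    cs: strictly increasing cumulative probabilities c_0 = 0 .. c_n = 1
    """
    # Find j with c_{j-1} < r <= c_j (r = 0 is assigned to the first interval).
    j = max(1, bisect.bisect_left(cs, r))
    slope = (xs[j] - xs[j - 1]) / (cs[j] - cs[j - 1])
    return xs[j - 1] + slope * (r - cs[j - 1])
```

With `xs = [0, 1, 3]` and `cs = [0, 0.5, 1]`, the random numbers 0.25 and 0.75 map to 0.5 and 2.0.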

10.7

The normal distribution does not have a closed-form inverse transformation. However, the following expression is an excellent approximation to the inverse cumulative distribution function of the standard normal distribution.

$$X_i \approx \frac{R_i^{0.135} - (1 - R_i)^{0.135}}{0.1975}$$

From the above, random variates with a normal distribution of mean $\mu$ and standard deviation $\sigma$ are readily obtained as

$$X_i \approx \mu + \sigma \left( \frac{R_i^{0.135} - (1 - R_i)^{0.135}}{0.1975} \right)$$

A direct transformation can be used to produce two independent standard normal variates $Z_1$ and $Z_2$ from two random numbers $R_1$ and $R_2$ according to

$$Z_1 = (-2 \ln R_1)^{1/2} \cos(2 \pi R_2)$$

and

$$Z_2 = (-2 \ln R_1)^{1/2} \sin(2 \pi R_2)$$

Normal random variates $X_i$ with mean $\mu$ and standard deviation $\sigma$ can then be obtained from

$$X_i = \mu + \sigma Z_i$$
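Both routes to normal variates can be sketched in a few lines. The function names are hypothetical. Note that the direct transformation needs $R_1 > 0$, so a stream that can return exactly 0 should be shifted (for instance by using $1 - R_1$) before taking the logarithm.

```python
import math


def approx_standard_normal(r):
    """Approximate inverse CDF of the standard normal distribution."""
    return (r ** 0.135 - (1.0 - r) ** 0.135) / 0.1975


def direct_normal_pair(r1, r2):
    """Direct transformation of two random numbers into two standard normals."""
    radius = math.sqrt(-2.0 * math.log(r1))   # requires r1 > 0
    angle = 2.0 * math.pi * r2
    return radius * math.cos(angle), radius * math.sin(angle)


def scale_normal(z, mu, sigma):
    """X = mu + sigma * Z."""
    return mu + sigma * z
```

As a rough check, `approx_standard_normal(0.975)` evaluates to about 1.97, close to the tabulated value 1.96.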


10.8

If the random variable $Y$ has the normal distribution with mean $\mu$ and variance $\sigma^2$, the associated random variable $X = \exp(Y)$ has the lognormal distribution with parameters $\mu$ and $\sigma^2$.

Thus, random variates with a standard lognormal distribution can be generated from the expression

$$X_i \approx \exp\left( \frac{R_i^{0.135} - (1 - R_i)^{0.135}}{0.1975} \right)$$

Random variates with a lognormal distribution of parameters $\mu$ and $\sigma$ are then generated by

$$X_i \approx \exp\left[ \mu + \sigma \left( \frac{R_i^{0.135} - (1 - R_i)^{0.135}}{0.1975} \right) \right]$$

10.9

A similar procedure to the one indicated above can be used to produce discretely distributed

random variates. Since the cumulative distribution functions for discrete distributions consist

of discrete jumps separated by horizontal plateaus, lookup tables are a convenient and very

efficient method of generating inverses.
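A lookup table for a discrete distribution can be sketched with a binary search over the cumulative probabilities. The function name `discrete_variate` and the three-point distribution in the usage note are made up for illustration.

```python
import bisect


def discrete_variate(r, values, cum_probs):
    """Return the smallest value whose cumulative probability is >= r.

    values:    possible outcomes in increasing order
    cum_probs: matching cumulative probabilities, ending at 1.0
    """
    return values[bisect.bisect_left(cum_probs, r)]
```

For outcomes 0, 1, 2 with probabilities 0.5, 0.3, 0.2, the table is `values = [0, 1, 2]` and `cum_probs = [0.5, 0.8, 1.0]`. The random numbers 0.3, 0.65 and 0.95 then map to 0, 1 and 2.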

10.10

When two or more random variables are added together to produce a new random variable

with a desired distribution one is using the method of convolution.

If one generates the random variate by selectively accepting or rejecting numbers from a random number stream one is using the acceptance-rejection technique.

Detailed descriptions of these two methods as well as examples can be found in your

textbook.
