
quality control

The Histogram as a Measurement of Process Consistency

The histogram is one of the seven basic tools of quality control used to summarize, display, and analyze process data. Karl Pearson (1857–1936) introduced it as a way of showing the probability distribution of a continuous variable. The derivation of the word "histogram" is uncertain. Sometimes it is said to be derived from the Greek "histos," meaning "anything set upright" (as the masts of a ship, the bar of a loom, or the vertical bars of a histogram), and "gramma," meaning "drawing, record, writing." It is also said that Karl Pearson derived the name from "historical diagram."

A histogram consists of tabular frequencies, shown as adjacent rectangles erected over discrete intervals, with an area equal to the frequency of the observations in the interval. The height of a rectangle is also equal to the frequency density of the interval, i.e., the frequency divided by the width of the interval. The total area of the histogram is equal to the number of data points. A histogram may also be normalized to display relative frequencies. It then shows the proportion of cases that fall into each of several categories, with the total area equaling 1. The categories are usually specified as consecutive, non-overlapping intervals of a variable. The categories (intervals) must be adjacent and are often chosen to be of the same size. The rectangles of a histogram are drawn so that they touch each other to indicate that the original variable is continuous.

The ordinary histogram shows the number of data points in each interval, so that, for equal-width intervals, the height of each bar is proportional to the share of the total data that falls into that category. The area under the bars represents the total number of data points. This histogram shows absolute numbers, with the frequency in thousands. In Figure 1, the histogram on the right differs from the one on the left in that it shows the data cumulatively: each bar includes all observations in the cells to its left, so the last bar represents all of the observations (100%). The curve displayed is a simple density estimate. The data shown in Figure 1 is a random sample of 10,000 points from a normal distribution with a mean of 0 and a standard deviation of 1.

To summarize, a histogram represents a frequency distribution by means of rectangles whose widths represent class intervals and whose areas are proportional to the corresponding frequencies. The intervals are placed together in order to show that the data represented by the histogram, while exclusive, is also continuous. (For example, in a histogram it is possible to have two connecting intervals of 10.5–20.5 and 20.5–33.5, but not two connecting intervals of 10.5–20.5 and 22.5–32.5. Empty intervals are represented as empty and not skipped.)

Histograms are used to plot the density of data, and often for density estimation: estimating the probability density function of the underlying variable. The total area of a histogram used for probability density is always normalized to 1. If the intervals on the x-axis all have a width of 1, such a histogram is identical to a relative frequency plot.

Figure 1. Ordinary histogram (left) and cumulative histogram (right), x-axis from -4 to 4, y-axis Frequency. Both histograms use the same data; the difference is in how the data is presented.

Figure 2. Bimodal Histogram. x-axis from -6 to 4, y-axis Frequency.
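The relationships described above (bar area equal to the count, a normalized histogram whose total area is 1, and a cumulative histogram whose last bar equals the number of observations) can be checked with a short, dependency-free Python sketch. It mirrors the Figure 1 setup of 10,000 normal draws; the function names and the choice of 20 equal-width cells are illustrative assumptions, not part of the column.

```python
import bisect
import random


def equal_width_edges(data, k):
    """Return k + 1 edges for k adjacent, equal-width cells spanning the data."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    edges = [lo + i * width for i in range(k)]
    edges.append(hi)  # pin the last edge exactly to the maximum
    return edges


def histogram(data, edges):
    """Count observations per cell; cells are [left, right), the last is [left, right]."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        if x < edges[0] or x > edges[-1]:
            continue  # outside the plotted range
        i = bisect.bisect_right(edges, x) - 1
        counts[min(i, len(counts) - 1)] += 1  # a value on the top edge goes in the last cell
    return counts


def density(counts, edges):
    """Frequency density count / (n * width): the bar areas then sum to 1."""
    n = sum(counts)
    return [c / (n * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]


def cumulative(counts):
    """Running totals: the last bar equals the number of observations."""
    out, total = [], 0
    for c in counts:
        total += c
        out.append(total)
    return out


# Usage: 10,000 draws from a normal distribution with mean 0 and sd 1, 20 cells.
random.seed(1)
data = [random.gauss(0, 1) for _ in range(10_000)]
edges = equal_width_edges(data, 20)
counts = histogram(data, edges)
heights = density(counts, edges)
area = sum(h * (edges[i + 1] - edges[i]) for i, h in enumerate(heights))
print(sum(counts), round(area, 6), cumulative(counts)[-1])  # prints: 10000 1.0 10000
```

Because every cell shares the same edges, the ordinary, normalized, and cumulative views differ only in how the same counts are scaled or accumulated, which is the point Figure 1 makes visually.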

SHAPE OR FORM OF A DISTRIBUTION

The shape of a histogram provides important information about the data distribution. A histogram may be highly or moderately skewed to the left or right. A symmetrical shape is also possible, although in practice a histogram is never perfectly symmetrical. If the histogram is skewed to the left, or negatively skewed, the tail extends further to the left. An example of a distribution skewed to the left might be the relative frequency of exam scores: most scores are above 70 percent, and only a few low scores occur. An example of a distribution skewed to the right, or positively skewed, is a histogram showing the relative frequency of housing values. A relatively small number of expensive homes create the skewness to the right, and the tail extends further to the right (Figure 3). In a symmetrical distribution, the left and right tails mirror each other; the histogram of IQ scores is an example.

The mode of a distribution is the value that occurs most frequently or has the largest probability of occurrence. The sample mode occurs at the peak of the histogram. For many phenomena, it is quite common for the distribution of the response values to cluster around a single mode (unimodal) and then distribute themselves with lesser frequency out into the tails. The normal distribution is the classic example of a unimodal distribution. Histograms can be unimodal, bimodal, or multimodal, depending on the dataset; the histogram shown in Figure 2 illustrates data from a bimodal (two-peak) distribution. The histogram serves as a tool for diagnosing problems such as bimodality. Questioning the underlying reason for distributional non-unimodality frequently leads to greater insight and improved deterministic modeling of the phenomenon under study. For example, the bimodality in the histogram data presented above is caused by a lack of uniformity in the data.

A truncated histogram (Figure 4) ends abruptly at one end, which indicates possible sorting or inspection of nonconforming parts. This may also mean that part of the distribution has been removed by screening, 100% inspection, or review. Such practices are usually costly and are good candidates for improvement efforts.
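The direction of skew and the number of peaks can also be checked numerically. The sketch below is a minimal illustration under stated assumptions: it uses the moment coefficient of skewness (average cubed deviation divided by the cubed standard deviation), which is one common definition and not one given in this column, and a simple "taller than both neighbours" test for peaks. The function names and the sample data are hypothetical.

```python
import statistics


def moment_skewness(data):
    """Moment coefficient of skewness, m3 / m2**1.5, using population moments.

    Positive: longer right tail (positively skewed).
    Negative: longer left tail (negatively skewed).
    """
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


def local_peaks(counts):
    """Indices of histogram bars taller than both neighbours (an end bar has one).

    Two well-separated peaks suggest bimodality; small random bumps in real data
    can also create peaks, so treat the result as a prompt to look, not a verdict.
    """
    peaks = []
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else -1
        right = counts[i + 1] if i < len(counts) - 1 else -1
        if c > left and c > right:
            peaks.append(i)
    return peaks


# Hypothetical samples: exam scores (mostly high) and house prices (a few very high).
exam_scores = [95, 92, 88, 90, 85, 91, 87, 93, 60, 45]
house_prices = [150, 160, 155, 170, 165, 158, 162, 900]  # thousands
print(moment_skewness(exam_scores) < 0, moment_skewness(house_prices) > 0)  # prints: True True
print(local_peaks([1, 5, 2, 1, 6, 2]))  # prints: [1, 4]
```

The cubed deviations keep their sign, so a few far-out low exam scores pull the coefficient negative and a few expensive homes pull it positive, matching the tail directions described above.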

Plateau Histograms. A nearly flat or plateau-like histogram (Figure 5) often means that the process is not well defined or understood by those doing the work or inspection. Because individuals run the process in different ways, there are a great many different measurements and none that stand out. The solution is to define the process and/or piece-part parameters more clearly. The plateau might be called a "multimodal distribution": several processes with normal distributions are combined, and because there are many peaks close together, the top of the distribution resembles a plateau.
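The plateau shape just described, and the cell-count and kurtosis guidelines discussed in the rest of this column, can be explored with a short Python sketch. It is a minimal illustration under stated assumptions: the function names are hypothetical; the column gives a range of 11 to 17 cells but no formula for choosing within it, so starting from the square root of the sample size is an assumption; and kurtosis is estimated as the population fourth standardized moment minus 3 (excess kurtosis), one of several estimators. A flat, plateau-like sample drawn from a uniform distribution should come out platykurtic (theoretical excess kurtosis -1.2), while a sharply peaked sample drawn from a Laplace distribution should come out leptokurtic (theoretical excess kurtosis 3).

```python
import math
import random
import statistics


def odd_cell_count(n, low=11, high=17):
    """Odd number of cells within [low, high] (the 11 to 17 guideline).

    ASSUMPTION: sqrt(n) is used as the starting point; the column gives no formula.
    """
    k = max(low, min(high, round(math.sqrt(n))))
    if k % 2 == 0:
        k = k + 1 if k + 1 <= high else k - 1
    return k


def cell_limits(data, k, resolution):
    """k + 1 cell limits placed halfway between recordable readings.

    Readings are assumed to be multiples of `resolution` (e.g. 1 or 0.1), so no
    reading can land exactly on a limit. ASSUMPTION: the cell width is a whole
    number of resolution steps, wide enough for k cells to cover every reading.
    """
    lo, hi = min(data), max(data)
    n_values = round((hi - lo) / resolution) + 1  # distinct recordable readings
    width = math.ceil(n_values / k) * resolution
    start = lo - resolution / 2
    return [start + i * width for i in range(k + 1)]


def excess_kurtosis(data):
    """Population excess kurtosis m4 / m2**2 - 3: about 0 for normal data,
    negative for platykurtic data, positive for leptokurtic data."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3.0


print(odd_cell_count(150), odd_cell_count(10_000))  # prints: 13 17
print(cell_limits([4, 8, 12], 3, 1))  # prints: [3.5, 6.5, 9.5, 12.5]

random.seed(2)
flat = [random.uniform(-1, 1) for _ in range(20_000)]  # plateau-like sample
peaked = [random.expovariate(1.0) * random.choice((-1, 1))  # Laplace: sharp peak
          for _ in range(20_000)]
print(round(excess_kurtosis(flat), 1), round(excess_kurtosis(peaked), 1))
```

With whole-number readings the limits fall on half-units (3.5, 6.5, ...), so a reading of 8 can only land in one cell; that half-step offset is the whole design choice behind `cell_limits`.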

Number of cells and width. There is no "best" number of cells, and different cell sizes can reveal different features of the data. Some theoreticians have attempted to determine an optimal number of cells, but these methods generally make strong assumptions about the shape of the distribution. Depending on the actual data distribution and the goals of the analysis, different cell widths may be appropriate, so experimentation is usually needed to determine an appropriate width. There are, however, various useful guidelines and rules of thumb. Most engineers favor setting the number of cells somewhere between 11 and 17, but always an odd number. The latter point is important so that the midpoint of the distribution is not split between two cells. It is also a good rule, when using measurement data, to set the cell limits halfway between two adjacent values that the most precise data can take, that is, to carry the limits to one more decimal place than the data. Consider what happens when one cell is 4 to 8 and the next cell is 8 to 12: a reading of 8 could fall in either cell, hence the rule.

Kurtosis. In probability theory and statistics, kurtosis (from the Greek word meaning "bulging") is any measure of the "peakedness" of the probability distribution of a real-valued random variable. Like skewness, kurtosis is a descriptor of the shape of a probability distribution, and, just as for skewness, there are different ways of quantifying it for a theoretical distribution and corresponding ways of estimating it from a sample of a population. One common measure of kurtosis, originating with Karl Pearson, is based on a scaled version of the fourth moment of the data or population, but it has been argued that this measure really measures heavy tails, and not peakedness. For this measure, higher kurtosis means more of the variance is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations. It is common practice to use an adjusted version of Pearson's kurtosis, the excess kurtosis (Pearson's kurtosis minus 3, so that the normal distribution scores 0), to compare the shape of a given distribution to that of the normal distribution. Distributions with negative or positive excess kurtosis are called platykurtic or leptokurtic distributions, respectively (Figure 6). When a curve, or histogram, is compared to a normal distribution, a platykurtic data set has a flatter peak around its mean and thinner tails. Leptokurtic describes a distribution whose excess kurtosis is positive. Leptokurtic distributions have higher peaks around the mean compared to normal distributions.

The Japanese scientist Genichi Taguchi argued that the goal of manufacturing should not be simply to produce product within the specification, but rather to produce product as close to nominal as possible. He argued that any deviation from nominal has a cost. There isn't space in this column to fully explain this idea; suffice it to say that a leptokurtic distribution will produce superior product. There is a greater difference between a part produced near the statistical design limit in a process producing a platykurtic distribution and one produced in a process with a leptokurtic distribution. The Taguchi Principle is the basis upon which six sigma theory and practice are built.

Figure 3. Skewed Histograms (panels: Positive Skewed, Negative Skewed).

Figure 4. Truncated, or cliff-like, Histogram.

Figure 5. Plateau-like Histogram.

Figure 6. Illustration of Kurtosis (panels: Platykurtic, Leptokurtic).

BIO

Leslie W. Flott, Ph.B., CQE, ASQ Fellow, is certified as an IDEM Wastewater Treatment Operator and Indiana Wastewater Treatment Operator. He received his Bachelor of Science degree in chemistry from Northwestern University and his master's degree in materials engineering from the University of Notre Dame. Most recently, Flott served as the environmental program director and instructor at Ivy Tech Community College. Prior to that, he was the health, environment, and safety manager at Wayne Metal Protection Company.