
# Statistical Methods

- Descriptive Statistics
- Inferential Statistics
  - Parametric
  - Non-parametric

Types of data:

- Categorical (Ex: Yes / No)
- Numerical
  - Discrete (Ex: How many books do you have in your library? The answer is a number.)
  - Continuous (Ex: a measurement in centimeters or inches)

Scale of measurement

- Nominal: no order, distance, or unique origin
- Ordinal
- Interval
- Ratio: order, distance, and unique origin

Ex (nominal): Gender: Male / Female. What is your work place? Production / Sales / Finance / Personnel.

Ex (ordinal): Rank the following personal computers with respect to usage in your office: IBM/AT, IBM/XT, Apple II, Macintosh, Compaq.

## Classification of data

A process of arranging data in groups. It addresses:

- Complexities of the data.
- Comparisons and drawing inferences from the data.
- Mutual relationships among elements of a data set.

Basis of classification:

- Geographical
- Chronological
- Qualitative
- Quantitative

Construction of a frequency distribution requires deciding:

- The number of non-overlapping intervals.
- The width of the class intervals.
- The class limits for each class interval, chosen to avoid overlapping.

Ex: Raw data pertaining to total time (in hours) worked by machinists: 94 88 93 89 93 84 88 94 93 89 93 84 90 94 91 94 93 93 92 92 85 88 88 91 87 94 89 85 90 95
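The three decisions listed above (number of intervals, width, class limits) can be sketched in code for the machinist data. This is only an illustration: the class width of 3 hours and the choice to start the first class at the smallest observation are assumptions, since the example does not fix them.

```python
from collections import Counter

# Total time (in hours) worked by the 30 machinists, as listed in the example.
hours = [94, 88, 93, 89, 93, 84, 88, 94, 93, 89,
         93, 84, 90, 94, 91, 94, 93, 93, 92, 92,
         85, 88, 88, 91, 87, 94, 89, 85, 90, 95]

WIDTH = 3           # assumed class width (not specified in the example)
START = min(hours)  # assumed lower limit of the first class (84)


def class_lower_limit(value):
    """Return the lower limit of the class that contains `value`."""
    return START + ((value - START) // WIDTH) * WIDTH


machinist_freq = Counter(class_lower_limit(h) for h in hours)

# Print the non-overlapping classes with inclusive integer limits.
for lower in sorted(machinist_freq):
    print(f"{lower}-{lower + WIDTH - 1}: {machinist_freq[lower]}")
```

Because each value maps to exactly one lower limit, the classes cannot overlap, and the frequencies add up to the 30 observations.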

Ex: A firm increases the D.A. in the salaries of its employees at the following rates:

- Rs. 250 for the salary range up to Rs. 4749
- Rs. 260 for the salary range from Rs. 4750
- Rs. 270 for the salary range from Rs. 4950
- Rs. 280 for the salary range from Rs. 5150
- Rs. 290 for the salary range from Rs. 5350
- No increase of D.A. for a salary of Rs. 5500 or more

What will be the additional amount required to be paid by the firm in a year, given that it has 32 employees with the following salaries (in Rs)?

5422 4714 5182 5342 4835 4719 5234 5035 5085 5482 4673 5335 4888 4769 5092 4735 5542 5058 4730 4930 4978 4822 4686 4730 5429 5545 5345 5250 5375 5542 5585 4749
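One way to approach this example is to classify each salary into its range and add up the corresponding increases. The sketch below assumes each range ends just below the start of the next one, and it assumes the D.A. is paid monthly when converting to a yearly figure; the example states neither explicitly, and the helper name `da_increase` is hypothetical.

```python
from bisect import bisect_right

# Lower limit of each salary range and the D.A. increase (Rs) that applies to it.
LOWER_LIMITS = [0, 4750, 4950, 5150, 5350, 5500]
INCREASES = [250, 260, 270, 280, 290, 0]

salaries = [5422, 4714, 5182, 5342, 4835, 4719, 5234, 5035,
            5085, 5482, 4673, 5335, 4888, 4769, 5092, 4735,
            5542, 5058, 4730, 4930, 4978, 4822, 4686, 4730,
            5429, 5545, 5345, 5250, 5375, 5542, 5585, 4749]


def da_increase(salary):
    """Return the D.A. increase for a salary by locating its range."""
    return INCREASES[bisect_right(LOWER_LIMITS, salary) - 1]


total_per_period = sum(da_increase(s) for s in salaries)

PERIODS_PER_YEAR = 12  # assumption: the D.A. is paid once a month
total_per_year = total_per_period * PERIODS_PER_YEAR

print("Additional amount per pay period (Rs):", total_per_period)
print("Additional amount per year (Rs), if monthly:", total_per_year)
```

If the D.A. is paid on a different schedule, only `PERIODS_PER_YEAR` needs to change.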

## Bi-variate frequency distribution

The following figures indicate income (X) and % expenditure on food (Y) of 25 families. Construct a bivariate frequency table classifying X into intervals 200-300, 300-400, etc. and Y into intervals 10-15, 15-20, etc.

| Income | % expenditure | Income | % expenditure | Income | % expenditure |
|---|---|---|---|---|---|
| 550 | 12 | 680 | 13 | 689 | 11 |
| 623 | 14 | 300 | 25 | 523 | 12 |
| 310 | 18 | 425 | 16 | 317 | 18 |
| 420 | 16 | 555 | 15 | 384 | 17 |
| 600 | 15 | 325 | 23 | 400 | 19 |
| 225 | 25 | 202 | 29 | | |
| 310 | 26 | 255 | 27 | | |
| 640 | 20 | 492 | 18 | | |
| 512 | 18 | 587 | 21 | | |
| 690 | 12 | 643 | 19 | | |
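A bivariate frequency table counts how many (X, Y) pairs fall into each combination of classes. The sketch below makes two assumptions that the exercise does not state: each income is paired with the % expenditure in the same position of its column, and a value on a class boundary (such as 300 or 15) goes into the class that starts there.

```python
from collections import Counter

# Income (X) and % expenditure on food (Y) for the 25 families, paired by position.
income = [550, 623, 310, 420, 600, 225, 310, 640, 512, 690,
          680, 300, 425, 555, 325, 202, 255, 492, 587, 643,
          689, 523, 317, 384, 400]
food_pct = [12, 14, 18, 16, 15, 25, 26, 20, 18, 12,
            13, 25, 16, 15, 23, 29, 27, 18, 21, 19,
            11, 12, 18, 17, 19]


def lower_limit(value, width):
    """Lower limit of the class containing `value` (lower limit inclusive)."""
    return (value // width) * width


# Key: (lower limit of X class, lower limit of Y class); value: number of families.
bivariate = Counter(
    (lower_limit(x, 100), lower_limit(y, 5)) for x, y in zip(income, food_pct)
)

x_classes = range(200, 700, 100)  # 200-300, ..., 600-700
y_classes = range(10, 30, 5)      # 10-15, ..., 25-30

print("Y \\ X  " + "".join(f"{x0}-{x0 + 100:<6}" for x0 in x_classes))
for y0 in y_classes:
    cells = "".join(f"{bivariate[(x0, y0)]:<10}" for x0 in x_classes)
    print(f"{y0}-{y0 + 5:<5}" + cells)
```

The row and column totals of the printed table give the two marginal frequency distributions.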

Tabulation of data: It is the way of summarizing and presenting the data in rows and columns in a systematic manner. Uses:

- Statistical comparisons
- Reference

Parts of a table:

- Table number
- Title of the table
- Caption & Stubs
- Body

## Table 1 Total Retailing: Retail Sales by Distribution Channel 1998-2002

Channels: High Street Retailing, Home shopping, Internet Retailing, Direct Selling, Total Retail Sales

1998: 37,140 1,799,467 78,335 10,644 1,673,348

Source note: Euromonitor (2003)

## Table 10.1 Candidates interviewed for employment in a company

Number of candidates by candidates profile:

| Candidates profile | Experienced | Inexperienced | Total |
|---|---|---|---|
| Males | 35 | 10 | 45 |
| Females | 15 | 60 | 75 |
| Total | 50 | 70 | 120 |

Table 10.1 Candidates interviewed for employment in a company (with marital status)

| Sex | Marital status | Experienced | Inexperienced | Total |
|---|---|---|---|---|
| Males | Married | 15 | 2 | 17 |
| Males | Unmarried | 20 | 8 | 28 |
| Females | Married | 5 | 10 | 15 |
| Females | Unmarried | 10 | 50 | 60 |
| Total | | 50 | 70 | 120 |

Ex: Draw a blank table to show the number of candidates sex-wise appearing in the pre-university, first year, second year, and third year examinations of a university in the faculties of Arts, Science, and Commerce in a certain year.

Ex: In 1994, out of a total of 1950 workers of a factory, 1400 were members of a trade union. The number of women employed was 400 of which 275 did not belong to a trade union. In 1999, the number of union workers increased to 1780 of which 1490 were men. On the other hand, the number of non-union workers fell to 408 of which 280 were men. In year 2004, there were 2000 employees who belonged to a trade union and 250 did not belong to a trade union. Of all the employees in 2004, 500 were women of whom only 208 did not belong to a trade union. Present this information in suitable form.

## Summarizing Qualitative Data

- Frequency Distribution
- Relative Frequency Distribution
- Percent Frequency Distribution
- Bar Graph
- Pie Chart

Frequency Distribution

A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several nonoverlapping classes.

## Example: Quality Inn

Guests staying at Quality Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 guests are shown below.

Below Average, Above Average, Above Average, Average, Above Average, Average, Above Average, Average, Above Average, Above Average, Below Average, Below Average, Poor, Poor, Excellent, Above Average, Above Average, Average, Average, Above Average

Frequency Distribution

| Rating | Frequency |
|---|---|
| Poor | 2 |
| Below Average | 3 |
| Average | 5 |
| Above Average | 9 |
| Excellent | 1 |
| Total | 20 |

## Relative Frequency Distribution

The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class. A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.

## Percent Frequency Distribution

The percent frequency of a class is the relative frequency multiplied by 100. A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.

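| Rating | Percent Frequency |
|---|---|
| Poor | 10 |
| Below Average | 15 |
| Average | 25 |
| Above Average | 45 |
| Excellent | 5 |
| Total | 100 |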
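The frequency, relative frequency, and percent frequency distributions for the Quality Inn sample can be derived from the raw ratings in a few lines. This is a minimal sketch; the variable names are illustrative only.

```python
from collections import Counter

# The 20 guest ratings from the Quality Inn example.
ratings = [
    "Below Average", "Above Average", "Above Average", "Average",
    "Above Average", "Average", "Above Average", "Average",
    "Above Average", "Above Average", "Below Average", "Below Average",
    "Poor", "Poor", "Excellent", "Above Average",
    "Above Average", "Average", "Average", "Above Average",
]

ORDER = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

frequency = Counter(ratings)
n = len(ratings)

# Relative frequency = class frequency / total; percent = relative * 100.
relative = {r: frequency[r] / n for r in ORDER}
percent = {r: 100 * frequency[r] / n for r in ORDER}

for r in ORDER:
    print(f"{r:<14} {frequency[r]:>2} {relative[r]:>5.2f} {percent[r]:>5.0f}")
```

The relative frequencies sum to 1 and the percent frequencies to 100, which is a quick check on any such table.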

Bar Graph
A bar graph is a graphical device for depicting qualitative data that have been summarized in a frequency, relative frequency, or percent frequency distribution. On the horizontal axis we specify the labels that are used for each of the classes. A frequency, relative frequency, or percent frequency scale can be used for the vertical axis. Using a bar of fixed width drawn above each class label, we extend the height appropriately. The bars are separated to emphasize the fact that each class is a separate category.

[Bar diagram: Frequency (vertical axis, 1 to 9) against Rating (horizontal axis): Poor, Below Average, Average, Above Average, Excellent]

Pie Chart
The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data. First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class. Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.
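The sector angles follow directly from the rule just described. As a small sketch, using the Quality Inn frequencies (2, 3, 5, 9, 1 out of 20):

```python
# Frequencies of the Quality Inn ratings.
rating_counts = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

total = sum(rating_counts.values())

# Sector angle = relative frequency * 360 degrees.
angles = {rating: 360 * count / total for rating, count in rating_counts.items()}

for rating, angle in angles.items():
    print(f"{rating:<14} {angle:6.1f} degrees")
```

The "Average" class, with relative frequency .25, takes 90 degrees, matching the calculation in the text.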

[Pie chart: Quality Ratings. Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%]

## Insights Gained from the Pie Chart

One-half of the customers surveyed gave Quality Inn a quality rating of above average or excellent. For each customer who gave an excellent rating, there were two customers who gave a poor rating.

The manager of Bajaj Auto would like to get a better picture of the distribution of costs for engine tune-up parts. A sample of 50 customer invoices has been taken and the costs of parts are listed.

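| | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| 91 | 78 | 93 | 57 | 75 | 52 | 99 | 80 | 97 | 6… |
| 71 | 69 | 72 | 89 | 66 | 75 | 79 | 75 | 72 | 7… |
| 104 | 74 | 62 | 68 | 97 | 105 | 77 | 65 | 80 | 109 |
| 85 | 97 | 88 | 68 | 83 | 68 | 71 | 69 | 67 | 7… |
| 62 | 82 | 98 | 101 | 79 | 105 | 79 | 69 | 62 | 7… |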

Frequency Distribution
Guidelines for Selecting Number of Classes
Use between 5 and 20 classes. Data sets with a larger number of elements usually require a larger number of classes. Smaller data sets usually require fewer classes.

## Guidelines for Selecting Width of Classes

Use classes of equal width.

Approximate Class Width = (Largest Data Value - Smallest Data Value) / Number of Classes

Frequency Distribution

If we choose six classes: Approximate Class Width = (109 - 52)/6 = 9.5 ≈ 10

| Cost (Rs) | Frequency |
|---|---|
| 50-59 | 2 |
| 60-69 | 13 |
| 70-79 | 16 |
| 80-89 | 7 |
| 90-99 | 7 |
| 100-109 | 5 |
| Total | 50 |
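The class-width guideline and the resulting class limits can be reproduced as follows. This is a sketch: rounding the width up and starting the first class at a multiple of the width are assumptions that happen to match the classes used in this example.

```python
import math

largest, smallest, n_classes = 109, 52, 6

# Approximate class width = (largest - smallest) / number of classes.
approx_width = (largest - smallest) / n_classes  # 9.5
width = math.ceil(approx_width)                  # rounded up to 10

# Assumption: start the first class at the multiple of `width` at or below the minimum.
first_lower = (smallest // width) * width        # 50

# Inclusive integer class limits: 50-59, 60-69, ..., 100-109.
class_limits = [
    (first_lower + i * width, first_lower + (i + 1) * width - 1)
    for i in range(n_classes)
]

print("Approximate width:", approx_width, "-> chosen width:", width)
for lower, upper in class_limits:
    print(f"{lower}-{upper}")
```

Rounding up keeps the largest value inside the last class.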

Relative Frequency and Percent Frequency Distributions

| Cost (Rs) | Relative Frequency | Percent Frequency |
|---|---|---|
| 50-59 | .04 | 4 |
| 60-69 | .26 | 26 |
| 70-79 | .32 | 32 |
| 80-89 | .14 | 14 |
| 90-99 | .14 | 14 |
| 100-109 | .10 | 10 |
| Total | 1.00 | 100 |

## Insights Gained from the Percent Frequency Distribution

- Only 4% of the parts costs are in the Rs 50-59 class.
- 30% of the parts costs are under Rs 70.
- The greatest percentage (32%, or almost one-third) of the parts costs are in the Rs 70-79 class.
- 10% of the parts costs are Rs 100 or more.

Histogram
Another common graphical presentation of quantitative data is a histogram. The variable of interest is placed on the horizontal axis and the frequency, relative frequency, or percent frequency is placed on the vertical axis. A rectangle is drawn above each class interval with its height corresponding to the interval's frequency, relative frequency, or percent frequency. Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.

[Figure: histogram of the parts-cost data. Horizontal axis: Parts Cost ($), 50 to 110. Vertical axis: Frequency, 2 to 18.]

Cumulative Distribution
The cumulative frequency distribution shows the number of items with values less than or equal to the upper limit of each class. The cumulative relative frequency distribution shows the proportion of items with values less than or equal to the upper limit of each class. The cumulative percent frequency distribution shows the percentage of items with values less than or equal to the upper limit of each class.

Cumulative Distributions

| Cost (Rs) | Cumulative Frequency | Cumulative Relative Frequency | Cumulative Percent Frequency |
|---|---|---|---|
| ≤ 59 | 2 | .04 | 4 |
| ≤ 69 | 15 | .30 | 30 |
| ≤ 79 | 31 | .62 | 62 |
| ≤ 89 | 38 | .76 | 76 |
| ≤ 99 | 45 | .90 | 90 |
| ≤ 109 | 50 | 1.00 | 100 |
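The cumulative columns are running totals of the class frequencies. A short sketch using the parts-cost frequencies (2, 13, 16, 7, 7, 5):

```python
from itertools import accumulate

upper_limits = [59, 69, 79, 89, 99, 109]
frequencies = [2, 13, 16, 7, 7, 5]
total = sum(frequencies)

# Running total of the class frequencies.
cum_freq = list(accumulate(frequencies))
# Divide by the total for proportions, multiply by 100 for percentages.
cum_rel = [c / total for c in cum_freq]
cum_pct = [100 * c / total for c in cum_freq]

for upper, f, r, p in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {upper:<4} {f:>3} {r:>5.2f} {p:>5.0f}")
```

The last cumulative frequency always equals the sample size, here 50.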

Ogive
An ogive is a graph of a cumulative distribution. The data values are shown on the horizontal axis. Shown on the vertical axis are one of:

- cumulative frequencies,
- cumulative relative frequencies, or
- cumulative percent frequencies.

The cumulative frequency (one of the above) of each class is plotted as a point. The plotted points are connected by straight lines.

Because the class limits for the parts-cost data are 50-59, 60-69, and so on, there appear to be one-unit gaps from 59 to 60, 69 to 70, and so on. These gaps are eliminated by plotting points halfway between the class limits. Thus, 59.5 is used for the 50-59 class, 69.5 is used for the 60-69 class, and so on.

[Figure: ogive of the parts-cost data. Horizontal axis: Parts Cost ($).]
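The points of the ogive can be computed by shifting each upper class limit by half a unit, as described above. This sketch only builds the coordinates; drawing them with a plotting library is left out.

```python
# Upper class limits and cumulative frequencies for the parts-cost data.
class_upper_limits = [59, 69, 79, 89, 99, 109]
cumulative_frequency = [2, 15, 31, 38, 45, 50]

# Each point sits halfway between an upper limit and the next lower limit,
# e.g. 59.5 lies halfway between 59 and 60.
ogive_points = [
    (upper + 0.5, cum)
    for upper, cum in zip(class_upper_limits, cumulative_frequency)
]

for x, y in ogive_points:
    print(f"({x}, {y})")
```

Connecting consecutive points with straight lines gives the ogive.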

Ex: The Nielsen Home Technology Report provided information about home technology and its usage by persons aged 12 and older. The following data are the hours of personal computer usage during one week for a sample of 50 persons.

4.1 1.5 10.4 5.9 3.4 5.7 1.6 6.1 3.0 3.7
3.1 4.8 2.0 14.8 5.4 4.2 3.9 4.1 11.1 3.5
4.1 4.1 8.8 5.6 4.3 4.3 7.1 10.3 6.2 7.6
10.8 2.8 9.5 12.9 12.1 0.7 4.0 9.2 4.4 5.7
7.2 6.1 5.7 5.9 4.7 3.9 3.7 3.1 6.1 3.1

Summarize the data by constructing the following:

- A frequency distribution (use a class width of 3 hours)
- A relative frequency distribution
- A histogram
- An ogive
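As a starting point for this exercise, the first two parts can be sketched as below. The assumption here is that the classes are 0 to under 3, 3 to under 6, and so on (equivalently 0.0-2.9, 3.0-5.9, ... for data recorded to one decimal place); the exercise fixes only the width.

```python
from collections import Counter

# Hours of personal computer usage during one week for the 50 persons.
usage_hours = [
    4.1, 1.5, 10.4, 5.9, 3.4, 5.7, 1.6, 6.1, 3.0, 3.7,
    3.1, 4.8, 2.0, 14.8, 5.4, 4.2, 3.9, 4.1, 11.1, 3.5,
    4.1, 4.1, 8.8, 5.6, 4.3, 4.3, 7.1, 10.3, 6.2, 7.6,
    10.8, 2.8, 9.5, 12.9, 12.1, 0.7, 4.0, 9.2, 4.4, 5.7,
    7.2, 6.1, 5.7, 5.9, 4.7, 3.9, 3.7, 3.1, 6.1, 3.1,
]

WIDTH = 3  # class width in hours, as required by the exercise

# Map each value to the lower limit of its class (lower limit inclusive).
usage_freq = Counter(int(h // WIDTH) * WIDTH for h in usage_hours)
n = len(usage_hours)

for lower in sorted(usage_freq):
    count = usage_freq[lower]
    label = f"{lower:.1f}-{lower + WIDTH - 0.1:.1f}"
    print(f"{label:<10} {count:>3} {count / n:>5.2f}")
```

The histogram and ogive can then be drawn from these frequencies and their running totals.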

Ex: In alphabetical order, the six most common last names in the USA are Brown, Davis, Johnson, Jones, Smith, and Williams. Assume that a sample of 50 individuals with one of these last names provided the following data.

Brown Smith Davis Johnson Williams Williams Johnson Jones Davis Jones

Williams Jones Smith Smith Davis Johnson Smith Jones Jones Johnson

Williams Smith Johnson Jones Smith Smith Williams Brown Williams Johnson

Williams Johnson Williams Smith Davis Johnson Brown Smith Johnson Brown

Johnson Brown Jones Davis Smith Davis Johnson Brown Johnson Brown

Summarize the data by constructing the following:

- Relative and percent frequency distributions
- A bar graph
- A pie chart

Based on these data, what are the three most common last names?
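A quick way to check an answer to this exercise is to tally the names and print a rough text bar graph. The sketch below uses the sample as listed, with the spellings normalized to "Johnson" and "Williams" (an editorial assumption about the intended names).

```python
from collections import Counter

# The 50 sampled last names, with spellings normalized.
names = """
Brown Smith Davis Johnson Williams Williams Johnson Jones Davis Jones
Williams Jones Smith Smith Davis Johnson Smith Jones Jones Johnson
Williams Smith Johnson Jones Smith Smith Williams Brown Williams Johnson
Williams Johnson Williams Smith Davis Johnson Brown Smith Johnson Brown
Johnson Brown Jones Davis Smith Davis Johnson Brown Johnson Brown
""".split()

name_freq = Counter(names)
n = len(names)

# Text bar graph: one '#' per occurrence, followed by the percent frequency.
for name, count in name_freq.most_common():
    print(f"{name:<9}{'#' * count} {100 * count / n:.0f}%")

top_three = [name for name, _ in name_freq.most_common(3)]
print("Three most common:", top_three)
```

The bars are kept separate, one per name, because each last name is a distinct category.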