1 Introduction To Statistics

Learning Objectives
In this chapter, you will learn:
• What is Statistics
• Why Statistics
• Basic vocabulary used in Statistics
• How statistics is used in Business
• The sources of data and its types used in Business
• Types of Variables
• Levels of Measurement
• Tabular and Graphical Presentation of Data
What is Statistics?
The science of collecting, describing, and interpreting data.
“Statistics is a way to get information from data”
Data → Statistics → Information
Data: Facts, especially numerical facts, collected together for reference or information.
Information: Knowledge communicated concerning some particular fact.
Statistics is a tool for creating new understanding from a set of numbers.

What is statistics?
• The word “statistics” is used in 3 main ways:
– Common meaning: factual information involving numbers. A better word for this is data
– Precise meaning: quantities which have been derived from sample data, e.g. the mean (or average) of a data set
– Academic meaning: an academic subject which involves reasoning about statistical quantities
Why Study Statistics?
Decision Makers Use Statistics To:
• Present and describe business data and information properly
• Draw conclusions about large populations, using
information collected from samples
• Make reliable forecasts about a business activity
• Improve business processes
Types of Statistics
Descriptive Statistics
• Descriptive statistics is the branch of statistics that summarizes, presents, and analyzes large bodies of data in order to describe their salient features.
• If a business analyst uses data gathered on a group to describe or reach conclusions about that same group, the statistics are called descriptive.
• Descriptive statistics includes methods of organizing, summarizing, analyzing, and presenting data in an informative way.
Descriptive Statistics
• Collect data
– e.g. Survey
• Present data
– e.g. Tables and graphs
• Characterize data
– e.g. Sample mean: $\bar{X} = \frac{\sum X_i}{n}$
Steps: Collect, Organize, Summarize, Display, Analyze
Inferential Statistics
• Another facet of statistics is inferential statistics, also called statistical inference or inductive statistics.
• If a researcher gathers data from a sample and uses the statistics generated to reach conclusions about the population from which the sample was taken, the statistics are called inferential.
• Statistical inference is the branch of statistics that deals with drawing valid inferences about population parameters on the basis of sample data, along with an associated degree of reliability.
Inferential Statistics
• Estimation
– e.g. Estimate the population mean weight using the sample mean weight
• Hypothesis testing
– e.g. Test the claim that the population mean weight is 120 pounds
• Predict and forecast values of population parameters
• Test hypotheses about values of population parameters
• Make decisions
Drawing conclusions and/or making decisions concerning a population based on sample results.
Basic Vocabulary of Statistics
Population
• A population consists of all the items or individuals about which you want to draw a conclusion.
• A population is the group of all items of interest to a statistics practitioner.
• Frequently very large; sometimes infinite.
E.g. all 1.252 billion people in India, i.e. census data.
Sample
• A subset of the population.
• A sample is a set of data drawn from the population.
• Potentially very large, but smaller than the population.
E.g. a sample of 765 voters exit polled on election day.
Population → Sample (a subset of the population)
Parameter: Measures used to describe the population are called parameters.
Statistic: Measures computed from sample data are called statistics.
Variable
• A variable is some characteristic of a population or sample.
E.g. student grades. Typically denoted with a capital letter: A, B, C…
• The values of the variable are the range of possible values for a variable.
E.g. student marks (0..100)
Data
• Data are the observed values of a variable.
• Data are the different values associated with a variable.
E.g. student marks: {67, 74, 71, 83, 93, 55, 48}
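As a small illustration of characterizing data, the sketch below computes the sample mean (the sum of the observed values divided by n) for the student marks listed above. The function name `sample_mean` is chosen here for illustration only.

```python
# Observed marks taken from the example above.
marks = [67, 74, 71, 83, 93, 55, 48]


def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by n."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


mean_mark = sample_mean(marks)
print(f"n = {len(marks)}, sum = {sum(marks)}, mean = {mean_mark:.2f}")
```

Here the seven marks add up to 491, so the mean is 491 / 7, or about 70.14.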
Example
The NSB dean is interested in learning about the average age of PGDM (E) students. Identify the basic terms in this situation.
The population is all PGDM (E) students at the Institute (more precisely, their ages).
A sample is any subset of that population. For example, we might select 10 PGDM (E) students and determine their ages.
The variable is the “age” of each PGDM (E) student.
The data would be the set of values in the sample.
The parameter of interest is the “average” age of all PGDM (E) students at the Institute.
The statistic is the “average” age of the PGDM (E) students in the sample.
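The distinction between a parameter and a statistic can be sketched in a few lines. The ages below are randomly generated, hypothetical values (not real student data), and the population size of 120 is an assumption made only so that the example runs.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 120 students (made-up values).
population_ages = [random.randint(24, 45) for _ in range(120)]

# Parameter: the average age computed over the whole population.
parameter = sum(population_ages) / len(population_ages)

# Sample: 10 students drawn at random, without replacement.
sample_ages = random.sample(population_ages, 10)

# Statistic: the average age computed from the sample only.
statistic = sum(sample_ages) / len(sample_ages)

print(f"parameter (population mean age): {parameter:.2f}")
print(f"statistic (sample mean age):     {statistic:.2f}")
```

The two printed values will usually be close but not identical, which is why a statistic is treated as an estimate of the parameter.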
Why Collect Data?
• A marketing research analyst needs to assess the
effectiveness of a new television advertisement.
• A pharmaceutical manufacturer needs to determine whether
a new drug is more effective than those currently in use.
• An operations manager wants to monitor a manufacturing
process to find out whether the quality of product being
manufactured is conforming to company standards.
• A power company collects data to predict electricity prices and optimize operations.
Sources of Data
• Primary Sources: The data collector is the one using the data for analysis
– Data from a political survey
– Data collected from an experiment
– Observed data
• Secondary Sources: The person performing data analysis is not the data collector
– Analyzing census data
– Examining data from print journals or data published on the internet
Types of Variables
Data
• Categorical (defined categories)
– Examples: Marital Status, Political Party, Eye Color
• Numerical
– Discrete (counted items). Examples: Number of Children, Defects per hour
– Continuous (measured characteristics). Examples: Weight, Voltage

Types of Variables
Categorical (qualitative)
• Qualitative variables have values that can only be placed into categories, such as “yes” and “no.”
• A variable that categorizes or describes an element of a population.
Note: Arithmetic operations, such as addition and averaging, are not meaningful for data resulting from a qualitative variable.
Numerical (quantitative)
• Quantitative variables have values that represent quantities.
• A variable that quantifies an element of a population.
Note: Arithmetic operations, such as addition and averaging, are meaningful for data resulting from a quantitative variable.
Example
Identify each of the following examples as attribute (qualitative) or numerical (quantitative) variables.
• The amount of CNG pumped by the next 10 customers at the local HP pump. (Numerical)
• The amount of radon in the basement of each of 25 homes in a new development. (Numerical)
• The color of the baseball cap worn by each of 20 students. (Attribute)
• The length of time to complete a mathematics homework assignment. (Numerical)
• The state in which each truck is registered when stopped and inspected at a weigh station. (Attribute)
Question?
Identify each of the following as examples of qualitative or
quantitative variables:
The temperature in Barrow, Alaska at 12:00 pm on any
given day.
The make of automobile driven by each faculty member.
Whether or not a 6 volt lantern battery is defective.
The weight of a lead pencil.
The length of time billed for a long distance telephone call.
The brand of cereal children eat for breakfast.
The type of book taken out of the library by an adult.
Level of Measurement
NOIR, from lowest to highest level: Nominal, Ordinal, Interval, Ratio
Nominal scale
A nominal scale classifies data into distinct categories in
which no ranking is implied.
Categorical Variable            Categories
Personal Computer Ownership     Yes / No
Type of Stocks Owned            Growth, Value, Other
Internet Provider               Microsoft Network / AOL
Ordinal scale
An ordinal scale classifies data into distinct categories in which ranking is implied.
Categorical Variable              Ordered Categories
Student class designation         Freshman, Sophomore, Junior, Senior
Product satisfaction              Satisfied, Neutral, Unsatisfied
Faculty rank                      Professor, Associate Professor, Assistant Professor, Instructor
Standard & Poor’s bond ratings    AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
Student grades                    A, B, C, D, F
Example of Ordinal Measurement
[Figure: entries numbered 1 to 6 at a “finish” line; the order of finish is an ordinal measurement]
Interval scale
• Distances between consecutive integers are equal
– Relative magnitude of numbers is meaningful
– Differences between numbers are comparable
– Location of origin, zero, is arbitrary
Example:
the difference between 1 and 2 years of age is the
same amount as the difference between 21 and 22
years of age, or 50 and 51, or 65 and 66.
the difference between a height of 60 inches and a
height of 55 inches is the same amount of difference
as a height of 72 inches and a height of 67 inches.
Ratio Level Data
• Highest level of measurement
– Relative magnitude of numbers is meaningful
– Differences between numbers are comparable
– Location of origin, zero, is absolute (natural)
Examples: Height, Weight, and Volume
Example: Monetary Variables, such as Profit and Loss,
Revenues, and Expenses
Example: Financial ratios, such as P/E Ratio, Inventory
Turnover, and Quick Ratio.
Example
The Hierarchy of Levels
Nominal: Attributes are only named; weakest
Ordinal: Attributes can be ordered
Interval: Distance is meaningful
Ratio: Absolute zero
Level of Measurement: Characteristics
Level of Measurement: Statistical Tests
Example
Identify each of the following as examples of (1) nominal, (2) ordinal, (3) discrete, or (4) continuous variables:
• The length of time until a pain reliever begins to work.
• The number of chocolate chips in a cookie.
• The number of colors used in a statistics textbook.
• The brand of refrigerator in a home.
• The overall satisfaction rating of a new car.
• The number of files on a computer’s hard disk.
• The pH level of the water in a swimming pool.
• The number of staples in a stapler.
Class Exercise
Q 1: Determine whether each variable is categorical or numerical. If numerical, determine whether the variable is discrete or continuous. Also determine the level of measurement.
• Amount of money spent on clothing in the past month?
• Favorite department store?
• Most likely time period during which shopping for clothing takes place?
• Number of pairs of shoes owned?
Class Exercise
Q 2: A manufacturer of dog food is planning to survey households in India to determine the purchasing habits of dog owners. Among the variables to be collected are:
• The primary place of purchase of dog food?
• Whether dry or moist food is purchased?
• Number of dogs living in the household?
• Whether the dog is pedigreed?
Class Exercise
Q 3: Suppose the following information is collected from Mr X on his application for a home loan at the HDFC Bank loan department:
a. Monthly payment: Rs 25100
b. Annual family income:
c. Marital status: Married
d. Number of job changes in the past 10 years: 2
Classify each of the responses by type of data and level of measurement.
Organizing and Visualizing Categorical and Numerical Data
Categorical Data Are Organized By Utilizing Tables
Categorical Data → Tallying Data
• One Categorical Variable: Summary Table
• Two Categorical Variables: Contingency Table
Organizing Categorical Data:
Summary Table
A summary table indicates the frequency, amount, or
percentage of items in a set of categories so that you can
see differences between categories.
How do you spend the holidays? Percent

At home with family 45%
Travel to visit family 38%
Vacation 5%
Catching up on work 5%
Other 7%
Contingency Table
• Used to study patterns that may exist between the responses of two or more categorical variables
• Cross tabulates, or jointly tallies, the responses of the categorical variables
• For two variables, the tallies for one variable are located in the rows and the tallies for the second variable are located in the columns
Contingency Table - Example
• A random sample of 400 invoices is drawn.
• Each invoice is categorized as a small, medium, or large amount.
• Each invoice is also examined to identify if there are any errors.
• These data are then organized in the contingency table below.
Contingency Table Showing Frequency of Invoices Categorized By Size and The Presence Of Errors
                 No Errors    Errors    Total
Small Amount     170          20        190
Medium Amount    100          40        140
Large Amount     65           5         70
Total            335          65        400
Contingency Table Based on % of Overall Total
Each count is divided by the overall total of 400:
42.50% = 170 / 400
25.00% = 100 / 400
16.25% = 65 / 400
                 No Errors    Errors    Total
Small Amount     42.50%       5.00%     47.50%
Medium Amount    25.00%       10.00%    35.00%
Large Amount     16.25%       1.25%     17.50%
Total            83.75%       16.25%    100.0%
83.75% of sampled invoices have no errors and 47.50% of sampled invoices are for small amounts.
% of Row Totals
Each count is divided by its row total:
89.47% = 170 / 190
71.43% = 100 / 140
92.86% = 65 / 70
                 No Errors    Errors    Total
Small Amount     89.47%       10.53%    100.0%
Medium Amount    71.43%       28.57%    100.0%
Large Amount     92.86%       7.14%     100.0%
Total            83.75%       16.25%    100.0%
Medium invoices have a larger chance (28.57%) of having errors than small (10.53%) or large (7.14%) invoices.
Percentage of Column Total
Each count is divided by its column total:
50.75% = 170 / 335
30.77% = 20 / 65
                 No Errors    Errors    Total
Small Amount     50.75%       30.77%    47.50%
Medium Amount    29.85%       61.54%    35.00%
Large Amount     19.40%       7.69%     17.50%
Total            100.0%       100.0%    100.0%
There is a 61.54% chance that invoices with errors are of medium size.
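The three percentage tables differ only in the divisor: the overall total, the row total, or the column total. A minimal sketch using the invoice counts from the example follows; the dictionary layout and the helper name `pct` are illustrative choices, not part of the original slides.

```python
# Invoice counts from the contingency table example.
counts = {
    "Small":  {"No Errors": 170, "Errors": 20},
    "Medium": {"No Errors": 100, "Errors": 40},
    "Large":  {"No Errors": 65,  "Errors": 5},
}
columns = ["No Errors", "Errors"]

row_totals = {size: sum(row.values()) for size, row in counts.items()}
col_totals = {col: sum(counts[size][col] for size in counts) for col in columns}
grand_total = sum(row_totals.values())


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100 * part / whole, 2)


# Same counts, three different divisors.
overall_pct = {s: {c: pct(counts[s][c], grand_total) for c in columns} for s in counts}
row_pct = {s: {c: pct(counts[s][c], row_totals[s]) for c in columns} for s in counts}
col_pct = {s: {c: pct(counts[s][c], col_totals[c]) for c in columns} for s in counts}

for size in counts:
    print(size, "overall:", overall_pct[size])
    print(size, "row:    ", row_pct[size])
    print(size, "column: ", col_pct[size])
```

Row percentages answer "given the invoice size, how likely is an error?", while column percentages answer "given an error, how likely is each size?".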
Tables Used For Organizing Numerical Data
Numerical Data
• Ordered Array
• Frequency Distributions
• Cumulative Distributions
Organizing Numerical Data:
Ordered Array
An ordered array is a sequence of data, in rank order, from the smallest value to the largest value.
Age of Surveyed College Students
Day Students:   16 17 17 18 18 18 19 19 20 20 21 22 22 25 27 32 38 42
Night Students: 18 18 19 19 20 21 23 28 32 33 41 45
Frequency Distribution
• The frequency distribution is a summary table in which the data are arranged into numerically ordered class groupings.
• You must give attention to selecting the appropriate number of class groupings for the table, determining a suitable width of a class grouping, and establishing the boundaries of each class grouping to avoid overlapping.
• The number of classes depends on the number of values in the data. With a larger number of values, typically there are more classes. In general, a frequency distribution should have at least 5 but no more than 15 classes.
• To determine the width of a class interval, you divide the range (Highest value - Lowest value) of the data by the number of class groupings desired.
Frequency Distribution Example
Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature:
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27
STEPS
1. Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
2. Find range: 58 - 12 = 46
3. Select number of classes: 5 (usually between 5 and 15)
4. Compute class interval (width): 10 (46 / 5 = 9.2, then round up)
5. Determine class boundaries (limits):
Class 1: 10 to less than 20
Class 2: 20 to less than 30
Class 3: 30 to less than 40
Class 4: 40 to less than 50
Class 5: 50 to less than 60
6. Compute class midpoints: 15, 25, 35, 45, 55
7. Count observations & assign to classes
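The steps above can be sketched in code. The width of 10 and the first boundary of 10 are taken from the slide as given choices rather than computed; variable names are illustrative.

```python
# Daily high temperatures from the example.
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41,
         43, 44, 27, 53, 27]

# Step 1: sort the raw data.
ordered = sorted(temps)

# Step 2: range = highest value - lowest value.
data_range = ordered[-1] - ordered[0]

# Steps 3 and 4: 5 classes; the raw width 46 / 5 = 9.2 is rounded up to 10.
num_classes = 5
raw_width = data_range / num_classes
width = 10          # convenient rounded-up value used on the slide
first_lower = 10    # first class boundary used on the slide

# Step 5: class boundaries as (lower, upper) pairs, upper bound excluded.
classes = [(first_lower + i * width, first_lower + (i + 1) * width)
           for i in range(num_classes)]

# Step 6: class midpoints.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Step 7: count the observations falling in each class.
frequencies = [sum(lower <= t < upper for t in temps) for lower, upper in classes]

for (lower, upper), mid, freq in zip(classes, midpoints, frequencies):
    print(f"{lower} but less than {upper}: midpoint {mid:g}, frequency {freq}")
```

Using half-open classes (lower bound included, upper bound excluded) is what keeps the groupings from overlapping.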
Frequency Distribution Example
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class                  Midpoint    Frequency
10 but less than 20    15          3
20 but less than 30    25          6
30 but less than 40    35          5
40 but less than 50    45          4
50 but less than 60    55          2
Total                              20
Relative & Percent Frequency Distribution
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class                  Frequency    Relative Frequency    Percentage
10 but less than 20    3            .15                   15
20 but less than 30    6            .30                   30
30 but less than 40    5            .25                   25
40 but less than 50    4            .20                   20
50 but less than 60    2            .10                   10
Total                  20           1.00                  100
Cumulative Frequency Distribution
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class                  Frequency    Percentage    Cumulative Frequency    Cumulative Percentage
10 but less than 20    3            15%           3                       15%
20 but less than 30    6            30%           9                       45%
30 but less than 40    5            25%           14                      70%
40 but less than 50    4            20%           18                      90%
50 but less than 60    2            10%           20                      100%
Total                  20           100%          20                      100%
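Relative, percentage, and cumulative columns all follow from the class frequencies. The sketch below starts from the frequencies in the temperature example and uses `itertools.accumulate` for the running totals; the variable names are illustrative.

```python
from itertools import accumulate

class_labels = [
    "10 but less than 20",
    "20 but less than 30",
    "30 but less than 40",
    "40 but less than 50",
    "50 but less than 60",
]
frequencies = [3, 6, 5, 4, 2]
n = sum(frequencies)

# Relative frequency = class frequency / total number of observations.
relative = [f / n for f in frequencies]
percentage = [100 * f / n for f in frequencies]

# Cumulative frequency = running total of the class frequencies.
cumulative = list(accumulate(frequencies))
cumulative_pct = [100 * c / n for c in cumulative]

for row in zip(class_labels, frequencies, relative, percentage,
               cumulative, cumulative_pct):
    label, f, rel, pct, cum, cum_pct = row
    print(f"{label}: f={f}, rel={rel:.2f}, pct={pct:.0f}%, "
          f"cum={cum}, cum_pct={cum_pct:.0f}%")
```

The last cumulative frequency always equals n, and the last cumulative percentage is always 100%.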
Why Use a Frequency Distribution?
• It condenses the raw data into a more useful form
• It allows for a quick visual interpretation of the data
• It enables the determination of the major characteristics of the data set, including where the data are concentrated / clustered
Frequency Distributions: Some Tips
• Different class boundaries may provide different pictures for the same data (especially for smaller data sets)
• Shifts in data concentration may show up when different class boundaries are chosen
• As the size of the data set increases, the impact of alterations in the selection of class boundaries is greatly reduced
• When comparing two or more groups with different sample sizes, you must use either a relative frequency or a percentage distribution
Visualizing Categorical Data Through Graphical Displays
Categorical Data → Visualizing Data
• Summary Table For One Variable: Bar Chart, Pareto Chart, Pie Chart
• Contingency Table For Two Variables: Side By Side Bar Chart
In a bar chart, a bar shows each category, the length
of which represents the amount, frequency or
percentage of values falling into a category.
[Bar chart: How Do You Spend the Holidays? Horizontal axis from 0% to 50%]
At home with family       45%
Travel to visit family    38%
Vacation                  5%
Catching up on work       5%
Other                     7%

Pie Chart
The pie chart is a circle broken up into slices that
represent categories. The size of each slice of the pie
varies according to the percentage in each category.
[Pie chart: How Do You Spend the Holidays? Slices: At home with family 45%, Travel to visit family 38%, Vacation 5%, Catching up on work 5%, Other 7%]
Pareto Diagram
• Used to portray categorical data
• A bar chart, where categories are shown in descending order of frequency
• A cumulative polygon is shown in the same graph
• Used to separate the “vital few” from the “trivial many”
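The data behind a Pareto diagram are just the categories sorted in descending order plus a running cumulative percentage. A minimal sketch using the holiday survey percentages from the summary table follows; it sorts strictly by percentage, which is an assumption of this sketch (some presentations place an "Other" category last regardless of size).

```python
from itertools import accumulate

# Percentages from the "How do you spend the holidays?" summary table.
holiday_pct = {
    "At home with family": 45,
    "Travel to visit family": 38,
    "Vacation": 5,
    "Catching up on work": 5,
    "Other": 7,
}

# Bars: categories in descending order of percentage.
ordered = sorted(holiday_pct.items(), key=lambda item: item[1], reverse=True)

# Line: cumulative percentage across the ordered categories.
cumulative = list(accumulate(pct for _, pct in ordered))

for (category, pct), cum in zip(ordered, cumulative):
    print(f"{category:<24} {pct:>3}%   cumulative {cum:>3}%")
```

In this data the first two categories already account for 83% of responses, which is the "vital few" reading a Pareto diagram is meant to expose.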
Pareto Diagram
[Figure: Current Investment Portfolio. Bars show % invested in each category (left axis, 0% to 45%); a line shows cumulative % invested (right axis, 0% to 100%). Categories: Stocks, Bonds, Savings, CD]
Plot the Ogive
Interval    Frequency    Rel. frequency    % frequency    Cum. frequency    Cum. rel. frequency    Cum. %
10-19       5            0.125             12.5           5                 0.125                  12.5
20-29       7            0.175             17.5           12                0.300                  30
30-39       12           0.300             30             24                0.600                  60
40-49       10           0.250             25             34                0.850                  85
50-59       6            0.150             15             40                1.000                  100
Total       40
Visualizing Categorical Data:
Side By Side Bar Charts
The side by side bar chart represents the data from a contingency table.
                 No Errors    Errors    Total
Small Amount     50.75%       30.77%    47.50%
Medium Amount    29.85%       61.54%    35.00%
Large Amount     19.40%       7.69%     17.50%
Total            100.0%       100.0%    100.0%
[Figure: Invoice Size Split Out By Errors & No Errors. Side by side horizontal bars for Large, Medium, and Small invoices; series: Errors, No Errors; axis from 0.0% to 70.0%]
Invoices with errors are much more likely to be of medium size (61.54% vs 30.77% and 7.69%).
Visualizing Numerical Data By Using Graphical Displays
Numerical Data
• Ordered Array: Stem-and-Leaf Display
• Frequency Distributions and Cumulative Distributions: Histogram, Polygon, Ogive
Stem and Leaf Display
• A stem-and-leaf display organizes data into groups (called stems) so that the values within each group (the leaves) branch out to the right on each row.
Age of Surveyed College Students
Day Students:   16 17 17 18 18 18 19 19 20 20 21 22 22 25 27 32 38 42
Night Students: 18 18 19 19 20 21 23 28 32 33 41 45
Age of College Students
Day Students          Night Students
Stem  Leaf            Stem  Leaf
1     67788899        1     8899
2     0012257         2     0138
3     28              3     23
4     2               4     15
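A stem-and-leaf display for two-digit values can be built by splitting each value into its tens digit (stem) and units digit (leaf). The function below is a minimal sketch under that assumption; `stem_and_leaf` is an illustrative name.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens digit) to a string of its sorted leaves (units digits)."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in sorted(groups.items())}


day = [16, 17, 17, 18, 18, 18, 19, 19, 20, 20, 21, 22, 22, 25, 27, 32, 38, 42]
night = [18, 18, 19, 19, 20, 21, 23, 28, 32, 33, 41, 45]

for name, ages in (("Day Students", day), ("Night Students", night)):
    print(name)
    for stem, leaves in stem_and_leaf(ages).items():
        print(f"  {stem} | {leaves}")
```

Run on the two age lists, this reproduces the rows of the display above.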
Visualizing Numerical Data: The Histogram
• A graph of the data in a frequency distribution is called a histogram.
• In a histogram there are no gaps between adjacent bars.
• The class boundaries (or class midpoints) are shown on the horizontal axis.
• The vertical axis is either frequency, relative frequency, or percentage.
• Bars of the appropriate heights are used to represent the number of observations within each class.
The Histogram
Class                  Frequency    Relative Frequency    Percentage
10 but less than 20    3            .15                   15
20 but less than 30    6            .30                   30
30 but less than 40    5            .25                   25
40 but less than 50    4            .20                   20
50 but less than 60    2            .10                   10
Total                  20           1.00                  100
[Figure: Histogram: Daily High Temperature. Vertical axis: Frequency (0 to 10); horizontal axis: class midpoints 5, 15, 25, 35, 45, 55, More]
The Polygon
• A percentage polygon is formed by having the midpoint of each class represent the data in that class and then connecting the sequence of midpoints at their respective class percentages.
• The cumulative percentage polygon, or ogive, displays the variable of interest along the X axis, and the cumulative percentages along the Y axis.
• Useful when there are two or more groups to compare
The Frequency Polygon
Class                  Frequency    Relative Frequency    Percentage
10 but less than 20    3            .15                   15
20 but less than 30    6            .30                   30
30 but less than 40    5            .25                   25
40 but less than 50    4            .20                   20
50 but less than 60    2            .10                   10
Total                  20           1.00                  100
[Figure: Frequency Polygon: Daily High Temperature. Vertical axis: Frequency (0 to 10); horizontal axis: class midpoints 5, 15, 25, 35, 45, 55, More]
(In a percentage polygon the vertical axis would be defined to show the percentage of observations per class.)
The Cumulative Percentage Polygon
Class    Lower Boundary    % Less Than Lower Boundary
10<20    10                0
20<30    20                15
30<40    30                45
40<50    40                70
50<60    50                90
         60                100
[Figure: Ogive: Daily High Temperature. Vertical axis: Cumulative Percentage (0 to 100); horizontal axis: 10 to 60]
Scatter Plots
• Scatter plots are used for numerical data consisting of paired observations taken from two numerical variables
• One variable is measured on the vertical axis and the other variable is measured on the horizontal axis
• Scatter plots are used to examine possible relationships between two numerical variables
Scatter Plot Example
Volume per day    Cost per day
23                125
26                140
29                146
33                160
38                167
42                170
50                188
55                195
60                200
[Figure: Cost per Day vs. Production Volume. Vertical axis: Cost per Day (0 to 250); horizontal axis: Volume per Day (20 to 70)]
Time Series
• A Time Series Plot is used to study patterns in the values of a numeric variable over time
• In a Time Series Plot, the numeric variable is measured on the vertical axis and the time period is measured on the horizontal axis
Attendance (in millions) at USA amusement/theme parks from 2000-2005
Year Year Number Attendance
2000 0 317
2001 1 319
2002 2 324
2003 3 322
2004 4 328
2005 5 335
Time Series Example
[Figure: Attendance (in millions) at US Theme Parks. Vertical axis: Attendance (316 to 336); horizontal axis: Year (Since 2000), 0 to 6]

Principles of Excellent Graphs
• The graph should not distort the data.
• The graph should not contain unnecessary adornments (sometimes referred to as chart junk).
• The scale on the vertical axis should begin at zero.
• All axes should be properly labeled.
• The graph should contain a title.
• The simplest possible graph should be used for a given set of data.
Graphical Errors: Chart Junk
Bad Presentation: Minimum Wage shown with decorative graphics labeled 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80
Good Presentation: Minimum Wage plotted in dollars ($0 to $4) against year (1960, 1970, 1980, 1990)
Graphical Errors: No Relative Basis
[Figure: two charts of A’s received by students, by class. One shows frequency (0 to 300); the other shows percentage (0% to 30%). Horizontal axis on both: FR, SO, JR, SR]
FR = Freshmen, SO = Sophomore, JR = Junior, SR = Senior

Graphical Errors: Compressing the Vertical Axis
[Figure: two charts of Quarterly Sales ($) for Q1 to Q4, one with a vertical axis from 0 to 200 and one with a vertical axis from 0 to 50]
Class Exercise 1
The owner of a restaurant wanted to study the demand for dessert. He decided that, in addition to studying whether dessert was ordered, he would also study the gender of the individual. Data were collected from 600 customers and organized in the following contingency table.
                   Gender
Dessert Ordered    Male    Female    Total
Yes                40      96        136
No                 240     224       464
Total              280     320       600
a. Construct contingency tables based on row, column, and total percentages.
b. Which type of percentage (row, column, or total) do you think is most informative for each gender?
c. What conclusions concerning the pattern of dessert ordering can the restaurant owner reach?
Class Exercise 2
The following table represents estimated green power sales by renewable energy source in 2008.
Source                       Percentage
Geothermal                   2.8
Hydro                        11.3
Landfill mass and biomass    28.1
Solar                        0.2
Unreported                   2.5
Wind                         55.1
a. Construct a bar chart, a pie chart, and a Pareto chart.
b. What conclusions can you reach about the sources of green power?
Source: National Renewable Energy Laboratory, 2008
Class Exercise 3
Calculate the following:
a. Divide the data into classes
b. Absolute frequency
c. Relative frequency
d. Percentages
e. Cumulative frequency
f. Cumulative percentage
g. Midpoints
h. Draw the histogram and the relative frequency polygon
THANKS
