Data Analysis: Describing Data and Datasets

Introduction to Data Analysis and Decision Making
Data Analysis
Describing data and datasets
Making inferences from data and datasets
Searching for relationships in data and datasets
Decision Making
Optimization
Decision analysis with uncertainty
Sensitivity Analysis
Uncertainty
Measuring uncertainty
Modeling and simulation
What is Management Science?

Logical, systematic approach to decision
making using quantitative methods.
Science: Scientific methods used to
solve business-related problems.
Goal for this class: logically approach and
solve many different problems.
Observation
Identify the problem
Problem does not imply that there is
something wrong with the process
Problem could imply need for
improvement
Management Science Approach to Problem Solving
Observation
Definition of the Problem
Constructing the Model
Solving the Model/problem
Implementation of Solution
(process is never really complete)
Definition of the Problem

Clearly define problem
Prevents incorrect/inappropriate solution
Listing goals could be helpful
Constructing the Model

Represents the problem in abstract form
Schematic, scale, mathematical
relationship between variables (equation)
Ex: Income = Hours Worked * Pay
Model Solution
Same as solving the problem:
Ex:
Z = $20X - 5X
subject to
4X = 100
Solution: X = 25, Z = $375
Components of the Model

Variable/Decision Variables
Independent
Dependent
Objective Function
Parameter
Constraints
Implementation of Solution
Solution aids us in making a decision but
does not constitute the actual decision
making.
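The small model-solution example (Z = $20X - 5X subject to 4X = 100) can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and it assumes the single equality constraint fixes X completely.

```python
def solve_example_model():
    """Solve Z = 20X - 5X subject to 4X = 100."""
    # The equality constraint determines the decision variable directly.
    x = 100 / 4
    # Evaluate the objective function at that value of X.
    z = 20 * x - 5 * x
    return x, z


x, z = solve_example_model()
print(f"X = {x:g}, Z = ${z:g}")
```

Because the constraint is an equality in one variable, there is nothing to optimize here; the code only evaluates the objective at the one feasible value.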
Example
Msci Approach to Problem Solving
Blue Ridge Hot Tubs manufactures and sells hot tubs.
The company needs to decide how many hot tubs to
produce during the next production cycle. The company
buys prefabricated fiberglass hot tub shells from a local
supplier and adds the pump and tubing to the shells to
create its hot tubs. The company has 200 pumps
available. Each hot tub requires 9 hours of labor. The
company expects to have 1,566 production labor hours
during the next production cycle. A profit of $350 will be
earned on each hot tub sold. The company is confident
that all of the hot tubs will sell. The question is, how
many should be produced if the company wants to
maximize profits during the next production cycle?
Problem: Determine # of hot tubs to produce
Definition: Maximize profit within the constraints
of the labor hours and materials available
Model: Max Z = $350X
subject to
9X ≤ 1,566 labor hours
Solution: X = 174; Z = 350(174) = $60,900
Implementation: Recommend making 174 hot
tubs
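The hot tub model above can be reproduced as a short calculation. The sketch below is illustrative rather than the course's own method: the constant and function names are invented, and it assumes one pump per hot tub, which the problem statement suggests but does not state outright.

```python
PROFIT_PER_TUB = 350          # dollars of profit per hot tub sold
LABOR_HOURS_PER_TUB = 9       # labor hours needed per hot tub
LABOR_HOURS_AVAILABLE = 1566  # labor hours in the next production cycle
PUMPS_AVAILABLE = 200         # pumps on hand
PUMPS_PER_TUB = 1             # ASSUMPTION: one pump per tub (not stated in the slides)


def best_production_quantity():
    """Return the largest whole number of tubs every resource allows."""
    max_by_labor = LABOR_HOURS_AVAILABLE // LABOR_HOURS_PER_TUB
    max_by_pumps = PUMPS_AVAILABLE // PUMPS_PER_TUB
    # Profit rises with every tub made, so produce up to the tightest limit.
    return min(max_by_labor, max_by_pumps)


tubs = best_production_quantity()
profit = PROFIT_PER_TUB * tubs
print(f"Make {tubs} hot tubs for a profit of ${profit:,}")
```

Labor is the binding resource here (1,566 / 9 = 174), so the pump limit of 200 never comes into play.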
A Generic Mathematical Model
Y = f(X1, X2, ..., Xk)
Where:
Y = dependent variable (a bottom line performance measure)
Xi = independent variables (inputs having an impact on Y)
f(.) = function defining the relationship between the Xi and Y
Categories of Mathematical Models
Prescriptive
Form of f(.): known, well-defined
Independent Variables: known or under decision maker's control
OR/MS Techniques: LP, Networks, IP, CPM, EOQ, NLP, GP, MOLP
Predictive
Form of f(.): unknown, ill-defined
Independent Variables: known or under decision maker's control
OR/MS Techniques: Regression Analysis, Time Series Analysis, Discriminant Analysis
Descriptive
Form of f(.): known, well-defined
Independent Variables: unknown or uncertain
OR/MS Techniques: Simulation, PERT, Queueing, Inventory Models
Example Spring Mills

280 observations
Three variables per observation
Relatively large dataset
RECEIVE.XLS
Background Information
Spring Mills produces and distributes a wide
variety of manufactured goods. It has a large
number of customers.
Spring Mills classifies these customers as
small, medium, or large, depending on the
volume of business each does with them.
Recently they have noticed a problem with
accounts receivable. They are not getting
paid by their customers in as timely a manner
as they would like. This obviously costs them
money.
Summary Measures for Combined Data
Spring Mills has gathered data on 280
customer accounts.
For each of these accounts the data set lists
three variables:
Size - The size of the customer (coded 1 for
small, 2 for medium, 3 for large).
Days - The number of days since the customer
was billed.
Amount - The amount the customer owes.
What information can we obtain from this
data?
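One way to start answering that question is to summarize Days and Amount separately for each customer size. The sketch below uses a handful of made-up rows, not the actual RECEIVE.XLS data, and the function and variable names are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

SIZE_LABELS = {1: "small", 2: "medium", 3: "large"}  # coding described in the slides

# HYPOTHETICAL rows of (Size, Days, Amount); not taken from RECEIVE.XLS.
accounts = [
    (1, 5, 200.0),
    (1, 9, 260.0),
    (2, 12, 450.0),
    (2, 20, 610.0),
    (3, 15, 900.0),
    (3, 25, 1300.0),
]


def summarize_by_size(rows):
    """Return the count, mean Days, and mean Amount for each customer size."""
    groups = defaultdict(list)
    for size, days, amount in rows:
        groups[size].append((days, amount))
    summary = {}
    for size, members in sorted(groups.items()):
        summary[SIZE_LABELS[size]] = {
            "count": len(members),
            "mean_days": mean(d for d, _ in members),
            "mean_amount": mean(a for _, a in members),
        }
    return summary


summary = summarize_by_size(accounts)
for label, stats in summary.items():
    print(label, stats)
```

With the real data set, the same grouping would show how many accounts fall in each size class and how their typical Days and Amount compare.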
Scatterplot: Amount vs Days
[Figures: four scatterplots of Amount vs Days, one each for All Customers, Small Customers, Medium Customers, and Large Customers]
Analysis -- continued
There is a lot going on here, and it is evident from
the charts. We point out the following:
There are considerably fewer large customers than
small or medium customers.
The large customers tend to owe considerably
more than small or medium customers.
The small customers do not tend to be as long
overdue as the large and medium customers.
There is no apparent relationship between Days and
Amount for the small customers, but there is a definite
positive relationship between these variables for
the medium and large customers.
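The strength of a relationship such as the one between Days and Amount is commonly measured with the Pearson correlation coefficient. The sketch below computes it by hand on two made-up groups, one with no trend and one with a clear upward trend. The numbers are hypothetical and do not come from the Spring Mills data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL values chosen only to illustrate the two patterns.
days = [10, 20, 30, 40]
amount_no_trend = [300, 280, 310, 290]  # Amount does not move with Days
amount_rising = [400, 650, 800, 1100]   # Amount grows as Days grows

r_no_trend = pearson_r(days, amount_no_trend)
r_rising = pearson_r(days, amount_rising)
print(f"no trend: r = {r_no_trend:.2f}")
print(f"rising:   r = {r_rising:.2f}")
```

A value near 0 corresponds to the pattern described for small customers, and a value near +1 corresponds to the positive relationship described for medium and large customers.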
Findings
If Spring Mills really wants to decrease receivables, it
might want to target the medium-sized customer
group, from which it is losing the most interest.
Or it could target the large customers because they
owe the most on average.
The most appropriate action depends on the cost
and effectiveness of targeting any particular
customer group. However, the analysis presented
here gives the company a much better picture of
what's currently going on.
Modeling and Models

Graphical models
Algebraic models
Spreadsheet models
The Modeling Process
Define the problem

Collect and summarize data
Formulate a model
Verify the model
Select one or more suitable decisions
Present the results to the organization
Implement the model and update through time
Describing Data:
The Basics
Descriptive vs Inferential Statistics
Descriptive statistics:
The process of applying a method of analysis
to a set of data in order to better understand
the information contained within.
Inferential statistics:
Using a (sub)set of data (a sample) to predict
behavior of a larger set of data (the
population).
Population
Definition:
Set of existing units (usually people, objects,
transactions, or events); or
Every element in a group that is the subject of
interest
Depends upon the problem or situation
Examples:
College students, Honda Accords, cash sales
Population Parameters and Sample Statistics

A population parameter is a number calculated from all
the population measurements that describes some
aspect of the population.
The population mean, denoted μ, is a population
parameter and is the average of the population
measurements.
A point estimate is a one-number estimate of the value
of a population parameter.
A sample statistic is a number calculated using sample
measurements that describes some aspect of the
sample.
Measures of Central Tendency
Mean, μ: The average or expected value
Median, Md: The middle point of the ordered measurements
Mode, Mo: The most frequent value
The Mean
Population: X1, X2, ..., XN
Sample: x1, x2, ..., xn
Population Mean: μ = (X1 + X2 + ... + XN) / N
Sample Mean: x̄ = (x1 + x2 + ... + xn) / n
Relationships Among Mean, Median and Mode
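The three measures of central tendency can be computed with Python's standard `statistics` module. The sample values below are made up purely for illustration, and the sample mean is also computed directly from its formula for comparison.

```python
from statistics import mean, median, mode

# HYPOTHETICAL sample of n = 5 measurements.
sample = [3, 5, 5, 6, 9]

sample_mean = mean(sample)      # (x1 + x2 + ... + xn) / n
sample_median = median(sample)  # middle point of the ordered measurements
sample_mode = mode(sample)      # most frequent value

# The same mean, written out from the formula.
mean_by_formula = sum(sample) / len(sample)

print(f"mean = {sample_mean}, median = {sample_median}, mode = {sample_mode}")
```

In this sample the median and mode coincide at 5, while the single large value 9 pulls the mean above them.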
Variables
Definition:
Characteristic or property of an individual
population unit
Particular characteristics or properties may
vary among units in a population
Examples:
Starting salary of MBA college graduates
Price of peanut butter at grocery stores
Measurement
Definition:
The process of quantifying information
Quantitative variables:
Test scores, product and process
measurements, survey results, etc.
Qualitative variables:
Product rating, arbitrary scales, etc.
Statistical Inference
Definition:
Estimation, prediction, or other generalizations
about a population based on information
contained in a sample.
Example:
Based on a 5-year sample of similar weather
patterns, predicting the chance of rain today.
Sample
Definition:
Subset of the units of the population
Examples:
100 GPAs from all finance majors
Tool wear on 3 machines out of 45 machines
Notes:
A random sample implies no statistical bias
A census includes all population members
Reliability of the Inference

Four items discussed thus far allow for
statistical inference:
A population, variable(s) of interest, a sample,
and an inference.
Fifth Item: A measure of the reliability of
the inference.
How good the inference is, i.e. how much
confidence can we place in the inference?
Example
The approval rating of the President; what does
it really mean?
Uses a sample from the population to infer the
percentage of the population that approves of
his overall performance.
A reported rating of 55% plus or minus 5% implies
that between 50% and 60% of the population
approves of the president's performance.
Process Statistics
A process transforms inputs into outputs:
A manufacturing process which transforms aluminum
sheet into aluminum cans.
A service process which offers financial advice based
on a customer's input.
Samples are obtained from a process and
statistical procedures can then be applied to
make inferences about the process itself.
Sampling a Process
Process
A sequence of operations that takes inputs (labor, raw
materials, methods, machines, and so on) and turns them
into outputs (products, services, and the like).
Inputs → Process → Outputs
A process is in statistical control if it displays constant
level and constant variation.
Types of Data
Data can be classified into four types:
Nominal
Ordinal
Interval
Ratio
Nominal Data
Classify the members of the sample into
categories (Categorical Data).
Examples:
An individual's religious affiliation
Gender of applicants
An individual's political party affiliation
No mathematical properties, i.e. numerical
values are only codes.
Ordinal Data
Units of the sample can be ordered with respect
to the variable of interest.
Examples:
Size of rental cars.
Ranking of microbrews with respect to taste.
Ranking of consumer preferences for a product.
No mathematical properties in that the difference
between ranking values is meaningless.
Interval Data
Sample measurements enable comparisons
between members of the sample, i.e. the
differences between measurements have meaning.
Examples:
Temperature or pressure readings.
Machine speeds
Can add and subtract but cannot multiply or
divide; origin has no meaning.
Ratio Data
Equal distances between numbers imply
equal distances between the values of the
characteristic being measured, and zero
represents the absence of the characteristic
being measured.
Examples:
Sales revenue for a product or service.
Unemployment rate.
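The four data types form a ladder in which each level permits everything the previous level does plus something more. The sketch below encodes that ladder; the enum and function names are invented for illustration, and the rules simply restate the properties listed above.

```python
from enum import Enum


class Level(Enum):
    """Measurement levels, ordered from least to most mathematical structure."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def can_order(level):
    """Ordering is meaningful from the ordinal level upward."""
    return level.value >= Level.ORDINAL.value


def can_add_subtract(level):
    """Differences are meaningful from the interval level upward."""
    return level.value >= Level.INTERVAL.value


def can_multiply_divide(level):
    """Ratios need a meaningful zero, which only ratio data has."""
    return level is Level.RATIO


for level in Level:
    print(level.name, can_order(level), can_add_subtract(level), can_multiply_divide(level))
```

For example, rankings (ordinal) can be sorted but not subtracted, while sales revenue (ratio) supports all three kinds of operation.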
Classes of Data
Data can be classified as either being:
Qualitative data - nominal, ordinal, or
Quantitative data - interval, ratio.
Numerical data can also be discrete (countable)
or continuous.
Describing Data:
Graphs and Tables
Spreadsheet (or Database)
Variable (or Field)
Observation (or Record)
Displaying Data
For both Qualitative and Quantitative Data:
Pie Charts
Bar Graphs (Bar Charts)
Histograms
Frequency Tables
Stem and Leaf Diagrams
Pie Chart Example
1999 Cigarette Sales (in billions) by company
Philip Morris, 211.8
Reynolds, 189.7
Brown and Williamson, 69.1
Lorillard, 48.6
American, 43.9
Liggett, 29.8
[Figure: pie chart of 1999 cigarette sales by company (Billions of Cigarettes)]
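A pie chart shows each category as a share of the total, so the slice sizes for the cigarette sales example can be computed directly from the figures above. This is a minimal sketch with illustrative variable names. It prints the shares rather than drawing a chart.

```python
# 1999 cigarette sales in billions of cigarettes, as listed in the slides.
sales = {
    "Philip Morris": 211.8,
    "Reynolds": 189.7,
    "Brown and Williamson": 69.1,
    "Lorillard": 48.6,
    "American": 43.9,
    "Liggett": 29.8,
}

total = sum(sales.values())
# Each slice of the pie is the company's sales divided by total sales.
shares = {company: amount / total for company, amount in sales.items()}

for company, share in shares.items():
    print(f"{company:<22}{share:6.1%}")
```

The same `sales` values, plotted as bar lengths instead of shares, give the bar graph version of this example.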
Bar Graph Example
1999 Cigarette Sales (in billions) by company
Philip Morris, 211.8
Reynolds, 189.7
Brown and Williamson, 69.1
Lorillard, 48.6
American, 43.9
Liggett, 29.8
[Figure: horizontal bar graph with one bar per company; axis: Billions of Cigarettes]
Histogram Example
Percentage of Sales
Revenue spent on
Advertising for a sample
of 35 Fortune 500
companies:
1% to 3% (4)
3% to 5% (9)
5% to 7% (11)
7% to 9% (8)
9% to 11% (3)
[Figure: histogram of the five class counts; vertical axis: frequency]
Measurement Classes
Intervals are called measurement classes:
A count of the members of a measurement class is
the frequency.
The proportion of members in a measurement class
is the relative frequency. For a given interval, this
proportion is calculated by dividing the frequency of
the measurement class by the sample size.
Relative Frequency
Divide range into intervals
of equal size.
Count the number of
sample members that fall
within the ranges.
Sample:
Company   % of Sales Revenue     Company   % of Sales Revenue
1         3.1                    19        6.2
2         7.4                    20        8.4
3         2.2                    21        1.9
4         10.9                   22        5.8
5         4.5                    23        4.9
6         8.6                    24        6.4
7         3.7                    25        3.6
8         6.3                    26        7.9
9         7.6                    27        3.2
10        5.4                    28        8.5
11        2.3                    29        6.2
12        5.8                    30        9.7
13        4.2                    31        7.1
14        6.1                    32        5.9
15        9.1                    33        5.7
16        5.5                    34        4.4
17        4.8                    35        2.9
18        8.9
Frequency Table:
Range       Count   Proportion
1% to 3%    4       0.114
3% to 5%    9       0.257
5% to 7%    11      0.314
7% to 9%    8       0.229
9% to 11%   3       0.086
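The frequency table above can be rebuilt from the 35 sample values. The sketch below assumes each class includes its lower bound and excludes its upper bound; the slides do not state a convention, and none of the sample values falls exactly on a boundary. The function name is made up for illustration.

```python
# Percentage of sales revenue spent on advertising, companies 1 to 35.
percentages = [
    3.1, 7.4, 2.2, 10.9, 4.5, 8.6, 3.7, 6.3, 7.6, 5.4, 2.3, 5.8,
    4.2, 6.1, 9.1, 5.5, 4.8, 8.9, 6.2, 8.4, 1.9, 5.8, 4.9, 6.4,
    3.6, 7.9, 3.2, 8.5, 6.2, 9.7, 7.1, 5.9, 5.7, 4.4, 2.9,
]


def frequency_table(values, start, width, n_classes):
    """Count how many values fall in each equal-width measurement class."""
    counts = [0] * n_classes
    for value in values:
        # ASSUMPTION: classes are half-open, [lower, upper).
        index = int((value - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts


counts = frequency_table(percentages, start=1.0, width=2.0, n_classes=5)
# Relative frequency = class frequency divided by the sample size.
relative = [round(count / len(percentages), 3) for count in counts]

for i, (count, proportion) in enumerate(zip(counts, relative)):
    lower = 1 + 2 * i
    print(f"{lower}% to {lower + 2}%: count = {count}, proportion = {proportion}")
```

Changing `width` and `n_classes` regroups the same data into narrower or wider measurement classes.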
Relative Frequency Histogram Example
Percentage of Sales
Revenue spent on
Advertising for a sample
of 35 Fortune 500
companies:
1% to 3% (4/35=0.114)
3% to 5% (9/35=0.257)
5% to 7% (11/35=0.314)
7% to 9% (8/35=0.229)
9% to 11% (3/35=0.086)
[Figure: relative frequency histogram; vertical axis from 0.00 to 0.35]
The Effect of Measurement Class Size on a Histogram
A Histogram showing
greater detail can be
obtained by:
Decreasing class size
(which increases the
number of classes), or
Increasing sample size
(which increases the
number of members in
each class).
Excel and StatPro Add-in Demonstration
Histograms
Scatterplots
Frequency tables
Time series plots
Stem and Leaf Diagrams
Data is displayed
graphically:
The stem is the portion of
the data to the left of the
decimal point.
The leaf is the portion of
data to the right of the
decimal point.
Graphical representation
much like Histogram.
From our previous
data:
Stem   Leaf
1      9
2      239
3      1267
4      24589
5      457889
6      12234
7      1469
8      4569
9      17
10     9
Key: Leaf units are tenths.
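A stem and leaf diagram for the advertising data can be generated by splitting each value at the decimal point. The sketch below is illustrative: the function name is invented, and it assumes every value has exactly one decimal place, as in the sample.

```python
from collections import defaultdict

# Percentage of sales revenue spent on advertising, companies 1 to 35.
percentages = [
    3.1, 7.4, 2.2, 10.9, 4.5, 8.6, 3.7, 6.3, 7.6, 5.4, 2.3, 5.8,
    4.2, 6.1, 9.1, 5.5, 4.8, 8.9, 6.2, 8.4, 1.9, 5.8, 4.9, 6.4,
    3.6, 7.9, 3.2, 8.5, 6.2, 9.7, 7.1, 5.9, 5.7, 4.4, 2.9,
]


def stem_and_leaf(values):
    """Map each stem (whole part) to a string of its sorted leaves (tenths)."""
    stems = defaultdict(list)
    for value in sorted(values):
        # Work in whole tenths to avoid floating-point surprises.
        tenths = round(value * 10)
        stem, leaf = divmod(tenths, 10)
        stems[stem].append(leaf)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(stems.items())}


diagram = stem_and_leaf(percentages)
for stem, leaves in diagram.items():
    print(f"{stem:>2} | {leaves}")
```

Turned on its side, the row lengths trace the same shape as a histogram with a class width of one percentage point.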
