Statistics Introduction Arranging Data

STATISTICS-INTRODUCTION
Role of Statistics in Managerial Decisions

Nature of Data: Population Data, Sample Data
Frequency Distribution

You use statistics daily without even realizing it!
Examples?

Statistics is used to help determine:
Which products to sell (demand statistics)
How much you pay for insurance (mortality statistics)
Whether drugs are approved for use (drug trials)
Which cars you buy (reliability ratings, crash tests)
Which products are on your grocery shelf (focus groups), and where they are located (Big Bazaar and the snack shop are right next to each other: what a concept!)
What politicians claim as their firm beliefs (opinion polls)
Favorites to win in sports
Whether it will rain
And on and on.

Statistics: Definition
Many people think of statistics as large amounts of numerical data, e.g. share prices, GDP statistics, runs scored by Sachin, etc.
Definition: Statistics refers to the range of techniques and procedures for collecting, summarizing, classifying, analyzing, interpreting, and displaying data, and for making decisions based on data.
Definition: By statistics, we mean aggregates of facts, affected to a marked extent by a multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner for a predetermined purpose, and placed in relation to each other.

Characteristics of Statistics
Statistics are aggregates of facts
Statistics are affected to a marked extent by a multiplicity of causes
Statistics are numerically expressed
Statistics are enumerated or estimated according to reasonable standards of accuracy
Statistics should be collected in a systematic manner for a predetermined purpose
Statistics should be placed in relation to each other

Why Study Statistics
It presents facts in definite and clear terms.
It gives a concise shape to a mass of figures and draws meaning from the data.
It helps to compare two sets of figures.
It helps in formulating and testing hypotheses.
It helps in understanding and predicting future events from past and current data.
It helps in formulating suitable policies.
It helps in understanding complex events.
Statistics are widely used in business. Usage continues to increase as the business world becomes larger, more complex, and more quantitative.

Limitations of Statistics
Statistics does not study individual observations. It is concerned only with groups of observations.
Statistics deals with quantitative characteristics. It does not deal with qualitative characteristics such as beauty, honesty, sharpness, brightness, poverty, intelligence, etc.
Statistical laws are true only on average.
Statistics does not reveal the entire story.
Statistics is only one of the methods of studying a problem.
Statistics can be misused.
Statistical data should be uniform and homogeneous.

Decision Making - Businesses
Accounting
Public accounting firms use statistical sampling procedures when conducting audits for their clients.
Economics
Economists use statistical information in making forecasts about the future of the economy or some aspect of it.
Marketing
Electronic point-of-sale scanners at retail checkout counters are used to collect data for a variety of marketing research applications.
Production
A variety of statistical quality control charts are used to monitor the output of a production process.
Finance
Financial advisors use price-earnings ratios and dividend yields to guide their investment recommendations.

Uses & Abuses of Statistics
Most of the time, samples are used to infer something (draw conclusions) about the population. However, the conclusions are occasionally inaccurate or inaccurately portrayed, for the following reasons:
The sample is too small. Even a large sample may not represent the population.
Unauthorized personnel give wrong information that the public will take as truth. One possibility is a company sponsoring statistical research to prove that the company is better.
Visual aids may be correct but emphasize different aspects. Specific examples include graphs that don't start at zero, thus exaggerating small differences, and charts that misuse area to represent proportions.
Precise statistics or parameters may incorrectly convey a sense of high accuracy.
Misleading, unclear, or incomplete information may be shared.

Misleading Statistical Presentation
These two graphs represent sales. Who has seen faster sales growth?
[Figure: two line charts plotted over x-values 1 to 31. The left chart's vertical axis runs from 0 to 16000; the right chart's runs from 10000 to 14000.]
These are actually the same numbers with different scales along the side.
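The effect of the axis scale can be put into numbers. The sketch below is only an illustration: the two sales figures are hypothetical (the slide's underlying data is not given), and `apparent_ratio` is a helper name made up here. It compares how tall the last point looks relative to the first when the vertical axis starts at 0 versus at 10000.

```python
def apparent_ratio(first, second, axis_min):
    """Ratio of the drawn heights of two values when the y-axis starts at axis_min."""
    return (second - axis_min) / (first - axis_min)


# Hypothetical sales figures, chosen only for illustration.
start_sales, end_sales = 12000, 13000

full_axis = apparent_ratio(start_sales, end_sales, axis_min=0)
cut_axis = apparent_ratio(start_sales, end_sales, axis_min=10000)

print(f"Axis from 0:     end point drawn {full_axis:.2f}x as high as the start")
print(f"Axis from 10000: end point drawn {cut_axis:.2f}x as high as the start")
```

The data are identical in both cases; only the starting point of the axis changes how large the growth appears.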

Pictures can be misleading also
[Figure: two pictures, labeled 1 and 2, on a vertical scale from 0 to 1.]
How much more is the second?
It's twice as tall, but it's also twice as wide, which means 4 times the area.
It can be misleading.

Avoid Sensationalism!
e.g.
Violence statistic: "Yet another incident, doubling last year's incidents"
Accident statistic: "An accident on the first day of the year, making it 365 a year" (about 36 times last year's figure, when we had just 10 accidents in the year)

Branches of Statistics
The academic discipline of statistics can be divided into
two major branches:
Descriptive statistics
Inferential statistics.

Descriptive Statistics
Deals with summarizing and presenting data in a readable, easily understood form.
It comprises the tabular, graphical, and numerical methods used to summarize data.
Techniques:
Visualizing and Summarizing Data: Raw Data, Data Array, Distribution
Characterizing Distributions with Numerical and Graphical Tools: Histogram, Ogive; Measures of Central Tendency: mean, median, mode; Measures of Dispersion: range, standard deviation, variance, etc.
Exploring the Relationship between Two Variables: Scatter Diagrams, Correlation Coefficients, Frequency Tables
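As a rough illustration of the numerical measures listed above, the sketch below computes them with Python's standard `statistics` module. It borrows the 30 marks from the Numerical 1 example later in the deck; treating them as a complete population (hence `pvariance` and `pstdev`) is an assumption made here, not something the slides state.

```python
import statistics

# The 30 marks from the Numerical 1 example.
marks = [14, 26, 2, 34, 8, 13, 27, 37, 9, 12,
         39, 42, 45, 30, 32, 24, 24, 30, 20, 23,
         14, 18, 30, 33, 24, 34, 30, 10, 22, 14]

summary = {
    # Measures of central tendency
    "mean": statistics.mean(marks),
    "median": statistics.median(marks),
    "mode": statistics.mode(marks),
    # Measures of dispersion (population versions, by assumption)
    "range": max(marks) - min(marks),
    "variance": statistics.pvariance(marks),
    "std_dev": statistics.pstdev(marks),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

For this data set the mean and median are both 24, the mode is 30, and the range is 43.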

Inferential Statistics
Drawing conclusions about a population based on information from a sample.
Statistical inference is the process of using information obtained from analyzing a sample to make estimates about characteristics of the entire population. It is a discipline that allows us to estimate unknown quantities by making some elementary measurements.
Using these estimates, we can then make predictions and forecast the future.
Statistical Inference with Hypothesis Testing: null and alternative hypotheses, one-tailed vs. two-tailed tests, test statistics, p-value, statistical significance, decision rules
The Concept of Risk and Power: risks involved, type I and II errors, confidence level and power of test
Statistical Inference with Confidence Intervals: how it works, when to use it
Equivalence of the Hypothesis Testing and the Confidence Interval Approaches
Statistical Inference for a Single Sample or Group: Hypothesis Testing vs. Confidence Interval Approach

[Flowchart]
START
Gathering of data
Classification, summarization, and processing of data
Presentation and communication of summarized information
Is the information from a sample?
Yes: Use sample information to make inferences about the population (statistical inference), then draw conclusions about the population characteristic (parameter) under study.
No: Use census data to analyze the population characteristic under study (descriptive statistics).
STOP

Population & Sample
Population: The complete set of data elements is termed the population. It is the set of all items in a particular study.
Sample: A sample is a portion of a population selected for further analysis. It is a subset of the population.
Parameter: A parameter is a characteristic of the whole population.
Statistic: A statistic is a characteristic of the sample, presumably measurable.
Remember: Parameter is to Population as Statistic is to Sample
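The parameter/statistic distinction can be sketched in a few lines. This is an illustration under assumptions: the 30 marks from Numerical 1 are treated as the whole population, the sample size of 10 and the random seed are arbitrary choices, and the mean is used as the characteristic of interest.

```python
import random
import statistics

# Assumption: these 30 marks (from Numerical 1) make up the entire population.
population = [14, 26, 2, 34, 8, 13, 27, 37, 9, 12,
              39, 42, 45, 30, 32, 24, 24, 30, 20, 23,
              14, 18, 30, 33, 24, 34, 30, 10, 22, 14]

rng = random.Random(0)               # fixed seed so the run is repeatable
sample = rng.sample(population, 10)  # 10 items drawn without replacement

parameter = statistics.mean(population)  # characteristic of the population
statistic = statistics.mean(sample)      # characteristic of the sample

print(f"Population mean (parameter): {parameter}")
print(f"Sample mean (statistic):     {statistic}")
```

The sample mean usually differs somewhat from the population mean; inferential statistics is about how far such an estimate can be trusted.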

Why Sample?
Less time consuming than a census
Less costly to administer than a census
More practical to administer than a census of the targeted population
Example of a sampling survey: opinion polls

Data
Data are the facts and figures that are collected, summarized, analyzed, and interpreted. A collection of data is called a data set, and a single observation is called a data element.
Data can be further classified as qualitative (attribute) or quantitative (variable).
Variables: weight, height, etc. There are two types: continuous and discrete.
A continuous variable can take any value within a given interval, e.g. weight: 50.0, 50.2, 50.5, 51.0, etc.
A discrete variable can take only isolated values, e.g. the number of patients visiting a doctor: 50, 51, etc.
Attributes: honesty, integrity, etc.

Data Types
Data
  Numerical (Quantitative)
    Discrete
    Continuous
  Categorical (Qualitative)

Primary Data
Data can be classified as primary data or secondary data.
Primary data are collected for a specific purpose directly from the field and hence are original in nature. They are collected by, or on behalf of, the person or persons who are going to use the data. Once the data have been collected, processed, and published, they become secondary data for subsequent use by other people for other applications.
Methods for Primary Data Collection
Direct Personal Interview
Observations
Indirect Oral Interviews
Information from agents/correspondents
Mailed Questionnaire Method

Secondary Data
Secondary data are numerical information that has already been collected by some agency for a specific purpose and is subsequently compiled from that source for use in other applications.
There are many advantages of using secondary data:
It is inexpensive.
A large quantity of data is available from a wide range of sources.
The data may be available for many years, so we can understand trends and forecast future values.

Data Sources
Primary: Data Collection
  Observation
  Survey
  Experimentation
Secondary: Data Compilation
  Print or Electronic

Descriptive Statistics
Data Processing Techniques
Raw Data
Data Array
Discrete Frequency Distribution
Continuous Frequency Distribution

Raw Data & Data Array
Raw Data:
Information before it is arranged and analysed is raw data. It is called raw because it has not been processed by any statistical method.
Data Array:
A data array arranges the values in either ascending or descending order.

Numerical 1: Data Array
Raw Data
14 26 2 34 8 13 27 37 9 12
39 42 45 30 32 24 24 30 20 23
14 18 30 33 24 34 30 10 22 14
Prepare the data array.
Data Array (ascending order)
2 8 9 10 12 13 14 14 14 18
20 22 23 24 24 24 26 27 30 30
30 30 32 33 34 34 37 39 42 45
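Preparing a data array is just sorting. A minimal sketch of this numerical in Python, using the built-in `sorted` function (the variable names are chosen here for illustration):

```python
# Raw data exactly as given in Numerical 1.
raw_data = [14, 26, 2, 34, 8, 13, 27, 37, 9, 12,
            39, 42, 45, 30, 32, 24, 24, 30, 20, 23,
            14, 18, 30, 33, 24, 34, 30, 10, 22, 14]

data_array = sorted(raw_data)                # ascending order
descending = sorted(raw_data, reverse=True)  # descending order is equally valid

# Print the ascending array ten values per row, like the slide.
for start in range(0, len(data_array), 10):
    print(" ".join(str(v) for v in data_array[start:start + 10]))
```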

Discrete Distribution
In a discrete frequency distribution, after arranging the values in ascending order, we count the frequency, i.e. the number of times each value appears in the data set, using tally marks.
A discrete distribution is also known as an ungrouped frequency distribution (FD).

Numerical 2 - Discrete FD
Marks  Frequency    Marks  Frequency
2      1            24     3
8      1            26     1
9      1            27     1
10     1            30     4
12     1            32     1
13     1            33     1
14     3            34     2
18     1            37     1
20     1            39     1
22     1            42     1
23     1            45     1
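The tallying in this numerical can be reproduced with `collections.Counter` from the Python standard library. This is a sketch of one way to do it; the tally column is rendered with `|` characters as a stand-in for hand-drawn tally marks.

```python
from collections import Counter

# The 30 marks from Numerical 1.
marks = [14, 26, 2, 34, 8, 13, 27, 37, 9, 12,
         39, 42, 45, 30, 32, 24, 24, 30, 20, 23,
         14, 18, 30, 33, 24, 34, 30, 10, 22, 14]

frequency = Counter(marks)  # maps each distinct mark to its count

print("Marks  Tally  Frequency")
for value in sorted(frequency):
    tally = "|" * frequency[value]
    print(f"{value:<6} {tally:<6} {frequency[value]}")
```

The frequencies always add up to the number of observations, which is a quick check on the table.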

Continuous Frequency Distribution
In this distribution, all the values are classified into groups or classes; hence it is known as a grouped or continuous frequency distribution.
Class Limits
Class Interval
Class Frequency
Class Mid Point or Class Mark

Class Limits
The two boundaries of a class are known as the class limits. They are the lowest and the highest values that can be included in the class.
e.g. 10-20: in this class, 10 is the lower limit and 20 is the upper limit.
The lower limit of the class is the value below which no observation can be included in the class.
The upper limit of the class is the value above which no observation can be included in the class.

Class Interval
The difference between the upper limit and the lower limit of a class is known as the class interval (CI) or class width of that class.
e.g. Class 10-20 has a CI of 10.
If the number of classes is not given, it can be determined using Sturges' formula:
No of Classes (K) = 1 + 3.322 log N (log to base 10)
where N is the total number of observations.
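Sturges' formula is easy to evaluate directly. The sketch below assumes the result is rounded up to the next whole number of classes, which matches the worked example in this deck (5.9 becomes 6); other texts may round to the nearest integer instead. The function name `sturges_classes` is made up for this illustration.

```python
import math


def sturges_classes(n):
    """Number of classes K = 1 + 3.322 * log10(N), rounded up (an assumption)."""
    return math.ceil(1 + 3.322 * math.log10(n))


for n in (30, 100, 1000):
    exact = 1 + 3.322 * math.log10(n)
    print(f"N = {n:>4}: K = {exact:.2f}, use {sturges_classes(n)} classes")
```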

Class Interval
Formula for the class interval:
Class Interval (i) = (Next unit value after the largest value in the data - Smallest value in the data) / No of Classes
e.g. If the marks of 30 students range between 10 and 40 and we want to divide them into 3 classes, then
Class Interval (i) = (41 - 10)/3 = 10.33, rounded up to 11
The classes become 10-21, 21-32, 32-43.
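The worked example above can be checked with a short sketch. The helper names `class_width` and `exclusive_classes` are hypothetical, and the `unit` parameter (the step to the "next unit value", 1 for whole marks) is an assumption introduced here to make the formula general.

```python
import math


def class_width(smallest, largest, num_classes, unit=1):
    """(next unit value after largest - smallest) / number of classes, rounded up."""
    return math.ceil((largest + unit - smallest) / num_classes)


def exclusive_classes(smallest, width, num_classes):
    """Consecutive (lower, upper) limits; each upper limit is the next lower limit."""
    return [(smallest + k * width, smallest + (k + 1) * width)
            for k in range(num_classes)]


width = class_width(10, 40, 3)
classes = exclusive_classes(10, width, 3)

print("Class interval:", width)
print("Classes:", ", ".join(f"{lo}-{hi}" for lo, hi in classes))
```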

Cell Nomenclature
[Figure: a cell (class) labeled with its cell interval (i), midpoint, and upper boundary.]

Exclusive / Inclusive Method
There are 2 methods of classifying data according to class intervals.
Exclusive Method: The class intervals are fixed so that the upper limit of one class is the lower limit of the next class. In other words, in the exclusive method, an observation equal to the upper limit is excluded from that class and counted in the next one. E.g. 10-20, 20-30, 30-40, etc. This is more suitable for continuous variables.
Inclusive Method: The upper limits are included in the class. E.g. 10-19, 20-29, 30-39, etc. This is more suitable for discrete variables.
Correction Factor = (Lower Limit of 2nd Class - Upper Limit of 1st Class)/2

Correction Factor
For the inclusive type, to get the correct CI, we add the correction factor to the upper limit of each class and subtract it from the lower limit of each class.
Correction Factor = (Lower Limit of 2nd Class - Upper Limit of 1st Class)/2
e.g. for the classes 10-19, 20-29:
Correction factor = (20 - 19)/2 = 0.5; hence the class 10-19 becomes 9.5-19.5, and the CI becomes 10.

Inclusive to exclusive
Inclusive Type    Exclusive Type
10-14             9.5-14.5
15-19             14.5-19.5
20-24             19.5-24.5
25-29             24.5-29.5
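The conversion in this table follows mechanically from the correction factor. A minimal sketch, assuming the classes are given as (lower, upper) pairs in order and are evenly spaced; `to_exclusive` is a name chosen here for illustration:

```python
def to_exclusive(inclusive_classes):
    """Widen each inclusive class by the correction factor on both sides."""
    first_upper = inclusive_classes[0][1]
    second_lower = inclusive_classes[1][0]
    # Correction factor = (lower limit of 2nd class - upper limit of 1st class) / 2
    correction = (second_lower - first_upper) / 2
    return [(lower - correction, upper + correction)
            for lower, upper in inclusive_classes]


inclusive = [(10, 14), (15, 19), (20, 24), (25, 29)]
exclusive = to_exclusive(inclusive)

for (lo, hi), (new_lo, new_hi) in zip(inclusive, exclusive):
    print(f"{lo}-{hi}  ->  {new_lo}-{new_hi}")
```

After the conversion, the upper limit of each class equals the lower limit of the next, and every class has a width of 5.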

Constructing FD
Step 1: Decide on the type (inclusive / exclusive) and the number of classes for dividing the data, using Sturges' formula. (If these are given in the numerical, go directly to step 2.)
Step 2: Sort the data into the classes and count the frequency of each class.
Step 3: Illustrate the data in a chart.

Numerical 3: Continuous FD
Step 1: Calculate the number of classes (Sturges' formula)
No of Classes (K) = 1 + 3.322 log N
= 1 + 3.322 log 30
= 1 + 3.322 (1.477)
= 5.9
≈ 6
Step 2: Sort the data points into classes and count the number of points in each class.
We have K = 6.
Class interval width = (Next unit value after largest value - Smallest value)/K = (46 - 2)/6 = 44/6 = 7.33, i.e. approximately 8.
Hence the classes shall be 2-9, 10-17, 18-25, 26-33, 34-41, 42-49.

Numerical 3: Continuous FD
Class    Frequency
2-9      3
10-17    6
18-25    7
26-33    8
34-41    4
42-49    2
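Step 2 of this numerical, sorting the marks into the six inclusive classes and counting, can be sketched as follows. The data are the 30 marks from Numerical 1 and the class limits are those derived above; the variable names are illustrative only.

```python
# The 30 marks from Numerical 1.
marks = [14, 26, 2, 34, 8, 13, 27, 37, 9, 12,
         39, 42, 45, 30, 32, 24, 24, 30, 20, 23,
         14, 18, 30, 33, 24, 34, 30, 10, 22, 14]

# Inclusive classes: both limits belong to the class.
classes = [(2, 9), (10, 17), (18, 25), (26, 33), (34, 41), (42, 49)]

frequency = {limits: 0 for limits in classes}
for mark in marks:
    for lower, upper in classes:
        if lower <= mark <= upper:
            frequency[(lower, upper)] += 1
            break

print("Class    Frequency")
for (lower, upper), count in frequency.items():
    print(f"{lower}-{upper:<6} {count}")
```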

Numerical 4: Home Assignment
The following data set represents the km per litre of 40 similar motorcycles.
40.5, 39.7, 40.6, 39.9, 40.9, 38.9, 41.4, 40.5, 41.0, 38.8, 39.6,
40.4, 39.9, 40.2, 40.8, 40.7, 40.6, 41.7, 40.8, 39.1, 40.1, 40.7,
40.1, 40.7, 40.7, 39.8, 39.3, 39.6, 40.5, 41.3, 41.0, 39.9, 40.4,
40.9, 40.1, 41.2, 40.2, 40.0, 39.4, 40.6.
Construct the frequency distribution for this data, taking classes as 38.5-39.0, 39.0-39.5, etc.

Numerical 4: Home Assignment
Classes      Frequency
38.5-39.0    2
39.0-39.5    3
39.5-40.0    7
40.0-40.5    8
40.5-41.0    14
41.0-41.5    5
41.5-42.0    1
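One way to verify this table is sketched below. It uses the exclusive method (a value equal to an upper limit goes into the next class). Working in whole tenths of a km per litre is a design choice made here, not part of the assignment: it avoids floating-point surprises at class boundaries such as 40.5.

```python
mileage = [40.5, 39.7, 40.6, 39.9, 40.9, 38.9, 41.4, 40.5, 41.0, 38.8, 39.6,
           40.4, 39.9, 40.2, 40.8, 40.7, 40.6, 41.7, 40.8, 39.1, 40.1, 40.7,
           40.1, 40.7, 40.7, 39.8, 39.3, 39.6, 40.5, 41.3, 41.0, 39.9, 40.4,
           40.9, 40.1, 41.2, 40.2, 40.0, 39.4, 40.6]

# Class limits expressed in tenths: 385 means 38.5, and each class is 5 tenths wide.
first_lower, width, num_classes = 385, 5, 7

frequency = [0] * num_classes
for value in mileage:
    tenths = round(value * 10)
    # Integer division puts a boundary value (e.g. 40.5) into the higher class.
    frequency[(tenths - first_lower) // width] += 1

print("Classes      Frequency")
for k, count in enumerate(frequency):
    lower = (first_lower + k * width) / 10
    upper = (first_lower + (k + 1) * width) / 10
    print(f"{lower:.1f}-{upper:.1f}    {count}")
```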