Statistical Analysis

Johaira U. Lidasan, MCS


Statistical Analysis

- Is the process of generating statistics from stored data and analyzing the results to deduce or infer meaning about the underlying dataset or the reality that it attempts to describe

May be used to:
• Present key findings revealed by a dataset
• Summarize information
• Calculate measures of cohesiveness, relevance or diversity in data
• Make future predictions based on previously recorded data
• Test experimental predictions
Example

Johnny owns a social media company, FriendKeeper. Since its inception in 2005, the company has grown to unprecedented heights and is now a global force. Recently, however, Johnny has noticed that both general membership on the website and new sign-ups have declined. He believes that one of the primary causes of this decline is the prevalence of coverage by popular news sources on the ills of social media. He decides to test this hypothesis by taking the following steps.
Example
• Accumulate the quantitative data of the two variables in
question.
• Put the data in a software program that can run some automatic
statistical analyses.
• Run a linear regression.
• Analyze the correlation coefficient r (Pearson's coefficient) and other relevant results.
After analyzing Pearson's coefficient, Johnny will be able to assess whether adverse news coverage of social media is negatively associated with his membership.
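The correlation step above can be sketched in a few lines. This is a minimal illustration only: the monthly figures and the variable names are invented, since the slides do not give Johnny's data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical monthly figures, invented purely for illustration.
negative_stories = [2, 4, 6, 8, 10]        # adverse news stories per month
members_thousands = [98, 96, 93, 92, 89]   # active members, in thousands

r = pearson_r(negative_stories, members_thousands)
print(round(r, 3))
```

A value of r close to -1 would indicate a strong negative linear association between the two variables; on its own it does not show that the news coverage caused the decline.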
Statistics

- The study of the collection, analysis, interpretation, presentation and organization of data.
- A statistic (singular) is a value that describes a sample
Statistics

Descriptive and Inferential Statistics

Descriptive Statistics
• Organize
• Summarize
• Simplify data

Inferential Statistics
• Generalize from samples to population
• Hypothesis testing
• Relationship among variables



Descriptive Statistics

• Summaries of data in tabular, graphical or numerical presentation
Inferential Statistics

• The methods used to determine something about a population on the basis of a sample
Inferential Statistics
Example:
Norris manufactures a high-intensity lightbulb used in a variety of
electrical products. In an attempt to increase the useful life of the
lightbulb, the product design group developed a new lightbulb
filament. In this case, the population is defined as all lightbulbs that
could be produced with the new filament. To evaluate the advantages
of the new filament, 200 bulbs with the new filament were
manufactured and tested. Data collected from this sample showed the
number of hours each lightbulb operated before filament burnout.
Inferential Statistics

Example:
Norris wants to use the sample data to make an
inference about the average hours of useful life for the
population of all lightbulbs that could be produced with
the new filament. Adding the 200 values in the table
and dividing the total by 200 provides the sample
average lifetime for the lightbulbs: 76 hours.
Variables

• Is a characteristic, number or quantity that can be measured or counted
• Can also be called a data item
• E.g. age, sex, business income, expenses
Measurement Scales
1. Nominal Scale – consists of categories in each of which the number of
respective observations is recorded. The categories are in no logical
order and have no particular relationship
2. Ordinal – consists of distinct categories in which order is implied. Values
in one category are larger or smaller than values in other categories
3. Interval – is a set of numerical measurements in which the distance
between numbers is of a known, constant size
4. Ratio – consists of numerical measurements where the distance
between numbers is of a known, constant size, the ratio of two values is
meaningful, and a true zero exists
Types of Data

1. Categorical/Qualitative Data – data that can be grouped by specific categories. It can use either the nominal or ordinal scale of measurement
2. Quantitative Data – data that use numeric values to indicate how much or how many. It is obtained using either the interval or ratio scale of measurement
Statistical Inference

1. Population – is the set of all elements of interest in a particular study
2. Sample – is a subset of the population
Descriptive Statistics: Numerical
Measures

Measures of Central Tendency


- Is a single value that attempts to describe a set of data by
identifying the central position within that set of data
a. Mean
b. Median
c. Mode
d. Percentiles
e. Quartiles
Mean
- A measure of central location computed by summing the data values and dividing by the number of observations
Median
- A measure of central location provided by the value in the middle when the data are arranged in ascending order
Mode
- A measure of central location defined as the value that occurs with greatest frequency
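The mean, median and mode can be computed with Python's standard `statistics` module. The sample below is hypothetical and chosen only to make the three measures differ.

```python
import statistics

# Hypothetical sample, invented for illustration.
data = [3, 5, 5, 6, 8, 9, 13]

mean = statistics.mean(data)      # sum of the values / number of observations
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # value that occurs most often

print(mean, median, mode)
```

With an even number of observations, `statistics.median` averages the two middle values.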
Percentiles
- The pth percentile is a value such that at least p percent of the observations are less than or equal to this value and at least (100 – p) percent of the observations are greater than or equal to this value. The 50th percentile is the median
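Several rules for locating a percentile exist, and the slides do not say which one is intended. The sketch below assumes one common textbook convention: compute the index i = (p/100)n, round up if i is not an integer, and average positions i and i + 1 if it is. The function name and the data are hypothetical.

```python
def percentile(values, p):
    """p-th percentile (integer p, 0 < p < 100) via the index rule i = (p/100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    data = sorted(values)
    n = len(data)
    # Integer arithmetic avoids floating-point surprises in (p / 100) * n.
    whole, remainder = divmod(p * n, 100)
    if remainder:
        # i is not an integer: round up; 1-based position whole + 1.
        return data[whole]
    # i is an integer: average the values in 1-based positions i and i + 1.
    return (data[whole - 1] + data[whole]) / 2

# Hypothetical sample, invented for illustration.
data = [12, 15, 18, 20, 22, 25, 27, 30, 35, 40]

print(percentile(data, 50))  # the median
print(percentile(data, 85))
```

For this sample the 85th percentile is 35: nine of the ten values (90%) are at most 35, and two of the ten (20%) are at least 35, which satisfies the definition.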
Quartiles
- The 25th, 50th, and 75th percentiles, referred to as the first quartile, the second quartile (median), and third quartile, respectively. The quartiles can be used to divide a data set into four parts, with each part containing approximately 25% of the data
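As a quick illustration, the standard library's `statistics.quantiles` returns the three quartile cut points. Note that it interpolates between data points, so on some data sets it can differ slightly from a hand calculation that uses another percentile rule. The sample is hypothetical.

```python
import statistics

# Hypothetical sample, invented for illustration.
data = [2, 4, 6, 8, 10, 12, 14]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)

print(q1, q2, q3)
```

The second quartile equals the median of the sample.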
Descriptive Statistics: Numerical
Measures

Measures of Dispersion
- Is a way of describing how spread out a set of data is. When a measure of dispersion is large, the values in the set are widely scattered; when it is small, the values in the set are tightly clustered.
Descriptive Statistics: Numerical
Measures

Measures of Dispersion Example


Suppose that you are a purchasing agent for a large
manufacturing firm and that you regularly place orders with two
different suppliers. After several months of operation, you find
that the mean number of days required to fill orders is 10 days
for both of the suppliers.
Which supplier would you prefer?
Descriptive Statistics: Numerical Measures

Measures of Dispersion
a. Range – Is the difference between the maximum and minimum values
b. Variance – Is a measure of variability that utilizes all the data and is based on the difference between the value of each observation and the mean
c. Standard Deviation – Tells how spread out the numbers are from the mean
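To make the three measures concrete, the sketch below compares two suppliers that share a mean delivery time of 10 days, in the spirit of the purchasing example. The delivery times are invented, and the sample (n - 1) form of the variance is an assumption, since the slides do not say which form is meant.

```python
import statistics

# Hypothetical delivery times in days; both suppliers average 10 days.
supplier_a = [9, 10, 11, 10, 10]
supplier_b = [7, 13, 10, 8, 12]

def describe(days):
    """Return the mean and three measures of dispersion for a sample."""
    return {
        "mean": statistics.mean(days),
        "range": max(days) - min(days),
        "variance": statistics.variance(days),  # sample variance, divides by n - 1
        "std_dev": statistics.stdev(days),      # square root of the sample variance
    }

summary_a = describe(supplier_a)
summary_b = describe(supplier_b)
print(summary_a)
print(summary_b)
```

The means match, but the second supplier's larger range, variance and standard deviation show less dependable delivery times.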


Example

A home theater in a box is the easiest and cheapest way to provide surround sound for a home entertainment center. The prices are for models with a DVD player and for models without a DVD player.
Example
A sample of prices is shown here (Consumer Reports Buying Guide, 2004)

a. Compute the mean price for models with a DVD player and the mean price of models without a
DVD player. What is the additional price paid to have a DVD player included in a home theater
unit?
b. Compute the range, variance and standard deviation for the two samples. What does this
information tell you about the prices for models with and without a DVD player?
Summarizing Quantitative Data

Frequency Distribution
- Is a tabular summary of data showing the number (frequency) of items in
each of several nonoverlapping classes.
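A frequency distribution can be built by assigning each value to a class and counting. The sketch below assumes integer data and equal-width classes; the function name, class limits and data are all hypothetical.

```python
from collections import Counter

def frequency_distribution(values, start, width, n_classes):
    """Count integer values in n_classes nonoverlapping classes of equal width."""
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width - 1   # integer data, so classes look like 10-14
        table.append((lower, upper, counts.get(k, 0)))
    return table

# Hypothetical sample, invented for illustration.
values = [11, 13, 14, 16, 17, 17, 18, 21, 22, 24, 26, 31]

table = frequency_distribution(values, start=10, width=5, n_classes=5)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

Every observation falls into exactly one class, so the frequencies sum to the number of observations.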
Summarizing Quantitative Data

Scatter Diagram
- Is a graphical presentation of the relationship between two quantitative
variables
Trendline
- Is a line that provides an approximation of the relationship.
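One common way to obtain a trendline is the least-squares method. The slides do not say how theirs is fitted, so this choice is an assumption, and the data points are invented.

```python
def trendline(xs, ys):
    """Least-squares slope and intercept of the line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical paired observations, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = trendline(xs, ys)
print(f"y = {intercept:.1f} + {slope:.1f}x")
```

A positive slope indicates that y tends to increase as x increases.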
Review of Probability

Probability
- Is a numerical measure of the likelihood that an event will
occur
- Probability values are always assigned on a scale from 0 to 1
Review of Probability

Terms:
Experiment – is a process that generates well-defined outcomes
Sample space – is the set of all experimental outcomes
Sample Point – experimental outcome
Event – is a collection of sample points
Probability of an event – is equal to the sum of the probabilities
of the sample points in the event
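The terms above map directly onto a small data structure. As a hypothetical example (not taken from the slides), take the experiment of rolling one fair six-sided die.

```python
from fractions import Fraction

# Sample space: each sample point (face) with its probability.
sample_space = {face: Fraction(1, 6) for face in range(1, 7)}

def probability(event):
    """Probability of an event = sum of the probabilities of its sample points."""
    return sum(sample_space[point] for point in event)

even = {2, 4, 6}            # an event is a collection of sample points
print(probability(even))    # 1/2
```

Using `Fraction` keeps the arithmetic exact, so the probabilities of all sample points sum to exactly 1.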
Review of Probability

Example:
Many students accumulate debt by the time they
graduate from college. Shown in the table is the
percentage of graduates with debt and the average
amount of debt for these graduates at four universities
and four liberal arts colleges.
Review of Probability

Example:
a. If you randomly choose a graduate of Morehouse College, what is the
probability that this individual graduated with debt?
b. If you randomly choose one of these eight institutions for a follow-up study on
student loans, what is the probability that you will choose an institution with
more than 60% of its graduates having debt?
c. If you randomly choose one of these eight institutions for a follow-up study on
student loans, what is the probability that you will choose an institution whose
graduates with debts have an average debt of more than $30,000?
d. What is the probability that a graduate of Pace University does not have debt?
Review of Probability

Example:
To investigate how often families eat at home, Harris
Interactive surveyed 496 adults living with children
under the age of 18. The survey results are shown in
the table.
Review of Probability

Example:
For a randomly selected family from the survey, compute:
a. The probability the family eats no meals at home
during the week.
b. The probability the family eats at least four meals at
home during the week.
c. The probability the family eats two or fewer meals
at home during the week.
Random Variables

Random Variables
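- Is a numerical description of the outcome of an experiment
1. Discrete Random Variables – may assume either a finite number of values or an infinite sequence of values such as 0, 1, 2, ...
2. Continuous Random Variables – may assume any numerical value in an interval or collection of intervals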
Probability Distributions

Probability Distribution
- Describes how probabilities are distributed over the
values of the random variable.
- For a discrete random variable x, the probability
distribution is defined by a probability function,
denoted by f(x)
Probability Distributions

Example
As an illustration of a discrete random variable and its probability
distribution, consider the sales of automobiles at DiCarlo Motors in
Saratoga, New York. Over the past 300 days of operation, sales data
show 54 days with no automobiles sold, 117 days with 1 automobile
sold, 72 days with 2 automobiles sold, 42 days with 3 automobiles sold,
12 days with 4 automobiles sold, and 3 days with 5 automobiles sold.
Suppose we consider the experiment of selecting a day of operation at
DiCarlo Motors and define the random variable of interest as x = the
number of automobiles sold during a day.
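The probability function f(x) for the DiCarlo Motors data can be tabulated as relative frequencies, dividing the number of days with x sales by the 300 days observed. The variable names below are illustrative.

```python
# Days of operation on which x automobiles were sold (from the example).
days_by_sales = {0: 54, 1: 117, 2: 72, 3: 42, 4: 12, 5: 3}

total_days = sum(days_by_sales.values())

# f(x) = (days with x automobiles sold) / (total days)
f = {x: days / total_days for x, days in days_by_sales.items()}

for x, prob in f.items():
    print(f"f({x}) = {prob:.2f}")
```

Each f(x) is non-negative and the values sum to 1, as any discrete probability function must.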
Expected Value

- The expected value, or mean, of a random variable is a measure of the central location for the random variable
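For a discrete random variable, the standard definition is E(x) = Σ x f(x), the sum of each value weighted by its probability. As a sketch, this can be applied to the DiCarlo Motors sales data from the earlier example.

```python
# Days of operation on which x automobiles were sold (DiCarlo Motors example).
days_by_sales = {0: 54, 1: 117, 2: 72, 3: 42, 4: 12, 5: 3}
total_days = sum(days_by_sales.values())

# Probability function f(x) as relative frequencies.
f = {x: days / total_days for x, days in days_by_sales.items()}

# E(x) = sum over x of x * f(x)
expected_value = sum(x * prob for x, prob in f.items())
print(round(expected_value, 2))  # 1.5
```

The result of 1.5 automobiles per day is a long-run average; it does not need to be a value the random variable can actually take.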
Binomial Probability
Distribution

- Is a discrete probability distribution that has many applications. A binomial experiment has four properties:
1. The experiment consists of a sequence of n identical trials.
2. Two outcomes are possible on each trial: success or failure.
3. The probability of a success, denoted by p, does not
change from trial to trial
4. The trials are independent.
Binomial Probability
Distribution

Example
Suppose that 80% of adults with allergies report
symptomatic relief with a specific medication. If the
medication is given to 10 new patients with allergies,
what is the probability that it is effective in exactly
seven?
Binomial Probability
Distribution

Example

n = 10
p = 0.80
x = 7

f(7) = C(10, 7) (0.80)^7 (0.20)^3 = 120 × 0.2097 × 0.008 = 0.2013
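The allergy example can be checked numerically with the standard binomial probability function, f(x) = C(n, x) p^x (1 - p)^(n - x). The helper name `binomial_pmf` is illustrative.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

prob = binomial_pmf(x=7, n=10, p=0.80)
print(round(prob, 4))  # 0.2013
```

So there is roughly a 20% chance that the medication is effective in exactly seven of the ten patients.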
Poisson Probability
Distribution

- Is a discrete probability distribution which has as one of its important applications the modeling of events of a particular type over a unit of time or space – for example, the number of automobiles arriving at a toll booth during a given 5-minute period of time
Poisson Probability
Distribution

Example
Suppose that we are interested in the number of arrivals at the
drive-up teller window of a bank during a 15-minute period on
weekday mornings. If we can assume that the probability of a
car arriving is the same for any two time periods of equal
length and that the arrival or nonarrival of a car in any time
period is independent of the arrival or nonarrival in any other time
period, the Poisson probability function is applicable.
Poisson Probability
Distribution

Example
Suppose these assumptions are satisfied and an analysis of
historical data shows that the average number of cars
arriving in a 15-minute period of time is 10; in this case, the
following probability function applies.

f(x) = (10^x e^(-10)) / x!

The random variable here is x = number of cars arriving in any 15-minute period.
Poisson Probability
Distribution

Example
If management wanted to know the probability of
exactly five arrivals in 15 minutes, we would set x = 5
and thus obtain

f(5) = (10^5 e^(-10)) / 5! = 0.0378
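The drive-up teller calculation can be reproduced with the standard Poisson probability function, f(x) = μ^x e^(-μ) / x!, using the mean of 10 arrivals per 15-minute period. The helper name `poisson_pmf` is illustrative.

```python
import math

def poisson_pmf(x, mu):
    """Probability of exactly x occurrences when the mean per interval is mu."""
    return mu**x * math.exp(-mu) / math.factorial(x)

prob = poisson_pmf(x=5, mu=10)
print(round(prob, 4))  # 0.0378
```

The probability of exactly five arrivals in 15 minutes is therefore just under 4%.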
Continuous Probability
Distribution

Probability Density Function – is a mathematical expression that defines the distribution of the values for a continuous random variable
1. Normal Probability Distribution
2. Uniform Probability Distribution
Normal Probability
Distribution

- Sometimes referred to as the Gaussian distribution and is the most common continuous distribution used in statistics
Properties:
1. It is symmetrical and its mean and median are therefore equal
2. It is bell-shaped in appearance
3. Its interquartile range is equal to approximately 1.35 standard deviations. Thus, the middle 50% of the values are contained within an interval of roughly two-thirds of a standard deviation below the mean and two-thirds of a standard deviation above the mean
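The third property can be checked with `statistics.NormalDist` from the Python standard library. The quartiles of a standard normal distribution sit about 0.6745 standard deviations either side of the mean, which the two-thirds rule of thumb rounds.

```python
from statistics import NormalDist

z = NormalDist()            # standard normal: mean 0, standard deviation 1

q1 = z.inv_cdf(0.25)        # first quartile, in standard deviations
q3 = z.inv_cdf(0.75)        # third quartile, in standard deviations
iqr = q3 - q1

print(round(q3, 4), round(iqr, 3))  # 0.6745 1.349
```

Because the distribution is symmetrical, the first quartile is the mirror image of the third.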
Normal Probability
Distribution

Example:
The succeeding data represent the amount of soft
drink in 10,000 1-liter bottles filled on a recent day.
Standard Normal
Probability Distribution

- Random variable that has a normal distribution with a mean of zero and a standard deviation of one
Standard Normal
Probability Distribution

Example:
Suppose the Grear Tire Company developed a new steel-belted
radial tire to be sold through a national chain of discount stores.
Because the tire is a new product, Grear’s managers believe that
the mileage guarantee offered with the tire will be an important
factor in the acceptance of the product. Before finalizing the tire
mileage guarantee policy, Grear’s managers want probability
information about x = number of miles the tires will last.
Standard Normal
Probability Distribution

Example:
From actual road tests with the tires, Grear’s engineering group
estimated that the mean tire mileage is μ = 36,500 miles and that
the standard deviation is σ = 5000. In addition, the data collected
indicate that a normal distribution is a reasonable assumption.
What percentage of the tires can be expected to last more than
40,000 miles? In other words, what is the probability that the tire
mileage, x, will exceed 40,000?
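Standard Normal
Probability Distribution

Example:
Converting x = 40,000 to a z value gives z = (x – μ)/σ = (40,000 – 36,500)/5000 = 0.70

The area under the standard normal curve to the left of z = 0.70 is 0.7580

Thus, 1.000 – 0.7580 = 0.2420 is the probability that z will exceed 0.70 and hence that x will exceed 40,000. About 24.2% of the tires can be expected to last more than 40,000 miles.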
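The Grear Tire calculation can be reproduced without a printed table by using `statistics.NormalDist`, with the mean and standard deviation given in the example.

```python
from statistics import NormalDist

mu, sigma = 36_500, 5_000        # mean and standard deviation of tire mileage
threshold = 40_000

z = (threshold - mu) / sigma                 # standardized value
p_below = NormalDist(mu, sigma).cdf(threshold)
p_exceed = 1 - p_below                       # P(x > 40,000)

print(z, f"{p_below:.4f}", f"{p_exceed:.4f}")  # 0.7 0.7580 0.2420
```

Calling `NormalDist().cdf(z)` on the standardized value gives the same area, which is what the standard normal table lookup does.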


Uniform Probability
Distribution

- A value has the same probability of occurrence anywhere in the range between the smallest value and the largest value
- Sometimes called rectangular distribution
- Constant probability
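For a uniform distribution on the range from a to b, the density is the constant 1/(b - a), so the probability of any sub-interval is its length divided by b - a. The sketch below uses a hypothetical variable spread uniformly between 120 and 140; the function names are illustrative.

```python
def uniform_pdf(x, a, b):
    """Constant density 1 / (b - a) inside [a, b], zero outside."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi) for X uniform on [a, b]."""
    lo, hi = max(lo, a), min(hi, b)   # clip the interval to the range [a, b]
    return max(hi - lo, 0) / (b - a)

# Hypothetical example: a value equally likely anywhere from 120 to 140.
print(uniform_pdf(125, 120, 140))        # 0.05
print(uniform_prob(120, 130, 120, 140))  # 0.5
```

The flat density is what gives the distribution its rectangular shape.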