Objective of the training: Students become familiar with the basic quantitative techniques
used. The main focus, however, is on their application in business.
The statistical methodology for collecting, analysing and interpreting data is a prime
requirement for managerial decision-making and for research in both the physical and
social sciences, particularly in business and economics.
Areas of application
Statistics in Economics
In Commerce
In planning
In banking, insurance etc
In Business Management
In research
In Social Science
Limitations:
Does not study qualitative phenomena
Statistical laws are true only on average
Does not study individuals
Figures may be incomplete or manipulated
Results must be used with caution
Statistical methods are delicate tools
Frequency Distribution and Graphic Representation of Frequency Distribution
Decide on the number of classes. Too many or too few classes might not reveal the basic
shape of the data set, and such a frequency distribution is also difficult to interpret.
The maximum number of classes may be determined by the formula C = 1 + 3.3 log10(n)
(Sturges' rule) or C = √n (approximately), where n is the total number of observations
in the data.
Calculate the range of the data (Range = Max − Min) by finding the minimum and
maximum data values. The range is used to determine the class interval or class width.
Decide on the width of the class, denoted by h and obtained by h = Range / Number of classes.
Generally the class interval or class width is the same for all classes. The classes all
taken together must cover at least the distance from the lowest value (minimum) in the
data set up to the highest (maximum) value. Also note that equal class intervals are
preferred in a frequency distribution, while unequal class intervals may be necessary in
certain situations to avoid a large number of empty, or almost empty classes.
Decide the individual class limits and select a suitable starting point for the first class.
The starting point is arbitrary; it may be less than or equal to the minimum value. Usually
it is placed before the minimum value in such a way that the midpoint (the average of the
lower and upper class limits of the first class) is properly placed.
Take each observation and mark a vertical bar (|) against the class it belongs to. A running
tally is kept until the last observation. Tally marks are grouped in fives to make counting easier.
Find the frequencies, relative frequencies, cumulative frequencies, etc., as required.
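The steps above can be sketched in Python. This is only an illustration under stated assumptions: the function name frequency_distribution and the marks data are hypothetical, and the default number of classes uses Sturges' rule (1 + 3.3 log10 n), which is one common choice rather than the only one.

```python
import math

def frequency_distribution(data, num_classes=None):
    """Return (lower, upper, frequency, relative, cumulative) rows for equal-width classes."""
    n = len(data)
    if num_classes is None:
        # Assumption: Sturges' rule, rounded up to a whole number of classes.
        num_classes = math.ceil(1 + 3.3 * math.log10(n))
    low, high = min(data), max(data)
    width = (high - low) / num_classes  # class width h = Range / Number of classes
    if width == 0:
        width = 1  # all values identical: everything falls in the first class
    counts = [0] * num_classes
    for x in data:
        # Tally each observation; the maximum value is placed in the last class.
        index = min(int((x - low) / width), num_classes - 1)
        counts[index] += 1
    rows, cumulative = [], 0
    for i, f in enumerate(counts):
        cumulative += f
        rows.append((low + i * width, low + (i + 1) * width, f, f / n, cumulative))
    return rows

# Hypothetical marks of 20 students.
marks = [12, 15, 17, 21, 22, 24, 25, 28, 30, 31,
         33, 35, 36, 38, 40, 41, 44, 45, 47, 52]
table = frequency_distribution(marks, num_classes=4)
for lower, upper, f, rel, cum in table:
    print(f"{lower:5.1f} - {upper:5.1f}  f={f:2d}  rel={rel:.2f}  cum={cum:2d}")
```

Here the first class starts exactly at the minimum value; starting slightly below it, as the text suggests, only requires passing a different lower limit.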
Graphic representation of frequency distribution
A histogram consists of tabular frequencies, shown as adjacent
rectangles, erected over discrete intervals (bins), with an area
equal to the frequency of the observations in the interval.
A bar chart is a chart with rectangular bars with lengths
proportional to the values that they represent. The bars can be
plotted vertically or horizontally.
A pie chart shows percentage values as a slice of a pie.
A line chart is a two-dimensional scatterplot of ordered
observations where the observations are connected following
their order.
Measures of Central Tendency
Introduction
The mean, median and mode are all valid measures of central tendency.
MERITS OF MEAN:
1- The arithmetic mean is rigidly defined by an algebraic formula.
2- It is easy to calculate and simple to understand.
3- It is based on all observations and can be regarded as representative of the given data.
4- It is capable of being treated mathematically and hence is widely used in statistical analysis.
5- The arithmetic mean can be computed even if the detailed distribution is not known, provided the sum of the
observations and the number of observations are known.
6- It is least affected by sampling fluctuations.
DEMERITS OF ARITHMETIC MEAN:
1- It can be determined neither by inspection nor by graphical location.
2- The arithmetic mean cannot be computed for qualitative data, such as data on intelligence, honesty or smoking habits.
3- It is strongly affected by extreme observations and hence does not adequately represent data containing some
extreme values.
4- The arithmetic mean cannot be computed when class intervals have open ends.
Merits and demerits of Median:
Median:
The median is that value of the series which divides the group into two equal parts, one part comprising all values
greater than the median value and the other part comprising all the values smaller than the median value.
Merits of Median:
(1) Simplicity: - It is a very simple measure of the central tendency of the series. In the case of a simple statistical
series, just a glance at the data is enough to locate the median value.
(2) Free from the effect of extreme values: - Unlike the arithmetic mean, the median value is not distorted by the
extreme values of the series.
(3) Certainty: - Certainty is another merit of the median. The median is always a certain, specific value in the
series.
(4) Real value: - The median is a real value and is a better representative value of the series than the arithmetic
mean, the value of which may not exist in the series at all.
(5) Graphic presentation: - Besides the algebraic approach, the median value can also be estimated through the graphic
presentation of data.
(6) Possible even when data are incomplete: - The median can be estimated even in the case of certain incomplete series.
It is enough if one knows the number of items and the middle item of the series.
Demerits of median:
Merits of Mode:
(1) Simple and popular: - The mode is a very simple measure of central tendency. Sometimes, just a glance at the
series is enough to locate the modal value. Because of its simplicity, it is a very popular measure of central tendency.
(2) Less effect of marginal values: - Compared to the mean, the mode is less affected by marginal values in the series.
The mode is determined only by the value with the highest frequency.
(3) Graphic presentation: - The mode can be located graphically, with the help of a histogram.
(4) Best representative: - The mode is the value which occurs most frequently in the series. Accordingly, the mode is
the best representative value of the series.
(5) No need of knowing all the items or frequencies: - The calculation of the mode does not require knowledge of all the
items and frequencies of a distribution. In a simple series, it is enough if one knows the items with the highest
frequencies in the distribution.
Demerits of Mode:
Following are the various demerits of mode:
(1) Uncertain and vague: - The mode is an uncertain and vague measure of central tendency.
(2) Not capable of algebraic treatment: - Unlike the mean, the mode is not capable of further algebraic treatment.
(3) Difficult: - When the frequencies of all items are identical, it is difficult to identify the modal value.
(4) Complex procedure of grouping: - Calculation of the mode involves a cumbersome procedure of grouping the data. If the
extent of grouping changes, there will be a change in the modal value.
(5) Ignores extreme marginal frequencies: - It ignores extreme marginal frequencies. To that extent, the modal value is
not a representative value of all the items in a series.
Managerial Applications of Mean, Median and Mode
Application of Statistics
Statistics are helpful to get a general idea about the type of earnings that are generated by the family
members.
Statistics are very useful in understanding large sets of data, and are especially helpful in seeing trends
over time. However, since they can also be used to skew results, we must be very careful in properly
understanding how they are calculated. When you are looking at averages to compare them over time, it is
very important that they are calculated the same way every year. Also, when someone tells you something
is an average, make sure you listen to the rest of the description. As we have seen, the average can change
significantly depending on how it is being calculated.
Using Averages: Mean, Median, and Mode
The median would be used in preference to the arithmetic mean whenever a distribution is significantly skewed or data
is difficult or expensive to measure. For example, salaries of employees, turnover of a large set of companies, time to
destruction in tests of components.
The mode would be used in preference to the median as the most useful measure of location when the 'most common'
or the 'most popular' item is required. For example, number of customers in a queue, number of defects in a sample,
sales of shirts by neck sizes.
The arithmetic mean would be used in preference to any other average in symmetric distributions or where further
statistical calculations or analysis might be required. For example, number of items produced per day on a large
assembly line, number of orders received per month for a firm.
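A short sketch of why the choice of average matters, using Python's standard statistics module. The salary figures are hypothetical and chosen only to produce a skewed distribution with one extreme value.

```python
from statistics import mean, median, mode

# Hypothetical monthly salaries (in thousands); one extreme value skews the data.
salaries = [20, 22, 22, 25, 27, 30, 150]

avg = mean(salaries)          # pulled upward by the extreme value 150
mid = median(salaries)        # middle value of the ordered data
most_common = mode(salaries)  # most frequently occurring value

print(f"mean={avg:.2f}  median={mid}  mode={most_common}")
```

For this data the mean is well above what most employees earn, while the median stays close to the typical salary, which is why the median is preferred for skewed distributions.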
Unit II
Quartile deviation = (Q3 − Q1) / 2
The quartile deviation is a slightly better measure of absolute dispersion than the range, but it ignores the
observations in the tails. If we take different samples from a population and calculate their quartile
deviations, the values are quite likely to differ considerably. This is called sampling fluctuation. It is
not a popular measure of dispersion. The quartile deviation calculated from the sample data does not help
us to draw any conclusion (inference) about the quartile deviation in the population.
Coefficient of Quartile deviation
In statistics, the quartile coefficient of dispersion is a descriptive statistic which measures dispersion and
which is used to make comparisons within and between data sets.
The statistic is easily computed using the first (Q1) and third (Q3) quartiles for each data set. The quartile
coefficient of dispersion is:[1]
(Q3 − Q1) / (Q3 + Q1)
Example
Consider the following two data sets:
A = {2, 4, 6, 8, 10, 12, 14}
n = 7, range = 12, mean = 8, median = 8, Q1 = 4, Q3 = 12, coefficient of dispersion = 0.5
B = {1.8, 2, 2.1, 2.4, 2.6, 2.9, 3}
n = 7, range = 1.2, mean = 2.4, median = 2.4, Q1 = 2, Q3 = 2.9, coefficient of dispersion = 0.18
The quartile coefficient of dispersion of data set A is 2.7 times as great (0.5 / 0.18) as that of data set B.
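The example can be checked with a minimal Python sketch. The function name quartile_coefficient is hypothetical. Quartile conventions differ between textbooks and software; the "exclusive" method of statistics.quantiles is used here because it reproduces the quartiles quoted above for these two data sets.

```python
from statistics import quantiles

def quartile_coefficient(data):
    """Quartile coefficient of dispersion: (Q3 - Q1) / (Q3 + Q1)."""
    # Assumption: quartiles follow the "exclusive" convention.
    q1, _, q3 = quantiles(data, n=4, method="exclusive")
    return (q3 - q1) / (q3 + q1)

A = [2, 4, 6, 8, 10, 12, 14]
B = [1.8, 2, 2.1, 2.4, 2.6, 2.9, 3]

coef_a = quartile_coefficient(A)
coef_b = quartile_coefficient(B)
print(f"A: {coef_a:.2f}  B: {coef_b:.2f}  ratio: {coef_a / coef_b:.1f}")
```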
Standard Deviation
The Standard Deviation is a measure of how spread out numbers are. Its
symbol is σ (the Greek letter sigma). The formula is easy: it is the square root of the
Variance.
For a finite set of numbers, the standard deviation is found by taking the square
root of the average of the squared deviations of the values from their average value.
For example, the marks of a class of eight students (that is, a population) are the
following eight values: 2, 4, 4, 4, 5, 5, 7, 9
These eight data points have a mean (average) of 5:
(2 + 4 + 4 + 4 + 5 + 5 + 7 + 9) / 8 = 40 / 8 = 5
First, calculate the deviation of each data point from the mean, and square the
result of each: 9, 1, 1, 1, 0, 0, 4, 16
The variance is the mean of these values: (9 + 1 + 1 + 1 + 0 + 0 + 4 + 16) / 8 = 32 / 8 = 4
and the population standard deviation is equal to the square root of the
variance: √4 = 2
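The same calculation, written out step by step in Python as a small sketch (the variable names are illustrative only):

```python
import math

marks = [2, 4, 4, 4, 5, 5, 7, 9]  # the eight marks from the example

mean = sum(marks) / len(marks)
squared_deviations = [(x - mean) ** 2 for x in marks]
variance = sum(squared_deviations) / len(marks)  # population variance (divide by N)
std_dev = math.sqrt(variance)

print(mean, squared_deviations, variance, std_dev)
```

Dividing by N treats the eight marks as a whole population, as in the example; a sample standard deviation would divide by N − 1 instead.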
Coefficient of Standard Deviation:
Variance = the average of the squared differences from the mean.
Interpretation
A large standard deviation indicates that the data points can spread far from the mean and a small
standard deviation indicates that they are clustered closely around the mean.
For example, each of the three populations {0, 0, 14, 14}, {0, 6, 8, 14} and {6, 6, 8, 8} has a mean of 7.
Their standard deviations are 7, 5, and 1, respectively. The third population has a much smaller
standard deviation than the other two because its values are all close to 7.
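The three populations can be compared with the standard library's statistics.pstdev, which computes the population standard deviation. A brief sketch:

```python
from statistics import mean, pstdev

populations = [
    [0, 0, 14, 14],
    [0, 6, 8, 14],
    [6, 6, 8, 8],
]

# Same mean in every case, very different spread.
results = [(mean(p), pstdev(p)) for p in populations]
for values, (m, s) in zip(populations, results):
    print(f"{values}: mean={m}, standard deviation={s:.1f}")
```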
Unit III
Theory of Probability and probability distribution: Mathematical
probability, Trial and event, simple problems based on sample space,
Binomial, Poisson and Normal distributions and their application in business
decision making.
Probability
Probability is the measure of the likelihood that an event will occur.
Probability is quantified as a number between 0 and 1 (where 0 indicates
impossibility and 1 indicates certainty).
The higher the probability of an event, the more certain we are that the
event will occur.
A simple example is the toss of a fair (unbiased) coin. Since the two
outcomes are equally probable, the probability of "heads" equals the
probability of "tails", so there is a 1/2 (or 50%) chance of either
"heads" or "tails".
Independent events
If two events A and B are independent, the probability that both occur is P(A and B) = P(A) × P(B);
for example, if two coins are flipped, the chance of both being heads is 1/2 × 1/2 = 1/4.[22]
For example, when drawing a single card at random from a regular deck of cards, the chance of getting a
heart or a face card (J,Q,K) (or one that is both) is 13/52 + 12/52 − 3/52 = 22/52 = 11/26, because of the
52 cards of a deck 13 are hearts, 12 are face cards, and 3 are both: here the possibilities included in the
"3 that are both" are included in each of the "13 hearts" and the "12 face cards" but should only be counted once.
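Both examples can be verified by direct counting. The sketch below builds a 52-card deck and uses exact fractions; the helper names (probability, is_heart, is_face) are hypothetical.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely outcomes

def probability(event):
    """Classical probability: favourable outcomes / total outcomes."""
    favourable = sum(1 for card in deck if event(card))
    return Fraction(favourable, len(deck))

def is_heart(card):
    return card[1] == "hearts"

def is_face(card):
    return card[0] in {"J", "Q", "K"}

# Heart or face card, counted directly and via the addition rule.
p_direct = probability(lambda c: is_heart(c) or is_face(c))
p_rule = (probability(is_heart) + probability(is_face)
          - probability(lambda c: is_heart(c) and is_face(c)))

# Two independent fair coins both showing heads.
p_two_heads = Fraction(1, 2) * Fraction(1, 2)

print(p_direct, p_rule, p_two_heads)
```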
Conditional probability
Conditional probability is the probability of some event A, given the occurrence of some other event B.
Conditional probability is written P(A | B), and is read "the probability of A, given B". It is defined by[23]
P(A | B) = P(A and B) / P(B), provided P(B) > 0.
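As an illustration, the card counts from the earlier example (12 face cards, 3 of them hearts) give the probability that a card is a heart given that it is a face card. The function name conditional is hypothetical, and the formula used is the standard definition P(A | B) = P(A and B) / P(B).

```python
from fractions import Fraction

def conditional(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B); undefined when P(B) is zero."""
    if p_b == 0:
        raise ValueError("P(B) must be greater than zero")
    return p_a_and_b / p_b

p_face = Fraction(12, 52)           # B: the card is a face card
p_heart_and_face = Fraction(3, 52)  # A and B: the card is a heart and a face card

p_heart_given_face = conditional(p_heart_and_face, p_face)
print(p_heart_given_face)
```

The result equals the unconditional probability of a heart (13/52), which shows that suit and rank are independent in a standard deck.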
Trial and Event
Trial:
A fixed number of repetitions of the same experiment can be thought
of as a composed experiment, in which case the individual repetitions
are called trials. For example, if one were to toss the same coin one
hundred times and record each result, each toss would be considered
a trial within the experiment composed of all hundred tosses.
Event:
In probability theory, an event is a set of outcomes of an experiment (a
subset of the sample space) to which a probability is assigned.
Binomial
In probability theory and statistics, the binomial distribution with parameters n and p is the
discrete probability distribution of the number of successes in a sequence of n independent
yes/no experiments, each of which yields success with probability p.
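P(X = r) = nCr p^r q^(n−r), where q = 1 − p and nCr is the number of ways of choosing r successes out of n trials.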
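When a sample of size n is drawn without replacement from a population of size N, the draws are not
strictly independent. However, for N much larger than n, the binomial distribution is a good
approximation, and widely used.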
Assumptions
Use of the binomial distribution requires three assumptions:
Each replication of the process results in one of two possible outcomes (success
or failure),
The probability of success is the same for each replication, and
The replications are independent, meaning here that a success in one replication does
not influence the probability of success in another.
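A minimal sketch of the binomial probability nCr p^r q^(n−r) in Python. The function name binomial_pmf and the values n = 10, p = 0.5 are hypothetical, chosen only for illustration.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    q = 1 - p  # probability of failure on a single trial
    return comb(n, r) * p**r * q**(n - r)

# Hypothetical example: 10 fair coin tosses, exactly 3 heads.
n, p = 10, 0.5
p_three = binomial_pmf(3, n, p)
total = sum(binomial_pmf(r, n, p) for r in range(n + 1))  # probabilities sum to 1

print(f"P(X = 3) = {p_three:.4f}, total = {total:.1f}")
```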
Poisson
The Poisson distribution, named after French mathematician Siméon Denis Poisson, is a discrete
probability distribution that expresses the probability of a given number of events occurring in a fixed
interval of time and/or space if these events occur with a known average rate and independently of the
time since the last event. The Poisson distribution can also be used for the number of events in other
specified intervals such as distance, area or volume.
Poisson Distribution Example
Suppose you knew that the mean number of calls to a fire station on a weekday is 8. What is the
probability that on a given weekday there would be 11 calls? This problem can be solved using the
following formula based on the Poisson distribution:
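P(x) = e^(−μ) μ^x / x!
where μ is the mean number of events (here 8), x is the number of events in question (here 11)
and e is the base of the natural logarithm.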
For instance, a person who keeps track of the mail they receive each day may notice that they
receive an average of 4 letters per day. If receiving any particular piece of mail doesn't affect the
arrival times of future pieces of mail, i.e., if pieces of mail from a wide range of sources arrive
independently of one another, then a reasonable assumption is that the number of pieces of mail received
per day obeys a Poisson distribution.[2] Other examples that may follow a Poisson distribution: the number of
phone calls received by a call center per hour and the number of decay events per second from a radioactive source.
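The fire station question (a mean of 8 calls per weekday, probability of exactly 11 calls) can be worked out with the standard Poisson formula P(x) = e^(−μ) μ^x / x!. The function name poisson_pmf is hypothetical.

```python
import math

def poisson_pmf(x, mu):
    """Probability of exactly x events when the mean number of events is mu."""
    return math.exp(-mu) * mu**x / math.factorial(x)

# Fire station example: mean of 8 calls per weekday, exactly 11 calls.
p_eleven = poisson_pmf(11, 8)
print(f"P(11 calls) = {p_eleven:.3f}")
```

The result is roughly 0.07, so a day with exactly 11 calls would be expected about 7 times in 100 weekdays.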
Normal Distribution
The normal distribution is sometimes informally called the bell curve or the Sombrero
distribution. However, many other distributions are bell-shaped (such as Cauchy's, Student's,
and logistic).
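The probability density of the normal distribution is f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²)).
Here, μ is the mean or expectation of the distribution (and also its median and mode). The
parameter σ is its standard deviation, with its variance then σ². A random variable with a
Gaussian distribution is said to be normally distributed.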
A positive correlation exists where the high values of one variable are associated with the
high values of the other variable(s).
A negative correlation means that the high values of one variable are associated with the
low values of the other variable(s).
Karl Pearson's Coefficient of Correlation
The value of r is such that -1 ≤ r ≤ +1. The + and − signs are used for positive
linear correlations and negative linear correlations, respectively.
Positive correlation: If x and y have a strong positive linear correlation, r is close
to +1. An r value of exactly +1 indicates a perfect positive fit. Positive values
indicate a relationship between the x and y variables such that as values for x increase,
values for y also increase.
Negative correlation: If x and y have a strong negative linear correlation, r is close
to -1. An r value of exactly -1 indicates a perfect negative fit. Negative values
indicate a relationship between x and y such that as values for x increase, values
for y decrease.
No correlation: If there is no linear correlation or a weak linear correlation, r is
close to 0. A value near zero means that there is a random, nonlinear relationship
between the two variables.
Note that r is a dimensionless quantity; that is, it does not depend on the units
employed.
A correlation greater than 0.8 is generally described as strong, whereas a correlation
less than 0.5 is generally described as weak.
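A minimal sketch of how r can be computed, using the standard product-moment formula (the sum of products of deviations divided by the square root of the product of the sums of squared deviations). The function name pearson_r and the advertising/sales figures are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one series is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: advertising spend and sales for five periods.
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

r = pearson_r(advertising, sales)
r_scaled = pearson_r([100 * x for x in advertising], sales)  # change of units

print(f"r = {r:.4f}, after rescaling x: {r_scaled:.4f}")
```

Rescaling one variable leaves r unchanged, which illustrates that r does not depend on the units employed.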
Rank correlation
A rank correlation coefficient measures the degree of
similarity between two rankings, and can be used to assess the
significance of the relation between them. For example, two
common nonparametric methods of significance that use rank
correlation are the Mann–Whitney U test and the Wilcoxon
signed-rank test.
Spearman's rank correlation coefficient
In statistics, Spearman's rank correlation coefficient or Spearman's rho, named
after Charles Spearman and often denoted by the Greek letter ρ (rho) or as r_s, is a
nonparametric measure of statistical dependence between two variables. It
assesses how well the relationship between two variables can be described using a
monotonic function. If there are no repeated data values, a perfect Spearman
correlation of +1 or −1 occurs when each of the variables is a perfect monotone
function of the other.
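A small sketch of Spearman's coefficient, assuming there are no repeated values so that the common shortcut formula 1 − 6Σd² / (n(n² − 1)) applies, where d is the difference between the two ranks of each observation. This formula is not stated in the text above and is used here as an assumption; the function names and data are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no repeated values."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch does not handle tied values")
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rank correlation for untied data: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n**2 - 1))

x = [1, 2, 3, 4, 5]
y_increasing = [1, 4, 9, 16, 25]  # monotone increasing but not linear in x
y_decreasing = [25, 16, 9, 4, 1]  # monotone decreasing in x

rho_up = spearman_rho(x, y_increasing)
rho_down = spearman_rho(x, y_decreasing)
print(rho_up, rho_down)
```

The increasing series is not a straight line, yet the coefficient is still +1, because only the ordering of the values matters.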
Regression analysis is a statistical process for estimating the
relationships among variables. It includes many techniques for modeling and analyzing
several variables, when the focus is on the relationship between a dependent variable and
one or more independent variables (or 'predictors'). More specifically, regression analysis
helps one understand how the typical value of the dependent variable (or 'criterion variable')
changes when any one of the independent variables is varied, while the other independent
variables are held fixed.
3) The distance in this time interval between any two consecutive data point is the same
4) Each time unit in the time interval has at most one data point
Examples of time series are ocean tides, counts of sunspots, and the daily closing value of
the Dow Jones Industrial Average.
Time Series Analysis
Time series analysis comprises methods for analyzing time series data in
order to extract meaningful statistics and other characteristics of the data.