
Course: MBA

Sub: Quantitative Methods


Code: CP 102
BASIC DATA OF THE SUBJECT

Name of the subject: Quantitative Methods


Language: English/Hindi (when required)
Subject educator Name: Prof. Neerja Nigam
Title: Associate Professor
Affiliation: Barkatullah University
Period: August 2015-Jan 2016

Objective of the training: To make students familiar with the basic quantitative techniques in common use.
The main focus, however, is on their applications in business.

Contact education: 48 hours


Internal marks: 20
External marks: 80
Total Marks: 100

Pattern for external evaluation:


Sec A (Short answers): 4 out of 8 questions, 4 × 8 = 32 marks
Sec B (Essay type & Case): 3 out of 5 questions, 3 × 16 = 48 marks
Unit I
Statistical basis of managerial decisions,
Frequency distribution and graphic representation of frequency distribution,
Measures of central tendency: mean, median, mode,
Requisites of an ideal measure of central tendency,
Merits and demerits of mean, median and mode and their managerial applications
Statistical basis of managerial decisions
Relevance:

Since the complexity of the business environment makes decision-making difficult, the
decision-maker cannot rely entirely upon his observation, experience or evaluation to make
decisions. Decisions have to be based upon data which show relationships, indicate trends,
and show rates of change in various relevant variables.
The field of statistics provides methods for collecting, presenting, analysing and
meaningfully interpreting data.

Thus, the statistical methodology in collection, analysis and interpretation of data for
better decision-making is a prime requirement for managerial decision-making and
research in both physical and social sciences, particularly in business and economics.
Areas of application

In economics
In commerce
In planning
In banking, insurance, etc.
In business management
In research
In social science

Limitations:
Does not study qualitative phenomena
Statistical laws are true only on average
Does not study individuals
Figures may be incomplete or manipulated
Must be used with caution
Statistical methods are delicate tools
Frequency Distribution and Graphic Representation of Frequency Distribution

In statistics, a frequency distribution is a table that displays the
frequency of various outcomes in a sample.[1] Each entry in the table
contains the frequency or count of the occurrences of values within a
particular group or interval, and in this way, the table summarizes
the distribution of values in the sample.
Construction of frequency distributions

Decide about the number of classes. Too many or too few classes might not reveal the basic
shape of the data set, and such a frequency distribution is also difficult to interpret.
The maximum number of classes, k, may be determined by the formula $k = 1 + 3.322 \log_{10} n$
(Sturges' rule) or $k = \sqrt{n}$, where n is the total number of observations in the data.
Calculate the range of the data (Range = Max - Min) by finding the minimum and maximum
data values. The range will be used to determine the class interval or class width.
Decide the width of the class, denoted by h and obtained by $h = \frac{\text{Range}}{\text{Number of classes}}$.
Generally the class interval or class width is the same for all classes. The classes all
taken together must cover at least the distance from the lowest value (minimum) in the
data set up to the highest (maximum) value. Also note that equal class intervals are
preferred in a frequency distribution, while unequal class intervals may be necessary in
certain situations to avoid a large number of empty or almost empty classes.
Decide the individual class limits and select a suitable starting point for the first class.
The starting point is arbitrary: it may be less than or equal to the minimum value. Usually it
is set just below the minimum value, in such a way that the midpoint (the average of the lower
and upper class limits of the first class) is properly placed.
Take each observation in turn and mark a vertical bar (|) against the class it belongs to. A
running tally is kept until the last observation. Tally marks are grouped in fives, the fifth
mark being drawn across the previous four.
Find the frequencies, relative frequencies, cumulative frequencies, etc., as required.
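The construction steps above can be sketched in Python. This is a minimal illustration under stated assumptions rather than a prescribed procedure: it assumes integer-valued data, uses Sturges' rule rounded up for the number of classes, rounds the class width up to a whole number, and starts the first class at the minimum value. The function name `frequency_table` and the marks are hypothetical.

```python
import math

def frequency_table(data):
    """Equal-width table of (lower, upper, frequency, relative, cumulative)."""
    n = len(data)
    # Step 1: number of classes by Sturges' rule, rounded up
    k = math.ceil(1 + 3.322 * math.log10(n))
    # Step 2: range = max - min
    lowest, highest = min(data), max(data)
    data_range = highest - lowest
    # Step 3: class width h = range / k, rounded up (assumes integer data)
    h = max(1, math.ceil(data_range / k))
    table = []
    cumulative = 0
    for i in range(k):
        # Step 4: class limits, starting at the minimum value
        lower = lowest + i * h
        upper = lower + h
        last = i == k - 1
        # Step 5: tally the observations in [lower, upper);
        # the last class also includes its upper limit
        freq = sum(1 for x in data if lower <= x < upper or (last and x == upper))
        # Step 6: relative and cumulative frequencies
        cumulative += freq
        table.append((lower, upper, freq, freq / n, cumulative))
    return table

# Hypothetical marks of 20 students
marks = [12, 15, 22, 27, 31, 35, 38, 41, 44, 45,
         47, 52, 55, 58, 61, 63, 68, 72, 79, 85]
for lower, upper, freq, rel, cum in frequency_table(marks):
    print(f"{lower:>3}-{upper:<3} {freq:>2} {rel:.2f} {cum:>2}")
```

Each class here includes its lower limit and excludes its upper limit (except the last class), so a boundary value such as 38 is counted exactly once. Textbooks differ on whether k is rounded up or to the nearest whole number.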
Graphic representation of frequency distribution
A histogram consists of tabular frequencies, shown as adjacent
rectangles, erected over discrete intervals (bins), with an area
equal to the frequency of the observations in the interval.
A bar chart is a chart with rectangular bars with lengths
proportional to the values that they represent. The bars can be
plotted vertically or horizontally.
A pie chart shows percentage values as slices of a pie.
A line chart is a two-dimensional scatterplot of ordered
observations where the observations are connected following
their order.
Measures of Central Tendency
Introduction

A measure of central tendency is a single value that attempts to describe a set of data by
identifying the central position within that set of data.

The mean, median and mode are all valid measures of central tendency, but under different
conditions some of them are more appropriate to use than others.


Measures of Central Tendency: Mean, Median, Mode

The measures of central tendency provide us with statistical information about a set of data.
The three primary measures are the mean, median and mode; the range, often reported
alongside them, is a measure of dispersion rather than of central tendency.
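As a quick illustration, Python's standard `statistics` module computes the three averages directly. The data values are hypothetical; the range is printed alongside only for comparison, since it measures spread rather than the centre.

```python
from statistics import mean, median, mode

data = [2, 3, 3, 4, 5, 7, 11]  # hypothetical observations

avg = mean(data)      # arithmetic mean: sum / count = 35 / 7
mid = median(data)    # middle value of the ordered data
most = mode(data)     # most frequently occurring value
spread = max(data) - min(data)  # range: largest minus smallest

print(avg, mid, most, spread)  # 5 4 3 9
```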
Requisites of a Measure of Central Tendency
It should be rigidly defined: The definition of an average should be clear and rigid so that
it is interpreted uniformly by different decision-makers or investigators.
It should be based on all observations: To ensure that it represents the entire data set, its
value should be calculated by taking into consideration the entire data set.
It should be easy to understand and calculate: The value of an average should be computed by
a simple method without reducing its accuracy and other advantages.
It should have sampling stability: The values of an average calculated from various
independent random samples of the same size from a given population should not vary much
from one another.
It should be capable of further algebraic treatment: The nature of the average should be such
that it can be used for further statistical analysis of the data set. For example, it should be
possible to determine the average production in a particular year from the average
production in each month of that year.
It should not be unduly affected by extreme observations: The value of an average should not
be unduly affected by very large or very small observations in the given data. Otherwise the
average value may not truly represent the characteristics of the entire set of data.
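The "algebraic treatment" requisite can be made concrete with the combined mean: if each group's size and mean are known, the mean of all observations is the size-weighted average of the group means. A minimal sketch, assuming hypothetical daily production figures for three months (kept very short); the name `combined_mean` is illustrative.

```python
from statistics import mean

def combined_mean(sizes, means):
    """Mean of all observations, using only each group's size and mean."""
    total = sum(n * m for n, m in zip(sizes, means))
    return total / sum(sizes)

# Hypothetical daily production figures for three months
months = [[100, 110, 90], [120, 130], [105, 115, 110, 110]]
sizes = [len(m) for m in months]    # number of days recorded per month
means = [mean(m) for m in months]   # monthly averages: 100, 125, 110

overall = combined_mean(sizes, means)
pooled = mean([x for m in months for x in m])
print(overall, pooled)  # 110.0 110
```

A plain average of the three monthly means, (100 + 125 + 110) / 3 ≈ 111.67, would be wrong here because the months contribute different numbers of days; weighting each mean by its group size recovers the pooled mean exactly.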
Merits and Demerits of Mean, Median and Mode and their Managerial Applications

MERITS OF MEAN:
1- Arithmetic mean is rigidly defined by an algebraic formula.
2- It is easy to calculate and simple to understand.
3- It is based on all observations, so it can be regarded as representative of the given data.
4- It is capable of being treated mathematically and hence it is widely used in statistical analysis.
5- Arithmetic mean can be computed even if the detailed distribution is not known, provided the sum of the
observations and the number of observations are known.
6- It is least affected by fluctuations of sampling.
DEMERITS OF ARITHMETIC MEAN:
1- It can be determined neither by inspection nor by graphical location.
2- Arithmetic mean cannot be computed for qualitative data, such as data on intelligence, honesty and smoking habits.
3- It is unduly affected by extreme observations, and hence it does not adequately represent data containing some
extreme points.
4- Arithmetic mean cannot be computed when the class intervals have open ends.
Merits and demerits of Median:
Median:

The median is that value of the series which divides the group into two equal parts, one part comprising all values
greater than the median value and the other part comprising all the values smaller than the median value.
Merits of Median:
(1) Simplicity: It is a very simple measure of the central tendency of a series. In the case of a simple statistical
series, just a glance at the data is enough to locate the median value.
(2) Free from the effect of extreme values: Unlike the arithmetic mean, the median value is not distorted by the
extreme values of the series.
(3) Certainty: Certainty is another merit of the median. The median is always a definite, specific value in the
series.
(4) Real value: The median is a real value and a better representative value of the series than the arithmetic
mean, whose value may not exist in the series at all.
(5) Graphic presentation: Besides the algebraic approach, the median value can also be estimated through the
graphic presentation of data.
(6) Possible even when data is incomplete: The median can be estimated even for certain incomplete series. It is
enough if one knows the number of items and the middle item of the series.
Demerits of median:

Following are the various demerits of median:


(1) Lack of representative character: The median fails to be a representative measure
for a series whose values are widely spread apart from each other. Also, the median is of
limited representative character as it is not based on all the items in the series.
(2) Unrealistic: When the median lies between the two middle values, it remains only an
approximate measure rather than an actual value of the series.
(3) Lack of algebraic treatment: The arithmetic mean is capable of further algebraic
treatment, but the median is not. For example, multiplying the median by the number of
items in the series will not give the sum total of the values of the series.
However, the median is quite a simple method of finding the average of a series. It is
commonly used for series related to qualitative observations, such as the health of
students.
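The short sketch below, using hypothetical series, illustrates two of the points above: for an even number of items the median is the average of the two middle values and need not occur in the series, and multiplying the median by the number of items does not recover the total, unlike the mean.

```python
from statistics import mean, median

odd_series = [1, 2, 5, 9, 20]   # hypothetical series with n = 5
even_series = [1, 2, 8, 20]     # hypothetical series with n = 4

print(median(odd_series))    # 5   (the 3rd item after sorting)
print(median(even_series))   # 5.0 (average of 2 and 8; not in the series)

# Mean x n gives back the total; median x n generally does not
n = len(odd_series)
print(mean(odd_series) * n, median(odd_series) * n, sum(odd_series))
```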
Mode:
The value of the variable which occurs most frequently in a distribution
is called the mode.

Merits of Mode:
(1) Simple and popular: Mode is a very simple measure of central tendency. Sometimes just a
glance at the series is enough to locate the modal value. Because of its simplicity, it is a
very popular measure of central tendency.
(2) Less effect of marginal values: Compared to the mean, the mode is less affected by marginal
values in the series. The mode is determined only by the value with the highest frequency.
(3) Graphic presentation: The mode can be located graphically, with the help of a histogram.
(4) Best representative: The mode is the value which occurs most frequently in the series.
Accordingly, the mode is the best representative value of the series.
(5) No need of knowing all the items or frequencies: The calculation of the mode does not
require knowledge of all the items and frequencies of a distribution. In a simple series, it
is enough if one knows the items with the highest frequencies in the distribution.

Demerits of Mode:
Following are the various demerits of mode:
(1) Uncertain and vague: Mode is an uncertain and vague measure of central tendency.
(2) Not capable of algebraic treatment: Unlike the mean, the mode is not capable of further
algebraic treatment.
(3) Difficult: When the frequencies of all items are identical, it is difficult to identify the
modal value.
(4) Complex procedure of grouping: Calculation of the mode involves a cumbersome procedure of
grouping the data. If the extent of grouping changes, the modal value will change.
(5) Ignores extreme marginal frequencies: It ignores extreme marginal frequencies. To that
extent, the modal value is not a representative value of all the items in a series.
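Python's `statistics.multimode` (available from Python 3.8) returns every value that shares the highest frequency, which makes the "uncertain and vague" and "difficult" demerits easy to see. The shirt-size data are hypothetical.

```python
from statistics import multimode

# Hypothetical neck sizes (cm) of shirts sold in a week
sizes_sold = [38, 40, 40, 42, 40, 39, 42]
print(multimode(sizes_sold))   # [40]: one clear modal value

bimodal = [38, 38, 40, 40, 42]
print(multimode(bimodal))      # [38, 40]: two modes, so the mode is not unique

uniform = [38, 39, 40, 41]
print(multimode(uniform))      # [38, 39, 40, 41]: all frequencies equal, no useful mode
```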
Managerial Applications of Mean, Median and Mode
Application of Statistics

Statistics are helpful to get a general idea about the type of earnings that are generated by the family
members.

Statistics are very useful in understanding large sets of data, and are especially helpful in seeing trends
over time. However, since they can also be used to skew results, we must be very careful in properly
understanding how they are calculated. When you are looking at averages to compare them over time, it is
very important that they are calculated the same way every year. Also, when someone tells you something
is an average, make sure you listen to the rest of the description. As we have seen, the average can change
significantly depending on how it is being calculated.
Using Averages: Mean, Median, and Mode
The median would be used in preference to the arithmetic mean whenever a distribution is significantly skewed or data
is difficult or expensive to measure. For example, salaries of employees, turnover of a large set of companies, time to
destruction in tests of components.
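The salary case can be sketched with hypothetical figures: a single very high salary pulls the mean far above what most employees earn, while the median stays close to the typical value.

```python
from statistics import mean, median

# Hypothetical monthly salaries (in thousands) in a small firm;
# the last value is the owner's salary, an extreme observation
salaries = [30, 32, 35, 36, 38, 40, 250]

avg = mean(salaries)
mid = median(salaries)
print(round(avg, 2), mid)  # 65.86 36

# How many employees earn less than the "average" salary?
print(sum(s < avg for s in salaries))  # 6 (out of 7)
```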

The mode would be used in preference to the median as the most useful measure of location when the 'most common'
or the 'most popular' item is required. For example number of customers in a queue, number of defects in a sample,
sales of shirts by neck sizes.

The arithmetic mean would be used in preference to any other average in symmetric distributions or where further
statistical calculations or analysis might be required. For example, number of items produced per day on a large
assembly line, number of orders received per month for a firm.
Unit II

Dispersion, measures of dispersion: range, QD, MD, SD, coefficient of variation, skewness, kurtosis


Dispersion
Dispersion in statistics is a way of describing how spread out a set
of data is. When a data set has a large dispersion, the values in the set
are widely scattered; when dispersion is small the items in the set are
tightly clustered.
Very basically, this set of data has a small dispersion:
1, 2, 2, 3, 3, 4
and this set has a wider dispersion:
0, 1, 20, 30, 40, 100
Dispersion:
Measures of Dispersion
Standard deviation: probably the most common measure of dispersion. It tells you how
spread out numbers are from the mean.
Interquartile range (IQR): describes where the bulk of the data lies (the middle fifty
percent).
Interdecile range: the difference between the first decile (10%) and the last decile
(90%).
Range: the difference between the smallest and largest number in a set of data.
Mean difference or difference in means: measures the absolute difference
between the mean values of two different groups (the term is common in clinical trials).
Median absolute deviation (MAD): the median of the absolute deviations from a
data set's median.
Quartiles: Numbers that split the data into four quarters (first, second, third, and
fourth quartiles).
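Several of these measures can be computed for the two example data sets with Python's `statistics` module. This is a sketch with an illustrative function name `dispersion_summary`; note that quartile conventions differ between textbooks and software, and `statistics.quantiles` uses its "exclusive" method by default, so a hand calculation using another convention may give slightly different quartiles.

```python
from statistics import median, pstdev, quantiles

def dispersion_summary(data):
    """A few of the measures listed above, for one data set."""
    q1, _, q3 = quantiles(data, n=4)   # quartiles ('exclusive' method by default)
    med = median(data)
    return {
        "range": max(data) - min(data),
        "iqr": q3 - q1,                               # interquartile range
        "sd": pstdev(data),                           # population standard deviation
        "mad": median(abs(x - med) for x in data),    # median absolute deviation
    }

small = [1, 2, 2, 3, 3, 4]        # the small-dispersion example
wide = [0, 1, 20, 30, 40, 100]    # the wide-dispersion example
for name, data in (("small", small), ("wide", wide)):
    print(name, dispersion_summary(data))
```

Every measure comes out larger for the second set, matching the description of it as having the wider dispersion.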
Range:
In statistics, the range of a set of data is the difference between the largest and
smallest values.
Its coefficient:
Formula:
Coefficient of range = (xm - xo)/(xm + xo), where xm = maximum value and
xo = minimum value
Relevance of studying the coefficient of variation:
The coefficient of variation is useful because the standard deviation of data must
always be understood in the context of the mean of the data. In contrast, the actual
value of the CV is independent of the unit in which the measurement has been
taken, so it is a dimensionless number. For comparison between data sets with
different units or widely different means, one should use the coefficient of
variation instead of the standard deviation.
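A minimal sketch of the comparison described above, assuming the usual definition CV = (σ / mean) × 100 with the population standard deviation. The share prices are hypothetical: both series have the same standard deviation, but relative to its mean the first is far more variable, and rescaling the units (rupees to paise) leaves the CV unchanged.

```python
from statistics import mean, pstdev

def coefficient_of_variation(data):
    """CV in percent: standard deviation relative to the mean."""
    return pstdev(data) / mean(data) * 100

share_a = [10, 12, 14]       # hypothetical prices in rupees
share_b = [100, 102, 104]    # hypothetical prices in rupees

print(pstdev(share_a), pstdev(share_b))      # equal spread in absolute terms
print(round(coefficient_of_variation(share_a), 2),
      round(coefficient_of_variation(share_b), 2))   # 13.61 1.6

share_a_paise = [100 * p for p in share_a]   # the same prices in paise
print(round(coefficient_of_variation(share_a_paise), 2))  # 13.61
```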
Mean deviation
Average of absolute differences (differences expressed without plus or
minus sign) between each value in a set and the average of all values of
that set is called the mean deviation.
For example, the average (arithmetic mean) of the set of values
1, 2, 3, 4 and 5 is 15 ÷ 5 = 3.
The differences between this average (3) and the values in the set are
2, 1, 0, -1 and -2;
the absolute differences are 2, 1, 0, 1 and 2. The average of these
numbers (6 ÷ 5) is 1.2, which is the mean deviation. Also called the mean
absolute deviation, it is used as a measure of dispersion where the number
of values or quantities is small; otherwise the standard deviation is used.
Coefficient of MD

Coefficient of M.D (about mean) = Mean Deviation from Mean/ Mean

Coefficient of M.D (about median) = Mean Deviation from Median/Median

Coefficient of M.D (about mode) = Mean Deviation from Mode/Mode
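The worked example and the coefficient of mean deviation can be checked with a few lines of Python. The helper name `mean_deviation` is illustrative; it takes the chosen average (mean, median or mode) as an argument.

```python
from statistics import mean, median

def mean_deviation(data, about):
    """Average absolute deviation of the data from a chosen average."""
    return sum(abs(x - about) for x in data) / len(data)

values = [1, 2, 3, 4, 5]              # the worked example above
md_mean = mean_deviation(values, mean(values))
print(md_mean)                                    # 1.2
print(round(md_mean / mean(values), 2))           # coefficient of M.D. about the mean: 0.4
print(mean_deviation(values, median(values)))     # about the median: also 1.2 here
```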


Quartile Deviation
It is based on the lower quartile $Q_1$ and the upper quartile $Q_3$. The difference $Q_3 - Q_1$ is called the interquartile range.
The difference divided by 2 is called the semi-interquartile range or the quartile deviation. Thus

$QD = \frac{Q_3 - Q_1}{2}$
The quartile deviation is a slightly better measure of absolute dispersion than the range, but it ignores the
observations in the tails. If we take different samples from a population and calculate their quartile
deviations, their values are quite likely to differ considerably. This is called sampling fluctuation. It is
not a popular measure of dispersion. The quartile deviation calculated from sample data does not help
us to draw any conclusion (inference) about the quartile deviation in the population.
Coefficient of Quartile deviation
In statistics, the quartile coefficient of dispersion is a descriptive statistic which measures dispersion and
which is used to make comparisons within and between data sets.

The statistic is easily computed using the first (Q1) and third (Q3) quartiles for each data set. The quartile
coefficient of dispersion is:[1]

$\frac{Q_3 - Q_1}{Q_3 + Q_1}$

Example
Consider the following two data sets:
A = {2, 4, 6, 8, 10, 12, 14}
n = 7, range = 12, mean = 8, median = 8, Q1 = 4, Q3 = 12, coefficient of dispersion = 0.5
B = {1.8, 2, 2.1, 2.4, 2.6, 2.9, 3}
n = 7, range = 1.2, mean = 2.4, median = 2.4, Q1 = 2, Q3 = 2.9, coefficient of dispersion = 0.18
The quartile coefficient of dispersion of data set A is 2.7 times as great (0.5 / 0.18) as that of data set B.
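The example can be reproduced with `statistics.quantiles`, whose default "exclusive" method places Q1 and Q3 at positions (n + 1)/4 and 3(n + 1)/4; for n = 7 these are the 2nd and 6th ordered values, matching the quartiles quoted for A and B. The function name `quartile_measures` is illustrative.

```python
from statistics import quantiles

def quartile_measures(data):
    """Q1, Q3, quartile deviation (Q3 - Q1)/2 and coefficient (Q3 - Q1)/(Q3 + Q1)."""
    q1, _, q3 = quantiles(data, n=4)
    qd = (q3 - q1) / 2                  # semi-interquartile range
    coefficient = (q3 - q1) / (q3 + q1)
    return q1, q3, qd, coefficient

A = [2, 4, 6, 8, 10, 12, 14]
B = [1.8, 2, 2.1, 2.4, 2.6, 2.9, 3]
for name, data in (("A", A), ("B", B)):
    q1, q3, qd, coef = quartile_measures(data)
    print(name, q1, q3, round(qd, 2), round(coef, 2))
```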
Standard Deviation
The standard deviation is a measure of how spread out numbers are. Its
symbol is σ (the Greek letter sigma). The formula is easy: it is the square root of the
variance.
For a finite set of numbers, the standard deviation is found by taking the square
root of the average of the squared deviations of the values from their average value.
For example, the marks of a class of eight students (that is, a population) are the
following eight values: 2, 4, 4, 4, 5, 5, 7, 9
These eight data points have a mean (average) of 5: (2 + 4 + 4 + 4 + 5 + 5 + 7 + 9)/8 = 40/8 = 5.
First, calculate the deviation of each data point from the mean, and square the
result of each: 9, 1, 1, 1, 0, 0, 4, 16
The variance is the mean of these values: (9 + 1 + 1 + 1 + 0 + 0 + 4 + 16)/8 = 32/8 = 4
and the population standard deviation is equal to the square root of the
variance: σ = √4 = 2
Coefficient of Standard Deviation: σ divided by the mean (multiplied by 100, this gives the coefficient of variation).
Variance: the average of the squared differences from the mean.
Interpretation
A large standard deviation indicates that the data points can spread far from the mean and a small
standard deviation indicates that they are clustered closely around the mean.

For example, each of the three populations {0, 0, 14, 14}, {0, 6, 8, 14} and {6, 6, 8, 8} has a mean of 7.
Their standard deviations are 7, 5, and 1, respectively. The third population has a much smaller
standard deviation than the other two because its values are all close to 7.
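Both worked examples above can be checked with the `statistics` module: `pvariance` and `pstdev` treat the data as a whole population (dividing by n), as in the text. The last line is an addition for contrast: if the same marks were only a sample, `stdev` divides by n - 1 instead.

```python
from statistics import pstdev, pvariance, stdev

marks = [2, 4, 4, 4, 5, 5, 7, 9]   # the class-marks example
print(pvariance(marks), pstdev(marks))   # variance 4, standard deviation 2

# The three populations with the same mean (7) but different spread
for population in ([0, 0, 14, 14], [0, 6, 8, 14], [6, 6, 8, 8]):
    print(population, pstdev(population))   # 7, 5 and 1 respectively

# Treating the marks as a sample: divide by n - 1, giving a larger value
print(round(stdev(marks), 3))
```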
Unit III
Theory of probability and probability distribution: mathematical
probability, trial and event, simple problems based on sample space,
binomial, Poisson and normal distributions and their application in business
decision making.
Probability
Probability is the measure of the likelihood that an event will occur.
Probability is quantified as a number between 0 and 1 (where 0 indicates
impossibility and 1 indicates certainty).
The higher the probability of an event, the more certain we are that the
event will occur.
A simple example is the toss of a fair (unbiased) coin. Since the two
outcomes are equally probable, the probability of "heads" equals the
probability of "tails", so each has probability 1/2 (a 50% chance).
Independent events

If two events A and B are independent, then the joint probability is

$P(A \cap B) = P(A)\,P(B)$

For example, if two coins are flipped, the chance of both being heads is $\frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$.[22]

Mutually exclusive events

If either event A or event B occurs on a single performance of an experiment, this is called
the union of the events A and B, denoted as $A \cup B$. If two events are mutually exclusive, then the
probability of either occurring is

$P(A \cup B) = P(A) + P(B)$

For example, the chance of rolling a 1 or 2 on a six-sided die is $P(1 \cup 2) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}$.


Not mutually exclusive events

If the events are not mutually exclusive, then

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

For example, when drawing a single card at random from a regular deck of cards, the chance of getting a
heart or a face card (J, Q, K) (or one that is both) is $\frac{13}{52} + \frac{12}{52} - \frac{3}{52} = \frac{22}{52} = \frac{11}{26}$, because of the 52 cards of a deck 13 are hearts, 12 are
face cards, and 3 are both: here the possibilities included in the "3 that are both" are included in each of
the "13 hearts" and the "12 face cards" but should only be counted once.

Conditional probability

Conditional probability is the probability of some event A, given the occurrence of some other event B.
Conditional probability is written $P(A \mid B)$, and is read "the probability of A, given B". It is defined by[23]

$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$.
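The probability rules above can be checked by listing a sample space and counting outcomes, assuming every outcome is equally likely. This sketch uses Python's `fractions.Fraction` for exact answers; the helper `probability` and the conditional example (a heart, given that the card is a face card) are illustrative additions.

```python
from fractions import Fraction
from itertools import product

def probability(event, sample_space):
    """Classical probability: favourable outcomes / equally likely outcomes."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))

# Independent events: two coin flips
coins = list(product("HT", repeat=2))            # HH, HT, TH, TT
p_two_heads = probability(lambda o: o == ("H", "H"), coins)

# Mutually exclusive events: one roll of a six-sided die
die = list(range(1, 7))
p_one_or_two = probability(lambda o: o in (1, 2), die)

# Not mutually exclusive: one card from a 52-card deck
ranks = ["A"] + [str(r) for r in range(2, 11)] + ["J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

def is_heart(card):
    return card[1] == "hearts"

def is_face(card):
    return card[0] in ("J", "Q", "K")

p_heart_or_face = probability(lambda c: is_heart(c) or is_face(c), deck)

# Conditional probability: P(heart | face card) = P(heart and face) / P(face)
p_heart_and_face = probability(lambda c: is_heart(c) and is_face(c), deck)
p_heart_given_face = p_heart_and_face / probability(is_face, deck)

print(p_two_heads, p_one_or_two, p_heart_or_face, p_heart_given_face)
# 1/4 1/3 11/26 1/4
```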
Trial and Event
Trial:
A fixed number of repetitions of the same experiment can be thought
of as a composed experiment, in which case the individual repetitions
are called trials. For example, if one were to toss the same coin one
hundred times and record each result, each toss would be considered
a trial within the experiment composed of all hundred tosses.

Event:
In probability theory, an event is a set of outcomes of an experiment (a
subset of the sample space) to which a probability is assigned.
Binomial
In probability theory and statistics, the binomial distribution with parameters n and p is the
discrete probability distribution of the number of successes in a sequence of n independent
yes/no experiments, each of which yields success with probability p.

A success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when n
= 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the
basis for the popular binomial test of statistical significance.

$P(X = r) = {}^{n}C_{r}\, p^{r} q^{n-r}$, where $q = 1 - p$ and $r = 0, 1, \ldots, n$

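When sampling is done without replacement from a finite population of size N, the trials are not
strictly independent; however, for N much larger than n, the binomial distribution is a good
approximation, and it is widely used.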
Assumptions
Use of the binomial distribution requires three assumptions:
Each replication of the process results in one of two possible outcomes (success
or failure),
The probability of success is the same for each replication, and
The replications are independent, meaning here that a success in one trial does
not influence the probability of success in another.
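A direct translation of the binomial formula, using `math.comb` (Python 3.8+) for nCr. The quality-control scenario (a 10% defect rate and a sample of 10 items) is hypothetical and is chosen only because it satisfies the three assumptions: two outcomes, constant p, independent items.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) = nCr * p^r * q^(n - r), with q = 1 - p."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)

# Hypothetical: 10% of items are defective; a sample of 10 items is inspected
n, p = 10, 0.1
exactly_two = binomial_pmf(2, n, p)
at_most_one = binomial_pmf(0, n, p) + binomial_pmf(1, n, p)
print(round(exactly_two, 4), round(at_most_one, 4))   # 0.1937 0.7361
```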
Poisson
The Poisson distribution, named after the French mathematician Siméon Denis Poisson, is a discrete
probability distribution that expresses the probability of a given number of events occurring in a fixed
interval of time and/or space if these events occur with a known average rate and independently of the
time since the last event. The Poisson distribution can also be used for the number of events in other
specified intervals such as distance, area or volume.
Poisson Distribution Example
Suppose you knew that the mean number of calls to a fire station on a weekday is 8. What is the
probability that on a given weekday there would be 11 calls? This problem can be solved using the
following formula based on the Poisson distribution:

$p(x) = \frac{e^{-\mu}\,\mu^{x}}{x!}$

where
e is the base of natural logarithms (2.7183)
μ is the mean number of "successes"
x is the number of "successes" in question

For this example,

$p(11) = \frac{e^{-8}\,8^{11}}{11!} \approx 0.072$

since the mean is 8 and the question pertains to 11 calls.
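The fire-station calculation can be reproduced with the standard `math` module, using p(x) = e^(-μ) μ^x / x!. The second quantity, the chance of 11 or more calls, is an added illustration of how a manager might use the same distribution for capacity planning.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """p(x) = e^(-mu) * mu^x / x!"""
    return exp(-mu) * mu**x / factorial(x)

p_eleven = poisson_pmf(11, 8)             # the fire-station example
print(round(p_eleven, 3))                 # 0.072

# Probability of 11 or more calls on a weekday
p_eleven_or_more = 1 - sum(poisson_pmf(x, 8) for x in range(11))
print(round(p_eleven_or_more, 3))
```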


Poisson Distribution Example
For instance, an individual keeping track of the amount of mail they receive each day may notice that they
receive an average of 4 letters per day. If receiving any particular piece of mail doesn't affect the
arrival times of future pieces of mail, i.e., if pieces of mail from a wide range of sources arrive
independently of one another, then a reasonable assumption is that the number of pieces of mail received
per day obeys a Poisson distribution.[2] Other examples that may follow a Poisson distribution: the number of phone
calls received by a call center per hour, the number of decay events per second from a radioactive source,
or the number of pedicabs in queue in a particular street in a given hour of a day.[3]


Normal Distribution
In probability theory, the normal (or Gaussian) distribution is a very common
continuous probability distribution. Normal distributions are important in statistics
and are often used in the natural and social sciences to represent real-valued random
variables whose distributions are not known.
The normal distribution is useful because of the central limit theorem. In its most
general form, under some conditions (which include finite variance), it states that
averages of random variables independently drawn from independent distributions
converge in distribution to the normal, that is, become normally distributed when the
number of random variables is sufficiently large. Physical quantities that are expected
to be the sum of many independent processes (such as measurement errors) often have
distributions that are nearly normal.
The normal distribution is sometimes informally called the bell curve or the Sombrero
distribution. However, many other distributions are bell-shaped (such as the Cauchy, Student's t
and logistic distributions).

The probability density of the normal distribution is:

$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

Here, μ is the mean or expectation of the distribution (and also its median and mode). The
parameter σ is its standard deviation, and its variance is then σ². A random variable with a
Gaussian distribution is said to be normally distributed and is called a normal deviate.
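A minimal sketch of normal-distribution calculations for a business question. It implements the density f(x) = exp(-(x - μ)² / (2σ²)) / √(2πσ²) directly and compares it with Python's `statistics.NormalDist` (Python 3.8+), which also provides the cumulative probability `cdf`. The sales figures (mean 500 units, standard deviation 50 units) are hypothetical.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

# Hypothetical: daily sales are normally distributed, mean 500 units, SD 50 units
sales = NormalDist(mu=500, sigma=50)

print(normal_pdf(520, 500, 50), sales.pdf(520))         # the two densities agree
print(round(1 - sales.cdf(600), 4))                     # P(sales > 600), about 0.0228
print(round(sales.cdf(550) - sales.cdf(450), 4))        # within one SD of the mean, about 0.6827
```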


Unit IV
Correlation and Regression Analysis: Karl Pearson Coefficient of correlation,
Rank correlation, repeated ranks, Spearman's rank correlation, Regression
equation, Regression Co-efficient, Time Series analysis and forecasting
Correlation
Degree and type of relationship between any two or more quantities (variables) in which
they vary together over a period; for example, variation in the level of expenditure or
savings with variation in the level of income.

A positive correlation exists where the high values of one variable are associated with the
high values of the other variable(s).

A 'negative correlation' means association of high values of one with the low values of the
other(s).
Karl Pearson Coefficient of Correlation
The value of r is such that -1 ≤ r ≤ +1. The + and - signs are used for positive linear
correlations and negative linear correlations, respectively.
Positive correlation: If x and y have a strong positive linear correlation, r is close to +1.
An r value of exactly +1 indicates a perfect positive fit. Positive values indicate a
relationship between the x and y variables such that as values for x increase, values for y
also increase.
Negative correlation: If x and y have a strong negative linear correlation, r is close to -1.
An r value of exactly -1 indicates a perfect negative fit. Negative values indicate a
relationship between x and y such that as values for x increase, values for y decrease.
No correlation: If there is no linear correlation or only a weak linear correlation, r is close
to 0. A value near zero means that there is no linear relationship; the variables may be
unrelated or related in a nonlinear way.
Note that r is a dimensionless quantity; that is, it does not depend on the units employed.
A correlation whose magnitude |r| is greater than 0.8 is generally described as strong, and one
whose magnitude is less than 0.5 is generally described as weak.
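Karl Pearson's r can be computed from deviations about the means, r = Σ(dx·dy) / √(Σdx² · Σdy²). The advertising and sales figures below are hypothetical. The second print illustrates the point that r is dimensionless: changing the unit of one variable leaves r unchanged.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's r from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical: advertising spend (lakh rupees) and sales (crore rupees)
advert = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
print(round(pearson_r(advert, sales), 4))   # 0.7746: positive, between "weak" and "strong"

# r is unchanged if advertising is measured in rupees instead of lakh rupees
print(round(pearson_r([a * 100_000 for a in advert], sales), 4))  # 0.7746
```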
Rank correlation
A rank correlation coefficient measures the degree of
similarity between two rankings, and can be used to assess the
significance of the relation between them. For example, two
common nonparametric methods of significance that use rank
correlation are the Mann-Whitney U test and the Wilcoxon
signed-rank test.
Spearman's rank correlation coefficient
In statistics, Spearman's rank correlation coefficient or Spearman's rho, named
after Charles Spearman and often denoted by the Greek letter ρ (rho) or as r_s, is a
nonparametric measure of statistical dependence between two variables. It
assesses how well the relationship between two variables can be described using a
monotonic function. If there are no repeated data values, a perfect Spearman
correlation of +1 or -1 occurs when each of the variables is a perfect monotone
function of the other.

Spearman's coefficient, like any correlation calculation, is appropriate for both
continuous and discrete variables, including ordinal variables.
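A sketch of Spearman's rho in two forms. Without ties, the familiar formula ρ = 1 - 6Σd² / (n(n² - 1)) applies. For repeated ranks, this sketch gives tied values the average of the ranks they occupy and then computes Pearson's r on those ranks; many textbooks instead add a correction term (m³ - m)/12 to Σd² for each group of m tied ranks, which is a different adjustment and not what the code does. The judges' rankings and all function names are hypothetical.

```python
def average_ranks(values):
    """Rank from 1 (smallest); repeated values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the block while the next value is a repeat
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1     # positions start..end hold ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

def spearman_formula(rank_x, rank_y):
    """rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)); exact only when there are no repeated ranks."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))

def spearman_rho(x, y):
    """Pearson's r computed on the (average) ranks; also usable with repeated ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical: two judges rank five candidates
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(spearman_formula(judge_a, judge_b), spearman_rho(judge_a, judge_b))  # both 0.8

# Repeated values share a rank
print(average_ranks([10, 20, 20, 30]))   # [1.0, 2.5, 2.5, 4.0]
```

Because it works on ranks, rho is exactly 1 for any strictly increasing relationship, even a curved one such as y = x².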
Regression equation
In statistical modeling, regression analysis is a statistical process for estimating the
relationships among variables. It includes many techniques for modeling and analyzing
several variables, when the focus is on the relationship between a dependent variable and
one or more independent variables (or 'predictors'). More specifically, regression analysis
helps one understand how the typical value of the dependent variable (or 'criterion variable')
changes when any one of the independent variables is varied, while the other independent
variables are held fixed.


Regression models

The unknown parameters, denoted as β, which may represent a scalar or a vector.
The independent variables, X.
The dependent variable, Y.
In various fields of application, different terminologies are used in place of dependent and
independent variables.
A regression model relates Y to a function of X and β:
$Y \approx f(X, \beta)$
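Method of Least Squares:
Formula:
Regression equation: $Y = a + bX$, where the slope is $b = r \cdot \frac{SD_y}{SD_x}$ and the intercept is $a = \bar{Y} - b\bar{X}$ ($r$ being Karl Pearson's coefficient of correlation)
Another formula for the slope:
$b = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2}$
Where
$b$ = the slope of the regression line
$a$ = the intercept of the regression line with the y-axis
$\bar{X}$ = mean of x values
$\bar{Y}$ = mean of y values
$SD_x$ = standard deviation of x
$SD_y$ = standard deviation of y
$N$ = number of pairs of observations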
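A minimal least-squares sketch. The first function uses the raw-sum slope formula b = (NΣXY - ΣXΣY) / (NΣX² - (ΣX)²) with a = Ȳ - bX̄. The second computes the same line from deviations about the means, b = Σ(dx·dy) / Σdx²; treating this as what the "method of deviation from the mean" refers to is an assumption. The data and function names are hypothetical.

```python
def fit_by_sums(x, y):
    """Slope b = (N*sum(XY) - sum(X)*sum(Y)) / (N*sum(X^2) - (sum(X))^2)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n          # a = mean(Y) - b * mean(X)
    return a, b

def fit_by_deviations(x, y):
    """Same line from deviations about the means: b = sum(dx*dy) / sum(dx^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((a - mean_x) * (c - mean_y) for a, c in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
    return mean_y - b * mean_x, b

# Hypothetical: advertising spend (X) and sales (Y)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
a, b = fit_by_sums(X, Y)
print(a, b)                      # intercept about 2.2, slope about 0.6
print(fit_by_deviations(X, Y))   # the same line
print(a + b * 6)                 # predicted Y when X = 6, about 5.8
```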
Method of deviation from the mean:
Time Series analysis and forecasting

A time series is a sequence of data points that

1) Consists of successive measurements made over a time interval

2) The time interval is continuous

3) The distance in this time interval between any two consecutive data points is the same

4) Each time unit in the time interval has at most one data point

Examples of time series are ocean tides, counts of sunspots, and the daily closing value of
the Dow Jones Industrial Average.
Time Series Analysis
Time series analysis comprises methods for analyzing time series data in
order to extract meaningful statistics and other characteristics of the data.

Time series forecasting is the use of a model to
predict future values based on previously observed values.
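One simple forecasting approach, sketched here under stated assumptions rather than as the method prescribed by the syllabus: smooth the series with moving averages, then fit a least-squares straight-line trend against time (t = 1, 2, ..., n) and extend it one period ahead. The quarterly sales figures and function names are hypothetical, and a straight-line trend ignores seasonal and cyclical components.

```python
def moving_average(series, period):
    """Simple moving averages of `period` consecutive values."""
    return [sum(series[i:i + period]) / period
            for i in range(len(series) - period + 1)]

def linear_trend(series):
    """Least-squares straight line y = a + b*t, with t = 1, 2, ..., n."""
    n = len(series)
    t = range(1, n + 1)
    mean_t, mean_y = (n + 1) / 2, sum(series) / n
    b = (sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, series))
         / sum((ti - mean_t) ** 2 for ti in t))
    return mean_y - b * mean_t, b

# Hypothetical quarterly sales (units)
sales = [120, 132, 128, 141, 150, 147]
print(moving_average(sales, 3))      # 3-quarter moving averages (smoothed series)
a, b = linear_trend(sales)
next_quarter = a + b * (len(sales) + 1)
print(round(b, 2), round(next_quarter, 1))   # trend per quarter and forecast for quarter 7
```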
Application of Time Series Analysis
Applications: The usage of time series models is twofold:
Identify the underlying forces and structure that produced the observed data
Fit a model and proceed to forecasting, monitoring or even feedback and feed forward
control.

Time Series Analysis is used for many applications, such as:

Economic forecasting, sales forecasting, budgetary analysis, stock market analysis,
yield projections, process and quality control, inventory studies, workload projections,
utility studies and census analysis.
Unit V
Sampling Concepts and Theory Z-test and T-test for
difference of Means and management F-test
Sampling Concepts
Z-test
T-test difference of Means
F-test
