STATISTICS ASSIGNMENT

Done by:

Albert Benjamin

1473

Sem IV

Q.1(a). Define the term correlation analysis. Explain briefly the types of correlation.
Correlation is a term that describes the strength of the relationship between two variables. A
strong, or high, correlation suggests that two or more variables have a strong relationship with
each other, while a weak, or low, correlation indicates that the variables are barely associated.
Correlation analysis is the procedure of measuring the strength of that relationship using
readily available statistical data.

Correlation is a statistical method that can show whether and how strongly pairs of variables
are related. For example, individuals of the same height vary in weight, and you can easily
think of two people you know where the shorter one is heavier than the taller one.
Correlation can tell you just how much of the variation in people's weights is related to their
heights.

While this particular correlation is fairly obvious, your data may contain unsuspected
correlations. You may also suspect there are correlations, but not know which are the
strongest. An intelligent correlation analysis can lead to a greater understanding of your
data.

When the variation of one variable reliably predicts a similar variation in another variable,
there is often a tendency to think that the change in one causes the change in the other.
However, correlation does not imply causation. There may be, for example, an unknown
factor that influences both variables.

For example: a number of studies report a positive correlation between the amount of
television children watch and the likelihood that they will become bullies. These studies
only report a correlation, not causation.

Correlation analysis measures the association between two or more variables; it attempts to
determine the degree of relationship between them.

It is frequently misunderstood that correlation analysis establishes cause and effect;
however, this is not the case, since other variables that are not present in the study may
have influenced the results.

If correlation is found between two variables, it means that when there is a systematic
change in one variable, there is also a systematic change in the other; the variables change
together over a certain period of time. When a correlation is found, depending on the
numerical values measured, it can be either negative or positive.

– Positive correlation exists if one variable increases simultaneously with the other, i.e. the
high numerical values of one variable correspond to the high numerical values of the other.

– Negative correlation exists if one variable decreases when the other increases, i.e. the high
numerical values of one variable correspond to the low numerical values of the other.

You can use statistical software packages such as SPSS to determine whether a relationship
between two variables exists, and how strong it may be. The statistical procedure will
produce a correlation coefficient that tells you this information.

The most widely used type of correlation coefficient is Pearson's r. This analysis assumes
that the two variables being examined are measured on at least interval scales, meaning
they are measured on a range of increasing values. The coefficient is computed by taking
the covariance of the two variables and dividing it by the product of their standard
deviations.
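As a sketch, this calculation can be written out directly in Python. The height and weight figures below are illustrative, not real measurements:

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

# Illustrative data: heights (cm) and weights (kg) of five people
heights = [150, 160, 165, 170, 180]
weights = [52, 56, 63, 64, 71]
print(round(pearson_r(heights, weights), 3))  # → 0.983
```

The value close to +1 says that, in this made-up sample, taller people tend to be heavier; it says nothing about why.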

Statistical analyses like these are useful because they can show us how different trends or
patterns within society may be linked, like unemployment and crime, for instance; and they
can clarify how experiences and social characteristics shape what happens in an
individual's life. Correlation analysis lets us say with confidence whether a relationship
does or does not exist between two different patterns or variables, which allows us to
predict the likelihood of an outcome in the population studied.

It is crucial to remember, though, that correlation is not the same as causation; the
distinction between the two is discussed below.

Correlation can be expressed as a single number which describes the degree of relationship
between two variables. The relationship between these two variables is described through a
single value: the coefficient.

“In the correlation between classroom seat location and course grades, it might be that

– sitting in front (Variable A) causes students to get better grades (Variable B),

– getting better grades (Variable B) causes students to sit in front (Variable A), or
– being more motivated (an unmeasured third variable, C) causes students both to sit in front
(Variable A) and to get better grades (Variable B).”

The bottom line is that it is impossible, from a correlation analysis alone, to determine
what causes what. You cannot know the cause-and-effect relationship between two
variables merely because a correlation exists between them. You will have to do more
analysis (such as designed experiments) to establish the cause-and-effect relationship.

Correlation analysis is used mainly as a data exploration method to reveal the degree of
association in a set of paired data. In many data sets, pairwise correlations may not provide
adequate insights, and multivariate exploratory analyses are recommended.

Correlation analysis contributes to the understanding of economic behaviour, helps in
locating the critically important variables on which others depend, may reveal to the
economist the connections by which disturbances spread, and may suggest the paths
through which stabilizing forces can become effective.

In business, correlation analysis enables the executive to estimate costs, selling prices and
other variables on the basis of some other series with which these costs, prices or sales may
be functionally related. Some of the uncertainty can be removed from decisions when the
relationship between a variable to be estimated and the one or more other variables on
which it depends is fairly invariant and close.

Types of correlation

In finance, correlation analysis is also used to observe how different securities and markets
move together; custom indicators are often built to highlight such inter-market
relationships. More generally, the main types of correlation are the following.

1. Positive and Negative Correlation: Whether the correlation between the variables is
positive or negative depends on its direction of change. The correlation is positive
when both the variables move in the same direction, i.e. when one variable increases
the other on an average also increases and if one variable decreases the other also
decreases. The correlation is said to be negative when both the variables move in the
opposite direction, i.e. when one variable increases the other decreases and vice versa.
2. Simple, Partial and Multiple Correlation: Whether the correlation is simple, partial or
multiple depends on the number of variables studied. The correlation is said to be
simple when only two variables are studied. The correlation is either multiple or
partial when three or more variables are studied. The correlation is said to be multiple
when three variables are studied simultaneously. Such as, if we want to study the
relationship between the yield of wheat per acre and the amount of fertilizers and
rainfall used, then it is a problem of multiple correlations.
Whereas, in the case of a partial correlation we study more than two variables, but
consider only two among them that would be influencing each other such that the
effect of the other influencing variable is kept constant. Such as, in the above
example, if we study the relationship between the yield and fertilizers used during the
periods when a certain average temperature existed, then it is a problem of partial
correlation.
3. Linear and Nonlinear (Curvilinear) Correlation: Whether the correlation between the
variables is linear or non-linear depends on the constancy of ratio of change between
the variables. The correlation is said to be linear when the amount of change in one
variable to the amount of change in another variable tends to bear a constant ratio. For
example, from the values of two variables given below, it is clear that the ratio of
change between the variables is the same:
X: 10 20 30 40 50
Y: 20 40 60 80 100

The correlation is called non-linear or curvilinear when the amount of change in one
variable does not bear a constant ratio to the amount of change in the other variable. For
example, if the amount of fertilizer is doubled, the yield of wheat would not necessarily
be doubled.

Thus, these are the three most important types of correlation, classified on the basis of
movement, number of variables, and the ratio of change between the variables. The
researcher must study these carefully to determine the correlation methods to be used to
identify the extent to which the variables are correlated.

(b). What is the coefficient of correlation? Discuss the properties of the coefficient of
correlation.

Ans: Correlation coefficients are used in statistics to measure how strong a relationship is
between two variables. There are several types of correlation coefficient: Pearson’s
correlation (also called Pearson’s R) is a correlation coefficient commonly used in linear
regression. If you’re starting out in statistics, you’ll probably learn about Pearson’s R first. In
fact, when anyone refers to the correlation coefficient, they are usually talking about
Pearson’s.

Correlation Coefficient Formula: Definition

Correlation coefficient formulas are used to find how strong a relationship is between data.
The formulas return a value between -1 and 1, where:

● 1 indicates a strong positive relationship.
● -1 indicates a strong negative relationship.
● A result of zero indicates no relationship at all.

Meaning

● A correlation coefficient of 1 means that for every positive increase in one variable,
there is a positive increase of a fixed proportion in the other. For example, shoe sizes
go up in (almost) perfect correlation with foot length.

● A correlation coefficient of -1 means that for every positive increase in one variable,
there is a negative decrease of a fixed proportion in the other. For example, the
amount of gas in a tank decreases in (almost) perfect correlation with speed.

● Zero means that for every increase, there isn’t a positive or negative increase. The two
just aren’t related.

The absolute value of the correlation coefficient gives us the relationship strength. The larger
the number, the stronger the relationship. For example, |-.75| = .75, which has a stronger
relationship than .65.

Correlation coefficient formula

Pearson’s correlation coefficient formula:

r = Σ(xi − x̄)(yi − ȳ) / √[ Σ(xi − x̄)² × Σ(yi − ȳ)² ]
Two other formulas are commonly used: the sample correlation coefficient and the
population correlation coefficient.

Sample correlation coefficient:

rxy = sxy / (sx × sy)

where sx and sy are the sample standard deviations, and sxy is the sample covariance.

Population correlation coefficient:

ρxy = σxy / (σx × σy)

where σx and σy are the population standard deviations, and σxy is the population
covariance.

What is Pearson Correlation?

Correlation between sets of data is a measure of how well they are related. The most common
measure of correlation in stats is the Pearson Correlation. The full name is the Pearson
Product Moment Correlation (PPMC). It shows the linear relationship between two sets of
data.

Potential problems with Pearson correlation.

The PPMC is not able to tell the difference between dependent variables and independent
variables. For example, if you are trying to find the correlation between a high calorie diet
and diabetes, you might find a high correlation of .8. However, you could also get the same
result with the variables switched around. In other words, you could say that diabetes causes
a high calorie diet. That obviously makes no sense. Therefore, as a researcher you have to be
aware of the data you are plugging in. In addition, the PPMC will not give you any
information about the slope of the line; it only tells you whether there is a relationship.

Real Life Example


Pearson correlation is used in thousands of real-life situations. For example, scientists in
China wanted to know whether there was a relationship between how genetically different
weedy rice populations are. The goal was to find out the evolutionary potential of the rice.
Pearson's correlation between the two groups was analyzed. It showed a positive Pearson
product-moment correlation of between 0.783 and 0.895 for weedy rice populations. This
figure is quite high, which suggested a fairly strong relationship.

Meaning of the Linear Correlation Coefficient.

Pearson's correlation coefficient is a linear correlation coefficient that returns a value
between -1 and +1. A -1 means there is a strong negative correlation and +1 means that
there is a strong positive correlation. A 0 means that there is no correlation (this is also
called zero correlation).

Graphically, a strong negative correlation means that the plotted points follow a downward
slope from left to right: as the x-values increase, the y-values get smaller. A strong positive
correlation means that the points follow an upward slope from left to right: as the x-values
increase, the y-values get larger.

Properties of the Coefficient of Correlation

● The correlation coefficient is symmetrical with respect to X and Y, i.e. rXY = rYX.
● The correlation coefficient is the geometric mean of the two regression coefficients,
i.e. r = ±√(byx × bxy).
● The correlation coefficient is independent of origin and unit of measurement, i.e.
rXY = rUV.
● The correlation coefficient lies between –1 and +1, i.e. –1 ⩽ r ⩽ +1.
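These properties can be checked numerically. The sketch below uses a small, illustrative data set and hand-rolled helpers (`pearson_r`, `slope` are our own names, not a library API):

```python
def mean(v):
    return sum(v) / len(v)

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5

def slope(x, y):
    """Regression coefficient b_yx: slope of the regression of y on x."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

x = [2, 4, 6, 8, 10]
y = [3, 7, 8, 12, 14]
r = pearson_r(x, y)

assert abs(r - pearson_r(y, x)) < 1e-12                    # symmetry: rXY = rYX
u = [(a - 5) / 2 for a in x]                               # change origin and unit of X
v = [(b - 1) / 3 for b in y]                               # change origin and unit of Y
assert abs(r - pearson_r(u, v)) < 1e-9                     # invariance: rXY = rUV
assert abs(r - (slope(x, y) * slope(y, x)) ** 0.5) < 1e-9  # r = √(byx × bxy)
assert -1 <= r <= 1                                        # bounds
print(round(r, 3))  # → 0.987
```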

2. What is meant by the Consumer Price Index or Cost of Living Index? Explain

briefly the construction, utility and limitations.


Ans: A comprehensive measure used for estimation of price changes in a basket of goods and
services representative of consumption expenditure in an economy is called consumer price
index.

Description: The calculation involved in the estimation of CPI is quite rigorous. Various
categories and subcategories have been made for classifying consumption items and on the
basis of consumer categories like urban or rural. Based on these indices and sub indices
obtained, the final overall index of price is calculated mostly by national statistical agencies.
It is one of the most important statistics for an economy and is generally based on the
weighted average of the prices of commodities. It gives an idea of the cost of living.

Inflation is measured using CPI. The percentage change in this index over a period of time
gives the amount of inflation over that specific period, i.e. the increase in prices of a
representative basket of goods consumed.

Usage

CPI is widely used as an economic indicator. It is the most widely used measure of inflation
and, by proxy, of the effectiveness of the government’s economic policy. The CPI gives the
government, businesses, and citizens an idea about price changes in the economy, and can
act as a guide for making informed decisions about the economy.

The CPI and the components that make it up can also be used as a deflator for other economic
factors, including retail sales, hourly/weekly earnings and the value of a consumer’s dollar to
find its purchasing power. In this case, the dollar’s purchasing power declines when prices
increase.

The index can also be used to adjust people’s eligibility levels for certain types of
government assistance including Social Security and it automatically provides the cost-of-
living wage adjustments to domestic workers. According to the BLS, the cost-of-living
adjustments of more than 50 million people on Social Security, as well as military and
Federal Civil Services retirees are linked to the CPI.

Calculating the CPI for a single item

CPI = (cost of market basket in given year / cost of market basket in base year) × 100

or

CPI2 / CPI1 = price2 / price1

where 1 is usually the comparison year and CPI1 is usually an index of 100.
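A minimal sketch of the single-item calculation, with made-up basket costs:

```python
def cpi(cost_given_year, cost_base_year):
    """CPI = (cost of market basket in the given year / cost in the base year) x 100."""
    return cost_given_year / cost_base_year * 100

# Hypothetical basket: costs 200 in the base year, 230 in the given year
index = cpi(230, 200)
print(round(index, 1))        # → 115.0
# Inflation over the period is the percentage change of the index from its base of 100
print(round(index - 100, 1))  # → 15.0
```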

Calculating the CPI for multiple items

Many but not all price indices are weighted averages using weights that sum to 1 or 100.

Example: The prices of 85,000 items from 22,000 stores, and 35,000 rental units are added
together and averaged. They are weighted this way: Housing: 41.4%, Food and Beverage:
17.4%, Transport: 17.0%, Medical Care: 6.9%, Other: 6.9%, Apparel: 6.0%, Entertainment:
4.4%. Taxes (43%) are not included in the CPI computation.

CPI = ( Σi CPIi × weighti ) / Σi weighti

where the weighti terms do not necessarily sum to 1 or 100.
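A sketch of the weighted calculation, using the category weights quoted above together with hypothetical component index values:

```python
def weighted_cpi(component_indices, weights):
    """Weighted average of component indices; the weights need not sum to 1 or 100."""
    total = sum(c * w for c, w in zip(component_indices, weights))
    return total / sum(weights)

# Hypothetical component indices, paired with the example weights
indices = [110, 105, 120, 108, 104, 98, 102]      # Housing, Food, Transport, ...
weights = [41.4, 17.4, 17.0, 6.9, 6.9, 6.0, 4.4]  # these happen to sum to 100
print(round(weighted_cpi(indices, weights), 2))   # → 109.21
```

Housing dominates the result because it carries the largest weight.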

Limitations of CPI

Even though the consumer price index is the most common measure of inflation, it is
generally believed that the CPI overstates inflation by roughly 1 percentage point. This
upward bias exists because:

● CPI doesn't incorporate the substitution effect into the composition of the basket of
goods. For example, when the price of a good increases, consumers substitute away from it,
but the CPI doesn't include any mechanism to specifically reflect this.

● Even though the BLS attempts to address changes in quality, some quality changes
such as improvement in safety, etc. are not quantifiable. Hence, CPI doesn't take into
consideration such quality changes.

● Although new products generally have better quality, these goods are not included in
the CPI basket of goods immediately but are included only when they are consumed
by people fairly consistently. Another measure of inflation, the GDP deflator, includes
the effect of price changes of all personal consumption expenditures (excluding
imports).
● Many components of the CPI, such as food and energy, are highly volatile, which makes
the CPI a poor measure of long-term inflation. Core inflation addresses this
problem by considering only non-volatile items.

3. Explain briefly time series analysis and its components. Discuss its uses

and limitations.

Ans: A time series is a series of data points indexed (or listed or graphed) in time order. Most
commonly, a time series is a sequence taken at successive equally spaced points in time. Thus
it is a sequence of discrete-time data. Examples of time series are heights of ocean tides,
counts of sunspots, and the daily closing value of the Dow Jones Industrial Average.

Time series are very frequently plotted via line charts. Time series are used in statistics,
signal processing, pattern recognition, econometrics, mathematical finance, weather
forecasting, earthquake prediction, electroencephalography, control engineering, astronomy,
communications engineering, and largely in any domain of applied science and engineering
which involves temporal measurements.

Time series analysis comprises methods for analyzing time series data in order to extract
meaningful statistics and other characteristics of the data. Time series forecasting is the use of
a model to predict future values based on previously observed values. While regression
analysis is often employed in such a way as to test theories that the current values of one or
more independent time series affect the current value of another time series, this type of
analysis of time series is not called "time series analysis", which focuses on comparing values
of a single time series or multiple dependent time series at different points in time.
Interrupted time series analysis is the analysis of interventions on a single time series.

Time series data have a natural temporal ordering. This makes time series analysis distinct
from cross-sectional studies, in which there is no natural ordering of the observations (e.g.
explaining people's wages by reference to their respective education levels, where the
individuals' data could be entered in any order). Time series analysis is also distinct from
spatial data analysis where the observations typically relate to geographical locations (e.g.
accounting for house prices by the location as well as the intrinsic characteristics of the
houses). A stochastic model for a time series will generally reflect the fact that observations
close together in time will be more closely related than observations further apart. In addition,
time series models will often make use of the natural one-way ordering of time so that values
for a given period will be expressed as deriving in some way from past values, rather than
from future values (see time reversibility.)

Time series analysis can be applied to real-valued, continuous data, discrete numeric data, or
discrete symbolic data (i.e. sequences of characters, such as letters and words in the English
language).

Methods for analysis

Methods for time series analysis may be divided into two classes: frequency-domain methods
and time-domain methods. The former include spectral analysis and wavelet analysis; the
latter include auto-correlation and cross-correlation analysis. In the time domain, correlation
and analysis can be made in a filter-like manner using scaled correlation, thereby mitigating
the need to operate in the frequency domain.

Additionally, time series analysis techniques may be divided into parametric and non-
parametric methods. The parametric approaches assume that the underlying stationary
stochastic process has a certain structure which can be described using a small number of
parameters (for example, using an autoregressive or moving average model). In these
approaches, the task is to estimate the parameters of the model that describes the stochastic
process. By contrast, non-parametric approaches explicitly estimate the covariance or the
spectrum of the process without assuming that the process has any particular structure.

Methods of time series analysis may also be divided into linear and non-linear, and univariate
and multivariate.

Panel data

A time series is one type of panel data. Panel data is the general class, a multidimensional
data set, whereas a time series data set is a one-dimensional panel (as is a cross-sectional
dataset). A data set may exhibit characteristics of both panel data and time series data. One
way to tell is to ask what makes one data record unique from the other records. If the answer
is the time data field, then this is a time series data set candidate. If determining a unique
record requires a time data field and an additional identifier which is unrelated to time
(student ID, stock symbol, country code), then it is a panel data candidate. If the differentiation
lies on the non-time identifier, then the data set is a cross-sectional data set candidate.

Analysis

There are several types of motivation and data analysis available for time series,
appropriate for different purposes.

Motivation

In the context of statistics, econometrics, quantitative finance, seismology, meteorology, and
geophysics, the primary goal of time series analysis is forecasting. In the context of signal
processing, control engineering and communication engineering it is used for signal detection
and estimation. In the context of data mining, pattern recognition and machine learning, time
series analysis can be used for clustering, classification, query by content, anomaly detection
as well as forecasting.

Exploratory analysis

The clearest way to examine a regular time series manually is with a line chart, such as a
chart of tuberculosis incidence in the United States made with a spreadsheet program. In
that example, the number of cases was standardized to a rate per 100,000 and the percent
change per year in this rate was calculated. The nearly steadily dropping line shows that the
TB incidence was decreasing in most years, but the percent change in this rate varied by as
much as +/- 10%, with 'surges' in 1975 and around the early 1990s. The use of both vertical
axes allows the comparison of two time series in one graphic.

Other techniques include:

● Autocorrelation analysis to examine serial dependence

● Spectral analysis to examine cyclic behaviour which need not be related to seasonality.
For example, sunspot activity varies over 11-year cycles. Other common examples include
celestial phenomena, weather patterns, neural activity, commodity prices, and economic
activity.

● Separation into components representing trend, seasonality, slow and fast variation,
and cyclical irregularity: see trend estimation and decomposition of time series
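The first of these techniques, autocorrelation, can be sketched in a few lines of Python; the toy series below is strictly periodic, so the seasonal lag stands out:

```python
def autocorr(series, lag):
    """Lag-k autocorrelation: correlation of the series with a copy of itself shifted by k."""
    n = len(series)
    m = sum(series) / n
    denom = sum((v - m) ** 2 for v in series)
    num = sum((series[i] - m) * (series[i + lag] - m) for i in range(n - lag))
    return num / denom

s = [1, 3, 1, -1] * 10   # toy series repeating with period 4
print(autocorr(s, 4))    # → 0.9   (strong positive at the seasonal lag)
print(autocorr(s, 2))    # → -0.95 (negative half a period away)
```

A plot of `autocorr` for successive lags (the correlogram) is the usual diagnostic for serial dependence.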

Curve fitting

Curve fitting is the process of constructing a curve, or mathematical function, that has the
best fit to a series of data points, possibly subject to constraints. Curve fitting can involve
either interpolation, where an exact fit to the data is required, or smoothing, in which a
"smooth" function is constructed that approximately fits the data. A related topic is regression
analysis, which focuses more on questions of statistical inference such as how much
uncertainty is present in a curve that is fit to data observed with random errors. Fitted curves
can be used as an aid for data visualization, to infer values of a function where no data are
available, and to summarize the relationships among two or more variables. Extrapolation
refers to the use of a fitted curve beyond the range of the observed data, and is subject to a
degree of uncertainty since it may reflect the method used to construct the curve as much as it
reflects the observed data.

The construction of economic time series involves the estimation of some components for
some dates by interpolation between values ("benchmarks") for earlier and later dates.
Interpolation is estimation of an unknown quantity between two known quantities (historical
data), or drawing conclusions about missing information from the available information
("reading between the lines"). Interpolation is useful where the data surrounding the missing
data is available and its trend, seasonality, and longer-term cycles are known. This is often
done by using a related series known for all relevant dates. Alternatively polynomial
interpolation or spline interpolation is used where piecewise polynomial functions are fit into
time intervals such that they fit smoothly together. A different problem which is closely
related to interpolation is the approximation of a complicated function by a simple function
(also called regression). The main difference between regression and interpolation is that
polynomial regression gives a single polynomial that models the entire data set. Spline
interpolation, however, yields a piecewise continuous function composed of many
polynomials to model the data set.

Extrapolation is the process of estimating, beyond the original observation range, the value of
a variable on the basis of its relationship with another variable. It is similar to interpolation,
which produces estimates between known observations, but extrapolation is subject to greater
uncertainty and a higher risk of producing meaningless results.
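A minimal illustration of fitting, interpolation and extrapolation with a least-squares straight line; the yearly values are invented for the example:

```python
def fit_line(xs, ys):
    """Least-squares fit of a straight line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b   # intercept a, slope b

years = [2015, 2016, 2017, 2018]   # hypothetical observations
values = [10.0, 12.1, 13.9, 16.0]
a, b = fit_line(years, values)

interpolated = a + b * 2016.5      # inside the observed range
extrapolated = a + b * 2020        # beyond it: subject to greater uncertainty
print(round(interpolated, 2), round(extrapolated, 2))  # → 13.0 19.93
```

The fitted line is reasonable between 2015 and 2018; the 2020 figure simply assumes the linear pattern continues, which is exactly the risk extrapolation carries.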

Function approximation
In general, a function approximation problem asks us to select a function among a well-
defined class that closely matches ("approximates") a target function in a task-specific way.
One can distinguish two major classes of function approximation problems: First, for known
target functions approximation theory is the branch of numerical analysis that investigates
how certain known functions (for example, special functions) can be approximated by a
specific class of functions (for example, polynomials or rational functions) that often have
desirable properties (inexpensive computation, continuity, integral and limit values, etc.).

Second, the target function, call it g, may be unknown; instead of an explicit formula, only a
set of points (a time series) of the form (x, g(x)) is provided. Depending on the structure of
the domain and codomain of g, several techniques for approximating g may be applicable.
For example, if g is an operation on the real numbers, techniques of interpolation,
extrapolation, regression analysis, and curve fitting can be used. If the codomain (range or
target set) of g is a finite set, one is dealing with a classification problem instead. A related
problem of online time series approximation is to summarize the data in one-pass and
construct an approximate representation that can support a variety of time series queries with
bounds on worst-case error.

To some extent the different problems (regression, classification, fitness approximation) have
received a unified treatment in statistical learning theory, where they are viewed as
supervised learning problems.

Prediction and forecasting

In statistics, prediction is a part of statistical inference. One particular approach to such
inference is known as predictive inference, but the prediction can be undertaken within any of
the several approaches to statistical inference. Indeed, one description of statistics is that it
provides a means of transferring knowledge about a sample of a population to the whole
population, and to other related populations, which is not necessarily the same as prediction
over time. When information is transferred across time, often to specific points in time, the
process is known as forecasting.

Time series models are used for prediction in two broad ways:

● Fully formed statistical models for stochastic simulation purposes, so as to generate
alternative versions of the time series, representing what might happen over non-specific
time periods in the future.

● Simple or fully formed statistical models to describe the likely outcome of the time series
in the immediate future, given knowledge of the most recent outcomes (forecasting).

Forecasting on time series is usually done using automated statistical software packages and
programming languages, such as Apache Spark, Julia, Python, R, SAS, SPSS and many
others.

Forecasting on large-scale data is done using Spark, which has spark-ts as a third-party
package.
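As a sketch of the kind of method such packages automate, single exponential smoothing, one of the simplest forecasting techniques, can be written directly; the demand figures are illustrative:

```python
def ses_forecast(series, alpha=0.5):
    """Single exponential smoothing; the final smoothed level is the one-step-ahead forecast."""
    level = series[0]
    for value in series[1:]:
        # new level = alpha * latest observation + (1 - alpha) * previous level
        level = alpha * value + (1 - alpha) * level
    return level

demand = [20, 22, 21, 24, 26]           # illustrative monthly observations
print(ses_forecast(demand, alpha=0.5))  # → 24.25
```

Larger `alpha` weights recent observations more heavily; smaller `alpha` smooths more aggressively.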

Classification

Assigning a time series pattern to a specific category: for example, identifying a word based
on a series of hand movements in sign language.

Signal estimation

This approach is based on harmonic analysis and filtering of signals in the frequency domain
using the Fourier transform, and spectral density estimation, the development of which was
significantly accelerated during World War II by mathematician Norbert Wiener, electrical
engineers Rudolf E. Kálmán, Dennis Gabor and others for filtering signals from noise and
predicting signal values at a certain point in time. See Kalman filter, Estimation theory, and
Digital signal processing.

Segmentation

Splitting a time-series into a sequence of segments. It is often the case that a time-series can
be represented as a sequence of individual segments, each with its own characteristic
properties. For example, the audio signal from a conference call can be partitioned into pieces
corresponding to the times during which each person was speaking. In time-series
segmentation, the goal is to identify the segment boundary points in the time-series, and to
characterize the dynamical properties associated with each segment. One can approach this
problem using change-point detection, or by modeling the time-series as a more sophisticated
system, such as a Markov jump linear system.
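A brute-force sketch of single change-point detection: try every boundary and keep the one that minimises the within-segment squared error (a simple stand-in for the more sophisticated models mentioned above; the signal is invented):

```python
def change_point(series):
    """Single change-point: the split that minimises total within-segment squared error."""
    def sse(seg):
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg)
    # cost of splitting into series[:k] and series[k:], for every possible boundary k
    costs = {k: sse(series[:k]) + sse(series[k:]) for k in range(1, len(series))}
    return min(costs, key=costs.get)

signal = [1, 1, 2, 1, 1, 5, 6, 5, 5, 6]   # the level shifts from ~1 to ~5 at index 5
print(change_point(signal))               # → 5
```

Repeating the search within each segment extends this to multiple change points, though dedicated methods do so far more efficiently.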

Uses of Time Series

● The most important use of studying time series is that it helps us to predict the future
behaviour of the variable based on past experience
● It is helpful for business planning as it helps in comparing the actual current
performance with the expected one

● From time series, we get to study the past behaviour of the phenomenon or the
variable under consideration

● We can compare the changes in the values of different variables at different times or
places, etc.

Components for Time Series Analysis

The various reasons or the forces which affect the values of an observation in a time series
are the components of a time series. The four categories of the components of time series are:

● Trend

● Seasonal Variations

● Cyclic Variations

● Random or Irregular movements

Seasonal and Cyclic Variations are the periodic changes or short-term fluctuations.

Trend

The trend shows the general tendency of the data to increase or decrease during a long
period of time. A trend is a smooth, general, long-term, average tendency. It is not always
necessary that the increase or decrease is in the same direction throughout the given
period of time.

The tendency may increase, decrease, or remain stable in different sections of time, but
the overall trend must be upward, downward, or stable. Population, agricultural
production, items manufactured, numbers of births and deaths, and numbers of factories,
schools, or colleges are examples of series showing some kind of tendency of movement.

Linear and Non-Linear Trend


If we plot the time-series values on a graph against time t, the pattern of the data
clustering shows the type of trend. If the data cluster more or less around a straight
line, the trend is linear; otherwise it is non-linear (curvilinear).
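A quick way to check for a linear trend is to fit a straight line yt = a + b·t by least squares and inspect the fitted slope and residuals. The sketch below assumes NumPy; the data values are made up for illustration.

```python
# Least-squares fit of a linear trend y = a + b*t (illustrative data).
import numpy as np

t = np.arange(10)
noise = np.array([0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.1, 0.0, -0.1])
y = 3.0 + 2.0 * t + noise        # true trend: intercept 3, slope 2

b, a = np.polyfit(t, y, deg=1)   # polyfit returns highest degree first
trend = a + b * t                # fitted trend line
print(round(b, 1), round(a, 1))  # close to the true slope 2 and intercept 3
```

If the residuals y − trend still show systematic curvature, a non-linear (curvilinear) trend model would be more appropriate.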

Periodic Fluctuations

There are some components in a time series which tend to repeat themselves over a
certain period of time, acting in a regular, periodic manner.

Seasonal Variations

These are the rhythmic forces which operate in a regular and periodic manner over a span
of less than a year. They have the same or almost the same pattern during a period of 12
months. This variation will be present in a time series if the data are recorded hourly,
daily, weekly, quarterly, or monthly.

These variations arise either from natural forces or from man-made conventions. The
various seasons or climatic conditions play an important role in seasonal variations: crop
production depends on the seasons, sales of umbrellas and raincoats rise in the rainy
season, and sales of electric fans and air conditioners shoot up in summer.

The effect of man-made conventions such as festivals, customs, habits, fashions, and
occasions like marriages is easily noticeable; they recur year after year. An upswing in a
particular season should therefore not be taken as an indicator of better business
conditions.
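One common way to quantify seasonal variation is the method of simple averages: average each season across years, then scale the averages so that they centre on 100. The quarterly sales figures below are made up for the sketch, and NumPy is assumed.

```python
# Seasonal indices by the method of simple averages (illustrative data).
import numpy as np

# Three years of quarterly sales, one row per year (hypothetical figures).
sales = np.array([
    [20, 35, 50, 25],
    [22, 37, 54, 27],
    [24, 39, 58, 29],
], dtype=float)

quarter_means = sales.mean(axis=0)                   # average per quarter
indices = 100 * quarter_means / quarter_means.mean() # scale so mean index = 100
print(np.round(indices, 1))
```

An index above 100 (here Q3) marks a quarter that is seasonally strong; comparing a quarter's actual figure against its index separates genuine growth from the routine seasonal upswing.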

Cyclic Variations

The variations in a time series which operate over a span of more than one year are the
cyclic variations. This oscillatory movement has a period of oscillation of more than a
year, and one complete period is a cycle. This cyclic movement is sometimes called the
‘Business Cycle’.

It is a four-phase cycle comprising the phases of prosperity, recession, depression, and
recovery. Cyclic variations may be fairly regular, but they are not strictly periodic. The
upswings and downswings in business depend upon the joint nature of the economic
forces and the interaction between them.

Random or Irregular Movements


There is another factor which causes variation in the variable under study. These are not
regular variations; they are purely random or irregular. Such fluctuations are unforeseen,
uncontrollable, unpredictable, and erratic. Examples of the forces behind them are
earthquakes, wars, floods, famines, and other disasters.

Mathematical Model for Time Series Analysis

yt = f (t)

Here, yt is the value of the variable under study at time t. If, say, population is the
variable under study at the time periods t1, t2, t3, …, tn, then the time series is

t: t1, t2, t3, … , tn

yt: yt1, yt2, yt3, …, ytn

or, t: t1, t2, t3, … , tn

yt: y1, y2, y3, … , yn

Additive Model for Time Series Analysis

Let yt be the time-series value at time t, and let Tt, St, Ct, and Rt be the trend,
seasonal, cyclic, and random components at time t respectively. According to the additive
model, a time series can be expressed as

yt = Tt + St + Ct + Rt.

This model assumes that all four components of the time series act independently of each
other.

Multiplicative Model for Time Series Analysis

The multiplicative model assumes that the various components in a time series operate
proportionately to each other. According to this model

yt = Tt × St × Ct × Rt

Mixed models
Different assumptions lead to different combinations of additive and multiplicative
models as

yt = Tt + St + Ct × Rt.

The analysis can also be done using models such as yt = Tt + St × Ct × Rt or yt = Tt × Ct
+ St × Rt, and so on.
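The additive and multiplicative compositions can be contrasted on the same synthetic components. All the series below are invented for illustration (and the multiplicative components are expressed as ratios around 1 rather than absolute amounts); NumPy is assumed.

```python
# Additive vs multiplicative composition of synthetic components (sketch).
import numpy as np

t = np.arange(24)                          # two years of monthly observations
T = 100 + 2 * t                            # trend component Tt
S = 10 * np.sin(2 * np.pi * t / 12)        # seasonal component St (12-month period)
C = 5 * np.sin(2 * np.pi * t / 48)         # cyclic component Ct (period > 1 year)
rng = np.random.default_rng(0)
R = rng.normal(0, 1, size=t.size)          # random component Rt

additive = T + S + C + R                   # yt = Tt + St + Ct + Rt
# In the multiplicative model the components act proportionately, so
# they enter as relative factors around 1:
multiplicative = T * (1 + S / 100) * (1 + C / 100) * (1 + R / 100)

print(additive[:3])
print(multiplicative[:3])
```

In the additive series the seasonal swing stays a constant ±10 units, while in the multiplicative series the swing grows with the trend level, which is the practical criterion for choosing between the two models.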
