Data Analytics
Prof. Rudra Pradhan
IIT Kharagpur
Preamble
Why is it?
Course coverage
What is Data Analytics
Analytics is the discovery and communication of meaningful patterns in
data.
Especially valuable in areas rich with recorded information, analytics
relies on the simultaneous application of statistics, econometrics, computer
programming and operations research to quantify performance.
Analytics often favors data visualization to communicate insight.
What is Data Analysis
Is it reliable?
Principles of Modelling
Object/System
Why? What are we looking for?
Find? What do we want to know?
Model
Variables, Parameters
Model Prediction
Valid, Accepted predictions
Test
Basic Understandings
Data
Variables
Scaling
Models: S&
Tools: statistics, mathematics, econometrics, operations research
Statistical Modeling
Mathematical Modeling
Soft Computing
Modeling Structure
Theory
Assumptions
Objectives
Constraints
Modelling: it shows the relationships, direct and indirect, and the interrelationships of
actions and reactions in terms of cause and effect.
Two types: descriptive and predictive
Both dynamic and static
Examples of the kind of problems that
may be solved by an Econometrician
1. Testing whether financial markets are weak-form
informationally efficient.
2. Testing whether the CAPM or APT represent superior
models for the determination of returns on risky assets.
3. Measuring and forecasting the volatility of bond returns.
4. Explaining the determinants of bond credit ratings used
by the ratings agencies.
5. Modelling long-term relationships between prices and
exchange rates.
Examples of the kind of problems that
may be solved by an Econometrician (cont'd)
6. Determining the optimal hedge ratio for a spot position in
oil.
7. Testing technical trading rules to determine which makes
the most money.
8. Testing the hypothesis that earnings or dividend
announcements have no effect on stock prices.
9. Testing whether spot or futures markets react more rapidly
to news.
10. Forecasting the correlation between the returns to the
stock indices of two countries.
What are the Special Characteristics
of Financial Data?
Frequency & quantity of data
Stock market prices are measured every time there is a trade or
somebody posts a new quote.
Quality
Recorded asset prices are usually those at which the transaction took
place. No possibility for measurement error but financial data are "noisy".
Types of Data and Notation
There are 3 types of data which econometricians might use for analysis:
1. Time series data
2. Cross-sectional data
3. Panel data, a combination of 1. & 2.
The data may be quantitative (e.g. exchange rates, stock prices, number of
shares outstanding), or qualitative (e.g. day of the week).
Examples of time series data
Series                             Frequency
GNP or unemployment                monthly, or quarterly
government budget deficit          annually
money supply                       weekly
value of a stock market index      as transactions occur
Types of Data and Notation (cont'd)
Examples of Problems that Could be Tackled Using a Time Series Regression
- How the value of a country's stock index has varied with that country's
macroeconomic fundamentals.
- How the value of a company's stock price has varied when it announced the
value of its dividend payment.
- The effect on a country's currency of an increase in its interest rate
Cross-sectional data are data on one or more variables collected at a single
point in time, e.g.
- A poll of usage of internet stock broking services
- Cross-section of stock returns on the New York Stock Exchange
- A sample of bond credit ratings for UK banks
Types of Data and Notation (cont'd)
Examples of Problems that Could be Tackled Using a Cross-Sectional Regression
- The relationship between company size and the return to investing in its shares
- The relationship between a country's GDP level and the probability that the
government will default on its sovereign debt.
Panel Data has the dimensions of both time series and cross-sections, e.g. the
daily prices of a number of blue chip stocks over two years.
It is common to denote each observation by the letter t and the total number of
observations by T for time series data, and to denote each observation by the
letter i and the total number of observations by N for cross-sectional data.
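The three data types and the t/T, i/N notation can be illustrated with a small sketch. The tickers and prices below are made up for illustration and do not come from the slides.

```python
# Hypothetical daily closing prices for three made-up tickers.
panel = {
    "AAA": [10.0, 10.2, 10.1, 10.4],
    "BBB": [25.0, 24.8, 25.3, 25.1],
    "CCC": [7.5, 7.6, 7.4, 7.7],
}

# Time series: one entity observed at t = 1..T.
time_series = panel["AAA"]
T = len(time_series)

# Cross-section: entities i = 1..N observed at a single point in time.
t = 2  # zero-based position of the chosen date (an arbitrary choice)
cross_section = {name: series[t] for name, series in panel.items()}
N = len(cross_section)

# A balanced panel holds N x T observations in total.
print(T, N, T * N)
```

Slicing the same structure along one entity gives a time series, and slicing along one date gives a cross-section, which is why panel data is described as a combination of the two.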
Returns in Financial Modelling
It is preferable not to work directly with asset prices, so we usually convert the
raw prices into a series of returns. There are two ways to do this:
Simple returns
$$R_t = \frac{p_t - p_{t-1}}{p_{t-1}} \times 100\%$$
or log returns
$$R_t = \ln\left(\frac{p_t}{p_{t-1}}\right) \times 100\%$$
where $R_t$ denotes the return at time t,
$p_t$ denotes the asset price at time t, and
ln denotes the natural logarithm.
We also ignore any dividend payments, or alternatively assume that the price
series have been already adjusted to account for them.
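As a minimal sketch of the two return definitions, the following computes percentage simple and log returns from a price list. The function names and the price series are illustrative assumptions, not part of the original slides.

```python
import math


def simple_returns(prices):
    """Percentage simple returns: (p_t - p_{t-1}) / p_{t-1} * 100."""
    return [(p1 - p0) / p0 * 100.0 for p0, p1 in zip(prices, prices[1:])]


def log_returns(prices):
    """Percentage log returns: ln(p_t / p_{t-1}) * 100."""
    return [math.log(p1 / p0) * 100.0 for p0, p1 in zip(prices, prices[1:])]


# Hypothetical price series, for illustration only.
prices = [100.0, 102.0, 101.0, 105.0]

print(simple_returns(prices))
print(log_returns(prices))
```

For small price changes the two measures are close; the log return is always slightly below the simple return because ln(1 + x) < x for any nonzero x.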
Log Returns
The log returns are also known as log price relatives, which will be used throughout this
book. There are a number of reasons for this:
1. They have the nice property that they can be interpreted as continuously
compounded returns.
2. Can add them up, e.g. if we want a weekly return and we have calculated
daily log returns:
\begin{align*}
r_1 &= \ln(p_1/p_0) = \ln p_1 - \ln p_0 \\
r_2 &= \ln(p_2/p_1) = \ln p_2 - \ln p_1 \\
r_3 &= \ln(p_3/p_2) = \ln p_3 - \ln p_2 \\
r_4 &= \ln(p_4/p_3) = \ln p_4 - \ln p_3 \\
r_5 &= \ln(p_5/p_4) = \ln p_5 - \ln p_4 \\
r_1 + r_2 + r_3 + r_4 + r_5 &= \ln p_5 - \ln p_0 = \ln(p_5/p_0)
\end{align*}
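The additivity property can be checked numerically. This is a small sketch with made-up prices; returns are kept as decimals (no multiplication by 100) to match the r_1, ..., r_5 expressions.

```python
import math

# Hypothetical closing prices p_0 .. p_5 (one week of daily data).
prices = [50.0, 51.0, 50.5, 52.0, 51.5, 53.0]

# Daily log returns r_t = ln(p_t / p_{t-1}).
daily = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

# Weekly log return obtained two ways: summing the dailies, and directly.
weekly_from_sum = sum(daily)
weekly_direct = math.log(prices[-1] / prices[0])

print(weekly_from_sum, weekly_direct)
```

The two numbers agree up to floating-point rounding, because the intermediate log prices cancel in the sum.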
A Disadvantage of using Log Returns
There is a disadvantage of using the log-returns. The simple return on a
portfolio of assets is a weighted average of the simple returns on the
individual assets:
$$R_{pt} = \sum_{i=1}^{N} w_{ip} R_{it}$$
But this does not work for the continuously compounded returns.
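A short numerical sketch of this point, using a hypothetical two-asset portfolio (the weights and prices are assumptions chosen for illustration): the weighted average of simple returns gives the portfolio's simple return, while the weighted average of log returns does not give the portfolio's log return.

```python
import math

# Hypothetical two-asset portfolio: initial value weights and one-period prices.
weights = [0.6, 0.4]
p_start = [100.0, 50.0]
p_end = [110.0, 45.0]

simple = [(e - s) / s for s, e in zip(p_start, p_end)]
logs = [math.log(e / s) for s, e in zip(p_start, p_end)]

# Portfolio simple return: the weighted average of the asset simple returns.
port_simple = sum(w * r for w, r in zip(weights, simple))

# Portfolio log return, derived from the portfolio simple return.
port_log = math.log(1.0 + port_simple)

# Weighted average of asset log returns: generally NOT equal to port_log.
weighted_logs = sum(w * r for w, r in zip(weights, logs))

print(port_simple, port_log, weighted_logs)
```

Because the logarithm is concave, the weighted average of log returns never exceeds the true portfolio log return, and it falls short whenever the asset returns differ.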
Steps involved in the formulation of
econometric models
1. Economic or Financial Theory (Previous Studies)
2. Formulation of an Estimable Theoretical Model
3. Collection of Data
4. Model Estimation
5. Is the Model Statistically Adequate?
   No: Reformulate Model
   Yes: Interpret Model
6. Use for Analysis
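The estimation step and a first adequacy check can be sketched for the simplest case, a bivariate linear model y = a + b x estimated by ordinary least squares. The function name and data are hypothetical, and R-squared is used here only as a crude stand-in for the diagnostic tests a real study would run.

```python
def ols_fit(x, y):
    """Estimate y = a + b*x by ordinary least squares.

    Returns the intercept a, the slope b, and R-squared.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    # Residual and total sums of squares for the goodness-of-fit measure.
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return a, b, 1.0 - rss / tss


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b, r_squared = ols_fit(x, y)
print(a, b, r_squared)
```

If the fit or the diagnostics were judged inadequate, the flow above would send the analyst back to reformulate the model rather than on to interpretation.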
Some Points to Consider when reading papers
in the academic finance literature
1. Does the paper involve the development of a theoretical model or is it
merely a technique looking for an application, or an exercise in data
mining?
2. Is the data of "good quality"? Is it from a reliable source? Is the size of
the sample sufficiently large for asymptotic theory to be invoked?
3. Have the techniques been validly applied? Have diagnostic tests been
conducted for violations of any assumptions made in the estimation
of the model?
Some Points to Consider when reading papers
in the academic finance literature (cont'd)
4. Have the results been interpreted sensibly? Is the strength of the results
exaggerated? Do the results actually address the questions posed by the
authors?
5. Are the conclusions drawn appropriate given the results, or has the
importance of the results of the paper been overstated?
Objectives of Data Analytics
Data reduction
Structural simplification
Analysis of dependence
Analysis of interdependence
Prediction & Forecasting
Hypotheses construction and testing
Strategy and policy implications
Course Modules
Module 1: Basic Applied Econometrics
Basics, probability distribution, regression analysis, issues and problems of
regression analysis
Module 2: Advanced Econometrics
CRRM, PDM, SEM
Module 3: Time series Econometrics
Integration and co-integration, VAR modelling, volatility modelling,
bootstrapping
Module 4: Optimization Tools
Simple LPP, Integer programming, Goal programming, Simulation, AHP, WLP
Module 5: Soft computing
ANN, FL, GA, SVM
Modelling Structure
Univariate structure
Central tendency, dispersion, skewness, kurtosis
Bivariate structure
Covariance, correlation, regression
Multivariate structure
Correlation, regression, factor analysis, conjoint analysis, cluster analysis, path
analysis, MDS, AHP, SEM
Statistical Modelling: A Basic
Framework
Object/System
Research Design/Choice/Creativity
Univariate Modelling
Bivariate Modelling
Multivariate Modelling
Data Analysis
Interpretation and Conclusion
Research Process
Step 1: Define Research Problem
Step 2: Review of Literature
(Review concepts and theories;
review previous research findings)
Step 3: Formulate Hypotheses
Step 4: Research Design
Step 5: Data Collection
Step 6: Data Analysis
Step 7: Interpretation
Soft Computing: Basics
Soft computing is a term applied to a field within computer science which
is characterized by the use of inexact solutions to computationally hard
tasks such as the solution of non-deterministic polynomial (NP)-complete
problems, for which there is no known algorithm that can compute an
exact solution in polynomial time.
Soft computing differs from conventional (hard) computing in that, unlike
hard computing, it is tolerant of imprecision, uncertainty, partial truth, and
approximation. In effect, the role model for soft computing is the human
mind.
Tools of Soft Computing
Artificial neural networks (ANN)
Support Vector Machines (SVM)
Fuzzy logic (FL)
Evolutionary computation (EC), including:
Evolutionary algorithms
Genetic algorithms
Differential evolution
Metaheuristic and Swarm Intelligence
Ant colony optimization
Particle swarm optimization
Ideas about probability including:
Bayesian network
Chaos theory
Wavelet analysis
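To make the idea of an inexact solution to a hard problem concrete, here is a minimal genetic algorithm sketch applied to a small 0/1 knapsack instance (an NP-hard optimization problem). The items, capacity, and all parameter values are hypothetical choices for illustration. The result is a good feasible selection, not a guaranteed optimum.

```python
import random

# Hypothetical knapsack instance: (value, weight) pairs and a capacity.
ITEMS = [(10, 5), (40, 4), (30, 6), (50, 3), (35, 7), (25, 2), (15, 1), (20, 8)]
CAPACITY = 15


def fitness(bits):
    """Total value of the selected items, or 0 if the selection is overweight."""
    value = sum(v for (v, w), b in zip(ITEMS, bits) if b)
    weight = sum(w for (v, w), b in zip(ITEMS, bits) if b)
    return value if weight <= CAPACITY else 0


def genetic_knapsack(pop_size=30, generations=60, mutation_rate=0.05, seed=0):
    """Evolve bit strings (1 = item selected) and return the best one found."""
    rng = random.Random(seed)
    n = len(ITEMS)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        new_population = []
        for _ in range(pop_size):
            # Tournament selection: the fitter of two random individuals.
            parent_a = max(rng.sample(population, 2), key=fitness)
            parent_b = max(rng.sample(population, 2), key=fitness)
            # One-point crossover.
            cut = rng.randint(1, n - 1)
            child = parent_a[:cut] + parent_b[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            new_population.append(child)
        population = new_population
        # Keep track of the best individual seen in any generation.
        best = max(population + [best], key=fitness)
    return best, fitness(best)


best_bits, best_value = genetic_knapsack()
print(best_bits, best_value)
```

With only eight items the optimum could be found by exhaustive search; the point of the sketch is the mechanism (selection, crossover, mutation), which still applies when the search space is far too large to enumerate.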
Uni-variate Statistics
Central Tendency
Dispersion
Skewness
Kurtosis
Bi-variate Statistics
Covariance
Correlation
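The univariate and bivariate measures listed above can be computed directly from their moment definitions. This sketch uses population (divide-by-n) formulas and hypothetical data; sample-adjusted versions would differ slightly.

```python
import math


def moments(x):
    """Return mean, population standard deviation, skewness, and kurtosis."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2


def covariance(x, y):
    """Population covariance of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

mean, std, skew, kurt = moments(x)
print(mean, std, skew, kurt)
print(covariance(x, y), correlation(x, y))
```

Here x is symmetric, so its skewness is zero, and y is an exact linear function of x, so the correlation is one.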
Why Multivariate Modelling
Applicability: Client fields use these techniques
Quantification: Create the habit of looking at the strength of a
relationship, not just the significance.
Creativity: Make introductory statistics give techniques that let
students express their own interests.
Simulation:
Confidence intervals via bootstrapping;
hypothesis testing via randomization of
explanatory variables.
Geometry:
Regression as projection; ANOVA as
Pythagorean vector decomposition; p-values
from subtended angles.
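The two simulation ideas can be sketched with the standard library alone: a percentile bootstrap confidence interval for a mean, and a randomization (permutation) test that shuffles the explanatory variable. Function names, data, and the number of resamples are assumptions made for illustration.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(
        stat([rng.choice(data) for _ in range(n)]) for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def corr(x, y):
    """Pearson correlation coefficient."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Share of shuffles of x whose |correlation| with y reaches the observed one."""
    rng = random.Random(seed)
    observed = abs(corr(x, y))
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any real link between x and y
        if abs(corr(shuffled, y)) >= observed:
            hits += 1
    return hits / n_perm


# Hypothetical data, for illustration only.
sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9, 6.2, 4.4]
lower, upper = bootstrap_ci(sample)
print(lower, upper)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.3, 3.9, 6.1, 8.2, 9.8, 12.1, 14.0, 16.2, 17.9, 20.1]
p_value = permutation_pvalue(x, y)
print(p_value)
```

Neither procedure relies on a distributional formula: the bootstrap approximates the sampling distribution by resampling, and the permutation test builds the null distribution by destroying the association and seeing how often chance alone matches the observed strength.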
Data Modelling and Packaged
Software
SPSS
EVIEWS
MICROFIT
GAUSS
LIMDEP
MATLAB
AMOS
MINITAB
STATISTICA
RATS
SYSTAT
STATA
LISREL
SAS
TSP
SHAZAM
DEA