
The difference between the trendline and the actual line is known as the residual, error, or unexplained part.
log Y = a + b·log X
(1/Y)·(dY/dX) = b/X
b = (dY/Y) / (dX/X) = %ΔY / %ΔX
%ΔY = b · %ΔX
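The derivation above says the slope of a log-log regression is an elasticity: b is the % change in Y per 1% change in X. A minimal sketch with synthetic data (a known power law, so the values below are not from class):

```python
import numpy as np

# Synthetic power law: Y = 2 * X^0.5, so the true elasticity is b = 0.5.
X = np.linspace(1.0, 100.0, 200)
Y = 2.0 * X ** 0.5

# Fit logY = a + b*logX by ordinary least squares.
b, a = np.polyfit(np.log(X), np.log(Y), 1)

print(round(b, 3))          # slope = elasticity -> 0.5
print(round(np.exp(a), 3))  # exp(intercept) recovers the constant -> 2.0
```

Because the data follow the power law exactly, the log-log fit recovers b and the constant without error; with real data the slope is only an estimated elasticity.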
If we get large residuals, try taking logs (LN) to get a better correlation and a more appropriate model.
MSE = Mean Square Error (the mean of the squared residuals)
RMSE (Root MSE) gives a comparable measure, like the SD
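The two definitions above can be computed directly from the residuals. A minimal sketch with made-up actual and trendline values:

```python
import numpy as np

actual    = np.array([10.0, 12.0, 14.0, 16.0])
predicted = np.array([11.0, 11.0, 15.0, 16.0])  # trendline values

residuals = actual - predicted      # the unexplained part
mse  = np.mean(residuals ** 2)      # Mean Square Error
rmse = np.sqrt(mse)                 # back in the units of Y, comparable to SD

print(mse)   # 0.75
print(rmse)
```

Note that RMSE is in the same units as Y, which is why it "gives a comparable idea like SD" while MSE does not.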
Principle of Parsimony (don't overcomplicate for the sake of complicating)
Within-sample forecasts are used to check whether the model is accurate.
Out-of-sample forecasts are for predictive purposes.
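The within/out-of-sample distinction can be sketched with a simple holdout: fit on the early observations, then compare the RMSE on the fitted portion against the RMSE on the held-out points (synthetic data; the deliberate end-of-series shocks are an assumption for illustration):

```python
import numpy as np

# Hypothetical series: linear trend with small noise, then two large shocks.
x = np.arange(10, dtype=float)
y = 3.0 * x + 1.0 + np.array([0.1, -0.1, 0.2, -0.2, 0.0,
                              0.1, -0.1, 0.0, 5.0, 6.0])

# Fit on the first 8 points (within sample), evaluate on the last 2.
b, a = np.polyfit(x[:8], y[:8], 1)
rmse_in  = np.sqrt(np.mean((y[:8] - (a + b * x[:8])) ** 2))
rmse_out = np.sqrt(np.mean((y[8:] - (a + b * x[8:])) ** 2))

print(rmse_in < rmse_out)  # True: accurate in sample, worse out of sample
```

A model can look very accurate within sample and still forecast poorly out of sample, which is why both checks are listed separately in the notes.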

Class 2:
NON-LINEAR RELATIONSHIPS
Real-world data is rarely linear.
If a researcher were to ignore this, the results would be consistently under/over-estimated.
ONE solution is to TRANSFORM the data by taking LOGS.
This shrinks the scale and MAY make the data more linear.
NOTE: In the financial literature, taking logs and then the difference gives the rate of return.
In the regression results, the coefficient of X refers to the % change in Y for every 1% change in X.
One way to compare two different research methods is RMSE (Root Mean Square Error).
RMSE: Residuals option in Regression -> Square -> Average -> Square Root
MULTICOLLINEARITY
Multicollinearity may give you meaningless results: very high p-values for the t-tests but a very low p-value for the F-test.
It arises when two or more x variables are related to each other.
We only want Y to be related to the x variables.
Putting in x variables that are related to each other gives us meaningless results:
o t-test states that the variables are individually INSIGNIFICANT
o F-test states that as a whole the model is SIGNIFICANT
Run a correlation matrix to confirm.
Then choose ONE variable that acts as a proxy for all these variables.
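The correlation check the notes suggest can be sketched with synthetic data where one regressor is almost a copy of the other (the variables and coefficients here are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)  # nearly identical to x1
y  = 2.0 * x1 + rng.normal(size=200)        # y really depends on x1 only

# The correlation matrix confirms the two regressors move together,
# so ONE of them can serve as a proxy for both in the regression.
corr = np.corrcoef(np.vstack([x1, x2]))
print(round(corr[0, 1], 4))
```

With a pairwise correlation this close to 1, including both x1 and x2 in one regression would split the effect between them arbitrarily, producing the insignificant-t / significant-F symptom described above.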

Causality is checked by regression.
Comparisons are checked by two-sample, paired-sample, ANOVA and Chi-square tests.

Cluster Analysis
Binary Factor Analysis
Useful if we notice clustering of our data into two groups.
We look at a qualitative factor coded 1 or 0 (it does not matter which way this is coded).
This essentially splits the data into two groups where we can examine both causality and the comparison between the two groups.
Without this we get an average result that matches neither set of the data points.
Whatever we don't include a dummy for is called the base category (e.g. the base quarter).


FACTORS THAT ARE NOT BINARY
Use (n-1) dummy variables.
If using n dummy variables -> perfect multicollinearity -> ERROR message
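The (n-1) dummy rule can be sketched with pandas: `get_dummies(..., drop_first=True)` drops one category automatically, making it the base category (the quarter labels here are an assumed example):

```python
import pandas as pd

quarters = pd.Series(["Q1", "Q2", "Q3", "Q4", "Q1", "Q3"])

# n = 4 categories -> keep n-1 = 3 dummies; Q1 becomes the base category.
dummies = pd.get_dummies(quarters, prefix="q", drop_first=True)
print(list(dummies.columns))  # ['q_Q2', 'q_Q3', 'q_Q4']
```

Each retained dummy is then read as the difference from the base category; keeping all 4 would make the dummies sum to a constant column and trigger the perfect-multicollinearity error.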

With Weeks included, the errors are smaller.
Hypothesis: current salary has a stronger relation with weeks.
Hypothesis: starting salary was different for various people, which tends to normalise with weeks of experience.
HAVE WE INCLUDED ALL FACTORS?
Examine the residual plot.
Remember to click Residual Plots in Regression.
WANT -> this plot to be random.
Draw two parallel lines; the dots should fit randomly between them.
If NOT RANDOM and the residuals either...

Class 4:
GRETL:
File -> Open Data -> Import -> select No, as the input is not time-based.
CTRL-A -> Model -> Ordinary least squares
In the new model window -> Graphs -> Residual plot -> select any one graph
To check heteroskedasticity:
Click on Tests -> Heteroskedasticity -> Koenker
If the p-value is < 0.05, the residuals are not random (heteroskedasticity is present).

Class 5:
Multivariable regression analysis is used for causality, i.e. y = f(x1, x2, x3, ...).
ANOVA is used for relationships between multiple variables without causality.

1. Comparing two data sets: two-sample RM.
2. When comparing whether the number of students in another university is higher than at SPJain: univariate goodness-of-fit RM: Chi-square test (to do with frequency).
3. Q2 with male and female student counts: bivariate goodness of fit.
4. For our sample against existing research: univariate RM with SD known (or unknown).
5. Same sample in a different period of time: paired sample.
6. If we want to confirm whether we have used more than the required variables: multicollinearity.
The unexplained part depends on time. Thus we can infer that time should be taken into account.
It is called a lag when current data (e.g. Sales_t) depends on past data (e.g. Advertisement_{t-1}).
Go to Tests -> Autocorrelation. Keep the lag order as 12. Check the output and look for ***; *** indicates the significant lag order.
It's called autocorrelation because the series follows itself.
TIME SERIES RM: AUTOCORRELATION
Go to Gretl -> import the Excel file -> use Time series -> model as usual -> Tests -> Autocorrelation
S_t = (...) = A_{t-1} + I_{t-1} + S_{t-1}
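Building a lagged regressor like Advertisement_{t-1} is a one-line shift in pandas. A sketch with made-up sales and advertising figures:

```python
import pandas as pd

df = pd.DataFrame({
    "sales":  [100, 110, 125, 130, 150, 160],
    "advert": [10, 12, 15, 14, 18, 20],
})

# Lag: current sales depend on LAST period's advertising.
df["advert_lag1"] = df["advert"].shift(1)

# The first observation has no lagged value and is lost to the lag.
print(df[["sales", "advert_lag1"]].dropna().shape)  # (5, 2)
```

Each lag order used costs one observation at the start of the sample, which matters for short series.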

Class 6
In the case of 3 variables and time series, check the dependent variable's relationship with the independent variable. Then check the residuals and look for time-series dependency in both of them. Then check all 3 in a combined regression test. A -ve sign need not mean that it is uncorrelated.
TIME SERIES RM: STATIONARITY
When conducting research using time-series data, we will usually find a relationship between two or more factors.
To avoid the problem of SPURIOUS regressions (which happen to show significant results even though the variables are not related), we make sure our data is STATIONARY (always reverts back to its long-run mean).
If using non-stationary data, we convert it to stationary by looking at CHANGES.
This way we are looking at how a CHANGE in one variable influences a CHANGE in another.
Go to Variable -> Unit root tests -> Augmented Dickey-Fuller test
Go to Add -> First differences of selected variables
Go to Variable -> Correlogram
Auto-regression:
Useful if you have no causal factors in your research (either due to not finding any or due to a time constraint).
Also useful if this is simply your research aim.
e.g. Sales_t = ... Sales_{t-1}
But then what lag?
Use the CORRELOGRAM as a tool: look for spikes that cross the blue line in the PACF.

Class 7
Double-click on advertisement -> click on edit values -> change the last value and press Apply.
Go to Model -> Ordinary Least Squares -> show the model.
In the opened model, go to Analysis -> Forecasts.
How to know whether the level (Y_t) or the difference (ΔY_t) is to be taken:
Click the Straits variable -> Unit root test -> ADF test -> change the criterion to t-stat. If the asymptotic p-values are not < 0.05, it is not stationary -> Add first differences of the selected variables and repeat the test.
How to know the lag:
Go to Variable -> Correlogram -> check the most significant lag.
Go to Model -> Time series -> ARIMA -> one of the following two ways:
1. Choose Sales and set the difference order to 1
2. Choose d_Sales and set the difference order to 0

ARIMA:
Auto Regressive (PACF)
Integrated (another word for differencing)
Moving Average (ACF)
The Schwarz criterion helps us determine the best ARIMA combination.
Steps:
1. Decide whether time series or cross-section
2. Make it stationary
3. Forecast
4. Add the changes back to the actual values in case changes were taken to make it stationary.
