
National University of Modern Languages

Lahore Campus
Topic:
Regression Analysis
Subject:
Multivariate & Data Analysis
Submitted To:
Muhammad Shoaib
Submitted by:
Muhammad Ahmad
Roll Number:
L-21127
Class (shift):
MBA-VI (M)
Topic: Regression Analysis and its Types

Definition:
Regression analysis is the most widely used statistical technique for investigating and estimating the
relationship between a dependent variable and a set of independent (explanatory) variables. It is also
used as a blanket term for a variety of data analysis techniques employed in quantitative research.

Types of Regression Analysis:

Linear Regression Analysis

It is one of the most widely known modeling techniques, and usually among the first methods people
pick up when learning predictive modeling. Here, the dependent variable is continuous, the independent
variables are continuous or discrete, and the relationship is fitted with a straight regression line. A
simple linear regression has exactly one independent variable; a multiple linear regression has more
than one. Linear regression is therefore best used only when there is a linear relationship between the
independent variables and the dependent variable.
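As a minimal sketch of the idea, the following Python snippet (using scikit-learn; the data and coefficients are synthetic and purely illustrative) fits a simple linear regression:

    # Simple linear regression on synthetic data (illustrative only).
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(100, 1))             # one continuous predictor
    y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1, 100)   # linear signal plus noise

    model = LinearRegression().fit(X, y)
    print(model.coef_, model.intercept_)              # roughly 3.0 and 2.0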

Logistic Regression Analysis

Logistic regression is commonly used to model the probability of an event occurring (success vs.
failure). It is used whenever the dependent variable is binary, such as 0/1, True/False, or Yes/No, which
makes it well suited to analyzing close-ended survey questions with two possible responses. Unlike
linear regression, logistic regression does not require a linear relationship between the dependent and
independent variables: it applies a non-linear log transformation (the logit) to model the odds ratio,
and therefore handles many types of relationships between a dependent and an independent variable.
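A minimal sketch with scikit-learn, assuming a synthetic binary outcome:

    # Logistic regression for a binary (0/1) outcome (illustrative only).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    # The log-odds are linear in X; the observed outcome is binary.
    p = 1.0 / (1.0 + np.exp(-(1.5 * X[:, 0] - 1.0 * X[:, 1])))
    y = rng.binomial(1, p)

    clf = LogisticRegression().fit(X, y)
    print(clf.predict_proba(X[:5]))   # predicted class probabilities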

Polynomial Regression Analysis

Polynomial regression is commonly used to model curvilinear data, which arises when the power of an
independent variable is greater than 1. In this regression method, the best-fit line is not a straight
line but a curve fitted to the data points. Polynomial regression is appropriate when some variables
enter with exponents and others do not. It can also model data that is not linearly separable, giving
you the freedom to choose the exponent for each variable, with full control over the modeling features.
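A short sketch of polynomial regression with scikit-learn, where a degree-2 feature expansion captures a curvilinear signal (synthetic data, illustrative parameters):

    # Polynomial regression: expand features to powers, then fit linearly.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(100, 1))
    y = 0.5 * X[:, 0] ** 2 - X[:, 0] + rng.normal(0, 0.5, 100)  # curved signal

    # degree=2 adds the squared term; the model stays linear in the coefficients.
    model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
    model.fit(X, y)
    print(model.predict(np.array([[1.0], [2.0]])))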

Stepwise Regression Analysis

This is a semi-automated process that builds a statistical model by adding or removing variables based
on the t-statistics of their estimated coefficients. Used properly, stepwise regression can produce a
compact, powerful model, and it works well when there is a large number of candidate independent
variables. The procedure fine-tunes the model by testing variables one at a time. Stepwise regression
is recommended when there are many independent variables and the selection among them is to be done
automatically, without human intervention: at each step a variable is added to or removed from the set
of explanatory variables, and the choice is made according to the test statistics of the estimated
coefficients.
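As a rough illustration, scikit-learn's SequentialFeatureSelector performs forward selection; note that it scores candidate variables by cross-validation rather than by coefficient t-statistics, so it approximates rather than reproduces classical stepwise regression:

    # Forward stepwise-style selection (cross-validated, not t-statistic based).
    from sklearn.datasets import make_regression
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LinearRegression

    X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                           random_state=0)
    selector = SequentialFeatureSelector(
        LinearRegression(), n_features_to_select=3, direction="forward"
    ).fit(X, y)
    print(selector.get_support())   # boolean mask of the variables kept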

Ridge Regression Analysis

Ridge regression modifies the ordinary least squares method to analyze data with multicollinearity
(data in which the independent variables are highly correlated). Collinearity means a near-linear
relationship between the variables. When multicollinearity is present, the least squares estimates
remain unbiased, but their variances are large, so they may be far from the true values. Ridge
regression reduces these standard errors by adding some degree of bias to the regression estimates,
with the aim of producing more reliable estimates overall. The assumptions of ridge regression are the
same as those of least squares regression, except that the errors need not be assumed normal. Although
ridge regression shrinks the coefficient values, they never reach exactly zero, so it cannot perform
variable selection.
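A minimal sketch with scikit-learn, using two deliberately collinear synthetic predictors:

    # Ridge regression: alpha controls the L2 penalty (the added bias).
    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=200)
    x2 = x1 + rng.normal(0, 0.01, 200)        # nearly collinear with x1
    X = np.column_stack([x1, x2])
    y = 2.0 * x1 + 3.0 * x2 + rng.normal(0, 1, 200)

    ridge = Ridge(alpha=1.0).fit(X, y)
    print(ridge.coef_)   # shrunk toward zero, but never exactly zero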

Lasso Regression Analysis:

Lasso (Least Absolute Shrinkage and Selection Operator) is similar to ridge regression, but it uses an
absolute-value (L1) penalty instead of the squared (L2) penalty used in ridge regression. It was
introduced as an alternative to the traditional least squares estimate, with the intention of reducing
the overfitting problems that arise when the data has a large number of independent variables. Lasso
can perform both variable selection and regularization at once, via soft thresholding. Applying lasso
regression makes it easy to derive a subset of predictors that minimizes prediction error for a
quantitative response: regression coefficients that shrink exactly to zero are excluded from the model,
while coefficients that remain nonzero are strongly associated with the response variable. The
explanatory variables can be quantitative, categorical, or both.
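A short sketch with scikit-learn showing the selection effect (synthetic data; alpha is illustrative):

    # Lasso: the L1 penalty shrinks some coefficients exactly to zero,
    # so the surviving nonzero coefficients are the selected variables.
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso

    X, y = make_regression(n_samples=200, n_features=20, n_informative=4,
                           noise=5.0, random_state=0)
    lasso = Lasso(alpha=1.0).fit(X, y)
    print((lasso.coef_ != 0).sum(), "of 20 variables kept")
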
Elastic Net Regression Analysis:

It is a mixture of the ridge and lasso regression models, trained with both the L1 and L2 penalties.
The elastic net produces a grouping effect, whereby strongly correlated predictors tend to enter or
leave the model together. It is recommended when the number of predictors is far greater than the
number of observations. The elastic net model arose as an alternative to the lasso, whose variable
selection can depend too heavily on the data and therefore be unstable. By blending the ridge and lasso
penalties, the elastic net aims to get the best of both models.
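A minimal sketch with scikit-learn in the predictors-greater-than-observations setting described above (parameters illustrative):

    # Elastic net: l1_ratio blends the lasso (L1) and ridge (L2) penalties.
    from sklearn.datasets import make_regression
    from sklearn.linear_model import ElasticNet

    X, y = make_regression(n_samples=50, n_features=200,   # p >> n case
                           n_informative=10, noise=5.0, random_state=0)
    enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)
    print((enet.coef_ != 0).sum(), "nonzero coefficients")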

Quantile Regression

Quantile regression is an extension of linear regression, generally used when outliers, high skewness,
or heteroscedasticity exist in the data. In linear regression, we predict the mean of the dependent
variable for given values of the independent variables. Since the mean does not describe the whole
distribution, modeling the mean alone is not a full description of the relationship between the
dependent and independent variables. Quantile regression instead predicts a chosen quantile (or
percentile) of the dependent variable for given independent variables.
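A minimal sketch with statsmodels, fitting the median (q=0.5) under heavy-tailed noise (synthetic data):

    # Quantile regression: fit the conditional median instead of the mean.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 200)
    y = 2.0 * x + rng.standard_t(df=2, size=200)   # heavy tails produce outliers

    X = sm.add_constant(x)
    median_fit = sm.QuantReg(y, X).fit(q=0.5)
    print(median_fit.params)   # less distorted by outliers than OLS would be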

Principal Components Regression (PCR)

PCR is a regression technique widely used when there are many independent variables or multicollinearity
exists in the data. It proceeds in two steps, sketched in the code below:

- Extract the principal components of the independent variables
- Run the regression analysis on those principal components

The most notable features of PCR are:

- Dimensionality reduction
- Removal of multicollinearity
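The two steps can be chained in a single scikit-learn pipeline, as in this sketch (synthetic data; the number of components is illustrative):

    # PCR: PCA first, then ordinary regression on the leading components.
    from sklearn.datasets import make_regression
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)
    pcr = make_pipeline(StandardScaler(), PCA(n_components=5), LinearRegression())
    pcr.fit(X, y)
    print(pcr.score(X, y))   # R^2 of the two-step model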

Partial Least Squares (PLS) Regression:

It is an alternative to principal components regression for cases where the independent variables are
highly correlated. It is also useful when there is a large number of independent variables.
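A minimal sketch with scikit-learn (synthetic data; n_components is illustrative):

    # PLS: components are chosen to explain y as well as X, unlike PCR.
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.datasets import make_regression

    X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)
    pls = PLSRegression(n_components=5).fit(X, y)
    print(pls.score(X, y))
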
Support Vector Regression:

Support vector regression can fit both linear and non-linear models. For non-linear models, SVR uses
non-linear kernel functions (such as the polynomial kernel) to find the optimal solution. The main idea
of SVR is to minimize error by finding the hyperplane that maximizes the margin.
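A short sketch with scikit-learn, using an RBF kernel on a clearly non-linear target (hyperparameters illustrative):

    # SVR: epsilon sets the tolerated error tube around the fitted function.
    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)   # non-linear signal

    svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
    print(svr.predict(np.array([[0.0], [1.5]])))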

Ordinal Regression

Ordinal regression is used to predict ranked values. In simple words, this type of regression is
suitable when the dependent variable is ordinal in nature. Examples of ordinal variables are survey
responses on a 1-to-6 scale and patient reaction to a drug dose (none, mild, severe).
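A minimal sketch using the OrderedModel class from statsmodels (available in recent releases); the three ordered levels below are synthetic and purely illustrative:

    # Ordinal (proportional-odds) logistic regression with statsmodels.
    import numpy as np
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    rng = np.random.default_rng(0)
    x = rng.normal(size=(300, 1))
    latent = 2.0 * x[:, 0] + rng.logistic(size=300)
    y = np.digitize(latent, bins=[-1.0, 1.0])   # ordered levels 0 < 1 < 2

    fit = OrderedModel(y, x, distr="logit").fit(method="bfgs", disp=False)
    print(fit.params)   # slope plus the estimated category thresholds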

Poisson Regression

Poisson regression is used when the dependent variable consists of count data (a short code sketch
follows the lists below).

Applications of Poisson regression include:

- Predicting the number of customer care calls related to a particular product
- Estimating the number of emergency service calls during an event

The dependent variable must meet the following conditions:

- It follows a Poisson distribution.
- Counts cannot be negative.
- The method is not suitable for non-integer values.
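A minimal sketch via a GLM with a log link in statsmodels (synthetic counts; coefficients illustrative):

    # Poisson regression as a GLM with the (default) log link.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 2, 300)
    mu = np.exp(0.5 + 0.8 * x)          # log(mu) is linear in x
    y = rng.poisson(mu)                 # non-negative integer counts

    X = sm.add_constant(x)
    fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
    print(fit.params)                   # roughly recovers 0.5 and 0.8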

Negative Binomial Regression

Like Poisson regression, it also deals with count data. The natural question is how it differs from
Poisson regression: negative binomial regression does not assume that the variance of the counts equals
their mean, whereas Poisson regression does. This makes it suitable for overdispersed count data.
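A minimal sketch, reusing the Poisson setup above but with overdispersed counts (a gamma-mixed Poisson draw; all values illustrative):

    # Negative binomial GLM: variance may exceed the mean (overdispersion).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 2, 300)
    mu = np.exp(0.5 + 0.8 * x)
    y = rng.poisson(rng.gamma(shape=2.0, scale=mu / 2.0))   # overdispersed

    X = sm.add_constant(x)
    fit = sm.GLM(y, X, family=sm.families.NegativeBinomial()).fit()
    print(fit.params)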
