
National University of Modern Languages

Lahore Campus
Topic:
Regression Analysis
Subject:
Multivariate & Data Analysis
Submitted To:
Muhammad Shoaib
Submitted by:
Muhammad Ahmad
Roll Number:
L-21127
Class (shift):
MBA-VI (M)
Topic: Regression Analysis and its Types

Definition:
Regression analysis is the most widely used statistical technique for investigating and estimating the
relationship between a dependent variable and a set of independent (explanatory) variables. It is also
used as a blanket term for a variety of data analysis techniques employed in quantitative research.

Types of Regression Analysis:

Linear Regression Analysis

It is one of the most widely known modeling techniques, and usually among the first methods people
pick up when learning predictive modeling. Here, the dependent variable is continuous, the independent
variables are continuous or discrete, and the relationship is fitted with a straight regression line. A
simple linear regression has exactly one independent variable; a multiple linear regression has more
than one. Linear regression is therefore best used only when there is a linear relationship between the
independent variables and the dependent variable.
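As a minimal sketch of the idea, the following Python snippet (using scikit-learn; the data and coefficients are synthetic and purely illustrative) fits a simple linear regression:

    # Simple linear regression on synthetic data (illustrative only).
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(100, 1))             # one continuous predictor
    y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1, 100)   # linear signal plus noise

    model = LinearRegression().fit(X, y)
    print(model.coef_, model.intercept_)              # roughly 3.0 and 2.0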

Logistic Regression Analysis

Logistic regression is commonly used to model the probability of an event occurring (success vs.
failure). It is used whenever the dependent variable is binary, such as 0/1, True/False, or Yes/No, which
makes it well suited to analyzing close-ended survey questions with two possible responses. Unlike
linear regression, logistic regression does not require a linear relationship between the dependent and
independent variables: it applies a non-linear log transformation (the logit) to model the odds ratio,
and therefore handles many types of relationships between a dependent and an independent variable.
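A minimal sketch with scikit-learn, assuming a synthetic binary outcome:

    # Logistic regression for a binary (0/1) outcome (illustrative only).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    # The log-odds are linear in X; the observed outcome is binary.
    p = 1.0 / (1.0 + np.exp(-(1.5 * X[:, 0] - 1.0 * X[:, 1])))
    y = rng.binomial(1, p)

    clf = LogisticRegression().fit(X, y)
    print(clf.predict_proba(X[:5]))   # predicted class probabilities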

Polynomial Regression Analysis

Polynomial regression is commonly used to model curvilinear data, which arises when the power of an
independent variable is greater than 1. In this regression method, the best-fit line is not a straight
line but a curve fitted to the data points. Polynomial regression is appropriate when some variables
enter with exponents and others do not. It can also model data that is not linearly separable, giving
you the freedom to choose the exponent for each variable, with full control over the modeling features.
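A short sketch of polynomial regression with scikit-learn, where a degree-2 feature expansion captures a curvilinear signal (synthetic data, illustrative parameters):

    # Polynomial regression: expand features to powers, then fit linearly.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(100, 1))
    y = 0.5 * X[:, 0] ** 2 - X[:, 0] + rng.normal(0, 0.5, 100)  # curved signal

    # degree=2 adds the squared term; the model stays linear in the coefficients.
    model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
    model.fit(X, y)
    print(model.predict(np.array([[1.0], [2.0]])))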

Stepwise Regression Analysis

This is a semi-automated process that builds a statistical model by adding or removing variables based
on the t-statistics of their estimated coefficients. Used properly, stepwise regression can produce a
compact, powerful model, and it works well when there is a large number of candidate independent
variables. The procedure fine-tunes the model by testing variables one at a time. Stepwise regression
is recommended when there are many independent variables and the selection among them is to be done
automatically, without human intervention: at each step a variable is added to or removed from the set
of explanatory variables, and the choice is made according to the test statistics of the estimated
coefficients.
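As a rough illustration, scikit-learn's SequentialFeatureSelector performs forward selection; note that it scores candidate variables by cross-validation rather than by coefficient t-statistics, so it approximates rather than reproduces classical stepwise regression:

    # Forward stepwise-style selection (cross-validated, not t-statistic based).
    from sklearn.datasets import make_regression
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LinearRegression

    X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                           random_state=0)
    selector = SequentialFeatureSelector(
        LinearRegression(), n_features_to_select=3, direction="forward"
    ).fit(X, y)
    print(selector.get_support())   # boolean mask of the variables kept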

Ridge Regression Analysis

Ridge regression modifies the ordinary least squares method to analyze data with multicollinearity
(data in which the independent variables are highly correlated). Collinearity means a near-linear
relationship between the variables. When multicollinearity is present, the least squares estimates
remain unbiased, but their variances are large, so they may be far from the true values. Ridge
regression reduces these standard errors by adding some degree of bias to the regression estimates,
with the aim of producing more reliable estimates overall. The assumptions of ridge regression are the
same as those of least squares regression, except that the errors need not be assumed normal. Although
ridge regression shrinks the coefficient values, they never reach exactly zero, so it cannot perform
variable selection.
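A minimal sketch with scikit-learn, using two deliberately collinear synthetic predictors:

    # Ridge regression: alpha controls the L2 penalty (the added bias).
    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=200)
    x2 = x1 + rng.normal(0, 0.01, 200)        # nearly collinear with x1
    X = np.column_stack([x1, x2])
    y = 2.0 * x1 + 3.0 * x2 + rng.normal(0, 1, 200)

    ridge = Ridge(alpha=1.0).fit(X, y)
    print(ridge.coef_)   # shrunk toward zero, but never exactly zero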

Lasso Regression Analysis:

Lasso (Least Absolute Shrinkage and Selection Operator) is similar to ridge regression, but it uses an
absolute-value (L1) penalty instead of the squared (L2) penalty used in ridge regression. It was
introduced as an alternative to the traditional least squares estimate, with the intention of reducing
the overfitting problems that arise when the data has a large number of independent variables. Lasso
can perform both variable selection and regularization at once, via soft thresholding. Applying lasso
regression makes it easy to derive a subset of predictors that minimizes prediction error for a
quantitative response: regression coefficients that shrink exactly to zero are excluded from the model,
while coefficients that remain nonzero are strongly associated with the response variable. The
explanatory variables can be quantitative, categorical, or both.
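A short sketch with scikit-learn showing the selection effect (synthetic data; alpha is illustrative):

    # Lasso: the L1 penalty shrinks some coefficients exactly to zero,
    # so the surviving nonzero coefficients are the selected variables.
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso

    X, y = make_regression(n_samples=200, n_features=20, n_informative=4,
                           noise=5.0, random_state=0)
    lasso = Lasso(alpha=1.0).fit(X, y)
    print((lasso.coef_ != 0).sum(), "of 20 variables kept")
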
Elastic Net Regression Analysis:

It is a mixture of the ridge and lasso regression models, trained with both the L1 and L2 penalties.
The elastic net produces a grouping effect, whereby strongly correlated predictors tend to enter or
leave the model together. It is recommended when the number of predictors is far greater than the
number of observations. The elastic net model arose as an alternative to the lasso, whose variable
selection can depend too heavily on the data and therefore be unstable. By blending the ridge and lasso
penalties, the elastic net aims to get the best of both models.
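A minimal sketch with scikit-learn in the predictors-greater-than-observations setting described above (parameters illustrative):

    # Elastic net: l1_ratio blends the lasso (L1) and ridge (L2) penalties.
    from sklearn.datasets import make_regression
    from sklearn.linear_model import ElasticNet

    X, y = make_regression(n_samples=50, n_features=200,   # p >> n case
                           n_informative=10, noise=5.0, random_state=0)
    enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)
    print((enet.coef_ != 0).sum(), "nonzero coefficients")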

Quantile Regression

Quantile regression is an extension of linear regression, generally used when outliers, high skewness,
or heteroscedasticity exist in the data. In linear regression, we predict the mean of the dependent
variable for given values of the independent variables. Since the mean does not describe the whole
distribution, modeling the mean alone is not a full description of the relationship between the
dependent and independent variables. Quantile regression instead predicts a chosen quantile (or
percentile) of the dependent variable for given independent variables.
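A minimal sketch with statsmodels, fitting the median (q=0.5) under heavy-tailed noise (synthetic data):

    # Quantile regression: fit the conditional median instead of the mean.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 200)
    y = 2.0 * x + rng.standard_t(df=2, size=200)   # heavy tails produce outliers

    X = sm.add_constant(x)
    median_fit = sm.QuantReg(y, X).fit(q=0.5)
    print(median_fit.params)   # less distorted by outliers than OLS would be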

Principal Components Regression (PCR)

PCR is a regression technique widely used when there are many independent variables or multicollinearity
exists in the data. It proceeds in two steps, sketched in the code below:

- Extract the principal components of the independent variables
- Run the regression analysis on those principal components

The most notable features of PCR are:

- Dimensionality reduction
- Removal of multicollinearity
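The two steps can be chained in a single scikit-learn pipeline, as in this sketch (synthetic data; the number of components is illustrative):

    # PCR: PCA first, then ordinary regression on the leading components.
    from sklearn.datasets import make_regression
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)
    pcr = make_pipeline(StandardScaler(), PCA(n_components=5), LinearRegression())
    pcr.fit(X, y)
    print(pcr.score(X, y))   # R^2 of the two-step model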

Partial Least Squares (PLS) Regression:

It is an alternative to principal components regression for cases where the independent variables are
highly correlated. It is also useful when there is a large number of independent variables.
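A minimal sketch with scikit-learn (synthetic data; n_components is illustrative):

    # PLS: components are chosen to explain y as well as X, unlike PCR.
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.datasets import make_regression

    X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)
    pls = PLSRegression(n_components=5).fit(X, y)
    print(pls.score(X, y))
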
Support Vector Regression:

Support vector regression can fit both linear and non-linear models. For non-linear models, SVR uses
non-linear kernel functions (such as the polynomial kernel) to find the optimal solution. The main idea
of SVR is to minimize error by finding the hyperplane that maximizes the margin.
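A short sketch with scikit-learn, using an RBF kernel on a clearly non-linear target (hyperparameters illustrative):

    # SVR: epsilon sets the tolerated error tube around the fitted function.
    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)   # non-linear signal

    svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
    print(svr.predict(np.array([[0.0], [1.5]])))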

Ordinal Regression

Ordinal regression is used to predict ranked values. In simple words, this type of regression is
suitable when the dependent variable is ordinal in nature. Examples of ordinal variables are survey
responses on a 1-to-6 scale and patient reaction to a drug dose (none, mild, severe).
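A minimal sketch using the OrderedModel class from statsmodels (available in recent releases); the three ordered levels below are synthetic and purely illustrative:

    # Ordinal (proportional-odds) logistic regression with statsmodels.
    import numpy as np
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    rng = np.random.default_rng(0)
    x = rng.normal(size=(300, 1))
    latent = 2.0 * x[:, 0] + rng.logistic(size=300)
    y = np.digitize(latent, bins=[-1.0, 1.0])   # ordered levels 0 < 1 < 2

    fit = OrderedModel(y, x, distr="logit").fit(method="bfgs", disp=False)
    print(fit.params)   # slope plus the estimated category thresholds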

Poisson Regression

Poisson regression is used when the dependent variable consists of count data (a short code sketch
follows the lists below).

Applications of Poisson regression include:

- Predicting the number of customer care calls related to a particular product
- Estimating the number of emergency service calls during an event

The dependent variable must meet the following conditions:

- It follows a Poisson distribution.
- Counts cannot be negative.
- The method is not suitable for non-integer values.
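A minimal sketch via a GLM with a log link in statsmodels (synthetic counts; coefficients illustrative):

    # Poisson regression as a GLM with the (default) log link.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 2, 300)
    mu = np.exp(0.5 + 0.8 * x)          # log(mu) is linear in x
    y = rng.poisson(mu)                 # non-negative integer counts

    X = sm.add_constant(x)
    fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
    print(fit.params)                   # roughly recovers 0.5 and 0.8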

Negative Binomial Regression

Like Poisson regression, it also deals with count data. The natural question is how it differs from
Poisson regression: negative binomial regression does not assume that the variance of the counts equals
their mean, whereas Poisson regression does. This makes it suitable for overdispersed count data.
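A minimal sketch, reusing the Poisson setup above but with overdispersed counts (a gamma-mixed Poisson draw; all values illustrative):

    # Negative binomial GLM: variance may exceed the mean (overdispersion).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 2, 300)
    mu = np.exp(0.5 + 0.8 * x)
    y = rng.poisson(rng.gamma(shape=2.0, scale=mu / 2.0))   # overdispersed

    X = sm.add_constant(x)
    fit = sm.GLM(y, X, family=sm.families.NegativeBinomial()).fit()
    print(fit.params)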
