
Machine Learning (Part 1)
IYKRA DATA FELLOWSHIP BATCH 3
Outline
• Introduction to Machine
Learning

• Regression
• Linear Regression

• Classification
• Logistic Regression
• Naïve Bayes
• Support Vector Machine
• K-Nearest Neighbours
• Decision Tree
• Random Forest
[Machine Learning is the] field of
study that gives computers the
ability to learn without being explicitly
programmed

- ARTHUR SAMUEL, 1959


Why do we use Machine Learning?
Machine learning is great for:
• Problems for which existing solutions require a lot of
hand-tuning or long lists of rules: one Machine Learning
algorithm can often simplify code and perform better.
• Complex problems for which there is no good solution
at all using a traditional approach: the best Machine
Learning techniques can find a solution.
• Fluctuating environments: a Machine Learning system
can adapt to new data.
• Getting insights about complex problems and large
amounts of data.
Types of Machine Learning Systems
• Supervised Learning
• Unsupervised Learning
Linear Regression
The key objective of regression-based tasks is to predict output labels or responses that are continuous numeric values for the given input data.
Types of Regression Model
SIMPLE REGRESSION MODEL
This is the most basic regression model, in which predictions are formed from a single, univariate feature of the data.

MULTIPLE REGRESSION MODEL
As the name implies, in this regression model the predictions are formed from multiple features of the data.
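
The difference between the two model types can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function names and the toy data are assumptions made for this example, and the simple model is fitted with the standard least-squares formulas.

```python
def fit_simple_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_multiple(intercept, coefficients, features):
    """Multiple regression prediction: one coefficient per feature."""
    return intercept + sum(c * f for c, f in zip(coefficients, features))


# Toy data generated from y = 2x + 1 (made up for illustration).
intercept, slope = fit_simple_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)  # 1.0 2.0

# A multiple regression model with two hypothetical, already-fitted coefficients.
prediction = predict_multiple(1.0, [2.0, 0.5], [3.0, 4.0])
print(prediction)  # 9.0
```

The simple model has one slope, while the multiple model has one coefficient per feature; fitting the latter is usually done with a linear algebra routine or with gradient descent.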
Applications
❖Forecasting or Predictive Analysis
❖Optimization
❖Error Correction
❖Economics
❖Finance
Gradient Descent and Cost Function
Gradient descent is an optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient. In machine learning, we use gradient descent to update the parameters of our model. Parameters refer to coefficients in Linear Regression and weights in neural networks.
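
As a rough sketch of the idea, the loop below applies gradient descent to a one-feature linear regression. The cost function is assumed to be mean squared error, and the learning rate, step count, and toy data are arbitrary choices for this example rather than recommended settings.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by minimizing mean squared error (assumed cost)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every training example.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error w.r.t. w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Move against the gradient: the direction of steepest descent.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1; the fit should approach w = 2, b = 1.
w, b = gradient_descent([1, 2, 3, 4], [3, 5, 7, 9])
print(round(w, 4), round(b, 4))
```

A learning rate that is too large makes the updates diverge, while one that is too small makes convergence slow, so in practice it is tuned per problem.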
Logistic Regression
Logistic regression is a supervised learning classification algorithm used to predict the probability of a target variable. The target (dependent) variable is dichotomous, which means there are only two possible classes.
In simple words, the dependent variable is binary, with data coded as either 1 (stands for success/yes) or 0 (stands for failure/no).
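
To make the "probability of a target variable" concrete, here is a minimal sketch of how a fitted logistic regression model turns features into a probability and then into a 0/1 class. The weights, bias, and the 0.5 threshold are hypothetical values chosen for illustration, not the result of training.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict_proba(weights, bias, features):
    """Probability that the target is 1 (success/yes)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_class(weights, bias, features, threshold=0.5):
    """Code the outcome as 1 or 0 by thresholding the probability."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


# Hypothetical one-feature model: weight 1.5, bias -3.0.
print(predict_proba([1.5], -3.0, [4.0]))  # a probability above 0.5
print(predict_class([1.5], -3.0, [4.0]))  # 1
print(predict_class([1.5], -3.0, [1.0]))  # 0
```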
Types of Logistic Regression
BINARY OR BINOMIAL
In this kind of classification, the dependent variable has only two possible types, either 1 or 0. For example, these variables may represent success or failure, yes or no, win or loss, etc.

MULTINOMIAL
In this kind of classification, the dependent variable can have 3 or more possible unordered types, or types having no quantitative significance. For example, these variables may represent “Type A”, “Type B” or “Type C”.

ORDINAL
In this kind of classification, the dependent variable can have 3 or more possible ordered types, or types having a quantitative significance.
Logistic Regression Assumptions
▪ In binary logistic regression, the target variable must always be binary, and the desired outcome is represented by factor level 1.

▪ There should not be any multicollinearity in the model, which means the independent variables must be independent of each other.

▪ We must include meaningful variables in our model.

▪ We should choose a large sample size for logistic regression.
Naïve Bayes Classification
The Naïve Bayes algorithm is a supervised classification algorithm based on Bayes' theorem, with an assumption of independence among features.
Types of Naïve Bayes
GAUSSIAN NAÏVE BAYES
It is the simplest Naïve Bayes classifier, with the assumption that the data for each label is drawn from a simple Gaussian distribution.

MULTINOMIAL NAÏVE BAYES
The features are assumed to be drawn from a simple multinomial distribution.

BERNOULLI NAÏVE BAYES
Another important model is Bernoulli Naïve Bayes, in which features are assumed to be binary (0s and 1s).
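
The Gaussian variant is small enough to sketch from scratch. The code below is an illustrative implementation under stated assumptions: each feature is modelled per class with its own mean and variance, log-probabilities are summed (which is where the independence assumption enters), and a tiny constant is added to the variance to avoid division by zero. The function names and toy data are invented for this example.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature (mean, variance) for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        prior = len(members) / len(rows)
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, var + 1e-9))  # small constant avoids zero variance
        model[label] = (prior, stats)
    return model


def log_gaussian(x, mean, var):
    """Log of the Gaussian density at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict_gaussian_nb(model, row):
    """Pick the class with the highest log prior + summed log likelihoods."""
    scores = {}
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        # Independence assumption: per-feature log likelihoods simply add up.
        for x, (mean, var) in zip(row, stats):
            score += log_gaussian(x, mean, var)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy two-feature data with two well-separated classes (made up).
rows = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.1], [3.2, 3.9], [2.8, 4.0]]
labels = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(rows, labels)
print(predict_gaussian_nb(model, [1.1, 2.0]))  # a
print(predict_gaussian_nb(model, [3.1, 4.0]))  # b
```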
Pros and Cons of the Naïve Bayes Algorithm
PROS
o It is easy to understand.
o It can also be trained on a small dataset.
o It can make probabilistic predictions and can handle continuous as well as discrete data.
o It will converge faster than discriminative models like logistic regression.

CONS
o Its main limitation is the strong feature independence assumption, because in real life it is almost impossible to have a set of features which are completely independent of each other.
o It has a ‘zero conditional probability problem’: for features having zero frequency in a class, the total probability also becomes zero.
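
The zero conditional probability problem is easy to reproduce with a few counts. The sketch below uses hypothetical counts and shows additive (Laplace) smoothing as one common remedy; smoothing is not mentioned in the slides and is included here only as an assumption about how the problem is typically handled.

```python
def conditional_probability(count, class_total, n_values, alpha=0.0):
    """P(feature value | class) from counts, with optional additive smoothing.

    alpha = 0 gives the raw frequency; alpha = 1 is Laplace smoothing.
    """
    return (count + alpha) / (class_total + alpha * n_values)


def class_score(prior, feature_probs):
    """Naive Bayes score: the prior times the product of the conditionals."""
    score = prior
    for p in feature_probs:
        score *= p
    return score


# Hypothetical counts for three features in a class with 10 training rows,
# where each feature can take 5 distinct values. The third was never seen.
counts = [4, 6, 0]
raw = [conditional_probability(c, 10, 5) for c in counts]
smoothed = [conditional_probability(c, 10, 5, alpha=1.0) for c in counts]

raw_score = class_score(0.5, raw)            # one zero wipes out the product
smoothed_score = class_score(0.5, smoothed)  # stays strictly positive
print(raw_score, smoothed_score)
```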
Applications
➢ Real-time prediction

➢ Multi-class prediction

➢ Text classification

➢ Recommendation systems
Support Vector Machine
A set of supervised learning methods which learn from the dataset and can be used for both regression and classification.
Working of SVM
• Support vectors − The data points closest to the hyperplane are called support vectors. The separating line is defined with the help of these data points.
• Hyperplane − A decision plane or space that divides a set of objects belonging to different classes.
• Margin − The gap between the two lines through the closest data points of different classes. It can be calculated as the perpendicular distance from the line to the support vectors. A large margin is considered a good margin and a small margin is considered a bad margin.
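
The three terms above can be tied together with a small calculation. This sketch does not train an SVM: the hyperplane (w, b) is picked by hand and the points are made up, so it only illustrates how the perpendicular distance identifies the support vectors and the margin. It also assumes the hyperplane sits midway between the two classes, so the full margin is twice the smallest distance.

```python
import math


def signed_distance(w, b, point):
    """Perpendicular distance from `point` to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm


def closest_points(w, b, points, tol=1e-9):
    """Return the smallest distance and the points lying at that distance."""
    distances = [abs(signed_distance(w, b, p)) for p in points]
    smallest = min(distances)
    nearest = [p for p, d in zip(points, distances) if d - smallest <= tol]
    return smallest, nearest


# Hand-picked hyperplane: the vertical line x = 2 in two dimensions.
w, b = [1.0, 0.0], -2.0
points = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.0), (5.0, 2.0)]

half_margin, support_vectors = closest_points(w, b, points)
margin = 2 * half_margin
print(support_vectors)  # [(1.0, 1.0), (3.0, 0.0)]
print(margin)           # 2.0
```

A trained SVM searches for the w and b that make this margin as large as possible; here they are fixed in advance purely to show the geometry.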
Kernels
SVM uses the kernel method to perform non-linear classification. A kernel takes a low-dimensional input space and converts it into a higher-dimensional space. It converts non-separable classes into separable ones, finding a way to separate the data on the basis of the data labels defined by us.
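
A tiny example shows why moving to a higher-dimensional space helps. The sketch below uses an explicit feature map, x to (x, x²), chosen for illustration; real kernels such as the RBF kernel achieve a similar effect without computing the mapped coordinates directly. The data and helper names are assumptions for this example.

```python
def lift(x):
    """Hypothetical feature map from 1-D to 2-D: x -> (x, x^2)."""
    return (x, x * x)


def threshold_separates(values, labels):
    """True if one cut on `values` puts each class on its own side.

    Sorts by value and counts label changes; ties between different
    classes are not handled in this sketch.
    """
    ordered = [label for _, label in sorted(zip(values, labels))]
    changes = sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)
    return changes <= 1


# Class 1 sits on both sides of class 0, so no single cut on x works.
xs = [-3.0, -1.0, 0.0, 1.0, 3.0]
labels = [1, 0, 0, 0, 1]

separable_1d = threshold_separates(xs, labels)
# In the lifted space, the second coordinate x^2 separates the classes.
separable_lifted = threshold_separates([lift(x)[1] for x in xs], labels)
print(separable_1d, separable_lifted)  # False True
```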
Pros and Cons of SVM
PROS
It works really well with a clear margin of separation.
It is effective in high-dimensional spaces.
It is effective in cases where the number of dimensions is greater than the number of samples.
It uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.

CONS
It doesn’t perform well on large data sets because the required training time is higher.
It also doesn’t perform very well when the data set has more noise, i.e. when the target classes overlap.
SVM doesn’t directly provide probability estimates; these are calculated using an expensive five-fold cross-validation. This is included in the related SVC method of the Python scikit-learn library.
K-Nearest Neighbours
KNN works by finding the distances between a query and all the examples in the data, selecting the specified number of examples (K) closest to the query, then voting for the most frequent label (in the case of classification) or averaging the labels (in the case of regression).
Working of KNN
1. Load the dataset.

2. Choose a value of K.

3. Calculate the distance between the test point and each row of training data using one of these methods: Euclidean, Manhattan or Hamming distance. The most commonly used method is Euclidean.

4. Sort the rows by distance, take the K nearest, and assign the test point the most frequent class among these rows.
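
The steps above can be sketched directly in plain Python. This is a minimal illustration assuming Euclidean distance and small in-memory lists; the function names and toy data are made up for the example, and a regression variant that averages the neighbours' values is included for comparison.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(train_rows, train_targets, query, k):
    """Targets of the k training rows closest to the query."""
    pairs = [(euclidean(row, query), t) for row, t in zip(train_rows, train_targets)]
    pairs.sort(key=lambda pair: pair[0])  # sort by distance only
    return [target for _, target in pairs[:k]]


def knn_classify(train_rows, train_labels, query, k=3):
    """Classification: vote for the most frequent label among the neighbours."""
    return Counter(k_nearest(train_rows, train_labels, query, k)).most_common(1)[0][0]


def knn_regress(train_rows, train_values, query, k=3):
    """Regression: average the neighbours' numeric values."""
    neighbours = k_nearest(train_rows, train_values, query, k)
    return sum(neighbours) / len(neighbours)


# Step 1: a toy dataset with two clusters (made up).
rows = [[1, 1], [1, 2], [2, 1], [6, 6], [7, 6], [6, 7]]
labels = ["A", "A", "A", "B", "B", "B"]

# Steps 2 to 4 with K = 3.
print(knn_classify(rows, labels, [1.5, 1.5], k=3))  # A
print(knn_classify(rows, labels, [6.5, 6.5], k=3))  # B
print(knn_regress([[1], [2], [3], [10]], [1.0, 2.0, 3.0, 10.0], [2], k=3))  # 2.0
```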
Pros and Cons of KNN
PROS
It is a very simple algorithm to understand and interpret.
It is very useful for nonlinear data because the algorithm makes no assumption about the data.
It is a versatile algorithm, as we can use it for classification as well as regression.
It has relatively high accuracy, but there are much better supervised learning models than KNN.

CONS
It is a somewhat computationally expensive algorithm because it stores all the training data.
It requires more memory than other supervised learning algorithms.
Prediction is slow when N is large.
It is very sensitive to the scale of the data as well as to irrelevant features.
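
The sensitivity to scale can be seen with two features measured in very different units. The sketch below uses made-up (age, income) rows and min-max scaling, which is one common choice and an assumption of this example; before scaling, income dominates the Euclidean distance, and after scaling the nearest neighbour changes.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_max_scale(rows):
    """Rescale every column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def nearest_index(rows, query_index):
    """Index of the row closest to rows[query_index], excluding itself."""
    others = [i for i in range(len(rows)) if i != query_index]
    return min(others, key=lambda i: euclidean(rows[i], rows[query_index]))


# Hypothetical (age, income) rows; income has a much larger numeric range.
rows = [[25, 50000], [27, 51000], [60, 50200], [45, 90000]]

nearest_raw = nearest_index(rows, 0)                    # income dominates
nearest_scaled = nearest_index(min_max_scale(rows), 0)  # both features count
print(nearest_raw, nearest_scaled)  # 2 1
```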
Applications of KNN
• Banking System
KNN can be used in a banking system to predict whether an individual is fit for loan approval: does that individual have characteristics similar to those of defaulters?
• Calculating Credit Ratings
KNN algorithms can be used to find an individual’s credit rating by comparing them with persons having similar traits.
• Politics
With the help of KNN algorithms, we can classify a potential voter into classes such as “Will Vote”, “Will Not Vote”, “Will Vote for Party ‘Congress’” or “Will Vote for Party ‘BJP’”.
Decision Tree
A decision tree is a structure
that includes a root node,
branches, and leaf nodes.
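
The structure can be sketched as nested dictionaries. The tree below is written by hand rather than learned from data, and the features (age, income), thresholds, and labels are hypothetical; the point is only to show a root node, its branches, and the leaf nodes that hold predictions.

```python
def make_leaf(prediction):
    """A leaf node stores the final prediction."""
    return {"prediction": prediction}


def make_node(feature, threshold, left, right):
    """An internal node tests one feature and has two branches."""
    return {"feature": feature, "threshold": threshold, "left": left, "right": right}


def predict(node, row):
    """Follow branches from the root until a leaf is reached."""
    while "prediction" not in node:
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["prediction"]


# Hand-built tree over rows of [age, income] (hypothetical rules).
tree = make_node(
    feature=0, threshold=30,              # root: is age <= 30?
    left=make_leaf("no"),
    right=make_node(
        feature=1, threshold=50000,       # then: is income <= 50000?
        left=make_leaf("no"),
        right=make_leaf("yes"),
    ),
)

print(predict(tree, [25, 80000]))  # no
print(predict(tree, [40, 60000]))  # yes
print(predict(tree, [40, 40000]))  # no
```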
Types of Decision Tree
BINARY VARIABLE DECISION TREE
A decision tree that has a binary target variable is called a Binary Variable Decision Tree.

CONTINUOUS VARIABLE DECISION TREE
A decision tree that has a continuous target variable is called a Continuous Variable Decision Tree.
Advantages and Disadvantages of Decision Tree
ADVANTAGES
• Easy to understand
• Useful in data exploration
• Less data cleaning required
• Data type is not a constraint
• Non-parametric method

DISADVANTAGES
• Prone to overfitting
• Not fit for continuous variables


Random Forest
It uses decision trees underneath: it builds multiple trees and eventually takes a majority vote over their predictions.
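
The voting step can be sketched as follows. The three "trees" here are hand-written rule functions standing in for trained decision trees, and their rules are hypothetical; in an actual random forest each tree is typically trained on a bootstrap sample of the data with a random subset of features, which this sketch does not do.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent prediction."""
    return Counter(predictions).most_common(1)[0][0]


def forest_predict(trees, row):
    """Ask every tree for a prediction, then take the majority vote."""
    return majority_vote([tree(row) for tree in trees])


# Three stand-in trees over rows of [age, income] (hypothetical rules).
def tree_a(row):
    return "yes" if row[0] > 30 else "no"


def tree_b(row):
    return "yes" if row[1] > 50000 else "no"


def tree_c(row):
    return "yes" if row[0] > 30 and row[1] > 40000 else "no"


trees = [tree_a, tree_b, tree_c]
print(forest_predict(trees, [40, 45000]))  # yes (votes: yes, no, yes)
print(forest_predict(trees, [25, 60000]))  # no  (votes: no, yes, no)
```

Using an odd number of trees avoids ties when there are only two classes.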
