Classification is a technique for categorizing data into a desired, distinct number of classes,
where a label can be assigned to each class.
Binary classifiers: classification with only two distinct classes, i.e. two possible outcomes.
Machine learning is an application of artificial intelligence (AI) that gives systems the
ability to learn and improve automatically from experience without being explicitly
programmed. Machine learning focuses on the development of computer programs that
can access data and use it to learn for themselves.
The process of learning begins with observations or data, such as examples, direct
experience, or instruction, in order to look for patterns in the data and make better decisions
in the future based on the examples we provide. The primary aim is to allow computers to
learn automatically, without human intervention or assistance, and to adjust their actions
accordingly.
CLASSIFICATION
Classification is one of the most widely used techniques in machine learning, with a broad
array of applications, including sentiment analysis, ad targeting, spam detection, risk
assessment, medical diagnosis and image classification. The core goal of classification is to
predict a category or class y from some inputs x.
Linear Classifiers
Linear classifiers are amongst the most practical classification methods. In a text setting, for
example, a linear classifier associates a coefficient with the count of each word in a sentence. We focus on a
particularly useful type of linear classifier called logistic regression, which, in addition to
allowing you to predict a class, provides a probability associated with the prediction. These
probabilities are extremely useful, since they provide a degree of confidence in the
predictions. We construct features from categorical inputs, and tackle classification problems
with more than two classes (multiclass problems).
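As a sketch of the word-count formulation above: a logistic regression classifier scores a sentence as a weighted sum of word counts, then passes the score through a sigmoid to get a probability. The vocabulary, coefficients, and example sentence below are invented for illustration, not learned from data:

```python
import math

# Hypothetical per-word coefficients (positive weight -> evidence for
# the positive class), as a trained logistic regression might produce.
coefficients = {"good": 1.2, "great": 1.5, "bad": -1.8, "awful": -2.1}
intercept = 0.1

def predict_proba(sentence):
    """Score = intercept + sum(coefficient * word count); sigmoid -> probability."""
    words = sentence.lower().split()
    score = intercept + sum(coefficients.get(w, 0.0) * words.count(w)
                            for w in set(words))
    return 1.0 / (1.0 + math.exp(-score))  # sigmoid maps score to (0, 1)

p = predict_proba("the food was good and the service was great")
label = 1 if p >= 0.5 else 0
```

The probability `p` is the degree of confidence mentioned above: a value near 0.5 signals an uncertain prediction even though the hard label is the same.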
One-vs-All (Multiclass) Classification
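One-vs-all reduces a K-class problem to K binary problems: train one binary classifier per class and predict the class whose classifier is most confident. A minimal sketch, with hand-set scoring functions standing in for trained binary classifiers (the labels and scores here are invented):

```python
# One binary scorer per class; the input is a list of tokens.
# These lambdas are stand-ins for K trained binary classifiers.
classifiers = {
    "spam":     lambda tokens: 2.0 * tokens.count("free") - 1.0,
    "work":     lambda tokens: 1.5 * tokens.count("meeting") - 1.0,
    "personal": lambda tokens: 1.0 * tokens.count("birthday") - 1.0,
}

def predict(tokens):
    # Score the input against every class; the most confident
    # binary classifier wins (argmax over scores).
    scores = {label: f(tokens) for label, f in classifiers.items()}
    return max(scores, key=scores.get)

pred = predict(["free", "free", "meeting"])
```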
DECISION TREES
A decision tree is a decision support tool that uses a tree-like graph or model of decisions and
their possible consequences, including chance event outcomes, resource costs, and utility. It is
one way to display an algorithm that only contains conditional control statements.
A decision tree is a flowchart-like structure in which each internal node represents a “test” on
an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the
outcome of the test, and each leaf node represents a class label (decision taken after computing
all attributes). The paths from root to leaf represent classification rules.
Tree-based learning algorithms are considered to be among the best and most widely used
supervised learning methods.
WORKING ALGORITHM:
1. Start with an empty tree.
2. Select the best feature on which to split the data.
3. For each split:
   - if there is nothing more to predict, stop and make that split a leaf;
   - otherwise, go to step 2 and continue (recurse) on this split.
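The recursive procedure above can be sketched in Python. This is an illustrative toy (boolean features, majority-vote leaves, and a simple misclassification-count split criterion), with invented example data, not a production implementation:

```python
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, features):
    # Base case: nothing more to predict -> make a leaf.
    if len(set(labels)) == 1 or not features:
        return majority(labels)
    # Step 2: select the best feature (fewest majority-vote errors).
    def split_errors(f):
        err = 0
        for v in (True, False):
            side = [y for r, y in zip(rows, labels) if r[f] == v]
            if side:
                err += len(side) - Counter(side).most_common(1)[0][1]
        return err
    best = min(features, key=split_errors)
    rest = [f for f in features if f != best]
    node = {"feature": best}
    # Step 3: recurse on each side of the split.
    for v in (True, False):
        sub = [(r, y) for r, y in zip(rows, labels) if r[best] == v]
        if not sub:
            node[v] = majority(labels)
        else:
            srows, slabels = zip(*sub)
            node[v] = build_tree(list(srows), list(slabels), rest)
    return node

def predict(tree, row):
    # Follow tests from the root until a leaf (a class label) is reached.
    while isinstance(tree, dict):
        tree = tree[row[tree["feature"]]]
    return tree

# Invented toy data: whether to approve a loan.
rows = [{"employed": True,  "owns_home": False},
        {"employed": True,  "owns_home": True},
        {"employed": False, "owns_home": True},
        {"employed": False, "owns_home": False}]
labels = ["yes", "yes", "no", "no"]
tree = build_tree(rows, labels, ["employed", "owns_home"])
```

Each internal node of `tree` records the feature tested, matching the flowchart description: internal nodes are tests, branches are outcomes, and leaves are class labels.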
BOOSTING
Boosting is an ensemble modeling technique that attempts to build a strong classifier from
a number of weak classifiers. It does this by building models in series. First, a model is
built from the training data. Then a second model is built which tries to correct the errors
of the first. Models are added in this way until either the complete training data set is
predicted correctly or the maximum number of models is reached.
AdaBoost was the first really successful boosting algorithm developed for the purpose of
binary classification. AdaBoost is short for Adaptive Boosting and is a very popular boosting
technique which combines multiple “weak classifiers” into a single “strong classifier”.
Algorithm:
1. Initialise the dataset and assign an equal weight to each data point.
2. Provide this as input to the model and identify the wrongly classified data points.
3. Increase the weights of the wrongly classified data points.
4. If the required results are obtained, go to step 5; otherwise, go to step 2.
5. End.
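The steps above can be sketched in pure Python, using one-dimensional decision stumps as the weak classifiers. The data, number of rounds, and stump family are invented for the example; this follows the standard AdaBoost weight-update rule rather than any specific library implementation:

```python
import math

# Toy 1-D data: points and +/-1 labels (invented for illustration).
X = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8]
y = [1, 1, 1, -1, -1, -1]

def stump(threshold, sign):
    # Weak classifier: predicts `sign` when x < threshold, else -sign.
    return lambda x: sign if x < threshold else -sign

def adaboost(X, y, rounds=5):
    n = len(X)
    w = [1.0 / n] * n                       # step 1: equal weights
    ensemble = []
    candidates = [stump(t, s) for t in X for s in (1, -1)]
    for _ in range(rounds):
        # step 2: pick the weak classifier with the lowest weighted error
        def weighted_error(h):
            return sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
        h = min(candidates, key=weighted_error)
        err = max(weighted_error(h), 1e-10)  # avoid log(0) below
        if err >= 0.5:
            break                            # no better than chance: stop
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # step 3: up-weight misclassified points, then renormalise
        w = [wi * math.exp(-alpha * yi * h(xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    # The "strong classifier": a weighted vote of the weak classifiers.
    return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

model = adaboost(X, y)
```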
Precision
Precision attempts to answer the following question:
What proportion of positive identifications was actually correct?
Recall
Recall attempts to answer the following question:
What proportion of actual positives was identified correctly?
Mathematically, recall is defined as follows:
Recall = TP / (TP + FN)
where TP is the number of true positives and FN the number of false negatives.
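In terms of true positives (TP), false positives (FP), and false negatives (FN), precision is TP / (TP + FP) and recall is TP / (TP + FN). A small sketch computing both on invented labels:

```python
def precision_recall(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # correct among predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0     # found among actual positives
    return precision, recall

# Invented example: 4 actual positives; 3 predicted positives, 2 of them correct.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0]
p, r = precision_recall(y_true, y_pred)
```

Here precision is 2/3 (two of the three positive predictions were correct) and recall is 1/2 (two of the four actual positives were identified), illustrating how the two metrics answer the two questions above differently on the same predictions.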