Sandeep Khurana
Introduction to Machine Learning
1. Motivation for ML
2. ML algorithms
1. Classification
2. Regression
3. Introduction to Azure ML Studio
4. Environment setup
1. Data Preparation
2. Modeling of Problem
5. Introduction to Kaggle
Hands-on Case study
* Cognitive Neuroscience (Gazzaniga, Ivry, Mangun)
** Artificial Intelligence: A Modern Approach (Russell and Norvig)
DE-CONSTRUCTING INTELLIGENCE
NATURAL LANGUAGE PROCESSING: to enable the machine to communicate successfully in English
Step | Human learning | Machine learning
Training | The more "solved examples", the better | A larger learning dataset gives better accuracy
Testing | Solve unseen problems to assess learning | Test algorithm on unseen data
Decompose and Aggregate Text/Image | Apply past associations between components of image/text to current image/text | Apply past associations between components of image/text to current image/text
Matching/Classification | Recall from memory the closest association to instance at hand | Recall from memory the closest association to instance at hand
Matching/Classification | More clues to memory increase recall ability | More features increase accuracy
(c) QuantLeap Consulting
• Common Sense
Theory
• Math
Execution
• Programming
Application
• Domain knowledge
Interpretation
• Algorithm
Machine Learning
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
Supervised Learning
• Prediction
• Anomaly Detection
Classification Algorithms
• Neural Networks
• SVM
• K-NN
Unsupervised Learning
Learning “what normally happens”
No output
Clustering: Grouping similar instances
Other applications: Summarization, Association Analysis
Example applications
Customer segmentation in CRM
Image compression: Color quantization
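Clustering can be illustrated with a minimal k-means sketch. This is not taken from the slides: the customer data, the choice of two clusters, and the function name `kmeans` are assumptions made only for illustration, and the starting centroids are passed in by hand so the run is deterministic.

```python
import math

def kmeans(points, centroids, n_iter=10):
    """Plain k-means: alternate an assignment step and a centroid update step."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical customers: (visits per month, average basket value)
customers = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (8, 78)]
labels, centroids = kmeans(customers, centroids=[customers[0], customers[3]])
print(labels)     # cluster index per customer
print(centroids)  # one "typical customer" per segment
```

No output labels are given to the algorithm; the two segments emerge only from the similarity between instances.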
Reinforcement Learning
Topics:
Policies: what actions should an agent take in a particular situation
Utility estimation: how good is a state (used by policy)
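The link between utility estimation and policy can be sketched with value iteration on a toy problem. Everything here is an assumption for illustration: a four-state corridor with the goal at the right end, a reward of 1 for reaching it, and a discount factor of 0.9. The slides do not name a specific algorithm.

```python
GAMMA = 0.9                      # discount factor (assumed)
N_STATES = 4                     # states 0..3; state 3 is the terminal goal
ACTIONS = {"left": -1, "right": +1}

def step(state, action):
    """Deterministic transition; reward 1.0 for entering the goal state."""
    nxt = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def value_iteration(n_sweeps=50):
    """Utility estimation: how good is each state under the best action?"""
    utility = [0.0] * N_STATES
    for _ in range(n_sweeps):
        for s in range(N_STATES - 1):      # the terminal state keeps utility 0
            outcomes = [step(s, a) for a in ACTIONS]
            utility[s] = max(r + GAMMA * utility[n] for n, r in outcomes)
    return utility

def greedy_policy(utility):
    """Policy: in each state, take the action with the best estimated return."""
    def action_value(s, a):
        nxt, reward = step(s, a)
        return reward + GAMMA * utility[nxt]
    return {s: max(ACTIONS, key=lambda a: action_value(s, a))
            for s in range(N_STATES - 1)}

utility = value_iteration()
policy = greedy_policy(utility)
print(utility)
print(policy)
```

States closer to the goal receive higher utility, and the policy simply follows the utility estimates.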
Binary/Categorical/Discrete Classification
Many problems can be structured as classification problems
Binary Recursion
x : car attributes
y : price
y = g(x | θ), where g( ) is the model and θ are its parameters
Linear model: y = wx + w0
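As a rough sketch of how the parameters w and w0 of the linear model could be estimated, the example below uses ordinary least squares on a single attribute. The data (car age in years against price in thousands) and the function name `fit_line` are hypothetical; the slides do not say which fitting method is used.

```python
def fit_line(xs, ys):
    """Least-squares estimates of w and w0 in y = w*x + w0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    w0 = mean_y - w * mean_x
    return w, w0

# Hypothetical training data: car age (years) and price (thousands).
ages = [1, 2, 3, 4, 5]
prices = [9, 8, 7, 6, 5]

w, w0 = fit_line(ages, prices)
predicted = w * 6 + w0   # price estimate for an unseen 6-year-old car
print(w, w0, predicted)
```

Here θ = (w, w0) and g is the straight line; a richer model only changes g, not the overall recipe.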
Approaches
• Confusion Matrix
• ROC curve
• N-Fold Cross validation
• Monte Carlo simulation
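Two of these approaches are easy to sketch in a few lines. The example assumes binary labels coded as 0/1 and a simple interleaved split into folds; the labels, predictions, and function names are hypothetical and not taken from the slides.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx); every sample is tested exactly once."""
    indices = list(range(n_samples))
    for fold in range(n_folds):
        test_idx = indices[fold::n_folds]
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx

# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
print(tp, fp, fn, tn, accuracy)

folds = list(k_fold_indices(n_samples=6, n_folds=3))
print(folds)
```

In practice the model would be retrained on each `train_idx` and scored on the matching `test_idx`, and the fold scores averaged.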
Sec 2: Azure Machine Learning Studio
Azure ML modeling
Steps
• Create account
• Build Model
Goal
• Often not clear or clearly articulated
• The “y”
Law of Minimum Force
• Occam’s Razor
• Tools are a means to an end: fancy algorithms don’t impress, accurate results do
• Accuracy-simplicity tradeoff
Haste makes Waste
• Don’t rush in on ‘any’ data
• Seek data you need, not what you have
• Study underlying variable-target correlations
• Study summary statistics
Representativeness of Training Data
Formulating ML Problems: Data
Data format
Proxy variables
• E.g. clickstream data
• E.g. rental values for socio-economic classification
Calculated variables
• E.g. difference vs. ratio
Features
• Use features that generalize across contexts
• E.g. industry-standard financial ratios
Formulating ML Problems: Data
Labeled data
• Train using data for all labels
• …and labels must be accurate: GIGO (garbage in, garbage out)
Creating value from data
Experiment
• The first may not be the best.
• Threshold decisions.
• Multiple small modeling decisions
• Iterations
Model choice
Model integration
• Ensemble
Sec 3: Kaggle
Separate presentation
Sec 4: Support Vector Machines
Classification Tasks
• Learning Task
• Classification Task
Introduction: Linear Separators
Binary classification can be viewed as the task of separating classes in
feature space
w^T x + b = 0
w^T x + b > 0
w^T x + b < 0
f(x) = sign(w^T x + b)
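The decision rule above can be sketched directly. The weight vector, bias, and test points are hypothetical values chosen for illustration; a trained classifier would learn w and b from data.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def classify(w, b, x):
    """Return the sign of w^T x + b: +1, -1, or 0 on the boundary itself."""
    score = dot(w, x) + b
    return (score > 0) - (score < 0)

# Hypothetical separator in a 2-D feature space: the line x1 - x2 = 0.
w, b = (1.0, -1.0), 0.0

print(classify(w, b, (2.0, 1.0)))   # point on the positive side
print(classify(w, b, (1.0, 2.0)))   # point on the negative side
print(classify(w, b, (1.0, 1.0)))   # point exactly on the hyperplane
```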
Linear Separators
Many linear separators of the form f(x) = sign(w•x + b) correctly classify our data.
Which of them is optimal?
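Selection of a Good Hyper-Plane
The distance from a point (x_0, y_0) to the line Ax + By + c = 0 is
$r = \frac{|A x_0 + B y_0 + c|}{\sqrt{A^2 + B^2}}$
The distance from an example x_i to the separator w^T x + b = 0 is
$r = \frac{|w^T x_i + b|}{\|w\|}$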
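The distance formula is straightforward to check numerically. The sketch below assumes the usual reading of the formula, r = |w^T x + b| / ||w||, and uses a hypothetical hyperplane and points.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def distance_to_hyperplane(w, b, x):
    """Distance from point x to the hyperplane w^T x + b = 0."""
    return abs(dot(w, x) + b) / math.sqrt(dot(w, w))

# Hypothetical hyperplane 3*x1 + 4*x2 - 5 = 0, so ||w|| = 5.
w, b = (3.0, 4.0), -5.0

r_far = distance_to_hyperplane(w, b, (3.0, 4.0))   # |9 + 16 - 5| / 5
r_on = distance_to_hyperplane(w, b, (3.0, -1.0))   # lies on the hyperplane
print(r_far, r_on)
```

A good hyperplane is one for which the smallest such distance over the training examples is as large as possible.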
Classification Margin
But what are we going to do if the dataset is just too hard to separate linearly?
Map the data to a higher-dimensional feature space:
Φ: x → φ(x)
Nonlinear SVM - Overview
The kernel function plays the role of the dot product in the feature
space.
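A small numeric check can make this concrete. The sketch assumes one particular kernel, the degree-2 polynomial kernel K(x, z) = (x•z)^2 on 2-D inputs, whose explicit feature map is φ(x) = (x1^2, √2·x1·x2, x2^2); the slides do not say which kernel is used, and the input vectors are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def phi(x):
    """Explicit feature map from the 2-D input space to a 3-D feature space."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, 4.0)
k_value = poly_kernel(x, z)          # no mapping needed
mapped_value = dot(phi(x), phi(z))   # dot product after the explicit mapping
print(k_value, mapped_value)
```

Both numbers agree up to floating-point rounding, which is why the SVM never has to compute φ(x) explicitly.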
Properties of SVM
It is sensitive to noise
- Answer:
2) To predict the output for a new input, predict with each SVM and pick the one that puts the prediction the furthest into the positive region.
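The prediction step can be sketched as follows, assuming one linear SVM per class (a one-vs-rest setup, which the surviving text implies but does not spell out). The class names, weights, and biases are hypothetical stand-ins for trained models.

```python
def decision_value(w, b, x):
    """Signed score w^T x + b; larger means deeper in the positive region."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict_one_vs_rest(models, x):
    """Pick the class whose SVM gives the largest decision value."""
    return max(models, key=lambda label: decision_value(*models[label], x))

# Hypothetical per-class models: label -> (weight vector, bias).
models = {
    "sunny": ((1.0, 0.0), -1.0),
    "rainy": ((0.0, 1.0), -1.0),
    "cloudy": ((-1.0, -1.0), 1.0),
}

first = predict_one_vs_rest(models, (3.0, 0.5))
second = predict_one_vs_rest(models, (0.0, 0.0))
print(first, second)
```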
Sec 5: Case Study
Hands-on Exercise on Azure Machine Learning Studio
Overfitting
Thanks!
Contact us:
Sandeep Khurana
Founder
Quant-Leap Consulting
Hyderabad
QLCLLP@gmail.com
www.quantleapconsulting.com