

Introduction to Statistical Machine Learning (SML 2013)

Christfried Webers
© 2013 Christfried Webers
NICTA and the College of Engineering and Computer Science
The Australian National University, Canberra
February to June 2013

Overview

- Introduction
- Linear Algebra
- Probability
- Linear Regression 1
- Linear Regression 2
- Linear Classification 1
- Linear Classification 2
- Neural Networks 1
- Neural Networks 2
- Kernel Methods
- Sparse Kernel Methods
- Graphical Models 1
- Graphical Models 2
- Graphical Models 3
- Mixture Models and EM 1
- Mixture Models and EM 2
- Approximate Inference
- Sampling
- Principal Component Analysis
- Sequential Data 1
- Sequential Data 2
- Combining Models
- Selected Topics
- Discussion and Summary

Part II: Introduction

- Polynomial Curve Fitting
- Probability Theory
- Probability Densities
- Expectations and Covariances

Example Data

t = sin(2πx) + random noise, for x = 0, ..., 1.

[Figure: noisy samples of sin(2πx) on the interval [0, 1].]

Training Data

N = 10
x ≡ (x_1, ..., x_N)^T,  x_i ∈ R,  i = 1, ..., N
t ≡ (t_1, ..., t_N)^T,  t_i ∈ R,  i = 1, ..., N

Polynomial Curve Fitting

M: order of the polynomial

y(x, w) = w_0 + w_1 x + w_2 x^2 + ... + w_M x^M = Σ_{m=0}^{M} w_m x^m

- y(x, w) is a nonlinear function of x
- y(x, w) is a linear function of the unknown model parameters w
- How can we find good parameters w = (w_0, ..., w_M)^T?
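As a minimal sketch (function and variable names are my own), the polynomial model can be evaluated in Python:

```python
import numpy as np

def poly(x, w):
    """Evaluate y(x, w) = sum_{m=0}^{M} w[m] * x**m."""
    return sum(w_m * x**m for m, w_m in enumerate(w))

w = np.array([1.0, -2.0, 0.5])   # M = 2: y = 1 - 2x + 0.5 x^2
print(poly(2.0, w))              # 1 - 4 + 2 = -1.0
```

Note that y is linear in w: for fixed x, doubling the weight vector doubles the output, which is what makes least-squares fitting tractable.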

[Figure: for each training point x_n, the error between the model prediction y(x_n, w) and the target t_n.]

Error Function

The prediction of the model for the training input x_n is y(x_n, w); the observed target is t_n. The sum-of-squares error is

E(w) = (1/2) Σ_{n=1}^{N} ( y(x_n, w) - t_n )^2
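A direct computation of E(w) as a sketch (numpy.polyval expects coefficients with the highest power first, hence the reversal; the example data is my own):

```python
import numpy as np

def sum_of_squares_error(w, x, t):
    """E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2."""
    pred = np.polyval(w[::-1], x)   # w[m] is the coefficient of x**m
    return 0.5 * np.sum((pred - t) ** 2)

x = np.array([0.0, 0.5, 1.0])
t = np.array([0.0, 1.0, 0.0])
w = np.array([0.0, 0.0])              # M = 1, all-zero weights
print(sum_of_squares_error(w, x, t))  # 0.5 * (0 + 1 + 0) = 0.5
```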

Fits of Different Order

M = 0:  y(x, w) = w_0
M = 1:  y(x, w) = w_0 + w_1 x
M = 3:  y(x, w) = w_0 + w_1 x + w_2 x^2 + w_3 x^3
M = 9:  y(x, w) = w_0 + w_1 x + ... + w_8 x^8 + w_9 x^9  (overfitting)

[Figures: the fitted polynomials for M = 0, 1, 3, 9 plotted against the data.]

Testing the Fit

Get 100 new data points. Compare models using the root-mean-square (RMS) error

E_RMS = sqrt( 2 E(w*) / N )

[Figure: E_RMS on the training and test sets versus model order M.]
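The training/test behaviour can be reproduced with a small experiment (a sketch with synthetic data of my own, not the lecture's exact points):

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)
x_test = np.linspace(0, 1, 100)
t_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 100)

def rms_error(w, x, t):
    # equivalent to sqrt(2 E(w) / N)
    return np.sqrt(np.mean((np.polyval(w, x) - t) ** 2))

results = {}
for M in (0, 1, 3, 9):
    w_star = np.polyfit(x_train, t_train, M)
    results[M] = (rms_error(w_star, x_train, t_train),
                  rms_error(w_star, x_test, t_test))
# M = 9 nearly interpolates the 10 training points (tiny training error)
# but its test error is much larger: overfitting
```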

Coefficients of the Fitted Polynomials

         M = 0    M = 1    M = 3         M = 9
w*_0      0.19     0.82     0.31          0.35
w*_1              -1.27     7.99        232.37
w*_2                      -25.43      -5321.83
w*_3                       17.37      48568.31
w*_4                                -231639.30
w*_5                                 640042.26
w*_6                               -1061800.52
w*_7                                1042400.18
w*_8                                -557682.99
w*_9                                 125201.43

Note how the magnitudes of the coefficients explode for M = 9.

More Data

N = 15

[Figure: the M = 9 fit with N = 15 data points.]

N = 100

- Heuristic: use no fewer than 5 to 10 times as many data points as parameters.
- But the number of parameters is not necessarily the most appropriate measure of model complexity!
- Later: the Bayesian approach.

[Figure: the M = 9 fit with N = 100 data points.]

Regularisation

Add a regularisation term to the error function:

Ẽ(w) = (1/2) Σ_{n=1}^{N} ( y(x_n, w) - t_n )^2 + (λ/2) ||w||^2

where ||w||^2 ≡ w^T w = w_0^2 + w_1^2 + ... + w_M^2.
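Minimising Ẽ(w) for the polynomial model has a closed-form solution (ridge regression). A sketch with names of my own choosing, using the design matrix Φ with Φ[n, m] = x_n^m:

```python
import numpy as np

def fit_regularised(x, t, M, lam):
    """Minimise E~(w): solve (Phi^T Phi + lam I) w = Phi^T t."""
    Phi = np.vander(x, M + 1, increasing=True)   # Phi[n, m] = x_n ** m
    A = Phi.T @ Phi + lam * np.eye(M + 1)
    return np.linalg.solve(A, Phi.T @ t)

x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x)
w_weak = fit_regularised(x, t, 9, np.exp(-18))   # ln lambda = -18
w_strong = fit_regularised(x, t, 9, np.exp(0))   # ln lambda = 0
# stronger regularisation shrinks the weight vector
print(np.linalg.norm(w_strong) < np.linalg.norm(w_weak))
```

The shrinking norm is exactly the effect the coefficient table above illustrates: regularisation keeps the weights from exploding.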

Regularisation: M = 9

ln λ = -18:  [Figure: the regularised fit.]
ln λ = 0:  [Figure: the fit under stronger regularisation.]

[Figure: E_RMS on the training and test sets versus ln λ, for M = 9.]

Probability Theory

[Figure: samples from a joint distribution p(X, Y), where Y takes the values 1 and 2.]

Counts for the joint distribution p(X, Y) (Y down, X across):

X:      a   b   c   d   e   f   g   h   i  | sum
Y = 2:  0   0   0   1   4   5   8   6   2  |  26
Y = 1:  3   6   8   8   5   3   1   0   0  |  34
sum:    3   6   8   9   9   8   9   6   2  |  60

Sum Rule

From the table:
p(X = d, Y = 1) = 8/60
p(X = d) = p(X = d, Y = 2) + p(X = d, Y = 1) = 1/60 + 8/60 = 9/60

In general:
p(X = d) = Σ_Y p(X = d, Y)
p(X) = Σ_Y p(X, Y)
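The sum rule can be checked numerically against the count table (a sketch using the table's numbers):

```python
import numpy as np

# joint counts from the table: rows Y = 2 and Y = 1, columns X = a..i
counts = np.array([
    [0, 0, 0, 1, 4, 5, 8, 6, 2],   # Y = 2
    [3, 6, 8, 8, 5, 3, 1, 0, 0],   # Y = 1
])
p_joint = counts / counts.sum()     # p(X, Y); total count is 60

p_x = p_joint.sum(axis=0)           # sum rule: p(X) = sum_Y p(X, Y)
d = 3                               # column index of X = d
print(p_x[d])                       # 1/60 + 8/60 = 9/60 = 0.15
```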

Marginal distributions:

p(X) = Σ_Y p(X, Y)
p(Y) = Σ_X p(X, Y)

[Figures: the marginals p(X) and p(Y).]

Product Rule

Conditional probability:  p(X = d | Y = 1) = 8/34
Marginal:  p(Y = 1) = Σ_X p(X, Y = 1) = 34/60
Product rule:  p(X, Y) = p(X | Y) p(Y)
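Both the conditional and the product rule follow from the same joint table (a sketch):

```python
import numpy as np

counts = np.array([
    [0, 0, 0, 1, 4, 5, 8, 6, 2],   # Y = 2
    [3, 6, 8, 8, 5, 3, 1, 0, 0],   # Y = 1
])
p_joint = counts / counts.sum()
p_y = p_joint.sum(axis=1)           # p(Y) = [26/60, 34/60]

p_x_given_y1 = p_joint[1] / p_y[1]  # p(X | Y = 1)
d = 3
print(p_x_given_y1[d])              # 8/34

# product rule: p(X, Y = 1) = p(X | Y = 1) p(Y = 1)
print(np.allclose(p_x_given_y1 * p_y[1], p_joint[1]))
```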

[Figure: the conditional distribution p(X | Y = 1).]

Summary

Sum rule:  p(X) = Σ_Y p(X, Y)
Product rule:  p(X, Y) = p(X | Y) p(Y)

Why define the probability as a ratio p(X, Y) = s/t (e.g. s = 8, t = 60) rather than keep the pair of numbers? Analogy: why not use the pair (a, c) instead of sin(α) = a/c? The ratio is the quantity that carries the meaning, independent of the overall scale.

[Figure: a right triangle with angle α illustrating sin(α) = a/c.]

Bayes' Theorem

From the product rule, p(X, Y) = p(X | Y) p(Y) = p(Y | X) p(X). Hence Bayes' theorem:

p(Y | X) = p(X | Y) p(Y) / p(X)

and

p(X) = Σ_Y p(X, Y)            (sum rule)
     = Σ_Y p(X | Y) p(Y)      (product rule)
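Bayes' theorem can be verified on the count table (a sketch; the indices follow the table above):

```python
import numpy as np

counts = np.array([
    [0, 0, 0, 1, 4, 5, 8, 6, 2],   # Y = 2
    [3, 6, 8, 8, 5, 3, 1, 0, 0],   # Y = 1
])
p_joint = counts / counts.sum()
p_y = p_joint.sum(axis=1)
p_x = p_joint.sum(axis=0)

d = 3                               # X = d
# Bayes: p(Y = 1 | X = d) = p(X = d | Y = 1) p(Y = 1) / p(X = d)
via_bayes = (p_joint[1, d] / p_y[1]) * p_y[1] / p_x[d]
direct = p_joint[1, d] / p_x[d]     # (8/60) / (9/60) = 8/9
print(np.isclose(via_bayes, direct))
```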

Probability Densities

The probability of x falling in the interval (x, x + δx) is given by p(x) δx for infinitesimally small δx. More generally, the probability of x lying in an interval is the integral of the density p(x) over that interval.

[Figure: a density p(x) and its cumulative distribution P(x).]

Constraints on p(x)

Nonnegative:  p(x) ≥ 0
Normalisation:  ∫ p(x) dx = 1

[Figure: p(x) and P(x).]

Cumulative Distribution Function

P(x) = ∫_{-∞}^{x} p(z) dz,   or equivalently   d/dx P(x) = p(x)

[Figure: p(x) and P(x).]

Multivariate Densities

Vector x ≡ (x_1, ..., x_D)^T

Nonnegative:  p(x) ≥ 0
Normalisation:  ∫ p(x) dx = 1, which means ∫ ... ∫ p(x_1, ..., x_D) dx_1 ... dx_D = 1

Sum and Product Rules for Densities

Sum rule:  p(x) = ∫ p(x, y) dy
Product rule:  p(x, y) = p(x | y) p(y)

Expectations

The expectation of a function f(x) under a distribution p(x):

Discrete distribution:  E[f] = Σ_x p(x) f(x)
Continuous distribution:  E[f] = ∫ p(x) f(x) dx

How to Approximate E[f]

Given N points x_n drawn from the probability distribution p(x), approximate the expectation by a finite sum:

E[f] ≈ (1/N) Σ_{n=1}^{N} f(x_n)

(More in the coming lecture on sampling.)
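A quick numerical check of the finite-sum approximation (a sketch; the example distribution and function are my own):

```python
import numpy as np

rng = np.random.default_rng(0)

# estimate E[f] with f(x) = x^2 under x ~ N(0, 1); the exact value is 1
samples = rng.normal(0.0, 1.0, 100_000)
estimate = np.mean(samples ** 2)    # (1/N) sum_n f(x_n)
print(estimate)                     # close to 1
```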

Expectation over One Variable of Several

E_x[f(x, y)] denotes the average of f(x, y) with respect to the distribution of x alone; the result is a function of y.

Discrete:  E_x[f(x, y)] = Σ_x p(x) f(x, y)
Continuous:  E_x[f(x, y)] = ∫ p(x) f(x, y) dx

Conditional Expectation

For an arbitrary function f(x):

Discrete:  E_x[f | y] = Σ_x p(x | y) f(x)
Continuous:  E_x[f | y] = ∫ p(x | y) f(x) dx

Other notation used in the literature: E_{x|y}[f].

What is E[E[f(x) | y]]? Can we simplify it? This must mean E_y[E_x[f(x) | y]]. (Why? Because the inner expectation E_x[f(x) | y] is a function of y only.)

E_y[E_x[f(x) | y]] = Σ_y p(y) E_x[f | y]
                   = Σ_y p(y) Σ_x p(x | y) f(x)
                   = Σ_{x,y} f(x) p(x, y)
                   = Σ_x f(x) p(x)
                   = E_x[f(x)]

Variance

var[f] = E[ (f(x) - E[f(x)])^2 ] = E[ f(x)^2 ] - E[f(x)]^2

Special case f(x) = x:

var[x] = E[ (x - E[x])^2 ] = E[ x^2 ] - E[x]^2
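The identity var[x] = E[x^2] - E[x]^2 holds for sample averages as well; a quick check on data of my own:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 50)

lhs = np.mean((x - np.mean(x)) ** 2)      # E[(x - E[x])^2]
rhs = np.mean(x ** 2) - np.mean(x) ** 2   # E[x^2] - E[x]^2
print(np.isclose(lhs, rhs))               # the two forms agree
```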

Covariance

Writing a ≡ E[x] and b ≡ E[y]:

cov[x, y] = E_{x,y}[ (x - E[x]) (y - E[y]) ]
          = E_{x,y}[x y] - E_{x,y}[x b] - E_{x,y}[a y] + E_{x,y}[a b]
          = E_{x,y}[x y] - b E_{x,y}[x] - a E_{x,y}[y] + a b E_{x,y}[1]
          = E_{x,y}[x y] - a b - a b + a b
          = E_{x,y}[x y] - E[x] E[y]

using E_{x,y}[x] = E_x[x] = a, E_{x,y}[y] = E_y[y] = b, and E_{x,y}[1] = 1.

The covariance expresses how strongly x and y vary together. If x and y are independent, their covariance vanishes.
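Both forms of the covariance, and the independence property, can be checked numerically (a sketch with samples of my own):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=10_000)
y = rng.normal(size=10_000)          # drawn independently of x

cov_def = np.mean((x - x.mean()) * (y - y.mean()))
cov_alt = np.mean(x * y) - x.mean() * y.mean()
print(np.isclose(cov_def, cov_alt))  # the two expressions agree
print(abs(cov_def))                  # near zero: x and y are independent
```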

Covariance for Vector-Valued Variables

cov[x, y] = E_{x,y}[ (x - E[x]) (y^T - E[y^T]) ] = E_{x,y}[ x y^T ] - E[x] E[y^T]
