

Computer Vision:
From 3D Reconstruction
to Recognition

Professor Silvio Savarese

Computational Vision and Geometry Lab

Silvio Savarese, Lecture 1, 28-Mar-16

o Silvio Savarese
o ssilvio@stanford.edu
o Office: Gates Building, room 154
o Office hours: Thursday 2-3pm or by appointment

o Kenji Hata (head CA)
o Sumant Sharma
o Matthew Cong
o Bryan Anenberg
o Kratarth Goel
o Kevin Chen
o Lyne Petse

Class Time & Location

o M-W, 3-4:20PM, Skilling Auditorium


This course requires knowledge of linear algebra, probability,
statistics, machine learning and computer vision, as well as
decent programming skills. Though not an absolute
requirement, it is encouraged and preferred that you have
taken at least one of CS221, CS229 or CS131A, or have equivalent
knowledge.
We will leverage concepts from low-level image processing
(CS131A) (e.g., linear filters, edge detectors, corner detectors,
etc.) and machine learning (CS229) (e.g., SVM, basic Bayesian
inference, clustering, neural networks, etc.), which we won't
cover in this class.

We will provide links to background material related to CS131A
and CS229 (or discuss it during TA sessions) so students can
refresh or study those topics if needed.


Textbooks

- [FP] D. A. Forsyth and J. Ponce. Computer Vision: A Modern Approach (2nd
Edition). Prentice Hall, 2011.
- [HZ] R. Hartley and A. Zisserman. Multiple View Geometry in Computer
Vision (2nd Edition). Cambridge University Press, 2004.

- R. Szeliski. Computer Vision: Algorithms and Applications. Springer, 2011.
- D. Hoiem and S. Savarese. Representations and Techniques for 3D Object
Recognition and Scene Interpretation. Synthesis
Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool
Publishers, 2011.
- G. Bradski and A. Kaehler. Learning OpenCV. O'Reilly Media, 2008.


Course assignments
1 warm-up problem set (HW-0)
4 problem sets (first problem released next week!)
1 mid-term exam
1 project

Look up the class schedule for release and due dates.

Problems will be released through the schedule page and must
be submitted through Gradescope (use code MB5ZB9).


Midterm Exam
The exam will be held in class and you will have 80
minutes to complete it.
The exam will be open-book and open-notes.
You will be updated with more details, e.g., material
to be covered, review sessions etc., as we approach
the midterm.

Course Projects
Replicate an interesting paper
Compare different methods on a test bed
A new approach to an existing problem
Original research

Write a 10-page paper summarizing your results
Release the final code
Give a final in-class presentation
SCPD students can send videos instead.

We will introduce projects in 1-2 weeks

Important dates: look up class schedule
Course Projects
Form your team:
1-4 people
The larger the team, the more work we expect
from the team
Be nice to your partner: do you plan to drop the course?
Quality of the project (including writing)
Final project in-class presentation (~ TBA minutes
spotlight presentations)
Grading policy
Homeworks: 42%
2% for HW0
10% each for HW1, HW2, HW3, HW4

Midterm exam: 15%

Course project: 38%
Midterm progress report: 5%
Final report: 25%
Presentation: 8%

Attendance and class participation: 5%

Questions, answers, remarks, Piazza posts
Class participation is waived for SCPD students. For the project
presentation, SCPD students can send videos instead.
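
As a rough illustration of how the weights above combine, the following Python sketch computes a final percentage from per-component scores. The component names and the function `final_grade` are hypothetical, each score is assumed to be a fraction between 0 and 1, and the sketch ignores late penalties and the SCPD participation waiver.

```python
# Weights (percent of the final grade) copied from the grading policy.
# The dictionary keys are illustrative names, not official identifiers.
WEIGHTS = {
    "hw0": 2,
    "hw1": 10,
    "hw2": 10,
    "hw3": 10,
    "hw4": 10,
    "midterm": 15,
    "progress_report": 5,
    "final_report": 25,
    "presentation": 8,
    "participation": 5,
}


def final_grade(scores):
    """Return the weighted final grade in percent.

    scores maps a component name to a fraction in [0, 1];
    components missing from scores count as 0.
    """
    return sum(weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items())
```

The weights sum to 100, so a perfect score on every component gives 100, and a student who only earns 80% on the midterm gets 15 * 0.8 = 12 points from it.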
Grading policy
Late policy for homework:

If 1 day late, 50% off the grade for that homework
Zero credit if more than one day late.
Two "48-hour one-time late submission" bonuses are
available; that is, you can use a bonus to submit a homework
up to 48 hours late. This is a one-time deal: after you
use all your bonuses, you must adhere to the standard late
submission policy.
No exceptions will be made.
Grading policy

Late policy for the project:
If 1 day late, 25% off the grade for the project
If 2 days late, 50% off the grade for the project
Zero credit if more than 2 days late
No "late submission bonus" is allowed when submitting your
progress report or project report
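
The homework and project late policies above can be sketched as two small functions that return the fraction of the grade that is kept. This is one possible reading, not an official calculator: it assumes lateness is measured in hours, that any lateness up to 24 hours counts as "1 day late", and that a bonus only helps when the submission is at most 48 hours late. The function names are hypothetical.

```python
def homework_multiplier(hours_late, use_bonus=False):
    """Fraction of the homework grade kept after the late policy."""
    if hours_late <= 0:
        return 1.0  # on time
    if use_bonus and hours_late <= 48:
        return 1.0  # a 48-hour bonus covers the delay (assumed reading)
    if hours_late <= 24:
        return 0.5  # 1 day late: 50% off
    return 0.0      # more than one day late: zero credit


def project_multiplier(hours_late):
    """Fraction of the project grade kept; no bonus applies to the project."""
    if hours_late <= 0:
        return 1.0   # on time
    if hours_late <= 24:
        return 0.75  # 1 day late: 25% off
    if hours_late <= 48:
        return 0.5   # 2 days late: 50% off
    return 0.0       # more than 2 days late: zero credit
```

For example, a homework submitted 30 hours late keeps nothing without a bonus but keeps full credit with one, while a project submitted 30 hours late keeps half of its grade.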

Collaboration policy
Read the student code book and understand what counts as
collaboration and what counts as an academic infraction.
Discussing project assignments with each other is allowed,
but coding must be done individually.
Homework and class project coding policy: using online
code or other students'/researchers' code is not allowed in
general. Exceptions can be made, and individual cases will be
discussed with the instructor.
Lecture 1

An introduction to computer vision

Course overview


There was a table set out under
a tree in front of the house,
and the March Hare and the
Hatter were having tea at it.

The table was a large one, but

the three were all crowded
together at one corner of it

From A Mad Tea-Party

Alice's Adventures in Wonderland
Lewis Carroll

Illustration by Arthur Rackham

Computer vision


Object 1 Object N

- semantic

spatial & temporal relations


- geometry
Computer vision

Sensing device Computational

1. Information extraction: features, 3D structure, motion
flows, etc.
2. Interpretation: recognize objects, scenes, actions, events
Computer vision and Applications

Timeline: 1990, 2000, 2010

Fingerprint biometrics
Augmentation with 3D
computer graphics

3D object prototyping

EosSystems Photomodeler
Computer vision and Applications

New feature detectors/descriptors
CV leverages machine learning

EosSystems, Autostitch

Face detection
Web applications
Panoramic photography
3D modeling of landmarks

Computer vision and Applications
Large scale image repositories
Deep learning (e.g. ImageNet)

Better clouds :)
More bandwidth
Increased computational power
Kinect

Image search engines

Movies, news, sports

Visual search and
landmark recognition

Google Goggles

Augmented reality

Motion sensing and
gesture recognition

Autonomous navigation and safety

Mobileye: Vision systems in high-end BMW, GM, Volvo models

But also, Toyota, Google, Apple, Tesla, Nissan, Ford, etc.

Source: A. Shashua, S. Seitz

Personal robotics

Computer vision and Applications

Assistive technologies
Surveillance
Factory inspection
Vision for robotics, space exploration
Autonomous driving, robot navigation
Security

Sources: K. Grauman, L. Fei-Fei, S. Lazebnik

Current state of computer vision

3D Reconstruction
- 3D shape recovery
- 3D scene reconstruction
- Camera localization
- Pose estimation

2D Recognition
- Object detection
- Texture classification
- Target tracking
- Activity recognition

Current state of computer vision

Snavely et al., 06-08

3D Reconstruction
- 3D shape recovery
- 3D scene reconstruction
- Camera localization
- Pose estimation

Levoy et al., 00; Golparvar-Fard et al., JAEI 10
Lucas & Kanade, 81; Pandey et al., IFAC 2010
Chen & Medioni, 92; Hartley & Zisserman, 00
Dellaert et al., 00; Pandey et al., ICRA 2011
Debevec et al., 96
Savarese et al., IJCV 05
Levoy & Hanrahan, 96; Rusinkiewicz et al., 02
Nistér, 04
Savarese et al., IJCV 06
Fitzgibbon & Zisserman, 98; Microsoft's PhotoSynth
Triggs et al., 99; Brown & Lowe, 04
Schindler et al., 04; Snavely et al., 06-08
Pollefeys et al., 99; Schindler et al., 08
Kutulakos & Seitz, 99; Lourakis & Argyros, 04
Colombo et al., 05; Agarwal et al., 09
Frahm et al., 10
Current state of computer vision

[Figure labels: Building, Tree, Car, Person, Bike, Street]

2D Recognition
- Object detection
- Texture classification
- Target tracking
- Activity recognition

Turk & Pentland, 91; Agarwal & Roth, 02; He et al., 06
Poggio et al., 93; Ramanan & Forsyth, 03; Gould et al., 08
Belhumeur et al., 97; Weber et al., 00; Maire et al., 08
LeCun et al., 98; Vidal-Naquet & Ullman, 02; Felzenszwalb et al., 08
Amit and Geman, 99; Fergus et al., 03
Shi & Malik, 00; Kohli et al., 09
Torralba et al., 03; L.-J. Li et al., 09
Viola & Jones, 00
Vogel & Schiele, 03; Ladicky et al., 10, 11
Felzenszwalb & Huttenlocher, 00
Barnard et al., 03; Gonfaus et al., 10
Belongie & Malik, 02; Fei-Fei et al., 04
Ullman et al., 02; Farhadi et al., 09
Kumar & Hebert, 04; Lampert et al., 09

Perceiving the World in 3D!

Visual processing in the brain
where pathway
(dorsal stream)

V1 cortex

what pathway
(ventral stream)
CS 231A course overview

1. Geometry
2. Semantics

- How to extract 3D information?
- Which cues are useful?
- What are the mathematical tools?
Camera systems
Establish a mapping from 3D to 2D
How to calibrate a camera
Estimate camera parameters such as pose or focal length
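
The 3D-to-2D mapping mentioned above can be illustrated with an ideal pinhole camera. The sketch below is only an example under stated assumptions, not the course's reference model: the point is already expressed in camera coordinates with Z pointing forward, the focal length `f` is in pixels, `(cx, cy)` is the principal point, and lens distortion and skew are ignored. The function name `project_point` is hypothetical.

```python
def project_point(point, f, cx=0.0, cy=0.0):
    """Project a 3D point (X, Y, Z) in camera coordinates to a pixel (u, v).

    Ideal pinhole model: u = f * X / Z + cx, v = f * Y / Z + cy.
    """
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    u = f * X / Z + cx
    v = f * Y / Z + cy
    return (u, v)


# A point 4 units in front of the camera, with f = 100 px and
# principal point (50, 50), lands at pixel (75.0, 100.0).
print(project_point((1.0, 2.0, 4.0), f=100.0, cx=50.0, cy=50.0))
```

Doubling Z halves the offset from the principal point, which is the perspective effect that makes distant objects appear smaller. Calibration, in this simplified picture, amounts to estimating quantities such as `f`, `cx` and `cy` (plus the camera pose) from images.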

Single view metrology
Estimate 3D properties of the world from a single image

Multiple view geometry
Estimate 3D properties of the world from multiple views
Mathematical tools

Epipolar geometry

Tomasi & Kanade (1993) Projective Photoconsistency

structure from motion:
Here be dragons!
Structure from motion

Courtesy of Oxford Visual Geometry Group

Structured lighting and volumetric stereo

Scanning Michelangelo's "The David"

The Digital Michelangelo Project
- http://graphics.stanford.edu/projects/mich/
2 BILLION polygons, accuracy to .29mm
CS 231A course overview

1. Geometry
2. Semantics

- How to recognize objects?
- How to classify images or understand a scene?
- How to segment out critical semantics?
- How to estimate 3D properties (pose, size, shape)?
Object recognition and categorization
Downtown Chicago

Pedestrians crossing street

Is this a forest?

Does this image contain a building? [yes/no]

Does this image contain a car? [where?]

Which objects does this image contain? [where?]

Accurate localization (segmentation)

Estimating 3D geometrical properties

45 degree

Car, side view

Person, back
Challenges: viewpoint variation

slide credit: Fei-Fei, Fergus & Torralba

Challenges: illumination

image credit: J. Koenderink

Challenges: scale

slide credit: Fei-Fei, Fergus & Torralba

Challenges: deformation

Magritte, 1957 slide credit: Fei-Fei, Fergus & Torralba

Challenges: background clutter

Kilmeny Niland. 1995

Challenges: object intra-class variation

slide credit: Fei-Fei, Fergus & Torralba

CS 231A course overview

1. Geometry
2. Semantics

Joint recovery of geometry and semantics!

Visual processing in the brain
where pathway
(dorsal stream)

V1 cortex

what pathway
(ventral stream)
Joint reconstruction and recognition

Input images

Car Person Tree Sky

Street Building
Lecture   Topic

1         Introduction
2         Camera models
          3D geometry
3         Camera calibration
4         Single view metrology
5         Epipolar geometry
6         Multi-view geometry
7         Structure from motion / SLAM
8         Volumetric stereo
9         Fitting and Matching
10        Detector and Descriptors (proposal due)
11        Intro to Recognition; Object classification I
12        Object classification II
13        2D Object detection
14        3D Object recognition
15        Scene understanding & segmentation
16        3D Scene understanding
          Project presentations
Introduction to
Computer Vision

Next lecture: Camera systems

