Deep learning and weak supervision for image classification


Matthieu Cord

Joint work with Thibaut Durand, Nicolas Thome

Laboratoire d'Informatique de Paris 6 (LIP6) - MLIA Team

UMR CNRS


Outline

1. MANTRA: latent variable model to boost classification performance

2. WELDON: extension to deep CNNs

Motivations

Cluttered background, non-centered objects, variable size, ...

ImageNet: centered objects

- Efficient transfer: needs bounding boxes [Oquab, CVPR14]

Motivations

How to learn without bounding boxes?

Multiple-instance learning / latent variables for missing information [Felzenszwalb, PAMI10]

Latent SVM and extensions → MANTRA

How to learn deep without bounding boxes?

Learning invariance with input image transformations
- Spatial Transformer Networks [Jaderberg, NIPS15]
- Stacked Attention Networks for Image Question Answering [Yang, CVPR16]

Parts models
- Automatic discovery and optimization of parts for image classification [Parizi, ICLR15]

Deep MIL
- Is object localization for free? [Oquab, CVPR15]
- Deep extension of MANTRA: WELDON

Notations

Input x X observed observed image

Output y Y observed unobserved label

Latent h H unobserved unobserved region

Model missing information with latent variables h

Most popular approach in Computer Vision: Latent SVM

[Felzenszwalb, PAMI10] [Yu, ICML09]

5/35

Latent Structural SVM [Yu, ICML09]

Prediction function:

(ŷ, ĥ) = argmax_{(y,h) ∈ Y×H} ⟨w, Φ(x, y, h)⟩    (1)

- Joint inference in the (Y × H) space

Training: a set of N labeled training pairs (x_i, y_i)

- Objective function, an upper bound on Δ(y_i, ŷ_i):

  min_w (1/2)‖w‖² + (C/N) Σ_{i=1}^N [ max_{(y,h) ∈ Y×H} (Δ(y_i, y) + ⟨w, Φ(x_i, y, h)⟩) − max_{h ∈ H} ⟨w, Φ(x_i, y_i, h)⟩ ]

- LAI (loss-augmented inference): max_{(y,h) ∈ Y×H} [Δ(y_i, y) + ⟨w, Φ(x_i, y, h)⟩]
- Challenge exacerbated in the latent case: inference over the (Y × H) space
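As a concrete illustration of the prediction rule (1) and of loss-augmented inference, here is a minimal numpy sketch. This is not the authors' code: the block feature map `phi`, the region features, and all dimensions are hypothetical, and the search is exhaustive as on the slide.

```python
import numpy as np

def phi(x_regions, y, h, n_classes):
    """Hypothetical joint feature map: region h's features in class y's block."""
    d = x_regions.shape[1]
    out = np.zeros(n_classes * d)
    out[y * d:(y + 1) * d] = x_regions[h]
    return out

def predict(w, x_regions, n_classes):
    """Eq. (1): argmax over (y, h) of <w, phi(x, y, h)>."""
    best, best_score = None, -np.inf
    for y in range(n_classes):
        for h in range(x_regions.shape[0]):
            s = w @ phi(x_regions, y, h, n_classes)
            if s > best_score:
                best, best_score = (y, h), s
    return best

def loss_augmented_inference(w, x_regions, y_true, n_classes):
    """LAI: max over (y, h) of Delta(y_true, y) + <w, phi(x, y, h)>, 0/1 Delta."""
    best, best_val = None, -np.inf
    for y in range(n_classes):
        delta = float(y != y_true)  # 0/1 loss
        for h in range(x_regions.shape[0]):
            v = delta + w @ phi(x_regions, y, h, n_classes)
            if v > best_val:
                best, best_val = (y, h), v
    return best, best_val
```

Both loops enumerate the full (Y × H) space, which is exactly the cost the slide points to.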

MANTRA: Minimum Maximum Latent Structural SVM

The max-scoring latent value is not always relevant.

MANTRA model: a pair of latent variables (h⁺_{i,y}, h⁻_{i,y})

- max-scoring latent value: h⁺_{i,y} = argmax_{h ∈ H} ⟨w, Φ(x_i, y, h)⟩
- min-scoring latent value: h⁻_{i,y} = argmin_{h ∈ H} ⟨w, Φ(x_i, y, h)⟩

New scoring function:

D_w(x_i, y) = ⟨w, Φ(x_i, y, h⁺_{i,y})⟩ + ⟨w, Φ(x_i, y, h⁻_{i,y})⟩    (2)

ŷ = argmax_{y ∈ Y} D_w(x_i, y)    (3)
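A minimal numpy sketch of the max+min scoring rule (2)-(3), assuming the region scores ⟨w, Φ(x, y, h)⟩ have already been computed into a matrix (hypothetical names, not the authors' code):

```python
import numpy as np

def mantra_scores(region_scores):
    """D_w(x, y) = max_h s(h, y) + min_h s(h, y), per class y.

    region_scores: (n_regions, n_classes) matrix standing in for
    <w, phi(x, y, h)> for every region h and class y.
    """
    return region_scores.max(axis=0) + region_scores.min(axis=0)

def mantra_predict(region_scores):
    """Eq. (3): class maximizing the max+min score."""
    return int(np.argmax(mantra_scores(region_scores)))
```

For example, with two regions and two classes, `np.array([[3.0, 1.0], [-4.0, 0.5]])` gives D = [-1.0, 1.5]: class 0 owns the single best region (score 3) but also a strongly negative one, so the min term flips the prediction to class 1.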


MANTRA: Model & Training Rationale

Intuition of the max+min prediction function

x: image; h: image region; y: image class
⟨w, Φ(x, y, h)⟩: score of region h for class y

D_w(x, y) = ⟨w, Φ(x, y, h⁺_y)⟩ + ⟨w, Φ(x, y, h⁻_y)⟩

- h⁺_y: presence of class y (large for y_i)
- h⁻_y: localized evidence of the absence of class y
  - not too low for y_i: latent space regularization
  - low for y ≠ y_i: tracking negative evidence [Parizi, ICLR15]

Example: for a street image x, D_w(x, street) = 2, D_w(x, highway) = 0.7, D_w(x, coast) = 1.5

MANTRA: Model Training

Learning formulation

Loss function:

ℓ_w(x_i, y_i) = max_{y ∈ Y} [Δ(y_i, y) + D_w(x_i, y)] − D_w(x_i, y_i)

i.e. the margin plus the score of the best competing output, minus the score of the ground-truth output.

min_w (1/2)‖w‖² + (C/N) Σ_{i=1}^N ℓ_w(x_i, y_i)    (4)

- Fast convergence
- Still needs to solve LAI: max_y [Δ(y_i, y) + D_w(x_i, y)]
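The loss above is cheap once D_w is available; a sketch assuming D_w(x_i, ·) is given as a score vector and Δ is the 0/1 loss (hypothetical helper, not the authors' code):

```python
import numpy as np

def mantra_loss(D, y_true):
    """l_w = max_y [Delta(y_true, y) + D(y)] - D(y_true), with 0/1 Delta.

    D: vector of MANTRA scores D_w(x_i, y), one entry per class.
    """
    delta = np.ones_like(D)
    delta[y_true] = 0.0  # no margin against the ground-truth class
    return float(np.max(delta + D) - D[y_true])
```

When the ground-truth class wins by more than the margin the loss is zero; when a competitor scores high, the loss is positive, e.g. `mantra_loss(np.array([0.5, 2.0]), 0)` is 2.5.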

MANTRA: Optimization

Instantiations: binary & multi-class classification, AP ranking

             Binary                 Multi-class                    AP Ranking
x            bag (set of regions)   bag (set of regions)           set of bags (of regions)
y            ±1                     {1, ..., K}                    ranking matrix
h            instance (region)      region                         regions
Φ(x, y, h)   y Φ(x, h)              [1(y=1) Φ(x, h), ...,          joint latent ranking
                                     1(y=K) Φ(x, h)]               feature map
Δ(y_i, y)    0/1 loss               0/1 loss                       AP loss
LAI          exhaustive             exhaustive                     exact and efficient

Solve inference max_y D_w(x_i, y) and LAI max_y [Δ(y_i, y) + D_w(x_i, y)]:
- Exhaustive for binary/multi-class classification
- Exact and efficient solutions for ranking

WELDON

Weakly supErvised Learning of Deep cOnvolutional Nets

MANTRA extension for training deep CNNs

Learning Φ(x, y, h): end-to-end learning of deep CNNs with structured prediction and latent variables
- Incorporating multiple positive & negative evidence
- Training deep CNNs with a structured loss

Standard deep CNN architecture: VGG16

Simonyan et al. Very deep convolutional networks for large-scale image recognition. ICLR 2015


MANTRA adaptation for deep CNN

Problem: fixed-size image as input

1. Fully connected layers → convolution layers
   - sliding-window approach

2. Spatial aggregation
   - perform object localization prediction
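Step 1 above rests on a standard identity: a fully connected layer trained on fixed-size inputs can be re-read as a convolution kernel, so the same weights score every window of a larger feature map in one pass. A sketch with an explicit loop for clarity (`fc_as_conv` is a hypothetical helper, not the Torch7 implementation):

```python
import numpy as np

def fc_as_conv(fc_w, feat, kh, kw):
    """Apply FC weights to every (kh x kw) window of a larger feature map.

    fc_w: (n_out, kh*kw*c) fully connected weights.
    feat: (H, W, c) feature map.
    Returns a (H-kh+1, W-kw+1, n_out) map of window scores.
    """
    H, W, c = feat.shape
    out = np.empty((H - kh + 1, W - kw + 1, fc_w.shape[0]))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = feat[i:i + kh, j:j + kw, :].ravel()
            out[i, j] = fc_w @ window  # the FC layer applied to this window
    return out
```

In a real network this loop is replaced by a convolution with the reshaped weights; the result is identical, which is what makes the sliding-window evaluation cheap.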

WELDON: deep architecture

C : number of classes


Aggregation function [Oquab, 2015]

Region aggregation = max: select the highest-scoring window.

Oquab, Bottou, Laptev, Sivic. Is object localization for free? Weakly-supervised learning with convolutional neural networks. CVPR 2015

WELDON: region aggregation

Aggregation strategy: max+min pooling (the MANTRA prediction function)

k instances
- From a single region to multiple high-scoring regions:

  max → (1/k) Σ_{i=1}^k (i-th max),    min → (1/k) Σ_{i=1}^k (i-th min)

- More robust region selection [Vasconcelos, CVPR15]
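The k-instance aggregation can be sketched in a few lines of numpy (hypothetical helper, not the Torch7 implementation):

```python
import numpy as np

def weldon_pool(region_scores, k=3):
    """Average of the k highest plus the k lowest region scores, per class.

    region_scores: (n_regions, n_classes) matrix of window scores.
    Returns a (n_classes,) vector of pooled scores.
    """
    s = np.sort(region_scores, axis=0)   # ascending, independently per class
    top_k = s[-k:].mean(axis=0)          # k highest-scoring regions
    bottom_k = s[:k].mean(axis=0)        # k lowest-scoring regions
    return top_k + bottom_k
```

With k = 1 this reduces to the MANTRA max+min rule; larger k averages over several windows, which is the robustness argument made above.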

WELDON: architecture


WELDON: learning

Objective function for the multi-class task and k = 1:

min_w R(w) + (1/N) Σ_{i=1}^N ℓ(f_w(x_i), y_i^gt)

f_w(x_i) = argmax_y [ max_h L_conv^w(x_i, y, h) + min_{h'} L_conv^w(x_i, y, h') ]

Stochastic gradient descent training.

Back-propagation of the error through the selected windows.
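Why back-propagation only touches the selected windows: with max+min pooling and k = 1, the sub-gradient of the pooled class score with respect to the region scores is non-zero only at the argmax and argmin. A sketch for the 1-D scores of a single class (hypothetical helper, not the authors' code):

```python
import numpy as np

def pooled_score_grad(region_scores):
    """d(pooled)/d(region_scores) for max+min pooling with k = 1.

    Only the selected max and min windows receive gradient; SGD then
    raises or lowers exactly those regions' scores.
    """
    g = np.zeros_like(region_scores)
    g[np.argmax(region_scores)] += 1.0  # best-scoring window
    g[np.argmin(region_scores)] += 1.0  # worst-scoring window
    return g
```

This is the mechanism behind the next two slides: the loss gradient pushes these two windows' scores up when the class is present and down when it is absent.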

WELDON: learning

Class is present: increase the scores of the selected windows.

WELDON: learning

Class is absent: decrease the scores of the selected windows.

Experiments

Torch7 implementation

Datasets
- Object recognition: Pascal VOC 2007, Pascal VOC 2012
- Scene recognition: MIT67, 15 Scene
- Visual recognition where context plays an important role: COCO, Pascal VOC 2012 Action

Experiments

VOC07 5.000 5.000 20 multi-label

VOC12 5.700 5.800 20 multi-label

15 Scene 1.500 2.985 15 multi-class

MIT67 5.360 1.340 67 multi-class

VOC12 Action 2.000 2.000 10 multi-label

COCO 80.000 40.000 80 multi-label

22/35

Experiments


Object recognition

                          VOC07   VOC12
VGG16 (online code) [1]   84.5    82.8
SPP net [2]               82.4    -
Deep WSL MIL [3]          -       81.8
WELDON                    90.2    88.5

Table: mAP results on object recognition datasets.

[1] Simonyan et al. Very deep convolutional networks for large-scale image recognition. ICLR 2015
[2] He et al. Spatial pyramid pooling in deep convolutional networks. ECCV 2014
[3] Oquab et al. Is object localization for free? CVPR 2015

Scene recognition

                          15 Scene   MIT67
VGG16 (online code) [1]   91.2       69.9
MOP CNN [2]               -          68.9
Negative parts [3]        -          77.1
WELDON                    94.3       78.0

Table: multi-class accuracy on scene categorization datasets.

[1] Simonyan et al. Very deep convolutional networks for large-scale image recognition. ICLR 2015
[2] Gong et al. Multi-scale Orderless Pooling of Deep Convolutional Activation Features. ECCV 2014
[3] Parizi et al. Automatic discovery and optimization of parts. ICLR 2015

Context datasets

VGG16 (online code) [1] 67.1 59.7

Deep WSL MIL [2] 62.8

Our WSL deep CNN 75.0 68.8

Table: mAP results on context datasets.

[2] Oquab et al. Is object localization for free? CVPR 2015

26/35

Visual results

Visual results (failing examples)

Kindergarten / Classroom

Analysis

Impact of the different improvements

a) max   b) +k=3   c) +min   d) +AP   VOC07   VOC12 Action
   X                                   83.6    53.5
   X        X                          86.3    62.6
   X                  X                87.5    68.4
   X        X         X                88.4    71.7
   X        X                  X       87.8    69.8
   X        X         X        X       88.9    72.6

IoU                                    25.6    30.4

Analysis

Impact of the number of regions k

k=1 vs k=3

Connections to other Latent Variable Models

Hidden CRF (HCRF) [Quattoni, PAMI07]:

(1/2)‖w‖² + (C/N) Σ_{i=1}^N [ log Σ_{(y,h) ∈ Y×H} exp⟨w, Φ(x_i, y, h)⟩ − log Σ_{h ∈ H} exp⟨w, Φ(x_i, y_i, h)⟩ ]

Latent Structural SVM [Yu, ICML09]:

(1/2)‖w‖² + (C/N) Σ_{i=1}^N [ max_{(y,h) ∈ Y×H} {Δ(y_i, y) + ⟨w, Φ(x_i, y, h)⟩} − max_{h ∈ H} ⟨w, Φ(x_i, y_i, h)⟩ ]

Soft-max variant:

(1/2)‖w‖² + (C/N) Σ_{i=1}^N [ max_y {Δ(y_i, y) + log Σ_{h ∈ H} exp⟨w, Φ(x_i, y, h)⟩} − log Σ_{h ∈ H} exp⟨w, Φ(x_i, y_i, h)⟩ ]

WELDON:

(1/2)‖w‖² + (C/N) Σ_{i=1}^N [ max_y {Δ(y_i, y) + Σ_{h ∈ H} ⟨w, Φ(x_i, y, h)⟩} − Σ_{h ∈ H} ⟨w, Φ(x_i, y_i, h)⟩ ]
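These objectives differ mainly in how the latent scores are aggregated over h: a hard max (LSSVM), a log-sum-exp soft max (HCRF and the soft-max variant), or the WELDON pooling (max + min for k = 1). A small numpy comparison on one illustrative vector of region scores (values are made up):

```python
import numpy as np

# Region scores s(h) for a fixed class: one strong positive window,
# one weak one, one strongly negative one.
s = np.array([2.0, 0.5, -1.5])

agg_lssvm = s.max()                   # hard max: single best region
agg_hcrf = np.log(np.exp(s).sum())    # log-sum-exp: soft version of the max
agg_weldon_k1 = s.max() + s.min()     # max + min: negative evidence counts
```

The log-sum-exp is always at least the max, while the max+min score is pulled down by the negative window, which is exactly the behavior the MANTRA/WELDON slides motivate.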

Thibaut Durand, Nicolas Thome, Matthieu Cord
Sorbonne Universités - UPMC Paris 6 - LIP6
http://webia.lip6.fr/~durandt/project/mantra.html

Thibaut Durand, Nicolas Thome, and Matthieu Cord.
MANTRA: Minimum Maximum Latent Structural SVM for Image Classification and Ranking.
In IEEE International Conference on Computer Vision (ICCV), 2015.

Thibaut Durand, Nicolas Thome, and Matthieu Cord.
WELDON: Weakly Supervised Learning of Deep Convolutional Neural Networks.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
