Multimedia Technology

Lecture 1: Overview and Arrangement

Lecturer: Dr. Wan-Lei Zhao


Autumn Semester 2015

Contact: wlzhao@xmu.edu.cn
All rights are reserved by Wan-Lei Zhao

Outline

About this Course

Syllabus

Course plan

Brief History about IR and Web

Brief History about WWW

All rights are reserved by Wan-Lei zhao


2 / 42

About this Course

Major subjects
Deal with information such as text, image and video
Text retrieval, content-based image retrieval and video retrieval
Focus on how to retrieve the above-mentioned information
Popular machine learning approaches will be covered: K-means, SVM and decision tree
Popular model fitting approaches will be covered: RANSAC and Hough transform
Popular algorithms in computer vision will be covered: SIFT, BoVW and Hamming Embedding

Objectives
Bring you into this interesting topic
Get you familiar with the basic and popular algorithms in this field
Enable you to build a simple but workable search engine on your own
Enable you to apply the algorithms to solve problems in your own field


Syllabus

Text Retrieval (4×2 hours)
Brief History about IR and Web
Pre-processing on Text Information
Three Retrieval Models: Boolean, vector and probabilistic models
Evaluation Measure
Web Search
Parallel Computing in IR

Machine Learning Approaches (2×2 hours)
K-means
Spectral clustering
Decision Tree
K-Nearest Neighbour
Support Vector Machine (SVM)

Nearest Neighbour Search (1×2 hours)
R-Tree
KD-Tree
Locality Sensitive Hashing
Product Quantizer


Model Fitting
RANSAC
Hough Transform

Image & Video Retrieval (2×2 hours)
Challenges & Trends
Image Features: SIFT et al.
BoVW Framework
Fisher Kernel Framework
Challenges in Video Retrieval
Temporal Verification Approach

Image Classification and Misc. (1×2 hours)
Challenges & Trends
One-against-all Framework
Tricks in model training
Convolutional Neural Network


Course work in the lab (3×2 hours)
Three experiments
On subjects that you learn in class
Kept secret until lab time
Each experiment also serves as a quiz
10 marks for each experiment
NO team work!!!
Late submission is allowed, but with a 30% deduction

Presentation of the course projects (2×2 hours)
Two course projects
Implemented after class
Team work is encouraged, but size(team) ≤ 4
15 minutes for each team to present their project
A hard copy of the project report is also required


Prerequisites of this course


Data Structure
You have to be familiar with it
Otherwise, you are advised not to take this course
Good at C/C++
It will be used in the lab
It is recommended for your course project
Basic knowledge about the Internet
Internet protocols
Mechanism of the WWW
HTML and JavaScript
Matlab is a plus
It will be used in the lab
If you do not know it, that does not matter
You will learn its basics during this course


Teaching assistants for this course

Mr. Zhihui Chen is in charge of issues related to the course projects
Miss Haihui Liu helps proofread the course materials
Experiment lectures are held in the Laboratory Building, Room 501
Time slot: 2:30pm - 4:20pm, in the 6th, 8th and 10th weeks
I will remind you one week ahead


Course website
Platform of online teaching in XMU
URL: l.xmu.edu.cn; please go there and register for the course
Password: 007


Language in the Class


English or Chinese?
You might be uncomfortable at the beginning
Me too :)
Several advantages:
Computer science is defined in English
It gets you used to English


Intersection of four disciplines


Related (top-ranked) Conferences:
ACM SIGIR, WWW
ACM MM, ACM ICMR & IEEE ICME
IEEE CVPR & ECCV
IEEE ICCV, IEEE ACCV & BMVC
ICML & AAAI

Related (top-ranked) Journals:


IEEE Trans. on Knowledge and Data Engineering
IEEE Trans. on Pattern Analysis and Machine Intelligence
International Journal of Computer Vision
IEEE Trans. on Multimedia
IEEE Trans. on Image Processing
Computer Vision and Image Understanding

Reference Books
R. Baeza-Yates et al., Modern Information Retrieval: The Concepts and Technology behind Search (2nd edition)
Richard Szeliski, Computer Vision: Algorithms and Applications
Lecture notes on Machine Learning by Dr. Andrew Ng, Stanford University
Related papers will be suggested as reading assignments

Online Resources:
Youku
Wikipedia
Baidu Baike


Question: can our brain understand how our brain works?


We are going to get a taste of how tough this question is from two aspects:
1. Computer Vision
2. Machine Learning

Course plan

Evaluation: 3 lab experiments + 2 course projects

S = 30% + 35% + 35%


About course projects

Implemented in C/C++, Python or Matlab
If you do not know Python or Matlab, learn it!!
Sample code will be given; you only need to fill in the blanks
Team work is encouraged for the two course projects
The team leader will be marked 5 credits higher or lower, depending on the performance
A report (only for the second project) and a presentation (for both) are required (in English if possible)
Failure is acceptable, but cheating or plagiarism is not
If it happens, you are OUT!!
Any questions?


Be an Active Learner
Level 1: Catch the concept
Level 2: Understand the idea; know how to use it
Level 3: Able to re-implement the algorithms; know where they work and where they fail


Brief History about IR and Web

Human Languages (1)


7,000 languages in the world
90% of these languages are used by fewer than 100,000 people
Based on your knowledge and imagination:
Please list the top-5 most widely used languages
Give the rank as well; do it now ...

Language   Population    Category              Region
Mandarin   1.2 billion   isolating language    China
English    508 million   inflecting language   UK, North America
Hindi      497 million   inflecting language   India & Pakistan
Spanish    392 million   inflecting language   Spain & South America
Russian    277 million   inflecting language   Russia & East Europe

This course mainly covers retrieval on English documents
Processing of Chinese documents is mentioned only briefly


Human Languages (2)

Figure: Weights of real impact on the world.

In terms of real influence, the rank changes [1]
Influence: economic, political, size of population and number of countries

[1] Conducted by Webb.

Distribution of World Languages

Note that not all languages have a written form


Evolution of Storage Media

Egyptian papyrus [2]
Babylonian clay tablet (3000 B.C.)
Chinese oracle bones (1400 B.C.)
In 105 A.D., paper was invented in China

[2] It is not paper in the real sense.

Story of Rosetta Stone

Written in both ancient Egyptian and Greek; discovered in 1799
Inscribed in 196 BC on behalf of King Ptolemy V
Key to the understanding of ancient Egyptian
J.-F. Champollion decoded the language


"Library" comes from the Latin word liber, meaning book

"Bibliothek" comes from the Greek word biblion, meaning a book written on papyrus


Spread of ancient civilizations


Five ancient civilizations: ancient Egypt, ancient Babylon, ancient India, ancient China, ancient Maya


The first library (as far as we know) was established in northern Syria, around 3000 BC
Later, the Assyrian Empire built the Library of Nineveh (near present-day Mosul) in 612 BC
The best-known library was built by Alexander the Great about 350 BC in Egypt

In China, libraries appeared around 800 BC


Evolution of Storage Media


After the advent of the computer


IR in two different eras

             Before WWW                         WWW era
Media        text documents, TV, film & CD      in electronic forms
Publishing   months or years                    hours
Storage      books & papers                     disc, DVD, etc. & web
Indexing     title, author, keywords and date   the same, and contents
Interface    library                            browser

According to IBM, 90% of the knowledge in the world was created in the last two years

A powerful IR system is required to coordinate the distribution of information/knowledge


Brief History about WWW

The Birth of WWW


1981-1991: the invention of the Web
In 1980, Tim Berners-Lee worked at CERN (European Organization for Nuclear Research)
He managed information for physicists so that they could share it
In 1984, he returned to CERN
In 1989, he wrote a proposal for a large hypertext database
By Christmas 1990, he had built all the necessary elements of the Web: HTTP, HTML, a web browser and httpd


The growth of World Wide Web


Early times of growth (1991-1995)
Cello, the first browser for Microsoft Windows, appeared
Mosaic (from UIUC) was the first successful browser
W3C was founded by Berners-Lee in 1994 at MIT
Commercialization (1996-1998)
More and more dot-coms appeared
Boom and Bust (1999-2001)
More and more dot-coms appeared
The Internet became popular in China
Many currently well-known companies were established: Baidu, Alibaba
Search engines were born



Early times of growth (1991-2001)
First version of Java was released in 1995
First version of PHP was released in 1995
JavaScript was invented at Netscape in 1995
From static web to dynamic web
Strong support for multimedia

WWW is everywhere
Ubiquitous web (2002-present)
The introduction of Web 2.0 is the milestone
Wikipedia was born in 2001
Flickr was born in 2004
Facebook was born in 2004
YouTube was born in 2005
Twitter was born in 2006
The smartphone (iPhone) was released in 2007
All technologies and media are intertwined to reshape the world
Impact on many aspects of our daily life
IR becomes the main interface to them all


Semantic Web
Web 3.0 (20??)
Proposed by Berners-Lee [3]
Websites are linked by semantic metadata
Machines build the links automatically
Requires natural language understanding technology
Still a vague concept
Automatic documenting, e.g. books and recipes

[3] Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web, in American Scientific, 2000

Statistics on WWW
Figure: Number of websites and number of users by year, 2000-2013 (vertical axis marked at 100M, 1B and 2B)

The growth rate of users is much higher than that of websites
The growth rate of clicks would be even higher

Challenges in Modern Information Retrieval

How to bridge such a semantic gap?
A word is worth a thousand pictures
A picture is worth a thousand words


Scalability in the age of BIG data (1)


A glance at big data today
1.1 billion websites as of Nov. 2014
>3,000 images uploaded to Flickr every minute [4]
>200,000 videos uploaded per day to YouTube (>1,000 years)
TV news: thousands of hours of programs broadcast each day
>100 billion photos on Facebook as of Jun. 2011
Challenges: facilitate fast browsing and sharing
How to store?
How to organize?
How to retrieve?

[4] Statistics were collected on Apr. 28th, 2010.

Scalability in the age of BIG data (2)









Given the thickness of one photo: 0.2 mm
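
The figure for this slide is not recoverable from the text. As a rough worked example, and assuming the slide stacks the >100 billion Facebook photos quoted on the previous slide (that pairing is an assumption, not stated here), the height of such a pile can be estimated as follows. The names `PHOTO_THICKNESS_MM`, `NUM_PHOTOS` and `stack_height_km` are illustrative only.

```python
# Back-of-the-envelope estimate: how tall is a pile of printed photos?
PHOTO_THICKNESS_MM = 0.2          # thickness of one photo, from the slide
NUM_PHOTOS = 100_000_000_000      # assumption: the ">100 billion" Facebook photos

MM_PER_KM = 1_000_000             # 1 km = 1,000,000 mm


def stack_height_km(num_photos, thickness_mm=PHOTO_THICKNESS_MM):
    """Return the height in kilometres of num_photos stacked on top of each other."""
    return num_photos * thickness_mm / MM_PER_KM


height = stack_height_km(NUM_PHOTOS)
print(f"Stack height: about {height:,.0f} km")
```

Under these assumptions the pile is about 20,000 km high, which gives a feel for why storage, organization and retrieval all become scalability problems.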


Top Rank Search Engines

Google takes the lion's share of the market
Baidu is not in the ranking (unfortunately) [5]

[5] Cited from: http://www.ebizmba.com/articles/search-engines

Sketch the framework of a search engine

Draw the framework of a search engine in 5 minutes
Include all the elements you can figure out; do it now ...


Framework of a search engine

Observations
Information is highly distributed across the Internet
The indexer (search engine) keeps information in a centralized manner
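
The slide does not show how the indexer stores its centralized copy. One common choice is an inverted index, which maps each term to the set of documents that contain it. The sketch below is a minimal illustration under that assumption, not the framework drawn in the lecture; the toy documents and the helper names `build_inverted_index` and `boolean_and` are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Toy collection standing in for pages gathered from many different sites.
docs = {
    "page1": "text retrieval and web search",
    "page2": "image retrieval with SIFT",
    "page3": "web crawler and indexer",
}

index = build_inverted_index(docs)
print(sorted(boolean_and(index, "image retrieval")))
```

The pages may live on many servers, but a query is answered from this single index, which is the sense in which the information is kept in a centralized manner.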


Structure of a crawler

Observations
The crawler plays a very important role
Experiences of using Baidu and Google
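
The crawler diagram on this slide is not recoverable from the text. As a generic illustration, and not necessarily the structure shown in the lecture, a crawler can be sketched as a breadth-first traversal with a frontier queue and a set of already seen URLs. To keep the example runnable without network access, a hypothetical in-memory link graph `TOY_WEB` stands in for real page fetching.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/", "a.example/about"],
    "c.example/": [],
}


def crawl(seed, fetch_links, max_pages=100):
    """Breadth-first crawl starting from seed; return URLs in visiting order."""
    frontier = deque([seed])   # URLs waiting to be fetched
    seen = {seed}              # URLs already queued, to avoid revisiting
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


order = crawl("a.example/", lambda url: TOY_WEB.get(url, []))
print(order)
```

A real crawler would replace the lambda with an HTTP fetch plus link extraction, and would add politeness delays and robots.txt handling, none of which is modelled here.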


Q&A


Thanks for your attention!
