Overview of MPEG-7
Dr Zhang Sen
Speech Group, INRIA-LORIA, Villers-lès-Nancy, France
Chinese Academy of Sciences, Beijing, China
3/29/2013
Outline of contents
Introduction
Basic Components
Content Description
Audiovisual (AV) Descriptions
Multimedia Description Schemes
XM and Applications
More Information
[Figure: Ozone context diagram. Labels: user context, Ozone context, situation sensitivity, smart agent, Ozone services, user-interaction module, security]
[Figure: timeline of the MPEG standards from 1990 to 2001: MPEG-1, MPEG-2, MPEG-4 (v1, v2), MPEG-7, MPEG-21]
MPEG-3 was once defined but then abandoned; MPEG-5 and MPEG-6 were never defined.
Speech and Language Processing Techniques
MPEG Family
MPEG-1: Coding of moving pictures and audio for digital storage media (CD-ROM, MP3), 11/92
MPEG-2: Generic coding of moving pictures and audio information (DVD, digital TV), 11/94
MPEG-4: Coding of audiovisual objects for multimedia applications, Version 1 09/98, Version 2 11/99
MPEG-7: Multimedia content description for AV material, 08/01
MPEG-21: Digital AV framework, integration of multimedia technologies, 11/01
Objective of MPEG-7
Standardize content-based description for various types of audiovisual information
Enable fast and efficient content searching, filtering and identification
Describe several aspects of the content (low-level features, structure, semantics, models, collections, creation, etc.)
Address a large range of applications
Scope of MPEG-7
[Figure: description generation (research and future competition), then the description itself (scope of MPEG-7), then description consumption (research and future competition)]
Description generation (feature extraction, indexing process, annotation and authoring tools, ...) and description consumption (search engine, filtering tool, retrieval process, browsing device, ...) are non-normative parts of MPEG-7. The goal is to define the minimum that enables interoperability.
Scope of MPEG-7
Feature Extraction: content analysis (D, DS), feature extraction (D, DS), annotation tools (DS), authoring (DS)
MPEG-7 Description (the part covered by standardization): Description Schemes (DSs), Descriptors (Ds), Language (DDL). Ref: MPEG-7 Concepts
Search Engine: searching and filtering, classification, manipulation, summarization, indexing
Audio in MPEG-7
Audio content description (yes)
Sound retrieval and classifier (yes)
Speech synthesis (no)
Speech recognition (no)
Probability models (yes)
DDL parser
A DDL parser is software that checks whether a description is valid.
[Figure: a description and a schema are input to the parser, which answers yes or no]
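The yes/no check can be illustrated with a toy validator. This is only a sketch under stated assumptions: the real DDL is based on XML Schema, while the `TOY_SCHEMA` dictionary and the `is_valid` function below are hypothetical stand-ins that merely check which child elements each element may contain.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy schema: element name -> set of allowed child element names.
TOY_SCHEMA = {
    "letter": {"header", "text"},
    "header": {"name", "address"},
    "address": {"street", "city"},
    "name": set(),
    "street": set(),
    "city": set(),
    "text": set(),
}


def is_valid(description: str, schema: dict) -> bool:
    """Return True if the description is well-formed XML and obeys the toy schema."""
    try:
        root = ET.fromstring(description)
    except ET.ParseError:
        return False  # not even well-formed XML

    def check(element) -> bool:
        allowed = schema.get(element.tag)
        if allowed is None:
            return False  # element is not declared in the schema
        return all(child.tag in allowed and check(child) for child in element)

    return check(root)


good = "<letter><header><name>Mr Sen</name></header><text>Dear Mr White,</text></letter>"
bad = "<letter><city>Nancy</city></letter>"  # city is not allowed directly under letter
print(is_valid(good, TOY_SCHEMA))  # True
print(is_valid(bad, TOY_SCHEMA))   # False
```

A production parser would validate against the actual schema documents rather than a hand-written table like this one.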
Type of descriptions
Low level description (features, etc)
Generic and flexible Intelligent / efficient search engine
Low-level Description
Information in the creation and production processes: director, title, short feature movie
Information related to the usage of the content: copyright pointers, usage history, broadcast schedule
Information on the storage features of the content: storage format, encoding
Information about low-level features in the content: colors, textures, sound timbres, melody
High-level Description
Structural description: video segments, frames, still and moving regions, audio segments
Segment DS (representing the spatial, temporal or spatio-temporal structure)
Conceptual (semantic) description: objects, events, and notions
Links between the two descriptions
Illustration of descriptions
Basic description
Elements
Information containers holding data and other elements
<city> </city>
Attributes
Attribute-value pairs used to characterize elements
<city population="10000"> </city>
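As a small illustration of the difference between an element and an attribute, the sketch below reads both with Python's standard `xml.etree.ElementTree` module. The element content "Nancy" is an assumption added for the example; the slide shows an empty element.

```python
import xml.etree.ElementTree as ET

# The text content "Nancy" is assumed for illustration only.
city = ET.fromstring('<city population="10000">Nancy</city>')

tag = city.tag                               # element name
text = city.text                             # data contained in the element
population = int(city.attrib["population"])  # attribute value, stored as a string

print(tag, text, population)
```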
Structured descriptions
Structured descriptions are trees
Trees are suitable for retrieval and search
[Figure: a description tree with DS nodes and D leaves]
Description trees
<letter>
  <header>
    <name> Mr Sen </name>
    <address>
      <street> 16 rue Laplace </street>
      <city> Nancy </city>
    </address>
  </header>
  <text> Dear Mr White, </text>
</letter>
[Figure: the corresponding tree, with nodes such as letter, text and city]
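To show why a tree is convenient for retrieval, here is a minimal sketch that parses the letter description, lists its nodes with their depth, and looks up one node by path. The helper name `walk` is hypothetical.

```python
import xml.etree.ElementTree as ET

LETTER = """
<letter>
  <header>
    <name> Mr Sen </name>
    <address>
      <street> 16 rue Laplace </street>
      <city> Nancy </city>
    </address>
  </header>
  <text> Dear Mr White, </text>
</letter>
"""


def walk(element, depth=0):
    """Yield (depth, tag) pairs in document order."""
    yield depth, element.tag
    for child in element:
        yield from walk(child, depth + 1)


root = ET.fromstring(LETTER.strip())
outline = list(walk(root))

# Retrieval by path: follow the branches from the root down to the city node.
city = root.findtext("header/address/city").strip()

for depth, tag in outline:
    print("  " * depth + tag)
print(city)
```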
Audio description
Low-level Description
spectrum, parametric, and temporal features
High-level Description
Audio signature Description Scheme
Instrument timbre Description Schemes
Melody Description Tools
Sound recognition and indexing Description Tools
Spoken Content Description Tools
AudioPower Descriptor
describes the temporally-smoothed instantaneous power
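A rough sketch of the idea, not the normative extraction procedure: instantaneous power is taken as the squared sample value, and temporal smoothing is approximated by averaging over consecutive non-overlapping frames. The function name `audio_power` and the frame length are assumptions made for the example.

```python
import math


def audio_power(samples, frame_len):
    """Mean of the squared samples over consecutive non-overlapping frames."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers


# Test signal: a unit-amplitude sine with a period of 8 samples.
sine = [math.sin(2 * math.pi * k / 8) for k in range(64)]
power = audio_power(sine, frame_len=8)
print(power)  # each frame covers one full period, so each value is close to 0.5
```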
AudioSpectrumCentroid Descriptor
describes the center of gravity of the log-frequency power spectrum
AudioSpectrumSpread Descriptor
describes the second moment of the log-frequency power spectrum
AudioSpectrumFlatness Descriptor
describes the flatness properties of the spectrum
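The three measures can be sketched as follows. Several choices here are assumptions rather than the normative definitions: frequencies are mapped to a log axis as log2(f / 1000 Hz), the spread is reported as the square root of the second central moment, and flatness is computed as the ratio of the geometric mean to the arithmetic mean of the power values.

```python
import math

REF_HZ = 1000.0  # assumed reference frequency for the log-frequency axis


def centroid_and_spread(freqs_hz, powers):
    """Power-weighted mean and RMS deviation on a log2 frequency axis."""
    total = sum(powers)
    log_f = [math.log2(f / REF_HZ) for f in freqs_hz]
    centroid = sum(l * p for l, p in zip(log_f, powers)) / total
    second_moment = sum((l - centroid) ** 2 * p for l, p in zip(log_f, powers)) / total
    return centroid, math.sqrt(second_moment)


def flatness(powers):
    """Geometric mean over arithmetic mean; 1.0 for a flat spectrum (powers must be > 0)."""
    geometric = math.exp(sum(math.log(p) for p in powers) / len(powers))
    arithmetic = sum(powers) / len(powers)
    return geometric / arithmetic


freqs = [500.0, 1000.0, 2000.0]
flat_spectrum = [1.0, 1.0, 1.0]
centroid, spread = centroid_and_spread(freqs, flat_spectrum)
print(centroid, spread, flatness(flat_spectrum), flatness([1.0, 100.0]))
```

A flat spectrum gives a flatness of 1, while a spectrum dominated by one strong component gives a value well below 1.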
SoundModelStateHistogram Descriptor
SoundClassificationModel DS
Speech vs music, male vs female, trumpet vs violin
Genre classification, voice recognition
SpokenContentHeader
Contains information about the speakers being recognized and the recognizer itself
WordLexicon Descriptor
PhoneLexicon Descriptor
SpeakerInfo Descriptor
ConfusionInfo Descriptor
Gaussian DS
<Gaussian>
  <Mean> 4087.18 7173.73 1.36364 94.2727 1834.36 2359.55 2645.27 2577.09 </Mean>
  <Variance> 1.6982e+007 5.21621e+007 14.3636 9749.09 3.65743e+006 </Variance>
</Gaussian>
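A minimal sketch of how such a description might be consumed: the mean and variance vectors are read from the XML and used to score an observation. The toy values, the function names, and the reading of Variance as a per-dimension (diagonal) variance are assumptions made for the example; the values on the slide are not reused because the two vectors shown there have different lengths.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical two-dimensional example, not the values from the slide.
GAUSSIAN_XML = "<Gaussian><Mean> 0.0 2.0 </Mean><Variance> 1.0 4.0 </Variance></Gaussian>"


def parse_gaussian(xml_text):
    """Read the mean and variance vectors from a Gaussian description."""
    root = ET.fromstring(xml_text)
    mean = [float(v) for v in root.findtext("Mean").split()]
    variance = [float(v) for v in root.findtext("Variance").split()]
    if len(mean) != len(variance):
        raise ValueError("Mean and Variance must have the same length")
    return mean, variance


def log_density(x, mean, variance):
    """Log of a Gaussian density with independent dimensions (diagonal covariance assumed)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, variance)
    )


mean, variance = parse_gaussian(GAUSSIAN_XML)
print(log_density([0.0, 2.0], mean, variance))  # density at the mean
print(log_density([3.0, 2.0], mean, variance))  # lower, further from the mean
```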
State-transition model DS
<StateTransitionModel>
  <Transitions size1="20" size2="20"> 0 0 0.210526 0.0526316 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 </Transitions>
  <Initial size="20"> 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 </Initial>
  <State label="0 players" confidence="1"/>
  <State label="19 players" confidence="0.223607"/>
</StateTransitionModel>
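To show how the Initial vector and the Transitions matrix work together, the sketch below computes the probability of a state sequence as the initial probability of the first state times the transition probabilities along the path. The two-state model and the function name `path_probability` are hypothetical; the slide's model has 20 states.

```python
def path_probability(initial, transitions, path):
    """Probability of visiting the given states in order under a first-order model."""
    if not path:
        raise ValueError("path must contain at least one state")
    probability = initial[path[0]]
    for current, following in zip(path, path[1:]):
        probability *= transitions[current][following]
    return probability


# Hypothetical two-state model; each row of the transition matrix sums to 1.
initial = [1.0, 0.0]
transitions = [
    [0.9, 0.1],
    [0.5, 0.5],
]

p = path_probability(initial, transitions, [0, 0, 1])
print(p)  # 1.0 * 0.9 * 0.1
```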
ProbabilityModelClassifier DS
<ProbabilityModelClassifier confidence="0.9" length="2">
  <ProbabilityModelClass SemanticLabel="fish" Confidence="0.5" DescriptorName="ColorHistogram">
    <Gaussian>
      <Mean> 4087.18 7173.73 1.36364 94.2727 1834.36 2359.55 . </Mean>
      <Variance> 1.6982e+007 5.21621e+007 14.3636 9749.09 . </Variance>
    </Gaussian>
  </ProbabilityModelClass>
SpokenContentLattice DS
A lattice structure for a hypothetical (combined phone and word) decoding of the expression "Taj Mahal drawing".
Extraction of sound indexes using a sound-recognition classifier. The model reference and state path are stored.
[Figure labels: audio query, AudioSpectrumBasis, spectrum projection, HMM 1 to HMM N, select, SoundModelStatePath, SoundRecognitionModel, segmented audio description]
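The indexing step can be sketched as follows: every model decodes the input, the best-scoring model is selected, and its reference and state path are stored. This is a simplified illustration under stated assumptions. It uses discrete observation symbols and Viterbi decoding, whereas the figure refers to continuous hidden Markov models fed with spectrum projection features; the model names, probabilities and function names are all hypothetical.

```python
import math


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(hmm, observations):
    """Return (log likelihood, state path) of the most likely state sequence."""
    n = len(hmm["initial"])
    score = [_log(hmm["initial"][s]) + _log(hmm["emission"][s][observations[0]])
             for s in range(n)]
    back = []
    for obs in observations[1:]:
        prev, score, pointers = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + _log(hmm["transition"][r][s]))
            pointers.append(best)
            score.append(prev[best] + _log(hmm["transition"][best][s])
                         + _log(hmm["emission"][s][obs]))
        back.append(pointers)
    last = max(range(n), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


def index_sound(models, observations):
    """Select the best-scoring model and keep its reference and state path."""
    name, (_, path) = max(
        ((name, viterbi(hmm, observations)) for name, hmm in models.items()),
        key=lambda item: item[1][0],
    )
    return {"model": name, "state_path": path}


# Hypothetical two-state models over two observation symbols (0 and 1).
models = {
    "speech": {"initial": [1.0, 0.0],
               "transition": [[0.5, 0.5], [0.0, 1.0]],
               "emission": [[0.9, 0.1], [0.2, 0.8]]},
    "music": {"initial": [0.5, 0.5],
              "transition": [[0.5, 0.5], [0.5, 0.5]],
              "emission": [[0.1, 0.9], [0.1, 0.9]]},
}

index = index_sound(models, [0, 0, 0, 1])
print(index)
```

Storing only the model reference and the state path keeps the index compact, since the path is a short integer sequence rather than the full feature stream.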
Query-by-example application with a query in media source form. Features must be extracted and projected into the classification space for each model in order to match against the database.
[Figure labels: audio query, AudioSpectrumBasis, spectrum projection, ContinuousMarkovModel, HMM 1 to HMM N, select, SoundModelStatePath, SoundRecognitionModel, matching, indexed audio, MPEG-7 sound database]
[Figure: DDL query, matching, result list]
Extraction of hidden Markov model and basis functions and storage in a DDL representation
[Figure labels: audio WAV files, feature extraction, basis extraction, AudioSpectrumBasis, SoundRecognitionFeatures, HMM, ContinuousMarkovModel, SoundRecognitionModel, HMM and basis]
Multimedia DSs
Multimedia Description Schemes are metadata structures for describing and annotating audio-visual (AV) content
Basic Elements
Content Management
Content Description
Content Organization
Navigation and Access
User Interaction
Content Management
Creation and production information
Creation information
title, textual annotation, creators, and dates
Classification information
genre, subject, purpose, language
Content usage
usage rights, usage record
Variations
Select the most suitable variation of an AV program
Adapt to the different capabilities of terminal devices, network conditions or user preferences
Hierarchical summary
Illustration of variations
Content Organization
Collections
group the contents into clusters
describe statistics and models of the attribute values
describe relationships among collection clusters
Models
model the attributes and features of AV content
Probability Model
specify statistical functions and structures
Analytic Model
Collection Structure
User Interaction
User Preference
context dependency in terms of time and place
relative importance of different preferences
privacy characteristics of the preferences
preferences updated by agent or user
Usage History
history of actions used to determine the user's preferences
eXperimentation Model (XM)
Simulation platform for Ds, DSs, CSs (Coding Schemes) and the DDL
XM applications:
the server (extraction) applications
the client (search, filtering and/or transcoding) applications
The XM applications
Extraction from Media
all low-level Ds or DSs should have an application class of this type
Illustration of applications
Users
Information Flow
[Figure: information flow. Labels: feature extraction (manual/automatic), AV description, storage, search/query, browse, filter, pull, push, encoding, transmission, decoding, users]
Push applications
Example: broadcast of video, interactive TV
Advantage: intelligent agents filter standardized descriptions
MPEG-7 Database
Example: queries
Text (keywords):
Find AV material with subject corresponding to some keywords
Find AV material corresponding to a specified semantic
Find an image with similar characteristics (global or local)
MPEG-7 beyond
To mould computers around human requirements and not humans around computer requirements
To enable content disclosure based on facts, rather than on human annotations
To find information by rich spoken queries and hand-drawn images, and address what most people expect computers to be able to do
Conclusion
[Figure labels: AV contents, features (Ds), structures (DSs), DDL (Ds, DSs), user]
Thanks
Still regions
Color, shape, position, texture
Moving regions
Color, motion trajectory, parametric motion, spatio-temporal shape
Audio segments
Spoken content, spectral feature, timbre
Projection of a face vector onto a set of basis vectors
Feature set is extracted from a normalized face image
Normalized face image
56 lines with 46 intensity values in each line
The centers of the two eyes are located on the 24th row
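A minimal sketch of the projection step: the 56 x 46 image is flattened into a face vector, and each feature is the inner product of that vector with one basis vector. The two basis vectors below are hypothetical stand-ins chosen so the result is easy to check; they are not the basis defined for the actual descriptor, and any preprocessing such as mean subtraction is left out.

```python
ROWS, COLS = 56, 46  # size of the normalized face image given above


def to_face_vector(image):
    """Flatten a ROWS x COLS intensity image row by row."""
    if len(image) != ROWS or any(len(row) != COLS for row in image):
        raise ValueError("expected a 56 x 46 image")
    return [pixel for row in image for pixel in row]


def project(face_vector, basis):
    """Inner product of the face vector with each basis vector."""
    return [sum(b * x for b, x in zip(vector, face_vector)) for vector in basis]


# Toy image with constant intensity 2.0.
image = [[2.0] * COLS for _ in range(ROWS)]
face = to_face_vector(image)

n = ROWS * COLS
mean_basis = [1.0 / n] * n  # hypothetical: averages all pixels
eye_row_basis = [1.0 / COLS if i // COLS == 23 else 0.0 for i in range(n)]  # hypothetical: averages the 24th row

features = project(face, [mean_basis, eye_row_basis])
print(len(face), features)
```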
Segment Decomposition
Search retrieval
MPEG-7 Database
Segment DS
Segment DS describes the result of a spatial, temporal, or spatio-temporal partitioning of the AV content. It has nine major subclasses:
Multimedia Segment DS
AudioVisual Region DS
AudioVisual Segment DS
Audio Segment DS
Still Region DS
Still Region 3D DS
Moving Region DS
Video Segment DS
Ink Segment DS
[Figure labels: AV content, Concept DS, Semantic state DS, Semantic place DS, Semantic time DS]
Visual description
Basic structures
Grid layout, Time series, Multiple view, Spatial 2D coordinates, Temporal interpolation
Descriptors
Color, Texture, Shape, Motion, Localization
Audio Framework
Descriptor
Definition
A Descriptor (D) is a representation of a Feature. A Descriptor defines the syntax and the semantics of the Feature representation.
Notes
A descriptor allows an evaluation of the corresponding feature via the descriptor value. It is possible to have several descriptors representing a single feature.
Examples
Possible descriptors for the color feature include the color histogram and the average of the frequency components. Descriptors for other features include the motion field, the text of the title, etc.
Descriptor Value
Definition
A Descriptor Value is an instantiation of a Descriptor for a given data set (or subset thereof).
Notes
Descriptor Values are combined via the mechanism of a Description Scheme to form a Description.
Description Scheme
Definition
A Description Scheme (DS) specifies the structure and semantics of the relationships between its components, which may be both Descriptors and Description Schemes.
Examples
A movie, structured as scenes and shots, including some textual descriptors at the scene level, and color, motion and some audio descriptors at the shot level.
Note
A D contains only basic data types and does not refer to other Ds or DSs.
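The relationship between Ds and DSs can be illustrated with a small data model built around the movie example. This is only a sketch: the class names `DescriptorValue` and `Description`, the scheme names, and all values are hypothetical and do not come from the standard.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class DescriptorValue:
    """Instantiation of a Descriptor: holds basic data only, no nested Ds or DSs."""
    descriptor: str
    value: object


@dataclass
class Description:
    """Instance of a Description Scheme: may hold descriptor values and nested descriptions."""
    scheme: str
    components: List[Union["Description", DescriptorValue]] = field(default_factory=list)

    def descriptor_values(self) -> Iterator[DescriptorValue]:
        """Yield every descriptor value in the tree, depth first."""
        for component in self.components:
            if isinstance(component, DescriptorValue):
                yield component
            else:
                yield from component.descriptor_values()


# Movie structured as scenes and shots; all values are made up for illustration.
shot = Description("Shot", [
    DescriptorValue("Color", [0.2, 0.5, 0.3]),
    DescriptorValue("Motion", "pan left"),
])
scene = Description("Scene", [
    DescriptorValue("TextAnnotation", "opening scene"),
    shot,
])
movie = Description("Movie", [scene])

names = [v.descriptor for v in movie.descriptor_values()]
print(names)
```

The asymmetry is the point of the note above: a `Description` can nest further descriptions, while a `DescriptorValue` is always a leaf.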
Basic elements of DS
Constructs for linking media files
Localizing pieces of content
Describing time, places, persons, individuals, groups, organizations, textual annotation, etc.
Who? What object? What action? Where? When? Why? and How?