
Chinese Academy of Sciences, Beijing, China

Report Document

Overview of MPEG-7
Dr Zhang Sen

Speech Group, INRIA-LORIA, Villers-lès-Nancy, France
Chinese Academy of Sciences, Beijing, China
3/29/2013

Speech and Language Processing Techniques


Outline of contents
Introduction
Basic Components
Content Description
Audiovisual (AV) Descriptions
Multimedia Description Schemes
XM and Applications
More Information

Ozone WP2 architecture
[Figure: Ozone WP2 architecture. Recoverable labels: Ozone application; user context, Ozone context, situation sensitivity; speech recognition, gesture recognition, video browser, animated agent, authentication; multi-modal widgets, dialog management; smart agent; perception QoS; user interface management; Ozone services; user-interaction module; security; software environment layer.]


From MPEG-1 to MPEG-7

[Figure: timeline with marks at 1990, 1992, 1994, 1998 (v1), 1999 (v2) and 2001, showing MPEG-1, MPEG-2, MPEG-4, MPEG-7 and MPEG-21.]

MPEG-3: once defined, but abandoned
MPEG-5 and MPEG-6: not defined

MPEG Family
MPEG-1: Coding of moving pictures and audio for digital storage media (CD-ROM, MP3), 11/92
MPEG-2: Generic coding of moving pictures and audio information (DVD, digital TV), 11/94
MPEG-4: Coding of audiovisual objects for multimedia applications, Version 1 09/98, Version 2 11/99
MPEG-7: Multimedia content description for AV material, 08/01
MPEG-21: Digital AV framework, integration of multimedia technologies, 11/01


Why is MPEG-7 needed


Digital audiovisual information is increasing
more and more content is available
from all kinds of information sources

Using digital audiovisual information requires
description of the contents
fast search of the contents


Objective of MPEG-7
Standardize content-based description for various types of audiovisual information
Enable fast and efficient content searching, filtering and identification
Describe several aspects of the content (low-level features, structure, semantics, models, collections, creation, etc.)
Address a large range of applications

Types of audiovisual information:
Audio, speech
Moving video, still pictures, graphics, 3D models
Information on how objects are combined in scenes


Scope of MPEG-7
[Figure: description generation (research and future competition) feeds the description (scope of MPEG-7), which feeds description consumption (research and future competition).]

Description generation (feature extraction, indexing, annotation and authoring tools, ...) and description consumption (search engines, filtering tools, retrieval processes, browsing devices, ...) are non-normative parts of MPEG-7. The goal is to define the minimum that enables interoperability.


Scope of MPEG-7 standardization

Feature Extraction: content analysis (D, DS), feature extraction (D, DS), annotation tools (DS), authoring (DS)

MPEG-7 Scope: Description Schemes (DSs), Descriptors (Ds), Language (DDL). Ref: MPEG-7 Concepts

Search Engine: searching & filtering, classification, manipulation, summarization, indexing


Audio in MPEG-7
Audio content description (yes)
Sound retrieval and classifier (yes)
Speech synthesis (no)
Speech recognition (no)
Probability models (yes)


Parts of the MPEG-7 Standard


ISO/IEC 15938-1: Systems
ISO/IEC 15938-2: Description Definition Language
ISO/IEC 15938-3: Visual
ISO/IEC 15938-4: Audio
ISO/IEC 15938-5: Multimedia Description Schemes
ISO/IEC 15938-6: Reference Software


Main elements of MPEG-7


Descriptors (D): representations of features that define the syntax and the semantics of each feature representation (low-level).

Description Schemes (DS): specify the structure and semantics of the relationships between their components, which may be both Ds and DSs (high-level).

Description Definition Language (DDL): based on XML Schema; allows the creation of new DSs and Ds, and the extension and modification of existing DSs.

System tools: support multiplexing of descriptions, synchronization, transmission mechanisms, coded representations, and the management and protection of intellectual property.


Relations of main elements


[Figure: the DDL defines the Description Schemes (DS); each DS groups Descriptors (D) and, possibly, other DSs.]


Description Definition Language


The Description Definition Language (DDL) is a language that defines which descriptions are valid and allows the creation of new Description Schemes and Descriptors. It also allows the extension and modification of existing Description Schemes.

The DDL is used to define a set of formal rules:
ordering of the elements
occurrences of elements
...

DDL = XML + MPEG-7 extensions


XML: Base for DDL


Why choose XML as the base for the DDL?
The popularity of XML
The interoperability with other standards in the future

Why should XML be extended for MPEG-7?
SGML > XML
Structural extensions
Datatype extensions


DDL parser
A DDL parser is software that checks whether a description is valid.
[Figure: a description and a schema are fed to the parser, which answers Yes or No.]
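The parser's yes/no decision can be illustrated with a minimal sketch. This is not the MPEG-7 DDL or XML Schema: the rule table `TOY_SCHEMA`, the function names and the `letter` vocabulary are hypothetical, chosen only to show how ordering and occurrence rules lead to a valid/invalid verdict.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy "schema" (an assumption, not real DDL): for each element
# name, the ordered list of allowed children as (name, min_occurs, max_occurs).
TOY_SCHEMA = {
    "letter": [("header", 1, 1), ("text", 1, 1)],
    "header": [("name", 1, 1), ("address", 0, 1)],
    "address": [("street", 1, 1), ("city", 1, 1)],
}


def is_valid(element, schema=TOY_SCHEMA):
    """Check ordering and occurrence rules for an element and its subtree."""
    rules = schema.get(element.tag, [])  # elements without rules must be leaves
    children = list(element)
    pos = 0
    for name, min_occurs, max_occurs in rules:
        count = 0
        while pos < len(children) and children[pos].tag == name and count < max_occurs:
            count += 1
            pos += 1
        if count < min_occurs:
            return False
    if pos != len(children):  # unexpected, repeated or misplaced child
        return False
    return all(is_valid(child, schema) for child in children)


def parse_and_check(xml_text, root_tag="letter"):
    """Mimic the parser box: description in, 'Yes' or 'No' out."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "No"  # not even well-formed XML
    return "Yes" if root.tag == root_tag and is_valid(root) else "No"


print(parse_and_check("<letter><header><name>Mr Sen</name></header><text>Hi</text></letter>"))
print(parse_and_check("<letter><text>Hi</text><header><name>Mr Sen</name></header></letter>"))
```

A real DDL parser would validate against an XML Schema document with the MPEG-7 extensions; the sketch only checks element order and occurrence counts.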


Type of descriptions
Low level description (features, etc.)
Generic and flexible
Intelligent / efficient search engine

High level description (structures, concepts, etc.)
Efficient and powerful
Lack of flexibility


Low-level Description
Information on the creation and production processes: director, title, short feature movie
Information related to the usage of the content: copyright pointers, usage history, broadcast schedule
Information on the storage features of the content: storage format, encoding
Information about low-level features in the content: colors, textures, sound timbres, melody


High-level Description
Structural description
video segments, frames, still and moving regions, audio segments
Segment DS (representing the spatial, temporal or spatio-temporal structure)

Conceptual (semantic) description
objects, events, and notions

Links between the two descriptions

Illustration of descriptions


Basic description
Elements
Information containers containing data and other elements
<city> </city>

Attributes
Attribute-value pairs used to characterize elements
<city population="10000"> </city>

Structured descriptions
Structured descriptions are trees
Trees are suitable for retrieval and search
[Figure: a tree whose root DS has child DSs and a D.]


Description trees
<letter>
  <header>
    <name> Mr Sen </name>
    <address>
      <street> 16 rue Laplace </street>
      <city> Nancy </city>
    </address>
  </header>
  <text> Dear Mr White, </text>
</letter>

[Figure: the corresponding tree. letter has children header and text; header has children name and address; address has children street and city.]
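The tree view of the letter description above can be reproduced with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch: the helper names `outline` and `find_text` are illustrative and not part of MPEG-7.

```python
import xml.etree.ElementTree as ET

LETTER = """
<letter>
  <header>
    <name>Mr Sen</name>
    <address>
      <street>16 rue Laplace</street>
      <city>Nancy</city>
    </address>
  </header>
  <text>Dear Mr White,</text>
</letter>
"""


def outline(element, depth=0):
    """Return the element names of the tree, one per line, indented by depth."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


def find_text(root, path):
    """Return the text of the first element matching a child path, or None."""
    node = root.find(path)
    return node.text.strip() if node is not None and node.text else None


root = ET.fromstring(LETTER.strip())
print("\n".join(outline(root)))
print(find_text(root, "header/address/city"))
```

Because the description is a tree, a search reduces to following a path from the root, which is what the `header/address/city` lookup does.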


Example: Audio description


<Mpeg7Main xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <DescriptionMetadata>
    <Version>1.0</Version>
  </DescriptionMetadata>
  <ContentDescription>
    <AudioContent xsi:type="AudioType">
      <Audio>
        <CreationInformation>
          <Creation>
            <Title> The daily news </Title>
          </Creation>
        </CreationInformation>
      </Audio>
    </AudioContent>
  </ContentDescription>
</Mpeg7Main>


Audio description
Low-level Description
spectral, parametric, and temporal features

High-level Description
Audio signature Description Scheme
Instrument timbre Description Schemes
Melody Description Tools
Sound recognition and indexing Description Tools
Spoken Content Description Tools

Audio low-level descriptors


Waveform
Loudness
Spectral basis
Spectral envelope
Spectral centroid
Spectral spread
Fundamental frequency
Harmonicity
Attack time

Audio descriptor: Basic


Two basic audio Descriptors
AudioWaveform Descriptor
describes the audio waveform envelope (minimum and maximum)

AudioPower Descriptor
describes the temporally-smoothed instantaneous power
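The two basic descriptors can be approximated in a few lines. This is a sketch under assumptions: frames are non-overlapping, the frame size is chosen arbitrarily, and the power is a plain per-frame mean of squared samples. The normative MPEG-7 extraction details are not reproduced, and the function names are illustrative.

```python
def audio_waveform(samples, frame_size):
    """Per-frame (minimum, maximum) pairs: a coarse waveform envelope."""
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    return [(min(frame), max(frame)) for frame in frames]


def audio_power(samples, frame_size):
    """Per-frame mean of the squared samples: a temporally smoothed power."""
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    return [sum(x * x for x in frame) / len(frame) for frame in frames]


signal = [0.0, 1.0, -1.0, 0.5]
print(audio_waveform(signal, 2))
print(audio_power(signal, 2))
```

Both functions reduce each frame to one or two numbers, which is why such descriptors are cheap to store and quick to display as an overview of a long recording.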


Audio descriptor: Basic Spectral


AudioSpectrumEnvelope Descriptor
describes the short-term power spectrum

AudioSpectrumCentroid Descriptor
describes the center of gravity of the log-frequency power spectrum

AudioSpectrumSpread Descriptor
describes the second moment of the log-frequency power spectrum

AudioSpectrumFlatness Descriptor
describes the flatness properties of the spectrum
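As a worked illustration of the centroid and spread, the sketch below computes the power-weighted mean and the power-weighted RMS deviation on a log2 frequency axis. The 1 kHz reference frequency, the use of the square root of the second central moment, and the function name are assumptions; the standard's treatment of very low frequency bins is left out.

```python
import math


def log_freq_centroid_and_spread(freqs_hz, powers, ref_hz=1000.0):
    """Return (centroid, spread) of a power spectrum on a log2(f / ref_hz) axis.

    centroid: power-weighted mean of log2(f / ref_hz), in octaves
    spread:   power-weighted RMS deviation from the centroid, in octaves
    """
    total = sum(powers)
    if total <= 0:
        raise ValueError("spectrum has no power")
    log_freqs = [math.log2(f / ref_hz) for f in freqs_hz]
    centroid = sum(lf * p for lf, p in zip(log_freqs, powers)) / total
    variance = sum((lf - centroid) ** 2 * p for lf, p in zip(log_freqs, powers)) / total
    return centroid, math.sqrt(variance)


# Two equally strong components, one octave below and one octave above 1 kHz.
print(log_freq_centroid_and_spread([500.0, 2000.0], [1.0, 1.0]))
```

With equal power at 500 Hz and 2 kHz, the centroid sits at the 1 kHz reference and the spread is one octave, which shows why a logarithmic axis suits perceptual descriptions of pitch.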


Audio Signature Description


The AudioSignature Description Scheme provides a unique content identifier for the purpose of robust automatic identification of audio signals.

Applications include
audio fingerprinting
identification of audio
locating metadata for legacy audio content

Instrument Timbre Description


Timbre is defined as the perceptual features that make two sounds with the same pitch and loudness sound different.

The Timbre Description describes these perceptual features with a reduced set of Descriptors:
HarmonicInstrumentTimbre Descriptor
LogAttackTime Descriptor
PercussiveInstrumentTimbre Descriptor
Combination with Basic Spectral Descriptors


Melody Description Tools


The Melody Description Tools facilitate efficient, robust, and expressive melodic similarity matching.

MelodyContour Description Scheme
5-step contour representation
basic rhythmic information representation

MelodySequence Description Scheme
supports an expanded descriptor set and high precision of interval encoding
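A 5-step contour can be sketched as a quantization of the interval between successive notes into the values -2, -1, 0, 1 and 2. The thresholds used here (50 and 250 cents) and all function names are assumptions for illustration and should be checked against ISO/IEC 15938-4 before being relied on.

```python
import math


def interval_cents(f_prev_hz, f_next_hz):
    """Size of the interval between two pitches, in cents (100 per semitone)."""
    return 1200.0 * math.log2(f_next_hz / f_prev_hz)


def contour_step(cents, small=50.0, large=250.0):
    """Quantize an interval to one of five contour values (assumed thresholds)."""
    magnitude = abs(cents)
    if magnitude < small:
        return 0  # effectively a repeated note
    step = 1 if magnitude < large else 2  # small move versus leap
    return step if cents > 0 else -step


def melody_contour(pitches_hz):
    """Contour values for each pair of successive notes."""
    return [contour_step(interval_cents(a, b)) for a, b in zip(pitches_hz, pitches_hz[1:])]


# Semitone offsets from A4 (440 Hz): 0, 0, +2, +7, +4, +3.
notes = [440.0 * 2 ** (n / 12) for n in (0, 0, 2, 7, 4, 3)]
print(melody_contour(notes))
```

Because only the direction and rough size of each interval are kept, a hummed query that is out of tune or transposed can still produce the same contour as the stored melody.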


General Sound Recognition and Indexing Description Tools


SoundModel (SM) DS
statistical model, such as an HMM or GMM

SoundModelStatePath Descriptor
consists of a state sequence generated by an SM given an audio segment

SoundModelStateHistogram Descriptor
consists of a normalized histogram of the state sequence generated by an SM given an audio segment

SoundClassificationModel DS
a trainable multi-way classifier based on SMs
examples: speech vs music, male vs female, trumpet vs violin, genre classification, voice recognition
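The relationship between a state path and its normalized histogram can be shown in a few lines. The state path below is made up for illustration and the function name is hypothetical; a real path would come from decoding an audio segment with a trained model.

```python
from collections import Counter


def state_histogram(state_path, num_states):
    """Normalized histogram: fraction of frames spent in each model state."""
    if not state_path:
        raise ValueError("state path is empty")
    counts = Counter(state_path)
    total = len(state_path)
    return [counts.get(state, 0) / total for state in range(num_states)]


# Hypothetical state path for an 8-frame segment decoded with a 4-state model.
path = [0, 0, 1, 2, 2, 2, 2, 1]
print(state_histogram(path, 4))
```

The histogram discards the order of the states, so two segments of different lengths can be compared directly by comparing their histograms.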


Spoken content retrieval


Output of ASR
phone lattice or word lattice
the Spoken Content DS stores these lattices instead of plain text
lattices are good for retrieval
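Why a lattice helps retrieval can be illustrated with a toy word lattice. Everything in this sketch is hypothetical: the edges, the probabilities and the function names are invented, and the lattice is assumed to be acyclic. It does not follow the SpokenContentLattice syntax.

```python
def lattice_paths(edges, start, end):
    """Yield (words, probability) for every path from start to end (acyclic lattice)."""
    outgoing = {}
    for src, dst, word, prob in edges:
        outgoing.setdefault(src, []).append((dst, word, prob))
    stack = [(start, [], 1.0)]
    while stack:
        node, words, prob = stack.pop()
        if node == end:
            yield words, prob
            continue
        for dst, word, p in outgoing.get(node, []):
            stack.append((dst, words + [word], prob * p))


def contains_phrase(words, phrase):
    """True if phrase occurs as a contiguous run inside words."""
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def phrase_score(edges, start, end, phrase):
    """Total probability of all lattice paths that contain the phrase."""
    phrase = list(phrase)
    return sum(p for words, p in lattice_paths(edges, start, end) if contains_phrase(words, phrase))


# Hypothetical lattice: (source node, target node, word, probability).
EDGES = [
    (0, 1, "taj", 0.7), (0, 1, "touch", 0.3),
    (1, 2, "mahal", 0.8), (1, 2, "my", 0.2),
    (2, 3, "drawing", 1.0),
]
print(round(phrase_score(EDGES, 0, 3, ["taj", "mahal"]), 2))
print(round(phrase_score(EDGES, 0, 3, ["touch"]), 2))
```

A plain-text transcript would keep only the single best path, so the alternative word "touch" could never be found; the lattice keeps it with a lower score.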


Spoken Content Description Tools


SpokenContentLattice
represents the actual decoding produced by an ASR engine

SpokenContentHeader
contains information about the speakers being recognized and the recognizer itself
WordLexicon Descriptor
PhoneLexicon Descriptor
SpeakerInfo Descriptor
ConfusionInfo Descriptor

Gaussian DS
<Gaussian>
  <Mean> 4087.18 7173.73 1.36364 94.2727 1834.36 2359.55 2645.27 2577.09 </Mean>
  <Variance> 1.6982e+007 5.21621e+007 14.3636 9749.09 3.65743e+006 </Variance>
</Gaussian>


State-transition model DS
<StateTransitionModel>
  <Transitions size1="20" size2="20"> 0 0 0.210526 0.0526316 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 </Transitions>
  <Initial size="20"> 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 </Initial>
  <State label="0 players" confidence="1"/>
  <State label="19 players" confidence="0.223607"/>
</StateTransitionModel>


ProbabilityModelClassifier DS
<ProbabilityModelClassifier confidence="0.9" length="2">
  <ProbabilityModelClass SemanticLabel="fish" Confidence="0.5" DescriptorName="ColorHistogram">
    <Gaussian>
      <Mean> 4087.18 7173.73 1.36364 94.2727 1834.36 2359.55 ... </Mean>
      <Variance> 1.6982e+007 5.21621e+007 14.3636 9749.09 ... </Variance>
    </Gaussian>
  </ProbabilityModelClass>
</ProbabilityModelClassifier>


SpokenContentLattice DS

[Figure: a lattice structure for a hypothetical (combined phone and word) decoding of the expression "Taj Mahal drawing ...".]



[Figure: an audio query is projected onto the AudioSpectrumBasis (spectrum projection) and passed to a SoundRecognitionClassifier holding N SoundRecognitionModels (HMM 1 ... HMM N, each with an HMM and basis functions). The selected model reference and state path (SoundModelStatePath) form the segmented audio description stored in the MPEG-7 sound database.]

Extraction of sound indexes using a sound-recognition classifier. The model reference and state path are stored.



[Figure: an audio query is projected onto the AudioSpectrumBasis (spectrum projection) and classified by the SoundRecognitionClassifier (HMM 1 ... HMM N; each SoundRecognitionModel holds a ContinuousMarkovModel and basis functions). The selected model reference and state path (SoundModelStatePath) are matched against the indexed audio in the MPEG-7 sound database to produce a result list.]

Query-by-example application with a query in media source form. Features must be extracted and projected into the classification space for each model in order to match against the database.


An example search application utilizing a query in DDL format
[Figure: a DDL query supplies a model reference and state path, which are matched against the MPEG-7 sound database to produce a result list.]

Extraction of hidden Markov model and basis functions and storage in a DDL representation
[Figure: audio WAV files pass through feature extraction (SoundRecognitionFeatures), basis extraction (AudioSpectrumBasis) and the HMM stage (ContinuousMarkovModel); the resulting HMM and basis are stored as a SoundRecognitionModel.]


Scenarios for the Spoken Content Description Tools


Recall of AV data by memorable spoken events
A film or video recording where a character or person spoke a particular word or sequence of words. The source media would be known, and the query would return a position in the media.

Spoken Document Retrieval


There is a database consisting of separate spoken documents. The result of the query is the relevant documents and, optionally, the position in those documents of the matched speech.

Annotated Media Retrieval


Similar to spoken document retrieval. The result of the query is the media which is annotated with speech, and not the speech itself. An example is a photograph retrieved using a spoken annotation.


Multimedia DSs
Multimedia Description Schemes are metadata structures for describing and annotating audio-visual (AV) content

Basic Elements
Content Management
Content Description
Content Organization
Navigation and Access
User Interaction

Organization of Multimedia DSs


Content Management
Creation and production information
Creation information
title, textual annotation, creators, and dates

Classification information
genre, subject, purpose, language

Media coding, storage and file formats


format, compression, and coding

Content usage
usage rights, usage record

Navigation and Access


Summaries
hierarchical summaries
sequential summaries

Partitions and Decompositions
decompositions in space, time and frequency
used in multi-resolution access and progressive retrieval

Variations
selection of the most suitable variation of an AV program
adaptation to the different capabilities of terminal devices, network conditions or user preferences


Hierarchical summary


Illustration of variations


Content Organization
Collections
group the contents into clusters
describe statistics and models of the attribute values
describe relationships among collection clusters

Models
model the attributes and features of AV content

Probability Model
specify statistical functions and structures

Analytic Model
specify semantic labels
specify the confidence
build classifiers


Collection Structure


User Interaction
User Preference
context dependency in terms of time and place
relative importance of different preferences
privacy characteristics of the preferences
preferences updated by agent or user

Usage History
history of actions used to determine the user's preferences

eXperimentation Model (XM)
Simulation platform for Ds, DSs, CSs (Coding Schemes) and the DDL

XM applications:
the server (extraction) applications
the client (search, filtering and/or transcoding) applications


The XM applications
Extraction from Media
all low-level Ds or DSs should have an application class of this type

Search & Retrieval Application
a client application

Media Transcoding Application
a client application

Description Filtering Application
a client application


Extraction from Media


Search and retrieval application


Media transcoding application


Description Filtering Application


Interface model for XM app


Real world application

MDB = media database, DDB = description database.


First, two features are extracted from a media database. Then, based on the first feature, relevant media files are selected from the media database. Finally, the relevant media files are transcoded based on the second extracted feature.
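The three steps (extract, select, transcode) can be sketched as a small pipeline. The two features used here, duration and frame width, the file names and all function names are hypothetical stand-ins; the XM software itself is not reproduced, and the transcoding step only produces a plan rather than touching media data.

```python
def extract_features(media_db):
    """Server side: build a description database (DDB) with two features per item."""
    return {
        name: {"duration_s": item["duration_s"], "width": item["width"]}
        for name, item in media_db.items()
    }


def select(ddb, max_duration_s):
    """Client side, step 1: select media using the first feature (duration)."""
    return sorted(name for name, desc in ddb.items() if desc["duration_s"] <= max_duration_s)


def transcode_plan(ddb, names, target_width):
    """Client side, step 2: decide the transcoding using the second feature (width)."""
    return {
        name: "downscale" if ddb[name]["width"] > target_width else "copy"
        for name in names
    }


# Hypothetical media database (MDB).
MDB = {
    "a.mpg": {"duration_s": 30, "width": 720},
    "b.mpg": {"duration_s": 300, "width": 352},
    "c.mpg": {"duration_s": 45, "width": 352},
}
ddb = extract_features(MDB)
selected = select(ddb, max_duration_s=60)
print(selected)
print(transcode_plan(ddb, selected, target_width=352))
```

Selection and transcoding both read only the description database, so the media files themselves need to be opened once, at extraction time.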


MPEG-7 application areas


Storage and retrieval of audiovisual databases (image, film, radio archives)
Broadcast media selection (radio, TV programs)
Surveillance (traffic control, surface transportation, production chains)
E-commerce and tele-shopping (searching for clothes / patterns)
Remote sensing (cartography, ecology, natural resources management)
Entertainment (searching for a game, for a karaoke)
Cultural services (museums, art galleries)
Journalism (searching for events, persons)
Personalized news service on the Internet (push media filtering)
Intelligent multimedia presentations
Educational applications
Bio-medical applications


Illustration of applications


Information Flow
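[Figure: information flow. Feature extraction (manual or automatic) produces the AV description, which passes through encoding, storage, transmission and decoding. Users reach it through search/query and browsing (pull) or through filtering (push).]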


Push and Pull applications


Push applications
Example: broadcast of video, interactive TV
Advantage: intelligent agents filter standardized descriptions

Pull applications
Example: search engines for the Internet and databases
Advantage: many search engines work on standardized descriptions

Example: Pull application

MPEG-7 Database


Example: Push application


Example: queries
Text (keywords):
Find AV material with subject corresponding to some keywords

Semantic description:
Find AV material corresponding to a specified semantic

Image as an example:
Find an image with similar characteristics (global or local)

A few notes of music:
Find corresponding musical pieces or movies

Low level features (example: motion):
Find video with specific object motion trajectories


Integration of MPEG-7 into XML


<seq begin="20s" dur="10s">
  <img id="Image1" dur="5s">
    <MP7:annotation>
      <Who>Fernando Morientes</Who>
      <WhatAction>Spain vs. Sweden soccer match</WhatAction>
    </MP7:annotation>
  </img>
  <img id="Image2" dur="2s"/>
</seq>


MPEG-7 and other Standards


MPEG-1, -2, and -4 are designed to represent the information itself, while MPEG-7 is meant to represent information about the information.
MPEG-1, -2, and -4 make content available, while MPEG-7 allows you to find the content you need.

Ultimate ambition of MPEG-7


To make the web as searchable for multimedia content as it is searchable for text today
To make the use of computer systems as easy as possible


Beyond MPEG-7
To mould computers around human requirements and not humans around computer requirements
To enable content disclosure based on facts, rather than on human annotations
To find information by rich spoken queries and hand-drawn images
To address what most people expect computers to be able to do


More Information on WWW


Major MPEG-7 documents
http://www.cselt.it/mpeg/, semi-official website
http://www.mpeg-7.com, official website

Others
http://www.elsevier.com/locate/image


Conclusion
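[Figure: the features and structures of the AV contents are represented by Ds and DSs, which are defined with the DDL; the resulting Ds and DSs serve the user.]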


Thanks


Low level AV descriptors


Video segments
Color, Camera motion, Motion activity, Mosaic

Still regions
Color, Shape, Position, Texture

Moving regions
Color, Motion trajectory, Parametric motion, Spatio-temporal shape

Audio segments
Spoken content, Spectral feature, Timbre


Face Recognition Descriptor

Projection of a face vector onto a set of basis vectors
Feature set is extracted from a normalized face image

Normalized face image
56 lines with 46 intensity values in each line
The centers of the two eyes are located on the 24th row


Segment Decomposition


MPEG-7 Normative Interfaces


Example: Content description

[Figure: indexing and feature extraction feed the MPEG-7 database, which serves search and retrieval; the diagram distinguishes a high-level process and a low-level process.]


Segment DS
Segment DS describes the result of a spatial, temporal, or spatio-temporal partitioning of the AV content. It has nine major subclasses:
Multimedia Segment DS
AudioVisual Region DS
AudioVisual Segment DS
Audio Segment DS
Still Region DS
Still Region 3D DS
Moving Region DS
Video Segment DS
Ink Segment DS


Examples: T/S segments


Example: Segment trees


Illustration of conceptual description
[Figure: relationships among the Semantic base DS, Object DS, Event DS, Concept DS, Semantic state DS, Semantic place DS, Semantic time DS, Semantic container DS, Semantic DS and the AV content.]


Visual description
Basic structures
Grid layout, Time series, Multiple view, Spatial 2D coordinates, Temporal interpolation

Descriptors
Color, Texture, Shape, Motion, Localization


Example: Color Descriptors


Color space
Color Quantization
Dominant Colors
Scalable Color
Color Layout
Color-Structure
GoF/GoP Color

Example: Color space


R, G, B
Y, Cr, Cb
H, S, V
HMMD
Linear transformation matrix with reference to R, G, B
Monochrome

Audio Framework


Descriptor
Definition
A Descriptor (D) is a representation of a Feature. A Descriptor defines the syntax and the semantics of the Feature representation.

Notes
A descriptor allows an evaluation of the corresponding feature via the descriptor value. It is possible to have several descriptors representing a single feature.

Examples
Possible descriptors include, for the color feature, the color histogram or the average of the frequency components and, for other features, the motion field, the text of the title, etc.


Descriptor Value
Definition
A Descriptor Value is an instantiation of a Descriptor for a given data set (or subset thereof).

Notes
Descriptor Values are combined via the mechanism of a Description Scheme to form a Description.


Description Scheme
Definition
A Description Scheme (DS) specifies the structure and semantics of the relationships between its components, which may be both Descriptors and Description Schemes.

Examples
A movie, structured as scenes and shots, including some textual descriptors at the scene level, and color, motion and some audio descriptors at the shot level.

Note
A Descriptor contains only basic data types and does not refer to other Ds or DSs.


DDL: XML Schema & Extensions

XML Schema
Data types
Simple and complex types
Elements
Inheritance, abstract types

MPEG-7 extensions
Array and Matrix datatypes
Enumerated datatypes for MimeType, CountryCode, RegionCode, CurrencyCode and CharacterSetCode
Typed references


Basic elements of DS
Constructs for linking media files
Localizing pieces of content
Describing time, places, persons, individuals, groups, organizations, textual annotation, etc.
Who? What object? What action? Where? When? Why? and How?


Content recognition tools


No speech, face or gesture recognition engines are included in MPEG-7
Building content recognition tools is a task for industry, not for the standard
coding tools in MPEG-1, -2, -4 were for research purposes, not part of the standard
no tools were part of the MPEG standard
