Abstract: The efficient processing and association of different multimodal information is an important research field with a great variety of applications, such as human-computer interaction, knowledge discovery, document understanding, etc. A good approach to this important issue is the development of a common platform for converting different modalities (such as images, text, etc.) into the same medium and associating them for efficient processing and understanding. Thus, this paper presents the development of a novel methodology based on Local-Global (LG) graphs capable of automatically converting image context into natural language text sentences and then into speech, serving as an interactive model for locating missing objects in home environments. Simple illustrative examples are provided as a proof of the concept proposed here.
Keywords: Converting Images to NL-Text, Image Analysis and Representation, Graphs, Recognizing Objects.
I. INTRODUCTION
Where are my pills? Did you see them? This is one of several questions asked frequently every day in a home environment, where we ask for help from others who may have information regarding a missing object of ours. The usual answer to such a question is a short spoken NL sentence, like "over there". This interaction creates the following challenge: can we develop an interactive system that automatically extracts and recognizes objects from images and describes their locations and associations in NL sentences by using the appropriate set of sensors, computing devices, and software techniques? The answer is yes for some categories of images of low complexity with no illumination or shadow tricks. Thus, a variety of techniques from different fields have to be synergistically employed in order for such tasks to be accomplished. In particular, the main research fields involved in this effort are image processing and understanding, conversion of NL sentences into speech, and computing models, like graphs [1-32].
[Figure 3: An example of a mismatch. (a) the original data image; (b) and (c) two graphs with one PCRP change; (b) and (c) have a similar graph and also the same region relationships.]

[Fig. 2: (a) the connectivity among three neighboring regions, their centroids, and the LG graph; (b) the LG graph of a synthetic image consisting of seven segmented regions.]
Perpendicular Rule:
N_i a_ij^pe N_j -> NL_s = {the straight line segment N_i is perpendicular to the straight line segment N_j}

Symmetric Rule:
N_i a_ij^s N_j -> NL_s = {the straight line segment N_i is symmetric with the straight line segment N_j}

Synthesis Rule:
N_i a_ij^x N_j AND N_i a_ij^y N_j -> NL_s(x) AND NL_s(y)
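The rules above can be sketched in code as a small table of sentence templates, one per edge attribute, with the synthesis rule joining several relations on the same node pair with AND. This is a minimal illustration, not the paper's implementation; the names RULES, edge_to_sentence, and synthesis are assumptions.

```python
# Hypothetical sketch: mapping an attributed LG-graph edge N_i a_ij N_j
# to the NL sentence given by the corresponding rule.
RULES = {
    "pe": "the straight line segment {i} is perpendicular to the straight line segment {j}",
    "s":  "the straight line segment {i} is symmetric with the straight line segment {j}",
}

def edge_to_sentence(i, j, attr):
    """Apply one rule: one attributed edge -> one NL sentence."""
    return RULES[attr].format(i=i, j=j)

def synthesis(i, j, attrs):
    """Synthesis rule: several relations on the same pair combine with AND."""
    return " AND ".join(edge_to_sentence(i, j, a) for a in attrs)

print(edge_to_sentence("N1", "N2", "pe"))
print(synthesis("N1", "N2", ["pe", "s"]))
```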
An Illustrative Example-1:
Here, for simplicity, we skip the attributes of the nodes and all the possible relationships among the nodes; thus, from Figure 1 the attributed graph is:
G1 = Ln1(c=140°) Ln2(c=174°) Ln3(c=111°) Ln4(c=22°) Ln5(c=152°) Ln6(c=160°) Ln7(c=173°) Ln8(c=108°) Ln9(c=30°); Ln1 Ln2(p) Ln8 Ln3(p) Ln7 Ln9(p) Ln4
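The attributed graph G1 above can be held in a simple data structure: each node Ln_i carries its curvature attribute c (in degrees), and edges carry relation labels such as "p" (perpendicular). This is a sketch, not the authors' data structure, and since the (p) pairings in the printed graph string are ambiguous after extraction, the edge list below is a hypothetical reading used only for illustration.

```python
# Hypothetical sketch of the attributed graph G1 from Example-1.
# Node attributes: curvature c in degrees, taken from the text.
nodes = {f"Ln{i}": c for i, c in zip(range(1, 10),
         [140, 174, 111, 22, 152, 160, 173, 108, 30])}

# Hypothetical (p) = perpendicularity edges; the exact pairing in the
# printed string is ambiguous, so these pairs are illustrative only.
edges = [("Ln1", "Ln2", "p"), ("Ln8", "Ln3", "p"), ("Ln7", "Ln9", "p")]

def relations_of(node):
    """All relation triples that involve the given node."""
    return [(a, b, r) for (a, b, r) in edges if node in (a, b)]

print(nodes["Ln1"])         # 140
print(relations_of("Ln1"))  # [('Ln1', 'Ln2', 'p')]
```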
Table-1
More specifically, from the input image all the objects are extracted and represented in their own graph form (G). Each graph form G is compared to the graph models existing in the graph DB. The outcome from the DB is the recognition of each object, its features, and the relationships among the objects. These relationships are expressed as an NL text description.
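The matching step just described can be sketched as follows. Real attributed-graph matching is more involved; the crude Jaccard-style similarity over relation triples below is only a stand-in, and the names similarity, recognize, and graph_db are assumptions.

```python
# Hypothetical sketch: compare an extracted graph G against model graphs
# in a small DB; the best-scoring model gives the recognized object label.

def similarity(g, model):
    """Crude graph similarity: Jaccard overlap of relation triples
    (a stand-in for real attributed-graph matching)."""
    g_rels, m_rels = set(g["edges"]), set(model["edges"])
    union = g_rels | m_rels
    return len(g_rels & m_rels) / len(union) if union else 0.0

def recognize(g, graph_db):
    """Return the best-matching model label and its score."""
    return max(((label, similarity(g, m)) for label, m in graph_db.items()),
               key=lambda t: t[1])

graph_db = {
    "wrench": {"edges": [("n1", "n2", "p"), ("n2", "n3", "s")]},
    "car":    {"edges": [("n1", "n2", "s")]},
}
g = {"edges": [("n1", "n2", "p"), ("n2", "n3", "s")]}
print(recognize(g, graph_db))  # ('wrench', 1.0)
```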
Extracted NL Outcome:
Detection: There is a silver wrench-tool;
Location of the objects: The wrench-tool is in the upper center part of the image;
Associations of the objects: The wrench-tool is above the car; The wrench-tool is to the left of the helicopter; The wrench-tool is to the right of the airplane;
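Association sentences like those above can be produced from the centroids of the recognized objects. The sketch below is a hypothetical illustration (the function name and the axis convention are assumptions, not the paper's method): it compares centroid offsets in image coordinates, where x grows rightwards and y grows downwards.

```python
# Hypothetical sketch: derive "above/below/left of/right of" sentences
# from object centroids in image coordinates.

def associate(name_a, ca, name_b, cb):
    """Describe the position of object A relative to object B."""
    dx, dy = ca[0] - cb[0], ca[1] - cb[1]
    if abs(dy) >= abs(dx):                       # vertical offset dominates
        rel = "above" if dy < 0 else "below"     # y grows downwards
    else:
        rel = "to the left of" if dx < 0 else "to the right of"
    return f"The {name_a} is {rel} the {name_b}"

print(associate("wrench-tool", (200, 40), "car", (210, 180)))
# The wrench-tool is above the car
```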
A Real Example:
The following example presents a real case using a surveillance system based on a Nortech security camera and an HP Pavilion portable computer, Figure 8. The camera scans the room by capturing a sequence of images (10 frames per second), and the computer software inspects each image in order to discover and extract the requested object(s); in this particular case the objects are the medical pills, which have been located in the tenth frame of the sequence. Here we present the LG graph with a few of the visual connections with the surrounding objects (for a clear visual representation). Figure 8 contains only a view of the camera used in this experiment; a view of one of the two rooms (upper right frame in the figure); a view of the second room (lower left frame of the figure) with the objects; and a magnified view of the detected and recognized objects (pills).
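The frame-scanning loop of this surveillance setup can be sketched as below. The function names are assumptions: recognize() stands in for the whole extract-and-match pipeline (here a trivial lambda), and the toy frame list reproduces the situation in the example, where the pills appear in the tenth frame.

```python
# Hypothetical sketch: inspect a sequence of frames (~10 per second)
# one by one until the requested object is found.

def find_object(frames, target, recognize):
    """Return (frame_index, label) for the first frame containing target."""
    for i, frame in enumerate(frames, start=1):
        for obj in recognize(frame):   # recognize(): frame -> object labels
            if obj == target:
                return i, obj
    return None

# Toy stand-in: the "pills" appear in the tenth frame, as in the example.
frames = [["table", "chair"]] * 9 + [["table", "pills"]]
print(find_object(frames, "pills", lambda f: f))  # (10, 'pills')
```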
Extracted NL Outcome:
Detection: There are three boxes with pills;
Location of the objects: The boxes with the pills are in the upper center of the image;
Associations of the objects: The boxes with the pills are on the table; The boxes with the pills are below the lamp; The boxes with the pills are to the left of the chair; The boxes with the pills are to the right of the couch;
Table-1 below shows the outcome from the simple one-image example.
Discussion:
This is a realistic case, where we need to find something missing in a home environment. The methodology inspected a sequence of images (one by one), and it took on average 1 sec per frame to decide whether the missing object was in it or not. In addition, the methodology (using a small DB with 20 items) decided that there was a couch in the frame with the pills; in reality, however, there were two chairs placed together, forming a couch. The recognition of the pills was based on the fact that we had indicated that the boxes contain the pills, and there were no other boxes in the DB. We are working to improve the methodology so that it interactively separates and recognizes the correct boxes from a set of boxes by using additional information such as color, shape, size, and label information (NL text).
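The planned improvement of separating the correct boxes by additional attributes can be sketched as a simple candidate-scoring step. This is a hypothetical illustration only; the attribute names, the score function, and the toy data are assumptions, not the paper's implementation.

```python
# Hypothetical sketch: rank candidate boxes by how many of the requested
# attributes (color, shape, size, label text) each one matches.

def score(candidate, query):
    """Fraction of requested attributes the candidate matches."""
    keys = [k for k in query if k in candidate]
    if not keys:
        return 0.0
    return sum(candidate[k] == query[k] for k in keys) / len(keys)

boxes = [
    {"color": "white", "shape": "rect", "label": "aspirin"},
    {"color": "blue",  "shape": "rect", "label": "vitamins"},
]
query = {"color": "white", "label": "aspirin"}
best = max(boxes, key=lambda b: score(b, query))
print(best["label"])  # aspirin
```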
III. CONCLUSION
This paper presented the basic concept and synergy of an efficient methodology capable of automatically converting images into equivalent natural language text sentences, contributing to research efforts on transforming different modalities into the same model. The model used here for representing the two modalities (images, NL text sentences) is the LG graph. Note that this conversion provides no interpretation of the context of an image, but a description of it. LG graphs could also provide efficient interrelations of images and text.
In addition, the methodology has potential for commercial and scientific applications such as multimedia information retrieval, knowledge discovery, etc. It can be used as a software tool in a variety of applications, such as document processing and understanding, digital libraries, knowledge extraction, automatic annotation of images, etc. It can also serve as a testbed for the integration and synchronization of other single modalities, such as speech and NL translation, by representing structural knowledge in the same LG model, and it is an efficient scheme for multimodal sources.
Acknowledgement
This work is partially supported by an AIIS grant.
BIBLIOGRAPHY AND REFERENCES
[01] F. Wahl, K. Wong and R. Casey, "Block separation and text extraction in mixed text image documents," CVGIP, vol. 20, 1989.
[02] N. Bourbakis, "A document processing methodology: separating text from images," IFAC IJEAAI, vol. 14, pp. 35-42, 2001; also in IEEE Symp. I&S, Nov. 1996, MD.
[03] N. Bourbakis, "Associating activities in images using SPN graphs," IEEE Conf. TAI-06, Nov. 13-15, 2006, WDC.
[04] N. Bourbakis and P. Kakumanu, "Recognizing Facial Expressions using LG graphs," IEEE Conf. TAI-06, Nov. 13-15, 2006, WDC; also SUNY-B-TR-1997.