Abstract—This paper presents a natural framework for information sharing in cooperative tasks involving humans and robots. In this framework, all information gathered over time by a human–robot team is exchanged and summarized in the form of a fused probability density function (pdf). An approach for an intelligent system to describe its belief pdfs in English expressions is presented. This belief expression generation is achieved through two goodness measures: semantic correctness and information preservation. In order to describe complex, multimodal belief pdfs, a Mixture of Statements (MoS) model is proposed such that optimal expressions can be generated through compositions of multiple statements. The model is further extended to a nonparametric Dirichlet process MoS generation, such that the optimal number of statements required for describing a given pdf is automatically determined. Results based on information loss, human collaborative task performances, and correctness rating scores suggest that the proposed method for generating belief expressions is an effective approach for communicating probabilistic information between robots and humans.

Index Terms—Dirichlet process (DP) mixtures, human–robot communications, Mixture of Statements (MoS), natural language processing (NLP).

Manuscript received September 9, 2017; revised February 7, 2018; accepted April 6, 2018. This paper was recommended for publication by Associate Editor J. Piater and Editor A. Billard upon evaluation of the reviewers' comments. This work was supported by the National Science Foundation (NSF IIS-1427030). (Corresponding author: Rina Tse.)
The authors are with the Autonomous Systems Laboratory, 550/542 Upson Hall, Cornell University, Ithaca, NY 14853 USA (e-mail: rt297@cornell.edu; mc288@cornell.edu).
Digital Object Identifier 10.1109/TRO.2018.2830360

I. INTRODUCTION

RESEARCH work has shown that a robot's belief in an environment with uncertainties can be effectively represented as a probability density function (pdf) [1]. Through a Bayesian approach, a belief represented as a pdf can be used to summarize all information gathered by a robot. Previous work in many areas of robotics, such as localization [2], target tracking [3], sensor fusion [4], [5], and probabilistic reasoning [6], has been done to allow a robot to estimate or predict the state of a subject of interest based on information from various sources in a probabilistic manner.

Probabilistic information gathered and represented in the form of a pdf plays an important role for robots in performing fundamental tasks, such as trajectory planning, prediction, and decision making. Furthermore, the information gathered by robots is often found valuable to humans for their planning and decision making. As a result, communication between humans and robots regarding their probabilistic beliefs has become essential, not only for their successful interactions (by making it more transparent what one another is thinking), but also for their successful task collaborations (by sharing the same information for planning and decision making). In essence, probabilistic information exchange is important when humans and robots are working collaboratively on the same task, when robots are working for humans, when humans and robots need to operate on the same objects, or simply when they coexist in the same environment.

For example, in disaster management and emergency response, human and mobile robot teams are commonly deployed in search missions to rescue earthquake victims trapped inside a collapsed building, or to rescue lost hikers in a forest [7], [8]. [...] individual paths such that the team has the highest probability of reaching the victim [9]. In addition, the human rescue team manager needs to plan the team's resource allocations, such as food or medicine. In order for both humans and robots to perform planning and decision making in these missions, probabilistic information about the searched victim locations is required [8]. This information must be gathered by the heterogeneous team members, and then shared among all of them. A similar application of probabilistic information sharing between humans and robots is in the context of security, where a team of human and robot patrol units is deployed to search for a criminal suspect or a hidden bomb [10], [11]. To accomplish the mission, the probabilistic information regarding the suspect or terrorist activity locations must be collaboratively gathered and shared among all human and robot agents so that they can plan their search trajectories and make high-level strategic decisions accordingly [12], [13].

Furthermore, in the contexts of service robot or personal robot applications, a robot's job is to assist humans in their daily tasks. A great deal of research has been done in order for humans and robots to naturally operate in the same environment, such as in people's households or offices, or in public areas such as malls, airports, or museums. In these scenarios, humans and robots inevitably need to communicate probabilistic information with one another regarding their shared objects and environment [14]. For example, a human might ask a home assistant robot to fetch a car key and inform the robot that the key is probably on the dining table or on the bedside table. The robot, checking both locations and not detecting any keys, then performs an [...]
1552-3098 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
2 IEEE TRANSACTIONS ON ROBOTICS
Fig. 2. Learned MMS likelihood models $p(D_i = d \mid X_i = x, X^l_i = x^l)$ for human observations "The <subject> is <preposition> <reference>," where the prepositions $d$ are "next to," "nearby," "far from," and the reference $x^l$ is at $[0, 0]^T$. (a) Next to. (b) Nearby. (c) Far from.
[...] frameworks addresses the cooperative perception problem, where information gathering and belief updates are the primary communication objective; neither do they consider belief expression problems.

Related work on robot-to-human communications for the task of describing the state of the subject of interest in step (4) includes referring expression generation (REG) [30], [31]. REG problems can be found in the contexts of, for instance, referring to an object in situated dialogues [32], [33], referring to a person or an object in a given image [34], [35], or referring to a specific image within a set of images [36]. In typical REG work [37], the communication goal is to identify an object, i.e., the referent, in a given scene by describing the referent's properties, such as its size, shape, color, or its spatial relation to another object [38]–[40], and, at the same time, distinguishing the referent from the other objects (i.e., distractors) in the environment. These expression generations are typically conditioned on deterministic variables representing deterministic knowledge of the referent and the scene based on a complete observation of an image, e.g., each object's location or RGB/intensity. Therefore, the expressions generated are not based on a distribution over hidden referent states produced from information fusion or estimation algorithms. While some previous REG work incorporates uncertainties into several parts of its models, the stochastic elements are typically contained within attribute description grounding models or target selection probability [35], [41], [42]. By the standard definition of REG problems, the referent is still assumed to be a fully observed object with deterministic state attributes. In contrast, in this paper, the goal is to describe a probability density rather than a deterministic quantity. As a result, the belief expression generation problem defined in this paper entails a different set of generation requirements than in REG problems. For example, the sentence generated must faithfully describe the dispersedness/peakedness of the input density and avoid sounding more/less certain regarding the hidden subject of interest than the input belief actually is. Additionally, the correctness of the generated expression must be evaluated by taking into account the fact that the actual state attribute is hidden, and can take any value under the input probability distribution.

Previous research work has also been done in order to automatically interpret or generate natural language expressions describing probabilistic quantities [43]–[45]. This research area has been found useful in applications such as risk communications or statistical data communications. Such previous work, however, is typically limited to describing probabilities of scalar, binary variables rather than the more complex probability densities of multidimensional, continuous state variables presented in this paper, which are commonly considered in general sensor fusion or estimation problems. Related research work on language generation frameworks that involve physical state grounding also includes plan recognition [46], [47], which produces statements summarizing the robot's activities given the estimated robot states, and question or request generation [48], [49], where questions or requests for human assistance are produced from a predefined sequence of deterministic states and actions. None of these language generation frameworks is suitable for generating belief expressions for mixed multiagent systems working on cooperative information gathering tasks.

This paper presents an information sharing framework between humans and robots, providing each party with information presented in a form it can immediately process and utilize. From the human users' perspective, the system permits inputs and outputs in structured English, on which further interpretation and reasoning can be completed naturally. From the robot's perspective, the interface allows information communication in terms of pdfs, on which conventional information fusion and estimation algorithms can be seamlessly integrated. This paper extends the work by Tse and Campbell [50] to provide detailed derivations and explanations of the proposed belief expression generation models, including the variants of nonparametric Dirichlet process (DP) MoS generation, and also to include results and discussions from new sets of evaluation data, including empirical human trial results.

III. SUMMARY OF TECHNICAL APPROACHES

This section provides a brief overview of the technical approaches for the belief communication system. The goal of the system is to exchange probabilistic information between humans and robots regarding the state of a subject of interest $X$. An example scenario of belief communication is shown in Figs. 3–6 for the communication of a crime suspect location between police officers and security robots.

The information flow starts with mixed sources of information regarding the subject's state, including sensor measurement readings $\zeta^r_{1:i}$ obtained by robots, as well as observation sentences $\zeta^h_{1:i}$ expressed by humans. These inputs $I = \{\zeta^r_{1:i}, \zeta^h_{1:i}\}$ are used [...]
TABLE I
NOMENCLATURE
[...] the subject's state as shown in Fig. 3. Belief updates given human input expressions are performed using a Bayesian update equation, with likelihood models of human expressions represented by [...]

This section describes a mathematical formulation for the recursive update of the robot's belief pdf over the state of interest, based on the work in [17]. The inputs used for the robot's belief
TSE AND CAMPBELL: HUMAN–ROBOT COMMUNICATIONS OF PROBABILISTIC BELIEFS VIA A DP MIXTURE OF STATEMENTS 5
updates are in the forms of both traditional sensor measurement readings as well as human expressions in structured English.

A. Recursive Belief Update

Let $X_i \in \mathcal{X}$ be the state of the subject of interest (e.g., a tracked object's, or the robot's own, location) at time $i$. Let $\zeta^r_i$ be the robot's sensor measurement reading (e.g., from a camera or LIDAR). A human's observation expression $\zeta^h_i$ at time $i$ is defined according to the structured language sentence "The <subject> is <preposition_i> <reference_i>" (e.g., "The suspect is nearby gas station A," or "The injured victim is behind the robot"). More specifically, the expression $\zeta^h_i$ at time $i$ is specified by a preposition $D_i$ and a landmark or a reference state $X^l_i$: $\zeta^h_i(D_i, X^l_i)$. Defining the sets of robot and human measurement inputs over time as $\zeta^r_{1:i} \equiv \{\zeta^r_1, \ldots, \zeta^r_i\}$ and $\zeta^h_{1:i} \equiv \{\zeta^h_1, \ldots, \zeta^h_i\}$, the belief over the subject's state at time $i$, $p(X_i \mid \zeta^r_{1:i}, \zeta^h_{1:i})$, is calculated given the inputs $\zeta^r_{1:i}$ and $\zeta^h_{1:i}$ through recursive Bayesian update steps as follows.

First, the belief is propagated to the new time step $i$ according to the stochastic dynamics $p(X_i \mid X_{i-1})$ of the state of interest:

$$p(X_i \mid \zeta^r_{1:i-1}, \zeta^h_{1:i-1}) = \int_{\mathcal{X}} p(X_i \mid X_{i-1})\, p(X_{i-1} \mid \zeta^r_{1:i-1}, \zeta^h_{1:i-1})\, dX_{i-1}. \quad (1)$$

The belief is then updated given the traditional robot's sensor measurement reading at time $i$ as

$$p(X_i \mid \zeta^r_{1:i}, \zeta^h_{1:i-1}) = \frac{p(\zeta^r_i \mid X_i)\, p(X_i \mid \zeta^r_{1:i-1}, \zeta^h_{1:i-1})}{\int_{\mathcal{X}} p(\zeta^r_i \mid X_i)\, p(X_i \mid \zeta^r_{1:i-1}, \zeta^h_{1:i-1})\, dX_i} \quad (2)$$

where $p(\zeta^r_i \mid X_i)$ is the traditional sensor measurement's likelihood model obtained from calibration or from the manufacturer's specifications.

The belief update given the human's observation expression $\zeta^h_i(D_i, X^l_i)$ at time $i$ is

$$p(X_i \mid \zeta^r_{1:i}, \zeta^h_{1:i}) = p(X_i \mid \zeta^r_{1:i}, \zeta^h_{1:i-1}, \zeta^h_i(D_i, X^l_i)) \quad (3)$$

[...] model is given in Section V.

V. SEMANTIC-LIKELIHOOD MODEL

This section discusses the semantic-likelihood modeling necessary for both belief fusion and belief expression generation. Given an expression "The <subject> is <preposition_i> <reference_i>," the likelihood $p(D_i = d \mid X_i = x, X^l_i = x^l)$ maps from the hidden subject's state space $\mathcal{X}$ to the interval $[0, 1]$. At each state value $x$, the likelihood sums to one over the discrete choices of prepositions: $\sum_{d=1}^{N_D} p(D_i = d \mid X_i = x, X^l_i = x^l) = 1$. As such, the human expression likelihood model can be viewed as a probabilistic classification of the subject's state space. This input space partitioning can be described by probabilistic functions, such as logistic, softmax, or MMS [51], [52]. In this paper, the human expression likelihood is modeled by an MMS, with the MMS parameters learned from maximum-likelihood estimation based on a labeled expression training dataset. A key advantage of MMS compared with log-linear models such as logistic or softmax is that it can represent nonconvex, multimodal, nonlinear decision boundaries, as shown below.

A softmax function (multinomial logistic function) is a generalization of the logistic function from binary to multiclass classification. The softmax likelihood of $X_i = x \in \mathcal{X}$, where $\mathcal{X} \subseteq \mathbb{R}^M$, being labeled as class $D_i = d$ is

$$p(D_i = d \mid X_i = x) = \frac{\exp(w_d^T \tilde{x})}{\sum_{h=1}^{N_D} \exp(w_h^T \tilde{x})}; \qquad \tilde{x} = [1\ x_1\ x_2\ \ldots\ x_M]^T \quad (4)$$

where $D_i = d \in \{1, \ldots, N_D\}$ and $N_D \ge 2$. The partition boundaries are the equiprobability lines $p(D_i = h \mid X_i = x) = p(D_i = k \mid X_i = x)$ between each pair of classes $(h, k)$, i.e., where the log odds ratio is zero:

$$\ln \frac{p(D_i = h \mid X_i = x)}{p(D_i = k \mid X_i = x)} = (w_h - w_k)^T \tilde{x} = 0. \quad (5)$$

As a result, while logistic functions are restricted to linear partitioning of the input space, softmax functions extend the classification boundaries to piecewise linear ones.

Nonetheless, softmax partitions are still restricted to convex cases: let $\tilde{x}_1$ and $\tilde{x}_2$ belong to class $h$

$$(w_h - w_k)^T \tilde{x}_1 \ge 0 \quad \text{and} \quad (w_h - w_k)^T \tilde{x}_2 \ge 0 \quad \forall k \ne h. \quad (6)$$

All points on the line segment $(1 - t)\tilde{x}_1 + t\tilde{x}_2$, $t \in [0, 1]$, also belong to class $h$ [...]

[...] the likelihood of the state $X_i = x \in \mathcal{X} \subseteq \mathbb{R}^M$ being described with a preposition $D_i = d$ is

$$p(D_i = d \mid X_i = x) = \frac{\sum_{k \in \sigma(d)} \exp(w_k^T \tilde{x})}{\sum_{h=1}^{S} \exp(w_h^T \tilde{x})}. \quad (8)$$

For an $M$-dimensional input space, the $S \cdot (M + 1)$ parameters of the MMS weight vectors $w_{1,\ldots,S}$ can then be learned by maximizing the probability $p(D_i = d \mid X_i = x)$ in (8), given a labeled training set $\{(D_i, X_i)_{i=1:N}\}$. This maximization can be done through standard nonlinear optimization algorithms, e.g., quasi-Newton methods.
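As a concrete illustration of (4) and (8), the following sketch evaluates a softmax likelihood over the augmented state and a subclass-summed, MMS-style likelihood. The weight matrix and the subclass-to-preposition map $\sigma$ are hypothetical stand-ins for learned parameters, chosen so that one preposition's high-probability region is nonconvex (two-sided), which a single softmax class cannot represent.

```python
import numpy as np

def softmax_likelihood(W, x):
    """p(D_i = d | X_i = x) for every class d under a softmax model, cf. (4).

    W: (S, M+1) weight matrix; x: (M,) state vector. Uses the
    augmented state x_tilde = [1, x_1, ..., x_M]^T."""
    x_tilde = np.concatenate(([1.0], x))
    logits = W @ x_tilde
    logits -= logits.max()          # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def mms_likelihood(W, sigma, x):
    """MMS-style likelihood, cf. (8): the probability of preposition d is
    the sum of the softmax probabilities of its subclasses sigma[d]."""
    p_sub = softmax_likelihood(W, x)
    return np.array([p_sub[list(idx)].sum() for idx in sigma])

# Hypothetical weights: 3 softmax subclasses over a 1-D state, with
# subclasses 0 and 2 both grounding the first preposition ("far from"),
# and subclass 1 grounding the second ("next to").
W = np.array([[0.0, 2.0],      # subclass active for large positive x
              [1.0, 0.0],      # subclass active near the origin
              [0.0, -2.0]])    # subclass active for large negative x
sigma = [(0, 2), (1,)]         # preposition 0 <- subclasses {0, 2}

p_far = mms_likelihood(W, sigma, np.array([3.0]))
p_mid = mms_likelihood(W, sigma, np.array([0.0]))
```

Here the first preposition dominates both at $x = 3$ and at $x = -3$ while the second dominates near the origin, i.e., the decision region of preposition 0 is the union of two disjoint half-lines.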
VI. LANGUAGE GENERATION FROM FUSED BELIEF: PROBLEM FORMULATION

In this section, two goodness measures for language generation from probabilistic beliefs are presented. The first optimizes the semantic correctness probability of the sentence generated. This measure is useful in applications where the expression's correctness probability is more critical than its specificity. The second measure is based on information preservation in communicating the belief pdf. This measure is useful in applications where the expression's correctness probability as well as its specificity are important. The developments in subsequent sections will be based on the information preservation formulation.

A. Semantic Correctness

First, define a belief distribution over the subject's state of interest $X$ as

$$b(x) \equiv p(X = x \mid I, \Theta_{1,\ldots,N_D}) \quad (9)$$

[...] the other agents. This belief distribution can be obtained as an output from any sensor fusion algorithm, or from human–robot communication processes such as those presented in Section IV-A, i.e., $I = \{\zeta^r_{1:i}, \zeta^h_{1:i}\}$. $\Theta_{1,\ldots,N_D}$ denote the parameters describing all human semantic models in an a priori learned dictionary.

Given the learned semantic models and all gathered information $I$, the correctness probability of using a preposition $D$ to describe a hidden subject's state $X$ with respect to a reference state $X^l$ can be defined and computed as follows:

$$p(D \mid I, X^l, \Theta_{1,\ldots,N_D}) = \int_{\mathcal{X}} p(X = x \mid I, X^l, \Theta_{1,\ldots,N_D})\, p(D \mid X = x, I, X^l, \Theta_{1,\ldots,N_D})\, dx = \int_{\mathcal{X}} p(X = x \mid I, \Theta_{1,\ldots,N_D})\, p(D \mid X = x, X^l, \Theta_{1,\ldots,N_D})\, dx. \quad (10)$$

The first term in (10) is basically the belief distribution $b(x)$ being expressed, while the second term is the semantic-likelihood model described in Section V. That is, the correctness probability of a generated expression is determined by the product between 1) the probability of the subject being at each state $x$ in the state space, and 2) the probability for the expression to be semantically correct in describing each of these particular states.

We have noticed that the second term, i.e., the semantic likelihood, is not normalized over the state space $\mathcal{X}$. This signifies the fact that prepositions with broad meanings can have high semantic applicability likelihood across large areas of the state space, while prepositions with narrow or specific meanings would have high likelihood in only small regions. For this reason, in order to maximize the chance of being correct, the semantic correctness criterion naturally prefers a more general statement by choosing a preposition with a larger support whenever such a statement is semantically applicable to (i.e., covers) the support of the given belief distribution. As an extreme illustrative example, the statement "The subject's state is in the state space" always yields the maximum correctness probability of one according to (10), and therefore would always be chosen, should we allow it among the expression choices. As a result, this criterion is suitable when the correctness probability of the generated expression is more critical to the application than the specificity of the expression.

In a common situation where a semantic likelihood $p(D \mid X, X^l, \Theta_{1,\ldots,N_D})$ is represented by an MMS model, and a belief $p(X \mid I, \Theta_{1,\ldots,N_D})$ is represented by a Gaussian mixture, the integral in (10) can be efficiently calculated by performing variational Bayesian importance sampling (VBIS) [17], [53]. The optimal statement can then be generated with a preposition determined based on the following objective:

$$D^* = \arg\max_D\; p(D \mid I, X^l, \Theta_{1,\ldots,N_D}). \quad (11)$$

Maximizing the objective in (11) is equivalent to maximizing the semantic correctness probability of the generated sentence when describing the hidden state, given the information gathered so far.

An alternative approach to belief expression formulation is to consider the communication problem from an information-theoretic perspective. As discussed in the previous section, a belief maintained by a robot can be described as $b(x) \equiv p(X = x \mid I, \Theta_{1,\ldots,N_D})$, where $I = \{\zeta^r_{1:i}, \zeta^h_{1:i}\}$ is the supporting information gathered so far by the robot. Since $I$ contains all information generating the belief, one way to fully preserve all information content in a belief is by communicating all original messages $\zeta^r_{1:i}, \zeta^h_{1:i}$, or by compressing $\zeta^r_{1:i}, \zeta^h_{1:i}$ into one summary expression. However, general belief expression generation problems entail another complication: the full message history $\{\zeta^r_{1:i}, \zeta^h_{1:i}\}$ is usually discarded after belief update steps. As a result, the actual information content underlying a belief pdf is typically hidden to an expression generation algorithm. An expression generation problem can therefore be regarded as a problem of recovering a hidden $I$ from a fused distribution $b(x)$.

To find an output expression that best summarizes the information content $I$ underlying a belief, an information loss is evaluated between the original belief $b(x) \equiv p(X = x \mid I, \Theta_{1,\ldots,N_D})$ and a representative distribution constructed from the recovered information $\hat{I}$:

$$r(x) \equiv p(X = x \mid \hat{I}, \Theta_{1,\ldots,N_D}). \quad (12)$$

Given a reference state of interest $X^l$, a minimum information loss expression $\zeta(D^*, X^l)$ can be defined by minimizing the Kullback–Leibler divergence (KLD) between the representative distribution $r(x)$ and the original distribution $b(x)$ as follows:

$$D^* = \arg\min_D\; \mathrm{KL}(b(x) \,\|\, r(x)) \quad (13)$$

where

$$\mathrm{KL}(b(x) \,\|\, r(x)) \equiv \int_{\mathcal{X}} \ln \frac{b(x)}{r(x)}\, b(x)\, dx. \quad (14)$$
Since the KLD can be considered a dissimilarity measure between two pdfs, the information preservation objective intends to preserve the original shape (both the peaks and the valleys) of the distribution $b(x)$. That is, the reconstructed distribution must preserve both the positive information of where the hidden state $X$ would likely be, as well as the negative information of where it likely would not be, in order to minimize the KLD. As a result, the information preservation criterion is more advantageous in applications where the semantic correctness probability of the generated expression should be balanced with the expression's specificity.

It can be shown that minimizing the KLD in (14) is equivalent to maximizing the expectation of $\ln r(x)$ over $x \sim b(x)$:

$$D^* = \arg\max_D\; \mathbb{E}_{b(x)}[\ln r(x)]. \quad (15)$$

This expectation can be approximated by performing sampling on $b(x)$ for $\{x_n;\ n = 1, \ldots, N\}$. That is, the optimal statement can be determined based on the following maximum-likelihood objective:

$$D^* = \arg\max_D\; \sum_{n=1}^{N} \ln r(x_n) \quad (17)$$

where

$$x_n \sim p(X = x_n \mid I, \Theta_{1,\ldots,N_D}). \quad (18)$$

This maximum-likelihood problem can be solved by many existing algorithms, some of which are applied in Section VII, coupled with the formulation of related expression generation subproblems.

[...] Additionally, in the previous section, the reference state of interest $X^l$ is treated as a fixed parameter in the communication process. In this section, the optimal reference state parameters of all statements in the composition are simultaneously estimated. Finally, a technique for automatically determining the optimal number of statements [...]

[...] more than one statement. Belief distributions with multiple-hypothesis content can be formally represented by a mixture model [54], as discussed next.

First, assuming that a simple belief $b(x)$ consists of only one underlying piece of information, which can be summarized by a single statement $I \approx \{\zeta(D, X^l)\}$, then $b(x) \equiv p(X = x \mid I, \Theta_{1,\ldots,N_D})$ can be represented by

$$r(x) = p(X = x \mid \zeta(D, X^l), \Theta_{1,\ldots,N_D}). \quad (19)$$

In the case where the reference state parameter $X^l$ is fixed, the language generation problem reduces to the problem defined by (17); note that $\zeta(\cdot)$ is a deterministic function.

The language generation problem can be extended to allow a description of complex belief distributions by breaking down the expression into a composition of $K$ single-statement hypotheses via the conjunction "or" [...] where each hypothesis $k$ is described by a single statement $\zeta(D_k, X^l_k)$ as defined in previous sections.

In this case, the representative belief $r(x)$ reconstructed from the generated expression becomes [...]. The probability of each hypothesis $Z = k \in \{1, \ldots, K\}$ being responsible for the belief of $x$ can also be explicitly represented and then marginalized out as follows:

$$r(x) = p(X = x \mid \zeta_{1,\ldots,K}, \Theta_{1,\ldots,N_D}) = \sum_{k=1}^{K} p(X = x, Z = k \mid \zeta_{1,\ldots,K}, \Theta_{1,\ldots,N_D}) \;[...]$$

[...] model of hypotheses $k = 1, \ldots, K$. Therefore, an expression generated via statement compositions using "or" conjunctions is referred to as a Mixture of Statements. Each hypothesis $k$'s weight $\pi_k$ and likelihood $P_k$ are defined as follows: [...]
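Under the information-preservation criterion, (17)–(18) reduce statement selection to comparing average log-densities of candidate representative distributions over belief samples. A minimal sketch, with two hypothetical representative densities $r(x)$: a tight Gaussian standing in for a specific statement ("next to 0") and a broad uniform standing in for a vague one ("somewhere in [-10, 10]"):

```python
import numpy as np

rng = np.random.default_rng(1)

def best_statement(x_n, r_models):
    """Objective (17): choose the statement whose representative
    distribution r(x) maximizes sum_n ln r(x_n) over samples
    x_n ~ b(x), i.e., minimizes the sampled KLD of (13)-(14)."""
    scores = [np.sum(np.log(r(x_n) + 1e-300)) for r in r_models]
    return int(np.argmax(scores))

def r_tight(x):
    # density implied by the specific statement (illustrative)
    return np.exp(-0.5 * (x / 0.7) ** 2) / (0.7 * np.sqrt(2.0 * np.pi))

def r_broad(x):
    # density implied by the vague statement (illustrative)
    return np.where(np.abs(x) <= 10.0, 1.0 / 20.0, 0.0)

x_n = rng.normal(0.0, 0.7, size=5000)    # samples from the belief b(x)
d_star = best_statement(x_n, [r_tight, r_broad])
```

Note the contrast with the pure semantic correctness criterion: the vague statement no longer dominates, because the tight density matches the belief's shape and achieves a higher average log-density despite its smaller support.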
Assuming a uniform prior on $X$ when updating $r(x)$ with hypothesis $k$'s information, the term $p(X = x \mid \Theta_k)$ does not depend on $x$:

$$P_k(x) = \frac{p(D_k \mid X = x, \Theta_k)\, p(X = x \mid \Theta_k)}{\int_{\mathcal{X}} p(X = x \mid \Theta_k)\, p(D_k \mid X = x, \Theta_k)\, dx} = \frac{p(D_k \mid X = x, \Theta_k)}{\int_{\mathcal{X}} p(D_k \mid X = x, \Theta_k)\, dx}.$$

Determining the number of statements needed to represent a particular input belief can be considered a part of the model selection problem [55]. Existing model selection methods, such as cross validation [56] or information-based criteria (e.g., the Akaike information criterion [57]), can be used to determine the number of statements such that model fit and complexity are balanced. Cross validation [...]
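The normalized hypothesis likelihood $P_k(x)$ above can be evaluated numerically on a discretized state space; here the semantic likelihood of the statement's preposition is a hypothetical Gaussian-shaped model, and the uniform prior on $X$ cancels as in the derivation.

```python
import numpy as np

def hypothesis_likelihood(sem_lik, grid, dx):
    """P_k(x) on a 1-D grid: the preposition's semantic likelihood
    p(D_k | X = x, Theta_k) normalized by its integral over the
    state space (the uniform prior p(X = x | Theta_k) cancels)."""
    lik = sem_lik(grid)
    return lik / (lik.sum() * dx)   # divide by the grid approximation of the integral

grid = np.linspace(-5.0, 5.0, 1001)
dx = grid[1] - grid[0]
sem_near = lambda x: np.exp(-0.5 * x ** 2)   # illustrative semantic model
P_k = hypothesis_likelihood(sem_near, grid, dx)
```

The result is a proper density over the grid: it integrates to one and peaks where the preposition's semantics are strongest.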
[...] by the observed samples $X_{1,\ldots,N}$, as well as the DP prior. The nonparametric Bayesian DP MoS model is shown in Fig. 8(b), which can be described as follows.

First, the prior distribution of each statement $\zeta_n$ is represented by a random distribution $G$, where $G$ is DP distributed with a base $G_0$ and a concentration $\alpha$:

$$\zeta_n \mid G \sim G, \qquad G \sim \mathrm{DP}(G_0, \alpha). \quad (28)$$

The base distribution $G_0$ is the prior distribution over each statement $\zeta_n$'s parameters, without enforcing any clustering effect. For example, consider the $G_0$ highlighted in red on the left-hand side of Fig. 9. This $G_0$ provides a Gaussian prior over the reference state $X^l_n$ parameter of each statement $\zeta_n$. The distribution $G$ generated according to (28) with this base $G_0$ is [...] the six "clusters" of discrete probability masses shown in black. The number of these clustered $X^l_n$ prior probability masses in $G$ is guided by the DP's concentration parameter $\alpha$, while the locations of the $X^l_n$ prior probability masses are guided by the base distribution $G_0$. Further information on DP mixture models can be found in [58] and [59].

Finally, according to Fig. 8(b), given the DP prior in (28) and the hypothesis statement likelihood, the MoS parameters can be generated from the DP MoS posterior distribution. This is performed [...]

The MoS sample that yields the highest log-likelihood objective (25) is finally chosen as the output belief expression. Variants of the base distributions $G_0$ used for DP MoS are presented next.

Fig. 9. Prior models for DP MoS generation, where mixture component parameters could be sampled from multiple separate manifolds.

[...] represent a prior distribution over mixture parameters, conditioned on a mixture type $D$. A random distribution $G$ resulting from $\mathrm{DP}(G_0, \alpha)$ with an augmented base $G_0$ is illustrated in Fig. 9. In this figure, each distribution $G_0^d$ denotes a base distribution over each statement's parameter $X^l$, given a statement type $D = d$. In general, each conditional base $G_0^d$ can be defined in a completely different parameter space for each statement type.

From the example in Fig. 9, marginalizing the base distribution along the dimension of the reference $X^l$, it can be seen that the base distribution over each type of preposition $D \in \{1, \ldots, N_D\}$ is uniform. That is, there is an equal chance of generating a statement with any of the prepositions $D$ in the dictionary. Humans' natural preferences over preposition usages can be further incorporated into the prior model. For example, humans may prefer more informative prepositions (e.g., "nearby") over those that are less informative (e.g., "far from"). In order to simulate this behavior, the base $G_0$ can be defined such that the marginal weight for each preposition choice is inversely proportional to the entropy of each preposition's semantic [...] where

$$f(x) = p(X = x \mid \zeta(D = d, X^l = \mathbf{0}), \Theta_{1,\ldots,N_D}). \quad (31)$$

The same strategy can be applied to simulate other types of language generation behaviors, for example, by setting the $G_0$ distribution over $X^l$ according to humans' preferences over reference choices, such as map region preferences, landmark saliencies, etc.
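The clustering behavior induced by the DP prior (28) can be illustrated with a Chinese-restaurant-process draw: each new statement parameter either reuses an existing cluster or is drawn fresh from the base $G_0$. This is a prior sample only, not the paper's posterior Gibbs sampler, and the base used here (a uniform preposition index paired with a Gaussian reference state, echoing the augmented base of Fig. 9) is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_statements_crp(n, alpha, base_sampler):
    """Draw n statement parameters zeta_1..zeta_n from G ~ DP(G0, alpha),
    with G marginalized out, via the Chinese restaurant process."""
    clusters, counts, z = [], [], []
    for _ in range(n):
        # join cluster k with probability counts[k] / (i + alpha);
        # open a new cluster with probability alpha / (i + alpha)
        probs = np.array(counts + [alpha], dtype=float)
        k = int(rng.choice(len(probs), p=probs / probs.sum()))
        if k == len(clusters):
            clusters.append(base_sampler())   # fresh draw from base G0
            counts.append(1)
        else:
            counts[k] += 1
        z.append(k)
    return clusters, z

# Base G0: uniform preposition choice plus a Gaussian prior on the
# 2-D reference state X^l (illustrative values).
base = lambda: (int(rng.integers(0, 3)), rng.normal(0.0, 5.0, size=2))
clusters, z = sample_statements_crp(200, alpha=1.0, base_sampler=base)
```

A small $\alpha$ concentrates the 200 draws on a handful of distinct statement parameters, matching the clustered prior masses described above; a large $\alpha$ approaches independent draws from $G_0$.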
VIII. ILLUSTRATIVE EXAMPLE

This section revisits the example of the belief communication system introduced in Section III. In this example, a patrol robot gathers and shares information regarding the location of a crime suspect cooperatively with a human partner. In the first step, the robot uses its own vision and localization systems to check whether a person matching the suspect description can be detected anywhere in the map. The detection and nondetection measurements $\zeta^r_i$ are used to update the robot's belief via the Bayesian measurement update equation in (2), with the detection likelihood $p(\zeta^r_i \mid X_i)$ obtained from the vision system's calibration as [...]

[...] The MMS likelihood models $p(D_i = d \mid X_i = x, X^l_i = x^l)$ for a statement "The <subject> is <preposition> <reference>," where the prepositions $d$ are "next to," "nearby," "far from," and the reference $x^l$ is at $[0, 0]^T$, are shown in Fig. 2. The likelihood of the statement "The subject is 60% likely nearby Ives Hall, or 40% likely at Barnes Hall" is then obtained according to the learned MMS models as shown in Fig. 4. This likelihood is used in the belief update equation in (3) to obtain the final belief in Fig. 5. This updated belief $b(x; I)$ contains the information $I$ gathered by both the robot and human team members in the search for the suspect.

In order to communicate this belief back to the human, the robot determines the expression statements $\zeta_{1:K}$ that best summarize all the information $I$ underlying the belief pdf. The belief expression process is carried out by first generating sample states $X_{1:N}$ from the belief $b(x; I)$, as illustrated in Fig. 6. These states $X_{1:N}$ are passed to DP MoS Gibbs sampling to sample output English expressions according to the probabilistic graphical model shown in Fig. 8(b). The distributions defining this model are described in full in (28) and (29).

[...] two human trials, are presented and discussed. First, automatic evaluations were performed to compare the information loss incurred by different belief expression strategies. Second, task-based evaluations were conducted to compare human users' task performances given different types of belief expression sentences. Task performances were measured in terms of search-region F1 score. Finally, evaluations based on human ratings and judgments were used to compare each belief expression's correctness in describing a belief pdf. Evaluation data were collected via the Amazon Mechanical Turk online crowdsourcing platform.

[...] robot cooperative search scenarios. During each search mission, the information system gathers multiple pieces of information describing possible search subject locations, including robot measurement readings and human observation sentences. Based on all gathered pieces of information, the system generates a fused belief pdf summarizing the potential search subject locations over the search space map. The fused pdf is then communicated back to human and robot team members to inform them of the latest estimate of the search subject's location. In the case of human team members, structured English sentences are used to express the fused belief. The human searchers then rely on the fused belief information to make a decision on how they would conduct the search. In these scenarios, the search subject was a laptop thief suspect reported to the Cornell University Police, and the search region is the central Cornell University campus. The suspect descriptions are first provided by witnesses and disseminated to the patrol officers by the police information center. Subsequently, all sightings of a person matching the description are to be reported back to the information center to update the system on potential suspect locations. All reports $\zeta^h_1, \ldots, \zeta^h_{L_h}$ [...]
736 prior in (28), the base distribution G0 for this example is defined describing suspect sightings are input to the information fusion 788
737 as follows. First, the base distribution for each output reference system as described in (3), Section IV-A, with the structure “The 789
738 state parameter Xl n is uniform over the locations of all the land- suspect is <preposition> <reference>.” The preposition dictio- 790
739 marks on the map. Second, the base distribution for each output nary was learned from the human preposition dataset from [17]. 791
740 preposition Dn is uniform over all prepositions in the MMS In addition, all search agents share the common landmark dic- 792
741 dictionary. Finally, the likelihood Pn (Xn ) in (29) is given from tionary, defined by the campus map retrieved from Google Map 793
742 the same MMS likelihood dictionary used in belief update pro- shown in Fig. 10. Each report includes a probability indicating 794
743 cesses. The sentence generated by DP MoS in this case is “The how confident the human reporter is regarding the relevance of 795
744 subject is 61.79% likely nearby Ives Hall, or 27.08% likely at the information he or she provided, i.e., the probability that the 796
745 Barnes Hall, or 11.13% likely at Sage Chapel.” That is, the sys- suspect sighting information is a true detection and is not a false 797
746 tem extracts from the belief pdf a mixture model with three clus- alarm. The numbers of observation reports were sampled from 798
747 ters. Each of the clusters is described by the statement ζ1 (D1 = a Poisson distribution (λ = 5), with the maximum of nine and 799
748 “nearby”, Xl 1 = XIves Hall ), ζ2 (D2 = “at”, Xl 2 = XBarnes Hall ), minimum of two, while each input report’s preposition was sam- 800
749 and ζ3 (D3 = “at”, Xl 3 = XSage Chapel ), with the cluster proba- pled from a uniform distribution over the preposition dictionary. 801
750 bilities of 61.79%, 27.08%, and 11.13%, respectively. Each report’s reference was sampled from a uniform distribution 802
over the campus map in Fig. 10. Finally, each report’s relevance 803
probability was sampled from a uniform distribution over [0, 1]. 804
751 IX. EVALUATION RESULTS In addition to the sighting reports, the suspect location pdf is 805
752 In this section, the proposed belief communication frame- also updated by sensor measurement inputs ζ1r , ..., ζLr r generated 806
753 work is evaluated in the context of human–robot cooperative based on a 360◦ camera on a robot patrol in the search region. 807
754 search scenarios. Data from three sets of evaluations, including Each measurement consists of 808
TSE AND CAMPBELL: HUMAN–ROBOT COMMUNICATIONS OF PROBABILISTIC BELIEFS VIA A DP MIXTURE OF STATEMENTS 11
Fig. 10. Central Cornell campus map defining reference landmarks, such as "Day Hall," "Uris Garden," "Statler Hotel," etc. This landmark dictionary was shared among all agents during search missions. This image is retrieved from Google Map.
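The synthetic report-generation scheme described above (a Poisson-distributed report count truncated to [2, 9], with uniform prepositions, references, and relevance probabilities) can be sketched as follows. This is a minimal stdlib-only sketch; the preposition and landmark lists here are illustrative stand-ins, not the paper's actual dictionaries.

```python
import math
import random

# Illustrative stand-ins for the learned preposition dictionary and the
# campus landmark dictionary (assumptions for this sketch, not the paper's data).
PREPOSITIONS = ["next to", "nearby", "far from", "at"]
LANDMARKS = ["Ives Hall", "Barnes Hall", "Sage Chapel", "Day Hall", "Statler Hotel"]

def sample_poisson(lam, rng):
    """Knuth's inversion method for sampling a Poisson-distributed count."""
    limit, k, prod = math.exp(-lam), 0, 1.0
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k - 1

def sample_reports(lam=5, lo=2, hi=9, seed=None):
    """Sample one mission's worth of synthetic human sighting reports."""
    rng = random.Random(seed)
    n = min(max(sample_poisson(lam, rng), lo), hi)  # truncate count to [lo, hi]
    return [{
        "preposition": rng.choice(PREPOSITIONS),  # uniform over the dictionary
        "reference": rng.choice(LANDMARKS),       # uniform over map landmarks
        "relevance": rng.uniform(0.0, 1.0),       # true-detection probability
    } for _ in range(n)]
```

Each returned entry corresponds to one input sentence "The suspect is &lt;preposition&gt; &lt;reference&gt;" together with its reporter-supplied relevance probability.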
TABLE III
SUMMARY OF NORMALIZED JSD INFORMATION LOSS FOR EACH TYPE OF SENTENCES IN DESCRIBING BELIEF DISTRIBUTIONS
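As background for the numbers in Table III, a minimal sketch of a Jensen–Shannon divergence computation over discretized pdfs, assuming base-2 logarithms (which bound the measure in [0, 1]); the exact normalization used in [62] may differ in detail.

```python
import math

def normalized_jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discretized pdfs.

    With base-2 logarithms the divergence is bounded in [0, 1]:
    0 for identical distributions and 1 for disjoint supports.
    """
    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-mass bins.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > eps)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # midpoint mixture
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Identical pdfs incur zero loss; pdfs with disjoint support incur maximal loss.
assert abs(normalized_jsd([0.5, 0.5], [0.5, 0.5])) < 1e-9
assert abs(normalized_jsd([1.0, 0.0], [0.0, 1.0]) - 1.0) < 1e-9
```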
pdfs b(x; I(Θ̂)) given the generated expressions I(Θ̂), and the original belief pdfs b(x; I(Θ)) were compared among the five expression generation strategies.

Online human data collections were conducted based on two types of NLG system evaluations, as classified by Reiter and Belz [61]: task-based evaluations and evaluations based on human ratings and judgments.

Human Task-Based Evaluation: In Section IX-C, human task-based evaluations were conducted by asking humans to perform suspect search tasks given each type of generated expression I(Θ̂). Human subject task executions T(I(Θ̂)) were then compared in terms of task precision and recall performance, as measured by F1 scores.

Human Ratings and Judgments: In Section IX-D, human subject Likert score ratings were collected to assess the correctness of each generated expression I(Θ̂) in describing a given distribution b(x; I(Θ)). The human subject ratings were then compared among the five expression generation strategies.

B. Automatic Evaluation: Information Loss

In the first set of evaluations, belief expression system performances were evaluated automatically, without human subjects. The goal of this evaluation is to compare the information losses incurred by different belief expression strategies. Twenty sets of i.i.d. belief pdfs were generated by fusing suspect search input sentences and robot sensor measurements. The five sentence generation methods listed in Table II were used to generate five output expressions for each pdf. In order to measure the information losses between the estimated pdfs b(x; I(Θ̂)) and the input belief pdfs b(x; I(Θ)), a normalized Jensen–Shannon divergence (JSD) [62] criterion was used. The benefit of a normalized JSD as an information loss measure is that it is bounded within [0, 1]. As a benchmark, the Truth sentence results denote the performance of the belief expression sentences written by a human expert. The information loss results are summarized in Table III, and a box plot of the results is shown in Fig. 12.

When the pdf was reconstructed using an estimated sentence I(Θ̂), it was found that the median information loss was smallest when using MoS sentences (0.225) and largest when using Chance sentences (0.651). The median information loss from SS sentences (0.357) was found to be lower than that of the uninformative (Uninf) sentences (0.605) but higher than that of MoS sentences.

Fig. 12. Information loss box plot for the five belief expressions. The boxes show the 25th percentile, median, and 75th percentile, while the whiskers indicate the minimum, maximum, and the 1.5 interquartile range (IQR) outliers.

In terms of consistency, Chance sentences were found to be the least consistent, with the largest information loss IQR of 0.337. MoS and Truth sentences were found to be the most consistent, with the smallest information loss IQR among all (0.069). Finally, the SS sentences were found to produce a smaller information loss IQR (0.145) than Uninf sentences but a larger one than MoS sentences.

A nonparametric Wilcoxon test was used to test the significance of the differences between the information losses caused by each belief expression approach and the others. It was found that the qualities of Truth and MoS sentences were significantly better (p < 0.001) than all the other types of sentences but were not significantly different from each other (p > 0.05). The differences between the qualities of Uninf and Chance sentences with respect to SS's were also significant (p < 0.01 in both cases). These results indicate that the MoS sentences were better at estimating the underlying belief parameters and preserving the original belief information content than the other automated methods. Furthermore, the results illustrate the benefit of allowing a composition of multiple statements in describing a belief, in contrast to restricting an expression to only one statement.

An interesting observation is that while the human expert did not explicitly aim to optimize any particular objective function, as the automatic generation algorithms do, the expert still managed to achieve low information loss results that were not significantly different from those of MoS.

The sentences generated by the SS approach, while having a significantly higher information loss than MoS sentences, still performed significantly better than Chance and Uninf sentences. The relatively large SS and Uninf information loss IQRs indicate a high variation in the modality and entropy of the input pdfs used in the test cases. The fact that MoS sentences were able to achieve both the smallest information loss and the smallest information loss IQR indicates the reliability of the MoS approach in handling the variation in the input belief pdfs being described.

Fig. 13 shows the number of statements generated by MoS versus those written by the human expert. The radius of each point is proportional to the MoS expression information loss. It was
TABLE V
SUMMARY OF TASK F1 SCORE STATISTICS FOR EACH BELIEF EXPRESSION ALGORITHM
Fig. 13. K_MoS versus K_Truth. The radius of each point is proportional to the MoS expression information loss.
found that regardless of the number of suspect sighting reports, the number of statements generated is bounded at around six statements maximum. This is because the sum of all statement weights cannot exceed 100%. When the number of hypotheses grows, the weight of the least likely hypothesis becomes so small that the hypothesis is eliminated by both MoS and the human expert. The numbers of statements generated by MoS and the human expert were positively correlated and were all within two statements of each other. The greatest information loss occurred when only two hypotheses were generated, because with only two statements it was harder to accurately describe the shape of the belief distribution.

C. Human Task-Based Evaluation: Search Precision and Recall

In the task-based evaluation, human subjects were asked to perform suspect search tasks based on the generated sentences given to them. The proposed language generation system was evaluated according to how well humans performed the search tasks given the sentences.

TABLE IV
INSTRUCTION TO HUMAN SUBJECTS DURING TASK-BASED EVALUATIONS

The instruction given to the human subjects in the task-based evaluation is shown in Table IV. To perform the tasks, the Turkers were asked to search for the suspect on the map shown in Fig. 10 by inferring, based on the given sentence, the top 10% regions in the search space with the highest probability of finding the suspect. The search space was divided into small grid cells, and the human subjects were asked to identify high probability regions in the search space by painting over the gridded canvas. An example of a painting made by a Turker is shown in Fig. 14. The regions identified by the Turkers were then compared with the actual highest probability density regions of the underlying belief pdf. This was done by automatically choosing the top 10% of the cells on the canvas which had the highest probabilities in the pdf. Suspect search precision and recall were calculated between the Turker's painted regions and the actual pdf's 10% peak regions. The F1 score task performance measure was finally calculated by taking the harmonic mean of the precision and recall results.

In this evaluation set, two hundred F1 score data were collected for each of the five algorithms, giving a total of one thousand F1 score data. For each of the five evaluated algorithms, twenty trials formed by twenty sets of i.i.d. suspect search reports were sampled as described in Section IX-B. Each set of reports was fused as described in Section IV-A to generate one belief pdf. Five associated belief expression sentences were finally generated for each pdf. Twenty participants were asked to perform the tasks, where each person participated in ten trials. In each trial, each person was shown one belief expression sentence picked at random. Each participant produced one search region painting per sentence given, i.e., ten paintings in total. Suspect search based on the uninformative expression "Suspect is equally likely anywhere on the map" was performed automatically by randomly sampling selected search regions according to a uniform distribution over the search space. The human search F1 score statistics are summarized in Table V, while the box plot is shown in Fig. 15.

It was found that the task precision and recall performance as measured by median F1 score was highest in the case of MoS sentences (64.09%), followed by Truth sentences (63.15%). Additionally, it was found that, using only a single statement to describe a pdf, the SS strategy was able to achieve a median F1 score of 49.92%. A nonparametric Wilcoxon test was applied in order to test the significance of the differences in human task performances between each type of sentence and the Truth, MoS, and SS. Based on the test, the Truth sentence performances were found to be significantly better than those of the SS, Uninf, and Chance sentences (p < 0.001 in all three cases). Likewise, F1 scores based on MoS sentences were found to be significantly
Fig. 14. Painting interface used by Amazon Mechanical Turkers in task-based evaluations. An example painting made by a Turker is shown over the given search space map.
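The painting-based scoring procedure (the top 10% of pdf cells form the peak region; F1 is the harmonic mean of precision and recall against the painted cells) can be sketched as follows; the grid-cell representation here is an illustrative assumption.

```python
def search_f1(painted, pdf_cells, top_frac=0.10):
    """F1 score between a subject's painted cells and the pdf's peak cells.

    painted:   set of grid-cell indices the subject marked
    pdf_cells: dict mapping grid-cell index -> probability mass
    """
    k = max(1, int(top_frac * len(pdf_cells)))
    # The top 10% of cells by probability form the ground-truth peak region.
    peaks = set(sorted(pdf_cells, key=pdf_cells.get, reverse=True)[:k])
    hits = len(painted & peaks)
    if hits == 0:
        return 0.0
    precision = hits / len(painted)
    recall = hits / len(peaks)
    return 2 * precision * recall / (precision + recall)  # harmonic mean
```

For example, painting exactly the peak cells yields an F1 of 1.0, while painting the peaks plus an equal number of off-peak cells halves the precision but keeps full recall.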
better (p < 0.001) than those of SS, Uninf, and Chance, while the performances based on SS sentences were significantly better (p < 0.001) than those of Uninf and Chance. However, human task F1 score performances given the Truth sentences were not found to be significantly different from those given the MoS sentences (p > 0.05).

These statistical results demonstrate that the proposed MoS

selections that either completely hit or missed the actual pdf peaks. When bearing incorrect suspect information, which was often found to be the case, Chance region selection yielded a nearly zero F1 score. The Uninf region selection, on the other hand, was able to uniformly cover all regions of grid cells in the search space, giving the most consistent hit and miss rates, with a median F1 score of about 10%.

Fig. 16. Human subject rating interface, with an example belief pdf and the five corresponding belief expression sentences.

TABLE VI
SUMMARY OF HUMAN RATING STATISTICS FOR EACH TYPE OF SENTENCES IN DESCRIBING BELIEF DISTRIBUTIONS

of each output expression with respect to the belief pdf shown to them. The rating was performed on a five-point Likert scale, including a mid-point neutral option. The rating interface used by human subjects is shown in Fig. 16. Two hundred Likert score data were collected for each of the five algorithms, giving an evaluation set of one thousand human subject rating scores. For each evaluated algorithm, 20 trials were conducted based on the 20 sets of i.i.d. suspect search reports used in Section IX-C. Each set of reports was fused as described in Section IV-A to generate one belief pdf. Five associated belief expression sentences were finally generated for each pdf. Twenty human participants were asked to perform the task in ten trials each. In each trial, each person was shown one belief distribution picture, together with all five corresponding description sentences for them to rate. As such, each human provided ratings for 50 sentences.

The Likert score statistics from the five belief expression sets are summarized in Table VI. The rating box plot for the five types of expressions is also shown in Fig. 17. It was found that the Truth and MoS sentences received the highest median correctness scores among the five types of sentences tested, with median ratings of four, followed by the SS sentences with a median rating of three. Chance and Uninf sentences were found to have the lowest median ratings of one. A nonparametric Wilcoxon test was used to test the significance of the difference between each type of output sentence rating and the Truth's. Similarly, the significance of the differences between the MoS and SS ratings and all the others was also tested. Based on the Wilcoxon test, the Truth and MoS ratings were both found to be significantly better than the ratings of SS, Chance, and Uninf sentences (p < 0.001). Likewise, the SS rating was also found to be significantly better than the Uninf and Chance ratings (p < 0.001). The Truth and MoS ratings were not found to be significantly different (p > 0.05).
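The nonparametric Wilcoxon rank-sum comparisons reported in these evaluations can be approximated with a stdlib-only normal-approximation implementation such as the sketch below (in practice a statistics package would be used); the tie correction to the variance is omitted for brevity.

```python
import math
from statistics import NormalDist

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (z, p). Tied values receive average ranks; the tie correction
    to the variance is omitted in this sketch.
    """
    combined = sorted((v, i < len(x)) for i, v in enumerate(list(x) + list(y)))
    values = [v for v, _ in combined]
    rank_of = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        avg = (i + 1 + j) / 2  # average of ranks i+1 .. j for this tie group
        for k in range(i, j):
            rank_of[k] = avg
        i = j
    # Rank sum of the first sample.
    w = sum(r for r, (_, from_x) in zip(rank_of, combined) if from_x)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p
```

Two clearly separated samples yield a small p-value, while identical samples yield z = 0 and p = 1.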
Fig. 17. Box plot for the five belief expression ratings. The boxes show the 25th percentile, median, and 75th percentile, while the whiskers indicate the minimum, maximum, and the 1.5 IQR outliers.

These statistical results indicate that the proposed DP MoS algorithm was successful in expressing the belief pdfs by re-

into subregions such that the space size is constant. For example, the suspect location state space can be divided such that the hypotheses are calculated independently for the belief descriptions over the central campus, collegetown, downtown, east hill, west hill, etc. This division of the state space helps reduce the computation complexities for both belief updates and expressions.

suggest that the proposed method for generating belief expressions is an effective approach for communicating probabilistic information between robots and humans.

APPENDIX

The EM solution to maximizing the MoS generation objective in (25) is performed as follows.

E-step: Evaluate the conditional probabilities of all hidden variables Z_{1,...,N}, given the current parameter estimates \Theta^{(t-1)} \equiv \{\Theta_k^{(t-1)}, \pi_k^{(t-1)}\}_{k=1,\dots,K}. That is, for each pair of sample point X_n and hypothesis k, evaluate the (soft) membership assignment

\gamma_{nk}^{(t)} \equiv p(Z_n = k \mid X_n = x_n, \Theta^{(t-1)}), \qquad \sum_{k=1}^{K} \gamma_{nk}^{(t)} = 1. \quad (32)

Furthermore, it is noticed from (36) that the parameter \Theta_k^{(t)} describing each hypothesis k can be solved for independently of all the other hypotheses k' \neq k. For the case of an MMS hypothesis with \Theta_k^{(t)} = x_{l,k}^{(t)}, the M-step can be performed with a grid search for the optimal reference location x_{l,k}^{(t)} of each hypothesis k, with all the other MMS shape parameters fixed, as learned from the human preposition dataset:

x_{l,k}^{(t)} = \arg\max_{x_{l,k}} \sum_{n} \gamma_{nk}^{(t)} \ln P_k(x_n). \quad (38)

In each EM iteration step t, the likelihood of the samples is guaranteed to increase unless a local optimum has been reached: p(S \mid \Theta^{(t)}) \geq p(S \mid \Theta^{(t-1)}) [54].
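The E-step in (32) and the grid-search M-step in (38) can be illustrated with a one-dimensional toy implementation. This is a sketch only: a fixed-shape Gaussian stands in for the learned MMS likelihood, and a small landmark list plays the role of the reference dictionary.

```python
import math

def em_mos(samples, landmarks, K=2, sigma=1.0, iters=50):
    """EM sketch for the MoS objective: E-step responsibilities as in (32),
    grid-search M-step over candidate reference locations as in (38).

    1-D toy version: each statement's likelihood P_k is a fixed-shape
    Gaussian centered on a landmark (a stand-in for the MMS shape model).
    """
    def lik(x, c):
        return math.exp(-0.5 * ((x - c) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    centers = landmarks[:K]            # initial reference locations
    weights = [1.0 / K] * K            # mixture weights pi_k
    for _ in range(iters):
        # E-step: soft memberships gamma_nk, normalized over hypotheses k.
        gammas = []
        for x in samples:
            joint = [w * lik(x, c) for w, c in zip(weights, centers)]
            s = sum(joint) or 1e-300
            gammas.append([j / s for j in joint])
        # M-step: grid search each reference location over the landmark
        # dictionary, holding the shape parameter sigma fixed.
        for k in range(K):
            centers[k] = max(
                landmarks,
                key=lambda c: sum(g[k] * math.log(lik(x, c) + 1e-300)
                                  for x, g in zip(samples, gammas)),
            )
        weights = [sum(g[k] for g in gammas) / len(samples) for k in range(K)]
    return centers, weights
```

On samples drawn around two of the landmarks, the grid search snaps each reference location onto the nearest landmark, and the weights recover the cluster proportions.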
[18] J. Frost, "Mapping spatial language to sensor models," in Proc. Comput. Lab. Student Conf., 2009, p. 8.
[19] J. Frost, A. Harrison, S. Pulman, and P. Newman, "A probabilistic approach to modelling spatial language with its application to sensor models," in Proc. Workshop Comput. Models Spatial Lang. Interpretation Spatial Cogn., 2010, pp. 1–8.
[20] A. N. Bishop and B. Ristic, "Fusion of spatially referring natural language statements with random set theoretic likelihoods," IEEE Trans. Aerosp. Electron. Syst., vol. 49, no. 2, pp. 932–944, Apr. 2013.
[21] A. N. Bishop and B. Ristic, "Fusion of natural language propositions: Bayesian random set framework," in Proc. 14th Int. Conf. Inf. Fusion, 2011, pp. 1–8.
[22] A. N. Bishop and B. Ristic, "Spatially referring natural language propositions: Information fusion and estimation theory," in Proc. Workshop Defense Appl. Signal Process. (DASP), 2011, pp. 1–11.
[23] C. Matuszek, L. Bo, L. Zettlemoyer, and D. Fox, "Learning from unscripted deictic gesture and language for human-robot interactions," in Proc. 28th AAAI Conf. Artif. Intell., 2014, pp. 2556–2563.
[24] C. Matuszek, N. Fitzgerald, L. Zettlemoyer, L. Bo, and D. Fox, "A joint model of language and perception for grounded attribute learning," in Proc. 29th Int. Conf. Mach. Learn., 2012, pp. 1671–1678.
[25] S. Tellex et al., "Understanding natural language commands for robotic navigation and mobile manipulation," in Proc. 25th AAAI Conf. Artif. Intell., 2011, pp. 1507–1514.
[26] S. Tellex, P. Thaker, J. Joseph, M. R. Walter, and N. Roy, "Toward learning perceptually grounded word meanings from unaligned parallel data," in Proc. 2nd Workshop Semantic Interpretation Actionable Context, 2012, pp. 7–14.
[27] S. Tellex, P. Thaker, J. Joseph, and N. Roy, "Learning perceptually grounded word meanings from unaligned parallel data," Mach. Learn., vol. 94, no. 2, pp. 151–167, 2014.
[28] C. Matuszek, E. Herbst, L. Zettlemoyer, and D. Fox, "Learning to parse natural language commands to a robot control system," in Experimental Robotics. New York, NY, USA: Springer, 2013, pp. 403–415.
[29] C. Matuszek, D. Fox, and K. Koscher, "Following directions using statistical machine translation," in Proc. 5th ACM/IEEE Int. Conf. Hum.-Robot Interact., 2010, pp. 251–258.
[30] E. Krahmer and K. Van Deemter, "Computational generation of referring expressions: A survey," Comput. Linguistics, vol. 38, no. 1, pp. 173–218, 2012.
[31] J. Kelleher and G. M. Kruijff, "A context-dependent model of proximity in physically situated environments," in Proc. ACL-SIGSEM Workshop Linguistic Dimensions Prepositions Their Use Comput. Linguistics Formalisms Appl., 2005, pp. 1–8.
[32] R. Fang, M. Doering, and J. Y. Chai, "Collaborative models for referring expression generation in situated dialogue," in Proc. 28th AAAI Conf. Artif. Intell., 2014, pp. 1544–1550.
[33] C. Liu, R. Fang, L. She, and J. Y. Chai, "Modeling collaborative referring for situated referential grounding," in Proc. ACL SIGDIAL, 2013, pp. 78–86.
[34] A. Sadovnik, A. Gallagher, and T. Chen, "Not everybody's special: Using neighbors in referring expressions with uncertain attributes," in Proc. 2013 IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, 2013, pp. 269–276.
[35] S. Kazemzadeh, V. Ordonez, M. Matten, and T. L. Berg, "Referitgame: Referring to objects in photographs of natural scenes," in Proc. ACL Conf. Empirical Methods Natural Language Process., 2014, pp. 787–798.
[36] A. Sadovnik, Y.-I. Chiu, N. Snavely, S. Edelman, and T. Chen, "Image description with a goal: Building efficient discriminating expressions for images," in Proc. 2012 IEEE Conf. Comput. Vis. Pattern Recognit., 2012, pp. 2791–2798.
[37] E. Reiter and R. Dale, "A fast algorithm for the generation of referring expressions," in Proc. 14th Conf. Comput. Linguistics, Volume 1, 1992, pp. 232–238.
[38] A. Stent and S. Bangalore, Natural Language Generation in Interactive Systems. Cambridge, U.K.: Cambridge Univ. Press, 2014.
[39] J. Viethen and R. Dale, "The use of spatial relations in referring expression generation," in Proc. 5th Int. Nat. Lang. Gener. Conf., 2008, pp. 59–67.
[40] R. Dale and J. Viethen, "Referring expression generation through attribute-based heuristics," in Proc. 12th Eur. Workshop Natural Lang. Gener., 2009, pp. 58–65.
[41] V. Mast, D. C. Vale, and Z. Falomir, "Enabling grounding dialogues through probabilistic reference handling," in Proc. RefNet Workshop Psychol. Comput. Models Reference Comprehension Prod., Edinburgh, U.K., Aug. 2014, pp. 1–3.
[42] V. Mast and D. Wolter, "A probabilistic framework for object descriptions in indoor route instructions," in Proc. Spatial Inf. Theory, 2013, pp. 185–204.
[43] N. L. Green, "Analysis of communication of uncertainty in genetic counseling patient letters for design of a natural language generation system," Social Semiotics, vol. 20, no. 1, pp. 77–86, 2010.
[44] C. Huang, "Risk analysis with information described in natural language," in Proc. 7th Int. Conf. Comput. Sci. III, 2007, pp. 1016–1023.
[45] C. R. Fox and J. R. Irwin, "The role of context in the communication of uncertain beliefs," Basic Appl. Social Psychol., vol. 20, no. 1, pp. 57–70, 1998.
[46] R. Tse, G. Seet, and S. K. Sim, "Recognition of human intentions using Bayesian artificial intelligence," in Proc. ASME IMECE, 2007, pp. 699–707.
[47] K. A. Tahboub, "Intelligent human-machine interaction based on dynamic Bayesian networks probabilistic intention recognition," J. Intell. Robot. Syst., vol. 45, no. 1, pp. 31–52, 2006.
[48] R. A. Knepper, S. Tellex, A. Li, N. Roy, and D. Rus, "Recovering from failure by asking for help," Auton. Robots, vol. 39, no. 3, pp. 347–362, 2015.
[49] S. Tellex, R. A. Knepper, A. Li, D. Rus, and N. Roy, "Asking for help using inverse semantics," in Proc. Robot. Sci. Syst. Conf., vol. 7, Berkeley, CA, USA, 2014, p. 24.
[50] R. Tse and M. Campbell, "Human-robot information sharing with structured language generation from probabilistic beliefs," in Proc. 2015 IEEE/RSJ Int. Conf. Intell. Robots Syst., 2015, pp. 1242–1248.
[51] N. Ahmed and M. Campbell, "On estimating simple probabilistic discriminative subclass models," Expert Syst. Appl., vol. 39, pp. 6659–6664, 2012.
[52] N. Ahmed and M. Campbell, "Multimodal operator decision models," in Proc. Amer. Control Conf., 2008, pp. 4504–4509.
[53] N. R. Ahmed, R. Tse, and M. Campbell, "Enabling robust human-robot cooperation through flexible fully Bayesian shared sensing," in Proc. AAAI Spring Symp. Series, Int. Robust Intell. Trust Auton. Syst., Stanford, CA, USA, 2014, pp. 2–10.
[54] C. M. Bishop, Pattern Recognition and Machine Learning. New York, NY, USA: Springer, 2006.
[55] A. Corduneanu and C. M. Bishop, "Variational Bayesian model selection for mixture distributions," in Proc. Artif. Intell. Statist., 2001, pp. 27–34.
[56] P. Smyth, "Model selection for probabilistic clustering using cross-validated likelihood," Statist. Comput., vol. 10, no. 1, pp. 63–72, 2000.
[57] X. Hu and L. Xu, "Investigation on several model selection criteria for determining the number of cluster," Neural Inf. Process. Lett. Rev., vol. 4, no. 1, pp. 1–10, 2004.
[58] D. M. Blei and M. I. Jordan, "Variational inference for Dirichlet process mixtures," Bayesian Analysis, vol. 1, no. 1, pp. 121–143, 2006.
[59] Y. W. Teh, "Dirichlet process," in Encyclopedia of Machine Learning. New York, NY, USA: Springer, 2011, pp. 280–287.
[60] R. M. Neal, "Markov chain sampling methods for Dirichlet process mixture models," J. Comput. Graphical Statist., vol. 9, no. 2, pp. 249–265, 2000.
[61] E. Reiter and A. Belz, "An investigation into the validity of some metrics for automatically evaluating natural language generation systems," Comput. Linguistics, vol. 35, no. 4, pp. 529–558, 2009.
[62] J. Lin, "Divergence measures based on the Shannon entropy," IEEE Trans. Inf. Theory, vol. 37, no. 1, pp. 145–151, Jan. 1991.
[63] D. Klein and C. D. Manning, "Fast exact inference with a factored model for natural language parsing," in Adv. Neural Inf. Process. Syst., 2003, pp. 3–10.
[64] K. Toutanova, D. Klein, C. D. Manning, and Y. Singer, "Feature-rich part-of-speech tagging with a cyclic dependency network," in Proc. 2003 Conf. North Amer. Chapter Assoc. Comput. Linguistics Hum. Lang. Technol., 2003, pp. 173–180.
[65] V. Demberg, J. Hoffmann, D. M. Howcroft, D. Klakow, and A. Torralba, "Search challenges in natural language generation with complex optimization objectives," KI-Künstliche Intelligenz, vol. 30, no. 1, pp. 63–69, 2016.
Rina Tse (M'16) received the B.Eng. degree in mechanical engineering with mechatronics specialization and the M.Eng. degree in computer engineering from Nanyang Technological University, Singapore, and the M.S. and Ph.D. degrees in mechanical engineering from Cornell University, Ithaca, NY, USA.
She is currently a Postdoctoral Research Associate with the Autonomous Systems Lab, Cornell University. Her research interests include Bayesian machine learning, clustering, data fusion, and human-robot interaction.
Dr. Tse was the recipient of the Singapore Government Scholarship, and Cornell University's Olin and Walter Schonlenk Ph.D. Fellowships.

Mark Campbell (F'18) received the B.S. degree in mechanical engineering from Carnegie Mellon University, Pittsburgh, PA, USA, and the M.S. and Ph.D. degrees in control and estimation from the Massachusetts Institute of Technology, Cambridge, MA, USA, in 1993 and 1996, respectively.
He is currently the John A. Mellowes '60 Professor and the S. C. Thomas Sze Director of the Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, NY, USA. In 2005–2006, he was a Visiting Scientist with the Insitu Group and an ARC International Fellow with the Australian Centre of Field Robotics. His research interests are in the areas of autonomous systems.
Dr. Campbell was the recipient of best paper awards from the AIAA Propulsion and GNC Conferences and the Frontiers in Education Conference, and teaching awards from Cornell, the University of Washington, and ASEE. He is an Associate Fellow of the AIAA.