Abstract—This paper presents a natural framework for information sharing in cooperative tasks involving humans and robots. In this framework, all information gathered over time by a human–robot team is exchanged and summarized in the form of a fused probability density function (pdf). An approach for an intelligent system to describe its belief pdfs in English expressions is presented. This belief expression generation is achieved through two goodness measures: semantic correctness and information preservation. In order to describe complex, multimodal belief pdfs, a Mixture of Statements (MoS) model is proposed such that optimal expressions can be generated through compositions of multiple statements. The model is further extended to a nonparametric Dirichlet process MoS generation, such that the optimal number of statements required for describing a given pdf is automatically determined. Results based on information loss, human collaborative task performances, and correctness rating scores suggest that the proposed method for generating belief expressions is an effective approach for communicating probabilistic information between robots and humans.

Index Terms—Dirichlet process (DP) mixtures, human–robot communications, Mixture of Statements (MoS), natural language processing (NLP).

Manuscript received September 9, 2017; revised February 7, 2018; accepted April 6, 2018. This paper was recommended for publication by Associate Editor J. Piater and Editor A. Billard upon evaluation of the reviewers' comments. This work was supported by the National Science Foundation (NSF IIS-1427030). (Corresponding author: Rina Tse.)
The authors are with the Autonomous Systems Laboratory, 550/542 Upson Hall, Cornell University, Ithaca, NY 14853 USA (e-mail: rt297@cornell.edu; mc288@cornell.edu).
Digital Object Identifier 10.1109/TRO.2018.2830360

I. INTRODUCTION

RESEARCH work has shown that a robot's belief in an environment with uncertainties can be effectively represented as a probability density function (pdf) [1]. Through a Bayesian approach, a belief represented as a pdf can be used to summarize all information gathered by a robot. Previous work in many areas of robotics, such as localization [2], target tracking [3], sensor fusion [4], [5], and probabilistic reasoning [6], has been done to allow a robot to estimate or predict the state of a subject of interest based on information from various sources in a probabilistic manner.

Probabilistic information gathered and represented in the form of a pdf plays an important role for robots in performing fundamental tasks, such as trajectory planning, prediction, and decision making. Furthermore, the information gathered by robots is often found valuable to humans for their planning and decision making. As a result, communication between humans and robots regarding their probabilistic beliefs has become essential, not only for their successful interactions (by making it more transparent what one another is thinking), but also for their successful task collaborations (by sharing the same information for planning and decision making). In essence, probabilistic information exchange is important when humans and robots are working collaboratively on the same task, when robots are working for humans, when humans and robots need to operate on the same objects, or simply when they coexist in the same environment.

For example, in disaster management and emergency response, human and mobile robot teams are commonly deployed in search missions to rescue earthquake victims trapped inside a collapsed building, or to rescue lost hikers in a forest [7], [8]. [...] individual paths such that the team has the highest probability of reaching the victim [9]. In addition, the human rescue team manager needs to plan the team's resource allocations, such as food or medicine. In order for both humans and robots to perform planning and decision making in these missions, probabilistic information about the searched victim locations is required [8]. This information must be gathered by the heterogeneous team members, and then shared among all of them. A similar application of probabilistic information sharing between humans and robots is in the context of security, where a team of human and robot patrol units is deployed to search for a criminal suspect or a hidden bomb [10], [11]. To accomplish the mission, the probabilistic information regarding the suspect or terrorist activity locations must be collaboratively gathered and shared among all human and robot agents so that they can plan their search trajectories and make high-level strategic decisions accordingly [12], [13].

Furthermore, in the contexts of service robot or personal robot applications, a robot's job is to assist humans in their daily tasks. A great deal of research has been done in order for humans and robots to naturally operate in the same environment, such as in people's households or offices, or in public areas such as malls, airports, or museums. In these scenarios, humans and robots inevitably need to communicate probabilistic information with one another regarding their shared objects and environment [14]. For example, a human might ask a home assistant robot to fetch a car key and inform the robot that the key is probably on the dining table or on the bedside table. The robot, checking both locations and not detecting any keys, then performs an [...]
1552-3098 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
2 IEEE TRANSACTIONS ON ROBOTICS
Fig. 2. Learned MMS likelihood models $p(D_i = d \mid X_i = x, X^l_i = x^l)$ for human observations "The <subject> is <preposition> <reference>," where the prepositions $d$ are "next to," "nearby," "far from," and the reference $x^l$ is at $[0, 0]^T$. (a) Next to. (b) Nearby. (c) Far from.
[...] frameworks addresses the cooperative perception problem, where information gathering and belief updates are the primary communication objective; neither do they consider belief expression problems.

Related work on robot-to-human communications for the task of describing the state of the subject of interest in step (4) includes referring expression generation (REG) [30], [31]. REG problems can be found in the contexts of, for instance, referring to an object in situated dialogues [32], [33], referring to a person or an object in a given image [34], [35], or referring to a specific image within a set of images [36]. In typical REG work [37], the communication goal is to identify an object, i.e., the referent, in a given scene by describing the referent's properties, such as its size, shape, color, or its spatial relation to another object [38]–[40], and, at the same time, distinguishing the referent from the other objects (i.e., distractors) in the environment. These expression generations are typically conditioned on deterministic variables representing deterministic knowledge of the referent and the scene based on a complete observation of an image, e.g., each object's location or RGB/intensity. Therefore, the expressions generated are not based on a distribution over hidden referent states produced from information fusion or estimation algorithms. While some previous REG work incorporates uncertainties into several parts of its models, the stochastic elements are typically contained within attribute description grounding models or target selection probability [35], [41], [42]. By the standard definition of REG problems, the referent is still assumed to be a fully observed object with deterministic state attributes. In contrast, in this paper, the goal is to describe a probability density rather than a deterministic quantity. As a result, the belief expression generation problem defined in this paper entails a different set of generation requirements than in REG problems. For example, the sentence generated must faithfully describe the dispersedness/peakedness of the input density and avoid sounding more/less certain regarding the hidden subject of interest than the input belief actually is. Additionally, the correctness of the generated expression must be evaluated by taking into account the fact that the actual state attribute is hidden, and can take any value under the input probability distribution.

Previous research work has also been done in order to automatically interpret or generate natural language expressions describing probabilistic quantities [43]–[45]. This research area has been found useful in applications such as risk communications or statistical data communications. Such previous work, however, is typically limited to describing probabilities of scalar, binary variables rather than the more complex probability densities of multidimensional, continuous state variables presented in this paper, which are commonly considered in general sensor fusion or estimation problems. Related research work on language generation frameworks that involve physical state grounding also includes plan recognition [46], [47], which produces statements summarizing the robot's activities given the estimated robot states, and question or request generation [48], [49], where questions or requests for human assistance are produced from a predefined sequence of deterministic states and actions. None of these language generation frameworks is suitable for generating belief expressions for mixed multiagent systems working on cooperative information gathering tasks.

This paper presents an information sharing framework between humans and robots, providing each party with information presented in a form it can immediately process and utilize. From the human users' perspective, the system permits inputs and outputs in structured English, on which further interpretation and reasoning can be completed naturally. From the robot's perspective, the interface allows information communication in terms of pdfs, on which conventional information fusion and estimation algorithms can be seamlessly integrated. This paper extends the work by Tse and Campbell [50] to provide detailed derivations and explanations of the proposed belief expression generation models, including the variants of nonparametric Dirichlet process (DP) MoS generation, and also to include results and discussions from new sets of evaluation data, including empirical human trial results.

III. SUMMARY OF TECHNICAL APPROACHES

This section provides a brief overview of the technical approaches for the belief communication system. The goal of the system is to exchange probabilistic information between humans and robots regarding the state of a subject of interest $X$. An example scenario of belief communication is shown in Figs. 3–6 for the communication of a crime suspect location between police officers and security robots.

The information flow starts with mixed sources of information regarding the subject's state, including sensor measurement readings $\zeta^r_{1:i}$ obtained by robots, as well as observation sentences $\zeta^h_{1:i}$ expressed by humans. These inputs $I = \{\zeta^r_{1:i}, \zeta^h_{1:i}\}$ are used [...]
TABLE I
NOMENCLATURE
[...] the subject's state as shown in Fig. 3. Belief updates given human input expressions are performed using a Bayesian update equation, with likelihood models of human expressions represented by [...]

This section describes a mathematical formulation for the recursive update of the robot's belief pdf over the state of interest, based on the work in [17]. The inputs used for the robot's belief
TSE AND CAMPBELL: HUMAN–ROBOT COMMUNICATIONS OF PROBABILISTIC BELIEFS VIA A DP MIXTURE OF STATEMENTS 5
updates are in the forms of both traditional sensor measurement readings as well as human expressions in structured English.

A. Recursive Belief Update

Let $X_i \in \mathcal{X}$ be the state of the subject of interest (e.g., a tracked object's, or the robot's own, location) at time $i$. Let $\zeta^r_i$ be the robot's sensor measurement reading (e.g., from a camera or LIDAR). A human's observation expression $\zeta^h_i$ at time $i$ is defined according to the structured language sentence "The <subject> is <preposition_i> <reference_i>" (e.g., "The suspect is nearby gas station A," or "The injured victim is behind the robot"). More specifically, the expression $\zeta^h_i$ at time $i$ is specified by a preposition $D_i$ and a landmark or a reference state $X^l_i$: $\zeta^h_i(D_i, X^l_i)$. Defining the sets of robot and human measurement inputs over time as $\zeta^r_{1:i} \equiv \{\zeta^r_1, \ldots, \zeta^r_i\}$ and $\zeta^h_{1:i} \equiv \{\zeta^h_1, \ldots, \zeta^h_i\}$, the belief over the subject's state at time $i$, $p(X_i \mid \zeta^r_{1:i}, \zeta^h_{1:i})$, is calculated given the inputs $\zeta^r_{1:i}$ and $\zeta^h_{1:i}$ through recursive Bayesian update steps as follows.

First, the belief is propagated to the new time step $i$ according to the stochastic dynamics $p(X_i \mid X_{i-1})$ of the state of interest:

$$p(X_i \mid \zeta^r_{1:i-1}, \zeta^h_{1:i-1}) = \int_{\mathcal{X}} p(X_i \mid X_{i-1})\, p(X_{i-1} \mid \zeta^r_{1:i-1}, \zeta^h_{1:i-1})\, dX_{i-1}. \quad (1)$$

The belief is then updated given the traditional robot's sensor measurement reading at time $i$ as

$$p(X_i \mid \zeta^r_{1:i}, \zeta^h_{1:i-1}) = \frac{p(\zeta^r_i \mid X_i)\, p(X_i \mid \zeta^r_{1:i-1}, \zeta^h_{1:i-1})}{\int_{\mathcal{X}} p(\zeta^r_i \mid X_i)\, p(X_i \mid \zeta^r_{1:i-1}, \zeta^h_{1:i-1})\, dX_i} \quad (2)$$

where $p(\zeta^r_i \mid X_i)$ is the traditional sensor measurement's likelihood model obtained from calibration or from the manufacturer's specifications.

The belief update given the human's observation expression $\zeta^h_i(D_i, X^l_i)$ at time $i$ is

$$p(X_i \mid \zeta^r_{1:i}, \zeta^h_{1:i}) = p(X_i \mid \zeta^r_{1:i}, \zeta^h_{1:i-1}, \zeta^h_i(D_i, X^l_i)) \quad (3)$$

[...] model is given in Section V.

V. SEMANTIC-LIKELIHOOD MODEL

This section discusses the semantic-likelihood modeling necessary for both belief fusion and belief expression generation. Given an expression "The <subject> is <preposition_i> <reference_i>," the likelihood $p(D_i = d \mid X_i = x, X^l_i = x^l)$ maps from the hidden subject's state space $\mathcal{X}$ to the interval $[0, 1]$. At each state value $x$, the likelihood sums to one over the discrete choices of prepositions: $\sum_{d=1}^{N_D} p(D_i = d \mid X_i = x, X^l_i = x^l) = 1$. As such, the human expression likelihood model can be viewed as a probabilistic classification of the subject's state space. This input space partitioning can be described by probabilistic functions, such as logistic, softmax, or MMS [51], [52]. In this paper, the human expression likelihood is modeled by an MMS, with the MMS parameters learned from maximum-likelihood estimation based on a labeled expression training dataset. A key advantage of MMS compared with log-linear models such as logistic or softmax is that it can represent nonconvex, multimodal, nonlinear decision boundaries, as shown below.

A softmax function (multinomial logistic function) is a generalization of the logistic function from binary to multiclass classification. The softmax likelihood of $X_i = x \in \mathcal{X}$, where $\mathcal{X} \subseteq \mathbb{R}^M$, being labeled as class $D_i = d$ is

$$p(D_i = d \mid X_i = x) = \frac{\exp(w_d^T \tilde{x})}{\sum_{h=1}^{N_D} \exp(w_h^T \tilde{x})}; \qquad \tilde{x} = [1\ x_1\ x_2\ \ldots\ x_M]^T \quad (4)$$

where $D_i = d \in \{1, \ldots, N_D\}$ and $N_D \ge 2$. The partition boundaries are the equiprobability lines $p(D_i = h \mid X_i = x) = p(D_i = k \mid X_i = x)$ between each pair of classes $(h, k)$, i.e., where the log odds ratio is zero:

$$\ln \frac{p(D_i = h \mid X_i = x)}{p(D_i = k \mid X_i = x)} = (w_h - w_k)^T \tilde{x} = 0. \quad (5)$$

As a result, while logistic functions are restricted to linear partitioning of the input space, softmax functions extend the classification boundaries to piecewise linear ones.

Nonetheless, softmax partitions are still restricted to convex cases: let $\tilde{x}_1$ and $\tilde{x}_2$ belong to class $h$

$$(w_h - w_k)^T \tilde{x}_1 \ge 0 \quad \text{and} \quad (w_h - w_k)^T \tilde{x}_2 \ge 0 \quad \forall k \ne h. \quad (6)$$

All points on the line segment $(1 - t)\tilde{x}_1 + t\tilde{x}_2$, $t \in [0, 1]$, also belong to class $h$ [...]

[...] the likelihood of the state $X_i = x \in \mathcal{X} \subseteq \mathbb{R}^M$ being described with a preposition $D_i = d$ is

$$p(D_i = d \mid X_i = x) = \frac{\sum_{k \in \sigma(d)} \exp(w_k^T \tilde{x})}{\sum_{h=1}^{S} \exp(w_h^T \tilde{x})}. \quad (8)$$

For an $M$-dimensional input space, the $S \cdot (M + 1)$ parameters of the MMS weight vectors $w_{1,\ldots,S}$ can then be learned by maximizing the probability $p(D_i = d \mid X_i = x)$ in (8), given a labeled training set $\{(D_i, X_i)_{i=1:N}\}$. This maximization can be done through standard nonlinear optimization algorithms, e.g., quasi-Newton methods.
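As a concrete illustration of (4) and (8), the following sketch evaluates a softmax likelihood over the augmented state and a subclass-summed, MMS-style likelihood. The weight matrix and the subclass-to-preposition map $\sigma$ are hypothetical stand-ins for learned parameters, chosen so that one preposition's high-probability region is nonconvex (two-sided), which a single softmax class cannot represent.

```python
import numpy as np

def softmax_likelihood(W, x):
    """p(D_i = d | X_i = x) for every class d under a softmax model, cf. (4).

    W: (S, M+1) weight matrix; x: (M,) state vector. Uses the
    augmented state x_tilde = [1, x_1, ..., x_M]^T."""
    x_tilde = np.concatenate(([1.0], x))
    logits = W @ x_tilde
    logits -= logits.max()          # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def mms_likelihood(W, sigma, x):
    """MMS-style likelihood, cf. (8): the probability of preposition d is
    the sum of the softmax probabilities of its subclasses sigma[d]."""
    p_sub = softmax_likelihood(W, x)
    return np.array([p_sub[list(idx)].sum() for idx in sigma])

# Hypothetical weights: 3 softmax subclasses over a 1-D state, with
# subclasses 0 and 2 both grounding the first preposition ("far from"),
# and subclass 1 grounding the second ("next to").
W = np.array([[0.0, 2.0],      # subclass active for large positive x
              [1.0, 0.0],      # subclass active near the origin
              [0.0, -2.0]])    # subclass active for large negative x
sigma = [(0, 2), (1,)]         # preposition 0 <- subclasses {0, 2}

p_far = mms_likelihood(W, sigma, np.array([3.0]))
p_mid = mms_likelihood(W, sigma, np.array([0.0]))
```

Here the first preposition dominates both at $x = 3$ and at $x = -3$ while the second dominates near the origin, i.e., the decision region of preposition 0 is the union of two disjoint half-lines.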
VI. LANGUAGE GENERATION FROM FUSED BELIEF: PROBLEM FORMULATION

In this section, two goodness measures for language generation from probabilistic beliefs are presented. The first optimizes the semantic correctness probability of the sentence generated. This measure is useful in applications where the expression's correctness probability is more critical than its specificity. The second measure is based on information preservation in communicating the belief pdf. This measure is useful in applications where the expression's correctness probability as well as its specificity are important. The developments in subsequent sections will be based on the information preservation formulation.

A. Semantic Correctness

First, define a belief distribution over the subject's state of interest $X$ as

$$b(x) \equiv p(X = x \mid I, \Theta_{1,\ldots,N_D}) \quad (9)$$

[...] the other agents. This belief distribution can be obtained as an output from any sensor fusion algorithm, or from human–robot communication processes such as those presented in Section IV-A, i.e., $I = \{\zeta^r_{1:i}, \zeta^h_{1:i}\}$. $\Theta_{1,\ldots,N_D}$ denote the parameters describing all human semantic models in an a priori learned dictionary.

Given the learned semantic models and all gathered information $I$, the correctness probability of using a preposition $D$ to describe a hidden subject's state $X$ with respect to a reference state $X^l$ can be defined and computed as follows:

$$p(D \mid I, X^l, \Theta_{1,\ldots,N_D}) = \int_{\mathcal{X}} p(X = x \mid I, X^l, \Theta_{1,\ldots,N_D})\, p(D \mid X = x, I, X^l, \Theta_{1,\ldots,N_D})\, dx = \int_{\mathcal{X}} p(X = x \mid I, \Theta_{1,\ldots,N_D})\, p(D \mid X = x, X^l, \Theta_{1,\ldots,N_D})\, dx. \quad (10)$$

The first term in (10) is basically the belief distribution $b(x)$ being expressed, while the second term is the semantic-likelihood model described in Section V. That is, the correctness probability of a generated expression is determined by the product between 1) the probability of the subject being at each state $x$ in the state space, and 2) the probability for the expression to be semantically correct in describing each of these particular states.

We have noticed that the second term, i.e., the semantic likelihood, is not normalized over the state space $\mathcal{X}$. This signifies the fact that prepositions with broad meanings can have high semantic applicability likelihood across large areas of the state space, while prepositions with narrow or specific meanings would have high likelihood in only small regions. For this reason, in order to maximize the chance of being correct, the semantic correctness criterion naturally prefers a more general statement by choosing a preposition with a larger support whenever such a statement is semantically applicable to (i.e., covers) the support of the given belief distribution. As an extreme illustrative example, the statement "The subject's state is in the state space" always yields the maximum correctness probability of one according to (10), and therefore would always be chosen, should we allow it among the expression choices. As a result, this criterion is suitable when the correctness probability of the generated expression is more critical to the application than the specificity of the expression.

In a common situation where a semantic likelihood $p(D \mid X, X^l, \Theta_{1,\ldots,N_D})$ is represented by an MMS model, and a belief $p(X \mid I, \Theta_{1,\ldots,N_D})$ is represented by a Gaussian mixture, the integral in (10) can be efficiently calculated by performing variational Bayesian importance sampling (VBIS) [17], [53]. The optimal statement can then be generated with a preposition determined based on the following objective:

$$D^* = \arg\max_D\; p(D \mid I, X^l, \Theta_{1,\ldots,N_D}). \quad (11)$$

Maximizing the objective in (11) is equivalent to maximizing the semantic correctness probability of the generated sentence when describing the hidden state, given the information gathered so far.

An alternative approach to belief expression formulation is to consider the communication problem from an information-theoretic perspective. As discussed in the previous section, a belief maintained by a robot can be described as $b(x) \equiv p(X = x \mid I, \Theta_{1,\ldots,N_D})$, where $I = \{\zeta^r_{1:i}, \zeta^h_{1:i}\}$ is the supporting information gathered so far by the robot. Since $I$ contains all information generating the belief, one way to fully preserve all information content in a belief is by communicating all original messages $\zeta^r_{1:i}, \zeta^h_{1:i}$, or by compressing $\zeta^r_{1:i}, \zeta^h_{1:i}$ into one summary expression. However, general belief expression generation problems entail another complication: the full message history $\{\zeta^r_{1:i}, \zeta^h_{1:i}\}$ is usually discarded after belief update steps. As a result, the actual information content underlying a belief pdf is typically hidden to an expression generation algorithm. An expression generation problem can therefore be regarded as a problem of recovering a hidden $I$ from a fused distribution $b(x)$.

To find an output expression that best summarizes the information content $I$ underlying a belief, an information loss is evaluated between the original belief $b(x) \equiv p(X = x \mid I, \Theta_{1,\ldots,N_D})$ and a representative distribution constructed from the recovered information $\hat{I}$:

$$r(x) \equiv p(X = x \mid \hat{I}, \Theta_{1,\ldots,N_D}). \quad (12)$$

Given a reference state of interest $X^l$, a minimum information loss expression $\zeta(D^*, X^l)$ can be defined by minimizing the Kullback–Leibler divergence (KLD) between the representative distribution $r(x)$ and the original distribution $b(x)$ as follows:

$$D^* = \arg\min_D\; \mathrm{KL}(b(x) \,\|\, r(x)) \quad (13)$$

where

$$\mathrm{KL}(b(x) \,\|\, r(x)) \equiv \int_{\mathcal{X}} \ln \frac{b(x)}{r(x)}\, b(x)\, dx. \quad (14)$$
Since the KLD can be considered a dissimilarity measure between two pdfs, the information preservation objective intends to preserve the original shape (both the peaks and the valleys) of the distribution $b(x)$. That is, the reconstructed distribution must preserve both the positive information of where the hidden state $X$ would likely be, as well as the negative information of where it likely would not be, in order to minimize the KLD. As a result, the information preservation criterion is more advantageous in applications where the semantic correctness probability of the generated expression should be balanced with the expression's specificity.

It can be shown that minimizing the KLD in (14) is equivalent to maximizing the expectation of $\ln r(x)$ over $x \sim b(x)$:

$$D^* = \arg\max_D\; \mathbb{E}_{b(x)}[\ln r(x)]. \quad (15)$$

This expectation can be approximated by performing sampling on $b(x)$ for $\{x_n;\ n = 1, \ldots, N\}$. That is, the optimal statement can be determined based on the following maximum-likelihood objective:

$$D^* = \arg\max_D\; \sum_{n=1}^{N} \ln r(x_n) \quad (17)$$

where

$$x_n \sim p(X = x_n \mid I, \Theta_{1,\ldots,N_D}). \quad (18)$$

This maximum-likelihood problem can be solved by many existing algorithms, some of which are applied in Section VII, coupled with the formulation of related expression generation subproblems.

[...] Additionally, in the previous section, the reference state of interest $X^l$ is treated as a fixed parameter in the communication process. In this section, the optimal reference state parameters of all statements in the composition are simultaneously estimated. Finally, a technique for automatically determining the optimal number of statements [...]

[...] more than one statement. Belief distributions with multiple-hypothesis content can be formally represented by a mixture model [54], as discussed next.

First, assuming that a simple belief $b(x)$ consists of only one underlying piece of information, which can be summarized by a single statement $I \approx \{\zeta(D, X^l)\}$, then $b(x) \equiv p(X = x \mid I, \Theta_{1,\ldots,N_D})$ can be represented by

$$r(x) = p(X = x \mid \zeta(D, X^l), \Theta_{1,\ldots,N_D}). \quad (19)$$

In the case where the reference state parameter $X^l$ is fixed, the language generation problem reduces to the problem defined by (17); note that $\zeta(\cdot)$ is a deterministic function.

The language generation problem can be extended to allow a description of complex belief distributions by breaking down the expression into a composition of $K$ single-statement hypotheses via the conjunction "or" [...] where each hypothesis $k$ is described by a single statement $\zeta(D_k, X^l_k)$ as defined in previous sections.

In this case, the representative belief $r(x)$ reconstructed from the generated expression becomes [...]. The probability of each hypothesis $Z = k \in \{1, \ldots, K\}$ being responsible for the belief of $x$ can also be explicitly represented and then marginalized out as follows:

$$r(x) = p(X = x \mid \zeta_{1,\ldots,K}, \Theta_{1,\ldots,N_D}) = \sum_{k=1}^{K} p(X = x, Z = k \mid \zeta_{1,\ldots,K}, \Theta_{1,\ldots,N_D}) \;[...]$$

[...] model of hypotheses $k = 1, \ldots, K$. Therefore, an expression generated via statement compositions using "or" conjunctions is referred to as a Mixture of Statements. Each hypothesis $k$'s weight $\pi_k$ and likelihood $P_k$ are defined as follows: [...]
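Under the information-preservation criterion, (17)–(18) reduce statement selection to comparing average log-densities of candidate representative distributions over belief samples. A minimal sketch, with two hypothetical representative densities $r(x)$: a tight Gaussian standing in for a specific statement ("next to 0") and a broad uniform standing in for a vague one ("somewhere in [-10, 10]"):

```python
import numpy as np

rng = np.random.default_rng(1)

def best_statement(x_n, r_models):
    """Objective (17): choose the statement whose representative
    distribution r(x) maximizes sum_n ln r(x_n) over samples
    x_n ~ b(x), i.e., minimizes the sampled KLD of (13)-(14)."""
    scores = [np.sum(np.log(r(x_n) + 1e-300)) for r in r_models]
    return int(np.argmax(scores))

def r_tight(x):
    # density implied by the specific statement (illustrative)
    return np.exp(-0.5 * (x / 0.7) ** 2) / (0.7 * np.sqrt(2.0 * np.pi))

def r_broad(x):
    # density implied by the vague statement (illustrative)
    return np.where(np.abs(x) <= 10.0, 1.0 / 20.0, 0.0)

x_n = rng.normal(0.0, 0.7, size=5000)    # samples from the belief b(x)
d_star = best_statement(x_n, [r_tight, r_broad])
```

Note the contrast with the pure semantic correctness criterion: the vague statement no longer dominates, because the tight density matches the belief's shape and achieves a higher average log-density despite its smaller support.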
Assuming a uniform prior on $X$ when updating $r(x)$ with hypothesis $k$'s information, the term $p(X = x \mid \Theta_k)$ does not depend on $x$:

$$P_k(x) = \frac{p(D_k \mid X = x, \Theta_k)\, p(X = x \mid \Theta_k)}{\int_{\mathcal{X}} p(X = x \mid \Theta_k)\, p(D_k \mid X = x, \Theta_k)\, dx} = \frac{p(D_k \mid X = x, \Theta_k)}{\int_{\mathcal{X}} p(D_k \mid X = x, \Theta_k)\, dx}.$$

Determining the number of statements needed to represent a particular input belief can be considered a part of the model selection problem [55]. Existing model selection methods, such as cross validation [56] or information-based criteria (e.g., the Akaike information criterion [57]), can be used to determine the number of statements such that model fit and complexity are balanced. Cross validation [...]
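The normalized hypothesis likelihood $P_k(x)$ above can be evaluated numerically on a discretized state space; here the semantic likelihood of the statement's preposition is a hypothetical Gaussian-shaped model, and the uniform prior on $X$ cancels as in the derivation.

```python
import numpy as np

def hypothesis_likelihood(sem_lik, grid, dx):
    """P_k(x) on a 1-D grid: the preposition's semantic likelihood
    p(D_k | X = x, Theta_k) normalized by its integral over the
    state space (the uniform prior p(X = x | Theta_k) cancels)."""
    lik = sem_lik(grid)
    return lik / (lik.sum() * dx)   # divide by the grid approximation of the integral

grid = np.linspace(-5.0, 5.0, 1001)
dx = grid[1] - grid[0]
sem_near = lambda x: np.exp(-0.5 * x ** 2)   # illustrative semantic model
P_k = hypothesis_likelihood(sem_near, grid, dx)
```

The result is a proper density over the grid: it integrates to one and peaks where the preposition's semantics are strongest.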
[...] by the observed samples $X_{1,\ldots,N}$, as well as the DP prior. The nonparametric Bayesian DP MoS model is shown in Fig. 8(b), which can be described as follows.

First, the prior distribution of each statement $\zeta_n$ is represented by a random distribution $G$, where $G$ is DP distributed with a base $G_0$ and a concentration $\alpha$:

$$\zeta_n \mid G \sim G, \qquad G \sim \mathrm{DP}(G_0, \alpha). \quad (28)$$

The base distribution $G_0$ is the prior distribution over each statement $\zeta_n$'s parameters, without enforcing any clustering effect. For example, consider the $G_0$ highlighted in red on the left-hand side of Fig. 9. This $G_0$ provides a Gaussian prior over the reference state $X^l_n$ parameter of each statement $\zeta_n$. The distribution $G$ generated according to (28) with this base $G_0$ is [...] the six "clusters" of discrete probability masses shown in black. The number of these clustered $X^l_n$ prior probability masses in $G$ is guided by the DP's concentration parameter $\alpha$, while the locations of the $X^l_n$ prior probability masses are guided by the base distribution $G_0$. Further information on DP mixture models can be found in [58] and [59].

Finally, according to Fig. 8(b), given the DP prior in (28) and the hypothesis statement likelihood, the MoS parameters can be generated from the DP MoS posterior distribution. This is performed [...]

The MoS sample that yields the highest log-likelihood objective (25) is finally chosen as the output belief expression. Variants of the base distributions $G_0$ used for DP MoS are presented next.

Fig. 9. Prior models for DP MoS generation, where mixture component parameters could be sampled from multiple separate manifolds.

[...] represent a prior distribution over mixture parameters, conditioned on a mixture type $D$. A random distribution $G$ resulting from $\mathrm{DP}(G_0, \alpha)$ with an augmented base $G_0$ is illustrated in Fig. 9. In this figure, each distribution $G_0^d$ denotes a base distribution over each statement's parameter $X^l$, given a statement type $D = d$. In general, each conditional base $G_0^d$ can be defined in a completely different parameter space for each statement type.

From the example in Fig. 9, marginalizing the base distribution along the dimension of the reference $X^l$, it can be seen that the base distribution over each type of preposition $D \in \{1, \ldots, N_D\}$ is uniform. That is, there is an equal chance of generating a statement with any of the prepositions $D$ in the dictionary. Humans' natural preferences over preposition usages can be further incorporated into the prior model. For example, humans may prefer more informative prepositions (e.g., "nearby") over those that are less informative (e.g., "far from"). In order to simulate this behavior, the base $G_0$ can be defined such that the marginal weight for each preposition choice is inversely proportional to the entropy of each preposition's semantic [...] where

$$f(x) = p(X = x \mid \zeta(D = d, X^l = \mathbf{0}), \Theta_{1,\ldots,N_D}). \quad (31)$$

The same strategy can be applied to simulate other types of language generation behaviors, for example, by setting the $G_0$ distribution over $X^l$ according to humans' preferences over reference choices, such as map region preferences, landmark saliencies, etc.
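The clustering behavior induced by the DP prior (28) can be illustrated with a Chinese-restaurant-process draw: each new statement parameter either reuses an existing cluster or is drawn fresh from the base $G_0$. This is a prior sample only, not the paper's posterior Gibbs sampler, and the base used here (a uniform preposition index paired with a Gaussian reference state, echoing the augmented base of Fig. 9) is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_statements_crp(n, alpha, base_sampler):
    """Draw n statement parameters zeta_1..zeta_n from G ~ DP(G0, alpha),
    with G marginalized out, via the Chinese restaurant process."""
    clusters, counts, z = [], [], []
    for _ in range(n):
        # join cluster k with probability counts[k] / (i + alpha);
        # open a new cluster with probability alpha / (i + alpha)
        probs = np.array(counts + [alpha], dtype=float)
        k = int(rng.choice(len(probs), p=probs / probs.sum()))
        if k == len(clusters):
            clusters.append(base_sampler())   # fresh draw from base G0
            counts.append(1)
        else:
            counts[k] += 1
        z.append(k)
    return clusters, z

# Base G0: uniform preposition choice plus a Gaussian prior on the
# 2-D reference state X^l (illustrative values).
base = lambda: (int(rng.integers(0, 3)), rng.normal(0.0, 5.0, size=2))
clusters, z = sample_statements_crp(200, alpha=1.0, base_sampler=base)
```

A small $\alpha$ concentrates the 200 draws on a handful of distinct statement parameters, matching the clustered prior masses described above; a large $\alpha$ approaches independent draws from $G_0$.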
VIII. ILLUSTRATIVE EXAMPLE

This section revisits the example of the belief communication system introduced in Section III. In this example, a patrol robot gathers and shares information regarding the location of a crime suspect cooperatively with a human partner. In the first step, the robot uses its own vision and localization systems to check whether a person matching the suspect description can be detected anywhere in the map. The detection and nondetection measurements $\zeta^r_i$ are used to update the robot's belief via the Bayesian measurement update equation in (2), with the detection likelihood $p(\zeta^r_i \mid X_i)$ obtained from the vision system's calibration as [...]

[...] The MMS likelihood models $p(D_i = d \mid X_i = x, X^l_i = x^l)$ for a statement "The <subject> is <preposition> <reference>," where the prepositions $d$ are "next to," "nearby," "far from," and the reference $x^l$ is at $[0, 0]^T$, are shown in Fig. 2. The likelihood of the statement "The subject is 60% likely nearby Ives Hall, or 40% likely at Barnes Hall" is then obtained according to the learned MMS models as shown in Fig. 4. This likelihood is used in the belief update equation in (3) to obtain the final belief in Fig. 5. This updated belief $b(x; I)$ contains the information $I$ gathered by both the robot and human team members in the search for the suspect.

In order to communicate this belief back to the human, the robot determines the expression statements $\zeta_{1:K}$ that best summarize all the information $I$ underlying the belief pdf. The belief expression process is carried out by first generating sample states $X_{1:N}$ from the belief $b(x; I)$, as illustrated in Fig. 6. These states $X_{1:N}$ are passed to DP MoS Gibbs sampling to sample output English expressions according to the probabilistic graphical model shown in Fig. 8(b). The distributions defining this model are described in full in (28) and (29).

[...] two human trials, are presented and discussed. First, automatic evaluations were performed to compare the information loss incurred by different belief expression strategies. Second, task-based evaluations were conducted to compare human users' task performances given different types of belief expression sentences. Task performances were measured in terms of search-region F1 score. Finally, evaluations based on human ratings and judgments were used to compare each belief expression's correctness in describing a belief pdf. Evaluation data were collected via the Amazon Mechanical Turk online crowdsourcing platform.

[...] robot cooperative search scenarios. During each search mission, the information system gathers multiple pieces of information describing possible search subject locations, including robot measurement readings and human observation sentences. Based on all gathered pieces of information, the system generates a fused belief pdf summarizing the potential search subject locations over the search space map. The fused pdf is then communicated back to human and robot team members to inform them of the latest estimate of the search subject's location. In the case of human team members, structured English sentences are used to express the fused belief. The human searchers then rely on the fused belief information to make a decision on how they would conduct the search. In these scenarios, the search subject was a laptop thief suspect reported to the Cornell University Police, and the search region is the central Cornell University campus. The suspect descriptions are first provided by witnesses and disseminated to the patrol officers by the police information center. Subsequently, all sightings of a person matching the description are to be reported back to the information center to update the system on potential suspect locations. All reports $\zeta^h_1, \ldots, \zeta^h_{L_h}$ [...]
736 prior in (28), the base distribution G0 for this example is defined describing suspect sightings are input to the information fusion 788
737 as follows. First, the base distribution for each output reference system as described in (3), Section IV-A, with the structure “The 789
738 state parameter Xl n is uniform over the locations of all the land- suspect is <preposition> <reference>.” The preposition dictio- 790
739 marks on the map. Second, the base distribution for each output nary was learned from the human preposition dataset from [17]. 791
740 preposition Dn is uniform over all prepositions in the MMS In addition, all search agents share the common landmark dic- 792
741 dictionary. Finally, the likelihood Pn (Xn ) in (29) is given from tionary, defined by the campus map retrieved from Google Map 793
742 the same MMS likelihood dictionary used in belief update pro- shown in Fig. 10. Each report includes a probability indicating 794
743 cesses. The sentence generated by DP MoS in this case is “The how confident the human reporter is regarding the relevance of 795
744 subject is 61.79% likely nearby Ives Hall, or 27.08% likely at the information he or she provided, i.e., the probability that the 796
745 Barnes Hall, or 11.13% likely at Sage Chapel.” That is, the sys- suspect sighting information is a true detection and is not a false 797
746 tem extracts from the belief pdf a mixture model with three clus- alarm. The numbers of observation reports were sampled from 798
747 ters. Each of the clusters is described by the statement ζ1 (D1 = a Poisson distribution (λ = 5), with the maximum of nine and 799
748 “nearby”, Xl 1 = XIves Hall ), ζ2 (D2 = “at”, Xl 2 = XBarnes Hall ), minimum of two, while each input report’s preposition was sam- 800
749 and ζ3 (D3 = “at”, Xl 3 = XSage Chapel ), with the cluster proba- pled from a uniform distribution over the preposition dictionary. 801
750 bilities of 61.79%, 27.08%, and 11.13%, respectively. Each report’s reference was sampled from a uniform distribution 802
over the campus map in Fig. 10. Finally, each report’s relevance 803
probability was sampled from a uniform distribution over [0, 1]. 804
751 IX. EVALUATION RESULTS In addition to the sighting reports, the suspect location pdf is 805
752 In this section, the proposed belief communication frame- also updated by sensor measurement inputs ζ1r , ..., ζLr r generated 806
753 work is evaluated in the context of human–robot cooperative based on a 360◦ camera on a robot patrol in the search region. 807
754 search scenarios. Data from three sets of evaluations, including Each measurement consists of 808
TSE AND CAMPBELL: HUMAN–ROBOT COMMUNICATIONS OF PROBABILISTIC BELIEFS VIA A DP MIXTURE OF STATEMENTS 11
Fig. 10. Central Cornell campus map defining reference landmarks, such as "Day Hall," "Uris Garden," "Statler Hotel," etc. This landmark dictionary was shared among all agents during search missions. This image is retrieved from Google Map.
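The synthetic report-generation scheme described above (a Poisson-distributed report count truncated to [2, 9], with uniform prepositions, references, and relevance probabilities) can be sketched as follows. This is a minimal stdlib-only sketch; the preposition and landmark lists here are illustrative stand-ins, not the paper's actual dictionaries.

```python
import math
import random

# Illustrative stand-ins for the learned preposition dictionary and the
# campus landmark dictionary (assumptions for this sketch, not the paper's data).
PREPOSITIONS = ["next to", "nearby", "far from", "at"]
LANDMARKS = ["Ives Hall", "Barnes Hall", "Sage Chapel", "Day Hall", "Statler Hotel"]

def sample_poisson(lam, rng):
    """Knuth's inversion method for sampling a Poisson-distributed count."""
    limit, k, prod = math.exp(-lam), 0, 1.0
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k - 1

def sample_reports(lam=5, lo=2, hi=9, seed=None):
    """Sample one mission's worth of synthetic human sighting reports."""
    rng = random.Random(seed)
    n = min(max(sample_poisson(lam, rng), lo), hi)  # truncate count to [lo, hi]
    return [{
        "preposition": rng.choice(PREPOSITIONS),  # uniform over the dictionary
        "reference": rng.choice(LANDMARKS),       # uniform over map landmarks
        "relevance": rng.uniform(0.0, 1.0),       # true-detection probability
    } for _ in range(n)]
```

Each returned entry corresponds to one input sentence "The suspect is &lt;preposition&gt; &lt;reference&gt;" together with its reporter-supplied relevance probability.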
TABLE III
SUMMARY OF NORMALIZED JSD INFORMATION LOSS FOR EACH TYPE OF SENTENCES IN DESCRIBING BELIEF DISTRIBUTIONS
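As background for the numbers in Table III, a minimal sketch of a Jensen–Shannon divergence computation over discretized pdfs, assuming base-2 logarithms (which bound the measure in [0, 1]); the exact normalization used in [62] may differ in detail.

```python
import math

def normalized_jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discretized pdfs.

    With base-2 logarithms the divergence is bounded in [0, 1]:
    0 for identical distributions and 1 for disjoint supports.
    """
    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-mass bins.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > eps)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # midpoint mixture
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Identical pdfs incur zero loss; pdfs with disjoint support incur maximal loss.
assert abs(normalized_jsd([0.5, 0.5], [0.5, 0.5])) < 1e-9
assert abs(normalized_jsd([1.0, 0.0], [0.0, 1.0]) - 1.0) < 1e-9
```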
pdfs b(x; I(Θ̂)) given the generated expressions I(Θ̂), and the original belief pdfs b(x; I(Θ)) were compared among the five expression generation strategies.

Online human data collections were conducted based on two types of NLG system evaluations, as classified by Reiter and Belz [61]: task-based evaluations and evaluations based on human ratings and judgments.

Human Task-Based Evaluation: In Section IX-C, human task-based evaluations were conducted by asking humans to perform suspect search tasks given each type of generated expression I(Θ̂). Human subject task executions T(I(Θ̂)) were then compared in terms of task precision and recall performance, as measured by F1 scores.

Human Ratings and Judgments: In Section IX-D, human subject Likert score ratings were collected to assess the correctness of each generated expression I(Θ̂) in describing a given distribution b(x; I(Θ)). The human subject ratings were then compared among the five expression generation strategies.

B. Automatic Evaluation: Information Loss

In the first set of evaluations, belief expression system performances were evaluated automatically, without human subjects. The goal of this evaluation is to compare the information losses incurred by different belief expression strategies. Twenty sets of i.i.d. belief pdfs were generated by fusing suspect search input sentences and robot sensor measurements. The five sentence generation methods listed in Table II were used to generate five output expressions for each pdf. In order to measure the information losses between the estimated pdfs b(x; I(Θ̂)) and the input belief pdfs b(x; I(Θ)), a normalized Jensen–Shannon divergence (JSD) [62] criterion was used. The benefit of a normalized JSD as an information loss measure is that it is bounded within [0, 1]. As a benchmark, the Truth sentence results denote the performance of the belief expression sentences written by a human expert. The information loss results are summarized in Table III, and a box plot of the results is shown in Fig. 12.

When the pdf was reconstructed using an estimated sentence I(Θ̂), it was found that the median information loss was smallest when using MoS sentences (0.225) and largest when using Chance sentences (0.651). The median information loss from SS sentences (0.357) was found to be lower than that of the uninformative (Uninf) sentences (0.605) but higher than that of MoS sentences.

Fig. 12. Information loss box plot for the five belief expressions. The boxes show the 25th percentile, median, and 75th percentile, while the whiskers indicate the minimum, maximum, and the 1.5 interquartile range (IQR) outliers.

In terms of consistency, Chance sentences were found to be the least consistent, with the largest information loss IQR of 0.337. MoS and Truth sentences were found to be the most consistent, with the smallest information loss IQR among all (0.069). Finally, the SS sentences were found to produce a smaller information loss IQR (0.145) than Uninf sentences but a larger one than MoS sentences.

A nonparametric Wilcoxon test was used to test the significance of the differences between the information losses caused by each belief expression approach and the others. It was found that the qualities of Truth and MoS sentences were significantly better (p < 0.001) than all the other types of sentences but were not significantly different from each other (p > 0.05). The differences between the qualities of Uninf and Chance sentences with respect to SS's were also significant (p < 0.01 in both cases). These results indicate that the MoS sentences were better at estimating the underlying belief parameters and preserving the original belief information content than the other automated methods. Furthermore, the results illustrate the benefit of allowing a composition of multiple statements in describing a belief, in contrast to restricting an expression to only one statement.

An interesting observation is that while the human expert did not explicitly aim to optimize any particular objective function, as the automatic generation algorithms do, the expert still managed to achieve low information loss results that were not significantly different from those of MoS.

The sentences generated by the SS approach, while having a significantly higher information loss than MoS sentences, still performed significantly better than Chance and Uninf sentences. The relatively large SS and Uninf information loss IQRs indicate a high variation in the modality and entropy of the input pdfs used in the test cases. The fact that MoS sentences were able to achieve both the smallest information loss and the smallest information loss IQR indicates the reliability of the MoS approach in handling the variation in the input belief pdfs being described.

Fig. 13 shows the number of statements generated by MoS versus those written by the human expert. The radius of each point is proportional to the MoS expression information loss. It was
TABLE V
SUMMARY OF TASK F1 SCORE STATISTICS FOR EACH BELIEF EXPRESSION ALGORITHM
Fig. 13. K_MoS versus K_Truth. The radius of each point is proportional to the MoS expression information loss.
found that regardless of the number of suspect sighting reports, the number of statements generated is bounded at around six statements maximum. This is because the sum of all statement weights cannot exceed 100%. When the number of hypotheses grows, the weight of the least likely hypothesis becomes so small that the hypothesis is eliminated by both MoS and the human expert. The numbers of statements generated by MoS and the human expert were positively correlated and were all within two statements of each other. The greatest information loss occurred when only two hypotheses were generated, because with only two statements it was harder to accurately describe the shape of the belief distribution.

C. Human Task-Based Evaluation: Search Precision and Recall

In the task-based evaluation, human subjects were asked to perform suspect search tasks based on the generated sentences given to them. The proposed language generation system was evaluated according to how well humans performed the search tasks given the sentences.

TABLE IV
INSTRUCTION TO HUMAN SUBJECTS DURING TASK-BASED EVALUATIONS

The instruction given to the human subjects in the task-based evaluation is shown in Table IV. To perform the tasks, the Turkers were asked to search for the suspect on the map shown in Fig. 10 by inferring, based on the given sentence, the top 10% regions in the search space with the highest probability of finding the suspect. The search space was divided into small grid cells, and the human subjects were asked to identify high probability regions in the search space by painting over the gridded canvas. An example of a painting made by a Turker is shown in Fig. 14. The regions identified by the Turkers were then compared with the actual highest probability density regions of the underlying belief pdf. This was done by automatically choosing the top 10% of the cells on the canvas which had the highest probabilities in the pdf. Suspect search precision and recall were calculated between the Turker's painted regions and the actual pdf's 10% peak regions. The F1 score task performance measure was finally calculated by taking the harmonic mean of the precision and recall results.

In this evaluation set, two hundred F1 score data were collected for each of the five algorithms, giving a total of one thousand F1 score data. For each of the five evaluated algorithms, twenty trials formed by twenty sets of i.i.d. suspect search reports were sampled as described in Section IX-B. Each set of reports was fused as described in Section IV-A to generate one belief pdf. Five associated belief expression sentences were finally generated for each pdf. Twenty participants were asked to perform the tasks, where each person participated in ten trials. In each trial, each person was shown one belief expression sentence picked at random. Each participant produced one search region painting per sentence given, i.e., ten paintings in total. Suspect search based on the uninformative expression "Suspect is equally likely anywhere on the map" was performed automatically by randomly sampling selected search regions according to a uniform distribution over the search space. The human search F1 score statistics are summarized in Table V, while the box plot is shown in Fig. 15.

It was found that the task precision and recall performance as measured by median F1 score was highest in the case of MoS sentences (64.09%), followed by Truth sentences (63.15%). Additionally, it was found that, using only a single statement to describe a pdf, the SS strategy was able to achieve a median F1 score of 49.92%. A nonparametric Wilcoxon test was applied in order to test the significance of the differences in human task performances between each type of sentence and the Truth, MoS, and SS. Based on the test, the Truth sentence performances were found to be significantly better than those of the SS, Uninf, and Chance sentences (p < 0.001 in all three cases). Likewise, F1 scores based on MoS sentences were found to be significantly
Fig. 14. Painting interface used by Amazon Mechanical Turkers in task-based evaluations. An example painting made by a Turker is shown over the given search space map.
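The painting-based scoring procedure (the top 10% of pdf cells form the peak region; F1 is the harmonic mean of precision and recall against the painted cells) can be sketched as follows; the grid-cell representation here is an illustrative assumption.

```python
def search_f1(painted, pdf_cells, top_frac=0.10):
    """F1 score between a subject's painted cells and the pdf's peak cells.

    painted:   set of grid-cell indices the subject marked
    pdf_cells: dict mapping grid-cell index -> probability mass
    """
    k = max(1, int(top_frac * len(pdf_cells)))
    # The top 10% of cells by probability form the ground-truth peak region.
    peaks = set(sorted(pdf_cells, key=pdf_cells.get, reverse=True)[:k])
    hits = len(painted & peaks)
    if hits == 0:
        return 0.0
    precision = hits / len(painted)
    recall = hits / len(peaks)
    return 2 * precision * recall / (precision + recall)  # harmonic mean
```

For example, painting exactly the peak cells yields an F1 of 1.0, while painting the peaks plus an equal number of off-peak cells halves the precision but keeps full recall.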
better (p < 0.001) than those of SS, Uninf, and Chance, while the performances based on SS sentences were significantly better (p < 0.001) than those of Uninf and Chance. However, human task F1 score performances given the Truth sentences were not found to be significantly different from those given the MoS sentences (p > 0.05).

These statistical results demonstrate that the proposed MoS

selections that either completely hit or missed the actual pdf peaks. When bearing incorrect suspect information, which was often found to be the case, Chance region selection yielded a nearly zero F1 score. The Uninf region selection, on the other hand, was able to uniformly cover all regions of grid cells in the search space, giving the most consistent hit and miss rates, with a median F1 score of about 10%.

Fig. 16. Human subject rating interface, with an example belief pdf and the five corresponding belief expression sentences.

TABLE VI
SUMMARY OF HUMAN RATING STATISTICS FOR EACH TYPE OF SENTENCES IN DESCRIBING BELIEF DISTRIBUTIONS

of each output expression with respect to the belief pdf shown to them. The rating was performed on a five-point Likert scale, including a mid-point neutral option. The rating interface used by human subjects is shown in Fig. 16. Two hundred Likert score data were collected for each of the five algorithms, giving an evaluation set of one thousand human subject rating scores. For each evaluated algorithm, 20 trials were conducted based on the 20 sets of i.i.d. suspect search reports used in Section IX-C. Each set of reports was fused as described in Section IV-A to generate one belief pdf. Five associated belief expression sentences were finally generated for each pdf. Twenty human participants were asked to perform the task in ten trials each. In each trial, each person was shown one belief distribution picture, together with all five corresponding description sentences for them to rate. As such, each human provided ratings for 50 sentences.

The Likert score statistics from the five belief expression sets are summarized in Table VI. The rating box plot for the five types of expressions is also shown in Fig. 17. It was found that the Truth and MoS sentences received the highest median correctness scores among the five types of sentences tested, with median ratings of four, followed by the SS sentences with a median rating of three. Chance and Uninf sentences were found to have the lowest median ratings of one. A nonparametric Wilcoxon test was used to test the significance of the difference between each type of output sentence rating and the Truth's. Similarly, the significance of the differences between the MoS and SS ratings and all the others was also tested. Based on the Wilcoxon test, the Truth and MoS ratings were both found to be significantly better than the ratings of SS, Chance, and Uninf sentences (p < 0.001). Likewise, the SS rating was also found to be significantly better than the Uninf and Chance ratings (p < 0.001). The Truth and MoS ratings were not found to be significantly different (p > 0.05).
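The nonparametric Wilcoxon rank-sum comparisons reported in these evaluations can be approximated with a stdlib-only normal-approximation implementation such as the sketch below (in practice a statistics package would be used); the tie correction to the variance is omitted for brevity.

```python
import math
from statistics import NormalDist

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (z, p). Tied values receive average ranks; the tie correction
    to the variance is omitted in this sketch.
    """
    combined = sorted((v, i < len(x)) for i, v in enumerate(list(x) + list(y)))
    values = [v for v, _ in combined]
    rank_of = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        avg = (i + 1 + j) / 2  # average of ranks i+1 .. j for this tie group
        for k in range(i, j):
            rank_of[k] = avg
        i = j
    # Rank sum of the first sample.
    w = sum(r for r, (_, from_x) in zip(rank_of, combined) if from_x)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p
```

Two clearly separated samples yield a small p-value, while identical samples yield z = 0 and p = 1.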
Fig. 17. Box plot for the five belief expression ratings. The boxes show the 25th percentile, median, and 75th percentile, while the whiskers indicate the minimum, maximum, and the 1.5 IQR outliers.

These statistical results indicate that the proposed DP MoS algorithm was successful in expressing the belief pdfs by re-

into subregions such that the space size is constant. For example, the suspect location state space can be divided such that the hypotheses are calculated independently for the belief descriptions over the central campus, collegetown, downtown, east hill, west hill, etc. This division of the state space helps reduce the computation complexities for both belief updates and expressions.

suggest that the proposed method for generating belief expressions is an effective approach for communicating probabilistic information between robots and humans.

APPENDIX

The EM solution to maximizing the MoS generation objective in (25) is performed as follows.

E-step: Evaluate the conditional probabilities of all hidden variables Z_{1,...,N}, given the current parameter estimates \Theta^{(t-1)} \equiv \{\Theta_k^{(t-1)}, \pi_k^{(t-1)}\}_{k=1,\dots,K}. That is, for each pair of sample point X_n and hypothesis k, evaluate the (soft) membership assignment

\gamma_{nk}^{(t)} \equiv p(Z_n = k \mid X_n = x_n, \Theta^{(t-1)}), \qquad \sum_{k=1}^{K} \gamma_{nk}^{(t)} = 1. \quad (32)

Furthermore, it is noticed from (36) that the parameter \Theta_k^{(t)} describing each hypothesis k can be solved for independently of all the other hypotheses k' \neq k. For the case of an MMS hypothesis with \Theta_k^{(t)} = x_{l,k}^{(t)}, the M-step can be performed with a grid search for the optimal reference location x_{l,k}^{(t)} of each hypothesis k, with all the other MMS shape parameters fixed, as learned from the human preposition dataset:

x_{l,k}^{(t)} = \arg\max_{x_{l,k}} \sum_{n} \gamma_{nk}^{(t)} \ln P_k(x_n). \quad (38)

In each EM iteration step t, the likelihood of the samples is guaranteed to increase unless a local optimum has been reached: p(S \mid \Theta^{(t)}) \geq p(S \mid \Theta^{(t-1)}) [54].
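The E-step in (32) and the grid-search M-step in (38) can be illustrated with a one-dimensional toy implementation. This is a sketch only: a fixed-shape Gaussian stands in for the learned MMS likelihood, and a small landmark list plays the role of the reference dictionary.

```python
import math

def em_mos(samples, landmarks, K=2, sigma=1.0, iters=50):
    """EM sketch for the MoS objective: E-step responsibilities as in (32),
    grid-search M-step over candidate reference locations as in (38).

    1-D toy version: each statement's likelihood P_k is a fixed-shape
    Gaussian centered on a landmark (a stand-in for the MMS shape model).
    """
    def lik(x, c):
        return math.exp(-0.5 * ((x - c) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    centers = landmarks[:K]            # initial reference locations
    weights = [1.0 / K] * K            # mixture weights pi_k
    for _ in range(iters):
        # E-step: soft memberships gamma_nk, normalized over hypotheses k.
        gammas = []
        for x in samples:
            joint = [w * lik(x, c) for w, c in zip(weights, centers)]
            s = sum(joint) or 1e-300
            gammas.append([j / s for j in joint])
        # M-step: grid search each reference location over the landmark
        # dictionary, holding the shape parameter sigma fixed.
        for k in range(K):
            centers[k] = max(
                landmarks,
                key=lambda c: sum(g[k] * math.log(lik(x, c) + 1e-300)
                                  for x, g in zip(samples, gammas)),
            )
        weights = [sum(g[k] for g in gammas) / len(samples) for k in range(K)]
    return centers, weights
```

On samples drawn around two of the landmarks, the grid search snaps each reference location onto the nearest landmark, and the weights recover the cluster proportions.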
[18] J. Frost, "Mapping spatial language to sensor models," in Proc. Comput. Lab. Student Conf., 2009, p. 8.
[19] J. Frost, A. Harrison, S. Pulman, and P. Newman, "A probabilistic approach to modelling spatial language with its application to sensor models," in Proc. Workshop Comput. Models Spatial Lang. Interpretation Spatial Cogn., 2010, pp. 1–8.
[20] A. N. Bishop and B. Ristic, "Fusion of spatially referring natural language statements with random set theoretic likelihoods," IEEE Trans. Aerosp. Electron. Syst., vol. 49, no. 2, pp. 932–944, Apr. 2013.
[21] A. N. Bishop and B. Ristic, "Fusion of natural language propositions: Bayesian random set framework," in Proc. 14th Int. Conf. Inf. Fusion, 2011, pp. 1–8.
[22] A. N. Bishop and B. Ristic, "Spatially referring natural language propositions: Information fusion and estimation theory," in Proc. Workshop Defense Appl. Signal Process. (DASP), 2011, pp. 1–11.
[23] C. Matuszek, L. Bo, L. Zettlemoyer, and D. Fox, "Learning from unscripted deictic gesture and language for human-robot interactions," in Proc. 28th AAAI Conf. Artif. Intell., 2014, pp. 2556–2563.
[24] C. Matuszek, N. Fitzgerald, L. Zettlemoyer, L. Bo, and D. Fox, "A joint model of language and perception for grounded attribute learning," in Proc. 29th Int. Conf. Mach. Learn., 2012, pp. 1671–1678.
[25] S. Tellex et al., "Understanding natural language commands for robotic navigation and mobile manipulation," in Proc. 25th AAAI Conf. Artif. Intell., 2011, pp. 1507–1514.
[26] S. Tellex, P. Thaker, J. Joseph, M. R. Walter, and N. Roy, "Toward learning perceptually grounded word meanings from unaligned parallel data," in Proc. 2nd Workshop Semantic Interpretation Actionable Context, 2012, pp. 7–14.
[27] S. Tellex, P. Thaker, J. Joseph, and N. Roy, "Learning perceptually grounded word meanings from unaligned parallel data," Mach. Learn., vol. 94, no. 2, pp. 151–167, 2014.
[28] C. Matuszek, E. Herbst, L. Zettlemoyer, and D. Fox, "Learning to parse natural language commands to a robot control system," in Experimental Robotics. New York, NY, USA: Springer, 2013, pp. 403–415.
[29] C. Matuszek, D. Fox, and K. Koscher, "Following directions using statistical machine translation," in Proc. 5th ACM/IEEE Int. Conf. Hum.-Robot Interact., 2010, pp. 251–258.
[30] E. Krahmer and K. Van Deemter, "Computational generation of referring expressions: A survey," Comput. Linguistics, vol. 38, no. 1, pp. 173–218, 2012.
[31] J. Kelleher and G. M. Kruijff, "A context-dependent model of proximity in physically situated environments," in Proc. ACL-SIGSEM Workshop Linguistic Dimensions Prepositions Their Use Comput. Linguistics Formalisms Appl., 2005, pp. 1–8.
[32] R. Fang, M. Doering, and J. Y. Chai, "Collaborative models for referring expression generation in situated dialogue," in Proc. 28th AAAI Conf. Artif. Intell., 2014, pp. 1544–1550.
[33] C. Liu, R. Fang, L. She, and J. Y. Chai, "Modeling collaborative referring for situated referential grounding," in Proc. ACL SIGDIAL, 2013, pp. 78–86.
[34] A. Sadovnik, A. Gallagher, and T. Chen, "Not everybody's special: Using neighbors in referring expressions with uncertain attributes," in Proc. 2013 IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, 2013, pp. 269–276.
[35] S. Kazemzadeh, V. Ordonez, M. Matten, and T. L. Berg, "Referitgame: Referring to objects in photographs of natural scenes," in Proc. ACL Conf. Empirical Methods Natural Language Process., 2014, pp. 787–798.
[36] A. Sadovnik, Y.-I. Chiu, N. Snavely, S. Edelman, and T. Chen, "Image description with a goal: Building efficient discriminating expressions for images," in Proc. 2012 IEEE Conf. Comput. Vis. Pattern Recognit., 2012, pp. 2791–2798.
[37] E. Reiter and R. Dale, "A fast algorithm for the generation of referring expressions," in Proc. 14th Conf. Comput. Linguistics, Volume 1, 1992, pp. 232–238.
[38] A. Stent and S. Bangalore, Natural Language Generation in Interactive Systems. Cambridge, U.K.: Cambridge Univ. Press, 2014.
[39] J. Viethen and R. Dale, "The use of spatial relations in referring expression generation," in Proc. 5th Int. Nat. Lang. Gener. Conf., 2008, pp. 59–67.
[40] R. Dale and J. Viethen, "Referring expression generation through attribute-based heuristics," in Proc. 12th Eur. Workshop Natural Lang. Gener., 2009, pp. 58–65.
[41] V. Mast, D. C. Vale, and Z. Falomir, "Enabling grounding dialogues through probabilistic reference handling," in Proc. RefNet Workshop Psychol. Comput. Models Reference Comprehension Prod., Edinburgh, U.K., Aug. 2014, pp. 1–3.
[42] V. Mast and D. Wolter, "A probabilistic framework for object descriptions in indoor route instructions," in Proc. Spatial Inf. Theory, 2013, pp. 185–204.
[43] N. L. Green, "Analysis of communication of uncertainty in genetic counseling patient letters for design of a natural language generation system," Social Semiotics, vol. 20, no. 1, pp. 77–86, 2010.
[44] C. Huang, "Risk analysis with information described in natural language," in Proc. 7th Int. Conf. Comput. Sci. III, 2007, pp. 1016–1023.
[45] C. R. Fox and J. R. Irwin, "The role of context in the communication of uncertain beliefs," Basic Appl. Social Psychol., vol. 20, no. 1, pp. 57–70, 1998.
[46] R. Tse, G. Seet, and S. K. Sim, "Recognition of human intentions using Bayesian artificial intelligence," in Proc. ASME IMECE, 2007, pp. 699–707.
[47] K. A. Tahboub, "Intelligent human-machine interaction based on dynamic Bayesian networks probabilistic intention recognition," J. Intell. Robot. Syst., vol. 45, no. 1, pp. 31–52, 2006.
[48] R. A. Knepper, S. Tellex, A. Li, N. Roy, and D. Rus, "Recovering from failure by asking for help," Auton. Robots, vol. 39, no. 3, pp. 347–362, 2015.
[49] S. Tellex, R. A. Knepper, A. Li, D. Rus, and N. Roy, "Asking for help using inverse semantics," in Proc. Robot. Sci. Syst. Conf., vol. 7, Berkeley, CA, USA, 2014, p. 24.
[50] R. Tse and M. Campbell, "Human-robot information sharing with structured language generation from probabilistic beliefs," in Proc. 2015 IEEE/RSJ Int. Conf. Intell. Robots Syst., 2015, pp. 1242–1248.
[51] N. Ahmed and M. Campbell, "On estimating simple probabilistic discriminative subclass models," Expert Syst. Appl., vol. 39, pp. 6659–6664, 2012.
[52] N. Ahmed and M. Campbell, "Multimodal operator decision models," in Proc. Amer. Control Conf., 2008, pp. 4504–4509.
[53] N. R. Ahmed, R. Tse, and M. Campbell, "Enabling robust human-robot cooperation through flexible fully Bayesian shared sensing," in Proc. AAAI Spring Symp. Series, Int. Robust Intell. Trust Auton. Syst., Stanford, CA, USA, 2014, pp. 2–10.
[54] C. M. Bishop, Pattern Recognition and Machine Learning. New York, NY, USA: Springer, 2006.
[55] A. Corduneanu and C. M. Bishop, "Variational Bayesian model selection for mixture distributions," in Proc. Artif. Intell. Statist., 2001, pp. 27–34.
[56] P. Smyth, "Model selection for probabilistic clustering using cross-validated likelihood," Statist. Comput., vol. 10, no. 1, pp. 63–72, 2000.
[57] X. Hu and L. Xu, "Investigation on several model selection criteria for determining the number of cluster," Neural Inf. Process. Lett. Rev., vol. 4, no. 1, pp. 1–10, 2004.
[58] D. M. Blei and M. I. Jordan, "Variational inference for Dirichlet process mixtures," Bayesian Analysis, vol. 1, no. 1, pp. 121–143, 2006.
[59] Y. W. Teh, "Dirichlet process," in Encyclopedia of Machine Learning. New York, NY, USA: Springer, 2011, pp. 280–287.
[60] R. M. Neal, "Markov chain sampling methods for Dirichlet process mixture models," J. Comput. Graphical Statist., vol. 9, no. 2, pp. 249–265, 2000.
[61] E. Reiter and A. Belz, "An investigation into the validity of some metrics for automatically evaluating natural language generation systems," Comput. Linguistics, vol. 35, no. 4, pp. 529–558, 2009.
[62] J. Lin, "Divergence measures based on the Shannon entropy," IEEE Trans. Inf. Theory, vol. 37, no. 1, pp. 145–151, Jan. 1991.
[63] D. Klein and C. D. Manning, "Fast exact inference with a factored model for natural language parsing," in Adv. Neural Inf. Process. Syst., 2003, pp. 3–10.
[64] K. Toutanova, D. Klein, C. D. Manning, and Y. Singer, "Feature-rich part-of-speech tagging with a cyclic dependency network," in Proc. 2003 Conf. North Amer. Chapter Assoc. Comput. Linguistics Hum. Lang. Technol., 2003, pp. 173–180.
[65] V. Demberg, J. Hoffmann, D. M. Howcroft, D. Klakow, and A. Torralba, "Search challenges in natural language generation with complex optimization objectives," KI-Künstliche Intelligenz, vol. 30, no. 1, pp. 63–69, 2016.
Rina Tse (M'16) received the B.Eng. degree in mechanical engineering with mechatronics specialization and the M.Eng. degree in computer engineering from Nanyang Technological University, Singapore, and the M.S. and Ph.D. degrees in mechanical engineering from Cornell University, Ithaca, NY, USA.
She is currently a Postdoctoral Research Associate with the Autonomous Systems Lab, Cornell University. Her research interests include Bayesian machine learning, clustering, data fusion, and human-robot interaction.
Dr. Tse was the recipient of the Singapore Government Scholarship, and Cornell University's Olin and Walter Schonlenk Ph.D. Fellowships.

Mark Campbell (F'18) received the B.S. degree in mechanical engineering from Carnegie Mellon University, Pittsburgh, PA, USA, and the M.S. and Ph.D. degrees in control and estimation from the Massachusetts Institute of Technology, Cambridge, MA, USA, in 1993 and 1996, respectively.
He is currently the John A. Mellowes '60 Professor and the S. C. Thomas Sze Director of the Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, NY, USA. In 2005–2006, he was a Visiting Scientist with the Insitu Group and an ARC International Fellow with the Australian Centre of Field Robotics. His research interests are in the areas of autonomous systems.
Dr. Campbell was the recipient of best paper awards from the AIAA Propulsion and GNC Conferences and the Frontiers in Education Conference, and teaching awards from Cornell, the University of Washington, and ASEE. He is an Associate Fellow of the AIAA.