
A Bayesian Based Machine Learning Application to Task Analysis

that have arguably little face validity. Such an approach also helps balance out the effect of many biasing factors such as individual differences among the knowledge agents, unusual work activities followed by a particular knowledge agent, or sampling problems related to collecting data from only a few or particular groups of customers with specific questions.

2. The next step is to document the agents’ troubleshooting process and define the subtask categories. The assigned subtasks can then be manually placed into either the training set or the testing set. Development of subtask definitions will normally require inputs from human experts. In addition to the use of transcribed verbal protocols, the analyst might consider using synchronized forms of time-stamped video/audio/data streams.

3. In our particular study, the fuzzy Bayes model was developed (Lin, 2006) during the text mining process to describe the relation between the verbal protocol data and assigned subtasks. For the fuzzy Bayes method, the expression below is used to classify subtasks into categories:

$$P(S_i \mid E) = \max_j \frac{P(E_j \mid S_i)\,P(S_i)}{P(E_j)}$$

where $P(S_i \mid E)$ is the posterior probability that subtask $S_i$ is true given that the evidence $E$ (words used by the agent and the customer) is present,
$P(E_j \mid S_i)$ is the conditional probability of obtaining the evidence $E_j$ given that the subtask $S_i$ is true,
$P(S_i)$ is the prior probability of the subtask being true prior to obtaining the evidence $E_j$, and
“MAX” is used to assign the maximum value of the calculated $P(E_j \mid S_i)\,P(S_i)/P(E_j)$.

When the agent performs subtask $A_i$, the words used by the agent and the customer are expressed by the word vectors

$WA_i = (WA_{i1}, WA_{i2}, \ldots, WA_{iq})$ and $WC_i = (WC_{i1}, WC_{i2}, \ldots, WC_{iq})$ respectively,

where $WA_{i1}, WA_{i2}, \ldots, WA_{iq}$ are the $q$ words in the $i$th agent’s dialog/narrative;
$WC_{i1}, WC_{i2}, \ldots, WC_{iq}$ are the $q$ words in the $i$th customer’s dialog/narrative.

$A_i$ is considered potentially relevant to $WA_i$, $WC_{i-1}$, and $WC_i$ for $i$ greater than 1.

The posterior probability of subtask $A_i$ is calculated as follows:

$$
\begin{aligned}
P(A_i \mid WA_i, WC_i, WC_{i-1})
&= \max\left[\frac{P(WA_i \mid A_i)\,P(A_i)}{P(WA_i)},\ \frac{P(WC_i \mid A_i)\,P(A_i)}{P(WC_i)},\ \frac{P(WC_{i-1} \mid A_i)\,P(A_i)}{P(WC_{i-1})}\right] \\
&= \max\left[\max_j \frac{P(WA_{ij} \mid A_i)\,P(A_i)}{P(WA_{ij})},\ \max_j \frac{P(WC_{ij} \mid A_i)\,P(A_i)}{P(WC_{ij})},\ \max_j \frac{P(WC_{(i-1)j} \mid A_i)\,P(A_i)}{P(WC_{(i-1)j})}\right]
\end{aligned}
$$

for $j = 1, 2, \ldots, q$.

To develop the model, keywords were first parsed from the training set to form a knowledge base. The Bayesian based machine learning tool then learned from the knowledge base. This involved determining combinations of words appearing in the narratives that could be candidates for subtask category predictors. These words were then used to predict subtask categories, which was the output of the fuzzy Bayes model.

4. The fuzzy Bayes model was tested on the test set and the model performance was evaluated in terms of hit rate, false alarm rate, and sensitivity value. The model training and testing processes were repeated ten times to allow cross-validation of the accuracy of the predicted results. The testing results showed that the average hit rate (56.55%) was significantly greater than the average false alarm rate (0.64%), and that the sensitivity value of 2.65 was greater than zero.

FUTURE TRENDS

The testing results reported above suggest that the fuzzy Bayesian based model is able to learn and accurately predict subtask categories from the telephone conversations between the customers and the knowledge agents. These results are encouraging given the complexity of the tasks addressed. That is, the problem domain included 24 different agents, 55 printer models, 75 companies, 110 customers, and over 70 technical issues. Future studies are needed to further evaluate model performance, including topics such as alternative groupings of subtasks and words, as well as the use of word sequences. Other research opportunities include further development and exploration of a variety of Bayesian models, as well as comparison of model
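
As an illustration of the fuzzy Bayes expression in step 3, the following minimal Python sketch scores each subtask by the largest single-word value of P(Ej|Si)*P(Si)/P(Ej) and predicts the subtask with the highest score. It is not the tool used in the study: the function names, the toy dialogs, and the subtask labels are hypothetical, and the probabilities are assumed to be plain relative frequencies over training dialogs, with no smoothing and with unseen words ignored.

```python
from collections import Counter, defaultdict

def train(examples):
    """Count subtasks, words, and word/subtask co-occurrences.

    examples: list of (words, subtask) pairs; each word is counted
    at most once per dialog (an assumption of this sketch).
    """
    n = len(examples)
    subtask_count = Counter()
    word_count = Counter()
    joint_count = defaultdict(Counter)  # joint_count[subtask][word]
    for words, subtask in examples:
        subtask_count[subtask] += 1
        for w in set(words):
            word_count[w] += 1
            joint_count[subtask][w] += 1
    return n, subtask_count, word_count, joint_count

def fuzzy_bayes_scores(words, model):
    """Return, per subtask, the max over words of P(w|S) * P(S) / P(w)."""
    n, subtask_count, word_count, joint_count = model
    scores = {}
    for s, s_n in subtask_count.items():
        prior = s_n / n                           # P(S)
        best = 0.0
        for w in set(words):
            if w not in word_count:               # unseen word: no evidence
                continue
            likelihood = joint_count[s][w] / s_n  # P(w | S)
            evidence = word_count[w] / n          # P(w)
            best = max(best, likelihood * prior / evidence)
        scores[s] = best
    return scores

def predict(words, model):
    """Pick the subtask with the largest fuzzy Bayes score."""
    scores = fuzzy_bayes_scores(words, model)
    return max(scores, key=scores.get)

# Hypothetical toy training set (not data from the study).
examples = [
    (["printer", "offline", "network"], "diagnose"),
    (["reinstall", "driver"], "resolve"),
    (["driver", "download", "install"], "resolve"),
    (["error", "code", "display"], "diagnose"),
]
model = train(examples)
query = ["please", "reinstall", "the", "driver"]
scores = fuzzy_bayes_scores(query, model)
predicted = predict(query, model)
```

Because the posterior for subtask Ai is the maximum over the agent's words, the customer's words, and the preceding customer's words, the three-vector form can be obtained in this sketch by scoring each word vector separately and keeping the largest score per subtask.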

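The text does not say how the sensitivity value in step 4 was computed. Assuming it is the signal detection index d' (the z-transformed hit rate minus the z-transformed false alarm rate), the reported average rates can be checked with the short sketch below. The function name `d_prime` is hypothetical, and the calculation treats the averaged rates as a single pair of proportions, which may differ from averaging per-run sensitivity values.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Signal detection sensitivity: z(hit rate) - z(false alarm rate)."""
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(false_alarm_rate)

# Average rates reported in step 4: 56.55% hits, 0.64% false alarms.
sensitivity = d_prime(0.5655, 0.0064)
print(round(sensitivity, 2))
```

Under this assumption the result comes out close to the reported 2.65, which suggests, but does not confirm, that the chapter's sensitivity value is d'.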

