


Can we Improve the User Experience of
Chatbots with Personalisation?

University of Amsterdam

A thesis submitted for the degree of


Master of Science

Daniëlle Duijst
Student number: 11266988

First reader:
Dr. Jacobijn Sandberg

Second reader:
Dr. Dan Buzzo

Supervisor Capgemini:
Drs. Wouter Slager

July 13, 2017

Can we improve the User Experience of Chatbots with
Personalisation?

Daniëlle Duijst
11266988
University of Amsterdam
Science Park 904
danielle.duijst@student.uva.nl

ABSTRACT
A conversational interface is an interface that the user can interact with by means of a conversation. The conversation can occur by speech but also by text input. When a conversational interface uses text, it is also described as a chatbot or a conversational agent. In this study, the user experience factors of these so-called chatbots were investigated. The main research question is "What is the added value of personalisation for the user experience of chatbots?". A variety of user experience frameworks and evaluation methods are described in the literature review. Two kinds of chatbots have been designed, one with and one without personalisation factors. The design of this research is a two-by-two factorial design. The independent variables are the two chatbots (unpersonalised versus personalised) and the specific task or goal the user can achieve with the chatbot in the financial field (a simple versus a complex task). The results are that there was no significant interaction effect between personalisation and task on the user experience of chatbots. A significant difference was found between the two tasks with respect to the user experience of chatbots; however, this difference was not due to personalisation. In our study, we also created a satisfaction construct which appeared to be reliable.

Keywords
User Experience, Chatbots, Conversational Interfaces, Personalisation, Finance, Fintech

1. INTRODUCTION
Chatbots are computer programs that are written to simulate a conversation with the user by means of auditory or textual inputs. Recently, the popularity of using chatbots within companies has increased (Knight, 2016). Companies are advancing technically, and they research and explore more possibilities in the area of artificial intelligence and machine learning (Press, 2016). When using a new technology, such as chatbots, it is important to know how to implement this technology in a good way. One way to measure the quality of a technology is by looking at the user experience. Also with conversational interfaces, in particular chatbots, user experience plays a big part (O'Brien, 2017). The user experience of machine learning applications is becoming a more popular research topic because these technologies did not experience as many design innovations as other technologies have (Dove, Halskov, Forlizzi, & Zimmerman, 2017). Therefore, this study focuses on researching the kind of factors which influence the user experience of chatbots. Previous research has shown that personalisation factors influence the user experience (Fan & Poole, 2006). Personalisation is "a process that changes the functionality, interface, information access and content, or distinctiveness of a system to increase its personal relevance to an individual or a category of individuals" (Fan & Poole, 2006).
Because of the gap in research about the user experience of machine learning applications, and the potential of personalisation, we chose the following research question: "What is the added value of Personalisation for the User Experience of Chatbots?". The research is conducted at the Financial Services business unit of Capgemini (Capgemini, 2017).
The structure of this document is as follows. Firstly, we discuss the relevant literature. Secondly, we present the research question and hypotheses. Thirdly, we explain our methodology and our approach. Fourthly, the results are presented. Thereafter, the results are discussed.

2. RELATED WORK

2.1 Chatbots
A conversational interface is an interface that the user can interact with by means of a conversation. This can be via speech, but also via typed natural language. The terms Natural User Interface (NUI) and Conversational Interface are sometimes used interchangeably. A NUI is an interface where you interact by using natural inputs like speech, touch and hand gestures (Wigdor & Wixon, 2011). Sometimes, NUIs can recognise faces, their environment or emotions (Kaushik, Jain, et al., 2014). A chatbot is an example of a conversational interface. When we speak about chatbots, or conversational agents, we mean a computer program that is written to simulate a conversation with the user by means of auditory or textual inputs. In this study, we focus solely on chatbots that use textual inputs. Chatbots are used online and are sometimes driven by artificial intelligence.

2.1.1 Implementations of chatbots
Chatbots can be used in a wide range of fields, such as education (Letzter, 2016), information retrieval (Shawar, Atwell, & Roberts, 2005), business and e-commerce (Chai et al., 2001), and customer service (MarutiTechlabs, 2017).

In their study, Tatai et al. (2003) compared implementations of chatbots and identified three main roles of chatbots, namely:

• Role of a Digital Assistant;
• Role of an Information Provider;
• Role of a General Chatbot.

(Tatai, Csordás, Kiss, Szaló, & Laufer, 2003)

Examples of digital assistant chatbots are the one from IKEA, which was launched in 2005 (see https://www.youtube.com/watch?v=rRY5XtLpNuU), and Niki.ai, which helps you shop through chat (Niki, 2017).
Chatbots can be used to improve the communication between doctor and patient (DP) and between clinic and patient (CP). The researchers claim that using chatbots for this purpose (DP and CP communication) could reduce costs and time spent on routine operations (Abashev, Grigoryev, Grigorian, & Boyko, 2016). More implementations of chatbots are: credit score coach, lawyer, personal stylist, food orderer, personal concierge, doctor, pension or finance advisor, teacher, newsreader, toy, accountant, and lastly, as a partner (Jee, 2017).

2.1.2 Chatbots in The Netherlands
In The Netherlands, companies are also starting to use chatbots, for example on their Facebook pages or company websites (Adformatie, 2017; Emerce, 2016; Goodfellow, 2017; ASR, 2017). Another example of an implementation of chatbots is Dawnbot, a Dutch chatbot that helps you create a video advertisement (Dawnbot, 2017). The user answers the questions Dawnbot asks by selecting the corresponding emoji. A possible advantage of communicating with emojis could be that the interaction is quicker and more efficient. On the other hand, miscommunications can occur between the user and the maker of the system, since emojis can be interpreted differently (Miller et al., 2016).

2.1.3 Customer Service: Finance & Insurance Chatbots
Chatbots can be found in different domains such as customer service, e-commerce, insurance, healthcare, retail and more. 95% of the respondents of a chatbot survey believed that the customer service domain is going to be 'the major beneficiary of chatbots' (Mindbowser, 2017). We chose this field to illustrate the financial and insurance domain specifically.
In the field of insurance, we also see a rise of chatbots. Lemonade is an insurance company from the US. Lemonade claims that it transforms the business model of insurance by "injecting technology and transparency into an industry that often lacks both". They say that they provide a fast, affordable and hassle-free insurance experience (Lemonade, 2017).
The expert virtual insurance agent Evia was released in 2016. Evia is created by Insurify, a spinoff company from MIT (Buhr, 2016). Evia tries to find you a car insurance by using a photo of your license plate. The makers of Evia claim that they will match you with the right coverage package for your situation and needs, will monitor rates for you, will gather multiple quotes from companies in one place and will skip the process of filling in long, confusing forms.
An example of a chatbot implementation in the financial field is K2 Bank. This chatbot checks your balance and your recent history of transactions and can make simple money transfers (Lipiec, 2017).

2.1.4 Conversational aspects
Human-system dialogue consists of the inquirer (the user), looking for information, and the expert (the system), providing information. There are two ways that chatbots can converse with users: system-initiated chatbots, where the system leads the conversation, and user-initiated chatbots, where the user leads the conversation. Systems that contain both methods of initiation are called mixed initiative systems (Hung, Elvir, Gonzalez, & DeMara, 2009).
To understand how a conversational interface should be represented, it is important to investigate how human dialogues work. Quarteroni et al. (2006) researched the aspects and issues related to human dialogues and proposed a list of essential items for an interactive question and answering system:

• Context Maintenance: utilising the context of the conversation to correctly interpret the user's input. This is important for follow-up questions, or for clarification.
• Utterance Understanding: the detection of follow-up and clarification within the context of the previous conversation.
• Mixed Initiative: the user should be able to take initiative within the conversation (by quitting, or asking questions).
• Follow-up Proposal: meaning that the system motivates the user to give feedback on the answers that the system gives (if the user is satisfied or not), until the user has achieved his or her goal.
• Natural Interaction: covering and generating a variety of utterances to create a smooth conversation and to keep the dialogue active.

(Quarteroni & Manandhar, 2009)

McTear (2016) reviewed key features of conversation in chapter 3 of his book on conversational interfaces. He describes that the following aspects are important when designing conversational interfaces:

• Conversation as action: meaning that utterances of users could be seen as actions that speakers carry out to achieve a goal.
• The structure of conversation: regarding how utterances from a conversation relate to each other. Examples of ways to recognise structure in dialog acts can include adjacency pairs, exchanges, discourse segments and conversational games.
• Conversation as a joint activity: describes how two parties take turns and reduce the risk of a miscommunication by using the grounding process in their conversation.
• Conversational repair: a repairing process that can be initiated by one of the two parties in the conversation. Sometimes the speaker repairs his own utterances before the receiver has time to repair them.

• The language of conversation: the tone of voice in a spoken text can be a way to convey additional information, such as emotions and affect, for example when persons raise their voice when they are angry. For written text, emotions and affect can also be conveyed, for example by using emoticons or capital letters.

(McTear, Callejas, & Griol, 2016)

Incremental processing is an important process in human-to-human conversation (McTear et al., 2016). Incremental processing means that overlap occurs within a conversation. In human-machine interaction, a latency between turns is present. This is one reason why human-computer interaction can sometimes feel less natural than human-to-human interaction. Another benefit of incremental processing is that the dialog becomes more fluent and efficient (McTear et al., 2016). Google Search applies incremental processing by completing the user's query during typing, and Voice Search by showing the recognised words while the user is still speaking.

2.1.5 Dialogue Management Strategy
McTear (2016) describes that one of the core aspects of conversational interfaces is the design of the dialogue management strategy, in which the system's conversational behaviour is defined. The design of the dialogue management strategy was done manually in the past, but the research community has found ways to automate this process by training the model with real conversations (McTear et al., 2016).
Two central design choices for dialogues in chatbots are the interaction strategy and the choice of a confirmation strategy (McTear et al., 2016). Automatic speech recognition is not always accurate, but by asking the user for confirmation or reprompting them, some errors could be avoided. Asking for confirmation too often can, however, also be annoying.
There are three types of interaction strategies in chatbots, namely: user-initiated, system-initiated or mixed initiative. Limitations of user-initiated dialogues are errors in speech recognition and understanding, since users can say anything they want. The limitation of system-initiated dialogues is that the user's input is limited, but the advantage is that the interaction is more efficient. The advantage of a mixed-initiative dialogue is that the system can guide the user, but the user is also free to say anything he wants and take initiative, ask questions and introduce new topics. The limitations are that the system has to be technically advanced to keep track of its own structure/agenda, understand and answer the user's utterances correctly and remember the relevant information said.
Confirmation strategies are strategies to prevent errors in recognition and understanding of the user's utterances. The disadvantage of using these confirmations is that they can make the interaction inefficient, repetitive and lengthy, which can eventually lead to a frustrating user experience. A solution for these problems is to create implicit confirmations. In this way, the user's input is used in the next system output and extra information is added. To clarify the difference between implicit and explicit confirmation, we give the following example. When the user says the price of the house he wants to buy is €180.000, the system could say: "So the house you want to buy is €180.000?", which is an explicit confirmation. An implicit confirmation of the system would be: "So the house you want to buy costs €180.000. How much will an optional refurbishment add to the costs?". A limitation of the implicit confirmation strategy is that the user is responsible for correcting the system when it did not recognise the user's utterances correctly (McTear et al., 2016).
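To make the difference concrete, below is a minimal sketch (ours, not taken from the thesis) of the two confirmation styles, reusing the house-price example above; the function names and exact wording are illustrative.

```python
def explicit_confirmation(price: str) -> str:
    # Repeat the recognised value back to the user and ask them to confirm it.
    return f"So the house you want to buy is {price}?"

def implicit_confirmation(price: str) -> str:
    # Embed the recognised value in the next question; the user only has to
    # object when the value was recognised incorrectly.
    return (f"So the house you want to buy costs {price}. "
            "How much will an optional refurbishment add to the costs?")

if __name__ == "__main__":
    print(explicit_confirmation("€180.000"))
    print(implicit_confirmation("€180.000"))
```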
2.1.6 Maturity Model Chatbots
Smiers (2017) describes levels of maturity of chatbots based on three categories, namely: interaction, intelligence and integration (see Figure 1). The interaction area describes that the user experience of chatbots is different from the user experience of websites, since the interaction with chatbots is done through textual input. The intelligence area describes the capability of the chatbot to understand and provide a relevant utterance which is in line with the intent of the user. The integration area is about the back-end of the chatbot and how well it is integrated with other websites, servers or services from other websites and applications (Smiers, 2017).

2.2 User Experience
User experience is a modern concept, which includes "all the users' emotions, beliefs, preferences, perceptions, physical and psychological responses, behaviours and accomplishments that occur before, during and after use" (ISO 9241-210, 2010). This broadness implies that many aspects are involved in the user experience. To scope our study, we divide user experience into three basic needs: usefulness, usability and user satisfaction.

2.2.1 Usefulness
The Technology Acceptance Model (TAM) is developed to test the user acceptance of information systems. Questions of the standardized TAM questionnaire measure the perceived usefulness and the perceived ease of use of information systems (Davis, 1985). The 10 items that measure usefulness are:

1. Using this product improves the quality of the work I do.
2. Using this product gives me greater control over my work.
3. This product enables me to accomplish tasks more quickly.
4. This product supports critical aspects.
5. This product increases my productivity.
6. This product improves my job performance.
7. This product allows me to accomplish more work than would otherwise be possible.
8. This product enhances my effectiveness on the job.
9. This product makes it easier to do my job.
10. Overall, I find this product useful in my job.

The other half of the TAM questionnaire measures the ease of use.

Figure 1: Maturity model for chatbots by Smiers (2017)

2.2.2 Usability
Usability is focused on the performance of the task and is an integrated aspect within user experience (Kaye, 2007). To measure usability, a popular post-test that could be used is the System Usability Scale (Brooke, 1985). This SUS questionnaire is described as a 'quick and dirty' method, because the execution is relatively fast: the questionnaire only consists of 10 questions which the users have to answer without thinking too much about them. Another advantage of using the SUS method is that no license is needed and a score can be calculated which can be easily compared. The SUS provides an easy scoring system. These scores can also be compared to the scores that the researchers identified; the mean of the SUS scores that the researchers found is 68. With this scoring system, we can compare the score of chatbots to other information systems.
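The SUS scoring rule itself is not spelled out above; for reference, the sketch below shows the commonly used calculation (odd items are positively worded, even items negatively worded, and the sum is rescaled to 0-100), so it is visible how a score comparable to the reported mean of 68 comes about.

```python
def sus_score(responses: list[int]) -> float:
    """System Usability Scale score from ten Likert responses on a 1-5 scale.

    Odd-numbered items are positively worded (contribution = response - 1),
    even-numbered items are negatively worded (contribution = 5 - response);
    the summed contributions are rescaled to a 0-100 range.
    """
    assert len(responses) == 10, "SUS has exactly ten items"
    total = sum((r - 1) if i % 2 == 1 else (5 - r)
                for i, r in enumerate(responses, start=1))
    return total * 2.5

# Example: a fairly positive set of answers lands above the reported mean of 68.
print(sus_score([4, 2, 4, 2, 5, 1, 4, 2, 4, 2]))  # 80.0
```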
Finstad (2010) investigated the Usability Metrics for User Experience (UMUX), which consists of four questions that are highly correlated (0.8) with the SUS scale. Therefore Finstad claims that the UMUX can be used as a standalone usability metric (Finstad, 2010). The ISO definition of usability is about the effectiveness, efficiency and satisfaction of a system. The four Likert scales presented in the UMUX questionnaire represent the ISO definition; they are:

1. The system's capabilities meet my requirements;
2. Using this system is a frustrating experience;
3. This system is easy to use;
4. I have to spend too much time correcting things with this system (Finstad, 2010).

Lewis (2013) confirms in a critical review of the UMUX questionnaire that researchers who need a shorter questionnaire than SUS should consider using the UMUX questionnaire (Lewis, 2013).
SUPR-Q (Standardized Universal Percentile Rank Questionnaire) is a rating scale which measures the perception of the user with respect to usability, credibility, trust, appearance, and loyalty for websites (Sauro & Lewis, 2016). The reliability of the SUPR-Q scores is all above 0.7, which means that this scale is reliable. The limitation of this questionnaire is that it mainly focuses on websites and the goals which can be achieved on websites. Therefore we think it is not an easy questionnaire to apply to chatbots. For example, the question "It is easy to navigate within the website" is hard to apply to chatbots, since navigation plays a different role in chatbots than on websites.

2.2.3 Satisfaction
McTear describes that the most popular methodology for performing overall system evaluation is the PARAdigm for DIalogue System Evaluation (PARADISE) (McTear et al., 2016). This method evaluates the user satisfaction on a scale, after the users have interacted with the system. Hung et al. (2009) propose evaluation metrics derived from PARADISE in their paper. These evaluation metrics reflect the dialog performance and task success as described above; see Table 6 in the Appendix.
Quarteroni et al. evaluated their Interactive Question & Answering system, YourQA, by collecting user feedback (Quarteroni & Manandhar, 2009). One way they collected user feedback was by gathering objective information from chat logs. Also, they collected information about the user experience by using a questionnaire about a Wizard of Oz experiment. The questions were inspired by a WOz experiment from (Munteanu & Boldea, 2000). See Table 1 for their questionnaire. In this questionnaire, Q1 and Q2 assess the performance, Q3 and Q4 focus on interaction difficulties, and Q5 and Q6 focus on the overall satisfaction of the user. Q7 and Q8 were taken from the PARADISE evaluation metrics (Walker, Litman, Kamm, & Abella, 1997).
Q9 was chosen by the researchers to measure the perceived difference between their two prototypes. The result of their evaluation was that users tend to be more satisfied with the interactive version of YourQA than with the baseline version. The questionnaire they used appeared to be sufficient for achieving their evaluation goals.
The Questionnaire for User Interaction Satisfaction (QUIS) assesses the user's subjective satisfaction with respect to specific aspects of the human-computer interface (Chin, Diehl, & Norman, 1988). QUIS measures attitude towards eleven interface factors: screen factors, terminology and system feedback, learning factors, system capabilities, technical manuals, on-line tutorials, multimedia, voice recognition, virtual environments, internet access, and software installation. Each area measures the users' overall satisfaction with that facet of the interface, as well as the factors that make up that facet, on a 10-point scale. The advantage of the QUIS is that the researchers successfully created a measure that is highly reliable across many types of interfaces. Therefore we can say that this questionnaire could also be reliable if we apply it to a conversational interface, such as a chatbot.
User satisfaction with search dialogues has been defined by analysing real dialogues of a commercial intelligent assistant (Kiseleva et al., 2016). The researchers also found that dialogues consist of a single-task search dialogue or a multi-task search dialogue. The user satisfaction was measured by the aggregation of all the dialogue's tasks, rather than the dialogue's queries separately, because of the dependency between the tasks and because the context of the dialogue is also important for the user satisfaction (Kiseleva et al., 2016). This means that when researching the user experience of chatbots, the whole task should be used as a measurement for the user experience instead of separate tasks (due to the dependency between tasks).

2.2.4 Industry defined metrics for Conversational Interfaces
Besides the previously discussed three needs of user experience, there are also user experience metrics from the industry. An objective in the industry is, among others, to reduce costs and ensure user satisfaction. Objective metrics from the industry could be: time-to-task (time to start engaging), correct transfer rate (whether customers are redirected appropriately), containment rate (percentage of calls handled by the system) and abandonment rate (percentage of calls hung up). Subjective measures can be done by e.g. questionnaires, such as the Subjective Assessment of Speech System Interfaces (SASSI), which covers six factors, namely: system response accuracy, likeability, cognitive demand, annoyance, habitability and speed. These factors could also be applied to conversational interfaces without speech.
Another way to measure the user experience is by using user-centered metrics such as described in Google's HEART framework (Rodden, Hutchinson, & Fu, 2010). The researchers (at Google) describe that the HEART framework can measure user experience quality. HEART is an abbreviation for: happiness, engagement, adoption, retention and task success.

• Happiness consists of factors which relate to subjective aspects of user experience, like satisfaction, visual appeal, likelihood to recommend, and perceived ease of use.
• User engagement relates to behavioural proxies such as the frequency, intensity, or depth of interaction over some time period. For example, if a user uses your app one hour every day.
• Adoption metrics track how many new users start using a product during a given time period.
• Retention metrics track how many of the users from a given time period are still present in some later time period.
• Task success encompasses the traditional behavioural metrics of user experience, such as efficiency (e.g. time to complete a task), effectiveness (e.g. percent of tasks completed), and error rate.

2.2.5 User Experience Questionnaire
After analysing the methods and questionnaires, we chose to use the following for our research.
For usability, we use the UMUX questionnaire. The advantage of the UMUX questionnaire is that it consists of only four questions which are reliable and valid (Finstad, 2010; Lewis, 2013). Since we want to measure three things, we want to use short and reliable questionnaires.
To measure usefulness, we use the usefulness questions from the TAM questionnaire. The advantage of the TAM questionnaire is that it is a standardised and commonly used questionnaire to measure usefulness and ease of use (Davis, 1985). We only use the questions that measure usefulness.
We want to measure satisfaction by using one question from the chatbot evaluation questionnaire from Quarteroni et al. (2007). Based on previous experience with measuring the happiness aspect of the HEART framework, we created three other questions (e.g. whether the user enjoyed using the chatbot, whether they would recommend it to a friend and whether there were any functionalities missing). In total we combined four questions to measure satisfaction. We did not choose a specific satisfaction questionnaire, because these had overlap with the usability questionnaires.
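As an illustration of how such a combined questionnaire can be turned into per-construct scores and one overall score, consider the small sketch below. The grouping of items follows the description above, but the aggregation by simple item means (and the 7-point scale) is our assumption; the thesis does not prescribe a specific formula here.

```python
# Hypothetical aggregation of one respondent's answers (assumed 7-point Likert items).
# Item groupings follow the questionnaire described above; the simple-mean
# aggregation is an assumption for illustration only.
responses = {
    "usability":    [6, 5, 6, 5],  # four UMUX-based items
    "usefulness":   [5, 6, 5, 6],  # four TAM usefulness items
    "satisfaction": [6, 5, 6, 5],  # Quarteroni et al. item plus three own items
}

construct_scores = {name: sum(items) / len(items) for name, items in responses.items()}
all_items = [score for items in responses.values() for score in items]
ux_score = sum(all_items) / len(all_items)

print(construct_scores)       # per-construct means
print(round(ux_score, 2))     # overall user experience score
```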
2.3 Personalisation

2.3.1 Personality
McTear defines personality as a set of characteristics of people that uniquely influences their cognitions, motivations, and behaviours in different situations (McTear et al., 2016). Five traits of human personality are defined by McCrae and John (1992), namely: openness, conscientiousness, extroversion, agreeableness and neuroticism (OCEAN) (McCrae & John, 1992).
With respect to personality in (information) systems, Leite et al. (2013) did an evaluation of the friendliness of robots by measuring the stimulation of companionship, intimacy, reliable alliance, self-validation, and emotional security (Leite et al., 2013). For chatbots, we need to find a middle way in characteristics, since robots and humans both possess a visual appearance, whereas with chatbots this would manifest through a user interface.
Personality traits of users could be assessed before they use the chatbot to fine-tune the user model and create a better fitting user experience. Due to technical and time limitations, we did not apply this in our chatbot. Also, filling in a questionnaire before using the chatbot could be a tedious experience for the user and there is a risk of wrongly labelling the user.
Table 1: Chatbot Evaluation Questionnaire (Quarteroni et al., 2007)

No. | Question | Focus on
Q1 | Did you get all the information you wanted using the system? | Performance
Q2 | Do you think the system understood what you asked? | Performance
Q3 | How easy was it to obtain the information you wanted? | Interaction Difficulties
Q4 | Was it easy to reformulate your questions when you were invited to? | Interaction Difficulties
Q5 | Overall, are you satisfied with the system? | Overall Satisfaction
Q6 | Do you think you would use this system again? | Overall Satisfaction
Q7 | Was the pace of interaction with the system appropriate? * | PARADISE evaluation metrics
Q8 | How often was the system slow in replying? * | PARADISE evaluation metrics
Q9 | Which version of the system did you prefer and why? * | Measure Perceived Difference

* Q7, Q8 and Q9 are additional questions when the user used both prototypes

Table 2: Four types of personalisation and their focus (Fan & Poole, 2006)

Type of personalisation | Focuses on | Motives to use
Architectural | Creating a pleasant user space and unique experience. Relates to the interface aspect of the system. | Aesthetic value and expressing himself/herself
Instrumental | Utilising the information systems to enhance efficiency and personal productivity. | Productivity/efficiency
Relational | Mediating the interpersonal relationships and utilising the relational resources. | Welfare/psychological well-being
Commercial | Differentiating the product service and information to increase sales and customer satisfaction. | Material and psychic well-being

2.3.2 Personalisation
In a survey about personalisation by Fan & Poole (2006), the researchers found that the definitions of personalisation differ per field in which the concept has been researched. Personalisation has been researched in a variety of domains, such as knowledge retrieval and recommender systems (Shaikh, Phalke, Patil, Bhosale, & Raghatwan, 2016). In a marketing study (Postma & Brokke, 2002), the researchers found that personalisation, and self-reported interests, appeared to have a positive effect on the click-through rate of their newsletters sent by e-mail. They also found that the personalisation effect grows over time. Most research on personalisation is done in the fields of computer science, marketing and e-commerce (Fan & Poole, 2006). In marketing and e-commerce the focus of personalisation lies on delivering unique value and benefits to each individual customer, whereas in the human interaction field, personalisation is seen as a way to close the gap between user and computer. Fan & Poole (2006) analysed the definitions and found that most definitions of personalisation consist of these elements:

1. The purpose or goal of personalisation
2. What is personalised (e.g. interface or content)
3. The target of personalisation (e.g. user or consumer)

The definition of personalisation we use in this study is that personalisation is "a process that changes the functionality, interface, information access and content, or distinctiveness of a system to increase its personal relevance to an individual or a category of individuals" (Fan & Poole, 2006).
The personalisation effect describes that students learn more deeply from multimedia explanations when these are presented to them in a conversational style (Mayer, 2003). Also, when presenting a conversational style (instead of "the financial situation" the system says "your financial situation"), the students are more likely to engage in cognitive processes.
Three reasons to personalise are to access information, to accomplish work goals and to accommodate individual differences (Fan & Poole, 2006). McTear (2016) states that when you want agents to become recognisable individuals, they should have life-like interaction capabilities, such as convincing and intuitive behaviour. When an agent has a similar personality as the user, the perceived intelligence and competence of this agent increases (McTear et al., 2016).
The implementation of personalisation is constructed along three dimensions (McTear et al., 2016):

1. What to personalise: four aspects that can be personalised are content, user interface, channel/information access and functionality.
2. To whom to personalise: the target of personalisation can either be individuals or a group of individuals. When the user identifies with a certain category, the user is likely to perceive the personalisation as if it is personalised for him or her.
3. Who executes the personalisation: personalisation can happen based on the input and information given by the user (explicit personalisation) or personalisation can be done automatically by the system (implicit personalisation).
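To make the distinction between personalised and unpersonalised content concrete, here is a small illustrative sketch in the spirit of the "your financial situation" versus "the financial situation" example above. The wording, the user-profile field and the function name are ours and do not reproduce the actual chatbot texts used in this study.

```python
# Illustrative content personalisation: the same advice message with and
# without personalisation cues (name, informal address, emoji). The exact
# wording is hypothetical and not taken from the thesis prototypes.
def advice_message(personalised: bool, name: str = "Anna") -> str:
    if personalised:
        return (f"Hi {name}! 😊 Let's have a look at your financial situation "
                "and see what you could do with your spare money.")
    return ("The financial situation will now be analysed to determine "
            "the available options for spare money.")

print(advice_message(personalised=True))
print(advice_message(personalised=False))
```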
Different technologies to enable personalisation in information systems can be divided into the following categories (Amoroso & Reinig, 2004):

• User-behaviour tracking: for example by clickstream tracking, hover technologies, pattern recognition;
• Personalisation databases: for example by collaborative filtering, webhousing, intelligent agents, data mining, profiling, statistical analysis;
• Personalised user interface technologies: for example by content management, streaming audio/video, user information filtering, user-preference interface design, personalised searching;
• Customer support technologies: just-in-time customer support, wireless customer service.

When designing personalised systems, a distinction can be made between affective design, which is process-oriented, and utilitarian design, which is result-oriented (Fan & Poole, 2006). Also, research supports that the majority of personalisation systems are designed to enhance productivity, because people use these systems to achieve a goal. Personalised systems may not only fulfil the functional aspects of human needs, but also their entertainment needs (Fan & Poole, 2006).
Fan & Poole (2006) distilled four ideal types of personalisation in their survey, namely the architectural, instrumental, relational and commercial perspective (see Table 2). Architectural personalisation focuses on the construction of the digital environment to create a pleasant user space and a unique experience for the user through the arrangement and design of digital artefacts in a way that meets the user's needs and reflects his or her style and taste. This perspective relates to the interface aspect of the system. Instrumental personalisation refers to the utilisation of information systems to enhance efficiency and personal productivity by providing, enabling and delivering useful and user-friendly tools in a way that meets the user's needs. Three aspects of instrumental personalisation are:

1. Providing tools;
2. Designing tools;
3. Utilising tools.

Also, instrumental personalisation highlights the importance of the user's situated needs. Relational personalisation relates mainly to mediating interpersonal relationships and utilising relational resources by addressing the user's needs and goals in a given context (see Table 2). Commercial personalisation differentiates the product, service and information to increase sales and customer satisfaction; it fulfils the user's material needs and therefore contributes to their psychic welfare (Fan & Poole, 2006). Fan & Poole (2006) discuss that a combination of paradigms could best meet the different needs of the users.

2.3.3 Social Behaviour
Examples of social behaviour which could be implemented within conversational interfaces are mirroring and showing empathy.
Mirroring is the alignment of an emotional response between two parties, for example an agent and a user. Mirroring can make the agent be perceived as socially competent (Iacoboni, 2009). McTear (2016) describes that mirroring can occur in embodied conversational agents through the choice of vocabulary, head posture, gestures and facial expressions. For chatbots, the applicable aspects are the choice of vocabulary and facial expressions in the sense of emojis (also known as ideograms, smileys or emoticons, which are used in electronic messages to convey an emotion or facial expression). McTear claims that agents that use effective mirroring can be perceived as being more empathic.
Conversational agents which are made for coaching, health care or as social companions should be empathic and trustworthy to be perceived as social (McTear et al., 2016). McTear (2016) states that when an agent always behaves in the same way, the user satisfaction could decrease over time. He also describes that boredom is more persistent and associated with poorer learning and less user satisfaction than frustration. Therefore we can say that user engagement plays a big part in maintaining a long-term relationship with your users. To improve user engagement, long-term user models can be built to predict the future user's affect and actions (McTear et al., 2016).
Applying humour can provide a more fun and believable robot (Tatai et al., 2003). Aspects that users may not appreciate from robots are social control and criticism (Baron, 2015). This behaviour could also be unappreciated in chatbots.

2.4 Applying personalisation in chatbots to improve User Experience
Kim et al. (2014) investigated chatbots that provide responses based on the user input, the user itself and the bot's own personality and common sense. The system stores user-related facts automatically from user input. The system also keeps track of changes in user interest (Kim et al., 2014).
Bang et al. (2015) introduced a chatbot based on an EBDM system with a personalisation framework using long-term memory (Bang, Noh, Kim, & Lee, 2015). The researchers found that implementations of chatting systems were: using an EBDM dialogue system, using dialogue acts and part-of-speech (POS) tagged tokens, and using long-term memory and a knowledge extractor.
The study of Shaikh et al. (2016) gives a survey of previous work about chatbots and describes an example-based chat-oriented dialogue system using personalised long-term memory. In their study, the researchers describe a chatbot framework based on Example-based Dialogue Management (EBDM). An EBDM system uses dialogue examples to index information, rather than probabilistic models or rules. The advantage of using an EBDM framework is that it frees the designer from manual labelling and annotation. The results from the researched experiments show that using this framework could improve the performance of the system (Shaikh et al., 2016).

2.5 Tasks
A task is an action that can be done on a website or information system. There are different tasks possible in the financial domain. To investigate which tasks were relevant to use for our study, we investigated three banking websites in The Netherlands (ABN AMRO, Rabobank and ING). The analysis can be found in Figure 5 in the Appendix.
The overlapping categories of the tasks found on the websites are: paying, saving, mortgage, loan, investing, insurance and retirement. We divided the tasks into three categories, namely:

• Simple tasks, for example: opening a bank account, blocking a bank card, finding basic information and activating mobile banking.
• Complex tasks, for example calculations of a mortgage or loan, which are rule-based information. Also, advisory tasks based on rules.
• Very complex tasks, for example giving advice on a specific package for a specific situation where emotions and human interaction are essential.

Other dimensions we found after analysing the main user tasks on the websites were problem-based versus opportunity-based user tasks.

Table 3: Types of Tasks

        | Problem-based | Opportunity-based
Simple  | Task 1        | -
Complex | -             | Task 2

For this research, we want to compare two tasks that differ fundamentally from each other. Therefore we chose two tasks with a different goal; however, both tasks are placed in a financial context. Task 1 is mainly focused on a simple and problem-based scenario. Task 2 is more focused on a complex and opportunity-based scenario, see Table 3.
The tasks we use for our chatbots are:

• Task 1: Blocking a bank card. The chatbot solves a problem for the user. In this case, the user lost his bank card and has to block it. In this task, urgency also plays a role.
• Task 2: Financial advice. The chatbot fulfils the role of a financial advisor. The user can do a financial 'health' check. In the second part, if the user has spare money, the chatbot advises on what kind of options the user has (e.g. investing, fixed savings account, etc.). Basically, the chatbot functions as a (simple) recommender system.

2.6 Hypotheses
As mentioned in the introduction, our research question is:

"What is the added value of Personalisation for the User Experience of Chatbots?"

It is not clear from the literature whether users want a personalised chatbot that shows empathy when there is a sense of urgency present. On the one hand, we could assume that users prefer a chatbot that shows empathy to calm them down after their bank card is stolen. On the other hand, we could assume that users do not care about personalisation and empathy in a chatbot when they are in a hurry.
The same counts for the second task (financial advice). We could assume that users prefer personal advice, or the users could prefer no personality, because they just want to obtain the advice from the chatbot.
We do not know exactly what the effect of personalisation could be; therefore we investigate the interaction effect of personalisation and tasks with respect to the user experience. We wanted to find out if personalisation matters to users.
We used the following hypotheses:

H1_0: Personalisation has no effect on the User Experience of chatbots.
H1_1: Personalisation has a positive effect on the User Experience of chatbots.
H2_0: The User Experience is the same for simple and complex tasks.
H2_1: The User Experience is different for simple and complex tasks.
H_interaction: Personalisation and Tasks have an interaction effect.

3. METHODS
In this section, we describe the methods of this study. The structure consists of the sections participants, design, materials and procedure.

3.1 Participants
We have 121 respondents for our online survey. In group 1 we had 31 respondents, group 2 had 30 respondents, group 3 had 28 respondents and group 4 had 32 respondents. 9% was below 20 years old, 61% of our data set was between 20-25 years old, 18% was between 25-30 years old and the other 12% was older than 30 years old. 57% of the people was female, 41% was male and 2% did not want to say. The majority of our group was highly educated, namely 41% academics (WO), 17% applied university (HBO), 21% academic preparation schooling (VWO) and 12% applied university preparation schooling.
The qualitative data was obtained by convenience sampling from our sample group. This means we selected five people at Capgemini to test each chatbot individually, which makes a total of 20 participants for the qualitative user testing. The participants differed in gender and age, but were mainly around 25 years old and technically literate. They took part in the research because of personal reasons and they got a small reward (a Dutch cookie).

3.2 Design
For this research, we used a mixed methodology. First we sent out a survey. Subsequently, we conducted qualitative user tests and observations.
The research design is a 2x2 between-subject design. We compared the two independent variables with a dependent variable. The two major independent variables (factors) are: the chatbots (unpersonalised & personalised) and the type of task of the chatbot (blocking the bank card & asking for financial advice). The dependent variable is the UX score, which is calculated with the results of the survey.
3.3 Materials

3.3.1 Chatbot
The chatbot can be placed in level 1 of the maturity model for chatbots (Smiers, 2017). This means that the interaction will be done through one language, one channel and human-to-bot interaction. The intelligence will consist of a simple Q&A, menu-based and word-based rules. Integration with other websites is done through simple links.
We used Chatfuel due to time and skill restrictions. Chatfuel is a platform that enables the creation of chatbots without needing programming skills (Chatfuel, 2017). This significantly shortens the time and effort needed to create the four chatbots needed for this research. Also, the platform is free to use and unlimited chatbots can be created. Another advantage is that Chatfuel supports multiple languages, such as Dutch, which is used in our case.
The structure of the chatbot's conversation needed to be designed. To do this, we created conversational user flow diagrams which we used as our blueprint for the chatbots. These conversational flow diagrams can be found in the Appendix. In Figure 6 we illustrate the conversational user flow of the first task of the chatbot, namely blocking a bank card. Different icons are placed to show which actor is doing what; for example, the user icon shows that the user has to give input either by choosing from quick replies or via a free text field. In Figure 7 we illustrate the conversational user flow of the second task, namely receiving financial advice.
We exported and tested the chatbot in Facebook Messenger. Facebook Messenger is an instant messaging service developed by Facebook and has 1.2 billion users (Kerr, 2017). Facebook Messenger was chosen as the environment in which the chatbot was exported and tested, due to practical reasons and limitations of Chatfuel. The chatbot is shown in Figures 2 and 3. The rest of the screenshots, and the links to the chatbots, can be found in the Appendix.

Figure 2: Start screen of the personalised chatbot for blocking a bank card. The options are: "My card is stolen", "I lost my card" and "Nothing".
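To give an impression of what such a menu-based, level-1 flow amounts to, here is a small, hypothetical sketch of the bank-card-blocking flow from Figure 6 as a simple state machine with quick replies. It illustrates the structure only; it is not the actual Chatfuel blocks, nor the exact Dutch texts of the prototypes.

```python
# Hypothetical, simplified version of the "blocking a bank card" flow in Figure 6.
# Each state holds a bot message and maps each quick reply to the next state.
FLOW = {
    "start": {
        "message": "Hi! What happened to your bank card?",
        "quick_replies": {
            "My card is stolen": "report_police",
            "I lost my card": "block_card",
            "Nothing": "all_good",
        },
    },
    "report_police": {
        "message": "What a shame your card is stolen! Don't forget to report it to the police!",
        "quick_replies": {"OK, block my card": "block_card"},
    },
    "block_card": {"message": "Your bank card is now blocked.", "quick_replies": {}},
    "all_good": {"message": "Great, glad everything is fine!", "quick_replies": {}},
}

def run(state: str = "start") -> None:
    """Walk through the flow on the command line, mimicking quick replies."""
    while True:
        node = FLOW[state]
        print("BOT:", node["message"])
        if not node["quick_replies"]:
            break
        print("Quick replies:", " | ".join(node["quick_replies"]))
        choice = input("> ").strip()
        # Unknown input keeps the user in the same state, like a word-rule fallback.
        state = node["quick_replies"].get(choice, state)

if __name__ == "__main__":
    run()
```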

3.3.2 Quantitative Test
The questions of our questionnaire were made with Google Forms and were subdivided into categories. The first four questions are based on the UMUX questionnaire and measure usability. The second group of four questions measures the usefulness of the chatbot and is based on the usefulness questions of the TAM questionnaire. The third group of four questions measures the user satisfaction of the chatbot. These four questions are based on the chatbot evaluation questionnaire from Quarteroni et al. (2007). Together, these questions form the base of our user experience questionnaire.

3.3.3 Qualitative Test
For the qualitative research, we used the think-aloud method (Someren, Barnard, Sandberg, et al., 1994). This means that the user was asked to think aloud while executing the task so that the user experience during the task could be measured.
Besides making notes of the user's utterances, we also recorded the screen and the audio during the observations. The test was executed on a MacBook Pro. The researcher simultaneously coded certain emotions and utterances into a coding scheme (see Figure 12 in the Appendix). The base emotions chosen were based on Plutchik's wheel of emotions (Plutchik, 2003).

Figure 3: The second screen of the personalised chatbot for blocking a bank card. The chatbot says: "What a shame your pass is stolen! Don't forget to report it to the police!"

Once the task was completed, the tester was asked about his or her experience and interaction with the chatbot. The tester was asked to elaborate on any difficulties experienced. The structured questions were then the following:
• What did you think of this experience? Why (not)?
• Do you think you would use this chatbot in real life? Why (not)?
• Do you think this chatbot is useful? Why (not)?
• Did you think there were functionalities missing in the chatbot? Which ones?

3.4 Procedure
After implementing and testing the chatbots ourselves, the chatbots were ready to be tested by users. We used a combination of quantitative and qualitative research methods to obtain the results from user testing.
The quantitative data were collected through surveys. People were only allowed to take part in one survey. In this questionnaire, the user was first asked to answer questions on demographics. Then the user was asked to open the link to the chatbot in Facebook Messenger to execute the explained task. After completing the task, the user was asked to return to the questionnaire and answer the last 12 questions about the chatbot. In total, the whole survey took around 10 minutes.
The participants were semi-randomised, because the survey link was sent out to random groups of people and some were distributed with a snowball effect. Also, the researchers distributed the link to friends and family through a personal message on Facebook Messenger. This was the most practical manner, because to use the chatbot, the users needed to have Facebook Messenger. The participants took part because they knew the researchers and wanted to help them. Another reason is that the researchers filled in a survey for the participants, so that the participants in return took part in the survey from the researchers.
The qualitative data were collected through observations and interviews. The user first listened to the instructions of the researcher. The tester was asked to think aloud while doing the task. Then the user had to independently complete the task (the same chatbot task as in the survey). The participant was observed by the researcher. Also, the researcher made notes of the facial expressions, experiences and utterances of the user. After executing the task, the tester did a semi-structured interview with the researcher.

3.5 Data Analysis
We used SPSS to analyse the data. We used a two-way ANOVA test to compare the four independent groups to find an interaction effect of the two independent variables on the dependent variable (UX score). Subsequently, we did a two-way MANOVA test to investigate the effect of task and personality on the three constructs (usability, usefulness and satisfaction).
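The analysis itself was done in SPSS; for readers without SPSS, a rough Python equivalent of the same 2x2 design can be sketched with statsmodels as below. The file name and column names are assumptions, not artefacts of this study.

```python
# Rough Python equivalent of the SPSS analysis described above: a two-way ANOVA
# on the UX score, followed by a two-way MANOVA on the three constructs.
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
from statsmodels.multivariate.manova import MANOVA

df = pd.read_csv("survey_results.csv")  # hypothetical columns: task, personalised,
                                        # ux_score, usability, usefulness, satisfaction

# Two-way ANOVA: main effects of task and personalisation plus their interaction.
model = ols("ux_score ~ C(task) * C(personalised)", data=df).fit()
print(anova_lm(model, typ=2))

# Two-way MANOVA on the three user-experience constructs.
manova = MANOVA.from_formula(
    "usability + usefulness + satisfaction ~ C(task) * C(personalised)", data=df
)
print(manova.mv_test())
```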
4. RESULTS
In this section, we describe the results of our research. We start with presenting the quantitative results and thereafter we present the qualitative results.

4.1 Quantitative results
In this section we discuss the results of our data analysis. The descriptive statistics of our groups can be found in Table 4.

Table 4: Means and Standard Deviations of the User Experience Score

Chatbot Group      | n  | M    | SD
Task 1, Unpersonal | 31 | 5.44 | 0.932
Task 1, Personal   | 30 | 5.35 | 1.087
Task 2, Unpersonal | 28 | 4.53 | 1.548
Task 2, Personal   | 32 | 4.51 | 1.448

Even though all our groups were around n = 30, we still investigated whether our groups were normally distributed. We executed the Shapiro-Wilk test. It appeared that the data in our four groups were normally distributed. The scores are as follows: Chatbot 1 (.333), Chatbot 2 (.087), Chatbot 3 (.008), Chatbot 4 (.007). The three constructs appeared to be reliable. The construct Usability consisted of 4 items (α = .822), the construct Usefulness consisted of 4 items (α = .883) and the construct Satisfaction consisted of 3 items, after removing 1 item (α = .818). The three constructs together formed the total user experience score. The total user experience score consisted of 11 items and was highly reliable (α = .934).
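Cronbach's alpha, used above for the reliability of the constructs, can be computed directly from the item scores; a minimal sketch (with made-up data, not the study's responses) is shown below.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) matrix of Likert scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Made-up example: 121 respondents answering four correlated 7-point items.
rng = np.random.default_rng(42)
base = rng.integers(2, 7, size=(121, 1))
usability_items = np.clip(base + rng.integers(-1, 2, size=(121, 4)), 1, 7)
print(round(cronbach_alpha(usability_items), 3))
```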
Figure 4: Plot of the effect of Task & Personalisation with respect to the user experience of chatbots (User Experience Score on the y-axis, unpersonalised versus personalised on the x-axis, with separate lines for the blocking bankcard and financial advice tasks).

A two-way ANOVA was conducted that examined the effect of tasks and personalisation on the user experience of a chatbot.
There was no significant main effect of personalisation with respect to the user experience, F(1, 117) = .052, p = .820, ηp² = .000445, observed power = .056. Therefore, we have to accept our H1_0 and reject our H1_1.
However, there was a significant main effect of tasks with respect to the user experience, F(1, 117) = 14.341, p < .005, ηp² = .109, observed power = .964. Therefore, we accept H2_1 and reject H2_0.
A two-way MANOVA was conducted to examine the effect of the tasks with respect to the three constructs of user experience (Usability, Usefulness and Satisfaction). See Table 5 for the descriptive statistics.
Table 5: Means and Standard Deviations of the Usability, Usefulness and Satisfaction of Chatbots

Dependent Variable | Personality        | Task              | n  | M    | SD
Usability Total    | No personalisation | Blocking Bankcard | 31 | 5.61 | 1.008
                   |                    | Financial Advice  | 28 | 5.05 | 1.573
                   | Personalisation    | Blocking Bankcard | 30 | 5.50 | 1.303
                   |                    | Financial Advice  | 32 | 4.70 | 1.559
Usefulness Total   | No personalisation | Blocking Bankcard | 31 | 5.63 | 0.970
                   |                    | Financial Advice  | 28 | 4.41 | 1.685
                   | Personalisation    | Blocking Bankcard | 30 | 5.72 | 1.183
                   |                    | Financial Advice  | 32 | 4.66 | 1.512
Satisfaction Total | No personalisation | Blocking Bankcard | 31 | 4.97 | 1.367
                   |                    | Financial Advice  | 28 | 3.98 | 1.640
                   | Personalisation    | Blocking Bankcard | 30 | 4.67 | 1.141
                   |                    | Financial Advice  | 32 | 4.06 | 1.482

The dependent variable Task has a significant effect with respect to usability, usefulness and satisfaction, F(3, 115) = 8.532, p < .005, Wilks' λ = .818, ηp² = .182, observed power = .993. In the results of the between-subjects test, we see that Task has the following effect scores on the three constructs. We found that Usefulness shows the biggest effect, F(1, 117) = 21.187, p < .0001, ηp² = .153, observed power = .995. The second biggest effect was on Satisfaction, F(1, 117) = 9.580, p = .002, ηp² = .076, observed power = .866. The third biggest effect was on Usability, F(1, 117) = 7.318, p = .008, ηp² = .059, observed power = .765.
There was no significant interaction effect (Task * Personalisation) with respect to the user experience, F(1, 117) = .027, p = .869, ηp² = .000233, observed power = .053. Therefore, we cannot accept our H_interaction.

4.2 Qualitative results
We obtained our qualitative data through observations and semi-structured interviews. We divided the obtained feedback from the users into the following categories: Usability, Usefulness and Satisfaction. These categories are described below. In the discussion, the qualitative results will be further discussed.

4.2.1 Usability, Usefulness & Satisfaction
In our prototype we let the users choose whether they wanted to use the quick replies or the text field. Half of the users started immediately with typing. We saw that when personalisation effects were added, users expected even more that they would be able to type everything they wanted. Some users mentioned that they expected this possibility, since the chatbot spoke to them in a personal manner.
The emotions that were observed by the researchers were mainly neutral. Some users showed frustration when the chatbot did not understand their utterances. Some users mentioned that the chatbot's credibility plays an important part in how the user perceives the chatbot and what kind of emotions are associated with the chatbot. According to these users, trust plays a big part. Some users said that it is better not to say in the introductory text that the user is interacting with a chatbot, while other users prefer to know that they are talking to a chatbot beforehand.
A couple of users noticed the use of emojis in the chatbot and reacted to it. Most people who noticed were positive about the emojis, tone of voice and showing empathy. However, some users were more negative. One user even felt that it was 'fake empathy' and that he had no need for that, especially not in a situation of urgency. Some users found that the personalisation aspects did not appear to play any role, as may be derived from the fact that they did not notice or mention them. Most of the users found it strange that emojis were combined with a formal way of addressing the users (in Dutch we say 'U' for formal contact and 'Je' for informal contact). This can be solved by changing 'U' to 'Je', so that the tone of voice is not a mixture of informal and formal contact. However, some users prefer that the chatbot is very formal and addresses them with 'U' and no emojis. Mainly younger users preferred that the chatbot addressed them with 'Je' and emojis; however, most users also mentioned not to overdo the emojis (maximally 2 emojis).
A beneficial factor for the satisfaction with this chatbot was that it can be used remotely. Users who don't like to call were satisfied that they were able to perform this task through a chatbot.
Most users found these chatbots useful. Contributing factors are that the goal of the bots was clear and they 'did what they had to do'. For the chatbot to be relevant, it has to tell or teach new things and be efficient and effective, according to some users. Some users also mentioned that the whole customer journey should be taken into account in the chatbot; for example, when the chatbot is finished with blocking the bank card, it could ask if the user immediately wants to apply for a new bank card.

5. DISCUSSION
In this study we found that personalisation does not have a significant effect on the user experience of chatbots, for simple and for complex tasks. There was a significant difference between the tasks in the chatbot, where the simple task was perceived as more useful, with better usability and higher user satisfaction. The following aspects were identified that could have influenced the users' opinions: artificial intelligence, the goal of the chatbot, visual and UI elements, the way the text is presented, security issues and usefulness. These aspects are discussed in the exploratory findings.

5.1 Exploratory Findings
These results are not directly linked to our research question; however, they are still important for chatbots in general and their user experience.

5.1.1 Artificial Intelligence

12
Artificial intelligence (AI) is an important aspect of chatbots. One of the problems with an unintelligent chatbot is that it does not react to the user's input. Artificial intelligence could also improve the satisfaction of the users. In general, most users were unsatisfied about the inability to use free text input and preferred to have both quick replies and free text input, even though some users also admitted that quick replies prevent errors and are quicker to use. Perhaps, by improving the quick replies, users will not feel the need to use free text input, while still offering that possibility so that they do not feel restricted.

5.1.2 Goal of the chatbot
Explaining beforehand how the user can interact with the chatbot, and what the goal of the chatbot is, is important. Some users had the expectation that the chatbot would validate their user details. The chatbot should mention in what form or syntax certain details (e.g. birth date) have to be filled in, or natural language processing could be implemented to extract the relevant information from the user's input. The chatbot should also communicate its progress; for example, in the bank card blocking chatbot, it could say (or visually show) how long it will take to block the bank card.

5.1.3 Quick replies
Quick replies are text bubbles that the user can choose in order to interact with the chatbot. In this way, the user does not have to think about what the possibilities of the chatbot are, and the interaction and user flow can therefore be faster and more error proof than with a text input field. Most users understood and used the quick replies and thought they were convenient for when they do not want to read through the whole website, where they would get an information overload, or when they are looking for a certain person or expert. However, some users mentioned that they did not like the restrictions and limitations of the quick replies and would love to see both options available. It appeared that the 'reactance effect' can also occur when limiting the choices of interaction in chatbots (Brehm, 1966). One user said that it felt like he just had to go through a form or a fixed menu, not like a conversation.
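To illustrate what quick replies look like on the messaging side, the sketch below builds a message in the structure used by the Facebook Messenger Send API, the platform to which our prototype was exported. The recipient id, button titles and payload values are placeholder assumptions, and the exact field names should be verified against the current platform documentation.

```python
import json

# Hypothetical quick-reply message for the bank card chatbot.
# Each "quick_replies" entry carries a visible title and a payload the bot receives when tapped.
message = {
    "recipient": {"id": "<USER_PSID>"},  # placeholder user identifier
    "message": {
        "text": "Why do you want to block your bank card?",
        "quick_replies": [
            {"content_type": "text", "title": "Lost", "payload": "BLOCK_REASON_LOST"},
            {"content_type": "text", "title": "Stolen", "payload": "BLOCK_REASON_STOLEN"},
            {"content_type": "text", "title": "Damaged", "payload": "BLOCK_REASON_DAMAGED"},
        ],
    },
}

print(json.dumps(message, indent=2))  # JSON body that would be sent to the messaging platform
```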
5.1.4 User Interface
A more visual problem that occurred with the quick replies was that the buttons were sometimes overlooked by the users. This could be improved by making them clearer, or by putting them in a more prominent place on the screen (e.g. below the introductory text bubble of the chatbot). Currently, typing feels like the most natural way of chat interaction for users; interacting with quick replies is a relatively new way of interacting through chat. Over time, or once users have tried it, using quick replies could start to feel more natural. Also, when the user starts typing, the quick reply buttons disappear; some users said they want to still be able to see and use them. Other users said that a back button should be implemented for when an error is made. Either way, correcting errors should be made easier in this chatbot, otherwise the user has to start the whole scenario over, which could lead to frustration. Speed and clarity are also factors which influenced the satisfaction of the users, and a return or redo button would be beneficial for user satisfaction. Some users were annoyed by text balloons having a certain overlap, which should be prevented. In general, the text balloons should not be too long. Most users did not read all the text, and a more visual way of communicating is preferred; for example, when blocking the bank card, a green check mark could appear to show visually that the bank card has been blocked.

5.1.5 Information overload
Other observations were that the text blocks are sometimes shown very quickly after each other. Some users said they would not read all the text at once. A possible solution is to send a larger text in smaller bites, preferably with a bit of time in between, so that the user can easily digest all the information. Also, after each text block it is important to show the user what the next logical step is, or why something is happening. A couple of users got stuck in the prototype because they felt like they had arrived at the end.
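A minimal sketch of this 'smaller bites' idea is given below: a long answer is split into short balloons that are sent with a small pause in between. The send function, chunk width and delay are illustrative assumptions rather than part of the prototype.

```python
import textwrap
import time

def send_in_chunks(send_message, long_text, width=120, pause=1.5):
    """Split a long chatbot answer into short text balloons and send them one by one.

    send_message: callable that delivers a single balloon (platform-specific, assumed here).
    width: rough maximum number of characters per balloon.
    pause: seconds to wait between balloons so the user can read along.
    """
    for chunk in textwrap.wrap(long_text, width=width):
        send_message(chunk)
        time.sleep(pause)  # give the user time to digest before the next balloon arrives

# Example usage with a stand-in send function that simply prints each balloon.
send_in_chunks(print, "Your bank card has been blocked. A new card will be sent to your "
                      "home address. You can keep using online banking in the meantime.")
```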
5.1.6 Accessibility
Almost all users were able to complete the chatbot scenarios successfully. However, a couple of users said that older people might have more difficulties completing these scenarios. Another user had difficulty reading the small letters. Accessibility has to be taken into account when implementing chatbots.

5.1.7 Security
Security is important for most users, especially when the chatbot handles financial data. It is very simple for users to block their bank card, but it is also easy for others to block it. User detail validation would be highly appreciated by users. Also, some users indicated they would like to receive a clear confirmation that the bank card has been blocked, e.g. a notification or a letter.

5.1.8 Usefulness
The usefulness of the chatbot also depends on the chat application used. Some users did not have a Facebook account and were therefore not able to use the chatbot; for those users, this Facebook chatbot is not very useful. Some users said that they would not use this kind of chatbot if it came from Facebook, but that they would use it if it were integrated in the website or application of the bank, or in another chat service (e.g. WhatsApp, Slack). Another possibility is to log in with the corresponding bank account when using the chatbot.
Some users also mentioned that such a chatbot would be useful for the company offering it, since it could support, for example, the customer service department. Also, when users are not able to call, it is convenient that there is a chatbot as an alternative. One user mentioned that chatbots are more convenient than calling, because sometimes people speak with an indistinguishable accent and the customer service is not able to help them. This problem would not occur with a chatbot.
The chatbot would be even more useful if it remembered the user's input and used it in the future, or if, when referring the user to another expert, that information would be sent along.

5.2 Difference of tasks
The difference between the tasks could be explained by the following aspects. From the interviews, it appeared that the first task (blocking the bank card) had a much clearer goal for the user than the second task (financial advice).
Also, for the first task, more users said that they would use it in real life. This could be a reason why the first task was rated as more useful. For the second task, the users mentioned that the goal and the flow were not always clear to them.
The first task was also shorter than the second task. We see that when the task is shorter, fewer mistakes are made, and when a mistake is made, it is easier to correct. This could have had an impact on the usability score.
The first task also meets the expectations of the user better, and this results in a higher satisfaction rate. The users in the second task were sometimes confused, made more errors and were therefore more frustrated than users in the first task. This could have influenced the satisfaction score.

5.3 Contribution
In this study, theoretical and methodological contributions were made. The theoretical contributions of this study are two-fold. Firstly, an overview of the state of the art of chatbots is given. Secondly, the study gives insight into the effect of personalisation.
The methodological contribution consists of a way to measure satisfaction by means of three questions with a Cronbach's α = .818. Also, a UX questionnaire for chatbots has been made, which combines the chatbot's usability, usefulness and satisfaction scores (see Table 7 in the Appendix).
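Cronbach's α expresses the internal consistency of a set of items as α = k/(k-1) * (1 - Σ σ²_item / σ²_total). The sketch below computes it for a small, made-up response matrix; the example scores are assumptions for illustration only.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x n_items) matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items in the construct
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical example: three satisfaction items answered by five respondents.
scores = np.array([
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
])
print(round(cronbach_alpha(scores), 3))
```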
5.4 Limitations
Throughout this research, multiple limitations emerged. We discuss them below.
The platform that was used to create the chatbots was limited in its functionalities (e.g. artificial intelligence) and in the ways to export the chatbot. We were only able to export it to Facebook Messenger, but a lot of people do not use this application and we do not want to force people to use it.
Since user experience has not been strongly defined in the literature, we focused on the most mentioned aspects of user experience (usability, usefulness and satisfaction). It is debatable whether these three constructs are valid to measure the total user experience. This decision was made based on time restrictions. To give a broader image of user experience, other measurements should have been used in addition (e.g. EEG). We tried to tackle this by using a combination of quantitative and qualitative user testing.
The surveys were distributed through Facebook, for example in Facebook groups where it was possible to exchange surveys. It is possible that some participants from these groups were not motivated to fill in the questionnaire correctly. We had to take this into account when analysing the data. However, the general reliability of our constructs was high enough.
A limitation of the qualitative user testing was that the users were all people working at Capgemini (an Information Technology company). Everyone was technically literate and some had even worked with chatbots before. Nonetheless, they gave important feedback on our user experience aspects.

5.5 Future Work
Due to time and technical limitations, not all aspects of chatbots could be researched. In this section we discuss some of the aspects that would be interesting to research further.
Artificial intelligence would enhance the user experience of this chatbot. It would be interesting to replicate this research when artificial intelligence is implemented.
Embodiment in chatbots is also an interesting field to be explored, especially how personalisation relates to embodiment when applying it in chatbots. Besides embodiment, other personalisation factors can be researched (e.g. humour). Another interesting aspect is the personalisation of conversational interfaces which use speech as input. It would also be interesting to improve the personalisation within chatbots by creating a user model before the user uses the chatbot (e.g. the user could fill in a personality questionnaire so that the chatbot adapts to the user).
In our research we chose the three constructs of usability, usefulness and satisfaction to measure user experience. However, there could be other constructs relevant to measuring user experience. It would also be interesting to research chatbots with a different kind of UX framework or evaluation method, or to look into measuring user engagement with chatbots.
Besides the financial domain, there are multiple other domains that would be interesting to investigate further, such as insurance, education and e-commerce. Also, within the financial domain there are multiple other tasks which are interesting to investigate, such as a mortgage chatbot or a chatbot that helps you finance your start-up.
In this study, we mainly user-tested technically literate people. For future work, testing people outside of this convenience sample would be interesting, e.g. the elderly or non-technical people.

6. CONCLUSION
The popularity of chatbots is increasing; however, when applying a new technology like chatbots, it is important to do this effectively. A way to measure effectiveness is to investigate the user experience. Therefore, we researched the effect of personalisation on the user experience of chatbots. The research question of this study is "What is the added value of personalisation for the User Experience of chatbots?". The corresponding hypotheses are:

H1₀: Personalisation has no effect on the User Experience of chatbots.
H1₁: Personalisation has a positive effect on the User Experience of chatbots.
H2₀: The User Experience is the same for simple and complex tasks.
H2₁: The User Experience is different for simple and complex tasks.
Hinteraction: Personalisation and Tasks have an interaction effect.

After conducting a two-way ANOVA test, no significant main effect was found between the two chatbots with respect to personalisation. Therefore, we accepted our H1₀ and rejected our H1₁. This means that personalisation has no significant effect on the user experience of chatbots.
There was a significant main effect between the different tasks with respect to the user experience. This means that the user experience score for the first task (blocking the bank card) was rated significantly higher than that for the second task (financial advice). Therefore, we accepted H2₁ and rejected H2₀. When we investigated the effect of task on the constructs of user experience (usefulness, usability, satisfaction), we found that the biggest effect of task was on Usefulness, then on Satisfaction and then on Usability.
No significant interaction effect (Task * Personalisation) was found with respect to the user experience of chatbots. Therefore, we have to reject our Hinteraction.
Additionally, we can conclude that our satisfaction construct was highly reliable.
We can conclude that a simple task with a clear goal (problem-based) receives a significantly higher user experience score than a more complex task with a more uncertain outcome (opportunity-based) in a chatbot. Lastly, we can conclude that we created a brief, but reliable, questionnaire to measure the user experience of chatbots.

Acknowledgements
First of all, I want to thank Jacobijn Sandberg from the University of Amsterdam and Wouter Slager from Capgemini for this opportunity and for the patience, advice and help they have given me.
Secondly, I want to thank my family and friends for supporting me throughout the process of conducting this study.

References
Abashev, A., Grigoryev, R., Grigorian, K., & Boyko, V. (2016). Programming tools for messenger-based chatbot system organization: Implication for outpatient and translational medicines. BioNanoScience, 1–5.
Adformatie. (2017). Slimme chatbot en website zetten eva jinek als journalistiek merk op de kaart. Retrieved June 24, 2017, from http://www.adformatie.nl/nieuws/slimme-chatbot-en-website-zetten-eva-jinek-als-journalistiek-merk-op-de-kaart
Amoroso, D. L., & Reinig, B. A. (2004). Personalization management systems. In System sciences, 2004. proceedings of the 37th annual hawaii international conference on (pp. 1–pp).
ASR. (2017). Retrieved May 5, 2017, from www.asr.nl
Bang, J., Noh, H., Kim, Y., & Lee, G. G. (2015). Example-based chat-oriented dialogue system with personalized long-term memory. In Big data and smart computing (bigcomp), 2015 international conference on (pp. 238–243).
Baron, N. S. (2015). Shall we talk? conversing with humans and robots. The Information Society, 31(3), 257–264.
Brehm, J. W. (1966). A theory of psychological reactance.
Buhr, S. (2016). Retrieved April 4, 2017, from https://techcrunch.com/2016/01/28/mit-spinout-insurify-raises-2-million-to-replace-human-insurance-agents-with-a-robot/
Capgemini. (2017). Retrieved June 26, 2017, from https://www.capgemini-consulting.com/industries/financial-services9
Chai, J. Y., Budzikowska, M., Horvath, V., Nicolov, N., Kambhatla, N., & Zadrozny, W. (2001). Natural language sales assistant-a web-based dialog system for online sales. In Iaai (pp. 19–26).
Chatfuel. (2017). Retrieved March 14, 2017, from https://dashboard.chatfuel.com (Platform to create Chatbots)
Chin, J. P., Diehl, V. A., & Norman, K. L. (1988). Development of an instrument measuring user satisfaction of the human-computer interface. In Proceedings of the sigchi conference on human factors in computing systems (pp. 213–218).
Davis, F. D. (1985). A technology acceptance model for empirically testing new end-user information systems: Theory and results (Unpublished doctoral dissertation). Massachusetts Institute of Technology.
Dawnbot. (2017). Retrieved May 15, 2017, from http://www.dawnbot.nl/
Dove, G., Halskov, K., Forlizzi, J., & Zimmerman, J. (2017). Ux design innovation: Challenges for working with machine learning as a design material. In Proceedings of the 2017 chi conference on human factors in computing systems (pp. 278–288).
Emerce. (2016). Eneco lanceert chatbot in facebook messenger. Retrieved June 24, 2017, from https://www.emerce.nl/nieuws/eneco-lanceert-chatbot-facebook-messenger
Ergonomics of human-system interaction – Part 210: Human-centred design for interactive systems (Standard). (2010, March). International Organization for Standardization.
Fan, H., & Poole, M. S. (2006). What is personalization? perspectives on the design and implementation of personalization in information systems. Journal of Organizational Computing and Electronic Commerce, 16(3-4), 179–202.
Finstad, K. (2010). The usability metric for user experience. Interacting with Computers, 22(5), 323–327.
Goodfellow, J. (2017). Klm introduces emoji-led directions to its messenger chatbot. Retrieved June 25, 2017, from http://www.thedrum.com/news/2017/03/02/klm-introduces-emoji-led-directions-its-messenger-chatbot
Hung, V., Elvir, M., Gonzalez, A., & DeMara, R. (2009). Towards a method for evaluating naturalness in conversational dialog systems. In Systems, man and cybernetics, 2009. smc 2009. ieee international conference on (pp. 1236–1241).
Iacoboni, M. (2009). Mirroring people: The new science of how we connect with others. Farrar, Straus and Giroux.
Jee, C. (2017). Retrieved June 25, 2017, from http://www.techworld.com/picture-gallery/personal-tech/9-best-uses-of-Chatbots-in-business-in-uk-3641500/
Kaushik, D., Jain, R., et al. (2014). Natural user interfaces: trend in virtual interaction. arXiv preprint arXiv:1405.0101.
Kaye, J. (2007). Evaluating experience-focused hci. In Chi'07 extended abstracts on human factors in computing systems (pp. 1661–1664).
Kerr, D. (2017). Retrieved April 12, 2017, from https://www.cnet.com/news/facebook-messenger-hits-1-2b-users-thats-a-lot-of-people/
Kim, Y., Bang, J., Choi, J., Ryu, S., Koo, S., & Lee, G. G. (2014). Acquisition and use of long-term memory for personalized dialog systems. In International workshop on multimodal analyses enabling artificial agents in human-machine interaction (pp. 78–87).
Kiseleva, J., Williams, K., Hassan Awadallah, A., Crook, A. C., Zitouni, I., & Anastasakos, T. (2016). Predicting user satisfaction with intelligent assistants. In Proceedings of the 39th international acm sigir conference on research and development in information retrieval (pp. 45–54).
Knight, W. (2016). 10 breakthrough technologies: Conversational interfaces. Retrieved June 24, 2017, from https://www.technologyreview.com/s/600766/10-breakthrough-technologies-2016-conversational-interfaces/
Leite, I., Pereira, A., Mascarenhas, S., Martinho, C., Prada, R., & Paiva, A. (2013). The influence of empathy in human–robot relations. International journal of human-computer studies, 71(3), 250–260.
Lemonade. (2017). Retrieved April 4, 2017, from https://www.lemonade.com/
Letzter, R. (2016). Ibm's brilliant ai just helped teach a grad-level college course. Retrieved June 24, 2017, from http://uk.businessinsider.com/watson-ai-became-a-teaching-assistant-2016-5?international=truer=UK&IR=T
Lewis, J. R. (2013). Critical review of 'the usability metric for user experience'. Interacting with computers.
Lipiec, M. (2017). Retrieved April 14, 2017, from https://chatbotsmagazine.com/what-we-learned-designing-a-chatbot-for-banking-2dd2c51d7c2c
MarutiTechlabs. (2017). Retrieved June 24, 2017, from https://www.facebook.com/Customer-Support-Bot-1857341381220252/
Mayer, R. E. (2003). The promise of multimedia learning: using the same instructional design methods across different media. Learning and instruction, 13(2), 125–139.
McCrae, R. R., & John, O. P. (1992). An introduction to the five-factor model and its applications. Journal of personality, 60(2), 175–215.
McTear, M., Callejas, Z., & Griol, D. (2016). The conversational interface. Springer.
Miller, H., Thebault-Spieker, J., Chang, S., Johnson, I., Terveen, L., & Hecht, B. (2016). "blissfully happy" or "ready to fight": Varying interpretations of emoji. Proceedings of ICWSM, 2016.
Mindbowser. (2017). Retrieved April 12, 2017, from http://mindbowser.com/chatbot-market-survey-2017/ (300+ individuals from a wide range of industries participated in this survey. This study focuses on understanding the current state of Chatbots and their future outlook.)
Munteanu, C., & Boldea, M. (2000). Mdwoz: A wizard of oz environment for dialog systems development. In Lrec.
Niki. (2017). Retrieved June 5, 2017, from https://niki.ai/ (digital chat-based artificial intelligence powered shopping assistant)
O'Brien, G. (2017). Retrieved June 26, 2017, from https://tutorials.botsfloor.com/the-user-experience-of-creating-a-chatbot-1f9055496349
Plutchik, R. (2003). Emotions and life: Perspectives from psychology, biology, and evolution. American Psychological Association.
Postma, O. J., & Brokke, M. (2002). Personalisation in practice: The proven effects of personalisation. Journal of Database Marketing & Customer Strategy Management, 9(2), 137–142.
Press, G. (2016). Retrieved June 27, 2017, from https://www.forbes.com/sites/gilpress/2016/11/01/forrester-predicts-investment-in-artificial-intelligence-will-grow-300-in-2017/691466b35509
Quarteroni, S., & Manandhar, S. (2009). Designing an interactive open-domain question answering system. Natural Language Engineering, 15(01), 73–95.
Rodden, K., Hutchinson, H., & Fu, X. (2010). Measuring the user experience on a large scale: user-centered metrics for web applications. In Proceedings of the sigchi conference on human factors in computing systems (pp. 2395–2398).
Sauro, J., & Lewis, J. R. (2016). Quantifying the user experience: Practical statistics for user research. Morgan Kaufmann.
Shaikh, A., Phalke, G., Patil, P., Bhosale, S., & Raghatwan, J. (2016). A survey on chatbot conversational systems. International Journal of Engineering Science, 3117.
Shawar, A., Atwell, E., & Roberts, A. (2005). Faqchat as an information retrieval system. In Human language technologies as a challenge for computer science and linguistics: Proceedings of the 2nd language and technology conference (pp. 274–278).
Smiers, L. (2017). How can chatbots meet expectations? introducing the bot maturity model. Retrieved May 5, 2017, from http://www.content-loop.com/how-can-chatbots-meet-expectations-introducing-the-bot-maturity-model/
Someren, M. v., Barnard, Y. F., Sandberg, J. A., et al. (1994). The think aloud method: a practical approach to modelling cognitive processes. Academic Press.
Tatai, G., Csordás, A., Kiss, Á., Szaló, A., & Laufer, L. (2003). Happy chatbot, happy user. In International workshop on intelligent virtual agents (pp. 5–12).
Walker, M. A., Litman, D. J., Kamm, C. A., & Abella, A. (1997). Paradise: A framework for evaluating spoken dialogue agents. In Proceedings of the eighth conference on european chapter of the association for computational linguistics (pp. 271–280).
Wigdor, D., & Wixon, D. (2011). Brave nui world: designing natural user interfaces for touch and gesture. Elsevier.
APPENDIX
A. TASK ANALYSIS

Figure 5: Task analysis of three Dutch banks

B. CONVERSATIONAL USER FLOWS

Figure 6: Conversational user flow: blocking bankcard

Figure 7: Conversational user flow: financial advice
Link chatbot 1: m.me/1522158034481823
Link chatbot 2: m.me/1905225986423545
Link chatbot 3: m.me/272303553179105
Link chatbot 4: m.me/127510844467523

C. SCREENSHOTS CHATBOT 2

Figure 8: Screenshot of Blocking Bank Card Chatbot

Figure 9: Screenshot of Blocking Bank Card Chatbot

Figure 10: Screenshot of Blocking Bank Card Chatbot

Figure 11: Screenshot of Blocking Bank Card Chatbot
D. CHATBOT METRICS
Table 6: Chatbot Metrics (Hung et al., 2009)

Metric | Type | Data Collection Method
Total elapsed time | Efficiency | Quantitative Analysis
Total number of user/system turns | Efficiency | Quantitative Analysis
Total number of system turns | Efficiency | Quantitative Analysis
Total number of turns per task | Efficiency | Quantitative Analysis
Total elapsed time per turn | Efficiency | Quantitative Analysis
Number of re-prompts | Qualitative | Quantitative Analysis
Number of user barge-ins | Qualitative | Quantitative Analysis
Number of inappropriate system responses | Qualitative | Quantitative Analysis
Concept accuracy | Qualitative | Quantitative Analysis
Turn correction ratio | Qualitative | Quantitative Analysis
Ease of usage | Qualitative | Questionnaire
Clarity | Qualitative | Questionnaire
Naturalness | Qualitative | Questionnaire
Friendliness | Qualitative | Questionnaire
Robustness regarding misunderstandings | Qualitative | Questionnaire
Willingness to use system again | Qualitative | Questionnaire
E. QUESTIONNAIRE
Table 7: Questions from the self-made questionnaire, where + indicates a positively phrased question and - a negatively phrased question.

Question | Type | +/-
What is your name? | Demographics |
What is your age? | Demographics |
What is your education? | Demographics |
How much do you use a computer? | Demographics |
How confident do you feel about your computer usage? | Demographics |
Have you ever used a chatbot before? | Demographics |
The possibilities of this chatbot meet my requirements. | Usability | +
Using this chatbot is a frustrating experience. | Usability | -
This chatbot is easy to use. | Usability | +
I waste too much time on correcting things in this chatbot. | Usability | -
Because of this chatbot, I can quickly execute this task. | Usefulness | +
This chatbot makes it hard to execute the task. | Usefulness | -
Because of this chatbot, I can effectively execute this task. | Usefulness | +
This chatbot is useless. | Usefulness | -
This chatbot is fun to use. | Satisfaction | +
I miss functionalities in this chatbot. | Satisfaction | -
I would recommend this chatbot to a friend. | Satisfaction | +
I am unsatisfied about this chatbot. | Satisfaction | -
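As an illustration of how the +/- column of Table 7 is typically used when scoring such a questionnaire, the sketch below reverse-codes the negatively phrased items and averages each construct. The 5-point scale and the response values are assumptions for illustration only, not the actual study data.

```python
# Hypothetical responses of one participant to the twelve UX items of Table 7,
# grouped per construct as (score, phrasing) pairs, where phrasing is "+" or "-".
# A 5-point agreement scale (1 = fully disagree ... 5 = fully agree) is assumed here.
SCALE_MAX = 5

responses = {
    "Usability":    [(4, "+"), (2, "-"), (5, "+"), (1, "-")],
    "Usefulness":   [(4, "+"), (2, "-"), (4, "+"), (1, "-")],
    "Satisfaction": [(3, "+"), (2, "-"), (4, "+"), (2, "-")],
}

def construct_score(items):
    """Average item score after reverse-coding the negatively phrased items."""
    recoded = [score if phrasing == "+" else SCALE_MAX + 1 - score
               for score, phrasing in items]
    return sum(recoded) / len(recoded)

for construct, items in responses.items():
    print(construct, round(construct_score(items), 2))
```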
F. CODING SCHEME

Figure 12: Coding scheme used to make notes of the user experience during the qualitative user tests