
Lecture 2: Introduction to agents and problem solving

Characteristics of agents and environments

Problem-solving agents where search consists of

state space
start state
goal states

Abstraction and problem formulation

Search trees: an effective way to represent the search process

Reading: chapter 2 and chapter 3.1, 3.2, 3.3

Homework 1 will be announced shortly

An agent is anything that can be viewed as perceiving its
environment through sensors and acting upon that environment
through actuators.

This definition includes:

Robots, humans, programs

Human agent:
eyes, ears, and other organs for sensors;
hands, legs, mouth, and other body parts for actuators

Robotic agent:
cameras and infrared range finders for sensors; various motors
for actuators

We use the term percept to refer to the agent's perceptual inputs at any given instant
Examples of Agents

            Humans        Programs                    Robots
Sensors     senses        keyboard, mouse, dataset    cameras, pads
Actuators   body parts    monitor, speakers, files    motors, limbs
Agents and environments:
The structure of Agents

The agent function is an abstract mathematical
description; it maps from percept histories to actions:
$f : P^* \to A$
The agent program is a concrete implementation,
running within some physical architecture system
to produce f
agent = architecture + program
The Agent Architecture: A Model
[Figure: the agent architecture model.
Basic model: Head (general abilities), Mouth (communication device), Body (application-specific abilities).
Extension: actuators (mobile agents, PDAs, etc.)]
Example: Vacuum-cleaner world

A vacuum-cleaner world with just two locations

The vacuum-cleaner world is so simple that we can describe

everything that happens. This particular world has just two
locations: squares A and B.
The vacuum agent perceives which square it is in and
whether there is dirt in the square.
It can choose to move left, move right, suck up the dirt, or
do nothing.
One very simple agent function is the following: if the
current square is dirty, then suck; otherwise, move to the
other square.
A vacuum-cleaner agent (cont.)
Percepts: location and contents, e.g., [A,Dirty]
Actions: Left, Right, Suck, NoOp
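
The simple agent function described above (suck if the current square is dirty, otherwise move to the other square) can be sketched in a few lines of Python. This is only an illustration: the function names `reflex_vacuum_agent` and `run`, and the dictionary used to model the two squares, are assumptions made for this sketch and do not come from the lecture.

```python
def reflex_vacuum_agent(percept):
    """Map one percept (location, status) to an action."""
    location, status = percept
    if status == "Dirty":
        return "Suck"
    # Square is clean: move to the other square.
    return "Right" if location == "A" else "Left"


def run(world, location, steps):
    """Simulate the two-square world and return the actions taken."""
    actions = []
    for _ in range(steps):
        action = reflex_vacuum_agent((location, world[location]))
        actions.append(action)
        if action == "Suck":
            world[location] = "Clean"
        elif action == "Right":
            location = "B"
        elif action == "Left":
            location = "A"
    return actions


# Both squares start dirty and the agent starts in square A.
history = run({"A": "Dirty", "B": "Dirty"}, "A", 4)
print(history)
```

Note that this agent never chooses NoOp: once both squares are clean it keeps moving back and forth, which matters if the performance measure penalizes movement.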


What is rational at any given time depends on four things:

The performance measure that defines the criterion of success.
The agent's prior knowledge of the environment.
The actions that the agent can perform.
The agent's percept sequence to date.
Rational Agents
A rational agent is one that does the right thing

An agent should strive to "do the right thing", based
on what it can perceive and the actions it can
perform. The right action is the one that will cause
the agent to be most successful.

Need to be able to assess the agent's performance

Should be independent of internal measures

Ask yourself: has the agent acted rationally?

Not just dependent on how well it does at a task

First consideration: evaluation of rationality

Rational agents (cont.)

Performance measure: An objective criterion
for success of an agent's behavior
E.g., performance measure of a vacuum-
cleaner agent could be amount of dirt
cleaned up, amount of time taken, amount
of electricity consumed, amount of noise
generated, etc.
Rational agents

Rational Agent:
For each possible percept sequence, a
rational agent should select an action that
is expected to maximize its performance
measure, given the evidence provided by
the percept sequence and whatever built-
in knowledge the agent has.
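
To make "expected to maximize its performance measure" concrete, here is a minimal sketch of choosing the action with the highest expected score. The probabilities and scores are made-up numbers for a vacuum agent standing on a dirty square, and the names `expected_score` and `rational_action` are hypothetical; the lecture does not prescribe any particular implementation.

```python
def expected_score(outcomes):
    """Expected performance of one action, given (probability, score) pairs."""
    return sum(p * score for p, score in outcomes)


def rational_action(model):
    """Return the action whose expected performance score is highest."""
    return max(model, key=lambda action: expected_score(model[action]))


# Hypothetical outcome model (assumed numbers, for illustration only).
model = {
    "Suck": [(0.9, 10), (0.1, -1)],  # usually cleans; sometimes fails and wastes energy
    "Right": [(1.0, -1)],            # moving costs a little and cleans nothing
    "NoOp": [(1.0, 0)],              # doing nothing scores nothing
}

best = rational_action(model)
print(best, expected_score(model[best]))
```

The agent is rational with respect to its model and evidence, not with respect to what actually happens: if the suction fails on a given step, choosing Suck was still the rational action.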
Rational agents
Performance measure: An objective criterion for success of
an agent's behavior, e.g.,
Robot driver?
Chess-playing program?
Spam email classifier?

Rational Agent: selects the action that is expected to
maximize its performance measure,
given the percept sequence
given the agent's built-in knowledge
Side point: how to maximize expected future performance,
given only historical data
Rational agents
Rationality is distinct from omniscience (all-
knowing with infinite knowledge).

Agents can perform actions in order to
modify future percepts so as to obtain
useful information (information gathering, exploration).

An agent is autonomous if its behavior is

determined by its own experience (with
ability to learn and adapt).
Autonomy in Agents

The autonomy of an agent is the extent to which its
behaviour is determined by its own experience

No autonomy: ignores environment/data
Complete autonomy: must act randomly/no program

Example: baby learning to crawl

Ideal: design agents to have some autonomy
Possibly good to become more autonomous in time
Internal Structure

Second lot of considerations

Architecture and Program
Knowledge of the Environment
Types of the agents:
Utility Functions
Task Environment

Before designing an intelligent agent, we must first specify
the setting for the design, i.e. its task environment (PEAS):

Performance measure
Environment
Actuators
Sensors

Consider, e.g., the task of designing an automated taxi


Example: Agent = robot driver in DARPA Challenge

Performance measure:
Time to complete course
Safe, fast, legal, comfortable trip, maximize profits

Environment:
Roads, other traffic, obstacles, pedestrians, customers

Actuators:
Steering wheel, accelerator, brake, signal, horn

Sensors:
Optical cameras, lasers, sonar, accelerometer,
speedometer, GPS, odometer, engine sensors

Example: Agent = Medical diagnosis system

Performance measure:
Healthy patient, minimize costs, lawsuits

Environment:
Patient, hospital, staff

Actuators:
Screen display (questions, tests, diagnoses, treatments)

Sensors:
Keyboard (entry of symptoms, findings, patient's answers)

Agent: Part-picking robot

Performance measure: Percentage
of parts in correct bins
Environment: Conveyor belt with
parts, bins
Actuators: Jointed arm and hand
Sensors: Camera, joint angle sensors

Agent: Interactive English tutor

Performance measure: Maximize
student's score on test
Environment: Set of students
Actuators: Screen display
(exercises, suggestions)
Sensors: Keyboard
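
A task environment specification like the ones above can be written down as a small record, which makes it easy to check that none of the four parts has been forgotten. The sketch below is one possible encoding; the class name `TaskEnvironment`, its field names, and the helper `is_fully_specified` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskEnvironment:
    """One task environment: performance measure, environment, actuators, sensors."""
    agent: str
    performance: tuple
    environment: tuple
    actuators: tuple
    sensors: tuple


# The part-picking robot from the slide above.
part_picker = TaskEnvironment(
    agent="Part-picking robot",
    performance=("percentage of parts in correct bins",),
    environment=("conveyor belt with parts", "bins"),
    actuators=("jointed arm and hand",),
    sensors=("camera", "joint angle sensors"),
)


def is_fully_specified(task):
    """True if every one of the four parts has at least one entry."""
    return all([task.performance, task.environment, task.actuators, task.sensors])


print(is_fully_specified(part_picker))
```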
Environment types

Fully observable (vs. partially observable):

An agent's sensors give it access to the complete state of the
environment at each point in time.

Deterministic (vs. stochastic):

The next state of the environment is completely determined
by the current state and the action executed by the agent.
If the environment is deterministic except for the actions of
other agents, then the environment is strategic
Deterministic environments can appear stochastic to an agent
(e.g., when only partially observable)

Episodic (vs. sequential):

The agent's experience is divided into atomic episodes (each
episode consists of the agent perceiving and then performing
a single action). The choice of action in each episode does not
depend on previous decisions or actions; it depends only on the
episode itself.
Environment types

Static (vs. dynamic):

The environment is unchanged while an agent is deliberating.
The environment is semidynamic if the environment itself
does not change with the passage of time but the agent's
performance score does

Discrete (vs. continuous):

A discrete set of distinct, clearly defined percepts and actions.
How we represent or abstract or model the world

Single agent (vs. multi-agent):

An agent operating by itself in an environment. Does the
other agent interfere with my performance measure?
Task environment            Observable  Deterministic/  Episodic/    Static/   Discrete/    Agents
                                        stochastic      sequential   dynamic   continuous

crossword                   fully       determ.         sequential   static    discrete     single
chess with a clock          fully       strategic       sequential   semi      discrete     multi
taxi driving                partial     stochastic      sequential   dynamic   continuous   multi
image                       fully       determ.         episodic     semi      continuous   single
part-picking robot          partial     stochastic      episodic     dynamic   continuous   single
refinery                    partial     stochastic      sequential   dynamic   continuous   single
interactive English tutor   partial     stochastic      sequential   dynamic   discrete     multi

What is the environment for the DARPA Challenge?

Agent = robotic vehicle

Environment = 130-mile route through desert

Environment types
                   Chess with a clock   Chess without a clock   Taxi driving

Fully observable   Yes                  Yes                     No
Deterministic      Strategic            Strategic               No
Episodic           No                   No                      No
Static             Semi                 Yes                     No
Discrete           Yes                  Yes                     No
Single agent       No                   No                      No

The environment type largely determines the agent design

The real world is (of course) partially observable,
stochastic, sequential, dynamic, continuous, multi-agent
Agent functions and programs

An agent is completely specified by the

agent function mapping percept sequences
to actions.

One agent function (or a small equivalence

class) is rational.

Aim: find a way to implement the rational

agent function concisely.
Table-lookup agent


Huge table
Take a long time to build the table
No autonomy
Even with learning, need a long time to learn the table entries
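
A table-lookup agent can be sketched as a dictionary keyed by the whole percept sequence seen so far. The sketch below is illustrative only: the names `make_table_driven_agent` and `table_size` are hypothetical, the table holds just a few entries for the two-square vacuum world, and the fallback to NoOp for missing entries is an assumption. The second function shows why the table is huge: with |P| possible percepts and a lifetime of T steps, the table needs one entry for every percept sequence of length 1 to T.

```python
def make_table_driven_agent(table):
    """Return an agent that looks up its action by the full percept history."""
    percepts = []

    def agent(percept):
        percepts.append(percept)
        # Assumption: fall back to NoOp when the history is not in the table.
        return table.get(tuple(percepts), "NoOp")

    return agent


# A tiny fragment of the table for the vacuum world.
table = {
    (("A", "Dirty"),): "Suck",
    (("A", "Clean"),): "Right",
    (("A", "Dirty"), ("A", "Clean")): "Right",
    (("A", "Clean"), ("B", "Dirty")): "Suck",
}


def table_size(num_percepts, lifetime):
    """Number of entries needed: sum of num_percepts**t for t = 1..lifetime."""
    return sum(num_percepts ** t for t in range(1, lifetime + 1))


agent = make_table_driven_agent(table)
first = agent(("A", "Dirty"))
second = agent(("A", "Clean"))
print(first, second, table_size(4, 3))
```

Even in the vacuum world, which has only 4 distinct percepts, the count grows exponentially with the agent's lifetime.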
Agent types
Five basic types, in order of increasing generality:

Simple reflex agents
Model-based reflex agents
Goal-based agents
  Problem-solving agents
Utility-based agents
  Can distinguish between different goals
Learning agents

Each kind of agent program combines particular components in

particular ways to generate actions.
Simple reflex agents
The simplest kind of agent is the simple
reflex agent.
These agents select actions on the basis of
the current percept, ignoring the rest of the
percept history. For example, the vacuum
agent whose agent function is tabulated in
slide 6 is a simple reflex agent, because its
decision is based only on the current
location and on whether that location
contains dirt.
Simple reflex agents
Model-based reflex agents

The most effective way to handle partial
observability is to keep track of the part of the
world the agent cannot see now.
The agent should maintain some sort of
internal state that depends on the percept
history and thereby reflects at least some of
the unobserved aspects of the current state.
For example, for driving tasks such as
changing lanes, the agent needs to keep
track of where the other cars are if it can't
see them all at once.
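
As a small illustration of internal state, the sketch below uses the two-square vacuum world rather than the driving task, because it fits in a few lines. The agent remembers the last known status of each square, so it can stop once it believes both are clean. The class name `ModelBasedVacuumAgent` is hypothetical, and the sketch assumes that a cleaned square stays clean.

```python
class ModelBasedVacuumAgent:
    """Vacuum agent that keeps an internal model of both squares."""

    def __init__(self):
        # None means the square has not been observed yet.
        self.model = {"A": None, "B": None}

    def __call__(self, percept):
        location, status = percept
        self.model[location] = status  # update state from the current percept
        if status == "Dirty":
            # Model of how the action changes the world: Suck leaves the square clean.
            self.model[location] = "Clean"
            return "Suck"
        if all(s == "Clean" for s in self.model.values()):
            return "NoOp"  # everything it knows about is clean
        return "Right" if location == "A" else "Left"


agent = ModelBasedVacuumAgent()
percepts = [("A", "Dirty"), ("A", "Clean"), ("B", "Clean"), ("B", "Clean")]
actions = [agent(p) for p in percepts]
print(actions)
```

A simple reflex agent given the same percepts would keep moving between the squares, because it cannot remember that the square it just left was clean.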
Model-based reflex agents

The structure of the model-based reflex agent with internal
state, showing how the current percept is combined with the
old internal state to generate the updated description of the
current state, based on the agent's model of how the world works.
Goal-based agents

- Knowing something about the current state of the
environment is not always enough to decide what to do.
- For example, at a road junction, the taxi can turn
left, turn right, or go straight on. The correct decision
depends on where the taxi is trying to get to.
- In other words, as well as a current state
description, the agent needs some sort of goal
information that describes situations that are
desirable. The agent program can combine this with
the model (the same information that was used in the
model-based reflex agent) to choose actions that
achieve the goal.
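
The junction example can be sketched as a search over a tiny road map: the same current state (the junction) leads to different actions depending on the goal. The map, the place names, and the function `plan_route` are all hypothetical; breadth-first search is used here only because it is the simplest way to find a route, not because the lecture specifies it.

```python
from collections import deque


def plan_route(roads, start, goal):
    """Breadth-first search; returns a path with the fewest road segments, or None."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in roads.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None


# Hypothetical road map around one junction.
roads = {
    "junction": ["left_street", "right_street", "straight_street"],
    "left_street": ["museum"],
    "right_street": ["airport"],
    "straight_street": ["station"],
}

to_airport = plan_route(roads, "junction", "airport")
to_station = plan_route(roads, "junction", "station")
print(to_airport)
print(to_station)
```

The first step of each returned path is the turn the taxi should take at the junction for that goal.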
Goal-based agents
Utility-based agents
Utility-based agents
Learning Agents
A learning agent can be divided into four conceptual components,
as shown in the figure on the next slide. The most important distinction
is between the learning element, which is responsible for making
improvements, and the performance element, which is responsible
for selecting external actions. The performance element is what
we have previously considered to be the entire agent: it takes in
percepts and decides on actions. The learning element uses
feedback from the critic on how the agent is doing and determines
how the performance element should be modified to do better in
the future.
The critic tells the learning element how well the agent is doing
with respect to a fixed performance standard. The critic is
necessary because the percepts themselves provide no indication
of the agent's success.
Learning Agents

The last component of the learning agent is the problem
generator. It is responsible for suggesting actions that will lead to
new and informative experiences. The point is that if the
performance element had its way, it would keep doing the actions
that are best, given what it knows. But if the agent is willing to
explore a little and do some perhaps suboptimal actions in the
short run, it might discover much better actions for the long run.
The problem generator's job is to suggest these exploratory actions.
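
One common way to trade off exploiting the best-known action against exploring is an epsilon-greedy rule, sketched below. This is a swapped-in technique used for illustration, not something the lecture describes; the function name `choose_action`, the action names, and the value estimates are all assumptions.

```python
import random


def choose_action(estimates, epsilon, rng):
    """With probability epsilon explore; otherwise take the best-known action."""
    if rng.random() < epsilon:
        # Role of the problem generator: suggest an exploratory action.
        return rng.choice(sorted(estimates))
    # Role of the performance element: pick the action currently believed best.
    return max(estimates, key=estimates.get)


# Hypothetical value estimates learned so far.
estimates = {"usual_route": 5.0, "untried_route": 0.0}
rng = random.Random(0)

greedy = choose_action(estimates, 0.0, rng)  # never explores
picks = {choose_action(estimates, 1.0, rng) for _ in range(50)}  # always explores
print(greedy, sorted(picks))
```

With epsilon set to 0 the agent never tries the untried route, so it can never learn that it might be better; a small positive epsilon accepts some short-run loss for that information.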
Learning agents
The RHINO Robot
Museum Tour Guide Running Example

Museum guide in Bonn

Two tasks to perform
Guided tour around exhibits
Provide info on each exhibit

Very successful
18.6 kilometres
47 hours
50% attendance increase
1 tiny mistake (no injuries)
Architecture and Program

Program: method of turning environmental input into actions
Architecture: hardware/software (OS etc.) on which the agent's program runs

RHINO's architecture:
Sensors (infrared, sonar, tactile, laser)
Processors (3 onboard, 3 more by wireless Ethernet)
RHINO's program:
Low level: probabilistic reasoning, vision
High level: problem solving, planning (first order logic)
Knowledge of Environment

Knowledge of Environment (World)

Different to sensory information from environment.

World knowledge can be (pre)-programmed in

Can also be updated/inferred by sensory information.

Choice of actions informed by knowledge of...

Current state of the world
Previous states of the world
How its actions change the world

Example: Chess agent

World knowledge is the board state (all the pieces)
Sensory information is the opponent's move
Its moves also change the board state
RHINO's Environment Knowledge

Programmed knowledge
Layout of the Museum
Doors, exhibits, restricted areas

Sensed knowledge
People and objects (chairs) moving

Effect of actions on the World

Nothing moved by RHINO explicitly
But, people followed it around (moving people)

Action on the world

In response only to a sensor input
Not in response to world knowledge

Humans flinching, blinking

Chess openings, endings
Lookup table (not a good idea in general)
35^100 entries required for the entire game

RHINO: no reflexes?
Dangerous, because people get everywhere

Always need to think hard about

What the goal of an agent is

Does agent have internal knowledge about goal?

Obviously not the goal itself, but some properties
Goal based agents
Uses knowledge about a goal to guide its actions
E.g., Search, planning
Goal: get from one exhibit to another
Knowledge about the goal: whereabouts it is
Need this to guide its actions (movements)
Utility Functions

Knowledge of a goal may be difficult to pin down

For example, checkmate in chess

But some agents have localised measures

Utility functions measure value of world states
Choose action which best improves utility (rational!)
In search, this is Best First

RHINO: various utilities to guide search for route

Main one: distance from the target exhibit
Density of people along path
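
The idea of choosing the step that best improves utility can be sketched as a greedy best-first search. The museum map, the distance and crowd numbers, and the way they are combined into one utility are all made up for illustration; this is not RHINO's actual planner.

```python
import heapq


def best_first_route(neighbours, utility, start, target):
    """Greedy best-first search: always expand the room with the highest utility."""
    # heapq is a min-heap, so utilities are negated to pop the best room first.
    frontier = [(-utility[start], start, [start])]
    seen = {start}
    while frontier:
        _, room, path = heapq.heappop(frontier)
        if room == target:
            return path
        for nxt in neighbours[room]:
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-utility[nxt], nxt, path + [nxt]))
    return None


# Hypothetical museum layout and measurements.
neighbours = {
    "entrance": ["hall_a", "hall_b"],
    "hall_a": ["exhibit"],
    "hall_b": ["exhibit"],
    "exhibit": [],
}
distance = {"entrance": 3, "hall_a": 1, "hall_b": 1, "exhibit": 0}
crowd = {"entrance": 0, "hall_a": 4, "hall_b": 1, "exhibit": 0}

# Assumed utility: closer to the target and less crowded is better.
utility = {room: -(distance[room] + crowd[room]) for room in neighbours}

route = best_first_route(neighbours, utility, "entrance", "exhibit")
print(route)
```

Both halls are equally close to the exhibit, so the crowd term decides the route; with a different weighting of the two utilities the other hall could win.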
Details of the Environment

Must take into account:

some qualities of the world

A robot in the real world
A software agent dealing with web data streaming in

Third lot of considerations:

Accessibility, Determinism
Dynamic/Static, Discrete/Continuous
Accessibility of Environment

Is everything an agent requires to choose its

actions available to it via its sensors?
If so, the environment is fully accessible

If not, parts of the environment are inaccessible

Agent must make informed guesses about world

Invisible objects which couldn't be sensed
Including glass cases and bars at particular heights
Software adapted to take this into account
Determinism in the Environment

Does the change in world state

Depend only on current state and agent's action?

Non-deterministic environments
Have aspects beyond the control of the agent
Utility functions have to guess at changes in world

Robot in a maze: deterministic

Whatever it does, the maze remains the same

RHINO: non-deterministic
People moved chairs to block its path
Episodic Environments

Is the choice of current action

Dependent on previous actions?
If not, then the environment is episodic

In non-episodic environments:
Agent has to plan ahead:
Current choice will affect future actions

Short term goal is episodic
Getting to an exhibit does not depend on how it got to
current one
Long term goal is non-episodic
Tour guide, so cannot return to an exhibit on a tour
Static or Dynamic Environments

Static environments don't change

While the agent is deliberating over what to do

Dynamic environments do change

So agent should/could consult the world when choosing
Alternatively: anticipate the change during deliberation
Alternatively: make decision very fast

Fast decision making (planning route)
But people are very quick on their feet
Discrete or Continuous

Nature of sensor readings / choices of action

Sweep through a range of values (continuous)
Limited to a distinct, clearly defined set (discrete)

Maths in programs altered by type of data

Chess: discrete

RHINO: continuous
Visual data can be considered continuous
Choice of actions (directions) also continuous
RHINO's Solution to
Environmental Problems

Museum environment:
Inaccessible, non-episodic, non-deterministic, dynamic, continuous

RHINO constantly updates its plan as it moves

Solves these problems very well
Necessary design given the environment
Representations type
There are various ways that the components
can represent the environment that the agent inhabits.
1- Atomic representation:
each state of the world is indivisible; it has no internal
structure. Consider the problem of finding a driving route
from one end of a country to the other via some sequence
of cities. For the purposes of solving this problem, it may
suffice to reduce the state of the world to just the name of the
city we are in, a single atom of knowledge; a "black box"
whose only discernible property is that of being identical to
or different from another black box.
The algorithms underlying search and game-playing (Chapters 3-5),
hidden Markov models (Chapter 15), and Markov decision processes
(Chapter 17) all work with atomic representations.
Representations type
2- Factored representation:
A factored representation splits up each state into a fixed
set of variables or attributes, each of which can have a
value. While two different atomic states have nothing in
common (they are just different black boxes), two different
factored states can share some attributes.
E.g., consider a higher-fidelity description for the same
problem, where we need to be concerned with more than
just atomic location in one city or another; we might need
to pay attention to how much gas is in the tank, our
current GPS coordinates, whether or not the oil warning
light is working, how much spare change we have for toll
crossings, what station is on the radio, and so on.
Many important areas of AI are based on factored representations,
including constraint satisfaction algorithms (Chapter 6), propositional
logic (Chapter 7), planning (Chapters 10 and 11), Bayesian networks
(Chapters 13-16), and the machine learning algorithms in Chapters
18, 20, and 21.
Representations type
3- Structured representation:
we need to understand the world as having things in it that
are related to each other, not just variables with values.
For example, we might notice that a large truck ahead of
us is reversing into the driveway of a dairy farm but a cow
has got loose and is blocking the truck's path.
objects such as cows and trucks and their various and
varying relationships can be described explicitly using
structured representation.
Structured representations underlie relational databases and first-order logic (Chapters 8, 9,
and 12), first-order probability models (Chapter 14), knowledge-based learning (Chapter 19)
and much of natural language understanding (Chapters 22 and 23). In fact, almost everything
that humans express in natural language concerns objects and their relationships.
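
The three kinds of representation can be contrasted with small Python values. This is only a sketch: the city labels, the variable names, the `Relation` class, and the two helper functions are assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Atomic: a state is an opaque label; the only meaningful test is equality.
atomic_a, atomic_b = "city_1", "city_2"

# Factored: a state is a fixed set of variables, each with a value.
factored_a = {"city": "city_1", "fuel_litres": 40, "oil_light_ok": True}
factored_b = {"city": "city_2", "fuel_litres": 40, "oil_light_ok": True}


def shared_attributes(s1, s2):
    """Variables on which two factored states agree."""
    return {k for k in s1 if k in s2 and s1[k] == s2[k]}


# Structured: a state is a set of objects and the relations between them.
@dataclass(frozen=True)
class Relation:
    name: str
    subject: str
    obj: str


structured = {
    Relation("ahead_of", "truck", "us"),
    Relation("reversing_into", "truck", "driveway"),
    Relation("blocking", "cow", "truck"),
}


def related_to(state, thing):
    """Names of all relations in which the given object takes part."""
    return {r.name for r in state if thing in (r.subject, r.obj)}


print(atomic_a == atomic_b)
print(sorted(shared_attributes(factored_a, factored_b)))
print(sorted(related_to(structured, "truck")))
```

The two atomic states can only be reported as different, the factored states can be compared variable by variable, and the structured state can answer questions about a particular object such as the truck.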

Think about these in the design of an autonomous agent:

How to test whether the agent is acting rationally
Internal structure of the agent
Specifics of the environment
Usual systems engineering stuff
End of Chapter 2