
Ethical Machine Learning

Novi Quadrianto

12th September 2017


Machine Learning in the Recent News
Machine learning has had a major impact on many parts of
computer science as a field and on the commercial world.

2
Machine Learning at Sussex
• Our group is the Predictive Analytics Lab (PAL): 5 team
members (+3 by October 2017), comprising faculty,
researchers and research students;
• We undertake high-quality research and publish in top machine
learning and computer vision conferences/journals, including
NIPS (7x), ICML (6x), CVPR (3x), ICCV/ECCV (2x), and
TPAMI/JMLR (4x);
• We also provide support, technology, and highly-trained
specialists to a new generation of technology companies.

3
Talk Outline

• Fairness
  – Many notions of fairness
  – Many techniques for ensuring fairness
  – My unifying view
• Transparency
  – Transparency, trust, interpretability, explainability, explanations
  – Many techniques for generating explanations: computer science view
  – What an explanation is: social science view
  – How to evaluate explanations: social science view

4
The long-term goal: Ethical Machine Learning
• Problem: How to build a learning model that is ethical?
• Approach: Develop a framework that is able to handle
fairness, transparency, confidentiality, privacy, and their
combinations automatically in a plug-and-play manner.

[Figure: schematic of the EthicalML framework. Input data (holiday photos, employment applications, medical scans) and plug-and-play constraints at deployment time ("do not use complex models", "do not use protected characteristics", "do not use confidential information") feed into automated inference over generic learning problems, including binary/multi-class classification and regression. Outputs are decision models that are transparent, fair, and confidentiality-aware.]

EthicalML: Injecting Ethical and Legal Constraints into Machine Learning


Models, EPSRC grant, 10/2017 - 03/2019.

5
Why ethics in machine learning?

The UK’s House of Commons Science and Technology


Committee February 2016 report on “The Big Data Dilemma”
recommended an urgent formation of a Council of Data Ethics.
The US’s Executive Office of the President May 2016 report on
“Big Data: A Report on Algorithmic Systems, Opportunity, and
Civil Rights” urged protection of fundamental values like fairness,
confidentiality, and privacy.

6
Fairness

http://pwp.gatech.edu

7
When processing personal data for profiling purposes, EU’s
General Data Protection Regulation (GDPR) (enforceable from
25 May 2018) states that we must prevent discriminatory effects.
Under Article 9 of the GDPR, personal data includes “data revealing racial
or ethnic origin, political opinions, religious or philosophical beliefs,
or trade-union membership, . . . , genetic data, biometric
data,. . . ,data concerning health or data concerning a natural
person’s sex life or sexual orientation . . . ”

8
Problem setting
Mr. Q is trying to get a bank loan from VTB24.

9
What is fair?

A decision is fair if:


Type A fair treatment: it is not based on protected characteristics
such as race, gender, marital status, or age
Type B fair impact: it does not disproportionately benefit or hurt
individuals sharing a certain value of their protected
characteristic
Type C fair predictive performance: given the target outcomes, it
enforces equal discrepancies between decisions and target
outcomes across groups of individuals based on their
protected characteristic

10
Learning setup

• What we are given:
  – a set of N training examples, represented by feature vectors
    X = {x_1, . . . , x_N} ⊂ 𝒳 = R^d, and their label annotations
    Y = {y_1, . . . , y_N} ∈ 𝒴 = {+1, −1}
  – protected characteristic information Z = {z_1, . . . , z_N} ⊂ 𝒵,
    where z_n encodes the protected characteristics of sample x_n.
• What we want:
  – a predictor f for the label y_new of an unseen instance x_new,
    given Y, X and Z
Go to Privileged Learning

11
How to ensure fair treatment (type A fairness)?

Recap: fair treatment is when decisions are not based on


protected characteristics.
Some known mechanisms for ensuring fair treatment:
• simply ignoring protected characteristic features Z (called
fairness through unawareness). Any issue?

12
How to ensure fair impact (type B fairness)?

Recap: fair impact is when a certain protected attribute does not


end up positively or negatively affecting a data point.
Some definitions of fair impact:
• A binary decision model is fair if its decisions {+1, −1} are
independent of the protected characteristic z ∈ {0, 1} across
the demographics (called demographic/statistical parity).
A decision fˆ satisfies this definition if
P (sign(fˆ(x)) = 1|z = 0) = P (sign(fˆ(x)) = 1|z = 1).
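This definition can be checked directly from a model's decisions; below is a minimal NumPy sketch (the function name and toy data are illustrative, not from the talk):

```python
import numpy as np

def demographic_parity_gap(decisions, z):
    """Absolute gap |P(d = 1 | z = 0) - P(d = 1 | z = 1)| from binary decisions."""
    decisions, z = np.asarray(decisions), np.asarray(z)
    p0 = decisions[z == 0].mean()  # acceptance rate for group z = 0
    p1 = decisions[z == 1].mean()  # acceptance rate for group z = 1
    return abs(p0 - p1)

# toy example: equal acceptance rates in both groups -> gap of 0.0
d = np.array([1, 0, 1, 0])
z = np.array([0, 0, 1, 1])
print(demographic_parity_gap(d, z))  # 0.0
```

A gap of exactly zero is rare on finite samples, so in practice the gap is bounded by a tolerance rather than required to vanish.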

13
How to ensure fair impact (type B fairness)?
Some known mechanisms for ensuring demographic/statistical
parity: P (sign(fˆ(x)) = 1|z = 0) = P (sign(fˆ(x)) = 1|z = 1)
Zemel et al. learn a probabilistic mapping of each data point to a
set of K latent prototypes that is independent of z, while retaining
as much information about y as possible. The objective function has
three terms: demographic parity, reconstruction error, and
cross-entropy loss. The demographic parity term can be written as

(1/N_{z0}) Σ_{n=1}^{N_{z0}} P(M = k | x_n, z = 0) = (1/N_{z1}) Σ_{n=1}^{N_{z1}} P(M = k | x_n, z = 1), ∀k = 1, . . . , K,

where P(M = k | x) is a softmax function with K prototypes:
P(M = k | x) = exp(−d(x, v_k)) / Σ_{j=1}^{K} exp(−d(x, v_j)).
Zemel, Wu, Swersky, Pitassi, and Dwork. Learning fair representations. ICML 2013.
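The prototype probabilities and the parity term can be sketched directly from the formulas above (a sketch only: variable names are mine, squared Euclidean distance is assumed for d, and there is no training loop):

```python
import numpy as np

def prototype_probs(X, V):
    """P(M = k | x): softmax over negative squared distances to K prototypes V."""
    d = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)  # (N, K) distances
    s = -d - (-d).max(axis=1, keepdims=True)            # stabilised logits
    e = np.exp(s)
    return e / e.sum(axis=1, keepdims=True)

def parity_term(X, z, V):
    """Gap between the groups' average prototype assignments, summed over k."""
    P = prototype_probs(X, V)
    return np.abs(P[z == 0].mean(axis=0) - P[z == 1].mean(axis=0)).sum()
```

During training this term is driven towards zero while the reconstruction and cross-entropy terms preserve information about x and y.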

14
How to ensure fair impact (type B fairness)?
Some known mechanisms for ensuring demographic/statistical
parity: P (sign(fˆ) = 1|z = 0) = P (sign(fˆ) = 1|z = 1)
Louizos et al. extend the previous approach and use deep
variational auto-encoders (VAE) framework to find a latent
representation v. The generative model (decoder) pθ (x|z, v) and
the variational posterior (encoder) qφ (v|x, z) are deep neural
networks.
[Figure: graphical models of the unsupervised and semi-supervised variational fair autoencoder; latent variables v (respectively v_1, v_2 and label y) together with the protected characteristic z generate x, with plates over N.]

Louizos, Swersky, Li, Welling, and Zemel. The variational fair autoencoder. ICLR
2016.

15
How to ensure fair impact (type B fairness)?
Some known mechanisms for ensuring demographic/statistical
parity: P (sign(fˆ) = 1|z = 0) = P (sign(fˆ) = 1|z = 1)
Zafar et al. propose de-correlation constraints (no latent
representation v is learned) between classifier decision functions
and protected characteristics as a way to approximately deliver
demographic parity.
Cov(z − ẑ, fˆ(x)) = E[(z − ẑ) fˆ(x)] − E[z − ẑ] E[fˆ(x)]
                  ≈ (1/N) Σ_{n=1}^{N} (z_n − ẑ) fˆ(x_n),

with fˆ(x) = ⟨w, x⟩.

The above quantity will be approximately zero for a large training
set when the decision function satisfies demographic parity.
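The empirical covariance term is simple to estimate; a minimal NumPy sketch (the function name and toy data are illustrative), which in Zafar et al.'s formulation would be bounded as a constraint during training:

```python
import numpy as np

def decorrelation_term(z, scores):
    """Empirical covariance proxy (1/N) * sum_n (z_n - z_bar) * f(x_n)."""
    z, scores = np.asarray(z, float), np.asarray(scores, float)
    return np.mean((z - z.mean()) * scores)

# scores distributed identically across groups -> term is 0
z = np.array([0, 0, 1, 1])
f = np.array([0.5, -0.5, 0.5, -0.5])
print(decorrelation_term(z, f))  # 0.0
```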
Zafar, Valera, Gomez-Rodriguez, and Gummadi. Fairness Constraints: Mechanisms for
Fair Classification. AISTATS 2017.

16
How to ensure fair impact (type B fairness)?

A definition of demographic/statistical parity:


P (sign(fˆ) = 1|z = 0) = P (sign(fˆ) = 1|z = 1)

Any issue?

17
How to ensure fair predictive perf. (type C fairness)?
Recap: fair predictive performance is when every protected group
is harmed or helped in the same way.
Some definitions of fair predictive performance:
• A binary decision model is fair if its decisions {+1, −1} are
conditionally independent of the protected characteristic
z ∈ {0, 1} given the positive target outcome y (called
equality of opportunity). A decision fˆ satisfies this
definition if
P (sign(fˆ(x)) = 1|z = 0, y = 1) = P (sign(fˆ(x)) = 1|z = 1, y = 1).

• A binary decision model is fair if its decisions {+1, −1} are
conditionally independent of the protected characteristic
z ∈ {0, 1} given the target outcome y (called equalized
odds). A decision fˆ satisfies this definition if
P (sign(fˆ(x)) = 1|z = 0, y) = P (sign(fˆ(x)) = 1|z = 1, y), for y ∈ {+1, −1}.
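Both definitions reduce to comparing per-group rates; a minimal sketch (function name and toy data are mine):

```python
import numpy as np

def group_rates(decisions, y, z, y_value):
    """P(d = 1 | z = g, y = y_value) for each group g in {0, 1}."""
    d, y, z = map(np.asarray, (decisions, y, z))
    return tuple(d[(z == g) & (y == y_value)].mean() for g in (0, 1))

d = np.array([1, 1, 0, 1, 0, 1])
y = np.array([1, 1, -1, 1, -1, 1])
z = np.array([0, 0, 0, 1, 1, 1])
tpr0, tpr1 = group_rates(d, y, z, y_value=1)    # equality of opportunity
fpr0, fpr1 = group_rates(d, y, z, y_value=-1)   # add this for equalized odds
print(tpr0, tpr1, fpr0, fpr1)  # 1.0 1.0 0.0 0.0
```

Equality of opportunity only compares the y = +1 rates; equalized odds requires both pairs of rates to match.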
18
How to ensure fair predictive perf. (type C fairness)?

Some known mechanisms for ensuring equality of opportunity:


P (sign(fˆ(x)) = 1|z = 0, y = 1) = P (sign(fˆ(x)) = 1|z = 1, y = 1).
Hardt et al. achieve equal true positive rates across the two
groups of individuals by post-processing the soft-outputs of an
unfair classifier. The post-processing step consists of learning a
different threshold for a different group of individuals.
Hardt, Price, and Srebro. Equality of opportunity in supervised learning. NIPS 2016.
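The per-group thresholding step can be sketched as follows. This is a simplification: Hardt et al. derive (possibly randomized) thresholds from per-group ROC curves while also trading off accuracy, whereas the grid search below only equalizes true positive rates; all names are mine:

```python
import numpy as np

def fit_group_thresholds(scores, y, z, grid=None):
    """Grid-search one decision threshold per group so that the two groups'
    true positive rates match as closely as possible."""
    scores, y, z = map(np.asarray, (scores, y, z))
    grid = np.unique(scores) if grid is None else grid

    def tpr(group, t):
        pos = (z == group) & (y == 1)
        return (scores[pos] >= t).mean()

    best, best_gap = (grid[0], grid[0]), np.inf
    for t0 in grid:
        for t1 in grid:
            gap = abs(tpr(0, t0) - tpr(1, t1))
            if gap < best_gap:
                best, best_gap = (t0, t1), gap
    return best

def predict(scores, z, thresholds):
    """Apply the group-dependent thresholds to the unfair classifier's scores."""
    t = np.where(np.asarray(z) == 0, thresholds[0], thresholds[1])
    return (np.asarray(scores) >= t).astype(int)
```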

19
How to ensure fair predictive perf. (type C fairness)?
Some known mechanisms for ensuring equality of opportunity:
P (sign(fˆ(x)) = 1|z = 0, y = 1) = P (sign(fˆ(x)) = 1|z = 1, y = 1).
Zafar et al. use the same de-correlation constraint as in
demographic parity but this time it is between true positive rates
and protected characteristics.
 
Cov(z − ẑ, max(0, ((1 + y)/2) y fˆ(x)))
  = E[(z − ẑ) max(0, ((1 + y)/2) y fˆ(x))] − E[z − ẑ] E[max(0, ((1 + y)/2) y fˆ(x))]
  ≈ (1/N) Σ_{n=1}^{N} (z_n − ẑ) max(0, ((1 + y_n)/2) y_n fˆ(x_n)),

with fˆ(x) = ⟨w, x⟩.


The (empirical) covariance will be approximately zero when the
decision function satisfies equal true positive rates.
Zafar, Valera, Gomez-Rodriguez, Gummadi. Fairness Beyond Disparate Treatment &
Disparate Impact: Learning Classification w/o Disparate Mistreatment. WWW 2017.
20
Quiz

1. Go to https://www.socrative.com or download the app


https://www.socrative.com/apps.html.
2. Enter the Room Number 73957fec.
3. Cast the vote!

21
Quiz on various fairness definitions
http://socrative.com and Room Name 73957fec.
customer    on elec. roll   salary > 2M p.a.   pays back loan   C1   C2   C3
red 1             1                 1                 ✓          1    1    1
red 2             1                 0                 ✓          1    1    0
red 3             0                 1                 ✗          1    0    1
blue 1            1                 1                 ✓          1    0    1
blue 2            1                 0                 ✗          1    1    1
blue 3            0                 0                 ✓          0    1    0
(protected attribute: race, red/blue; non-protected attributes: electoral roll, salary; ground truth: whether the loan is paid back; C1–C3: the classifiers’ decisions to approve)
Which classifier respects fairness through unawareness?
A C1 only
B C2 only
C C3 only
D C1 and C2
E C2 and C3
Adapted from Zafar, Valera, Gomez-Rodriguez, and Gummadi. WWW 2017.

22
Quiz on various fairness definitions
http://socrative.com and Room Name 73957fec.
customer    on elec. roll   salary > 2M p.a.   pays back loan   C1   C2   C3
red 1             1                 1                 ✓          1    1    1
red 2             1                 0                 ✓          1    1    0
red 3             0                 1                 ✗          1    0    1
blue 1            1                 1                 ✓          1    0    1
blue 2            1                 0                 ✗          1    1    1
blue 3            0                 0                 ✓          0    1    0
(protected attribute: race, red/blue; non-protected attributes: electoral roll, salary; ground truth: whether the loan is paid back; C1–C3: the classifiers’ decisions to approve)
Which classifier respects demographic/statistical parity?
A C1 only
B C2 only
C C3 only
D C1 and C2
E C2 and C3
Adapted from Zafar, Valera, Gomez-Rodriguez, and Gummadi. WWW 2017.

23
Quiz on various fairness definitions
http://socrative.com and Room Name 73957fec.
customer    on elec. roll   salary > 2M p.a.   pays back loan   C1   C2   C3
red 1             1                 1                 ✓          1    1    1
red 2             1                 0                 ✓          1    1    0
red 3             0                 1                 ✗          1    0    1
blue 1            1                 1                 ✓          1    0    1
blue 2            1                 0                 ✗          1    1    1
blue 3            0                 0                 ✓          0    1    0
(protected attribute: race, red/blue; non-protected attributes: electoral roll, salary; ground truth: whether the loan is paid back; C1–C3: the classifiers’ decisions to approve)
Which classifier respects equality of opportunity?
A C1 only
B C2 only
C C3 only
D C1 and C2
E C2 and C3
Adapted from Zafar, Valera, Gomez-Rodriguez, and Gummadi. WWW 2017.
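For readers working through the quiz offline, the three definitions can be checked mechanically on the table above; a small script (✓/✗ encoded as 1/0; unawareness is checked only up to consistency, i.e. identical non-protected features must receive identical decisions):

```python
import numpy as np

race = np.array([0, 0, 0, 1, 1, 1])            # 0 = red, 1 = blue
feats = np.array([[1, 1], [1, 0], [0, 1],      # (on elec. roll, salary > 2M)
                  [1, 1], [1, 0], [0, 0]])
y = np.array([1, 1, 0, 1, 0, 1])               # ground truth: pays back loan
C = {"C1": np.array([1, 1, 1, 1, 1, 0]),
     "C2": np.array([1, 1, 0, 0, 1, 1]),
     "C3": np.array([1, 0, 1, 1, 1, 0])}

results = {}
for name, d in C.items():
    # unawareness: identical non-protected features -> identical decisions
    unaware = all(d[i] == d[j] for i in range(6) for j in range(6)
                  if (feats[i] == feats[j]).all())
    parity = d[race == 0].mean() == d[race == 1].mean()
    opportunity = (d[(race == 0) & (y == 1)].mean()
                   == d[(race == 1) & (y == 1)].mean())
    results[name] = (bool(unaware), bool(parity), bool(opportunity))
    print(name, results[name])
```

Running it reproduces the intended answers: C1 only is consistent with fairness through unawareness, C2 and C3 satisfy demographic parity, and C3 only satisfies equality of opportunity.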

24
My unifying view

• Problem: Which fairness criterion (or criteria) to use?


• Approach: Recycle two well-established machine learning
techniques, privileged learning and distribution matching,
and harmonize them for satisfying multi-faceted fairness
definitions.
NQ and Sharmanska. Recycling for fairness: Learning with conditional distribution
matching constraints. NIPS 2017.

25
My unifying view

Ideas:
• consider protected characteristics as privileged information
that is available at training but not at test time; this
accelerates model training and delivers fairness through
unawareness
• cast statistical parity, equalized odds, and equality of
opportunity as a classical two-sample problem of conditional
distributions
• return a Pareto frontier of options

NQ and Sharmanska. Recycling for fairness: Learning with conditional distribution


matching constraints. NIPS 2017.

26
Privileged learning
• What we are given:
  – training triplets (x_1, x*_1, y_1), . . . , (x_N, x*_N, y_N), where
    (x_n, y_n) ⊂ 𝒳 × 𝒴 is a training input–output pair and
    x*_n ∈ 𝒳* is additional information about the training input x_n.
  – This additional (privileged) information is only available
    during training. As an illustrative example, x_n is a colour
    feature from a 2D image while x*_n is a feature from 3D cameras
    and laser scanners.
• What we want:
  – to use the additional data, x*_n, to accelerate the learning
    process of inferring an optimal predictor on the data space 𝒳,
    i.e. f : 𝒳 → 𝒴.
  – The difference between accelerated and non-accelerated
    methods is in the rate of convergence to the optimal predictor,
    e.g. 1/N cf. 1/√N for margin-based classifiers.
Go to Fairness Problem Setting

Vapnik and Vashist. A new learning paradigm: Learning using privileged information.
Neural Networks 2009.

27
Privileged learning algorithm SVM∆ +
• it modifies the required distance of a data instance to the
decision boundary based on the easiness/hardness of that
instance in the privileged space 𝒳*, a space that contains
protected characteristics such as race.
• Easiness/hardness is reflected in the negative of the
confidence, −y_n(⟨w*, x*_n⟩ + b*); the higher this value, the
harder this instance is to classify correctly.
Optimization problem:

minimize over w ∈ R^d, b ∈ R, w* ∈ R^{d*}, b* ∈ R:
  ½‖w‖² + ½γ‖w*‖² + C∆ Σ_{n=1}^{N} max(0, −y_n[⟨w*, x*_n⟩ + b*])
                  + C Σ_{n=1}^{N} max(0, 1 − y_n[⟨w*, x*_n⟩ + b*] − y_n[⟨w, x_n⟩ + b])

Vapnik and Izmailov. Learning using privileged information: similarity control and
knowledge transfer. JMLR 2015.

28
Privileged learning algorithm SVM∆ +

minimize over w ∈ R^d, b ∈ R, w* ∈ R^{d*}, b* ∈ R:
  ½‖w‖² + ½γ‖w*‖² + C∆ Σ_{n=1}^{N} max(0, −y_n[⟨w*, x*_n⟩ + b*])
                  + C Σ_{n=1}^{N} max(0, 1 − y_n[⟨w*, x*_n⟩ + b*] − y_n[⟨w, x_n⟩ + b])

• −y_n[⟨w*, x*_n⟩ + b*] ≫ 0 means x_n, without its protected
characteristic, is expected to be a hard-to-classify instance;
therefore its distance to the decision boundary is increased.
• −y_n[⟨w*, x*_n⟩ + b*] ≪ 0 means x_n, without its protected
characteristic, is expected to be an easy-to-classify instance;
therefore its distance to the decision boundary is reduced.
Vapnik and Izmailov. Learning using privileged information: similarity control and
knowledge transfer. JMLR 2015.
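The objective above can be transcribed directly; a sketch that simply evaluates it for given parameters (argument names and toy shapes are mine; a real solver would minimize this, e.g. via its dual):

```python
import numpy as np

def svm_dplus_objective(w, b, w_s, b_s, X, X_star, y, C=1.0, C_delta=1.0, gamma=1.0):
    """Evaluate the SVM-Delta-plus objective: the privileged model (w*, b*)
    sets a per-example slack that modifies the margin required of (w, b)."""
    conf_star = y * (X_star @ w_s + b_s)            # privileged-space confidence
    reg = 0.5 * w @ w + 0.5 * gamma * (w_s @ w_s)
    slack_star = np.maximum(0.0, -conf_star).sum()  # penalise wrong privileged side
    hinge = np.maximum(0.0, 1.0 - conf_star - y * (X @ w + b)).sum()
    return reg + C_delta * slack_star + C * hinge
```

Note how a large privileged confidence conf_star shrinks the margin demanded of the main model, and a negative one enlarges it, matching the bullets above.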

29
Distribution matching
• Remark: Privileged learning is not enough! We can still suffer
from discrimination by proxy.
• We need distribution matching to prevent proxy variables from
affecting fairness.
For the statistical parity criterion, enforce closeness between the
distributions of function outputs:
Distance(fˆ(X_{Z=0}), fˆ(X_{Z=1})); fˆ(X_{Z=0}) := {fˆ(x_1^{Z=0}), . . . , fˆ(x_{N_{Z=0}}^{Z=0})}.
For the equalized odds criterion, enforce closeness between the
distributions of true positive and false positive rates:
Distance(I[y = +1] fˆ(X_{Z=0}), I[y = +1] fˆ(X_{Z=1})) and
Distance(I[y = −1] fˆ(X_{Z=0}), I[y = −1] fˆ(X_{Z=1})).
For equal opportunity, enforce closeness between the distributions
of just the true positive rates.
30
Distribution matching on misclassification rates
Distance(1 − y fˆ(XZ=0 ), 1 − y fˆ(XZ=1 ))

[Figure: decision boundaries learned by a standard SVM (z = 0, z = 1) versus the proposed Fair DM+ (z = 0, z = 1)]

31
Distribution matching
To avoid a parametric assumption on the distance estimate
between distributions, we use the Maximum Mean Discrepancy
(MMD) criterion.

[Figure: the empirical distributions p̂(misclassifications for z = 0) and p(misclassifications for z = 1) are embedded in an RKHS with kernel k(·, ·) via the mapping φ(1 − y_i ⟨w, x_i⟩) = k(1 − y_i ⟨w, x_i⟩, ·), giving mean embeddings μ_{Z=0} = E_p̂[φ(misclass. for Z = 0)] and μ_{Z=1} = E_p[φ(misclass. for Z = 1)]]

MMD(p, p̂) = ‖μ_{Z=0} − μ_{Z=1}‖²_{RKHS}

Gretton, Borgwardt, Rasch, Schoelkopf, and Smola. A kernel two-sample test. JMLR
2012.
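A standard biased estimator of the squared MMD with a Gaussian kernel can be written in a few lines (bandwidth and toy samples below are illustrative):

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    """Gaussian kernel matrix between 1-D samples a (n,) and b (m,)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD between samples x and y."""
    return (rbf(x, x, sigma).mean() + rbf(y, y, sigma).mean()
            - 2 * rbf(x, y, sigma).mean())

rng = np.random.default_rng(0)
same = mmd2(rng.normal(0, 1, 500), rng.normal(0, 1, 500))
diff = mmd2(rng.normal(0, 1, 500), rng.normal(3, 1, 500))
print(same < diff)  # True: matched distributions give a smaller MMD
```

In the fairness setting, x and y would be the two groups' misclassification statistics, and this quantity is added as a penalty to be driven towards zero.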

32
Special cases
Mean matching for demographic parity
Zemel et al. impose the following statistical parity constraint:

(1/N_{Z=0}) Σ_{n=1}^{N_{Z=0}} fˆ(x_n^{Z=0}; c) = (1/N_{Z=1}) Σ_{n=1}^{N_{Z=1}} fˆ(x_n^{Z=1}; c), ∀c = 1, . . . , C.

Assuming a linear kernel k, this constraint is equivalent to
requiring that, for each c,

μ[fˆ(x^{Z=0}; c)] = (1/N_{Z=0}) Σ_{n=1}^{N_{Z=0}} ⟨fˆ(x_n^{Z=0}; c), ·⟩
                 = (1/N_{Z=1}) Σ_{n=1}^{N_{Z=1}} ⟨fˆ(x_n^{Z=1}; c), ·⟩ = μ[fˆ(x^{Z=1}; c)].

Zemel, Wu, Swersky, Pitassi, and Dwork. Learning fair representations. ICML 2013.

33
Special cases
Mean matching for equalized odds and equal opportunity
Zafar et al. impose the following equal false positive rate
constraint:

Σ_{n=1}^{N_{Z=0}} min(0, I[y_n = −1] fˆ(x_n^{Z=0})) = Σ_{n=1}^{N_{Z=1}} min(0, I[y_n = −1] fˆ(x_n^{Z=1})).

Again, assuming a linear kernel k, this constraint is equivalent to
requiring that

μ[min(0, I[y_n = −1] fˆ(x_n^{Z=0}))] = (1/N_{Z=0}) Σ_{n=1}^{N_{Z=0}} ⟨min(0, I[y_n = −1] fˆ(x_n^{Z=0})), ·⟩
= (1/N_{Z=1}) Σ_{n=1}^{N_{Z=1}} ⟨min(0, I[y_n = −1] fˆ(x_n^{Z=1})), ·⟩ = μ[min(0, I[y_n = −1] fˆ(x_n^{Z=1}))].

Zafar, Valera, Gomez-Rodriguez, Gummadi. WWW 2017.

34
Pareto front
Observations:
• Minimizing the prediction error and the prediction un-fairness
involves solving a multi-objective optimization problem.
• In this type of problems, in general, there is no single optimal
solution that jointly minimizes the different objectives.
• Instead, there is a collection of optimal solutions called the
Pareto frontier.
The Pareto frontier: In the context of minimization, we say that
x Pareto dominates x′ if f_k(x) ≤ f_k(x′) ∀k, with at least one of
the inequalities being strict.
The Pareto frontier (Pareto set) X^p is then the subset of
non-dominated points of X, i.e., the set of points x^p ∈ X such
that no x ∈ X Pareto dominates x^p.
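For a finite set of candidate solutions, the frontier can be computed directly from the dominance definition (a naive O(n²) sketch; names and toy objective values are mine):

```python
import numpy as np

def pareto_front(F):
    """Return indices of non-dominated rows of F (minimisation of all columns).
    Row i is dominated if some row j is <= in every objective and < in one."""
    F = np.asarray(F, float)
    keep = []
    for i, fi in enumerate(F):
        dominated = any((fj <= fi).all() and (fj < fi).any()
                        for j, fj in enumerate(F) if j != i)
        if not dominated:
            keep.append(i)
    return keep

# (error, unfairness) trade-off: the middle point is dominated by the first
F = [[0.10, 0.30], [0.20, 0.35], [0.25, 0.05]]
print(pareto_front(F))  # [0, 2]
```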

35
Pareto front

The Pareto frontier is considered to be optimal because, for each
point in that set, one cannot improve one of the objectives
without deteriorating some other objective.
Optimization procedures: genetic/evolutionary algorithms (e.g. the
Non-dominated Sorting Genetic Algorithm (NSGA-II) and the
Strength Pareto Evolutionary Algorithm (SPEA-II)), BFGS, the
Convex-Concave Procedure.

36
Transparency

http://wisdom.tenner.org/

37
Transparency, trust, interpretability, explainability,
explanations
Transparency & trust
We want to design and implement machine learning systems that
are transparent so that users will be better equipped to
understand and therefore trust the machine learning systems.
Interpretability, explainability, explanations
Two complementary approaches to increase trust and transparency
of machine learning systems:
1. generating decisions in which one of the criteria taken into
account during the computation is how well a human could
understand the decisions in the given context
(interpretability or explainability)
2. explicitly explaining decisions to people (explanation)
Miller. Explanation in AI: Insights from the social sciences. arXiv 2017.

38
Quiz on interpretability

http://socrative.com and Room Name 73957fec.


Which one is a model with high interpretability?
A linear models
B decision tree models
C deep neural network models
D rule list models
E none of the above

39
EU’s General Data Protection Regulation (GDPR) creates a “right
to explanation” for users, whereby users will have the right to ask
for “an explanation of an automated algorithmic decision that was
made about them”
Caveat: research by Wachter et al. has revealed that “the
GDPR is likely to only grant individuals information about the
existence of automated decision-making and about system
functionality, but no explanation about the rationale of a
decision” (“right to be informed”).
UK’s House of Commons Science and Technology Committee. Report on Robotics
and AI. 2016

Wachter, Mittelstadt, Floridi. Why a right to explanation of automated


decision-making does not exist in the General Data Protection Regulation.
International Data Privacy Law 2016.

40
System explanation v. decision explanation

[Figure: a rejected loan applicant asks: “What! Rejected, please explain?”]

System functionality explanation: “We use this:” [diagram of the deployed model]
Decision explanation: You are rejected and not accepted because your race is Red.
In the training set, almost all people having accepted loan have their race as Blue.

41
How to provide explanations? A computer science view

• sensitivity analysis, gradient-based methods


• knowledge distillation/model compression
• investigation on hidden layers

Kim and Doshi-Velez. ICML tutorial on interpretable machine learning. 2017

42
Sensitivity analysis for providing explanations
Ribeiro et al. propose – Local Interpretable Model-agnostic
Explanations (LIME) – to generate explanations of any classifier by
approximating it using linear models with interpretable features.
• G is the class of linear models
• Ω(g) limits the number of features used
• πx(z) is a proximity measure between z and x (the locality)
• L(f, g, πx) = Σ_{(z, z′) ∈ Z} πx(z) (f(z) − g(z′))²
• explanation: argmin_{g ∈ G} L(f, g, πx) + Ω(g)

Ribeiro, Singh, Guestrin. “Why should I trust you?” Explaining the predictions of any
classifier. KDD 2016.
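The idea can be sketched end-to-end. This is a simplification of LIME, not the library's implementation: Ω(g) is dropped, perturbations are plain Gaussian noise around x, and f is a hypothetical black-box classifier:

```python
import numpy as np

def lime_explain(f, x, num_samples=500, width=1.0, seed=0):
    """Fit a locally weighted linear model g to the black box f around x."""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(scale=0.5, size=(num_samples, x.size))  # perturbations of x
    pi = np.exp(-((Z - x) ** 2).sum(axis=1) / width ** 2)      # proximity weights
    A = np.hstack([Z, np.ones((num_samples, 1))])              # intercept column
    sw = np.sqrt(pi)                                           # weighted least squares
    coef, *_ = np.linalg.lstsq(A * sw[:, None], f(Z) * sw, rcond=None)
    return coef[:-1]                                           # per-feature weights

# hypothetical black box: near x = 0, only feature 0 drives the prediction
f = lambda Z: 1.0 / (1.0 + np.exp(-3.0 * Z[:, 0]))
weights = lime_explain(f, np.zeros(3))
```

The fitted weights then serve as the explanation: here the weight on feature 0 dominates, matching how the black box actually behaves near x.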

43
Sensitivity analysis for providing explanations

Example of explanations from Ribeiro et al.’s LIME.

Ribeiro, Singh, Guestrin. KDD 2016.

https://www.oreilly.com/learning/
introduction-to-local-interpretable-model-agnostic-explanations-lime

44
Sensitivity analysis for providing explanations
Example of explanations from Ribeiro et al.’s LIME.

Ribeiro, Singh, Guestrin. KDD 2016.


https://www.oreilly.com/learning/
introduction-to-local-interpretable-model-agnostic-explanations-lime

45
Sensitivity analysis for providing explanations

Koh et al. use influence functions to identify the training points
most responsible for a given prediction. Influence functions
approximate the effect of removing/perturbing a training point and
retraining:

(1/N) · Inf((x, y), (x_test, y_test)) ≈ L((x_test, y_test); θ) − L((x_test, y_test); θ_{−(x, y)})

For a loss L(·) that is twice-differentiable and strictly convex
around θ, Inf(·) has a closed-form expression, although it does
not scale well.
Koh and Liang. Understanding black-box predictions via influence functions. ICML
2017.
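For small models, the quantity being approximated can be computed exactly by leave-one-out retraining; a sketch using ridge regression with squared loss as a stand-in for the paper's setting (names and regularizer are mine):

```python
import numpy as np

def loo_influence(X, y, x_test, y_test, lam=0.1):
    """Exact change in the test loss from dropping each training point:
    L(test; theta_minus_n) - L(test; theta). This is the quantity influence
    functions approximate without retraining (here: ridge regression)."""
    def fit(Xa, ya):
        d = Xa.shape[1]
        return np.linalg.solve(Xa.T @ Xa + lam * np.eye(d), Xa.T @ ya)

    def test_loss(theta):
        return float((x_test @ theta - y_test) ** 2)

    base = test_loss(fit(X, y))
    idx = np.arange(len(y))
    return np.array([test_loss(fit(X[idx != n], y[idx != n])) - base
                     for n in range(len(y))])
```

The most influential training point for a prediction is the one with the largest |influence|; the closed-form expression avoids the N retrains performed here.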

46
Sensitivity analysis for providing explanations

Koh and Liang. Understanding black-box predictions via influence functions. ICML
2017.

47
Gradient-based methods for providing explanations
Measure how much difference a tiny change in each pixel of x
would make to the classification score for class c: compute
∂Sc (x)/∂x, where Sc is the class activation function for class c.
Grad-CAM (Selvaraju et al. 2016)
SmoothGrad (Smilkov et al. 2017)

Kim and Doshi-Velez. ICML tutorial on interpretable machine learning. 2017


Smilkov, Thorat, Kim, et al. SmoothGrad: removing noise by adding noise. arXiv 2017.
Selvaraju, Cogswell, Das, et al. Visual explanations from deep networks via
gradient-based localization. CVPR 2016.
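As a toy illustration of the quantity ∂Sc(x)/∂x, here is a finite-difference stand-in with a hypothetical linear class-score function; real implementations obtain the same gradient in a single autodiff backward pass:

```python
import numpy as np

def saliency(score_fn, x, eps=1e-5):
    """Finite-difference estimate of dS_c(x)/dx, one coordinate at a time
    (autodiff frameworks compute this in a single backward pass)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (score_fn(x + e) - score_fn(x - e)) / (2 * eps)
    return g

# hypothetical class-score: class c only looks at inputs 0 and 2
W = np.array([1.0, 0.0, -2.0])
score = lambda x: W @ x
print(saliency(score, np.array([0.3, 0.7, 0.1])))  # ~[1, 0, -2]
```

The magnitude of each gradient entry is then rendered as a saliency map over the input pixels.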

48
Model compression/knowledge distillation for providing
explanations

Idea: Output probability predictions of large ensembles of
complex deep networks are used as training labels for much
smaller models.
The original goal of model compression/knowledge distillation was
model compactness, but it can also serve model interpretability.
Ba and Caruana. Do deep nets really need to be deep? NIPS 2014.

Hinton, Vinyals, Dean. Distilling the knowledge in a neural network. NIPS Workshop
2014.

49
Model compression/knowledge distillation for providing
explanations
Idea: Output probability predictions of large ensembles of
complex deep networks are used as training labels for much
smaller models.

Hendricks, et al. Generating visual explanations. ECCV 2016.

50
What is explanation? A social science view
Miller surveyed over 200 social science papers on explanation
and there are three important highlights about explanations:
1. Explanations are contrastive: people do not ask why event P
happened, but rather why event P happened instead of some
event Q;
2. Explanations are selected: people rarely, if ever, expect an
explanation that consists of an actual and complete cause of
an event;
3. Explanations are social: a transfer of knowledge, presented as
part of a conversation or interaction, and are thus presented
relative to the explainer’s beliefs about the explainee’s beliefs.
Miller. Explanation in AI: Insights from the social sciences. arXiv 2017.

51
What is explanation? A social science view

Explanations are not just the presentation of causes (causal


attribution).
An event may have many causes, often the explainee cares only
about a small subset (relevant to the contrast case), the explainer
selects a subset of this subset (based on several different criteria),
and explainer and explainee may interact and argue about this
explanation.
Miller. Explanation in AI: Insights from the social sciences. arXiv 2017.

52
How to evaluate explanations: social science view

Likelihood is not everything. While likely causes are part of good


explanations, they do not strongly correlate with explanations that
people find useful.
Three criteria that are at least equally important:
1. simplicity
2. generality
3. coherence
Miller. Explanation in AI: Insights from the social sciences. arXiv 2017.

53
Thank you for your attention

Also thanks to
• The team at the University of Sussex

• Funding sources

and UK/EU-based commercial/charity-based clients

54