MSR Mlfeedback

End-User Debugging of Machine Learning Systems
Weng-Keen Wong Oregon State University School of Electrical Engineering and Computer Science http://www.eecs.oregonstate.edu/~wong
Collaborators
Faculty: Margaret Burnett, Simone Stumpf, Tom Dietterich, Jon Herlocker
Grad Students: Erin Fitzhenry, Lida Li, Ian Oberst, Vidya Rajaram
Undergrads: Russell Drummond, Erin Sullivan
Papers
Stumpf, S., Rajaram, V., Li, L., Burnett, M., Dietterich, T., Sullivan, E., Drummond, R., Herlocker, J. (2007). Toward Harnessing User Feedback For Machine Learning. In Proceedings of IUI 2007.
Stumpf, S., Rajaram, V., Li, L., Wong, W.-K., Burnett, M., Dietterich, T., Sullivan, E., Herlocker, J. (2008). Interacting Meaningfully with Machine Learning Systems: Three Experiments. (Submitted to IJHCS)
Stumpf, S., Sullivan, E., Fitzhenry, E., Oberst, I., Wong, W.-K., Burnett, M. (2008). Integrating Rich User Feedback into Intelligent User Interfaces. In Proceedings of IUI 2008.
Motivation
Date: Mon, 28 Apr 2008 23:59:00 (PST)
From: John Doe <john.doe@onid.orst.edu>
To: Weng-Keen Wong <wong@eecs.oregonstate.edu>
Subject: CS 162 Assignment
I can't get my Java assignment to work! It just won't compile and it prints out lots of error messages! Please help!
public class MyFrame extends JFrame {
    private AsciiFrameManager reader;
    private JPanel displayPanel;

    public MyFrame(String filename) throws Exception {
        reader = new AsciiFrameManager(filename);
        displayPanel = new JPanel();
        ...
CS 162
John Doe
Trash
Machine learning tool adapts to end user
Similar situation in recommender systems, smart desktops, etc.
Motivation
Date: Mon, 28 Apr 2008 23:51:00 (PST)
From: Bella Bose <bose@eecs.oregonstate.edu>
To: Weng-Keen Wong <wong@eecs.oregonstate.edu>
Subject: Teaching Assignments
I've compiled the teaching preferences for all the faculty. Here are the teaching assignments for next year:
Fall Quarter
CS 160 (Computer Science Orientation) Paul Paulson
CS 161 (Introduction to Programming I) Chris Wallace
CS 162 (Introduction to Programming II) Weng-Keen Wong
...
Trash
Machine learning systems are great when they work correctly, aggravating when they don't
The end user is the only person at the computer

Can we let end users correct machine learning systems?
Motivation
Learn to correct behavior quickly

Sparse data on start
Concept drift
Effects of user feedback on accuracy?
Effects on users?
Rich end-user knowledge

Overview
End-User
Explanation
End user feedback
Machine Learning Algorithm
Related Work
Explanation
  Expert Systems (Swartout 83, Wick and Thompson 92)
  TREPAN (Craven and Shavlik 95)
  Description Logics (McGuinness 96)
  Bayesian networks (LaCave and Diez 00)
  Additive classifiers (Poulin et al. 06)
End user interaction
  Active Learning (Cohn et al. 96, many others)
  Constraints (Altendorf et al. 05, Huang and Mitchell 06)
  Ranks (Radlinski and Joachims 05)
  Feature Selection (Raghavan et al. 06)
  Crayons (Fails and Olsen 03)
  Programming by Demonstration (Cypher 93, Lau and Weld 99, Lieberman 01)
  Others (Crawford et al. 02, Herlocker et al. 00)
Outline
1. What types of explanations do end users understand? What types of corrective feedback could end users provide? (IUI 2007)
2. How do we incorporate this feedback into an ML algorithm? (IJHCS 2008)
3. What happens when we put this together? (IUI 2008)
What Types of Explanations do End Users Understand?
Think-aloud study with 13 participants
Classify Enron emails
Explanation systems: rule-based, keyword-based, similarity-based
Findings:
  Rule-based best but not a clear winner
  Evidence indicates multiple explanation paradigms needed
What types of corrective feedback could end users provide?
Suggested corrective feedback in response to explanations:
1. Adjust importance of word
2. Add/remove word from consideration
3. Parse / extract text in a different way
4. Word combinations
5. Relationships between messages/people
Outline
1. What types of explanations do end users understand? What types of corrective feedback could end users provide? (IUI 2007)
2. How do we incorporate this feedback into an ML algorithm? (IJHCS 2008)
3. What happens when we put this together? (IUI 2008)
Incorporating Feedback into ML Algorithms
Two approaches:
  Constraint-based
  User co-training
Constraint-based approach
Constraints:
1. If weight on word reduced or word removed, remove the word as a feature
2. If weight of word increased, word assumed to be important for that folder: $P(x_j = 1 \mid Y = y_k) > P(x_j = 1 \mid Y \neq y_k)$
3. If weight of word increased, word is a better predictor for that folder than other words: $P(Y = y_k \mid x_j = 1) > P(Y = y_k \mid x_k = 1)$
Estimate parameters for Naive Bayes using MLE with these constraints
Standard Co-training
Create classifiers C1 and C2 based on the two independent feature sets.
Repeat i times
  Add most confidently classified messages by any classifier to training data
  Rebuild C1 and C2 with the new training data
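The loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the experiments: the toy nearest-centroid classifier `fit_centroids`, its confidence measure (the gap between the two nearest centroids), and the names `co_train`, `rounds`, and `per_round` are all hypothetical stand-ins for whatever classifiers and settings a real system would use.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Predictor = Callable[[Vector], Tuple[int, float]]


def fit_centroids(X: List[Vector], y: List[int]) -> Predictor:
    """Toy nearest-centroid classifier (hypothetical stand-in for C1/C2)."""
    centroids: Dict[int, List[float]] = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> Tuple[int, float]:
        dists = {
            lab: sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
            for lab, c in centroids.items()
        }
        ordered = sorted(dists, key=dists.get)
        best = ordered[0]
        # Assumed confidence: gap between the two nearest centroids.
        conf = dists[ordered[1]] - dists[best] if len(ordered) > 1 else 0.0
        return best, conf

    return predict


def co_train(
    view1: List[Vector],
    view2: List[Vector],
    labels: List[int],
    unlabeled1: List[Vector],
    unlabeled2: List[Vector],
    rounds: int,
    per_round: int,
) -> Tuple[Predictor, Predictor, List[int]]:
    """Grow the training set with the most confidently classified messages."""
    train1, train2, y = list(view1), list(view2), list(labels)
    pool = list(zip(unlabeled1, unlabeled2))
    for _ in range(rounds):
        if not pool:
            break
        c1, c2 = fit_centroids(train1, y), fit_centroids(train2, y)
        # For each unlabeled message keep the more confident of the two predictions.
        scored = [
            (max(c1(a), c2(b), key=lambda p: p[1]), i)
            for i, (a, b) in enumerate(pool)
        ]
        scored.sort(key=lambda t: t[0][1], reverse=True)
        chosen = scored[:per_round]
        for (label, _conf), i in chosen:
            train1.append(pool[i][0])
            train2.append(pool[i][1])
            y.append(label)
        taken = {i for _, i in chosen}
        pool = [item for i, item in enumerate(pool) if i not in taken]
    # Final rebuild of both classifiers on the enlarged training set.
    return fit_centroids(train1, y), fit_centroids(train2, y), y


if __name__ == "__main__":
    c1, c2, y = co_train(
        [[0.0], [1.0]], [[0.0], [1.0]], [0, 1],
        [[0.1], [0.9]], [[0.2], [0.8]],
        rounds=2, per_round=1,
    )
    print(len(y), c1([0.05])[0], c2([0.95])[0])
```

Each message is represented twice, once per feature set, so either classifier can contribute a label; the choice of `per_round` controls how aggressively self-labeled messages enter the training data.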
User Co-training
CUSER = Classifier based on user feedback
CML = Machine learning algorithm
For each session of user feedback
  Add most confidently classified messages by CUSER to training data
  Rebuild CML with the new training data
We'll expand the inner loop on the next slide
User Co-training
For each folder f, let vector v_f = words with weights increased by the user
For each message m in the unlabeled set
  For each folder f,
    Compute Prob_f from the machine learning classifier
    Score_f = (# of words in v_f appearing in the message) * Prob_f
  $f_{\max} = \arg\max_{f \in \mathrm{Folders}} \mathrm{Score}_f$
  $\mathrm{Score}_{\mathrm{other}} = \max_{f \in \mathrm{Folders} \setminus \{f_{\max}\}} \mathrm{Score}_f$
  $\mathrm{Score}_m = \mathrm{Score}_{f_{\max}} - \mathrm{Score}_{\mathrm{other}}$
Sort Score_m for all messages in decreasing order
Select the top k messages to add to the training set along with their folder label f_max
Rebuild CML with the new training data
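The selection step on this slide can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that the message score is the margin between the best folder score and the best score among the remaining folders, and that "# of words" counts distinct boosted words present in the message. The function name `select_messages`, its parameters, and the "Faculty" folder are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def select_messages(
    messages: List[List[str]],
    folder_probs: List[Dict[str, float]],
    boosted_words: Dict[str, Set[str]],
    k: int,
) -> List[Tuple[int, str, float]]:
    """Return (message index, folder label f_max, margin) for the top-k messages.

    folder_probs[i][f] is Prob_f for message i, as produced by the machine
    learning classifier; boosted_words[f] is v_f, the words whose weight the
    user increased for folder f.
    """
    scored = []
    for idx, (words, probs) in enumerate(zip(messages, folder_probs)):
        present = set(words)
        # Score_f = (# of words in v_f appearing in the message) * Prob_f
        scores = {
            f: len(boosted_words.get(f, set()) & present) * p
            for f, p in probs.items()
        }
        f_max = max(scores, key=scores.get)
        score_other = max(
            (s for f, s in scores.items() if f != f_max), default=0.0
        )
        # Assumed margin: Score_m = Score_fmax - Score_other
        scored.append((idx, f_max, scores[f_max] - score_other))
    scored.sort(key=lambda t: t[2], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    msgs = [
        ["java", "compile", "error"],
        ["teaching", "fall", "quarter"],
        ["lunch", "today"],
    ]
    probs = [
        {"CS 162": 0.6, "Faculty": 0.4},
        {"CS 162": 0.3, "Faculty": 0.7},
        {"CS 162": 0.5, "Faculty": 0.5},
    ]
    boosted = {"CS 162": {"java", "compile"}, "Faculty": {"teaching"}}
    for idx, folder, margin in select_messages(msgs, probs, boosted, k=2):
        print(idx, folder, round(margin, 3))
```

The returned messages and folder labels would then be appended to the training set before the machine learning classifier is rebuilt. Messages containing no boosted words get a margin of zero and sort last.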
Constraint-based vs User co-training
Constraint-based
  Difficult to set "hardness" of constraint
  Constraints often already satisfied
  End-user can over-constrain the learning algorithm
  Slow
User co-training
  Requires unlabeled emails in inbox
  Better accuracy than constraint-based
Results
Feedback from keyword-based paradigm
Feedback from similarity-based paradigm
Outline
1. What types of explanations work for end users? What types of corrective feedback could end users provide? (IUI 2007)
2. How do we incorporate this feedback into an ML algorithm? (IJHCS 2008)
3. What happens when we put this together? (IUI 2008)
Experiment: Email program
Experiment: Procedure
Intelligent email system to classify emails into folders

43 English-speaking, non-CS students
Background questionnaire
Tutorial (email program and folders)
Experiment task on feedback set
  Correct folders. Add, remove, change weight on keywords.
  30 interaction logs
Post-session questionnaire
Experiment: Data
Enron data set
  9 folders
  50 training messages
    10 each for 5 folders with folder labels
  50 feedback messages
    For use in experiment
    Same for each participant
  1051 test messages
    For evaluation after experiment
Experiment: Classification algorithm
User co-training
  Two classifiers: User, Naive Bayes
  Slight modification on user classifier
    Score_f = sum of weights in v_f appearing in the message
    Weights can be modified interactively by user
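The modified user-classifier score can be illustrated with a short sketch. It assumes each folder keeps a mapping from keyword to a user-adjustable weight, and that a keyword contributes its weight once if it appears anywhere in the message. The slide does not say whether this sum is still multiplied by the classifier probability, so that step is left out here. The names `weighted_folder_score` and `keyword_weights` are hypothetical.

```python
from typing import Dict, Iterable


def weighted_folder_score(
    words: Iterable[str], keyword_weights: Dict[str, float]
) -> float:
    """Score_f = sum of the weights of the keywords in v_f found in the message."""
    present = set(words)
    return sum(w for kw, w in keyword_weights.items() if kw in present)


if __name__ == "__main__":
    # Hypothetical keyword weights for one folder.
    weights = {"java": 2.0, "compile": 0.5, "grades": 1.0}
    message = ["java", "compile", "error", "java"]
    print(weighted_folder_score(message, weights))

    # The user interactively raises the weight of "compile".
    weights["compile"] = 1.5
    print(weighted_folder_score(message, weights))
```

Because the weights live in a plain mapping, a change made in the interface takes effect the next time the score is computed, with no retraining of the user classifier.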
Results: Accuracy improvements of rich feedback
[Figure: accuracy over folder feedback, by subject; vertical axis from -60% to 60%; legend: change over baseline, change over folder feedback]
Rich feedback: participant folder labels and keyword changes
Folder feedback: participant folder labels
Results: Accuracy improvements of rich feedback
[Figure: accuracy over baseline, by subject; vertical axis from -60% to 60%; legend: change over baseline, change over folder feedback]
Rich feedback: participant folder labels and keyword changes
Baseline: original Enron labels
Results: Accuracy summary
60% of participants saw accuracy improvements, some very substantial
Some dramatic decreases
More time between filing emails or more folder assignments were associated with higher accuracy
Interesting bits
1. Need to communicate the effects of the user's corrective feedback
2. Unstable classifier period
   With sparse training data, a single new training example can dramatically change the classifier's decision boundaries
   Wild fluctuations in the classifier's predictions frustrate end users
   Causes "wall of red"
Interesting bits: Unstable classifier period

[Figure: accuracy (0 to 0.7) versus number of training data points (0 to 350)]
Moved test emails into training set to look for effect on accuracy (Baseline, participant 101)
Interesting bits
3. Unlearning important, especially to correct undesirable changes
4. Gender differences
   Females took longer to complete
   Females added twice as many keywords
   Females commented more on unlearning
Interesting directions for HCI

1. Gender differences
2. More directed debugging
3. Other forms of feedback
4. Communicating effects of corrective feedback
   Users need to detect the system is listening to their feedback
5. Explanations
   Form
   Fidelity
Interesting directions for Machine Learning

1. Algorithms for learning from corrective feedback
2. Modeling reliability of user feedback
3. Explanations
4. Incorporating new features
Future work
ML Whyline (with Andy Ko)
For more information

wong@eecs.oregonstate.edu www.eecs.oregonstate.edu/~wong
