
INTERACTIVE RANKING OF SKYLINES USING MACHINE LEARNING TECHNIQUES


Weiwei Cheng, Eyke Hüllermeier, Bernhard Seeger, Ilya Vladimirskiy
Department of Mathematics and Computer Science
Marburg University, Germany

Ranking the Skyline

The skyline operator maps a finite set O of objects, each characterized in terms of a fixed number of features (criteria), to the subset of Pareto-optimal elements:

    P(O) =df { o ∈ O | { o′ ∈ O | o ≺ o′ } = ∅ }

Important problem: P(O) may become huge, especially in high dimensions!

Ranking the skyline via a (latent) utility function:

– A utility function U(·) assigns a real utility degree to each object a = (a1 . . . ad) ∈ O; U(a) < U(b) means that the user strictly prefers b to a.

– Utility degrees induce a total order; thus, a ranking can be presented instead of an unsorted answer set.

– User feedback is used to improve ranking quality.

Algorithm Design

Base Learner: Noise-tolerant perceptron with margin.

Training Data: A set of revealed (pairwise) preferences a ≺ b, turned into positive and negative examples for classification.

Monotonicity: a ≥ b ⇒ U(a) ≥ U(b) must be guaranteed for all a, b ∈ O.

Utility: Linear model U(a) = ⟨w, a⟩ = w1a1 + . . . + wdad (monotonicity holds if w ≥ 0), and a kernelized version.

Bayes Point Machine: Approximation of the Bayes point by the center of mass of the version space.

Active Learning Strategy:

1. Constitute a committee of learners.
2. Find two maximally conflicting learners.
3. For each learner, generate a corresponding ranking. Return the first discordant pair as a query. Add the answer to the preference set.
4. Retrain the committee on the enlarged preference set and go to step 2.

Experimental Results

Workflow: [figure omitted]

Monotone vs. non-monotone learning: [two panels; Kendall tau coefficient (0–1) vs. number of preferences (5–45), curves "Monotone" and "Non-monotone", with paired t-statistics t = 8.17, 7.59, 8.27, 5.09, 4.43 (left panel) and t = 6.72, 8.31, 4.03, 4.30, 1.85 (right panel)]

Ensemble (Bayes point machine) vs. single learner: [two panels; Kendall tau coefficient (0.65–0.75) vs. number of permutations (5–50), curves "BPM" and "Non-BPM", with t-statistics between 2.93 and 5.97 (left panel) and between 2.27 and 5.55 (right panel)]

Active vs. non-active learning: [two panels; Kendall tau coefficient (0.6–1) vs. number of preferences (10–20), curves "Active" and "Non-active", with t = 2.58, 2.93 (left panel) and t = 4.78, 6.89 (right panel)]
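As a concrete illustration of the skyline operator and the monotone linear utility model, the following Python sketch computes P(O) by pairwise dominance tests and fits a margin perceptron to difference vectors b − a of revealed preferences a ≺ b, clipping the weights at zero after each update so that w ≥ 0 (and hence monotonicity) holds. All names (`dominates`, `skyline`, `perceptron_rank`) and the toy data are illustrative assumptions, not the authors' implementation; larger feature values are assumed to be better.

```python
import numpy as np

def dominates(a, b):
    # a dominates b iff a is at least as good in every criterion and
    # strictly better in at least one (larger values assumed better)
    return bool(np.all(a >= b) and np.any(a > b))

def skyline(objects):
    # P(O): the objects not dominated by any other object
    return [o for o in objects
            if not any(dominates(p, o) for p in objects if p is not o)]

def perceptron_rank(prefs, d, epochs=100, margin=0.1):
    # Learn a linear utility U(a) = <w, a> from preferences (a, b),
    # read as "b is strictly preferred to a": each pair yields the
    # classification example x = b - a, which should score above the
    # margin.  Clipping w at 0 after each update keeps the model monotone.
    w = np.zeros(d)
    for _ in range(epochs):
        for a, b in prefs:
            x = b - a
            if w @ x < margin:          # margin violated: perceptron update
                w += x
                w = np.maximum(w, 0.0)  # monotone variant: enforce w >= 0
    return w

# toy data: 2-d objects with hidden utility 0.7*a1 + 0.3*a2
rng = np.random.default_rng(0)
objects = list(rng.random((50, 2)))
true_u = lambda o: o @ np.array([0.7, 0.3])

# reveal a few pairwise preferences a ≺ b (b has the higher hidden utility)
pairs = rng.integers(0, 50, size=(30, 2))
prefs = [(objects[i], objects[j]) if true_u(objects[i]) < true_u(objects[j])
         else (objects[j], objects[i])
         for i, j in pairs if true_u(objects[i]) != true_u(objects[j])]

sky = skyline(objects)                   # Pareto-optimal subset P(O)
w = perceptron_rank(prefs, d=2)          # learned monotone utility
ranked = sorted(sky, key=lambda o: o @ w, reverse=True)  # ranked skyline
```

The learned w then induces a total order on the skyline, so the answer can be presented as the sorted list `ranked` instead of an unsorted set.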

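The four-step active learning strategy can be sketched in a similarly self-contained way. In this toy version the committee members are plain margin perceptrons that differ only in the random order in which they process the preference set, the two "maximally conflicting" members are taken to be those whose rankings disagree on the most pairs, and their first discordant pair becomes the next query. All names (`train_perceptron`, `discordant_pairs`, `active_round`) and the simulated oracle are assumptions for illustration, not the authors' code.

```python
import numpy as np
from itertools import combinations

def train_perceptron(prefs, d, seed, epochs=50, margin=0.1):
    # margin perceptron over difference vectors b - a; the random
    # presentation order (seed) makes committee members differ
    w = np.zeros(d)
    rng = np.random.default_rng(seed)
    prefs = list(prefs)
    for _ in range(epochs):
        for k in rng.permutation(len(prefs)):
            a, b = prefs[k]
            x = b - a
            if w @ x < margin:
                w += x
    return w

def discordant_pairs(rank1, rank2):
    # object pairs (i, j) that the two rankings order differently
    pos2 = {o: k for k, o in enumerate(rank2)}
    return [(i, j) for k, i in enumerate(rank1)
            for j in rank1[k + 1:] if pos2[i] > pos2[j]]

def active_round(O, prefs, d, committee_size=5):
    # step 1: constitute a committee of learners
    committee = [train_perceptron(prefs, d, seed=s) for s in range(committee_size)]
    rankings = [sorted(range(len(O)), key=lambda i: O[i] @ w, reverse=True)
                for w in committee]
    # step 2: find the two maximally conflicting learners
    r1, r2 = max(combinations(rankings, 2),
                 key=lambda rr: len(discordant_pairs(*rr)))
    dps = discordant_pairs(r1, r2)
    # step 3: return the first discordant pair as the next query
    return dps[0] if dps else None

rng = np.random.default_rng(1)
O = rng.random((20, 3))
true_w = np.array([0.5, 0.3, 0.2])   # hidden utility of the simulated user
prefs = []
for _ in range(5):                   # a few seed preferences a ≺ b
    i, j = rng.integers(0, 20, 2)
    if O[i] @ true_w < O[j] @ true_w:
        prefs.append((O[i], O[j]))

query = active_round(O, prefs, d=3)
if query is not None:                # step 4: oracle answers, set grows,
    i, j = query                     # and the committee is retrained next round
    a, b = (O[i], O[j]) if O[i] @ true_w < O[j] @ true_w else (O[j], O[i])
    prefs.append((a, b))
```

Repeating `active_round` after each answer implements the loop of step 4; querying where the committee disagrees most is what makes the strategy sample-efficient compared with asking about random pairs.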
Poster presented at KDML 2007, Workshop Knowledge Discovery, Data Mining, and Machine Learning, Halle/Saale, September 2007.