# INTERACTIVE RANKING OF SKYLINES USING MACHINE LEARNING TECHNIQUES

Weiwei Cheng, Eyke Hüllermeier, Bernhard Seeger, Ilya Vladimirskiy
Department of Mathematics and Computer Science
Marburg University, Germany

Training Data: A set of revealed (pairwise) preferences $a \prec b$, turned into positive and negative examples for classification.

Monotonicity: $a \geq b \Rightarrow U(a) \geq U(b)$ must be guaranteed for all $a, b \in O$.
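The reduction from pairwise preferences to classification can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the linear utility $U(a) = \langle w, a \rangle$ stated on the poster, encodes each preference $a \prec b$ as the difference vector $b - a$ (positive example) and $a - b$ (negative example), and enforces monotonicity by clipping the weights at zero, which is an assumption made here. For a linear utility, $w \geq 0$ in every component is sufficient for $a \geq b \Rightarrow U(a) \geq U(b)$. The names `to_examples` and `train_monotone_perceptron` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
Preference = Tuple[Vector, Vector]  # (a, b) encodes a ≺ b: b is preferred to a


def utility(w: Sequence[float], a: Sequence[float]) -> float:
    """Linear utility U(a) = <w, a>."""
    return sum(wi * ai for wi, ai in zip(w, a))


def to_examples(prefs: Sequence[Preference]) -> List[Tuple[Vector, int]]:
    """Turn each preference a ≺ b into one positive and one negative example."""
    examples = []
    for a, b in prefs:
        diff = tuple(bi - ai for ai, bi in zip(a, b))
        examples.append((diff, +1))                     # <w, b - a> should be > 0
        examples.append((tuple(-x for x in diff), -1))  # <w, a - b> should be < 0
    return examples


def train_monotone_perceptron(examples, dim: int, epochs: int = 100) -> List[float]:
    """Perceptron on difference vectors; weights are clipped at 0 after each
    update (an assumption of this sketch) so that the utility stays monotone."""
    w = [0.0] * dim
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            if y * utility(w, x) <= 0:
                w = [max(0.0, wi + y * xi) for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # all examples are classified correctly
            break
    return w


if __name__ == "__main__":
    prefs = [((0.0, 1.0), (1.0, 0.0)), ((1.0, 0.0), (0.0, 3.0))]
    w = train_monotone_perceptron(to_examples(prefs), dim=2)
    print(w, [utility(w, a) < utility(w, b) for a, b in prefs])
```

Clipping is only one way to keep the weights non-negative; the projection step means the usual perceptron convergence argument does not carry over unchanged, so the loop is capped by `epochs`.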
Utility: Linear model $U(a) = \langle w, a \rangle = w_1 a_1 + \dots + w_d a_d$

[Figure: Monotone vs. non-monotone learning. Two panels plotting the Kendall tau coefficient (y-axis, 0 to 1) against the number of preferences (x-axis) for the monotone and the non-monotone learner. t-statistics at 5, 15, 25, 35, 45 preferences: 8.17, 7.59, 8.27, 5.09, 4.43 (first panel); 6.72, 8.31, 4.03, 4.30, 1.85 (second panel).]

Bayes Point Machine: [...]

The skyline operator maps a finite set $O$ of objects, each characterized in terms of a fixed number of features (criteria), to the subset of Pareto-optimal elements:

$$P(O) \stackrel{\mathrm{df}}{=} \{\, o \in O \mid \{ o' \in O \mid o \prec o' \} = \emptyset \,\}$$

[...] high dimensions!
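The skyline definition can be made concrete with a short sketch. It assumes that $o \prec o'$ denotes Pareto dominance with larger values being better on every criterion: $o'$ is at least as good as $o$ everywhere and strictly better somewhere. The function names are illustrative, and the quadratic scan is chosen for clarity rather than efficiency.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(b: Sequence[float], a: Sequence[float]) -> bool:
    """True if b Pareto-dominates a (a ≺ b): b is at least as good as a on
    every criterion and strictly better on at least one."""
    return (all(bi >= ai for ai, bi in zip(a, b))
            and any(bi > ai for ai, bi in zip(a, b)))


def skyline(objects: Sequence[Vector]) -> List[Vector]:
    """P(O): the objects that no other object in O dominates."""
    return [o for o in objects if not any(dominates(p, o) for p in objects)]


if __name__ == "__main__":
    O = [(1, 5), (3, 3), (2, 2), (5, 1), (1, 1)]
    print(skyline(O))  # (2, 2) and (1, 1) are dominated by (3, 3)
```

The objects that remain are mutually incomparable, so the skyline itself imposes no order on them.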

Ranking the skyline via a (latent) utility function:

– A utility function $U(\cdot)$ assigns a real utility degree to each object $a = (a_1, \dots, a_d) \in O$; $U(a) < U(b)$ means that the user strictly prefers $b$ to $a$.
– User feedback is used to improve ranking quality.

[...] version space.

[Figure: Ensemble (Bayes point machine) vs. single learner. Two panels plotting the Kendall tau coefficient (y-axis, 0.65 to 0.75) against the number of permutations (x-axis) for BPM and Non-BPM. t-statistics at 5, 10, ..., 50 permutations: 2.93, 3.84, 4.60, 4.33, 3.83, 5.32, 4.54, 4.59, 5.97, 5.21 (first panel); 2.27, 4.22, 5.55, 3.37, 3.85, 5.03, 2.79, 3.98, 4.92, 3.17 (second panel).]

[Figure: Active vs. non-active learning. Two panels (y-axis from 0.6 to 1) plotted against the number of preferences (x-axis: 10, 15, 20) for the active and the non-active learner. t-statistics at 15 and 20 preferences: 2.58, 2.93 (first panel); 4.78, 6.89 (second panel).]

Active Learning Strategy:

1. Constitute a committee of learners.
2. Find two maximally conflicting learners.
3. For each learner, generate a corresponding ranking.

[...] go to step 2.
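One round of the committee-based strategy might look like the sketch below. Several parts are assumptions rather than the poster's method: each committee member is represented by a weight vector of a linear utility, the conflict between two learners is measured by the Kendall tau coefficient of their rankings (the poster does not say how conflict is quantified), and `first_disagreement` is a hypothetical way of picking the object pair to show the user. The step between ranking generation and "go to step 2" did not survive in the source and is not reconstructed here.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def utility(w: Sequence[float], a: Sequence[float]) -> float:
    """Linear utility U(a) = <w, a>."""
    return sum(wi * ai for wi, ai in zip(w, a))


def ranking(w: Sequence[float], objects: Sequence[Sequence[float]]) -> List[int]:
    """Object indices sorted from highest to lowest utility under w."""
    return sorted(range(len(objects)), key=lambda i: -utility(w, objects[i]))


def kendall_tau(rank_a: Sequence[int], rank_b: Sequence[int]) -> float:
    """Kendall tau of two tie-free rankings of the same items:
    (concordant - discordant) / (number of item pairs)."""
    pos_a = {item: p for p, item in enumerate(rank_a)}
    pos_b = {item: p for p, item in enumerate(rank_b)}
    concordant = discordant = 0
    for i, j in combinations(rank_a, 2):
        if (pos_a[i] - pos_a[j]) * (pos_b[i] - pos_b[j]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


def most_conflicting(committee, objects) -> Tuple[int, int]:
    """Indices of the two committee members whose rankings agree least."""
    rankings = [ranking(w, objects) for w in committee]
    pairs = combinations(range(len(committee)), 2)
    return min(pairs, key=lambda p: kendall_tau(rankings[p[0]], rankings[p[1]]))


def first_disagreement(w1, w2, objects) -> Optional[Tuple[int, int]]:
    """First object pair (in w1's ranking order) that w2 orders the other way."""
    r1 = ranking(w1, objects)
    pos2 = {item: p for p, item in enumerate(ranking(w2, objects))}
    for i, j in combinations(r1, 2):  # w1 ranks i above j
        if pos2[i] > pos2[j]:
            return i, j
    return None


if __name__ == "__main__":
    objects = [(1, 5), (3, 3), (5, 1)]
    committee = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]
    k, l = most_conflicting(committee, objects)
    print((k, l), first_disagreement(committee[k], committee[l], objects))
```

In a full loop, the user's answer on the returned pair would be added to the training preferences, the committee retrained, and the search for conflicting learners repeated.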

Poster presented at KDML 2007, Workshop Knowledge Discovery, Data Mining, and Machine Learning, Halle/Saale, September 2007.