URUGUAY
FRANCE
Agenda
- Motivations
- Quality evaluation problem
- Quality evaluation framework, illustrated for data freshness
- Conclusions
AMW2007 26/10/2007
Verónika Peralta
Motivations
Evaluation Problem
Proposed Framework
Conclusions
Data Quality
Context: querying multiple, distributed, autonomous and heterogeneous data sources.

- Where does the data come from?
- What is its degree of precision?
- How much confidence can I have in the data?
- Do I have all the pertinent answers?
- When was the data produced?

Example listings: "Chic apartment, sea terrace" / "Canasvieiras center, $5000" / "English beach, T4, garage"
Quality dimension categories and factors:
- Intrinsic: Precision, Correctness, Level of detail, Objectivity, Non-ambiguity, Factuality, Impartiality, Credibility, Confidence, Reputation
- Accessibility: System availability, Source availability, Transaction availability, Ease of use, Localization, Assistance, Security, Privileges
- Contextual: Pertinence, Relevance, Utility, Importance, Added value, Contents, Currency, Age, Volatility, Density, Coverage, Sufficiency, Volume, Data quantity
- Representational: Interpretability, Modifiability, Reasonableness, Traceability, Appearance, Presentation, Comprehension, Readability, Clearness, Signification, Provider, Comparability, Minimality, Uniqueness, Concise representation, Consistency, Format, Syntax, Alias, Semantics, Version control
Process-oriented approach:
- Analysis of activities and detection of critical paths
- System improvement (evolution, maintenance)

Both approaches may be combined, but in practice they are rarely considered together:
- Complexity of systems
- Cost vs. immediate needs
How to calculate the quality of the results? How to calculate the quality of intermediate results?

[Figure: example query over movie sources — "Preferences, Versailles", "Score > 4" — combining MovieLens, AlloCine and CineCritic, with freshness values at the sources and at intermediate nodes]
How can we bound the quality of the results? How can we obtain constraints for the sources?

[Figure: the same query graph with unknown freshness values at the sources (MovieLens, AlloCine, CineCritic, UGC) and at intermediate nodes]
Approach
Develop a framework for:
- Providing a formal basis for quality evaluation
- Analyzing quality factors and metrics
- Identifying DIS properties that influence quality factors
- Developing quality evaluation algorithms
Freshness
Several freshness definitions:
- Currency: distance between extraction time and delivery time
  - Example: bank balance
  - Example metric: time elapsed since data extraction; depends on the extraction frequency

[Figure: timeline showing extraction time and delivery time]
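The currency metric (time elapsed between extraction and delivery) can be sketched as follows; the function name and timestamps are illustrative, not from the thesis:

```python
from datetime import datetime, timedelta

def currency(extraction_time: datetime, delivery_time: datetime) -> timedelta:
    """Currency: time elapsed between data extraction at the source
    and data delivery to the user (illustrative helper)."""
    return delivery_time - extraction_time

# e.g. a bank balance extracted at 09:00 and delivered at 10:30
extracted = datetime(2007, 10, 26, 9, 0)
delivered = datetime(2007, 10, 26, 10, 30)
print(currency(extracted, delivered))  # 1:30:00
```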
Dimension 1: Nature of data

Data does not change at the same rhythm; the update frequency is a determining element for freshness evaluation:
- Stable data: e.g. city names, postal codes
- Long-term-changing data: e.g. customer addresses
- Frequently-changing data: e.g. stock, real-time info

Sometimes we can anticipate data freshness by correlating the current DB state with the data update cycle:
- Change events — e.g. marital status (married, divorced, …)
- Change frequency — e.g. cinema billboards (every Wednesday)
Dimension 2:
Architecture Types
Categories:
- Synchronous policies: pull-pull, push-push
- Asynchronous policies: pull/pull, pull/push, push/pull, push/push

[Figure: pull and push interactions between the DIS and the sources]
Reasoning support
Freshness evaluation is based on an abstract representation of DIS processes: the process graph.
- Nodes: sources, activities, queries
- Edges: synchronization (control), data flow

[Figure: example process graph — sources S1, S2, S3 feeding activities N1–N6, answering queries R1, R2, R3; activities labeled with cost, combine and delay properties]
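Such a process graph can be represented minimally as below; this is a sketch under my own naming, not the thesis' actual model:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    kind: str  # "source" | "activity" | "query"

@dataclass
class ProcessGraph:
    nodes: dict = field(default_factory=dict)
    # (src, dst) -> {"type": "control" | "data", "delay": float}
    edges: dict = field(default_factory=dict)

    def add_node(self, name, kind):
        self.nodes[name] = Node(name, kind)

    def add_edge(self, src, dst, type="data", delay=0.0):
        self.edges[(src, dst)] = {"type": type, "delay": delay}

    def predecessors(self, name):
        return [s for (s, d) in self.edges if d == name]

# the small example used later in the talk: S1, S2 -> N1, N2 -> N3 -> R1
g = ProcessGraph()
for n, k in [("S1", "source"), ("S2", "source"), ("N1", "activity"),
             ("N2", "activity"), ("N3", "activity"), ("R1", "query")]:
    g.add_node(n, k)
g.add_edge("S1", "N1")
g.add_edge("S2", "N2")
g.add_edge("N1", "N3", delay=10)
g.add_edge("N2", "N3", delay=45)
g.add_edge("N3", "R1")
print(g.predecessors("N3"))  # ['N1', 'N2']
```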
Quality graph: the process graph labeled with quality properties.

- Query nodes: freshness expected by the users
- Activity nodes: execution cost of the activity; combine function for several freshness values; decompose function for freshness constraints
- Control edges: delay between the execution of two activities
- Data edges: effective freshness produced by an activity; expected freshness for an activity
- Source nodes: source freshness

[Figure: example quality graph — sources S1 (SourceFreshness=0) and S2 (SourceFreshness=15); activities N1 (cost=45), N2 (cost=20), N3 (cost=30, combine=max, decompose=id); delays 10 and 45 on the edges into N3; query R1 (F-Expected=100, F-Effective=110)]
Forward propagation

Allows:
- Fixing freshness bounds
- Verifying that the graph conforms to user expectations

Mechanism:
- Propagate freshness values along the graph, in topological order
- Calculate the freshness produced by each node, combining the properties of the node and of its predecessors

[Figure: the example quality graph with effective freshness propagated from S1 (SourceFreshness=0) and S2 (SourceFreshness=15) through N1, N2 and N3 up to R1 (F-Effective=110), to be checked against ExpectedFreshness=100]
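The forward pass can be sketched on the small example graph from the slides (combine = max; cost and delay values as on the slide); the dict encoding and function names are mine, not the thesis':

```python
# Forward propagation sketch: effective freshness of a node =
# cost + combine over predecessors of
# (predecessor's effective freshness + delay on the incoming edge).

graph = {
    # node: (cost, [(predecessor, delay)])
    "N1": (45, [("S1", 0)]),
    "N2": (20, [("S2", 0)]),
    "N3": (30, [("N1", 10), ("N2", 45)]),
    "R1": (0, [("N3", 0)]),
}
source_freshness = {"S1": 0, "S2": 15}

def effective_freshness(node):
    if node in source_freshness:          # base case: a source
        return source_freshness[node]
    cost, preds = graph[node]
    # recursion visits predecessors first, i.e. topological order
    return cost + max(effective_freshness(p) + d for p, d in preds)

print(effective_freshness("R1"))  # 110
```

The result, 110, exceeds the expected freshness of 100, so the graph does not conform to the user's expectation.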
Backward propagation

Allows:
- Fixing freshness constraints for the sources
- Verifying that the graph conforms to source freshness

Mechanism:
- Propagate freshness values along the graph, in inverse topological order
- Calculate freshness constraints for each node, decomposing the freshness constraints of its successor nodes

[Figure: the example quality graph with expected freshness propagated down from R1 (ExpectedFreshness=100) through N3 (F-Expected=100), N1 (F-Expected=15) and N2 (F-Expected=5) to the sources S1 (SourceFreshness=0) and S2 (SourceFreshness=15)]
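The backward pass can be sketched on the same example graph (decompose = identity, so every predecessor receives the full remaining budget); again, encoding and names are mine:

```python
# Backward propagation sketch: the constraint on a predecessor =
# (constraint on the node - node cost) - delay on the incoming edge.

graph = {
    # node: (cost, [(predecessor, delay)])
    "N1": (45, [("S1", 0)]),
    "N2": (20, [("S2", 0)]),
    "N3": (30, [("N1", 10), ("N2", 45)]),
    "R1": (0, [("N3", 0)]),
}

def source_constraints(node, expected):
    """Return {source: max freshness allowed} given `expected` at `node`."""
    if node not in graph:            # reached a source
        return {node: expected}
    cost, preds = graph[node]
    out = {}
    for pred, delay in preds:
        # decompose = id: pass the full remaining budget to each predecessor
        out.update(source_constraints(pred, expected - cost - delay))
    return out

print(source_constraints("R1", 100))  # {'S1': 15, 'S2': 5}
```

S1 (freshness 0) satisfies its constraint of 15, but S2 (freshness 15) violates its constraint of 5: the graph does not conform to the source freshness.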
[Figure: the example quality graph labeled with both expected and effective freshness — R1: F-Expected=100 vs. F-Effective=110; N1: F-Expected=15 vs. F-Effective=0; N2: F-Expected=5 vs. F-Effective=15; N3 with combine=max, decompose=id; sources S1 (SourceFreshness=0) and S2 (SourceFreshness=15)]
Restructuring primitives:
- Add node / edge / labels
- Delete node / edge / labels

Restructuring macro-operations:
- Decompose node
- Parallelize node
- Fusion (merge) of nodes
- Replace node / sub-graph
- Possibility of defining new macro-operations
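The node-fusion macro-operation can be sketched on the adjacency-dict encoding used above for the example graph (hypothetical names, not the thesis' operators): merging two sequential activities yields one node whose cost is the sum of both and drops the inter-activity delay, a typical action on a critical path.

```python
def merge_nodes(graph, a, b, merged):
    """Merge activity `a` into its direct successor `b` as node `merged`."""
    cost_a, preds_a = graph[a]
    cost_b, preds_b = graph[b]
    # b keeps its other predecessors and inherits a's predecessors
    preds = [(p, d) for p, d in preds_b if p != a] + preds_a
    new = {n: v for n, v in graph.items() if n not in (a, b)}
    new[merged] = (cost_a + cost_b, preds)
    # redirect edges that pointed to b
    for n, (c, ps) in new.items():
        if n != merged:
            new[n] = (c, [(merged if p == b else p, d) for p, d in ps])
    return new

g = {
    "N1": (45, [("S1", 0)]),
    "N2": (20, [("S2", 0)]),
    "N3": (30, [("N1", 10), ("N2", 45)]),
    "R1": (0, [("N3", 0)]),
}
g2 = merge_nodes(g, "N1", "N3", "N13")
print(g2["N13"])  # (75, [('N2', 45), ('S1', 0)])
print(g2["R1"])   # (0, [('N13', 0)])
```

Removing the delay of 10 between N1 and N3 in this way lowers the effective freshness delivered at R1.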
- Execute the propagation algorithms
- Verify the conformity of the results to user expectations / source constraints
- For each non-conforming result:
  - Determine the critical paths
  - Analyze the critical paths
  - Propose restructuring actions
Conclusions
A framework for quality evaluation, analysis and improvement:
- Analysis of quality factors
- Quality graphs
- Parametric evaluation algorithms
- Analysis of critical paths
- Improvement actions
Thanks
PhD manuscript: http://www.prism.uvsq.fr/~vepe/pubs/2006/phd-vp.zip