Digital Enterprise Research Institute www.deri.ie
Introduction to Peer-to-Peer Networks
Manfred Hauswirth, Marcel Karnstedt
Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Goals of the Tutorial
Position the P2P paradigm in the design space of distributed systems
Get an overview of P2P systems and the underlying concepts
Understand the problem of decentralized data management in P2P systems

What is P2P?
Clay Shirky (The Accelerator Group):
  "Peer-to-peer is a class of applications that take advantage of resources (storage, cycles, content, human presence) available at the edges of the Internet. Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, peer-to-peer nodes must operate outside the DNS and have significant or total autonomy from central servers."
P2P "litmus test":
  Does it allow for variable connectivity and temporary network addresses?
  Does it give the nodes at the edges of the network significant autonomy?
P2P ~ an application-level Internet on top of the Internet

P2P in a Historical Context
The original Internet was designed as a P2P system
  any 2 computers could send packets to each other
    no firewalls / no network address translation
    no asymmetric connections (V.90, ADSL, cable, etc.)
  the killer apps of the time, FTP and telnet, are C/S, but
    anyone could telnet/FTP anyone else
    servers acted as clients and vice versa
  cooperation was a central goal and value: no spam or excessive bandwidth consumption
Typical examples of "old-fashioned" P2P:
  Usenet News
  DNS
The emergence of P2P can be seen as a renaissance of the original Internet model

What is P2P?
Every participating node acts as both a client and a server ("servent")
Every node "pays" its participation by providing access to (some of) its resources
Properties:
  no central coordination
  no central database
  no peer has a global view of the system
  global behavior emerges from local interactions
  all existing data and services are accessible from any peer
  peers are autonomous
  peers and connections are unreliable

Where is P2P? System Layers
User layer: commerce and society are P2P
Application layer: e-commerce systems can be P2P or centralized
Information management layer: directories and databases can be P2P or centralized
Network layer: networks are P2P (Internet)
[Figure: layer stack Users, Application, Information Management, Network. Users use the application, and each lower layer is exploited by the one above it, with QoS marked at every interface.]

Types of P2P Systems
E-commerce systems
  eBay, B2B marketplaces, B2B integration servers, ...
File sharing systems
  Napster, Gnutella, Freenet, ...
Distributed databases
  Mariposa [Stonebraker96], ...
Networks
  Arpanet
  Mobile ad-hoc networks

How much P2P is involved?
System              P2P information management   P2P application   P2P user interaction
eBay                no                           no                yes
Napster             no                           yes               yes
Gnutella, Freenet   yes                          yes               yes

Related Approaches
Related distributed information system approaches:
  Event-based systems
  Push systems
  Mobile agents
  Distributed databases

Event-based (publish/subscribe)
System model
  Components (peers) interact by generating and receiving events
  Components declare interest in receiving specific (patterns of) events and are notified upon their occurrence
  Supports a highly flexible interaction between loosely-coupled components
[Figure: a component subscribes to the event pattern XY (X followed by Y); events X and Y are generated by other components.]
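
The subscription model on this slide can be illustrated with a small in-process sketch. It assumes a single broker object and only two kinds of subscription (one event type, or "first followed by second"); the class `EventBroker` and its method names are hypothetical and not taken from any particular event-based system.

```python
from collections import defaultdict


class EventBroker:
    """Toy broker: components subscribe to one event type or to a
    two-event pattern ("first" followed, possibly later, by "second")."""

    def __init__(self):
        self._simple = defaultdict(list)   # event type -> callbacks
        self._sequences = []               # [first, second, callback, armed]

    def subscribe(self, event_type, callback):
        self._simple[event_type].append(callback)

    def subscribe_sequence(self, first, second, callback):
        self._sequences.append([first, second, callback, False])

    def publish(self, event_type):
        # Notify plain subscribers of this event type.
        for callback in self._simple[event_type]:
            callback(event_type)
        # Advance every pattern subscription by one event.
        for entry in self._sequences:
            first, second, callback, armed = entry
            if armed and event_type == second:
                entry[3] = False            # pattern completed, reset
                callback(first + second)
            elif event_type == first:
                entry[3] = True             # first half of the pattern seen


broker = EventBroker()
notifications = []
broker.subscribe_sequence("X", "Y", notifications.append)
broker.publish("Y")   # ignored: no X has been seen yet
broker.publish("X")
broker.publish("Y")   # completes the pattern "X followed by Y"
```

The subscriber never polls. It is notified only when the pattern occurs, which is the "passive query" behaviour that the following slides contrast with active discovery in P2P systems.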

Event-based vs. Peer-to-Peer
Common properties:
  symmetric communication style
  dynamic binding between producers and consumers
Subscription to events ~ "passive" queries
  EB: notification
  P2P: active discovery
  Subscription language supports more sophisticated queries and pattern matching (event patterns with time dependencies)
Event-based systems typically have a specialized event distribution infrastructure
  EB: 2 node types, P2P: 1 node type
  EB infrastructure must be deployed

Push Systems
A set of designated broadcasters offer information that is pre-grouped in channels (weather, news, etc.)
Receivers subscribe to channels of their interest and receive channel information as it is being broadcast (timely distribution)
Receivers may have to pay prior to receiving the information (pay-per-view, flat fee, etc.)
[Figure: pull vs. push]

Push Systems vs. Peer-to-Peer
Asymmetric communication style (P2P: symmetric)
Focus is on timely data distribution, not on discovery
  Filtering may be deployed to reduce data transmission requirements
  Subscription to channels is prerequisite
  Producer/consumer binding is static
Push systems require a specialized distribution infrastructure
  Push: 3 node types, P2P: 1 node type
  Push infrastructure must be deployed

Mobile Agents
A mobile agent is a computational entity that moves around in a network at its own volition to accomplish a task on behalf of its owner
  can cooperate with other agents
  "learns" (Whom to visit next?)
Mobility (heterogeneous network!)
  Weak: code, data
  Strong: code, data, execution stack

Mobile Agents vs. Peer-to-Peer
Very similar in terms of search and navigation
  P2P: the peers propagate requests (search, update)
  MA: the nodes propagate the agents
  Mobile agent ~ "active" query
Mobile agent systems require a considerably more sophisticated environment
  mobile code support (heavy)
  security (protect the receiving node from malicious mobile agents and vice versa)
In many domains P2P systems can "take over"
  more apt for distributed data management
  fewer requirements (sending code requires much bandwidth, security, etc.)

Distributed Databases
Fragmenting large databases (e.g., relational) over physically distributed nodes
Efficient processing of complex queries (e.g., SQL) by decomposing them
Efficient update strategies (e.g., lazy vs. eager)
Consistent transactions (e.g., 2-phase commit)
Normally, approaches rely on central coordination

Distributed Databases vs. Peer-to-Peer
Data distribution is a key issue for P2P systems
Approaches in distributed DB that address scalability
  LH* family of scalable hash index structures [Litwin97]
  Snowball: scalable storage system for workstation clusters [Vingralek98]
  Fat-Btree: a scalable B-Tree for parallel DB [Yokota 9]
Approaches in distributed DB that address autonomy (and scalability)
  Mariposa: distributed relational DBMS based on an underlying economic model [Stonebraker96]
P2P data management has to address both scalability and autonomy

Usage Patterns to Position P2P
Discovering information is the predominant problem
Occasional discovery: search engines (side label: P2P, MA)
  ad hoc requests, irregular
  E.g., new town: where is the next car rental?
Notification: event-based systems (side label: push)
  notification for (correlated) events (event patterns)
  E.g., notify me when my stocks drop below a threshold
Systematic discovery: P2P systems (side label: search engines, MA)
  find certain type of information on a regular basis
  E.g., search for MP3 files of Jethro Tull regularly
Continuous information feed: push systems (side label: event-based)
  subscription to a certain information type
  E.g., sports channel, updates are sent as soon as available

The Interaction Spectrum
[Figure: interaction spectrum from passive to active, showing event-based systems, push systems, peer-to-peer systems, and mobile agents.]

Peer-to-Peer vs. C/S and Web
                    Client-server,     Client-server,      Peer-to-peer-
                    session-based      web-based           based
Coupling            tight              loose               very loose
Comm. style         asymmetric         asymmetric          symmetric
Number of clients   moderate (1000)    high (1,000,000)    high (1,000,000)
Number of servers   few (10)           many (100,000)      none (0)

Coupling vs. Scalability
[Figure: coupling (vertical axis) plotted against scalability (horizontal axis) for session-based, push-based, web-based, event-based, and peer-to-peer systems.]

P2P System Models
Centralized model
  global index held by a central authority (single point of failure)
  direct contact between requestors and providers
  Example: Napster
Decentralized model
  Examples: Freenet, Gnutella
  no global index, no central coordination, global behavior emerges from local interactions, etc.
  direct contact between requestors and providers (Gnutella) or mediated by a chain of intermediaries (Freenet)
Hierarchical model
  introduction of "super-peers"
  mix of centralized and decentralized model

Centralized Information Systems
Web search engine
  Global scale application
  Example: Google
    150 million searches/day
    1-2 terabytes of data (April 2001)
Strengths
  Global ranking
  Fast response time
Weaknesses
  Infrastructure, administration, cost
  A new company for every global application?
[Figure: many clients around a central Google server (Google: 15,000 servers). (1) A client sends Find "aberer"; (2) the server returns the result "home page of Karl Aberer".]

(Semi-)Decentralized Information Systems
P2P music file sharing
  Global scale application
  Example: Napster
    1.57 million users
    10 terabytes of data (2 million songs, 220 songs per user) (February 2001)
[Figure: peers around a central Napster server (Napster: 100 servers). (1) A peer sends Find using the schema <title> "brick in the wall", <artist> "pink floyd", <size> "1 MB", <category> "rock"; (2) the server returns the result "you find f.mp3 at peer X"; (3) the peer requests and transfers f.mp3 from peer X directly.]

Lessons Learned from Napster
Strengths: Resource Sharing
  Every node "pays" its participation by providing access to its resources
    physical resources (disk, network), knowledge (annotations), ownership (files)
  Every participating node acts as both a client and a server ("servent"): P2P
    global information system without huge investment
    decentralization of cost and administration = avoiding resource bottlenecks
Weaknesses: Centralization
  server is single point of failure
  unique entity required for controlling the system = design bottleneck
  copying copyrighted material made Napster target of legal attack
[Figure: arrow from "Centralized System" to "Decentralized System", labeled "increasing degree of resource sharing and decentralization".]

Fully Decentralized Information Systems
P2P file sharing
  Global scale application
  Example: Gnutella
    40,000 nodes, 3 million files (August 2000)
Strengths
  Good response time, scalable
  No infrastructure, no administration
  No single point of failure
Weaknesses
  High network traffic
  No structured search
  Free-riding
Self-organizing system
[Figure: Gnutella network (Gnutella: no servers). One peer sends Find "brick in the wall"; another peer answers I have "brick_in_the_wall.mp3".]

Self-Organization
Self-organized systems are well known from physics, biology, cybernetics
  distribution of control (= decentralization = symmetry in roles = P2P)
  local interactions, information and decisions
  emergence of global structures
  failure resilience

P2P Architectures
Principle of self-organization can be applied at different system layers

Networking Layer    Internet            Routing                             TCP/IP, DNS
Data Access Layer   Overlay networks    Resource location                   Gnutella, Freenet
Service Layer       P2P applications    Messaging, distributed processing   Napster, Seti, Groove
User Layer          User communities    Collaboration                       eBay, Ciao

Original Internet designed as decentralized system:
  P2P overlay networks ~ application-level Internet on top of the Internet
  support application-specific addresses

Resource Location in P2P Systems
Problem: Peers need to locate distributed information
  Peers with address p store data items d that are identified by a key k_d
  Given a key k_d (or a predicate on k_d), locate a peer that stores d, i.e. locate the index information (k_d, p)
  Thus, the data we have to manage consists of the key-value pairs (k_d, p)
Can such a distributed database be maintained and accessed by a set of peers without central control?
[Figure: peers P1 to P8. One peer asks k_d = "jingle" ?; P8 stores d = "jingle-bells.mp3" with k_d = "jingle-bells" and p = "P8"; the index entry shown is ("jingle", P8).]

Resource Location Problem
Operations
  search for a key at a peer: p->search(k)
  update a key at a peer: p->update(k, p')
  peers joining and leaving the network: p->join(p')
Performance criteria (for search)
  search latency: e.g. searchtime(query) ~ Log(size(database))
  message bandwidth: e.g. messages(query) ~ Log(size(database)), messages(update) ~ Log(size(database))
  storage space used: e.g. storagespace(peer) ~ Log(size(database))
  resilience to failures (network, peers)
Qualitative criteria
  complex search predicates: equality, prefix, containment, similarity search
  use of global knowledge
  peer autonomy
  peer anonymity and trust
  security (e.g. denial of service attacks)

Summary
What is a P2P system?
What is emergence?
At which layers can the P2P architecture occur?
How do we define efficiency for a P2P resource location system?

Unstructured P2P Overlay Networks
No index information is used
  i.e. the information (k, p) is only available directly from p
Simplest approach: message flooding (gossiping)
  send query message to C neighbors
  messages have limited time-to-live TTL
  messages have IDs to eliminate cycles
[Figure: flooding a query for k = "jingle-bells". Example: C=3, TTL=2]
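
A minimal simulation can make the three flooding rules concrete (forward to neighbors, decrement the TTL, drop duplicates by message ID). It is a sketch under simplifying assumptions: the overlay is a plain adjacency dictionary, each peer forwards to all of its neighbors except the sender (rather than to exactly C of them), and delivery is instantaneous. The function name `flood` is hypothetical.

```python
from collections import deque


def flood(graph, start, ttl, matches):
    """Flood one query from `start`; return (peers with a hit, messages sent)."""
    seen = {start}      # peers that have already handled this message ID
    hits = []
    messages = 0
    queue = deque([(start, None, ttl)])   # (peer, sender, remaining TTL)
    while queue:
        peer, sender, remaining = queue.popleft()
        if matches(peer):
            hits.append(peer)
        if remaining == 0:
            continue                      # TTL exhausted: do not forward
        for neighbor in graph[peer]:
            if neighbor == sender:
                continue                  # never send back to the sender
            messages += 1                 # every forwarded copy costs a message
            if neighbor not in seen:      # duplicates are dropped on arrival
                seen.add(neighbor)
                queue.append((neighbor, peer, remaining - 1))
    return hits, messages


# Hypothetical overlay: a chain A - B - C - D, with the item stored at D.
overlay = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(flood(overlay, "A", 2, lambda peer: peer == "D"))  # TTL too small to reach D
print(flood(overlay, "A", 3, lambda peer: peer == "D"))
```

With TTL = 2 the query stops at C and finds nothing. Raising the TTL to 3 reaches D at the cost of one more message, which is the basic trade-off between search horizon and traffic.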

Gnutella
Developed as a 14-day "quick hack" by Nullsoft (Winamp)
Originally intended for exchange of recipes
Evolution of Gnutella
  Published under GNU General Public License on the Nullsoft web server
  Taken off after a couple of hours by AOL (owner of Nullsoft)
  This was enough to "infect" the Internet
  Gnutella protocol was reverse engineered from downloaded versions of the original Gnutella software
  Third-party clients were published and Gnutella started to spread
Based on message flooding
  Typical values C = 4, TTL = 7
  One request leads to $2 \cdot \sum_{i=0}^{TTL} C \cdot (C-1)^i = 26{,}240$ messages
  Hooking up to the Gnutella system requires that a new peer knows at least one Gnutella host (gnutellahosts.com:6346; outside the Gnutella protocol specification)
  Neighbors are found using a basic discovery protocol (ping-pong messages)
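
The message count on this slide can be checked with a few lines of arithmetic. The sketch below evaluates 2 * sum_{i=0}^{TTL} C * (C-1)^i exactly as printed, including the leading factor 2; the function name `flooding_messages` is hypothetical.

```python
def flooding_messages(c: int, ttl: int) -> int:
    """Evaluate 2 * sum_{i=0}^{TTL} C * (C - 1)**i as given on the slide."""
    return 2 * sum(c * (c - 1) ** i for i in range(ttl + 1))


# Typical Gnutella values from the slide: C = 4, TTL = 7.
print(flooding_messages(4, 7))  # 26240
```

For C = 4 the inner sum is 4 * (3^8 - 1) / 2 = 13,120, and doubling it gives the 26,240 messages quoted above. The count grows geometrically in the TTL, so each additional hop roughly triples the traffic at these settings.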

Gnutella: Protocol Message Types
Ping
  Description: Announce availability and probe for other servents
  Contained information: None
Pong
  Description: Response to a ping
  Contained information: IP address and port# of responding servent; number and total kb of files shared
Query
  Description: Search request
  Contained information: Minimum network bandwidth of responding servent; search criteria
QueryHit
  Description: Returned by servents that have the requested file
  Contained information: IP address, port# and network bandwidth of responding servent; number of results and result set
Push
  Description: File download requests for servents behind a firewall
  Contained information: Servent identifier; index of requested file; IP address and port to send file to

Gnutella: Meeting Peers (Ping/Pong)
[Figure: peers A, B, C, D, E. Message legend: A's ping; B's pong; C's pong; D's pong; E's pong.]

Gnutella: Searching (Query/QueryHit/GET)
[Figure: peers A, B, C, D, E, with copies of X.mp3 in the network. Message legend: A's query (e.g., X.mp3); C's query hit; E's query hit; GET X.mp3 followed by the transfer of X.mp3.]

Popularity of Queries [Sripanidkulchai01]
Very popular documents are approximately equally popular
Less popular documents follow a Zipf-like distribution (i.e., the probability of seeing a query for the i-th most popular query is proportional to $1/i^{\alpha}$)
Access frequency of web documents also follows Zipf-like distributions => caching might work for Gnutella
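
To see why a Zipf-like popularity curve makes caching attractive, the sketch below normalizes the weights 1/i^alpha into probabilities and sums the mass of the most popular queries. It assumes a pure Zipf law over a hypothetical catalogue of n distinct queries (the flattened head noted on the slide is ignored), and the values n = 1000, alpha = 1.0 and a cache of 100 entries are illustrative, not measurements from [Sripanidkulchai01].

```python
def zipf_probabilities(n: int, alpha: float) -> list:
    """Probability of the i-th most popular query, proportional to 1 / i**alpha."""
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def cache_hit_rate(n: int, alpha: float, cache_size: int) -> float:
    """Fraction of queries answered by a cache holding the top `cache_size` queries."""
    return sum(zipf_probabilities(n, alpha)[:cache_size])


# Illustrative values only: caching the top 10% of 1000 distinct queries.
print(round(cache_hit_rate(1000, 1.0, 100), 2))
```

Under these assumptions a cache covering a tenth of the distinct queries answers well over half of all requests, which is the intuition behind "caching might work for Gnutella".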

Free-riding on Gnutella [Adar00]
24 hour sampling period:
  70% of Gnutella users share no files
  50% of all responses are returned by top 1% of sharing hosts
A social problem, not a technical one
Problems:
  Degradation of system performance: collapse?
  Increase of system vulnerability
  "Centralized" ("backbone") Gnutella => copyright issues?
Verified hypotheses:
  H1: A significant portion of Gnutella peers are free riders.
  H2: Free riders are distributed evenly across domains.
  H3: Often hosts share files nobody is interested in (are not downloaded).

Free-riding Statistics - 1 [Adar00]
H1: Most Gnutella users are free riders
Of 33,335 hosts:
  22,084 (66%) of the peers share no files
  24,347 (73%) share ten or fewer files
  Top 1 percent (333) hosts share 37% (1,142,645) of total files shared
  Top 5 percent (1,667) hosts share 70% (2,182,087) of total files shared
  Top 10 percent (3,334) hosts share 87% (2,692,082) of total files shared

Free-riding Statistics - 2 [Adar00]
H3: Many servents share files nobody downloads
Of 11,585 sharing hosts:
  Top 1% of sites provide nearly 47% of all answers
  Top 25% of sites provide 98% of all answers
  7,349 (63%) never provide a query response

Topology of Gnutella [Jovanovic01]
Small-world properties verified ("find everything close by")
Backbone + outskirts

Gnutella Backbone [Jovanovic01]

Categories of Queries [Sripanidkulchai01]
Categorized top 20 queries

Caching in Gnutella [Sripanidkulchai01]
Average bandwidth consumption in tests: 3.5 Mbps
Best case: trace 2 (73% hit rate = 3.7 times traffic reduction)

Gnutella: Bandwidth Barriers
Clip2 measured Gnutella over 1 month:
  typical query is 560 bits long (including TCP/IP headers)
  25% of the traffic are queries, 50% pings, 25% other
  on average each peer seems to have 3 other peers actively connected
Clip2 found a scalability barrier with substantial performance degradation if queries/sec > 10:
  10 queries/sec
  * 560 bits/query
  * 4 (to account for the other 3 quarters of message traffic)
  * 3 simultaneous connections
  = 67,200 bps
  => 10 queries/sec maximum in the presence of many dialup users
  => won't improve (more bandwidth - larger files)
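
The barrier estimate is a straightforward product, reproduced below so the factors can be varied. The function name `required_bps` and the 56,000 bps figure used for comparison (the nominal V.90 downstream rate) are illustrative assumptions; the other numbers are the Clip2 values quoted on the slide.

```python
def required_bps(queries_per_sec, bits_per_query, traffic_multiplier, connections):
    """Bandwidth a peer needs to relay the given query rate on all its connections."""
    return queries_per_sec * bits_per_query * traffic_multiplier * connections


DIALUP_BPS = 56_000  # assumption: nominal V.90 modem downstream rate

load = required_bps(
    queries_per_sec=10,
    bits_per_query=560,
    traffic_multiplier=4,   # queries are only a quarter of all message traffic
    connections=3,
)
print(load, load > DIALUP_BPS)  # 67200 True
```

At 10 queries per second the relay load already exceeds what a dialup link can carry, which matches the saturation point Clip2 reported.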

Gnutella: Summary
Completely decentralized
Hit rates are high
High fault tolerance
Adapts well and dynamically to changing peer populations
Protocol causes high network traffic (e.g., 3.5 Mbps). For example:
  4 connections (C) per peer, TTL = 7
  1 ping packet can cause $2 \cdot \sum_{i=0}^{TTL} C \cdot (C-1)^i = 26{,}240$ packets
No estimates on the duration of queries can be given
No probability for successful queries can be given
Topology is unknown => algorithms cannot exploit it
Free riding is a problem
Reputation of peers is not addressed
Simple, robust, and scalable (at the moment)

Modern Gnutella
Lots of improvements
  Hybrid super-peer architecture
  Gnutella + DHT

Improvements of Message Flooding
Expanding Ring
  start search with small TTL (e.g. TTL = 1)
  if no success, iteratively increase TTL (e.g. TTL = TTL + 2)
k-Random Walkers
  forward query to one randomly chosen neighbor only, with large TTL
  start k random walkers
  random walker periodically checks with requester whether to continue
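
Both improvements can be sketched on an adjacency-dictionary overlay. The code is illustrative only: `within_ttl`, `expanding_ring` and `random_walkers` are hypothetical names, every peer is assumed to have at least one neighbor, and the walkers stop as soon as any one of them finds the item, an idealized stand-in for the periodic check with the requester.

```python
import random
from collections import deque


def within_ttl(graph, start, ttl):
    """Peers reachable from `start` in at most `ttl` hops (breadth-first)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        if dist[peer] == ttl:
            continue
        for neighbor in graph[peer]:
            if neighbor not in dist:
                dist[neighbor] = dist[peer] + 1
                queue.append(neighbor)
    return set(dist)


def expanding_ring(graph, start, holders, max_ttl, step=2):
    """Retry a TTL-limited flood with growing TTL until a holder is reached."""
    ttl = 1
    while ttl <= max_ttl:
        found = within_ttl(graph, start, ttl) & holders
        if found:
            return ttl, found
        ttl += step                      # e.g. TTL = TTL + 2
    return None


def random_walkers(graph, start, holders, k, max_steps, rng):
    """Run k walkers in lock-step; each forwards to one random neighbor per step."""
    positions = [start] * k
    for step in range(1, max_steps + 1):
        for w in range(k):
            positions[w] = rng.choice(graph[positions[w]])
            if positions[w] in holders:
                return step, positions[w]
    return None


# Hypothetical overlay: a ring of 8 peers, with the item stored at peer 3.
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
print(expanding_ring(ring, 0, {3}, max_ttl=7))
print(random_walkers(ring, 0, {3}, k=4, max_steps=50, rng=random.Random(1)))
```

The expanding ring only pays for a large flood when small ones fail, while each random walker sends one message per step, so traffic grows with k and the walk length rather than with the size of the neighborhood.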

Discussion: Unstructured Networks
Performance
  Search latency: low (graph properties)
  Message bandwidth: high
    improvements through random walkers, but essentially the whole network needs to be explored
  Storage cost: low (only local neighborhood)
  Update and maintenance cost: low (only local updates)
  Resilience to failures: good (multiple paths are explored and data is replicated)
Qualitative criteria
  search predicates: very flexible, any predicate is possible
  global knowledge: none required
  peer autonomy: high

Summary
How are unstructured P2P networks characterized?
What is the purpose of the ping/pong messages in Gnutella?
Why is search latency in Gnutella low?
Which methods reduce message bandwidth in unstructured networks?

Hierarchical P2P Overlay Networks
Servers provide index information, i.e. the information (k, p) is available from dedicated servers
Simplest approach
  one central server
  users register files
  service (file exchange) is organized as P2P architecture
[Figure: peers around a central index server; example key k = "jingle-bells"]

Napster
Central (virtual) database which holds an index of offered MP3/WMA files
Clients connect to this server, identify themselves (account) and send a list of MP3/WMA files they are sharing (C/S)
Other clients can search the index and learn from which clients they can retrieve the file (P2P)
Additional services at server (chat etc.)
[Figure: clients A and B and the Napster server. Labels: register (user, files); A has X.mp3; Download X.mp3.]
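
The division of labour described here (index lookups are client/server, file transfer is peer-to-peer) can be sketched with an in-memory index. The class `IndexServer` and its methods are hypothetical and do not reflect Napster's actual protocol; peers are identified by plain strings and the download itself is left out.

```python
from collections import defaultdict


class IndexServer:
    """Toy central index: maps file names to the peers that offer them."""

    def __init__(self):
        self._index = defaultdict(set)   # file name -> set of peer addresses

    def register(self, peer, files):
        """A client logs in and announces the files it shares (C/S step)."""
        for name in files:
            self._index[name].add(peer)

    def unregister(self, peer):
        """Remove a departing client from every index entry."""
        for peers in self._index.values():
            peers.discard(peer)

    def search(self, name):
        """Return the peers offering `name`; the download then goes peer to peer."""
        return sorted(self._index.get(name, ()))


server = IndexServer()
server.register("A", ["X.mp3"])
print(server.search("X.mp3"))   # ['A']  (B would now fetch X.mp3 from A directly)
```

Only the small (file name, peer) entries live on the server, which keeps its storage and bandwidth needs modest but also makes it the single point of failure discussed on the following slides.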

Superpeer Networks
Improvement of central index server (Morpheus, KaZaA)
  multiple index servers build a P2P network
  clients are associated with one (or more) superpeers
  superpeers use message flooding to forward search requests
Experiences
  redundant superpeers are good
  superpeers should have high outdegree (> 20)
  TTL should be minimized

Discussion
Performance
  Search latency: very low (index)
  Message bandwidth: low
    with superpeers flooding occurs, but the number of superpeers is comparatively small
  Storage cost: low at client, high at index server
  Update cost: low (no replication)
  Resilience to failures: bad (system has single point of failure)
Qualitative criteria
  search predicates: very flexible, any predicate is possible
  global knowledge: server
  peer autonomy: low

Summary
What are the two levels of P2P networks in superpeer networks, and to which functional layers are they related?
Which problem of distribution is avoided in superpeer networks and addressed in structured networks? What is the impact on the relation between nodes and functional layers?

Structured P2P Overlay Networks
Unstructured overlay networks: what we learned
  simplicity (simple protocol)
  robustness (almost impossible to "kill": no central authority)
Performance
  search latency O(log n), n number of peers
  update and maintenance cost low
Drawbacks
  tremendous bandwidth consumption for search
  free riding
Can we do better?

Efficient Resource Location
[Figure: design space with the axes update cost (low to high), search cost (low to high) and maximal bandwidth (low to high). Plotted approaches: full replication; structured P2P overlay networks (e.g. prefix routing); unstructured P2P overlay networks (e.g. Gnutella); server (e.g. Napster).]

Distribution of Index Information
Goal: provide efficient search using few messages without using designated servers
Easy: distribution of index information over all peers, i.e. every peer maintains and provides part of the index information (k, p)
Difficult: distributing the data access structure to support efficient search
[Figure: two setups. With a server, the search starts at the server, which holds the data access structure and the index information I above peers that only store data. Without a server, the index information is split into I1, I2, I3, I4 across peers that store both data and index information, and the figure asks where the data access structure resides and where to start the search.]
Approaches
  Different strategies
    P-Grid: distributing a binary search tree
    Chord: constructing a distributed hash table
    CAN: routing in a d-dimensional space
    Freenet: caching index information along search paths
  Commonalities
    each peer maintains a small part of the index information (routing table)
    searches are performed by directed message forwarding
  Differences
    performance and qualitative criteria
P-Grid
  Search tree (prefix tree)
  [Figure: binary prefix tree with root ???, inner nodes 0??, 1?? and 00?, 01?, 10?, 11?, and leaves 000 to 111. A query for 101 descends from the root via 1?? and 10? to the leaf 101.]
  N objects: $\log_2(N)$ steps
Scalable data access structures
  Assume number of data objects >> storage of one node
    Distributed storage
  Given a data access structure
    Size of data access structure = number of data objects
    Size of data access structure >> storage of one node
  Problem: where to store?
Non-scalable Distribution of Search Tree
  Distribute search tree over peers
  [Figure: the prefix tree (root ???; 0??, 1??; 00?, 01?, 10?, 11?; leaves 000 to 111) is partitioned over peer 1 to peer 4; the upper levels of the tree are marked as bottleneck.]
Scalable Distribution of Search Tree
  [Figure: the same prefix tree over peer 1 to peer 4; the root region is labelled "Napster" and marked as bottleneck.]
Scalable data access structures
  Associate each peer with a complete path
  [Figure: the prefix tree with leaves 000 to 111, distributed over peer 1 to peer 4.]
Scalable data access structures
  [Figure: the path ???, 1??, 10? of peer 3 with the leaves 100 and 101. At each level of the path, peer 3 refers to peers on the other side of the tree (peer 1 and peer 2 at the root, peer 4 below 1??), each of which "knows more about this part of the tree".]
The result is P-Grid
  Peers cooperate in search
  [Figure: a query for 101 arrives at peer 4 (path 11?, leaves 110 and 111, with references to peer 1 and peer 2 at the root and to peer 3 below 1??). Peer 4 sends the message "101?" to peer 3 (path 10?, leaves 100 and 101), where 101 is found.]
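The cooperative search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the P-Grid implementation: the `Peer` class, the `refs` dictionary and the four-peer layout (paths 00, 01, 10, 11) are hypothetical. Each peer keeps, for every level of its own path, one reference to a peer on the other side of the tree, and forwards a query at the first bit where the key leaves its path.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    name: str
    path: str                                  # binary path the peer is responsible for
    refs: dict = field(default_factory=dict)   # level -> peer on the other side at that level


def search(peer, key, hops=0):
    """Forward the query until a peer whose path is a prefix of key is reached."""
    level = 0
    while level < len(peer.path) and peer.path[level] == key[level]:
        level += 1
    if level == len(peer.path):
        return peer.name, hops                 # this peer is responsible for key
    return search(peer.refs[level], key, hops + 1)


# Hypothetical layout: peer 1 -> 00, peer 2 -> 01, peer 3 -> 10, peer 4 -> 11.
p1, p2, p3, p4 = (Peer(f"peer {i + 1}", path)
                  for i, path in enumerate(["00", "01", "10", "11"]))
p1.refs = {0: p3, 1: p2}
p2.refs = {0: p4, 1: p1}
p3.refs = {0: p1, 1: p4}
p4.refs = {0: p2, 1: p3}

print(search(p4, "101"))   # ('peer 3', 1)
```

Every hop resolves at least one more bit of the key, so the number of hops is bounded by the path length.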
Construction
  Splitting approach (P-Grid)
    peers meet and decide whether to extend the search tree by splitting the data space
    peers can perform load balancing considering their storage load
    networks with different origins can merge, like Gnutella, Freenet (loose coupling)
  Node insertion approach (Chord, CAN, ...)
    peers determine their "leaf position" based on their IP address
    nodes route from a gateway node to their node-id to populate the routing table
    network has to start from a single origin (strong coupling)
  Replication of data items and routing table entries is used to increase failure resilience
P-Grid Discussion
  Performance
    Search latency: O(log n) (with high probability, provable)
    Message bandwidth: O(log n) (selective routing)
    Storage cost: O(log n) (routing table)
    Update cost: low (like search)
  Qualitative Criteria
    search predicates: prefix searches
    global knowledge: key hashing
    peer autonomy: peers can locally decide on their role (splitting decision)
DHT example: Chord
  Hashing of search keys AND peer addresses on binary keys of length m
    e.g. m = 8, key("jingle-bells.mp3") = 17, key(196.178.0.1) = 3
  Data keys are stored at the next larger node key
    peer with hashed identifier p, data with hashed identifier k, then $k \in (\mathrm{predecessor}(p), p]$
  [Figure: identifier ring with the peers p, p2, p3 and a key k stored at p, between predecessor(p) and p. Surviving labels: m=8, 32 keys.]
  Search possibilities
    1. every peer knows every other: O(n) routing table size
    2. peers know successor: O(n) search cost
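The placement rule above (a key is stored at the next larger node key, wrapping around the ring) can be sketched as follows. The names `chord_id` and `responsible_peer` and the peer identifiers are hypothetical, and SHA-1 is only an assumed hash function, so it will not reproduce the illustrative values 17 and 3 from the slide.

```python
import bisect
import hashlib

M = 8  # identifier length in bits, as in the example above


def chord_id(name, m=M):
    """Hash a search key or a peer address onto an m-bit identifier."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def responsible_peer(key_id, peer_ids):
    """Return the peer p with key_id in (predecessor(p), p], wrapping around the ring."""
    ring = sorted(peer_ids)
    index = bisect.bisect_left(ring, key_id)
    return ring[index % len(ring)]


peers = [3, 40, 200]                 # hypothetical peer identifiers
print(responsible_peer(17, peers))   # 40
```

A key larger than every peer identifier wraps around to the smallest peer, which is what the modulo on the index expresses.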
Routing Tables
  Every peer knows m peers with exponentially increasing distance
    Each peer p stores a routing table
    First peer with hashed identifier $s_i$ such that $s_i = \mathrm{successor}(p + 2^{i-1})$ for $i = 1, \ldots, m$
    We write also $s_i = \mathrm{finger}(i, p)$
  [Figure: identifier ring with the points p+1, p+2, p+4, p+8, p+16 and the peers p2, p3, p4; the fingers s_1, s_2, s_3 point to p2, s_4 to p3 and s_5 to p4.]
  Routing table of p:
    i   s_i
    1   p2
    2   p2
    3   p2
    4   p3
    5   p4
  O(log n) routing table size
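The finger definition above can be checked with a small sketch. The ring identifiers below are hypothetical values chosen so that the result has the same shape as the routing table of p on the slide (three entries for p2, then p3, then p4); `successor` and `finger_table` are illustrative helper names.

```python
def successor(ident, ring, m):
    """First peer identifier at or after ident on a ring of size 2**m."""
    ident %= 2 ** m
    at_or_after = [p for p in ring if p >= ident]
    return min(at_or_after) if at_or_after else min(ring)


def finger_table(p, ring, m):
    """s_i = successor(p + 2**(i-1)) for i = 1..m."""
    return [successor(p + 2 ** (i - 1), ring, m) for i in range(1, m + 1)]


ring = [0, 6, 12, 20]              # hypothetical identifiers for p, p2, p3, p4
print(finger_table(0, ring, 5))    # [6, 6, 6, 12, 20]
```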
Search
  search(p, k)
    find in routing table largest (i, p*) such that p* ∈ [p, k)
      /* largest peer key smaller than the searched data key */
    if such a p* exists then search(p*, k)
    else return (successor(p)) // found
  [Figure: identifier ring with the fingers s_1 to s_5 of p at p+1, p+2, p+4, p+8, p+16, the peers p2, p3, p4 and the search keys k1, k2.]
  O(log n) search cost
    RT with exponentially increasing distance: O(log n) with high probability
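A runnable version of the search procedure above might look as follows. It is a sketch under assumptions: the ring identifiers are hypothetical, the helper names are made up, the interval test uses the open ring interval (p, k), and a guard for k = p is added because a peer stores its own identifier.

```python
def successor(ident, ring, size):
    """First peer identifier at or after ident on a ring with `size` positions."""
    ident %= size
    at_or_after = [p for p in ring if p >= ident]
    return min(at_or_after) if at_or_after else min(ring)


def build_fingers(ring, m):
    """Finger table of every peer: successor(p + 2**i) for i = 0..m-1."""
    size = 2 ** m
    return {p: [successor(p + 2 ** i, ring, size) for i in range(m)] for p in ring}


def strictly_between(x, a, b, size):
    """True if x lies in the open ring interval (a, b), going clockwise from a."""
    return 0 < (x - a) % size < (b - a) % size


def search(p, k, fingers, size, hops=0):
    """Return (peer storing key k, number of forwarding hops), starting at peer p."""
    if k == p:
        return p, hops                           # a peer stores its own identifier
    for finger in reversed(fingers[p]):          # try the farthest finger first
        if strictly_between(finger, p, k, size):
            return search(finger, k, fingers, size, hops + 1)
    return fingers[p][0], hops                   # successor(p) stores k


fingers = build_fingers([0, 6, 12, 20], m=5)     # hypothetical ring of four peers
print(search(0, 17, fingers, 32))                # (20, 1)
```

Each forwarding step jumps to the farthest finger that does not overshoot the key, which is what makes the remaining distance shrink geometrically.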
Node Insertion
  New node q joining the network
    q asks existing node p to find predecessor and fingers
    cost: $O(\log^2 n)$
  [Figure: identifier ring with p, q, p2, p3, p4 and the points p+1, p+2, p+4, p+8, p+16; q lies between p and p2.]
  Routing table of p:
    i   s_i
    1   q
    2   q
    3   p2
    4   p3
    5   p4
  Routing table of q:
    i   s_i
    1   p2
    2   p2
    3   p3
    4   p3
    5   p4
Load Balancing in Chord
  Network size n = 10^4
  5 * 10^5 keys
  uniform data distribution
  50 keys per node?
  NO, as IP addresses do not map uniformly into the data key space.
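The imbalance can be illustrated with a small simulation. This sketch assumes that node identifiers behave like random points on the ring and uses a scaled-down network (1000 nodes, 50 000 keys, so still 50 keys per node on average); the function name and all parameters are hypothetical.

```python
import bisect
import random


def key_loads(num_nodes, num_keys, bits=32, seed=0):
    """Number of uniformly drawn keys that end up at each node of a Chord-like ring."""
    rng = random.Random(seed)
    space = 2 ** bits
    nodes = sorted(rng.sample(range(space), num_nodes))   # distinct node identifiers
    loads = [0] * num_nodes
    for _ in range(num_keys):
        key = rng.randrange(space)
        # the key goes to the next larger node identifier, wrapping around
        loads[bisect.bisect_left(nodes, key) % num_nodes] += 1
    return loads


loads = key_loads(num_nodes=1000, num_keys=50_000)
print(min(loads), sum(loads) / len(loads), max(loads))
```

Although the keys themselves are uniform, the arcs between neighboring node identifiers differ in length, so the per-node load spreads widely around the mean of 50.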
Length of Search Paths
  Network size n = 2^12
  100 * 2^12 keys
  Path length: $\log_2(n)$
  RTs can be seen as an embedding of search trees into the network, and thus search starts at a randomly selected tree depth
Chord Discussion
  Performance
    Search: like P-Grid
    Node join/leave cost: $O(\log^2 n)$
    Resilience to failures: replication to successor nodes
  Qualitative Criteria
    search predicates: equality of keys only
    global knowledge: key hashing, network origin
    peer autonomy: nodes have, by virtue of their address, a specific role in the network
Topological Routing (CAN)
  Based on hashing of keys into a d-dimensional space (a torus)
  Each peer is responsible for keys of a subvolume of the space (a zone)
  Each peer stores the addresses of peers responsible for the neighboring zones for routing
  Search requests are greedily forwarded to the peers in the closest zones
  Assignment of peers to zones depends on a random selection made by the peer
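Greedy forwarding between neighboring zones can be sketched for the simplest case. The example assumes equally sized unit zones arranged as a d-dimensional torus with `side` zones per dimension, which real CAN zones need not be; `next_zone` and `route` are hypothetical names.

```python
def next_zone(cur, dst, side):
    """One greedy step towards dst on a d-dimensional torus of unit zones."""
    for dim, (c, t) in enumerate(zip(cur, dst)):
        if c != t:
            forward = (t - c) % side                     # steps needed going "up"
            step = 1 if forward <= side - forward else -1
            return cur[:dim] + ((c + step) % side,) + cur[dim + 1:]
    return cur


def route(src, dst, side):
    """List of zones visited from src to dst, both given as coordinate tuples."""
    path = [src]
    while path[-1] != dst:
        path.append(next_zone(path[-1], dst, side))
    return path


path = route((0, 0), (3, 7), side=8)
print(len(path) - 1)   # 4 hops: three steps in dimension 0, one wrap-around step in dimension 1
```

With n = side^d zones, a route needs at most side/2 steps per dimension, which is where a path length that grows with d * n^(1/d) comes from.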
Network Search and Join
  Node 7 joins the network by choosing a coordinate in the volume of 1
  => O(d) updates of RTs
CAN Refinements
  Multiple realities
    We can have r different coordinate spaces
    Nodes hold a zone in each of them
    Creates r replicas of the (key, value) pairs
    Increases robustness
    Reduces path length as search can be continued in the reality where the target is closest
  Overloading zones
    Different peers are responsible for the same zone
    Splits are only performed if a maximum occupancy (e.g. 4) is reached
    Nodes know all other nodes in the same zone
    But only one of the neighbors
CAN Path Length
CAN Discussion
  Performance
    Search latency: $O(d\,n^{1/d})$, depends on choice of d (with high probability, provable)
    Message bandwidth: $O(d\,n^{1/d})$ (selective routing)
    Storage cost: O(d) (routing table)
    Update cost: low (like search)
    Node join/leave cost: $O(d\,n^{1/d})$
    Resilience to failures: realities and overloading
  Qualitative Criteria
    search predicates: spatial distance of multidimensional keys
    global knowledge: key hashing, network origin
    peer autonomy: nodes can decide on their position in the key space
Dynamical Clustering (Freenet)
  Freenet background
    P2P system which supports publication, replication, and retrieval of data
    Protects anonymity of authors and readers: infeasible to determine the origin or destination of data
    Nodes are not aware of what they store (keys and files are sent and stored encrypted)
  Uses an adaptive routing and caching strategy
    Index information maintained at each peer (limited cache size)

    Key                    Data                   Address
    8e47683isdd0932uje89   ZT38hwe01h02hdhgdzu    tcp/125.45.12.56:6474
    456r5wero04d903iksd0   Rhweui12340jhd091230   tcp/67.12.4.65:4711
    f3682jkjdn9ndaqmmxia   eqwe1089341ih0zuhge3   tcp/127.156.78.20:8811
    wen09hjfdh03uhn4218    erwq038382hjh3728ee7   tcp/78.6.6.7:2544
    712345jb89b8nbopledh                          tcp/40.56.123.234:1111
    d0ui43203803ujoejqhh                          tcp/128.121.89.12:9991
Freenet Routing
  If a search request arrives
    Either the data is in the table
    Or the request is forwarded to the addresses with the most similar keys (lexicographic similarity, edit distance) till an answer is found or the TTL is reached (e.g. TTL = 500)
  If an answer arrives
    The key, address and data of the answer are inserted into the table
    The least recently used key and data is evicted
  Quality of routing should improve over time
    Node is listed under certain key in routing tables
    Therefore gets more requests for similar keys
    Therefore tends to store more entries with similar keys (clustering) when receiving results and caching them
    Dynamic replication of data
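The table handling described above (answer locally, otherwise forward to the most similar key, cache answers, evict the least recently used entry) can be sketched as follows. The class and method names are hypothetical, and the common-prefix length is only an assumed stand-in for the key similarity measure; it is not Freenet's actual metric.

```python
from collections import OrderedDict


def common_prefix_len(a, b):
    """Assumed similarity measure: number of leading characters the keys share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class RoutingTable:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()            # key -> (address, data or None)

    def lookup(self, key):
        """Return the locally stored data for key, or None if it is not held here."""
        entry = self.entries.get(key)
        if entry is not None and entry[1] is not None:
            self.entries.move_to_end(key)       # mark as recently used
            return entry[1]
        return None

    def closest_address(self, key):
        """Address listed under the most similar key (the next hop of a request)."""
        best = max(self.entries, key=lambda k: common_prefix_len(k, key))
        return self.entries[best][0]

    def insert(self, key, address, data):
        """Cache an answer and evict the least recently used entry if the table is full."""
        self.entries[key] = (address, data)
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
```

A short usage example: after `insert("abc", "n1", "d1")`, `insert("abd", "n2", "d2")` and a `lookup("abc")` on a table of capacity 2, inserting a third key evicts `"abd"`, and a request for `"abz"` is forwarded to `"n1"`.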
Freenet Routing
  [Figure: peer p has k and peer p' has k'. Labels: search k; response (k, p); new link established; search k', with k' similar to k.]
Freenet: Inserting Files
  First the key of the file is calculated
  An insert message with this proposed key and a hops-to-live value is sent to the neighbor with the most similar key
  Then every peer checks whether the proposed key is already present in its local store
    yes: return stored file (original requester must propose new key)
    no: route to next peer for further checking (routing uses the same key similarity measure as searching)
  continue until hops-to-live are 0 or failure
Freenet: Evolution of Path Length
  1000 identical nodes
  max 50 data items/node
  max 200 references/node
  Initial references: (i-1, i-2, i+1, i+2) mod n
  each time-step:
    randomly insert
    TTL = 20
  every 100 time-steps:
    300 requests (TTL = 500) from random nodes and measure actual path length (failure = 500)
  [Figure: median path length over time; surviving labels: 500 and 6.]
Freenet Discussion
  Performance
    Search latency: low (small world property)
    Message bandwidth: low (selective routing)
    Storage cost: relatively low (experimentally not validated!)
    Update cost: low (like search)
    but a bootstrapping phase is required
    Resilience to failures: good (high degree of replication of data and keys)
  Qualitative Criteria
    search predicates: with encryption only equality of keys
    global knowledge: none
    peer autonomy: high (with encryption, risk of storing undesired data)
Comparison
              Paradigm                         Search Type         Search Cost (messages)
  Gnutella    Breadth-first search on graph    String comparison   $2 \cdot \sum_{i=0}^{TTL} C \cdot (C-1)^i$
  Freenet     Depth-first search on graph      Equality            O(log n) ?
  Chord       Implicit binary search trees     Equality            O(log n)
  CAN         d-dimensional space              Equality            $O(d\,n^{1/d})$
  P-Grid      Binary prefix trees              Prefix              O(log n)
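To get a feeling for the Gnutella entry, the sum 2 * sum over i = 0..TTL of C * (C-1)^i can be evaluated directly. The sketch reads C as the average number of connections per peer, which the table does not spell out, and the values C = 4 and TTL = 7 are only an illustrative choice.

```python
def gnutella_messages(c, ttl):
    """Message count 2 * sum_{i=0..ttl} c * (c - 1)**i for one flooded search."""
    return 2 * sum(c * (c - 1) ** i for i in range(ttl + 1))


print(gnutella_messages(4, 7))   # 26240
```

The count grows exponentially with the TTL, whereas the other rows of the table grow logarithmically or polynomially with the network size.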
Small World Graphs
  Each P2P system can be interpreted as a directed graph (overlay network)
    peers correspond to nodes
    routing table entries as directed links
  Task
    Find a decentralized algorithm (greedy routing) to route a message from any node A to any other node B with few hops compared to the size of the graph
    Requires the existence of short paths in the graph
Milgram's Experiment
  Finding short chains of acquaintances linking pairs of people in the USA who didn't know each other
    Source person in Nebraska
    Sends message with first name and location
    Target person in Massachusetts
  Average length of the chains that were completed was between 5 and 6 steps
  "Six degrees of separation" principle
  BIG QUESTION:
    WHY should there be short chains of acquaintances linking together arbitrary pairs of strangers?
Random Graphs
  For many years the typical explanation was: random graphs
    Low diameter: expected distance between two nodes is $\log_k N$, where k is the outdegree and N the number of nodes
    When pairs of vertices are selected uniformly at random, they are connected by a short path with high probability
  But there are some inaccuracies
    If A and B have a common friend C, it is more likely that they themselves will be friends! (clustering)
    Many real-world networks (social networks, biological networks in nature, artificial networks such as the power grid and the WWW) exhibit this clustering property
    Random networks are NOT clustered.
Clustering
  Clustering measures the fraction of neighbors of a node that are connected themselves
  Regular graphs have a high clustering coefficient
    but also a high diameter
  Random graphs have a low clustering coefficient
    but a low diameter
  Both models do not match the properties expected from real networks!
  Regular graph (k = 4)
    Long paths: L ~ n/(2k)
    Highly clustered: C ~ 3/4
  Random graph (k = 4)
    Short path length: L ~ $\log_k N$
    Almost no clustering: C ~ k/n
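The clustering measure above can be computed directly for a small regular graph. The sketch builds a ring lattice in which every node is linked to its k nearest neighbors (an assumed concrete form of the regular graph) and averages, over all nodes, the fraction of neighbor pairs that are linked themselves; the function names are hypothetical.

```python
from itertools import combinations


def ring_lattice(n, k):
    """Each node is linked to its k nearest neighbours on a ring (k even)."""
    half = k // 2
    return {i: {(i + d) % n for d in range(-half, half + 1) if d != 0}
            for i in range(n)}


def clustering_coefficient(adj):
    """Average fraction of a node's neighbour pairs that are linked to each other."""
    total = 0.0
    for neighbours in adj.values():
        pairs = list(combinations(neighbours, 2))
        if pairs:
            total += sum(1 for a, b in pairs if b in adj[a]) / len(pairs)
    return total / len(adj)


print(clustering_coefficient(ring_lattice(20, 4)))   # 0.5
```

For this lattice the coefficient equals 3(k-2) / (4(k-1)), which is 1/2 for k = 4 and approaches the quoted value 3/4 as k grows.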
Small-World Networks
  Random rewiring of regular graph (by Watts and Strogatz)
    With probability p, rewire each link in a regular graph to a randomly selected node
    Resulting graph has properties of both regular and random graphs
    High clustering and short path length
  Freenet has been shown to result in small world graphs
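The rewiring step can be tried out on a small ring lattice. This is a sketch of the idea rather than the exact Watts-Strogatz procedure: each undirected link is rewired with probability p to a uniformly chosen node that is not yet a neighbor, and the average path length is measured by breadth-first search. The graph size, k, p and the seed are arbitrary assumptions.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Each node is linked to its k nearest neighbours on a ring (k even)."""
    half = k // 2
    return {i: {(i + d) % n for d in range(-half, half + 1) if d != 0}
            for i in range(n)}


def rewire(adj, p, seed=0):
    """Rewire each undirected link with probability p to a random non-neighbour."""
    rng = random.Random(seed)
    n = len(adj)
    new = {u: set(vs) for u, vs in adj.items()}
    for u in range(n):
        for v in sorted(adj[u]):
            if u < v and rng.random() < p:
                candidates = [w for w in range(n) if w != u and w not in new[u]]
                if candidates:
                    w = rng.choice(candidates)
                    new[u].discard(v)
                    new[v].discard(u)
                    new[u].add(w)
                    new[w].add(u)
    return new


def average_path_length(adj):
    """Mean hop distance over all ordered pairs of mutually reachable nodes."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


regular = ring_lattice(200, 4)
small_world = rewire(regular, p=0.1)
print(average_path_length(regular), average_path_length(small_world))
```

A few random shortcuts are enough to shorten the paths considerably, while most of the local neighborhood structure of the lattice stays in place.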
Flashback: Freenet Search Performance
  Modifying routing tables in Freenet through caching has a "rewiring effect"
    Studies show that Freenet graphs have small-world properties
    Explains improving search performance
  Regular graph:
    n nodes, k nearest neighbors
    path length ~ n/(2k)
    4096/16 = 256
  Random graph:
    path length ~ log(n)/log(k)
    ~ 4
  Rewired graph (1% of nodes):
    path length ~ random graph
    clustering ~ regular graph
  Small World Graph
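The two path-length estimates above can be reproduced with a few lines of arithmetic. The value n = 4096 is taken from the slide; k = 8 is an inference from the quotient 4096/16, not a number stated in the text.

```python
import math

n, k = 4096, 8                              # k = 8 is inferred from 4096/16 = 256
regular = n / (2 * k)                       # regular graph: n / (2k)
random_graph = math.log(n) / math.log(k)    # random graph: log(n) / log(k)
print(regular, round(random_graph, 6))
```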
Search in Small World Graphs
  BUT! Watts-Strogatz can provide a model for the structure of the graph
    existence of short paths
    high clustering
  It does not explain how the shortest paths are found
    Gnutella networks are also small-world graphs
    why can search be efficient in Freenet?
P2P Overlay Networks as Graphs
  Each P2P system can be interpreted as a directed graph
    peers correspond to nodes
    routing table entries as directed links
    embedded in some space
      P-Grid: interval [0,1]
      Chord: ring [0,1)
      CAN: d-dimensional torus
      Freenet: strings + lexicographical distance
Kleinberg's Small-World Model
  Kleinberg's small-world model
    Embed the graph into an r-dimensional grid
    constant number p of short-range links (neighborhood)
    q long-range links: choose long-range links such that the probability to have a long-range contact is proportional to $1/d^r$
  Importance of r!
    Decentralized (greedy) routing performs best iff r = dimension of space
  [Figure: example for r = 2.]
Influence of r (1)
  Each peer u has a link to the peer v with probability proportional to $1/d(u,v)^r$, where d(u,v) is the distance between u and v.
  Optimal value: r = dim = dimension of the space
    If r < dim we tend to choose more far-away neighbors (the decentralized algorithm can quickly approach the neighborhood of the target, but then slows down till it finally reaches the target itself).
    If r > dim we tend to choose more close neighbors (the algorithm quickly finds the target in its neighborhood, but reaches it slowly if it is far away).
  When r = 0, long-range contacts are chosen uniformly. Random graph theory proves that there exist short paths between every pair of vertices, BUT there is no decentralized algorithm capable of finding these paths.
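The link-selection rule and greedy routing can be sketched on a one-dimensional ring (dim = 1). Everything concrete here is an assumption for illustration: the ring size, one long-range contact per node (q = 1), two short-range links per node, and the function names.

```python
import random


def ring_distance(u, v, n):
    """Distance between u and v on a ring of n nodes."""
    return min((u - v) % n, (v - u) % n)


def long_range_contact(u, n, r, rng):
    """Pick v != u with probability proportional to 1 / d(u, v)**r."""
    others = [v for v in range(n) if v != u]
    weights = [ring_distance(u, v, n) ** -r for v in others]
    return rng.choices(others, weights=weights, k=1)[0]


def build(n, r, q=1, seed=0):
    """Two short-range links (ring neighbours) plus q long-range contacts per node."""
    rng = random.Random(seed)
    return {u: {(u - 1) % n, (u + 1) % n}
               | {long_range_contact(u, n, r, rng) for _ in range(q)}
            for u in range(n)}


def greedy_hops(links, src, dst, n):
    """Always forward to the known contact that is closest to the target."""
    hops, cur = 0, src
    while cur != dst:
        cur = min(links[cur], key=lambda v: ring_distance(v, dst, n))
        hops += 1
    return hops


n = 256
links = build(n, r=1)          # r = dim = 1
print(greedy_hops(links, 0, n // 2, n))
```

Because each node always has a ring neighbor that is one step closer to the target, greedy routing terminates, and the long-range contacts can only shorten the route compared to walking along the ring.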
Influence of r (2)
  Given node u, if we partition the remaining peers into sets $A_1, A_2, A_3, \ldots, A_{\log N}$, where $A_i$ consists of all nodes whose distance from u is between $2^i$ and $2^{i+1}$, $i = 0, \ldots, \log(N-1)$,
  then, given r = dim, each long-range contact of u is nearly equally likely to belong to any of the sets $A_i$
  When q = log N, on average each node will have a link in each set $A_i$
  [Figure: nested regions A_1, A_2, A_3, A_4 at increasing distance around a node.]
DHTs and Kleinberg model
  [Figure: P-Grid's model next to Kleinberg's model.]
Conclusions from Kleinberg's Model
  With respect to the Watts and Strogatz model
    there is no decentralized algorithm capable of performing effective search in the class of SW networks constructed according to Watts and Strogatz
    J. Kleinberg presented an infinite family of Small World networks that generalizes the Watts and Strogatz model and showed that decentralized search algorithms can find short paths with high probability
    there exists only one unique model within that family for which decentralized algorithms are effective.
  With respect to overlay networks
    Many of the structured P2P overlay networks are similar to Kleinberg's model (e.g. Chord, randomized version, q = log N, r = 1)
    Unstructured overlay networks also fit into the model (e.g. Gnutella, q = 5, r = 0)
    Some variants of structured P2P overlay networks have no neighborhood lattice (e.g. P-Grid, p = 0)
    Extensions to spaces beyond regular grids are possible (e.g. arbitrary metric spaces)
Summary
  How can we characterize P2P overlay networks such that we can study them using graph-theoretic approaches?
  What is the main difference between a random graph and a SW graph?
  What is the main difference between the Watts/Strogatz and the Kleinberg model?
  What is the relationship between structured overlay networks and small world graphs?
  What are possible variations of the small world graph model?
Specific problems: Identity management
  Definition:
    Consistent mapping of a set of attributes onto an identifier in a unique, deterministic, and secure way
  Identification is an essential building block in distributed (information) systems
  Examples: directory services
    DNS: symbolic host names -> IP address
    X.500: distinguished name -> object (attributes)
    UDDI: query -> web service specification
Identity management issues
  Data management
    Uniqueness of identifiers
    Centralized vs. distributed data management
    Degree of decentralization (orthogonal to the distribution of data!)
    Update consistency
  Security
    Access permissions (+ management)
    Requires unique identification
    Resilience against attacks
  Infrastructure
    Third party?
    Scalability, robustness, 24/7 availability, etc.
Use case: Dynamic IP addresses
  Most computers on the Internet have a dynamic IP address
    Limited number of IP addresses
    Dynamic Host Configuration Protocol (lease time)
    Host mobility (physical mobility)
  Problem for any system that builds a distributed management structure on top of such networks
  Use case:
    Management of dynamic IP addresses in structured P2P systems (Chord, DKS, Pastry, P-Grid, etc.)
P2P systems and dynamic IP addresses
  Structured P2P systems (Chord, Pastry, P-Grid, DKS, etc.)
    These systems construct a distributed index: routing tables
    Dynamic IP addresses -> routing tables become inconsistent -> system can break down
  Unstructured (Gnutella) and hierarchical (FastTrack) systems
    Less of a problem
    But they pay with
      high bandwidth consumption, or
      single points of failure, or
      high infrastructure costs, or
      etc.
Problems to address
  How to find out that an IP address has become invalid?
    No response
      Network problem or did the peer get a new address?
    Response
      Is it still the same peer? (authenticity, replay, man-in-the-middle attacks)
  Frequency of address changes is crucial
    Peers can join and leave at any moment
    IP address can change at any moment
  Security: DoS attacks are very simple
    Assume peers report back their new IP address
    EvilHacker.org participates in the overlay and thus finds out IP addresses
    EvilHacker.org reports all IP addresses it finds, pointing to random hosts or itself
  A scalable and secure infrastructure is required
Problems of current P2P approaches
  Third-party infrastructures are required
  Very costly maintenance protocols
  Maintenance protocols may compromise structural properties (e.g., load balancing)
  Previous knowledge is lost (e.g., reputation of the peer, QoS, etc.)
  No current approach addresses security
    Only the owner should be allowed to update the mapping
    DoS, replay, man-in-the-middle, etc. are not addressed
IPv6 as an alternative
  With IPv6, dynamic addresses or NAT are no longer necessary
    IPv6 address space is 2^128, i.e. ~3.4 * 10^38 (or about 5 * 10^28 addresses per person on the planet)
    IPv4 (current) address space is 2^32
  IPsec (included in IPv6)
    solves authentication problem
    DoS attacks are more difficult
  Mobility is addressed
    IPv6: home/foreign address
    IPv4: mobility extension, but not supported on a large scale
  Problem: IPv6 has not been deployed yet
DNS as an alternative
  DNS extensions
    Several RFCs extend the original DNS specification so that DNS could support secure updates
    Problems
      Very heavy-weight
      Configuration is very difficult and error-prone
      Not for the normal user (as in a P2P system)
  DynDNS (and similar services)
    Hosts maintain a consistent name/address mapping in a special DNS domain via a special client
    Problems
      Centralized -> scalability
      Service may go out of business
Basic idea
  [Figure: DNS keeps a static mapping FQDN -> IP address; a host looks up the IP address, and routing is based on the FQDN (any overlay). P-Grid (index and data) keeps a DYNAMIC mapping logical identifier -> IP address; a peer looks up the IP address, and routing is based on the logical identifier.]
Informal discussion
  Use unique logical identifiers (UUIDs) instead of physical identifiers (IP addresses):
    Peer identifiers
    Routing based on UUIDs
  Use the overlay itself to securely maintain mappings between the logical and the physical identifier
    Self-referential approach
    Rate of changes < self-healing rate
    Dynamic equilibrium
  Advantages:
    General identification facility disentangling logical identifiers from network structure
    Tracking of changes (for example, reputation)
Maintenance and routing
  Universal Unique Identifiers (UUIDs) are generated locally
    Cryptographically secure hash function -> global uniqueness
  Index / routing tables: UUID
    Peers maintain up-to-date UUID-IP mappings in P-Grid
  Routing
    Peers cache known UUID-IP mappings
    Mapping exists in cache -> identity of target peer is checked before forwarding
    No mapping is known -> query P-Grid for mapping
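The routing rule above (use the cached mapping after checking the peer's identity, otherwise query the overlay) can be sketched as follows. The `Resolver` class and its two callbacks are hypothetical stand-ins for a P-Grid query and an identity check; falling back to an overlay query when the cached address fails the check is an assumption that goes beyond what the slide states.

```python
class Resolver:
    def __init__(self, overlay_lookup, verify_identity):
        self.cache = {}                          # UUID -> last known IP address
        self.overlay_lookup = overlay_lookup     # hypothetical: UUID -> IP via the overlay
        self.verify_identity = verify_identity   # hypothetical: (UUID, IP) -> bool
        self.overlay_queries = 0

    def resolve(self, uuid):
        ip = self.cache.get(uuid)
        if ip is not None and self.verify_identity(uuid, ip):
            return ip                            # cached mapping is still valid
        self.overlay_queries += 1                # unknown or stale: ask the overlay
        ip = self.overlay_lookup(uuid)
        self.cache[uuid] = ip
        return ip


directory = {"peer-a": "10.0.0.2"}               # stand-in for mappings kept in the overlay
resolver = Resolver(lambda u: directory[u],
                    lambda u, ip: directory.get(u) == ip)
print(resolver.resolve("peer-a"), resolver.overlay_queries)   # 10.0.0.2 1
```

As long as addresses change rarely, most lookups are served from the cache and the overlay is only queried after an address change.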

Security issues
Peer generates public/private key pair
UUID-public key mapping -> P-Grid
Why is it secure?
Data in P-Grid is stored at a number of random replicas
Hard to attack
Requests are
signed (private key) -> authenticity & access permissions
time-stamped -> no replay possible
Quorums are required for each request
Better security than PGP
Quorums
Independent, random paths avoid the weakest-link problem
Revocation and update of security-relevant information are possible
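
The quorum idea can be illustrated with a small sketch. It assumes that a request has already been sent along independent paths to several replicas and that their answers were collected in a list; the function name `quorum_read` and the threshold parameter are hypothetical, and signature and time-stamp checks are left out.

```python
from collections import Counter


def quorum_read(replica_answers, quorum):
    """Return the value reported by at least `quorum` replicas, else None.

    `replica_answers` holds one answer per contacted replica; None means
    that the replica did not answer.
    """
    counts = Counter(a for a in replica_answers if a is not None)
    if not counts:
        return None
    value, votes = counts.most_common(1)[0]
    return value if votes >= quorum else None
```

A single compromised replica or path cannot change the result as long as the quorum is larger than the number of colluding replicas.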

Directory maintenance: self-healing
Repair strategies
Eager repair: repair each stale entry encountered immediately
Lazy repair: repair a routing table when all references at one level become stale
[Figure: DNS (routing based on FQDN, lookup of IP address) compared with any overlay (routing based on logical identifier, lookup of IP address in case of failure); the logical identifier -> IP address mapping is maintained eagerly or lazily.]

[Figure: repair example in a P-Grid directory. Each peer is shown with its ID, the mappings it stores, its routing table and its cache (up-to-date or stale); peers are marked as presently online or presently offline. Query trace:]
query(01*) @ 7
query(0101) @ 7 (for stale entry 5, cycle -> abort)
query(1110) @ 7 (for stale entry 14, forward to 12 or 13)
query(1110) @ 12 (is offline)
query(1110) @ 13 (for stale entry 2)
query(0010) @ 13 (forward to 5)
query(0010) @ 5 (forward to 7)
query(0010) @ 7 (forward to 9)
query(0010) @ 9 (new entry for 2 found!)
query(1110) @ 2 (new entry for 14 found!)
query(01*) @ 14 (finally)

Eager repair strategy
Dynamic equilibrium equation
The system is in a dynamic equilibrium if the rate of changes due to changing mappings and the rate of repairs are equal
LHS: rate at which repairs of stale routing entries occur
r_up changes per 1 - r_up queries
N_rec - 1 additional recursive queries
Repair makes sense only if the routing entry to be repaired corresponds to an online peer
A repair is possible only if the recursive query succeeds
RHS: rate of entries turning stale
r_up changes
1 - p_dyn: probability of non-stale references (only these can turn stale)
r references at each peer for each of the log_2 n levels

Lazy repair strategy
Repair only if all references of a level are stale
Not all routing entries are treated uniformly
The number of stale entries for each routing level at each peer defines the state of that level
Markovian model
[Figure: Markov chain with the states "0 ref. stale", "1 ref. stale", "2 ref. stale", ..., "r ref. stale"; transitions labelled "ID change" lead from each state to the next, and a transition labelled "repairs" leaves the last state.]
Dynamic equilibrium equation
inflow = outflow (for each state)
At dynamic equilibrium, the number of routing levels with a given number of stale entries over the whole system should not change
N.B. We distinguish stale entries from offline peers
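
The "inflow = outflow" condition can be checked numerically on a toy version of the Markov chain. The transition probabilities below are assumptions chosen only for illustration (each of the remaining non-stale references turns stale with probability `change_rate` per step, and a fully stale level is repaired with probability `repair_prob`); they are not the rates used in the original analysis.

```python
def stationary_distribution(r, change_rate, repair_prob, steps=20000):
    """Power iteration for a chain with states 0..r (number of stale references).

    State i < r moves to i + 1 with probability change_rate * (r - i);
    state r moves back to 0 (lazy repair) with probability repair_prob.
    Returns the stationary distribution and the per-state leave probabilities.
    """
    out = [change_rate * (r - i) for i in range(r)] + [repair_prob]
    assert all(0 < p < 1 for p in out), "probabilities must lie in (0, 1)"
    pi = [1.0 / (r + 1)] * (r + 1)
    for _ in range(steps):
        nxt = [pi[i] * (1 - out[i]) for i in range(r + 1)]  # mass that stays
        for i in range(r):
            nxt[i + 1] += pi[i] * out[i]                    # one more stale entry
        nxt[0] += pi[r] * out[r]                            # repair resets the level
        pi = nxt
    return pi, out


pi, out = stationary_distribution(r=4, change_rate=0.05, repair_prob=0.5)
# At equilibrium the flow leaving each state equals the flow entering it,
# so all products pi[i] * out[i] coincide.
flows = [pi[i] * out[i] for i in range(len(pi))]
```

Because the states form a single cycle, the inflow of a state is the outflow of its predecessor, which is why equal `flows` express the balance condition.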

Lazy repair: Analytical vs. simulation results
Number of messages vs. rate of change (N = 128, 256, 512, 1024; replication factor is 8)
[Plot: lazy recursive repair, messages (Msg) vs. r_up for p_on = 1 (n = N/8); analytical (ana) and simulated (sim) curves for N = 128, 256, 512, 1024; r_up from 0.025 to 0.15.]

Effect of p_on on stability and message overhead
In networks with more online peers the lazy strategy is advantageous but collapses earlier
[Plot: lazy vs. eager recursive repair (r_up = 0.2, log_2(n) = 5), messages (Msg) vs. p_on from 0.3 to 0.9; curves "Eager rec" and "Lazy rec"; regions labelled Directory "unstable" and Directory "stable".]

Summary
Decentralized, self-maintaining, light-weight, and secure directory service
Robust and applicable in unreliable environments
Contributions
Logical independence of identity from network properties
General approach for identification
Structural properties are maintained
Existing knowledge is retained
Dynamic resilience of a P2P system under churn

GridVine: Peer Data Management
Searching semantically richer objects in large-scale heterogeneous networks
<xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate>
<xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate>
<es:DofCreation> 05/08/2004 </es:DofCreation>
<myRDF:Date> Jan 1, 2005 </myRDF:Date>
date?
Lack of semantic interoperability

Information Heterogeneity
Syntactic discrepancies
ImageGUID cDate
A0657B25 05.08.04
VS
<es:cDate> 05/08/2004 </es:cDate>
Semantic heterogeneity
<rdf:Property rdf:ID="width">
<rdfs:label>Width</rdfs:label>
<rdfs:subPropertyOf rdf:resource="#length"/>
</rdf:Property>
VS
<rdf:Property rdf:ID="Length-Y">
<rdfs:label>Length-Y</rdfs:label>
<rdfs:subPropertyOf rdf:resource="#length"/>
</rdf:Property>
Extensible standards (XML, RDF, XMP, PSA, WinFS...)
100s of evolving schemas for one particular domain (e.g., protein information, picture metadata)
Shared representation is not enough

Integrating Data in Distributed Databases
The Wrapper-Mediator architecture
[Figure: Wrapper-Mediator architecture]

Integrating Data in the new Web Ecology
Mediated Architectures vs. Large-Scale Information Systems (e.g., WWW)
Scale
Mediated: number of sources < 100
Large-scale: number of sources > 1000
Uncertainty
Mediated: consistent data; coordination; manually curated data; schemas created by administrators
Large-scale: uncertain data; autonomy; semi-automatic creation of data; schemas created by end users
Dynamicity
Mediated: relatively stable set of sources; stable mediator; sources known a priori
Large-scale: network churn; node failures; unknown sources
Expressiveness
Mediated: relational data; structured schemas; integrity constraints; structured queries
Large-scale: semi-structured data; schemata; few integrity constraints; simple S-P queries

Decentralized Interoperability
Photoshop (own schema)
<Photoshop_Image>
<GUID>178A8CD8865</GUID>
<Creator>Robinson</Creator>
<Subject>
<Bag>
<Item>Tunbridge Wells</Item>
<Item>Royal Council</Item>
</Bag>
</Subject>
</Photoshop_Image>
WinFS (known schema)
<WinFSImage>
<GUID>178A8CD8866</GUID>
<Author>
<DisplayName>Henry Peach Robinson</DisplayName>
<Role>Photographer</Role>
</Author>
<Keyword>Tunbridge</Keyword>
<Keyword>Council</Keyword>
</WinFSImage>
Q1 =
<GUID>$p/GUID</GUID>
FOR $p IN /Photoshop_Image
WHERE $p/Creator LIKE "%Robi%"
T12 =
<Photoshop_Image>
<GUID>$fs/GUID</GUID>
<Creator>$fs/Author/DisplayName</Creator>
</Photoshop_Image>
FOR $fs IN /WinFSImage
Q2 =
<GUID>$p/GUID</GUID>
FOR $p IN T12
WHERE $p/Creator LIKE "%Robi%"
Extending integration techniques to decentralized settings
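
The reformulation of Q1 into Q2 through the mapping T12 can be mimicked with plain Python dictionaries. This is only a sketch: the record layout and the function names `t12`, `q1` and `q2` are assumptions for illustration, and `LIKE "%Robi%"` is approximated by a substring test.

```python
# A WinFS record, written as a nested dictionary (layout is an assumption).
winfs_images = [
    {
        "GUID": "178A8CD8866",
        "Author": {"DisplayName": "Henry Peach Robinson", "Role": "Photographer"},
        "Keyword": ["Tunbridge", "Council"],
    },
]


def t12(fs):
    """Mapping T12: present a WinFSImage record in the Photoshop_Image schema."""
    return {"GUID": fs["GUID"], "Creator": fs["Author"]["DisplayName"]}


def q1(photoshop_images, pattern="Robi"):
    """Q1: GUIDs of all images whose Creator matches the pattern."""
    return [p["GUID"] for p in photoshop_images if pattern in p["Creator"]]


def q2(winfs):
    """Q2: the same query, evaluated over T12 instead of /Photoshop_Image."""
    return q1(t12(fs) for fs in winfs)
```

The query itself is unchanged; only its input is replaced by the view that the mapping defines over the other schema.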

Peer Data Management Systems
<xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate>
<xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate>
<es:cDate> 05/08/2004 </es:cDate>
<myRDF:Date> Jan 1, 2005 </myRDF:Date>
date?
Mapping between es:cDate and xap:CreateDate
Pairwise mappings
Peer Data Management Systems (PDMS)
Local mappings overcome global heterogeneity
Iterative query reformulation

3-Tier Network
[Figure: three layers, from top to bottom: Semantic Mediation Layer; Overlay Layer (Jupp / P-Grid, DHTs); Internet Layer.]

Data-Centric P2P Systems
Piazza / Hyperion
More expressive mapping languages
LAV-style query reformulation in P2P settings?
Network-intensive
Large-scale deployment?
Perfect mappings
PIER
Scales a relational engine on top of a DHT
Fixed schema
RDFPeers
Indexes RDF triples in a DHT
No schemas

Conclusions
More and more machine-processable (semi-structured) data available
Peer Production
Human Computation
Smart Earth
Top-down efforts to align data have largely failed
SUMO
Emergent Semantics
Bottom-up
Dynamic
Best-Effort
Only resort to foster interoperability in the large-scale decentralized data spaces currently emerging

Credits
Karl Aberer
Philippe Cudre-Mauroux
Anwitaman Datta
Roman Schmidt

Are you still awake?

Introduction to Peer-to-Peer Networks Part 2
Manfred Hauswirth, Marcel Karnstedt
Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Outline Part 2
The Vision: A Universal Storage for Web Data
A Distributed Universal Storage
Data & Query Model
Operators
Query Engine
Mappings & Query Expansion
The Praxis: Implementation
Summary & Outlook

Examples
Huge heterogeneous data sets, collaboration, dynamic, structured, deep, linked
Clique project:
Integrated structured storage
Data pre-processing, integration, restructuring, indexing
Data access API and query processor for complex queries
Marcel Karnstedt, IFIP Database Meeting, Nicosia, Cyprus, 2009

Public Data Management
... Semantic & social Web, encyclopedias, recommender systems ... The world is a database
Datasets which are
Maintained by large communities in a distributed way
Of public interest
Homogenized database, extensible and flexible, distributed, scalable, structured data and queries

Main Challenges
Data management
Scalability and robustness
Security, trust, fairness
Guarantees, consistency, integrity
CAP Theorem [Gilbert et al. 2002], ACID vs. BASE
Query expressiveness
DB-like queries with advanced functionality
Support of IR queries and similarity is mandatory
Schema-unaware queries and/or queries on schema
Efficient processing
Efficient query operators
Cost awareness in changing situations

Approaches
Who pays the load?
Who owns the data?
Do we trust them?
Mediator: views over 100.000 data sources?
Efficient query processing?
Related systems: Sindice, YARS; Jena, Oracle; SW-Store; PIER, PeerDB; AmbientDB; RDFPeers
UniStore

Influences
P2P paradigm: robustness, self-organization, scalability
DHTs & SDDS: efficient lookups
Index structures
Distributed DBS: transparency, query processing
PDMS
Sensor networks
Data streams

The Big Picture
Who wrote an article for cool movies?
Wikipedia(article,author)
Pulp Fiction,MK
DBPedia(link,wikilink,category)
Pulp Fictoon,Q. Tarantino,movie
Del.icio.us(bookmark,tag,creator)
http://pfiction,cool,MKa

Layers of Processing
Processing Strategies: scheduling, adaptation, costs
Query Operators: similarity / approximate operators
Routing: multicast, aggregation, range

Universal Relation Model
Since the eighties: model for simplified retrieval in (relational) databases
Universal relation containing all attributes
Simplifies navigation over multiple relations during query formulation
A_1 A_2 A_3 B_1 B_2 C_1 ...
Related systems: SW-Store

Triple Store
Universal relation model
Storing each tuple as a set of triples (oid, attribute, value)
Similar to RDF: subject, predicate, object
OID Car Mileage HP Price
232 Volvo V70 34.000 180 28.000
232 Car Volvo V70
232 Mileage 34.000
232 HP 180
232 Price 28.000
Extensible
Flexible
Self-descriptive
No need for representing null values
Related systems: SW-Store, RDFPeers, Sindice, YARS, ...
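
The decomposition of a tuple into triples can be sketched in a few lines. The function name `tuple_to_triples` is hypothetical, the numbers are written without the thousands separator used on the slide, and the `Color` attribute is an invented example of a null value.

```python
def tuple_to_triples(oid, row):
    """Split one tuple of the universal relation into (oid, attribute, value) triples.

    Attributes without a value are simply skipped, so nulls need no representation.
    """
    return [(oid, attr, value) for attr, value in row.items() if value is not None]


car = {"Car": "Volvo V70", "Mileage": 34000, "HP": 180, "Price": 28000, "Color": None}
triples = tuple_to_triples(232, car)
# [(232, 'Car', 'Volvo V70'), (232, 'Mileage', 34000), (232, 'HP', 180), (232, 'Price', 28000)]
```

New attributes can be added to any object by inserting further triples, which is what makes the representation extensible.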

Indexing
Indexing of attributes = key for hashing
Which attributes? All!
For tuple (oid, v_1, v_2, ...) of R(OID, A_1, A_2, ...):
232 Car Volvo V70
232 Mileage 34.000
232 HP 180
232 Price 28.000
h(oid) for object lookup
h(A_1 || v_1), h(A_2 || v_2), ... for lookups on A_i and v (prefix search)
... trade-off storage vs. performance
Related systems: YARS, Hexastore
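
A minimal sketch of this indexing scheme follows. It assumes that the hash function h is order-preserving, which the example fakes by using the identity on strings; the `#` separator, the sorted list standing in for the distributed index, and all function names are assumptions made for illustration.

```python
from bisect import bisect_left


def h(key):
    # Assumption: the identity stands in for an order-preserving hash function.
    return key


def index_entries(oid, attr, value):
    """Keys under which one triple is inserted: h(oid) and h(attr || value)."""
    triple = (oid, attr, value)
    return [(h(str(oid)), triple), (h(f"{attr}#{value}"), triple)]


def build_index(triples):
    """Sorted list of (key, triple) pairs; a stand-in for the key space of the overlay."""
    entries = [e for t in triples for e in index_entries(*t)]
    return sorted(entries, key=lambda e: e[0])


def prefix_search(index, prefix):
    """Return all triples whose index key starts with the given prefix."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, h(prefix))
    hits = []
    while i < len(index) and index[i][0].startswith(prefix):
        hits.append(index[i][1])
        i += 1
    return hits


index = build_index([
    (232, "Car", "Volvo V70"),
    (232, "Mileage", 34000),
    (232, "HP", 180),
    (232, "Price", 28000),
])
```

Each triple is stored twice here, which illustrates the storage side of the trade-off: more index keys per triple buy more query types that can be answered by a direct lookup.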

P-Grid: Range Queries
[Datta et al. 2005]

VQL
Query language VQL
Inspired by RDF query language SPARQL
Conjunctive queries
Enhanced by advanced operators
Operations both on instance and schema level
Basic query form
SELECT ?oid, ?val
WHERE { ?oid price ?val }
ORDER BY ...
LIMIT ...
Related systems: SPARQL

Similarity Queries
WHERE { ?o attrib ?value
FILTER (edist(?value, v) < 2) }
Numerical similarity: value distance
String similarity:
Edit distance using (positional) q-grams [Gravano et al. 2001, Schallehn et al. 2004]
Requires additional key-value pairs in P-Grid
For each triple (oid, A, v):
h(q-gram_i(A)) -> oid
h(q-gram_i(v)) -> oid
Approach for instance and schema level
Related systems: LSH forest, SWAM
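
The q-gram approach can be sketched as follows: strings are cut into overlapping q-grams, every q-gram becomes an index key, and the candidates found through shared q-grams are verified with the edit distance. The padding characters, the local dictionary standing in for P-Grid and the function names are assumptions; unlike the slide, the index stores the value next to the oid so that verification needs no second lookup, and positional filtering is omitted.

```python
from collections import defaultdict


def qgrams(s, q=3):
    """Overlapping substrings of length q; the string is padded on both sides."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def edist(a, b):
    """Levenshtein edit distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_qgram_index(triples, q=3):
    """q-gram of the value -> set of (oid, value); a stand-in for the overlay."""
    index = defaultdict(set)
    for oid, _attr, value in triples:
        for g in qgrams(value, q):
            index[g].add((oid, value))
    return index


def similar_values(index, query, d, q=3):
    """All (oid, value) pairs whose value is within edit distance d of the query."""
    candidates = set()
    for g in qgrams(query, q):
        candidates |= index.get(g, set())
    return sorted((oid, v) for oid, v in candidates if edist(v, query) <= d)


index = build_qgram_index([
    (1, "title", "Pulp Fiction"),
    (2, "title", "Pulp Fictoon"),
    (3, "title", "Kill Bill"),
])
matches = similar_values(index, "Pulp Fiction", d=1)
```

The same functions apply on the schema level if attribute names instead of values are indexed.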

String Similarity
[Figure: P-Grid trie with keys from 000 to 111, showing the peers contacted by each strategy.]
All values in distance d = 1
Query range for attribute vs. query d+1 q-grams
[NetDB06]

More Operators
Similarity joins [NetDB06]
WHERE { ?o_1 attr_1 ?v_1 . ?o_2 attr_2 ?v_2
FILTER (edist(?v_1, ?v_2) < k) }
Ranking queries: top-k, skyline [DBRank07]
WHERE { ?o attr ?v }
ORDER BY ?v LIMIT k
WHERE { ?o attr ?v }
ORDER BY ?v NN A String LIMIT k
WHERE { ?o attr1 ?x . ?o attr2 ?y }
SKYLINE OF ?x MIN, ?y MAX

String Similarity Joins
[Figure: join strategies: doubled sequential; doubled parallel; parallel and sequential; sequential and parallel.]
Related systems: Cloud services

Skyline Queries
Objects that are not dominated by other objects
Scoring function on multiple attributes, no weighting
[Figure: skyline over the dimensions price and mileage, with the region of dominated objects marked.]
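
The dominance relation and a naive, centralized skyline computation can be written down directly. The sketch assumes that every dimension is to be minimized (for example price and mileage); the sample points are made up.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every dimension and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """All points that are not dominated by any other point (quadratic scan)."""
    return [p for p in points if not any(dominates(o, p) for o in points)]


# (price, mileage) pairs, invented for illustration
cars = [(28000, 34000), (20000, 80000), (30000, 40000), (25000, 50000), (22000, 90000)]
best = skyline(cars)
# [(28000, 34000), (20000, 80000), (25000, 50000)]
```

No weighting between price and mileage is needed: a car is only discarded if another car is at least as good in both respects.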

Skylines: Basic Idea
Frame Skyline algorithm over 2 dimensions
Minimum of first dimension defines maximum for second dimension
Minima/Maxima provide a frame narrowing the search space
Related systems: DSL, Skyframe

Skylines: Processing
[Figure: x/y plane in which Min x and Min y bound the search frame.]
1. Find minimum in one selective dimension x
2. Use y value of min(x) to limit search range
3. Use range query routing to build local skylines
4. Always ship current skyline with query
5. Determine global skyline at one peer
6. Optionally: distributed range querying on the way to min(x)
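
Steps 1 and 2 can be illustrated on a single machine. The sketch below assumes that both dimensions are minimized and collapses the distributed steps 3 to 5 into one local computation; `frame_skyline` is a hypothetical name and not the algorithm's original interface.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every dimension and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def frame_skyline(points):
    """2-d skyline that first narrows the search space with a frame."""
    # Step 1: minimum in dimension x (ties are broken by the smaller y).
    anchor = min(points)
    # Step 2: the y value of min(x) is an upper bound for the search range,
    # because every point with a larger y is dominated by the anchor.
    y_max = anchor[1]
    candidates = [p for p in points if p[1] <= y_max]
    # Steps 3-5, collapsed: skyline over the remaining candidates only.
    return sorted(p for p in candidates if not any(dominates(o, p) for o in candidates))


cars = [(28000, 34000), (20000, 80000), (30000, 40000), (25000, 50000), (22000, 90000)]
result = frame_skyline(cars)
```

In a distributed setting the bound `y_max` is what allows the range query of step 3 to skip all peers that are responsible only for larger y values.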

Skylines: More Dimensions
All projections to 2d sub-spaces are skyline candidates
Objects of the searched frame can dominate projections
Projections cannot dominate objects of the searched frame

Query Execution
Goal: stateless processing -> push approach
Messages containing both plan and intermediate results (based on Mutant Query Plans [Papadimos et al. 2002])
Receiver peer is identified by applying the hash function
Multiple instances of the plan travel through the network
[Figure: instances of a plan that selects on A and joins with B travel between the peers p_0 to p_5 and carry intermediate results such as {(A,1),(A,2)}, {(A,2)}, {(B,2),(C,1),(B,2),(C,4)} and {(A,2,B,2,C,1), (A,2,B,2,C,4)}.]
Related systems: PIER, PeerDB, DARQ
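
A toy version of such a travelling plan is sketched below. It is an illustration under strong assumptions, not the UniStore engine: relations are stored whole at the peer chosen by hashing the relation name, the plan is a list of steps, and the names `h`, `store` and `run` are hypothetical.

```python
import hashlib


def h(key, n_peers):
    """Map a key to a peer id (SHA-1 is used only as a convenient hash)."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % n_peers


def store(peers, relation, values):
    """Place a relation at the peer that is responsible for its name."""
    peers[h(relation, len(peers))][relation] = list(values)


def run(plan, peers):
    """Ship a message with the remaining plan and the intermediate results.

    Each step is (operator, relation, argument). The receiver of the message is
    found by hashing the relation of the next step, so no peer keeps query state.
    """
    message = {"plan": list(plan), "results": None}
    hops = []
    while message["plan"]:
        op, relation, arg = message["plan"][0]
        peer_id = h(relation, len(peers))
        hops.append(peer_id)
        local = peers[peer_id].get(relation, [])
        if op == "select":
            message["results"] = [v for v in local if arg(v)]
        elif op == "join":
            message["results"] = [(l, r) for l in message["results"]
                                  for r in local if arg(l, r)]
        message["plan"] = message["plan"][1:]  # the plan shrinks as it is resolved
    return message["results"], hops


peers = [dict() for _ in range(6)]  # p_0 .. p_5
store(peers, "A", [1, 2])
store(peers, "B", [2, 5])
plan = [("select", "A", lambda v: v >= 2), ("join", "B", lambda l, r: l == r)]
results, hops = run(plan, peers)
```

Because the message carries everything that is needed to continue, any peer can process the next step without having seen the query before.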

Cost-Based Planning
[NetDB06, P2P06, DBRank07]
Find all values of attribute A in max. distance d = 1
Query all peers for A in parallel or sequence: #msgs = m_l + |A| - 1
Query d+1 q-grams in parallel: #msgs = (d+1) * m_l
[Figure: P-Grid trie with the peers p1, p2 and p5; the first strategy contacts the peers responsible for h(A), the second those responsible for h(A#q-gram_1) and h(A#q-gram_2).]
Related systems: ObjectGlobe, DARQ
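
The two message-count formulas can be compared directly to pick a strategy. In the sketch, `m_l` is read as the number of messages for one lookup and `peers_for_attribute` as |A|, the number of peers that hold values of attribute A; both readings, as well as the function names, are assumptions.

```python
def cost_attribute_scan(m_l, peers_for_attribute):
    """Query all peers for A: one lookup, then visit the remaining |A| - 1 peers."""
    return m_l + peers_for_attribute - 1


def cost_qgram_lookup(m_l, d):
    """Query d + 1 q-grams in parallel: one lookup per q-gram."""
    return (d + 1) * m_l


def choose_strategy(m_l, peers_for_attribute, d):
    """Return the cheaper strategy together with both estimated costs."""
    costs = {
        "scan": cost_attribute_scan(m_l, peers_for_attribute),
        "qgram": cost_qgram_lookup(m_l, d),
    }
    return min(costs, key=costs.get), costs
```

For example, with m_l = 5 and d = 1 the q-gram strategy needs 10 messages, so it wins as soon as the attribute is spread over more than six peers.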

Guarantees: Completeness
Problem:
Fire-and-forget query strategy doesn't guarantee complete results
Goal:
Allow estimating result completeness
For the user (98% of all possible answers)
For blocking query operators (aggregators, ranking-based operators) in order to guarantee a certain level of completeness
Idea:
Not feasible on data level, but perhaps on peer level?!

Completeness Estimation
Estimation on peer level
Based on routing graphs and routing methods
Support of probabilistic guarantees
Accuracy improved by Milestone messages (MiMes)
[CIKM08, WIDM08]
[Figure: a query graph (Join(A=B) over Extract(A) with sequential routing and Extract(B) with range routing) next to the corresponding routing graph, which leads from P_0 over the peers P_1^A ... P_n^A to the peers P_1^B ... P_m^B; routing levels and routing points are marked.]
Related systems: Seaweed

CERQ: Initial Estimation
[P2P07]
[Figure: P-Grid trie with the paths 0 and 1, 00 and 01, 000 and 001, 0000 and 0001, and the peers P_0 to P_4.]
Example: P-Grid range query 00100-1101 at P_0
Predict trie on information from:
1) local path: 0001
2) local routing table (at least one node per level/sub-tree)
=> estimates 8 (out of 10)
The more information is kept in each routing table, the better the estimate

CERQ: Estimation Refinement
The initial estimate might not be correct
Refinement by other (intermediate) peers
Piggy-back information
Query contains estimation of peers in sub-tree
Query replies can contain corrections
[Figure: sub-query P_0 -> P_3 for sub-tree 001*; estimate: 3 peers; P_3's routing table contains peer(s) of sub-tree 0010*.]
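
The bookkeeping behind this refinement can be sketched as a small class: the querying peer keeps one estimate per sub-tree, overwrites it when a reply carries a correction, and reports the fraction of expected peers that have answered. The class, its methods and all per-sub-tree numbers except the estimate of 3 peers for 001* are invented for illustration; how the initial estimates are derived is not modelled.

```python
class CompletenessEstimator:
    """Tracks how many peers are expected to answer and how many did."""

    def __init__(self, initial_estimates):
        # sub-tree prefix -> estimated number of peers in that sub-tree
        self.estimates = dict(initial_estimates)
        self.replied = set()

    def on_reply(self, peer_id, corrections=None):
        """Register a reply; piggy-backed corrections replace earlier estimates."""
        self.replied.add(peer_id)
        for prefix, count in (corrections or {}).items():
            self.estimates[prefix] = count

    def expected_peers(self):
        return sum(self.estimates.values())

    def completeness(self):
        expected = self.expected_peers()
        return min(1.0, len(self.replied) / expected) if expected else 1.0


est = CompletenessEstimator({"000": 2, "001": 3, "01": 1, "1": 2})  # 8 peers expected
est.on_reply("P1")
est.on_reply("P3", corrections={"001": 5})                          # now 10 expected
```

A blocking operator could wait until `completeness()` exceeds a chosen threshold instead of waiting for an unknown number of replies.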

CERQ: Further Improvements
Use of structural replication
Peers have more than one entry per level
Each entry might have a different path
Every path allows peers to learn more about sub-trees
More entries mean better initial estimates
Use of caching
P2P networks are dynamic
Though the structure is likely to be stable
The learned structure can be cached for later estimates

CERQ: Other Overlays
SkipGraphs
Most similar to P-Grid
Routing information of multiple levels
Unknown number of peers in bucket layer
Prefix hash tree (Chord)
Peers build a tree hierarchy
Only applicable if number of children is known
CAN and Mercury
Forwarding along neighbors
No estimation can be given
=> The idea can be mapped under certain conditions

Representing Mappings
Simple kind of attribute correspondences
[Figure: attributes A_1 to A_6, connected by correspondences labelled "equiv" and "subsumes".]
Triple representation
(A_4, equiv, A_5)
(A_3, subsumes, A_6)
Extensible to ontologies and views
[Ideas08]
Related systems: PeerDB, SQPeer
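
Since mappings are ordinary triples, expanding a query attribute amounts to following them. The sketch assumes that "equiv" can be followed in both directions and that (X, subsumes, Y) means a query on X should also cover Y but not the other way round; this reading and the name `expand_attribute` are assumptions.

```python
def expand_attribute(attr, mappings):
    """Return all attributes that a query on `attr` should also consider."""
    result, frontier = {attr}, [attr]
    while frontier:
        current = frontier.pop()
        for s, rel, o in mappings:
            if rel == "equiv" and current in (s, o):
                nxt = o if current == s else s  # symmetric correspondence
            elif rel == "subsumes" and current == s:
                nxt = o                         # only from general to specific
            else:
                continue
            if nxt not in result:
                result.add(nxt)
                frontier.append(nxt)
    return result


mappings = [("A4", "equiv", "A5"), ("A3", "subsumes", "A6")]
```

With these two triples, a query on A5 is expanded to {A4, A5}, a query on A3 to {A3, A6}, and a query on A6 stays unchanged.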

Query Expansion
[Figure: unexpanded query; Map operators added; first mapping; expanded query.]
Related systems: GridVine, PDMS

UniStore
[ICDE07]

Evaluation: Similarity Joins
c1: seq & par/seq
c2: par & par/seq
c3: seq & par/par
c4: par & par/par
c5: par & local

Evaluation: CERQ
[Plots: 1 reference, 3 references, 5 references]
Estimation is correct after a few corrections
More references lead to (as expected)
Better initial estimate
Fewer corrections
Smaller errors
Better estimates for smaller ranges (q1 and q2)
Replication factor 1 (similar results for factor 2)

Evaluation: Completeness
[Plots: min, max, avg for 74 peers and for 50 peers; without MiMes and with MiMes.]

Summary & Outlook
Web data is huge, heterogeneous, structured, linked
Modern applications require a universal and flexible storage
DB-like and RDF-liking
DHTs well-suited for large-scale data management
UniStore as one solution
Robust and scalable, universal and light-weight
Sophisticated query capabilities
Adaptive, cost-based, stateless and parallel QP
Guarantees, semantic layer
... and all on totally decentralised and self-organising P2P!!
Open issues
Privacy & trust, reputation
Guarantees, consistency, integrity

Acknowledgements
Kai-Uwe Sattler, Manfred Hauswirth, Katja Hose, Roman Schmidt, Renault John, Brahmananda Sapkota, Conor Hayes, ...
Students: Martin Richtarsky, Michael Ha, Jessica Mller, Mario Wiegandt, Stefan Schwalm, Matthias Marx, Thomas Kreyling, ...
Supported by the Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2) and under Grant No. 08/SRC/I1407 (Clique: Graph & Network Analysis Cluster)

Related Systems
Sindice: Sindice. The semantic web index. http://sindice.com/
YARS: A. Harth, J. Umbrich, A. Hogan, and S. Decker. YARS2: A federated repository for querying graph structured data from the web. In Proc. of ISWC/ASWC, 2007.
Jena: K. Wilkinson, C. Sayers, H. Kuno, and D. Reynolds. Efficient RDF storage and retrieval in Jena2. In SWDB, pages 131-150, 2003.
Oracle: E. I. Chong, S. Das, G. Eadon, and J. Srinivasan. An efficient SQL-based RDF querying scheme. In VLDB, pages 1216-1227, 2005.
SW-Store: D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. SW-Store: a vertically partitioned DBMS for Semantic Web data management. The VLDB Journal, 18:385-406, 2009.
PIER: R. Huebsch, J. M. Hellerstein, N. Lanham, B. Thau Loo, S. Shenker, and I. Stoica. Querying the Internet with PIER. In VLDB 03, pages 321-332, 2003.
RDFPeers: M. Cai and M. Frank. RDFPeers: a scalable distributed RDF repository based on a structured peer-to-peer network. In WWW 04, pages 650-657, 2004.
PeerDB: W. S. Ng, B. Ch. Ooi, and K.-L. Tan. PeerDB: A P2P-based System for Distributed Data Sharing. In ICDE 03, pages 633-644, 2003.
AmbientDB: P. Boncz and C. Treijtel. AmbientDB: Relational Query Processing in a P2P Network. In Workshop on Databases, Information Systems and P2P Computing (DBISP2P 03), pages 153-168, 2003.
Hexastore: C. Weiss, P. Karras, and A. Bernstein. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.
SPARQL: E. Prud'hommeaux and A. Seaborne. SPARQL Query Language for RDF, 2006. W3C Candidate Recommendation.
Related Systems /2
LSH forest: M. Bawa, T. Condie, and P. Ganesan. LSH forest: self-tuning indexes for
similarity search. In WWW '05, pages 651-660, 2005.
SWAM: F. Banaei-Kashani and C. Shahabi. SWAM: a family of access methods for
similarity-search in peer-to-peer data networks. In CIKM '04, pages 304-313, 2004.
Cloud services: M. Brantner, D. Florescu, D. Graf, D. Kossmann, and T. Kraska. Building
a database on S3. In SIGMOD '08, pages 251-264, 2008.
DSL: P. Wu, C. Zhang, Y. Feng, B. Zhao, D. Agrawal, and A. El Abbadi. Parallelizing skyline
queries for scalable distribution. In EDBT '06, pages 112-130, 2006.
Skyframe: S. Wang, Q. H. Vu, B. Ch. Ooi, A. K. H. Tung, and L. Xu. Skyframe: a
framework for skyline query processing in peer-to-peer systems. The VLDB Journal,
18(1):345-362, 2009.
DARQ: B. Quilitz and U. Leser. Querying Distributed RDF Data Sources with SPARQL. In
ESWC '08, pages 524-538, 2008.
ObjectGlobe: R. Braumandl, M. Keidl, A. Kemper, D. Kossmann, A. Kreutz, S. Seltzsam,
and K. Stocker. ObjectGlobe: Ubiquitous query processing on the Internet. VLDB Journal,
10(1):48-71, 2001.
Seaweed: D. Narayanan, A. Donnelly, R. Mortier, and A. Rowstron. Delay aware querying
with Seaweed. In VLDB '06, pages 727-738, 2006.
SQPeer: G. Kokkinidis, E. Sidirourgos, and V. Christophides. Query Processing in RDF/S-Based
P2P Database Systems. Semantic Web and Peer-to-Peer, chapter 4, pages 59-81.
Springer, 2006.
GridVine: K. Aberer, P. Cudre-Mauroux, M. Hauswirth, and T. Van Pelt. GridVine:
Building Internet-Scale Semantic Overlay Networks. In ISWC '04, pages 107-121, 2004.
PDMS: A. Y. Halevy, Z. G. Ives, P. Mork, and I. Tatarinov. Piazza: data management
infrastructure for semantic web applications. In WWW '03, pages 556-567, 2003.
Thank you!
[CIKM08]: M. Karnstedt, K. Sattler, M. Haß, M. Hauswirth, B. Sapkota, R. Schmidt: Estimating the
Number of Answers with Guarantees for Structured Queries in P2P Databases, CIKM 2008,
Napa, USA.
[WIDM08]: M. Karnstedt, K. Sattler, M. Haß, M. Hauswirth, B. Sapkota, R. Schmidt:
Approximating Query Completeness by Predicting the Number of Answers in DHT-based Web
Applications, WIDM'08 in conjunction with CIKM'08, Napa, USA, 2008.
[Ideas08]: M. Karnstedt, K. Sattler, M. Hauswirth, B. Sapkota, R. Schmidt: Ad-hoc Integration
and Querying of Semantic Web Data, IDEAS 2008, Coimbra, Portugal.
[DBRank07]: M. Karnstedt, J. Müller, K. Sattler: Cost-Aware Skyline Queries in Structured
Overlays, DBRank'07 @ ICDE 2007, Istanbul, Turkey.
[ICDE07]: M. Karnstedt, K. Sattler, M. Richtarsky, J. Müller, M. Hauswirth, R. Schmidt, R. John:
UniStore: Querying a DHT-based Universal Storage, ICDE 2007 Demonstration.
[P2P07]: M. Karnstedt, K. Sattler, R. Schmidt: Completeness Estimation of Range Queries in
Structured Overlays, P2P 2007, Galway, Ireland.
[P2P06]: M. Karnstedt, K. Sattler, M. Hauswirth, R. Schmidt: Cost-Aware Processing of Similarity
Queries in Structured Overlays, P2P 2006, Cambridge, UK.
[NetDB06]: M. Karnstedt, K. Sattler, M. Hauswirth, R. Schmidt: Similarity Queries on Structured
Data in Structured Overlays, NetDB'06 @ ICDE 2006, Atlanta, GA.
[Gilbert et al. 2002]: S. Gilbert and N. Lynch. Brewer's conjecture and the feasibility of
consistent, available, partition-tolerant web services. SIGACT News, 33(2):51-59, 2002.
[Datta et al. 2005]: A. Datta, M. Hauswirth, R. Schmidt, R. John, and K. Aberer. Range queries in
trie-structured overlays. In P2P '05, pages 57-66, 2005.
[Papadimos et al. 2002]: V. Papadimos, D. Maier. Mutant Query Plans. Information and
Software Technology, 44(4):197-206, April 2002.
[Schallehn et al. 2004]: E. Schallehn, I. Geist, K. Sattler: Supporting Similarity Operations based
on Approximate String Matching on the Web, CoopIS 2004, Larnaca.
[Gravano et al. 2001]: L. Gravano et al.: Approximate String Joins in a Database (almost) for
Free, VLDB 2001, Roma.
