CS109/Stat121/AC209/E-109

Data Science
Network Models
Hanspeter Pfister & Joe Blitzstein
pfister@seas.harvard.edu / blitzstein@stat.harvard.edu
This Week

- HW4 due tonight at 11:59 pm
- Friday lab 10-11:30 am in MD G115

Examples from Newman (2003)
FIG. 2: Three examples of the kinds of networks that are the topic of this review. (a) A food web of predator-prey interactions between species in a freshwater lake [272]. Picture courtesy of Neo Martinez and Richard Williams. (b) The network of collaborations between scientists at a private research institution [171]. (c) A network of sexual contacts between individuals in the study by Potterat et al. [342].
A. Types of networks

A set of vertices joined by edges is only the simplest type of network; there are many ways in which networks may be more complex than this (Fig. 3). For instance, there may be more than one different type of vertex in a network, or more than one different type of edge. And vertices or edges may have a variety of properties, numerical or otherwise, associated with them. Taking the example of a social network of people, the vertices may represent men or women, people of different nationalities, locations, ages, incomes, or many other things. Edges may represent friendship, but they could also represent animosity, or professional acquaintance, or geographical proximity. They can carry weights, representing, say, how well two people know each other. They can also be directed, pointing in only one direction. Graphs composed of directed edges are themselves called directed graphs or sometimes digraphs, for short. A graph representing telephone calls or email messages between individuals would be directed, since each message goes in only one direction. Directed graphs can be either cyclic, meaning they contain closed loops of edges, or acyclic, meaning they do not. Some networks, such as food webs, are approximately but not perfectly acyclic.

One can also have hyperedges: edges that join more than two vertices together. Graphs containing such edges are called hypergraphs. Hyperedges could be used to indicate family ties in a social network, for example: n individuals connected to each other by virtue of belonging to the same immediate family could be represented by an n-edge joining them. Graphs may also be naturally partitioned in various ways. We will see a number of examples in this review of bipartite graphs: graphs that contain vertices of two distinct types, with edges running only between unlike types. So-called affiliation networks, in which people are joined together by common membership of groups, take this form.
Graphs
A graph G=(V,E) consists of a vertex set V and an
edge set E containing unordered pairs {i,j} of vertices.
[Figure: a simple graph on 16 labeled vertices (left) and a multigraph on 4 vertices (right)]

The degree of vertex v is the number of edges attached to it.
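As a small illustration (my addition, not from the slides), here is one way to store an undirected simple graph as an adjacency structure in Python and compute vertex degrees; the edge list is made up for illustration and is not the graph drawn on the slide.

```python
from collections import defaultdict

# A toy undirected simple graph given as an edge list (hypothetical example).
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]

# Build an adjacency list: vertex -> set of neighbors.
adj = defaultdict(set)
for i, j in edges:
    adj[i].add(j)
    adj[j].add(i)

# The degree of a vertex is the number of edges attached to it
# (for a simple graph, the number of neighbors).
degree = {v: len(nbrs) for v, nbrs in adj.items()}
print(degree)  # {1: 2, 2: 3, 3: 3, 4: 2}
```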
A Plea for Clarity: What is a Network?

- graph vs. multigraph (are loops and multiple edges OK? what is a simple graph?)
- directed vs. undirected
- weighted vs. unweighted
- dynamics of the network vs. dynamics on the network
- labeled vs. unlabeled
- the network as the quantity of interest vs. quantities of interest on the network
Why model networks?

- Hairballs are hard to interpret.
- We can define interesting features (statistics) of a network, such as measures of clustering, and compare the observed values against those of a model.
- Warning: much of the network literature carelessly ignores the way in which the network data were gathered (sampling) and whether there are missing/unknown nodes or edges!
Erdős–Rényi Random Graph Model

- For each pair of vertices, independently flip a coin with probability p of heads and include the edge if heads.
- Let n get large and p get small, with the average degree c = (n-1)p held constant.
- What happens for c < 1?
- What happens for c > 1?
- What happens for c = 1?

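A hedged illustration of these questions (my addition, not part of the slides): the sketch below simulates G(n, p) with p = c/(n-1) for a few values of c and reports the size of the largest connected component. It assumes networkx is available.

```python
import networkx as nx

n = 10_000
for c in [0.5, 1.0, 2.0]:
    p = c / (n - 1)                      # average degree c = (n-1)p
    G = nx.gnp_random_graph(n, p, seed=42)
    giant = max(nx.connected_components(G), key=len)
    # For c < 1 the largest component is tiny; around c = 1 it is of
    # intermediate size; for c > 1 a giant component containing a
    # constant fraction of the vertices emerges.
    print(f"c = {c}: largest component has {len(giant)} of {n} vertices")
```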
Degree Sequences
[Figure: an example graph on 9 labeled vertices]

Take V = {1, ..., n} and let d_i be the degree of vertex i.
The degree sequence of G is d = (d_1, ..., d_n).

For the example graph: n = 9, d = (3, 4, 3, 3, 4, 3, 3, 3, 2).

A sequence d is graphical if there is a graph G with degree sequence d; G is a realization of d.
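A small sketch (my addition, not from the slides) of the Havel-Hakimi test for whether a sequence is graphical: greedily connect the highest-degree vertex to the next-highest ones and recurse.

```python
def is_graphical(degrees):
    """Havel-Hakimi test: can `degrees` be realized by a simple graph?"""
    d = sorted(degrees, reverse=True)
    while d:
        k = d.pop(0)              # largest remaining degree
        if k == 0:
            return True           # all remaining degrees are zero
        if k > len(d):
            return False          # not enough other vertices to connect to
        # connect this vertex to the k next-largest-degree vertices
        for i in range(k):
            d[i] -= 1
        if min(d) < 0:
            return False
        d.sort(reverse=True)
    return True

print(is_graphical([3, 4, 3, 3, 4, 3, 3, 3, 2]))  # True: the slide's example
print(is_graphical([5, 1, 1, 1]))                 # False: 5 exceeds n - 1
```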
MCMC on Networks

mixing times, burn-in, bottlenecks, autocorrelation, ...

Switchings Chain

[Figure: two 4-vertex graphs illustrating an edge switch (double edge swap), a move that preserves every vertex's degree]
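A minimal sketch (my addition) of one step of a switchings chain on graphs with a fixed degree sequence: pick two edges uniformly at random and, if the rewiring keeps the graph simple, swap them; otherwise stay put. A full MCMC run would repeat this move many times and worry about mixing.

```python
import random

def switch_step(edges):
    """One attempted double edge swap on a set of undirected edges.

    `edges` is a set of frozensets {u, v}. The move preserves every
    vertex's degree; it is rejected (graph left unchanged) if it would
    create a loop or a multiple edge.
    """
    e1, e2 = random.sample(list(edges), 2)
    a, b = tuple(e1)
    c, d = tuple(e2)
    # propose rewiring {a,b}, {c,d} -> {a,c}, {b,d}
    new1, new2 = frozenset((a, c)), frozenset((b, d))
    if len(new1) == 2 and len(new2) == 2 and new1 not in edges and new2 not in edges:
        edges.remove(e1)
        edges.remove(e2)
        edges.add(new1)
        edges.add(new2)
    return edges

# Repeating switch_step on the edge set of a graph explores the space of
# simple graphs with the same degree sequence.
```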
Power Laws

- Power-law (a.k.a. scale-free) networks: the number of vertices of degree k is proportional to k^(-γ).
- Stumpf et al. (2005): subnets of scale-free networks are not scale-free, especially for large γ; their subnets are obtained by i.i.d. node sampling.
- What about features other than degree distributions?

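As a hedged aside (my addition, not in the slides), a common way to estimate the power-law exponent γ from observed degrees is the approximate maximum likelihood estimator of Clauset, Shalizi & Newman (2009); the degree list below is made up for illustration.

```python
import math

def powerlaw_mle(degrees, k_min=1):
    """Approximate MLE for the power-law exponent gamma, using degrees >= k_min
    (continuous approximation of Clauset, Shalizi & Newman 2009)."""
    ks = [k for k in degrees if k >= k_min]
    n = len(ks)
    return 1 + n / sum(math.log(k / (k_min - 0.5)) for k in ks)

degrees = [1, 1, 1, 2, 2, 3, 5, 8, 13, 40]   # hypothetical degree data
print(round(powerlaw_mle(degrees, k_min=1), 2))
```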
p1 Model (Holland-Leinhardt 1981)
3.4 The p_1 Model for Social Networks

A conceptually separate thread of research developed in parallel in the statistics and social sciences literature, starting with the introduction of the p_1 model. Consider a directed graph on the set of n nodes. Holland and Leinhardt's p_1 model focuses on dyadic pairings and keeps track of whether node i links to j, j to i, neither, or both. It contains the following parameters:

- θ: a base rate for edge propagation,
- α_i (expansiveness): the effect of an outgoing edge from i,
- β_j (popularity): the effect of an incoming edge into j,
- ρ_ij (reciprocation/mutuality): the added effect of reciprocated edges.

Let P_ij(0, 0) be the probability of the absence of an edge between i and j, P_ij(1, 0) the probability of i linking to j (1 indicates the outgoing node of the edge), and P_ij(1, 1) the probability of i linking to j and j linking to i. The p_1 model posits the following probabilities (see [149]):

log P_ij(0, 0) = λ_ij,  (3.1)
log P_ij(1, 0) = λ_ij + α_i + β_j + θ,  (3.2)
log P_ij(0, 1) = λ_ij + α_j + β_i + θ,  (3.3)
log P_ij(1, 1) = λ_ij + α_i + β_j + α_j + β_i + 2θ + ρ_ij.  (3.4)

In this representation of p_1, λ_ij is a normalizing constant to ensure that the probabilities for each dyad (i, j) add to 1. For our present purposes, assume that the dyad is in one and only one of the four possible states. The reciprocation effect, ρ_ij, implies that the odds of observing a mutual dyad, with an edge from node i to node j and one from j to i, are enhanced by a factor of exp(ρ_ij) over and above what we would expect if the edges occurred independently of one another.

The problem with this general p_1 representation is that there is a lack of identification of the reciprocation parameters. The following special cases of p_1 are identifiable and of special interest:

1. α_i = 0, β_j = 0, and ρ_ij = 0. This is basically an Erdős–Rényi–Gilbert model for directed graphs: each directed edge has the same probability of appearance.
2. ρ_ij = 0, no reciprocal effect. This model effectively focuses solely on the degree distributions into and out of nodes.
3. ρ_ij = ρ, constant reciprocation. This was the version of p_1 studied in depth by Holland and Leinhardt using maximum likelihood estimation.
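To make the dyad distribution concrete, here is a small sketch (my addition, with made-up parameter values) that computes the four dyad probabilities from (3.1)-(3.4); λ_ij is determined by the requirement that the probabilities sum to 1, so it is absorbed into the normalization.

```python
import math

def p1_dyad_probs(theta, alpha_i, beta_i, alpha_j, beta_j, rho_ij):
    """Probabilities of the four dyad states (0,0), (1,0), (0,1), (1,1)
    for one pair (i, j) under the Holland-Leinhardt p1 model."""
    # Unnormalized log-weights; lambda_ij is absorbed into the normalization.
    logw = {
        (0, 0): 0.0,
        (1, 0): alpha_i + beta_j + theta,
        (0, 1): alpha_j + beta_i + theta,
        (1, 1): alpha_i + beta_j + alpha_j + beta_i + 2 * theta + rho_ij,
    }
    z = sum(math.exp(v) for v in logw.values())   # equals exp(-lambda_ij)
    return {state: math.exp(v) / z for state, v in logw.items()}

# Hypothetical parameter values, purely for illustration.
probs = p1_dyad_probs(theta=-1.0, alpha_i=0.5, beta_i=0.2,
                      alpha_j=-0.3, beta_j=0.1, rho_ij=1.0)
print(probs, sum(probs.values()))  # the four probabilities sum to 1
```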
ERGMs (Exponential Random Graph Models)
More formally, define a probability measure P_θ on the space of all graphs on n vertices by

P_θ(G) = Z^(-1) exp( Σ_{i=1}^n θ_i d_i(G) ),

where Z is a normalizing constant. The real parameters θ_1, ..., θ_n are chosen to achieve given expected degrees. This model appears explicitly in Park and Newman [59], using the tools and language of statistical mechanics.

Holland and Leinhardt [35] give iterative algorithms for the maximum likelihood estimators of the parameters, and Snijders [65] considers MCMC methods. Techniques of Haberman [31] can be used to prove that the maximum likelihood estimates of the θ_i are consistent and asymptotically normal as n → ∞, provided that there is a constant B such that |θ_i| ≤ B for all i.

Such exponential models are standard fare in statistics, statistical mechanics, and social networking (where they are called p* models). They are used for directed graphs in Holland and Leinhardt [35] and for graphs in Frank and Strauss [27, 70] and Snijders [65, 66], with a variety of sufficient statistics (see the surveys in [3], [56], and [66]). One standard motivation for using the probability measure P_θ when the degree sequence is the main feature of interest is that this model gives the maximum entropy distribution on graphs with a given expected degree sequence (see Lauritzen [43] for further discussion of this). Unlike most other exponential models on graphs, the normalizing constant Z is available in closed form. Furthermore, there is an easy method of sampling exactly from P_θ, as shown by the following. The same formulas are given in [59], but for completeness we provide a brief proof.

Lemma 1. Fix real parameters θ_1, ..., θ_n. Let Y_ij be independent binary random variables for 1 ≤ i < j ≤ n, with

P(Y_ij = 1) = e^(θ_i + θ_j) / (1 + e^(θ_i + θ_j)) = 1 - P(Y_ij = 0).

Form a random graph G by creating an edge between i and j if and only if Y_ij = 1. Then G is distributed according to P_θ, with

Z = Π_{1 ≤ i < j ≤ n} (1 + e^(θ_i + θ_j)).

Proof. Let G be a graph and y_ij = 1 if {i, j} is an edge of G, y_ij = 0 otherwise. Then the probability of G under the above procedure is

P(Y_ij = y_ij for all i, j) = Π_{i < j} e^(y_ij (θ_i + θ_j)) / (1 + e^(θ_i + θ_j)).
How can we test and fit this model?
How can we use this model?
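The lemma gives exact sampling from P_θ by independent logistic coin flips, one per pair of vertices. A minimal sketch (my addition), assuming numpy is available:

```python
import numpy as np

def sample_beta_model(theta, rng=None):
    """Draw one graph from P_theta (the degree-statistic exponential model,
    sometimes called the beta model).

    Edge {i, j} is present independently with probability
    exp(theta_i + theta_j) / (1 + exp(theta_i + theta_j)), as in Lemma 1.
    """
    rng = np.random.default_rng(rng)
    theta = np.asarray(theta, dtype=float)
    n = len(theta)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p_ij = 1.0 / (1.0 + np.exp(-(theta[i] + theta[j])))
            if rng.random() < p_ij:
                edges.append((i, j))
    return edges

# Illustration with made-up parameters: larger theta_i means higher expected degree.
theta = [1.0, 0.0, 0.0, -1.0, -1.0]
G_edges = sample_beta_model(theta, rng=0)
degrees = np.zeros(len(theta), dtype=int)
for i, j in G_edges:
    degrees[i] += 1
    degrees[j] += 1
print(G_edges, degrees)
```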
Pseudolikelihood (Strauss-Ikeda 80)
Fix a pair of nodes {i, j}, and consider the indicator r.v. of whether the edge {i, j} is present in G.

Conditioning on the rest of G yields great simplification:

P(edge {i, j} | rest) / P(no edge {i, j} | rest) = exp( θᵀ (x(G+) - x(G-)) ),

where G+ and G- denote G with the edge {i, j} added and removed, respectively.

So use logistic regression? Be careful of variance estimates!
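A hedged sketch of the pseudolikelihood idea (my addition): for an ERGM whose statistics x(G) are the edge count and the triangle count, each potential edge contributes one logistic-regression observation whose covariates are the change statistics x(G+) - x(G-). networkx and scikit-learn are assumed available, and the karate club graph stands in for observed data.

```python
import networkx as nx
import numpy as np
from sklearn.linear_model import LogisticRegression

def change_stats(G, i, j):
    """Change in (edge count, triangle count) from adding edge {i, j}."""
    common_neighbors = len(set(G[i]) & set(G[j]))
    return np.array([1.0, float(common_neighbors)])

G = nx.karate_club_graph()   # example "observed" graph

X, y = [], []
for i in range(len(G)):
    for j in range(i + 1, len(G)):
        X.append(change_stats(G, i, j))
        y.append(1 if G.has_edge(i, j) else 0)

# Maximum pseudolikelihood estimate of theta = (theta_edges, theta_triangles).
# fit_intercept=False because the "edge" statistic already plays that role.
mple = LogisticRegression(fit_intercept=False, C=1e6).fit(np.array(X), y)
print(mple.coef_)   # caution: naive standard errors from this fit are not valid
```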
MCMCMLE (Geyer-Thompson 92)
But why ∝ and not =? We don't know the normalizing constant!

From now on we write q_θ(G) = exp(θᵀ x(G)), so that

P_θ(G) = exp(θᵀ x(G)) / c(θ) = q_θ(G) / c(θ).

Fix some baseline θ_0 and estimate the log-likelihood ratio:

l(θ) - l(θ_0) = (θ - θ_0)ᵀ x(G) - log [ c(θ) / c(θ_0) ].

The ratio of normalizing constants is

c(θ) / c(θ_0) = E_{θ_0} [ q_θ(G) / q_{θ_0}(G) ],

so we can approximate the MLE via MCMC. What about the choice of θ_0, though?
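A hedged sketch (my addition) of the Geyer-Thompson idea for the degree-statistic model above, where draws from P_{θ_0} can be made exactly via the lemma (using sample_beta_model from the earlier sketch) rather than by running a Markov chain: estimate c(θ)/c(θ_0) by the sample mean of q_θ(G)/q_{θ_0}(G) over those draws.

```python
import numpy as np

def degree_vector(edges, n):
    d = np.zeros(n)
    for i, j in edges:
        d[i] += 1
        d[j] += 1
    return d

def log_ratio_of_constants(theta, theta0, n, num_samples=2000, rng=0):
    """Monte Carlo estimate of log[c(theta)/c(theta0)] using exact draws
    from P_{theta0} (sample_beta_model defined in the earlier sketch)."""
    rng = np.random.default_rng(rng)
    theta, theta0 = np.asarray(theta, float), np.asarray(theta0, float)
    log_w = []
    for _ in range(num_samples):
        G = sample_beta_model(theta0, rng=rng)
        d = degree_vector(G, n)
        # q_theta(G) / q_theta0(G) = exp((theta - theta0) . d(G))
        log_w.append((theta - theta0) @ d)
    # log-mean-exp for numerical stability
    log_w = np.array(log_w)
    return np.logaddexp.reduce(log_w) - np.log(num_samples)

# Then: l(theta) - l(theta0) = (theta - theta0) @ x(G_obs) - log_ratio_of_constants(...)
```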
[Diagram: sampling designs (i.i.d. node, i.i.d. edge, snowball, RDS, short paths) crossed with network models (Erdős–Rényi, dyad independence, ERGM, fixed degree, geometric)]
Latent Space Models

Hoff et al. (2002) model: each node i has a latent position z_i; conditional on the positions, edges are independent, with the log odds of an edge between i and j decreasing in the distance |z_i - z_j|.
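A minimal simulation sketch (my addition) of a distance-based latent space graph of this flavor, with made-up parameter values:

```python
import numpy as np

def latent_space_graph(n=50, dim=2, alpha=1.0, rng=0):
    """Simulate a graph where P(edge i~j) = sigmoid(alpha - |z_i - z_j|),
    with latent positions z_i drawn i.i.d. standard normal."""
    rng = np.random.default_rng(rng)
    z = rng.normal(size=(n, dim))
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.linalg.norm(z[i] - z[j])
            p = 1.0 / (1.0 + np.exp(-(alpha - dist)))
            if rng.random() < p:
                edges.append((i, j))
    return z, edges

z, edges = latent_space_graph()
print(len(edges), "edges among", len(z), "nodes")
```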
Degrees

[Figure: an example network of 20 nodes, each labeled by its degree (values ranging from 2 to 7)]

Normalization?
Closeness

Closeness centrality uses the reciprocal of the average shortest-path distance to the other nodes.

[Figure: an example network with each node labeled by its closeness centrality (values roughly 0.39 to 0.56)]
Betweenness

Many variations: shortest paths vs. flow maximization vs. all paths vs. random paths.
Eigenvector Centrality
Use the eigenvector of A corresponding to the largest eigenvalue (Bonacich); more generally, power centrality.
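A short sketch (my addition) computing these centrality measures on an example graph; networkx is assumed available, and the eigenvector centrality call uses the leading eigenvector of the adjacency matrix, as on the slide.

```python
import networkx as nx

G = nx.karate_club_graph()   # example network, just for illustration

degree      = dict(G.degree())
closeness   = nx.closeness_centrality(G)     # reciprocal of average shortest-path distance
betweenness = nx.betweenness_centrality(G)   # shortest-path variant
eigenvector = nx.eigenvector_centrality(G)   # leading eigenvector of the adjacency matrix

# Top 3 nodes by eigenvector centrality
print(sorted(eigenvector, key=eigenvector.get, reverse=True)[:3])
```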