Data Science
Network Models
Hanspeter Pfister & Joe Blitzstein
pfister@seas.harvard.edu / blitzstein@stat.harvard.edu
[Figure: example graph on nodes 1-5]
This Week
P_β(G) = Z⁻¹ exp( Σ_{i=1}^n β_i d_i(G) ),

where Z is a normalizing constant and d_i(G) is the degree of vertex i in G. The real parameters β_1, ..., β_n are chosen to achieve given expected degrees. This model appears explicitly in Park and Newman [59], using the tools and language of statistical mechanics.
Holland and Leinhardt [35] give iterative algorithms for the maximum likelihood estimators of the parameters, and Snijders [65] considers MCMC methods. Techniques of Haberman [31] can be used to prove that the maximum likelihood estimates of the β_i are consistent and asymptotically normal as n → ∞, provided that there is a constant B such that |β_i| ≤ B for all i.
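The cited algorithms are not reproduced here; as an illustrative stand-in, plain gradient ascent on the log-likelihood, whose stationary point matches each expected degree Σ_{j≠i} σ(β_i + β_j) to the observed degree d_i (the 4-cycle data below is made up):

```python
import math

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

def fit_degree_mle(degrees, steps=5000, lr=0.05):
    """Gradient ascent on the log-likelihood of the degree model.

    The gradient in beta_i is d_i - sum_{j != i} sigmoid(beta_i + beta_j),
    so the MLE matches expected to observed degrees.  (An illustrative
    substitute for the Holland-Leinhardt algorithm, not that algorithm.)
    """
    n = len(degrees)
    beta = [0.0] * n
    for _ in range(steps):
        grad = [degrees[i] - sum(sigmoid(beta[i] + beta[j])
                                 for j in range(n) if j != i)
                for i in range(n)]
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta

# 4-cycle: every vertex has degree 2; by symmetry all beta_i are equal and
# solve 2 = 3 * sigmoid(2 * beta), i.e. beta = log(2) / 2
beta_hat = fit_degree_mle([2, 2, 2, 2])
```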
Such exponential models are standard fare in statistics, statistical mechanics, and social networking (where they are called p_1 models). The model can be generated from independent edge indicators, as shown by the following. The same formulas are given in [59], but for completeness we provide a brief proof.
Lemma 1. Fix real parameters β_1, ..., β_n. Let Y_ij be independent binary random variables for 1 ≤ i < j ≤ n, with

P(Y_ij = 1) = e^{β_i + β_j} / (1 + e^{β_i + β_j}) = 1 − P(Y_ij = 0).

Form a random graph G by creating an edge between i and j if and only if Y_ij = 1. Then G is distributed according to P_β, with

Z = Π_{1 ≤ i < j ≤ n} (1 + e^{β_i + β_j}).
Proof. Let G be a graph and set y_ij = 1 if {i, j} is an edge of G, y_ij = 0 otherwise. Then the probability of G under the above procedure is

P(Y_ij = y_ij for all i, j) = Π_{i<j} e^{y_ij (β_i + β_j)} / (1 + e^{β_i + β_j}).

Since vertex i contributes β_i once for each of its d_i(G) edges, Σ_{i<j} y_ij (β_i + β_j) = Σ_i β_i d_i(G), so the product equals exp(Σ_i β_i d_i(G)) / Z with Z = Π_{i<j} (1 + e^{β_i + β_j}), which is P_β(G). □
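Lemma 1 doubles as a sampler: draw each edge with an independent coin flip. A minimal sketch in Python (the parameter values are made up):

```python
import math
import random

def sample_graph(beta, rng):
    """Draw G from P_beta via Lemma 1: independent edge coins."""
    n = len(beta)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            p = 1 / (1 + math.exp(-(beta[i] + beta[j])))  # P(Y_ij = 1)
            if rng.random() < p:
                edges.add((i, j))
    return edges

def log_Z(beta):
    """log Z = sum_{i<j} log(1 + e^{beta_i + beta_j})."""
    n = len(beta)
    return sum(math.log1p(math.exp(beta[i] + beta[j]))
               for i in range(n) for j in range(i + 1, n))

rng = random.Random(0)
beta = [0.5, -0.2, 0.0, 0.3]   # made-up parameters
G = sample_graph(beta, rng)
```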
How can we test and fit this model?
How can we use this model?
Pseudolikelihood (Strauss-Ikeda 90)
Fix a pair of nodes {i, j}, and consider the indicator r.v. of whether the edge {i, j} is present in G.
Conditioning on the rest of G yields a great simplification:

P(edge {i, j} | rest) / P(no edge {i, j} | rest) = e^{θ·(x(G^+) − x(G^-))},

where G^+ and G^- denote G with the edge {i, j} added and removed, respectively, and x(G) is the vector of sufficient statistics.

So use logistic regression? Be careful of variance estimates!
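A quick sketch of the idea in Python. The one-statistic model x(G) = number of edges is chosen only so the answer can be checked by hand: for it, every dyad's change statistic is 1, so the pseudolikelihood reduces to intercept-only logistic regression (all names and data below are illustrative):

```python
import math

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

def fit_edge_parameter(n, edges, steps=2000, lr=0.01):
    """Maximum pseudolikelihood for the toy model x(G) = number of edges.

    Every dyad has change statistic x(G^+) - x(G^-) = 1, so each edge
    indicator has conditional log-odds theta; gradient ascent on the
    log-pseudolikelihood is intercept-only logistic regression.
    """
    m = len(edges)               # dyads with an edge
    M = n * (n - 1) // 2         # total dyads
    theta = 0.0
    for _ in range(steps):
        theta += lr * (m - M * sigmoid(theta))  # d/dtheta of log-PL
    return theta

# made-up 4-node graph with 4 of its 6 possible edges
theta_hat = fit_edge_parameter(4, [(0, 1), (1, 2), (2, 3), (0, 3)])
# the MPLE solves sigmoid(theta) = 4/6, i.e. theta = log(2)
```

For dyad-independent models like this one, the pseudolikelihood coincides with the true likelihood; the variance caveat bites for models with dependence.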
MCMC-MLE (Geyer-Thompson 92)
But why ∝ and not =?
Don't know the normalizing constant!
From now on we write

P_θ(G) = exp(θ·x(G)) / c(θ)
Write q_θ(G) = exp(θ·x(G)), so P_θ(G) = q_θ(G) / c(θ).
Fix some baseline θ_0 and estimate the log-likelihood ratio:

l(θ) − l(θ_0) = (θ − θ_0)·x(G) − log( c(θ) / c(θ_0) )

The ratio of normalizing constants is

c(θ) / c(θ_0) = E_{θ_0}[ q_θ(G) / q_{θ_0}(G) ]

So we can approximate the MLE via MCMC.
What about the choice of θ_0, though?
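The ratio identity above can be sketched directly. For a model where exact sampling from P_{θ_0} is easy (the toy edge-count statistic again, so exact Monte Carlo stands in for the MCMC draws; all values are illustrative):

```python
import math
import random

def estimate_ratio(theta, theta0, n, n_samples, rng):
    """Monte Carlo estimate of c(theta)/c(theta0) = E_theta0[q_theta/q_theta0].

    Toy model with x(G) = number of edges, so edges are i.i.d. under
    P_theta0 and we can sample the statistic exactly."""
    M = n * (n - 1) // 2                         # number of dyads
    p0 = 1 / (1 + math.exp(-theta0))             # edge probability under theta0
    total = 0.0
    for _ in range(n_samples):
        x = sum(rng.random() < p0 for _ in range(M))   # sampled statistic
        total += math.exp((theta - theta0) * x)        # q_theta(G)/q_theta0(G)
    return total / n_samples

rng = random.Random(42)
n = 6
est = estimate_ratio(0.5, 0.0, n, 20000, rng)
# closed form for this toy model: c(theta) = (1 + e^theta)^M
exact = ((1 + math.exp(0.5)) / 2) ** (n * (n - 1) // 2)
```

The variance of the importance weights blows up as θ moves away from θ_0, which is exactly why the choice of baseline matters.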
[Diagram: sampling designs (i.i.d. node, i.i.d. edge, snowball, RDS, short paths) versus models (Erdos, Dyad Indep., ERGM, Fixed degree, Geom)]
Latent Space Models
Hoff et al. (2002) model: conditional on latent positions z_i ∈ R^k, edges are independent with log-odds(y_ij = 1) = α + β′x_ij − |z_i − z_j|.
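A minimal sketch of the distance model's edge probability (the positions and α below are made up, and the covariate term β′x_ij is omitted):

```python
import math

def edge_prob(alpha, z_i, z_j):
    """P(y_ij = 1) = logistic(alpha - |z_i - z_j|): the latent distance model
    with the covariate term omitted for simplicity."""
    return 1 / (1 + math.exp(-(alpha - math.dist(z_i, z_j))))

# made-up latent positions in R^2
z = {"a": (0.0, 0.0), "b": (0.0, 1.0), "c": (3.0, 4.0)}
p_near = edge_prob(1.0, z["a"], z["b"])   # distance 1: logistic(0) = 0.5
p_far = edge_prob(1.0, z["a"], z["c"])    # distance 5: logistic(-4), tiny
```

Nearby nodes in the latent space are likely to link; transitivity then emerges for free, since two neighbors of a node are themselves close.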
Degrees
[Figure: example network with each node labeled by its degree (values range from 2 to 7)]
Normalization?
Closeness
uses the reciprocal of the average shortest distance to other nodes
[Figure: the same network with each node labeled by its closeness score (values range from 0.39 to 0.56)]
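The definition above can be computed with one BFS per node; a sketch on a made-up path graph:

```python
from collections import deque

def closeness(adj, v):
    """Reciprocal of the average BFS distance from v to every other node."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    others = [d for node, d in dist.items() if node != v]
    return len(others) / sum(others)

# made-up path graph 0-1-2-3: interior nodes are closer to everyone
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
scores = {v: closeness(adj, v) for v in adj}
# node 0: distances 1,2,3 -> closeness 0.5; node 1: distances 1,1,2 -> 0.75
```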
Betweenness
many variations: shortest paths vs. flow maximization vs. all paths vs. random paths
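A sketch of the shortest-paths variant: for each pair (s, t), credit node v with the fraction of shortest s-t paths passing through it (brute force over all pairs; the 4-node path graph is made up):

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, s):
    """Distances from s and the number of shortest paths to each node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(adj):
    """Unnormalized shortest-path betweenness on an undirected graph."""
    bc = {v: 0.0 for v in adj}
    info = {s: bfs_counts(adj, s) for s in adj}
    for s, t in combinations(adj, 2):
        dist_s, sig_s = info[s]
        dist_t, sig_t = info[t]
        for v in adj:
            if v not in (s, t) and dist_s[v] + dist_t[v] == dist_s[t]:
                # fraction of shortest s-t paths passing through v
                bc[v] += sig_s[v] * sig_t[v] / sig_s[t]
    return bc

# made-up path graph 0-1-2-3: each interior node separates two pairs
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
bc = betweenness(adj)
```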
Eigenvector Centrality
use the eigenvector of A corresponding to the largest eigenvalue (Bonacich); more generally, power centrality
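Power iteration gives this score without an eigensolver: repeatedly apply A and renormalize. A sketch on a made-up graph (triangle plus a pendant node, so the graph is connected and non-bipartite and the iteration converges):

```python
def eigenvector_centrality(A, iters=200):
    """Power iteration: repeatedly multiply by A and renormalize; for a
    connected non-bipartite graph this converges to the eigenvector of the
    largest eigenvalue."""
    n = len(A)
    x = [1.0] * n
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        top = max(abs(v) for v in y)
        x = [v / top for v in y]
    return x

# made-up graph: triangle 0-1-2 with pendant node 3 attached to 0
A = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]
c = eigenvector_centrality(A)   # node 0 (most connected) scores highest
```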