
QUICK REFERENCE GUIDE

GRAPH CONCEPTS
Graph: A set of nodes & edges. Can be directed or undirected. A graph is connected if there is a path between any two of its vertices; otherwise it consists of several connected components
Multigraph: A graph that allows loops and multiple edges
Weighted Graph: Graph with weighted edges
Labelled Graph: A graph where its nodes or edges have properties (attributes)
Distance between 2 nodes: Length of the shortest path between the 2 nodes. Example: Distance between A & D is 2
Simple Path: Nodes are unique
Length of a path: Number of edges
Diameter of graph: Maximum distance in graph

NETWORK CHARACTERISTICS
Full Network: Contains all entities and connections among them
Ego: Node in focus
Alter: Neighbor of ego
Egocentric Network: An ego and its connections
Unimodal Network: Only one type of vertex
Multimodal Network: Vertices have 2 types, e.g. person, document
Multiplex Network: Edges of 2 types, e.g. people and modes of communication


Graph Level Metrics
Size of network: The number of nodes in the network, or the number of edges in the network
Density of network: Number of ties in the network over number of ALL possible ties. Used to compute connectiveness of the network
  Directed network of size $n$: no. of possible ties $= n(n-1)$
  Undirected network of size $n$: no. of possible ties $= \frac{n(n-1)}{2}$
Reachability: The ability to get from one vertex to another within a graph

Vertex Metrics (Centrality)
Degree Centrality: Count of the total number of connections linked to vertex
In-degree Centrality, Out-degree Centrality: Note: in/out degree for directed graphs
Closeness Centrality:
  $\text{Closeness Centrality} = \frac{1}{\sum \text{shortest distance to all other vertices}}$
  OR Average Distance to all other vertices
  OR $(\text{Average Distance to all other vertices})^{-1}$
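The density definition above can be sketched in a few lines of Python. This is only an illustration: the function names `max_possible_ties` and `density` are made up for this example, and the 5-node, 6-tie network is a hypothetical input.

```python
def max_possible_ties(n: int, directed: bool) -> int:
    """Number of ALL possible ties in a network of size n."""
    if directed:
        return n * (n - 1)       # every ordered pair of distinct nodes
    return n * (n - 1) // 2      # every unordered pair of distinct nodes


def density(num_ties: int, n: int, directed: bool) -> float:
    """Ties present divided by ties possible (0.0 for networks too small to have ties)."""
    possible = max_possible_ties(n, directed)
    return num_ties / possible if possible else 0.0


# Hypothetical undirected network: 5 nodes, 6 ties.
print(max_possible_ties(5, directed=False))  # 10
print(density(6, 5, directed=False))         # 0.6
```

The same tie count in a directed network of the same size gives half the density, because twice as many ties are possible.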

Betweenness Centrality: Measure of how often a given vertex lies on the shortest path between two other vertices

Closeness centrality example, using geodesic (shortest) distance:
Node A $= \frac{1}{1+1+1+1} = 0.25$
Node B $= \frac{1}{1+2+2+2} = 0.14$
Node C $= \frac{1}{1+2+1+2} = 0.17$
Node D $= \frac{1}{1+2+1+1} = 0.2$
Node E $= \frac{1}{1+2+2+1} = 0.17$
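The closeness figures above can be reproduced with a breadth-first search. The original diagram is not available, so the edge list below is an assumption inferred from the distance sums in the example (A adjacent to every node, plus C-D and D-E); the helper names are illustrative.

```python
from collections import deque

# Assumed edge list, inferred from the distance sums in the worked example.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("C", "D"), ("D", "E")]


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Geodesic (shortest-path) distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, node):
    """1 / (sum of shortest distances to all other vertices); assumes a connected graph."""
    dist = bfs_distances(adj, node)
    return 1 / sum(d for other, d in dist.items() if other != node)


adj = build_adjacency(EDGES)
for node in sorted(adj):
    print(node, round(closeness(adj, node), 2))
```

With the assumed edges this prints 0.25, 0.14, 0.17, 0.2 and 0.17 for nodes A to E, matching the worked values.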

Betweenness Centrality of $v = \frac{\text{Number of shortest paths passing through } v}{\text{Number of shortest paths}}$
Example: For Node B, A-B-C is the only shortest path between A and C (contributes 1), and A-B-E is one of two shortest paths between A and E, the other being A-D-E (contributes 0.5), so betweenness $= 1 + 0.5 = 1.5$

Node   Betweenness   Eigenvector
A      0.5           0.162
B      1.5           0.241
C      0.0           0.194
D      0.5           0.162
E      1.5           0.241

Note: Betweenness centrality of all nodes is 0 when network density is 1

Eigenvector Centrality: Depends on both the number and quality of its connections, i.e. to be connected to others who are themselves well-connected
Beta Centrality:
  Small value: Analysis weighted towards local structure surrounding the ego
  Large value: Weighs towards wider network structure
  Positive Beta: Good for ego to be connected to highly central people
  Negative Beta: Ego's disadvantage to be connected to highly central people

Metric
Cut Vertex: A vertex whose removal disconnects a graph
Bridge: An edge whose removal disconnects a graph. Note: See Structural Balance

Vertex Characteristics (Pivotal, Gatekeeper)
Pivotal Node: A node X is Pivotal for a pair of distinct nodes Y and Z if X lies on every shortest path between Y and Z. Example: B is pivotal for pairs A & C, and A & D
Gatekeeper Node: A node X is a Gatekeeper if for a pair of nodes Y and Z, every path from Y to Z passes through X
Local Gatekeeper: A node V is a Local Gatekeeper if there are two neighbors of V, Y and Z, that are not connected by an edge
Example: Node A is a gatekeeper. Node D is a local gatekeeper, but not a gatekeeper
(Figure labels: Gatekeeper/Pivotal, Local Gatekeeper)
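The betweenness values tabulated earlier (A = 0.5, B = 1.5, C = 0, D = 0.5, E = 1.5) can be checked by counting shortest paths pair by pair. The original diagram is missing, so the edge list below is an assumption inferred from the Node B example and the tabulated values; the function names are illustrative, and this brute-force approach is a sketch for small graphs, not an efficient algorithm.

```python
from collections import deque
from itertools import combinations

# Assumed edge list, inferred from the Node B example and the tabulated values.
EDGES = [("A", "B"), ("A", "D"), ("B", "C"), ("B", "E"), ("C", "E"), ("D", "E")]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_counts(adj, source):
    """Distance and number of shortest paths from source to each reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]   # every shortest path to u extends to v
    return dist, sigma


def betweenness(adj):
    """For each v: sum over unordered pairs (s, t) of the share of shortest s-t paths through v."""
    info = {n: bfs_counts(adj, n) for n in adj}
    score = {n: 0.0 for n in adj}
    for s, t in combinations(sorted(adj), 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue                   # s and t are not connected
        for v in adj:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, sigma_v = info[v]
            if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score


scores = betweenness(build_adjacency(EDGES))
for node in sorted(scores):
    print(node, scores[node])
```

Each unordered pair is counted once, which is what yields 1 + 0.5 = 1.5 for node B under the assumed edges.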
Comparison
Generally, the 3 centrality types will be positively correlated. When they are not, it probably tells you something interesting about the network.
Low Degree
  High Closeness: Key player tied to important/active alters
  High Betweenness: Ego's few ties are crucial for network flow
  Alter is super important, connected to a big chunk of the network
Low Closeness
  High Degree: Embedded in cluster that is far from the rest of the network
  High Betweenness: Very rare cell. Would mean that ego monopolizes the ties from a small number of people to many others.
Low Betweenness
  High Degree: Ego's connections are redundant - communication bypasses him/her. Alters connect to each other
  High Closeness: Probably multiple paths in the network, ego is near many people, but so are many others

SOCIAL GROUPS
Dyads (2 nodes)
Undirected: 2 ties - Yes/No
Directed: No, 1-way (which way), 2-way
Reciprocity: Ratio of reciprocated r/s to all dyads, OR ratio of reciprocated r/s to all connected dyads
Example: Total mutual = 2, Total connected = 6, Total dyads = 10, so Reciprocity = 2/10 OR 2/6

Triads (3 nodes)
Undirected: 0, 1, 2, or 3 ties
Directed: 16 possible r/s (See below)
Transitivity, Triadic Closure: [Figure: x, y, z triad diagrams]

Cliques
Every member of a clique knows everybody else, i.e. Density = 1
Any subset of nodes from a clique also forms a clique

N-Clique
Members within an N-clique are at most N distance away from each other
Example: { A, C, E } forms a 2-clique BUT { A, B, C, E } is not a 2-clique because B-E is 3 distance away
Note: Nodes in N-clique can depend on non-clique nodes to form the N-path

Clans
An N-clan is an N-clique where every pair h… distance, i.e., N-clan cannot use nodes outside the c…
Example: { A, B, C } is a 2-clan; { A, C, E } is not a 2-clan

Clustering
Clustering Coefficient
Clustering coefficient of an ego is defined as:
- How well the alters are connected among themselves, i.e. $\frac{\text{actual ties}}{\text{max ties}}$
- Density in 1.5 degree egocentric network
Clustering coefficient of the entire network is the average of the clustering coefficients of ALL the nodes
Clustering Algorithm
Agglomerative: Bottom up: Start from singleton and merge
Divisive: Top down: Start from cluste…
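The clustering coefficient described above (actual ties among the alters over the maximum possible ties among them) can be sketched as follows. The 4-node network is a hypothetical example, the function names are illustrative, and treating an ego with fewer than two alters as 0 is one common convention assumed here.

```python
from itertools import combinations


def local_clustering(adj, ego):
    """Actual ties among the ego's alters divided by the maximum possible ties among them."""
    alters = adj[ego]
    k = len(alters)
    if k < 2:
        return 0.0                      # assumed convention: no pair of alters to connect
    actual = sum(1 for u, v in combinations(alters, 2) if v in adj[u])
    return actual / (k * (k - 1) / 2)


def network_clustering(adj):
    """Average of the clustering coefficients of all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Hypothetical network: triangle A-B-C with a pendant node D attached to A.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(local_clustering(adj, "A"))   # 1 tie (B-C) out of 3 possible
print(network_clustering(adj))      # (1/3 + 1 + 1 + 0) / 4
```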

STRUCTURAL BALANCE
Triadic Closure is the idea that if two people in a social network have a friend in common, then there is an increased likelihood that they will become friends themselves at some point in the future.
Strong Triadic Closure Property: if a node A has edges to nodes B and C, then the B-C edge is likely to form if A-B and A-C are both strong ties.
Example: Node A violates the Strong Triadic Closure Property as there is no edge between B and C at all.

Bridge: An edge whose removal disconnects a graph
Local Bridge: An edge whose removal results in a path of length greater than 2 between its endpoints, A & B, i.e. A & B have no common friends
If A-B is a strong tie bridge, A/B cannot have a strong tie to another node or it violates the Strong Triadic Closure Property
Graded Measure for a local bridge:
$\frac{\text{number of nodes who are neighbors of both A and B}}{\text{number of nodes who are neighbors of at least one of A and B}}$
Example: $AF = \frac{1}{6}$, $AE = \frac{2}{5}$; AE is more of a local bridge

Structural Balance Property: For every set of three nodes, considering the three edges connecting them, either all three are labeled +, or else exactly one of them is labeled +
Weak Structural Balance Property: There is no set of three nodes s.t. the edges among them consist of exactly 2 +ve and 1 -ve. If a graph is weakly balanced, its nodes can be divided into groups where every 2 nodes in the same group are friends and every 2 nodes in different groups are enemies
Balance Theorem: If a labeled complete graph is balanced, then either all pairs of nodes are friends, or the nodes can be divided into two groups, X and Y, such that
1) every pair of nodes in X like each other,
2) every pair of nodes in Y like each other, and
3) everyone in X is the enemy of everyone in Y
i.e. if a complete graph has 2 sets of mutual friends, with complete mutual antagonism between the two, it is balanced
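The two balance properties above can be checked triangle by triangle. The sketch below assumes a complete signed graph stored as a dictionary from unordered node pairs to +1 or -1; the helper `two_camps` and the node names are hypothetical, used only to build a test case in the shape the Balance Theorem describes.

```python
from itertools import combinations


def plus_count(sign, triple):
    """Number of + edges among the three edges of a triangle."""
    return sum(1 for pair in combinations(triple, 2) if sign[frozenset(pair)] > 0)


def is_balanced(nodes, sign):
    """Structural balance: every triangle has all three edges +, or exactly one +."""
    return all(plus_count(sign, t) in (1, 3) for t in combinations(nodes, 3))


def is_weakly_balanced(nodes, sign):
    """Weak structural balance: no triangle has exactly two + edges and one - edge."""
    return all(plus_count(sign, t) != 2 for t in combinations(nodes, 3))


def two_camps(x, y):
    """Hypothetical helper: friends inside each group, enemies across groups."""
    nodes = list(x) + list(y)
    sign = {}
    for u, v in combinations(nodes, 2):
        same_group = (u in x) == (v in x)
        sign[frozenset((u, v))] = 1 if same_group else -1
    return nodes, sign


nodes, sign = two_camps(("A", "B"), ("C", "D"))
print(is_balanced(nodes, sign))                  # True: two groups of mutual friends

sign[frozenset(("C", "D"))] = -1                 # C and D become enemies
print(is_balanced(nodes, sign))                  # False: triangle A-C-D is all -
print(is_weakly_balanced(nodes, sign))           # True: groups {A, B}, {C}, {D}
```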


INFORMATION FLOW
Max Flow/Min Cut (Flow, Capacity)
Finding Max Flow, Bookkeeping Algorithm:
1. Find any path from source to sink that has a positive flow capacity remaining. If no more such paths, exit
2. Determine $f$, the maximum flow along this path, which is equal to the smallest flow capacity on any arc in the path (the bottleneck arc)
3. Subtract $f$ from the remaining flow capacity in the forward direction, and add $f$ to the remaining flow capacity in the backwards direction, for each arc (if needed)
4. Go to Step 1

Finding Min Cut
A cut is any set of directed arcs containing at least one arc in every path from the source to the sink. The cut value is the sum of the flow capacities in the source-to-sink direction of all the arcs.

Freeman's formula for Network Centralization
$C_D = \frac{\sum_{i=1}^{g} \left[ C_D(n^*) - C_D(n_i) \right]}{(g-1)(g-2)}$
$C_D$ is centralization of the network
$C_D(n_i)$ is degree centrality of node $i$
$C_D(n^*)$ is degree centrality of the highest centrality node
$g$ is the number of nodes in the network
Centralization shows the degree of inequality or variance in the network as a percentage of that of a perfect star network of the same size.
Note: The star network is the most unequal network & $C_D = 1$
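Freeman's degree centralization above can be checked numerically. The sketch below takes a list of raw node degrees and computes the sum of differences from the highest degree, divided by (g - 1)(g - 2); the function name and the two 5-node degree lists are illustrative assumptions.

```python
def degree_centralization(degrees):
    """Sum of (highest degree - degree of node i), divided by (g - 1)(g - 2)."""
    g = len(degrees)
    if g < 3:
        raise ValueError("centralization needs at least 3 nodes")
    highest = max(degrees)
    return sum(highest - d for d in degrees) / ((g - 1) * (g - 2))


star = [4, 1, 1, 1, 1]   # hypothetical 5-node star: one hub tied to four leaves
ring = [2, 2, 2, 2, 2]   # hypothetical 5-node ring: every node has two ties

print(degree_centralization(star))  # 1.0, the most unequal case
print(degree_centralization(ring))  # 0.0, all nodes equally central
```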

Node 2
1-2: 2
1-3: 3
1-4: 4(2)
1-5: 2(1)
1-6: 4(2)
2-3: 0
2-4: 3
2-5: 1
2-6: 3
3-4: 1
3-5: 1
3-6: 2
4-5: 0
4-6: 2
5-6: 3

Node 3
1-2: 2
1-3: 3
1-4: 4(1)
1-5: 2(1)
1-6: 4(2)
2-3: 0
2-4: 3
2-5: 1
2-6: 3
3-4: 1
3-5: 1
3-6: 2
4-5: 0
4-6: 2
5-6: 3

By the max-flow min-cut theorem, the cut value of the min cut is equal to the max flow.
UCINET: Network > Cohesion > Max Flow
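The bookkeeping steps for finding the max flow can be sketched in Python as an augmenting-path search over remaining capacities. This is an illustration under stated assumptions, not UCINET's implementation: the path search uses breadth-first search (one possible choice for Step 1), and the capacity network and node names `s`, `a`, `b`, `t` are hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """capacity[u][v] is the flow capacity of arc u -> v; returns the maximum flow value."""
    # Remaining flow capacities, with a zero-capacity backward arc for every forward arc.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    residual.setdefault(source, {})
    residual.setdefault(sink, {})

    total = 0
    while True:
        # Step 1: find any source-to-sink path with positive remaining capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total                      # no more such paths: exit

        # Step 2: the bottleneck arc determines the flow along this path.
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)

        # Step 3: subtract in the forward direction, add in the backward direction.
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        total += bottleneck                   # Step 4: go back to Step 1


# Hypothetical capacities.
capacity = {
    "s": {"a": 3, "b": 2},
    "a": {"b": 1, "t": 2},
    "b": {"t": 3},
}
print(max_flow(capacity, "s", "t"))  # 5
```

In this example the cut consisting of the two arcs leaving `s` has value 3 + 2 = 5, which equals the max flow, as the max-flow min-cut theorem states.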
Flow Betweenness
Let $m_{jk}$ be the amount of flow between vertex $j$ and vertex $k$ which must pass through $i$ for any maximum flow. The flow betweenness of vertex $i$ is the sum of all $m_{jk}$ where $i$, $j$ and $k$ are distinct and $j < k$.
The flow betweenness is therefore a measure of the contribution of a vertex to all possible maximum flows

Max-flow min-cut theorem: for any network having a single origin node
and a single destination node, the maximum possible flow from origin
to destination equals the minimum cut value for all cuts in the network

UCINET: Network > Centrality and Power > Flow Betweenness

Information Cascade
Occurs when a person observes the actions of others and then, despite possible contradictions in his/her own private information signals, engages in the same acts
There are four key conditions in an information cascade model:
1. Agents make decisions sequentially
2. Agents make decisions rationally based on the information they have
3. Agents do not have access to the private information of others
4. A limited action space exists (e.g. an adopt/reject decision)

Conditional Probability:
$P(A|B) = \frac{P(A)\,P(B|A)}{P(B)}$
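The conditional probability rule above (Bayes' rule, P(A|B) = P(A) P(B|A) / P(B)) is how an agent in a cascade model updates a belief after seeing a signal. The numbers below are hypothetical: a 50/50 prior that adopting is good, and a private signal that is "high" with probability 2/3 when adopting is good and 1/3 when it is bad.

```python
from fractions import Fraction


def posterior(prior_a, prob_b_given_a, prob_b):
    """Bayes' rule: P(A|B) = P(A) * P(B|A) / P(B)."""
    return prior_a * prob_b_given_a / prob_b


# Hypothetical values for illustration only.
p_good = Fraction(1, 2)
p_high_given_good = Fraction(2, 3)
p_high_given_bad = Fraction(1, 3)

# Total probability of a high signal, over both states.
p_high = p_good * p_high_given_good + (1 - p_good) * p_high_given_bad

print(posterior(p_good, p_high_given_good, p_high))  # 2/3
```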

STUDY DESIGN
1. Basics: Measurements & Data
Variable: Characteristic or property
Scales: Nominal, Ordinal, Interval, Ratio
Nominal: Categorical; Qualitative. e.g. Male, Female; North, South, East, West
Ordinal: No concept of gap size: $a > b > c$. e.g. first, second, third; primary, secondary, jc
Interval: Gaps measured in continuous units. Can perform $+, -$ operations. e.g. Celsius
Ratio: Ratios can be compared. Can perform $+, -, \times, \div$ operations. e.g. dollars
What type of scale to use?
- Pivotal/Non-pivotal: Categorical
- Survey Ratings: Ratio
- Edge (Yes/No): Categorical
- Weighted edge (e.g., 1-10): Ratio
- Degree Centrality: Ratio
- Betweenness Centrality: Ratio

2. Data collection
Asking Respondents:
1) Simple Questions (e.g. age, education)
2) Survey Type Questions
3) Open-ended questions
4) Roster choice method, i.e., respondents given a list (roster) of people and asked questions about them, e.g. which of the following would you regard as a friend
Experiments: Measure variables
Web Access: Web crawling; Blogs, forums, social media
Secondary Data:
1) Datasets on the internet (context)
2) Reports
3) Email Records
4) Company transaction record

3. Steps in doing a social network study
Decide what to study
  What to study? The Hypothesis. See Notes for examples
  Variables: Identify variables, consider independent variables, e.g. Node properties, edge properties
  Level of Detail: e.g. team email: sender, receiver, etc.
Choose relevant population
  Sampling: Identify the population study is interested in
  - Roles/positions (directors/politicians)
  - Relationships (friends of )
  - Events (participation/communication)
  - Time
  - Location
  Complete Population (Census) VS Random (ego) + snowball (alters)
Collect data
  Refer to 2. Data Collection
Analyse
  Mixture of qualitative, descriptive statistics, and statistical tests
Deduce Findings
  Statistics, and compare with prior studies
Report
  Clear, meaningful and obvious graphs
  Introduction, Literature Review, Objective (Hypothesis), Methodology, Analysis, Findings

UCINET CHEAT SHEET
To open a dataset: File > Open > Ucinet dataset > Network
Display dataset: Data > Display (cntr-d)
NetDraw: Visualize > NetDraw
Separate files with multiple matrices: Data > Unpack. Outputs matrix in selected data file
Prepare Data: Data > Data Editors > Matrix Editor (cntr-s). Note: Refer to Notes
Produce matrix from attributes: Data > Attribute to matrix
Display Univariate Statistics: Tools > Univariate Statistics (cntr-u). Note: Refer to Notes
Compute Network Metrics: Network > Centrality and power > Multiple measures (cntr-m). Input: .##h file

Test / UCINET Action, Input / UCINET Output / Analysis

Test observed mean/density against a fixed value (find p-value against a fixed value)
  UCINET Action: Network > Compare densities > Against theoretical parameter
  Tests whether the density of the selected network is close to the Expected density. Actual density as shown in UCINET Output in Display dataset
  Analysis: In this case, z-score is -3.7943, i.e. 3.79 s.d. to the left of expected density. Observed density is significantly smaller than expected density of 1.0, as p-value is 0.0002

Test of density difference between 2 networks (more than mean, takes into account variability)
  UCINET Action: Network > Compare densities > Paired (same nodes)
  Used to compare density difference between two networks. Good for testing time difference of the same network
  Analysis: t-statistic 2.4089, p-value 0.0052 < 0.05, so the density difference is significant
  Compares Matrix VS Matrix

Correlation between 2 networks with same actors (find r)
  UCINET Action: Tools > Testing Hypothesis > Dyadic (QAP) > QAP Correlation (cntr-q)
  UCINET Output: Col 1: coefficient for dataset; Col 2: p-value; Col 3: average coefficient of all sampled datasets
  Analysis: p-value > 0.05, so correlation is not significant
  Compares Matrix VS Matrix. NO dependency

Regression (you have control over the independent variable)
  UCINET Action: Tools > Testing Hypothesis > Dyadic (QAP) > QAP Regression > MR-QAP Linear Regression > Double Dekker: if no missing values; Semi-partialling: missing values
  Analysis: Look at R-sq first to see if model is a good fit. Then look at individual variables
  Compares Matrix VS Matrix

T-test of 2 group means (find p-value of 2 groups divided on node attributes; is one group bigger than the other?)
  UCINET Action: Tools > Testing Hypothesis > Node-level > T-Test
  T-test used to test if there are differences between the means of two groups, in this case, whether the govt or non-govt groups have different out-degree centrality (col 1)
  Result: No difference across groups, all p-values are > 0.05
  Compares Column VS Column

ANOVA for 2 or more groups
  UCINET Action: Tools > Testing Hypothesis > Node-level > Anova
  Analysis: Look at f-statistic and significance. Significance is the same as that of two-tailed test. Note: Refer to Notes
  Compares Column VS Column

Triad Undirected
X1 X2 X3
X1: No. of mutual dyads
X2: No. of asymmetric dyads
X3: No. of null dyads
D: Down
U: Up
T: Transitive
C: Cyclic
