Advanced Quantitative Research Methodology, Lecture Notes: Text Analysis II: Unsupervised
December 23, 2011
Reading
Justin Grimmer and Gary King. 2010. Quantitative Discovery of Qualitative Information: A General Purpose Document Clustering Methodology http://gking.harvard.edu/files/abs/discov-abs.shtml.
Bell(n) = number of ways of partitioning n objects
- Bell(2) = 2 (AB, A B)
- Bell(3) = 5 (ABC, AB C, A BC, AC B, A B C)
- Bell(5) = 52
- Bell(100) ≈ 10^28 × the number of elementary particles in the universe
Now imagine choosing the optimal classification scheme by hand!
That we think of all this as astonishing . . . is astonishing
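The counts above can be reproduced with a short sketch. The function names `bell` and `partitions` are ours: `bell` builds the Bell triangle (each row starts with the last entry of the previous row, and each later entry is its left neighbour plus the entry above that neighbour), while `partitions` enumerates the partitions directly, which is only feasible for a handful of objects.

```python
from typing import Iterator, List, Sequence


def bell(n: int) -> int:
    """Number of ways to partition n objects, via the Bell triangle."""
    if n == 0:
        return 1  # the empty set has exactly one (empty) partition
    row = [1]
    for _ in range(n - 1):
        # New row starts with the last entry of the previous row; each later
        # entry is its left neighbour plus the entry above that neighbour.
        new_row = [row[-1]]
        for above in row:
            new_row.append(new_row[-1] + above)
        row = new_row
    return row[-1]  # the last entry of row n is Bell(n)


def partitions(items: Sequence[str]) -> Iterator[List[List[str]]]:
    """Yield every partition of `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Either add `first` to one of the existing blocks...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + smaller


# Prints the five partitions of A, B, C (blocks separated by spaces).
for p in partitions("ABC"):
    print(" ".join("".join(block) for block in p))
print(bell(2), bell(3), bell(5))
print(bell(100))  # far too many to enumerate, let alone inspect by hand
```

The triangle needs only O(n^2) integer additions, whereas enumerating partitions costs time proportional to Bell(n) itself.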
Existing methods:
- Many choices: model-based, subspace, spectral, grid-based, graph-based, fuzzy k-modes, affinity propagation, self-organizing maps, . . .
- Well-defined statistical, data-analytic, or machine-learning foundations
- How to add substantive knowledge: with few exceptions, who knows?!
- The literature: little guidance on when methods apply
- Deep problem in the cluster analysis literature: no way to know ex ante which method will work
Methods and substance must be connected (no free lunch theorem)
- The usual approach fails: it is hard to make the connection by understanding the model
- We do it ex post (by qualitative choice). For example:
  - Create a long list of clusterings; choose the best. Too hard for mere humans!
  - An organized list will make the search possible
  - E.g., consider two clusterings that differ only because one document (of many) moves from category 5 to 6
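One way to see why the two clusterings in the example belong next to each other in an organized list: count the document pairs on which they agree (both put the pair together, or both keep it apart). Moving one document out of n changes at most n − 1 of the n(n − 1)/2 pairs. This Rand-style pair count is our illustration, not the distance adopted later in the lecture; the function name and the 100-document, 10-category setup are hypothetical.

```python
from itertools import combinations
from typing import Hashable, Sequence


def pairwise_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of document pairs on which clusterings a and b agree.

    A pair agrees if both clusterings put the two documents in the same
    cluster, or both put them in different clusters (a Rand-style index).
    """
    if len(a) != len(b):
        raise ValueError("clusterings must label the same documents")
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical example: 100 documents spread over categories 0-9.
before = [doc % 10 for doc in range(100)]
after = list(before)
after[5] = 6  # document 5 moves from category 5 to category 6

# Only the 9 pairs (doc 5, other category-5 docs) and the 10 pairs
# (doc 5, category-6 docs) change, out of 4950 pairs in total.
print(pairwise_agreement(before, after))  # ≈ 0.9962, i.e. 1 - 19/4950
```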
A New Strategy
Make it easy to choose best clustering from millions of choices
1. Code text as numbers (in one or more of several ways)
2. Apply all clustering methods we can find to the data, each representing different (unstated) substantive assumptions (<15 mins). (Too much for a person to understand, but organization will help)
3. Develop an application-independent distance metric between clusterings, a metric space of clusterings, and a 2-D projection
4. Local cluster ensemble creates a new clustering at any point, based on a weighted average of nearby clusterings
5. A new animated visualization to explore the space of clusterings (smoothly morphing from one into others)
Millions of clusterings, easily comprehended (takes about 10-15 minutes to choose a clustering with insight)
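Step 4 can be sketched under loudly stated assumptions. Each existing clustering has a position in the 2-D projection; clusterings near the chosen point receive Gaussian-kernel weights (the kernel and `bandwidth` are our choice); the weighted average is taken over co-assignment matrices (entry (i, j) is 1 when documents i and j share a cluster); and the averaged matrix is turned back into a clustering by linking pairs whose weight exceeds `threshold` and labelling connected components. That last step is a simple stand-in, not necessarily how Grimmer and King produce the ensemble clustering, and all function and parameter names are hypothetical.

```python
import numpy as np


def coassignment(labels):
    """n x n matrix with 1 where two documents share a cluster, else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def local_cluster_ensemble(point, coords, clusterings, bandwidth=1.0, threshold=0.5):
    """Clustering at `point` from a distance-weighted average of nearby clusterings.

    coords[k] is the (hypothetical) 2-D map position of clusterings[k].
    """
    coords = np.asarray(coords, dtype=float)
    sq_dist = np.sum((coords - np.asarray(point, dtype=float)) ** 2, axis=1)
    # Assumed Gaussian kernel; subtracting the minimum avoids underflow
    # and does not change the normalized weights.
    weights = np.exp(-(sq_dist - sq_dist.min()) / (2 * bandwidth ** 2))
    weights /= weights.sum()

    # Entry (i, j): weighted share of nearby clusterings that put i and j together.
    avg = sum(w * coassignment(c) for w, c in zip(weights, clusterings))

    # Stand-in: link documents whose average exceeds `threshold`,
    # then label the connected components of that graph.
    n = avg.shape[0]
    labels = np.full(n, -1)
    next_label = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(avg[i] > threshold):
                if labels[j] < 0:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Hypothetical usage: two clusterings of four documents, far apart on the map.
clusterings = [[0, 0, 1, 1], [0, 0, 0, 1]]
coords = [(0.0, 0.0), (10.0, 0.0)]
print(local_cluster_ensemble((0.0, 0.0), coords, clusterings))   # -> [0 0 1 1]
print(local_cluster_ensemble((10.0, 0.0), coords, clusterings))  # -> [0 0 0 1]
```

Averaging co-assignment matrices rather than raw labels sidesteps the label-switching problem: cluster "1" in one method need not correspond to cluster "1" in another.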
Figure: the space of clusterings of U.S. presidents (Roosevelt through Obama). Each label marks the clustering produced by one method (e.g., hclust, kmeans, kmedoids, affprop, spectral, som, mult_dirproc); two highlighted clusterings, "Cluster Solution 1" and "Cluster Solution 2", include clusters labelled "Roosevelt to Carter" and "Reagan Republicans".
- Distance between clusterings: a function of the pairwise document agreements (pairwise agreements, not triples, quadruples, etc.)
- Invariance: distance is invariant to the number of documents (for any fixed number of clusters)
- Scale: the maximum distance is set to log(number of clusters)
Only one measure satisfies all three (the variation of information); Meila (2007) derives the same metric from different axioms (lattice theory)
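The variation of information between clusterings C and C' is VI(C, C') = H(C) + H(C') − 2 I(C; C'), where H is the entropy of the cluster-size proportions and I is the mutual information between the two labelings. A minimal pure-Python sketch with natural logarithms follows; the function names are ours, and it returns raw VI in nats without the rescaling to log(number of clusters) described above.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Entropy (in nats) of the cluster-size proportions."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Mutual information (in nats) between two labelings of the same documents."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def variation_of_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """VI(a, b) = H(a) + H(b) - 2 I(a; b); zero iff the partitions are identical."""
    if len(a) != len(b):
        raise ValueError("clusterings must label the same documents")
    vi = entropy(a) + entropy(b) - 2 * mutual_information(a, b)
    return max(vi, 0.0)  # clip tiny negative floating-point error


print(variation_of_information([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabeled
print(variation_of_information([0, 0, 0, 0], [0, 1, 2, 3]))  # ≈ log(4) ≈ 1.386
```

Because VI depends only on the joint distribution of labels, renaming clusters leaves it unchanged, which is what a distance between partitions (rather than between label vectors) requires.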
Scale: (1) unrelated, (2) loosely related, or (3) closely related. Table reports: mean(scale)
Evaluator 1 1.16
Evaluator 2 1.60
Evaluating Performance
Goals:
- Validate claim: computer-assisted conceptualization outperforms human conceptualization
- Demonstrate: new experimental designs for cluster evaluation
- Inject human judgment: relying on insights from survey research
Data sets (one figure panel each):
- Lautenberg: 200 Senate press releases (appropriations, economy, education, tax, veterans, . . . )
- Policy Agendas: 213 quasi-sentences from Bush's State of the Union (agriculture, banking & commerce, civil rights/liberties, defense, . . . )
- Reuters: financial news (trade, earnings, copper, gold, coffee, . . . ); gold standard for supervised learning studies
Gary King (Harvard, IQSS) Quantitative Discovery from Text
Created an info packet on each clustering (for each cluster: exemplar document, automated content summary)
Asked for (6 choose 2) = 15 pairwise comparisons
The user chooses, so we only care about the one clustering that wins
In both cases, a Condorcet winner:
- Immigration: Our Method 1 > vMF 1 > vMF 2 > Our Method 2 > K-Means 1 > K-Means 2
- Genetic testing: Our Method 1 > {Our Method 2, K-Means 1, K-Means 2} > Dir Proc. 1 > Dir Proc. 2
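A Condorcet winner is the option that beats every other option head to head. A minimal sketch of the check, assuming each judgment is recorded as a (winner, loser) pair; the option names and votes below are hypothetical, not the study's data.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Optional, Sequence, Tuple


def condorcet_winner(
    options: Sequence[Hashable], judgments: Iterable[Tuple[Hashable, Hashable]]
) -> Optional[Hashable]:
    """Return the option preferred to every other option by a majority of
    the head-to-head judgments, or None if no such option exists."""
    wins = Counter(judgments)  # wins[(x, y)] = times x was chosen over y
    for x in options:
        if all(wins[(x, y)] > wins[(y, x)] for y in options if y != x):
            return x
    return None


print(math.comb(6, 2))  # 15 distinct pairs among 6 clusterings

# Hypothetical judgments among three clusterings.
options = ["Method 1", "K-Means 1", "vMF 1"]
votes = [("Method 1", "K-Means 1"), ("Method 1", "vMF 1"), ("vMF 1", "K-Means 1")]
print(condorcet_winner(options, votes))  # Method 1

# A preference cycle has no Condorcet winner.
cycle = [("A", "B"), ("B", "C"), ("C", "A")]
print(condorcet_winner(["A", "B", "C"], cycle))  # None
```

Checking for a Condorcet winner guards against reporting a "best" clustering when evaluators' pairwise choices actually form a cycle.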
- Data: 200 press releases from Frank Lautenberg's office (D-NJ)
- Apply our method
Example Discovery
Figure: the 2-D space of clusterings; each label marks the clustering produced by one method (e.g., hclust, kmeans, kmedoids, affprop, spec/mspec, som, mixvmf, mult_dirproc).
Red point: a clustering by Affinity Propagation-Cosine (Dueck and Frey 2007); close to a mixture of von Mises-Fisher distributions (Banerjee et al. 2005)
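Affinity propagation chooses exemplar documents by passing "responsibility" and "availability" messages over a similarity matrix. A minimal NumPy sketch, assuming cosine similarity between document vectors, the common default of setting each document's preference to the median similarity, and damping 0.7 with a fixed 500 iterations (our choices; there is no convergence check). The toy vectors are hypothetical, and this is not the implementation used to produce the figure.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarities between the rows of X (e.g., word-count vectors)."""
    X = np.asarray(X, dtype=float)
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    return unit @ unit.T


def affinity_propagation(S, preference=None, damping=0.7, max_iter=500):
    """Return (exemplar indices, cluster labels) for similarity matrix S."""
    S = np.array(S, dtype=float)  # copy: the diagonal is overwritten below
    n = S.shape[0]
    if preference is None:
        # Common default: every point's preference is the median similarity.
        preference = np.median(S[~np.eye(n, dtype=bool)])
    np.fill_diagonal(S, preference)
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) = sum over i' != k of max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = np.diag(A_new).copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)  # no exemplar emerged
    labels = np.argmax(S[:, exemplars], axis=1)  # most similar exemplar
    labels[exemplars] = np.arange(exemplars.size)  # exemplars label themselves
    return exemplars, labels


# Hypothetical toy "documents": two clearly separated groups of word vectors.
X = [[1, 0, 0], [0.9, 0.1, 0], [1, 0.05, 0],
     [0, 1, 0], [0, 0.9, 0.1], [0.05, 1, 0]]
exemplars, labels = affinity_propagation(cosine_similarity_matrix(X))
print(exemplars, labels)
```

Unlike k-means, the number of clusters is not fixed in advance; it emerges from the preference values, with lower preferences yielding fewer exemplars.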
18 / 23
Example Discovery
mult_dirproc kmeans correlation hclust canberra ward sot_cor divisive stand.euc mixvmf hclust binary complete hclust correlationmixvmfVA mcquitty affprop cosine hclust pearson mcquitty hclust pearson average hclust correlation complete hclust pearson complete hclust correlation average hclust binary average hclust binary mcquitty spec_man spec_cos spec_mink spec_euc spec_max mspec_mink spec_canb mspec_man mspec_max mspec_cos mspec_canb mspec_euc kmeans pearson
hclust pearson single hclust pearson median hclust correlation single mec hclust correlation median hclust binary single som hclustpearson correlation centroid rock hclust centroid hclust binary median hclust canberra single hclust spearman complete biclust_spectral hclust canberra kmeans kendall median kmeans spearman kmeans manhattan kmeans canberra hclust binary centroid hclust kendall single hclust spearman centroid hclust kendall centroid average average hclust spearman median kendall median hclust spearman single hclust kendall mcquitty hclust spearman mcquitty hclust kendall complete hclust canberra centroid kmedoids manhattan hclust manhattan centroid hclust manhattan median hclust manhattan average affprop manhattan hclust euclidean single hclust manhattan single hclust euclidean median divisive manhattan hclust maximum single hclust euclidean centroid hclust euclidean average hclust manhattan mcquitty clust_convex hclust euclidean mcquitty kmedoids euclidean hclustmaximum maximum centroidaffprop euclidean hclust median divisive euclidean hclust maximum average hclust maximum complete hclust euclidean complete hclust maximum hclust manhattan complete mcquitty
affprop maximum
hclust correlation ward kmedoids hclust pearson wardstand.euc hclust canberra mcquitty
dist_ebinary dist_binary dist_fbinary dist_minkowski dist_canb dist_max dist_cos dismea hclust manhattan ward hclust canberra complete hclust binary ward
affprop info.costs
spearman ward hclusthclust kendall ward hclust maximum ward kmeans maximum kmeans binary
18 / 23
Example Discovery
mult_dirproc kmeans correlation hclust canberra ward sot_cor divisive stand.euc mixvmf hclust binary complete hclust correlationmixvmfVA mcquitty affprop cosine hclust pearson mcquitty hclust pearson average hclust correlation complete hclust pearson complete hclust correlation average hclust binary average hclust binary mcquitty spec_man spec_cos spec_mink spec_euc spec_max mspec_mink spec_canb mspec_man mspec_max mspec_cos mspec_canb mspec_euc kmeans pearson
hclust pearson single hclust pearson median hclust correlation single mec hclust correlation median hclust binary single som hclustpearson correlation centroid rock hclust centroid hclust binary median hclust canberra single hclust spearman complete biclust_spectral hclust canberra kmeans kendall median
affprop maximum
kmeans spearman
kmeans manhattan
kmeans canberra hclust binary centroid hclust kendall single hclust spearman centroid hclust kendall centroid average average hclust spearman median kendall median hclust spearman single hclust kendall mcquitty hclust spearman mcquitty hclust kendall complete hclust canberra centroid kmedoids manhattan hclust manhattan centroid hclust manhattan median hclust manhattan average affprop manhattan hclust euclidean single hclust manhattan single hclust euclidean median divisive manhattan hclust maximum single hclust euclidean centroid hclust euclidean average hclust manhattan mcquitty clust_convex hclust euclidean mcquitty kmedoids euclidean hclustmaximum maximum centroidaffprop euclidean hclust median divisive euclidean hclust maximum average hclust maximum complete hclust euclidean complete hclust maximum hclust manhattan complete mcquitty
hclust correlation ward kmedoids hclust pearson wardstand.euc hclust canberra mcquitty
dist_ebinary dist_binary dist_fbinary dist_minkowski dist_canb dist_max dist_cos dismea hclust manhattan ward hclust canberra complete hclust binary ward
affprop info.costs
spearman ward hclusthclust kendall ward hclust maximum ward kmeans maximum kmeans binary
18 / 23
Example Discovery
mult_dirproc kmeans correlation hclust canberra ward sot_cor divisive stand.euc mixvmf hclust binary complete hclust correlationmixvmfVA mcquitty affprop cosine hclust pearson mcquitty hclust pearson average hclust correlation complete hclust pearson complete hclust correlation average hclust binary average hclust binary mcquitty spec_man spec_cos spec_mink spec_euc spec_max mspec_mink spec_canb mspec_man mspec_max mspec_cos mspec_canb mspec_euc kmeans pearson
hclust pearson single hclust pearson median hclust correlation single mec hclust correlation median hclust binary single som hclustpearson correlation centroid rock hclust centroid hclust binary median hclust canberra single hclust spearman complete biclust_spectral hclust canberra kmeans kendall median
affprop maximum
kmeans spearman
kmeans manhattan
kmeans canberra hclust binary centroid hclust kendall single hclust spearman centroid hclust kendall centroid average average hclust spearman median kendall median hclust spearman single hclust kendall mcquitty hclust spearman mcquitty hclust kendall complete hclust canberra centroid kmedoids manhattan hclust manhattan centroid hclust manhattan median hclust manhattan average affprop manhattan hclust euclidean single hclust manhattan single hclust euclidean median divisive manhattan hclust maximum single hclust euclidean centroid hclust euclidean average hclust manhattan mcquitty clust_convex hclust euclidean mcquitty kmedoids euclidean hclustmaximum maximum centroidaffprop euclidean hclust median divisive euclidean hclust maximum average hclust maximum complete hclust euclidean complete hclust maximum hclust manhattan complete mcquitty
hclust correlation ward kmedoids hclust pearson wardstand.euc hclust canberra mcquitty
dist_ebinary dist_binary dist_fbinary dist_minkowski dist_canb dist_max dist_cos dismea hclust manhattan ward hclust canberra complete hclust binary ward
affprop info.costs
spearman ward hclusthclust kendall ward hclust maximum ward kmeans maximum kmeans binary
18 / 23
Example Discovery
mult_dirproc kmeans correlation hclust canberra ward sot_cor divisive stand.euc mixvmf hclust binary complete hclust correlationmixvmfVA mcquitty affprop cosine hclust pearson mcquitty hclust pearson average hclust correlation complete hclust pearson complete hclust correlation average hclust binary average hclust binary mcquitty spec_man spec_cos spec_mink spec_euc spec_max mspec_mink spec_canb mspec_man mspec_max mspec_cos mspec_canb mspec_euc kmeans pearson
hclust pearson single hclust pearson median hclust correlation single mec hclust correlation median hclust binary single som hclustpearson correlation centroid rock hclust centroid hclust binary median hclust canberra single hclust spearman complete biclust_spectral hclust canberra kmeans kendall median kmeans spearman kmeans manhattan kmeans canberra hclust binary centroid hclust kendall single hclust spearman centroid hclust kendall centroid average average hclust spearman median kendall median hclust spearman single hclust kendall mcquitty hclust spearman mcquitty hclust kendall complete hclust canberra centroid kmedoids manhattan hclust manhattan centroid hclust manhattan median hclust manhattan average affprop manhattan hclust euclidean single hclust manhattan single hclust euclidean median divisive manhattan hclust maximum single hclust euclidean centroid hclust euclidean average hclust manhattan mcquitty clust_convex hclust euclidean mcquitty kmedoids euclidean hclustmaximum maximum centroidaffprop euclidean hclust median divisive euclidean hclust maximum average hclust maximum complete hclust euclidean complete hclust maximum hclust manhattan complete mcquitty
affprop maximum
hclust correlation ward kmedoids hclust pearson wardstand.euc hclust canberra mcquitty
dist_ebinary dist_binary dist_fbinary dist_minkowski dist_canb dist_max dist_cos dismea hclust manhattan ward hclust canberra complete hclust binary ward
affprop info.costs
spearman ward hclusthclust kendall ward hclust maximum ward kmeans maximum kmeans binary
18 / 23
Example Discovery
[Figure: map of the clusterings produced by existing clustering methods, each point labeled by its method (e.g., hclust canberra mcquitty, hclust correlation ward, kmeans pearson, kmedoids euclidean, spectral variants spec_* and mspec_*, affprop, mixvmf, som, rock); one point is labeled "Mayhew"]
Mixture:
0.39 Hclust-Canberra-McQuitty
0.30 Spectral clustering Random Walk (Metrics 1-6)
0.13 Hclust-Correlation-Ward
0.09 Hclust-Pearson-Ward
0.05 Kmedoids-Cosine
0.04 Spectral clustering Symmetric (Metrics 1-6)
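The mixture weights listed above combine several existing clusterings into one. A minimal sketch of how such a weighted combination could work, assuming each clustering is represented by its document co-membership matrix and the mixture is a weighted average of those matrices. The toy partitions and weights below are hypothetical, and the final step (linking documents whose averaged similarity exceeds a threshold, then taking connected components) is an illustrative choice of ours, not necessarily the procedure Grimmer and King use.

```python
import numpy as np

def comembership(labels):
    """n x n matrix: 1.0 where documents i and j share a cluster, else 0.0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def weighted_ensemble(clusterings, weights):
    """Weighted average of co-membership matrices (assumed mixture form)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the mixture weights sum to 1
    mats = [comembership(c) for c in clusterings]
    return sum(w * m for w, m in zip(weights, mats))

def partition_from_similarity(sim, threshold=0.5):
    """Hypothetical final step: link i and j when sim[i, j] > threshold,
    then label each connected component as one cluster."""
    n = sim.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over linked documents
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i, j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Three hypothetical clusterings of five documents, with hypothetical weights
clusterings = [[0, 0, 1, 1, 2], [0, 0, 0, 1, 1], [0, 1, 1, 2, 2]]
weights = [0.6, 0.3, 0.1]
sim = weighted_ensemble(clusterings, weights)
print(partition_from_similarity(sim))
```

Because the first clustering carries most of the weight, the combined partition here stays close to it; shifting weight toward other components moves the result toward their groupings.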
Gary King (Harvard, IQSS) Quantitative Discovery from Text 18 / 23
Credit Claiming, Pork: Sens. Frank R. Lautenberg (D-NJ) and Robert Menendez (D-NJ) announced that the U.S. Department of Commerce has awarded a $100,000 grant to the South Jersey Economic Development District
Credit Claiming, Legislation: As the Senate begins its recess, Senator Frank Lautenberg today pointed to a string of victories in Congress on his legislative agenda during this work period
Advertising: Senate Adopts Lautenberg/Menendez Resolution Honoring Spelling Bee Champion from New Jersey
Partisan Taunting: Senator Lautenberg's amendment would change the name of "...the Republican bill..." to "More Tax Breaks for the Rich and More Debt for Our Grandchildren Deficit Expansion Reconciliation Act of 2006"
Definition: Explicit, public, and negative attacks on another political party or its members
[Figure: histogram (vertical axis: Frequency)]
2) Measurement
Quantitative Methods
3) Validation
Quantitative methods for conceptualization: aiding discovery
 - Few formal methods designed explicitly for conceptualization
 - Belittled: "Tom Swift and His Electric Factor Analysis Machine" (Armstrong 1967)
 - Evaluation methods measure progress in discovery
http://GKing.Harvard.edu