Kyle Jamieson
[Figure: knowledge-graph example. A patient presents with abdominal pain; edges such as "diagnoses", "contains", and "also purchased" connect related entities. Diagnosis? An E. coli infection.]
Big Data is Everywhere
MapReduce / Hadoop
[Figure: a collection of pictures, each with a vector of numeric image features.]
MapReduce Map Phase
[Figure: the map phase partitions the pictures across machines (CPU 1, CPU 2); each CPU computes image features and independently labels each picture Indoor (I) or Outdoor (O).]
Map-Reduce for Data-Parallel ML
Excellent for large data-parallel tasks!

Data-Parallel (Map Reduce): Feature Extraction, Algorithm Tuning, Basic Data Processing

Graph-Parallel (is there more to Machine Learning?): Lasso, Label Propagation, Kernel Methods, Belief Propagation, Tensor Factorization, PageRank, Deep Belief Networks, Neural Networks
Exploiting Dependencies
Graphs are Everywhere
Examples: social networks; collaborative filtering (Netflix: users and movies); text analysis (docs and words, Wikipedia).
Concrete Example: Label Propagation
Label Propagation Algorithm

Social arithmetic: my interests are 50% what I list on my profile, 40% what Sue Ann likes, and 10% what Carlos likes.
- Profile: 50% cameras, 50% biking
- Sue Ann likes: 80% cameras, 20% biking
- Carlos likes: 30% cameras, 70% biking
- Result, I like: 60% cameras, 40% biking

Recurrence algorithm:
  Likes[i] = Σ_{j ∈ Friends[i]} W_ij × Likes[j]
  iterate until convergence

Parallelism: compute all Likes[i] in parallel
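The recurrence above can be sketched in a few lines of Python. This is an illustrative toy, not GraphLab's API: the variable names, the dict layout, and treating my own profile as a weight-0.5 "neighbor" (as the slide does) are all assumptions made for the example.

```python
# Sketch of the label-propagation recurrence:
#   Likes[i] = sum over j in Friends[i] of W[i][j] * Likes[j]
# Interest vectors are (cameras, biking). The slide's social
# arithmetic treats my own profile as a neighbor with weight 0.5.

likes = {
    "profile": (0.50, 0.50),   # what I list on my profile
    "sue_ann": (0.80, 0.20),
    "carlos":  (0.30, 0.70),
    "me":      (0.50, 0.50),   # initial estimate
}
weights = {"me": {"profile": 0.5, "sue_ann": 0.4, "carlos": 0.1}}

def update(i):
    """One application of the recurrence at vertex i."""
    cameras = sum(w * likes[j][0] for j, w in weights[i].items())
    biking = sum(w * likes[j][1] for j, w in weights[i].items())
    return (cameras, biking)

likes["me"] = update("me")   # approximately (0.6, 0.4): 60% cameras, 40% biking
```

One pass already reproduces the slide's 60% cameras / 40% biking; in a real graph every vertex would run `update` in parallel, iterating until convergence.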
Properties of Graph Parallel Algorithms
A vertex's value ("what I like") depends on its neighbors' values ("what my friends like"): computation is local but iterative.
Map-Reduce for Data-Parallel ML
Excellent for large data-parallel tasks!

Data-Parallel (MapReduce): Feature Extraction, Algorithm Tuning, Basic Data Processing

Graph-Parallel (MapReduce?): Lasso, Label Propagation, Belief Propagation, Kernel Methods, Tensor Factorization, PageRank, Deep Belief Networks, Neural Networks
Problem: Data Dependencies
MapReduce doesn't efficiently express data dependencies:
- The user must code substantial data transformations
- Costly data replication
MapAbuse: Iterative MapReduce
- Only a subset of the data needs computation in each iteration, yet every iteration reprocesses all of it, with a barrier between iterations.
- The system is not optimized for iteration: every iteration pays a startup penalty, and intermediate state goes through disk (a disk penalty) on each pass.
Data-Parallel (Map Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics

Graph-Parallel: Graphical Models (Gibbs Sampling, Belief Propagation, Variational Opt.), Semi-Supervised Learning (Label Propagation, CoEM), Collaborative Filtering (Tensor Factorization), Graph Analysis (PageRank, Triangle Counting)
A single machine: limited CPU power, limited memory, limited scalability.
Distributed Cloud
Challenges:
- Distribute state
- Keep data consistent
- Provide fault tolerance
The GraphLab Framework
- Graph-based data representation
- Update functions (user computation)
- Consistency model
Data Graph
Data is associated with both vertices and edges.
- Graph: social network
- Vertex data: user profile, current interest estimates
- Edge data: relationship (friend, classmate, relative)
Distributed Data Graph
Partition the graph across multiple machines:
- Ghost vertices maintain the adjacency structure and replicate remote data.
- Cut efficiently using HPC graph-partitioning tools (ParMetis / Scotch / …).
The GraphLab Framework
- Graph-based data representation
- Update functions (user computation)
- Consistency model
Update Function
A user-defined program, applied to a vertex; it transforms the data in the scope of that vertex (the vertex, its edges, and its neighbors).

Pagerank(scope) {
  // Update the current vertex data
  vertex.PageRank = α …
}

The update function is applied (asynchronously), in parallel, across the graph.
[Figure: example graph with vertices a through k; an update at one vertex reads and writes only within its scope.]
The GraphLab Framework
- Graph-based data representation
- Update functions (user computation)
- Consistency model
PageRank Revisited

Pagerank(scope) {
  vertex.PageRank = α
  ForEach inPage:
    vertex.PageRank += (1 − α) × inPage.PageRank
}
PageRank data races confound convergence
Racing PageRank: Bug

Pagerank(scope) {
  vertex.PageRank = α
  ForEach inPage:
    vertex.PageRank += (1 − α) × inPage.PageRank
}

vertex.PageRank is written incrementally, so a concurrent reader can observe a neighbor's PageRank mid-update.
Racing PageRank: Bug Fix

Pagerank(scope) {
  tmp = α
  ForEach inPage:
    tmp += (1 − α) × inPage.PageRank
  vertex.PageRank = tmp
}

Accumulate into a local temporary and write vertex.PageRank once, so readers never see a partial sum.
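The fixed update can be sketched in Python. The tiny three-page graph, the value of alpha, and the iteration count are invented for illustration, and the slide's simplified formula (which omits out-degree normalization) is kept as-is:

```python
# Sketch of the corrected update: accumulate into a local tmp and
# write vertex.PageRank exactly once, so a concurrent reader never
# observes a partially summed value. Graph and alpha are invented;
# the slide's simplified formula (no out-degree normalization) is
# kept as-is.

alpha = 0.15
in_pages = {"a": [], "b": ["a"], "c": ["a", "b"]}  # in-links per page
rank = {v: 1.0 for v in in_pages}

def pagerank_update(v):
    tmp = alpha                       # local accumulator: the fix
    for u in in_pages[v]:
        tmp += (1 - alpha) * rank[u]
    rank[v] = tmp                     # single write at the end

for _ in range(30):                   # iterate until convergence
    for v in in_pages:
        pagerank_update(v)
```

With no in-links, page a settles at α; b and c then settle at α + (1 − α) times the sum of their in-pages' ranks.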
Throughput != Performance
No consistency yields higher throughput (#updates/sec) but potentially slower convergence of the ML algorithm.
Serializability
For every parallel execution, there exists a sequential execution of update functions which produces the same result.
[Figure: timeline; updates interleaved across CPU 1 and CPU 2 in parallel correspond to some sequential order on a single CPU.]
Serializability Example
Execute tasks on all vertices of color 0, then on all vertices of color 1, and so on; vertices of the same color share no edge, so their reads and writes cannot conflict. Stronger / weaker consistency levels are available.
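The color-by-color schedule can be sketched as follows. The four-vertex graph, the greedy coloring routine, and the function names are illustrative, not GraphLab's chromatic engine:

```python
# Sketch of chromatic scheduling: vertices of one color share no
# edges, so each color can run as one parallel round, with a
# barrier between colors. Graph and names are illustrative.

edges = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")}
vertices = ["a", "b", "c", "d"]
adj = {v: set() for v in vertices}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# Greedy coloring: smallest color not used by any neighbor.
color = {}
for v in vertices:
    taken = {color[n] for n in adj[v] if n in color}
    color[v] = next(c for c in range(len(vertices)) if c not in taken)

# No edge may connect two vertices of the same color.
assert all(color[u] != color[v] for u, v in edges)

# Execute update rounds color by color (barrier between rounds).
schedule = {}
for v in vertices:
    schedule.setdefault(color[v], []).append(v)
executed = []
for c in sorted(schedule):
    for v in schedule[c]:    # all of these could run in parallel
        executed.append(v)   # stand-in for update_function(v)
```

For the four-cycle here, two colors suffice, so every sweep of the graph needs only two barriers.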
Matrix Factorization
Netflix Collaborative Filtering
- Alternating Least Squares Matrix Factorization
- Model: 0.5 million nodes, 99 million edges
[Figure: bipartite graph of Netflix users and movies; the ratings matrix is factored into user and movie factors of dimension d.]
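A minimal dense ALS sketch, to show what "alternating least squares" means here: fix the movie factors and solve exactly for the user factors, then swap. The sizes, regularization constant, and dense rank-3 ratings matrix are invented for illustration; the real Netflix problem is sparse and far larger.

```python
import numpy as np

# Toy dense ALS: R is approximated by U @ V.T with d latent factors,
# alternating exact regularized least-squares solves. Sizes, lam,
# and the synthetic rank-3 R are illustrative only.
rng = np.random.default_rng(0)
n_users, n_movies, d, lam = 20, 15, 3, 0.1
R = rng.random((n_users, d)) @ rng.random((d, n_movies))

U = rng.random((n_users, d))
V = rng.random((n_movies, d))
for _ in range(20):
    # Fix V, solve for U; then fix U, solve for V.
    U = np.linalg.solve(V.T @ V + lam * np.eye(d), V.T @ R.T).T
    V = np.linalg.solve(U.T @ U + lam * np.eye(d), U.T @ R).T

rel_err = np.linalg.norm(R - U @ V.T) / np.linalg.norm(R)
```

Each solve only touches one side of the bipartite graph, which is why ALS maps naturally onto vertex update functions over the user/movie graph.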
Netflix Collaborative Filtering
[Figure: scaling on 4 to 64 machines. Left: speedup vs. 4 machines for d = 5 (1.0M cycles), d = 20 (2.1M cycles), d = 50 (7.7M cycles), and d = 100 (30M cycles), against the ideal line. Right: runtime (s) at D = 20 for Hadoop, MPI, and GraphLab.]
Distributed Consistency
Solution 1: Chromatic Engine
- Edge consistency via graph coloring
- Requires a graph coloring to be available
- Frequent barriers are inefficient when only some vertices are active
Solution 2: Distributed Locking
- Edge consistency via reader-writer (RW) locks
Consistency Through Locking
Acquire a write-lock on the center vertex and read-locks on adjacent vertices.
[Figure: timeline; the lock request for scope 1 is processed, scope 1 is acquired, update_function 1 runs, then scope 1 is released.]
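The scope-locking idea can be sketched as below. This is an assumption-laden toy, not GraphLab's implementation: Python's standard library has no reader-writer lock, so plain `threading.Lock`s stand in for both read- and write-locks, and acquiring all scope locks in one canonical (sorted) order is one standard way to keep two overlapping scopes from deadlocking.

```python
import threading

# Sketch of scope locking: write-lock the center vertex and
# read-lock its neighbors. Acquiring every lock in one canonical
# (sorted) order prevents deadlock between overlapping scopes.
# Plain Locks stand in for RW locks (none in Python's stdlib).

adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
locks = {v: threading.Lock() for v in adj}

def run_in_scope(center, update_fn):
    scope = sorted([center] + adj[center])  # canonical global order
    for v in scope:
        locks[v].acquire()
    try:
        update_fn(center)                   # safe: whole scope held
    finally:
        for v in reversed(scope):
            locks[v].release()

updated = []
run_in_scope("a", updated.append)
```

Real RW locks would let two readers of a shared neighbor proceed concurrently; the ordering discipline is what matters for deadlock freedom.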
Pipelining hides latency
GraphLab idea: hide lock-acquisition latency using pipelining.
[Figure: timeline; lock requests for scopes 1, 2, and 3 are issued back to back, and as each scope is acquired its update function runs, overlapping lock acquisition with computation.]
[Figure: snapshot progress vs. time elapsed (s) when one machine is slow.]
Chandy-Lamport checkpointing
Step 1. Atomically, one initiator: (a) turns red, (b) records its own state, (c) sends a marker to its neighbors.
[Figure: snapshots completed vs. time elapsed (s) with one slow machine; the asynchronous snapshot incurs no system performance penalty from the slow machine, unlike the synchronous snapshot.]
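Step 1's marker propagation can be sketched as follows. This is a deliberately minimal single-threaded simulation: the process names and state values are invented, and the full algorithm's recording of in-flight channel messages is omitted to keep the sketch short.

```python
# Minimal sketch of marker propagation in Chandy-Lamport step 1:
# the initiator atomically records its own state and sends a marker
# on every outgoing channel; a process seeing its FIRST marker does
# the same, and later markers are ignored. Recording of in-flight
# channel messages is omitted; processes/state are illustrative.

procs = {"p", "q", "r"}
state = {"p": 3, "q": 7, "r": 1}    # each process's local state
recorded = {}                       # the global snapshot being built

def snapshot(initiator):
    markers = [initiator]           # processes that just got a marker
    while markers:
        proc = markers.pop()
        if proc in recorded:
            continue                # not the first marker: ignore it
        recorded[proc] = state[proc]      # record own state
        markers.extend(procs - {proc})    # send marker on each channel

snapshot("p")
```

Because markers flood every channel, every reachable process records its state exactly once; no process ever has to wait for the slow machine before recording.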
Summary
- Two different methods of achieving consistency: graph coloring, and distributed locking with pipelining
- Efficient implementations
- Asynchronous fault tolerance with fine-grained Chandy-Lamport snapshots
- Usability
Friday precept: Roofnet performance; more graph processing
Monday topic: Streaming Data Processing