Greedy Method for Graphs: please read pages 57-64 and 137-161 in the Kleinberg and Tardos textbook.

Priority Queues (p57):

 O(log n): add / delete / select min key

Naïve implementation: one of the operations must be O(n)

Heap implementation:

 A balanced binary tree
o Can be implemented with linked nodes or within an array
 Keys are in heap order if each child’s key >= its parent’s key (min-heap).
 Array implementation:
o Starting from zero-index:
 Children are at: 2i+1, 2i+2
 Parent at: floor((i-1)/2)
o Starting from index 1:
 Children are: left: 2i, right: 2i+1
 Parent: floor(i/2)
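
A minimal sketch of the zero-indexed arithmetic in Python (the helper names are my own, not the textbook’s):

# Zero-indexed array heap: index arithmetic only.
def left_child(i):
    return 2 * i + 1

def right_child(i):
    return 2 * i + 2

def parent(i):
    return (i - 1) // 2   # floor((i-1)/2)

# Sanity check: both children of node 5 point back to it.
assert parent(left_child(5)) == 5 and parent(right_child(5)) == 5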

Insert: (heap)

 Add to the end of the array
 Heapify-up() to restore heap order: O(log n)
o On the tree, view the path from the root down to the new leaf node as a linked list.
o Then sort that linked list: you know at most one element is out of order in it.
o Basically like one pass of Bubble sort: get the parent, compare, and swap. The current
element is now at the parent’s position.
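
A minimal sketch of insert with heapify-up on a plain Python list, assuming a zero-indexed min-heap (my own code, not the textbook’s):

def heapify_up(heap, i):
    # Swap with the parent while the key violates min-heap order.
    while i > 0:
        p = (i - 1) // 2
        if heap[i] < heap[p]:
            heap[i], heap[p] = heap[p], heap[i]
            i = p                      # current element now at the parent's position
        else:
            break                      # heap order restored

def heap_insert(heap, key):
    heap.append(key)                   # add to end of array
    heapify_up(heap, len(heap) - 1)    # O(log n) swaps up the root path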

Remove: (heap) -> only extract the next priority element (at the root position)

Delete: (heap): more generalized, delete the ith element

 Deleting the ith element H[i] creates a hole, and there are now n-1 elements.
 Move the nth element into the ith position (so the n-1 remaining elements again sit
contiguously within the array bounds).
 If H[i] is now smaller than its parent, call heapify-up (for a min-heap).
 If H[i] is larger than either of its children, call heapify-down.
o Heapify-down:
 Choose the smaller of the two children, and swap
 Current element now at the child’s position.
 Else: already in correct position: no need to fix anything up.
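
A sketch of the generalized delete, reusing heapify_up from above (zero-indexed; the textbook presents a one-indexed version):

def heapify_down(heap, i):
    # Sink heap[i] until neither child is smaller.
    n = len(heap)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        if left < n and heap[left] < heap[smallest]:
            smallest = left
        if right < n and heap[right] < heap[smallest]:
            smallest = right
        if smallest == i:
            break                      # already in the correct position
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest                   # current element now at the child's position

def heap_delete(heap, i):
    heap[i] = heap[-1]                 # move the last element into the hole
    heap.pop()                         # n-1 elements remain within array bounds
    if i < len(heap):                  # (skip if we deleted the last slot itself)
        heapify_up(heap, i)            # fires only if smaller than its parent
        heapify_down(heap, i)          # fires only if larger than a child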

Retrieve a specific element (without necessarily wanting to search for it):

 Have a ‘hash table’ or similar with (key, value) = (node ID, array position in heap) for instant
retrieval of the position.
o I.e. cache the position locations.
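
A sketch of that cache, assuming heap entries are stored as (key, node_id) tuples (my convention, not the textbook’s); every swap inside heapify-up/down must go through it:

pos = {}                               # node ID -> current array index

def swap(heap, pos, i, j):
    heap[i], heap[j] = heap[j], heap[i]
    pos[heap[i][1]] = i                # keep the cached positions in sync
    pos[heap[j][1]] = j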

Use cases:

 Stable matching: need a dynamic set from which to get the next highest-priority element each time
o I.e. used in greedy algorithms (that need a global view).
 Sorting: 2n operations of O(log n) each, i.e. O(n log n): insert every element, then extract the
minimum n times; see the sketch below.
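
A sketch using Python’s standard heapq module as the priority queue:

import heapq

def heap_sort(items):
    heap = []
    for x in items:
        heapq.heappush(heap, x)        # n inserts, O(log n) each
    # n extract-mins, O(log n) each: 2n operations in total
    return [heapq.heappop(heap) for _ in range(len(heap))]

assert heap_sort([3, 1, 2]) == [1, 2, 3]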

Shortest paths in a graph (p137):

Dijkstra’s algorithm:

 Maintain a set of explored nodes (initially empty)
 Get the next smallest path (to the next unexplored node): accumulated total cost + edge length
 Add that node to the explored set and continue (push its neighbours’ new candidate distances
into the PQ as well)
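
A sketch of those steps with heapq; the adjacency-dict graph format {u: [(v, length), ...]} is an assumption of mine:

import heapq

def dijkstra(graph, source):
    dist = {}                          # explored nodes -> shortest distance
    pq = [(0, source)]                 # (accumulated cost, node)
    while pq:
        d, u = heapq.heappop(pq)       # next smallest path
        if u in dist:
            continue                   # stale entry: u was already explored
        dist[u] = d                    # add u to the set of explored nodes
        for v, length in graph.get(u, []):
            if v not in dist:
                heapq.heappush(pq, (d + length, v))  # total cost + edge length
    return dist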

Proof that it works (by induction):

 From the explored set, we go to the next node as chosen by the greedy rule.
 Any alternate path to this node must leave the explored set somewhere, and it is already at
least as long by the time it does so, so it cannot be shorter in total.
 (Since this argument works for the nth sub-path, it works for all of them by induction.)
 This is the same argument as the ‘loop’ argument.
Proof breaks if there are negative edge weights (an alternate path could become shorter after
leaving the explored set).

Dijkstra: continuous version of a BFS (same idea of expanding water).

Runtime:

 Using a priority queue, for n nodes and m edges, you can run the alg in time O(m log n).
o You process each of the m edges once, and by the end you’ll have seen all the nodes.
o The heap stores an entry per node, so each operation costs O(log n).
o (Even with lazy deletion, where the PQ can hold up to m entries and there are more
than n iterations, O(m log m) = O(m log n), since m <= n^2.)

Min Spanning Tree:

 Kruskal’s algorithm:
o List of edges -> sort them by weight, ascending order
o Have a graph with unconnected nodes
o Get each next edge, and add it if it doesn’t create a cycle (i.e. its two endpoints must lie
in different components)
o (Alternate version, reverse-delete: start with the full graph and each iteration delete the
most expensive edge that doesn’t disconnect it)
 Prim’s algorithm:
o Identical to Dijkstra, but the cost function is different (edge cost, not the accumulated
path cost)
o Proof is also the same.
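
A sketch showing just how little changes from the Dijkstra code above: the key pushed into the PQ is the edge cost alone (same assumed graph format):

import heapq

def prim(graph, source):
    in_tree = set()
    pq = [(0, source, None)]           # (edge cost, node, parent in tree)
    tree_edges = []
    while pq:
        cost, u, parent = heapq.heappop(pq)
        if u in in_tree:
            continue                   # stale entry
        in_tree.add(u)
        if parent is not None:
            tree_edges.append((parent, u, cost))
        for v, w in graph.get(u, []):
            if v not in in_tree:
                heapq.heappush(pq, (w, v, u))  # edge cost only, not cost + w
    return tree_edges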

(skip details…)

Union Find: an efficient form of Kruskal’s

 Efficiently finds the identities of the connected components containing two nodes, v and w (to
identify cycles)
 Manages disjoint sets and the merging of two sets into a single set.
o Disjoint set => a component. All nodes within that component share a common name.
o Find(u): find name of component containing node u.
 Can only be used to maintain components as we add edges – can’t handle edge deletion.

Functions: MakeUnionFind: O(n), Find, Union (i.e. combine two components)

Data structure: array, or hash table, where (key, value) => Component[node] = component name (ptr).

 Explicitly maintain the list of elements in each set (so the smaller set is easy to relabel)
 Union: keep one of the two sets’ component names (use the larger set’s name).
 Initially every element is in its own set.

Union (function):
 Worst case is O(n): for large sets.
 Need the average run time to be better: for k Union operations, desire O(k log k).
o Bound the run time for many operations as opposed to one.
 Nodes u and v: say we want v’s component name to be kept; then update the other set’s
name pointer to point to v.
o Union operation: O(1); Find operation: O(log n): the set containing a node (initially size
1) at least doubles each time the node’s pointer is updated, so at most log(n) pointers
are followed, …provided you always point the smaller set at the larger.
 Further improvements:
o After a Find operation: “compress the path we followed after every Find operation by
resetting all pointers along the path to point to the current name of the set.”
o I.e. the 1st Find along a path is expensive, but after you call it, backtrack along the path
and change all the pointers to point directly at the set’s name node. This bounds the
total cost of k Find operations at… “very close to linear in n”; see the sketch below.
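
A sketch of the whole structure: parent pointers in a dict, union by size, and path compression inside Find. (This is the forest version; the textbook’s named-set presentation differs in detail.)

class UnionFind:
    def __init__(self, nodes):                 # MakeUnionFind: O(n)
        self.parent = {u: u for u in nodes}    # initially, each node is its own set
        self.size = {u: 1 for u in nodes}

    def find(self, u):
        root = u
        while self.parent[root] != root:       # walk up to the set's name
            root = self.parent[root]
        while self.parent[u] != root:          # path compression: repoint
            self.parent[u], u = root, self.parent[u]   # everything at the root
        return root

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return                             # already the same component
        if self.size[ru] > self.size[rv]:
            ru, rv = rv, ru                    # keep the larger set's name
        self.parent[ru] = rv
        self.size[rv] += self.size[ru]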

Union Find in context of Kruskal’s:

 Use Find() to check if the edge nodes belong to different (unconnected) components.
 If you add the edge, call Union() to join the two sets together.
 NOTE: Find() is called twice per edge (2m times) and Union() at most n-1 times, so the union-
find work is O(m log n). (The initial sorting of edges also costs O(m log m) = O(m log n).)
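
A sketch of Kruskal’s on top of the UnionFind class above; the (weight, u, v) edge format is an assumption:

def kruskal(nodes, edges):
    uf = UnionFind(nodes)
    tree = []
    for w, u, v in sorted(edges):              # sort by weight, ascending
        if uf.find(u) != uf.find(v):           # different components => no cycle
            uf.union(u, v)                     # join the two sets
            tree.append((u, v, w))
    return tree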

Clustering: p157

 Have a bunch of objects; we want to measure the similarity of these objects.
 Use a ‘distance function’: more ‘distance’ => less similar (think of distance as meaning how
foreign two objects are).

K-clustering:

 Divide a bunch of objects into k groups.
 Spacing of a k-clustering = the minimum distance between any two points in different clusters.
 Want dissimilar / far-away points to be in separate clusters, i.e. maximize the spacing.
 Solution: run Kruskal’s algorithm (starting with n elements -> n clusters) until we have k clusters
remaining (then stop); see the sketch below.
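
A sketch of that stopped-early Kruskal’s, reusing the UnionFind class above (the (weight, u, v) edge format is again an assumption):

def k_clustering(nodes, edges, k):
    uf = UnionFind(nodes)
    clusters = len(nodes)                      # n elements -> n clusters
    for w, u, v in sorted(edges):              # shortest distances first
        if uf.find(u) != uf.find(v):
            if clusters == k:
                return uf, w                   # w = spacing: first distance
                                               # between two remaining clusters
            uf.union(u, v)
            clusters -= 1
    return uf, None                            # ran out of edges before k clusters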

(Proof sketch: any other k-clustering must separate two points that Kruskal’s put in the same
cluster; some edge on the tree path between them then crosses that clustering’s boundary, and
every edge we added is no longer than our spacing, because the greedy order considered the
shortest edges first. So no other clustering can have larger spacing.)
