Heap implementation:
Insert: (heap) add the element at the end of the array, then heapify-up.
Remove: (heap) -> only extract the next-priority element (at the root position)
Delete the ith element H[i]: this creates a hole, leaving n-1 elements.
Move the nth (last) element into position i, so the n-1 elements again fill the array contiguously
within bounds.
If H[i] is now smaller than its parent, heapify-up (for a min-heap).
If H[i] is larger than one of its children, call heapify-down.
o Heapify-down:
Swap the current element with the smaller of its two children.
The current element is now at that child’s position; repeat.
Else: it is already in the correct position, so nothing needs fixing.
Keep a ‘hash table’ or similar with (key, value) = (node ID, array position in heap) for instant
retrieval of an element’s position.
o I.e. cache each element’s position.
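The deletion procedure and position cache described above can be sketched as follows. This is a minimal Python sketch; the class and method names are my own, not from the textbook.

```python
class IndexedMinHeap:
    """Min-heap that also caches each key's array position,
    so deleting an arbitrary element is O(log n)."""

    def __init__(self):
        self.h = []      # list of (priority, key)
        self.pos = {}    # key -> index in self.h (the position cache)

    def _swap(self, i, j):
        self.h[i], self.h[j] = self.h[j], self.h[i]
        self.pos[self.h[i][1]] = i
        self.pos[self.h[j][1]] = j

    def _up(self, i):
        # While smaller than the parent, move up (min-heap).
        while i > 0 and self.h[i][0] < self.h[(i - 1) // 2][0]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _down(self, i):
        # While larger than the smaller child, move down.
        n = len(self.h)
        while True:
            small = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < n and self.h[c][0] < self.h[small][0]:
                    small = c
            if small == i:
                return
            self._swap(i, small)
            i = small

    def insert(self, key, priority):
        self.h.append((priority, key))
        self.pos[key] = len(self.h) - 1
        self._up(len(self.h) - 1)

    def extract_min(self):
        return self.delete(self.h[0][1])

    def delete(self, key):
        i = self.pos.pop(key)
        removed = self.h[i]
        last = self.h.pop()      # move the nth element into the hole...
        if i < len(self.h):
            self.h[i] = last
            self.pos[last[1]] = i
            self._up(i)          # ...then fix up or down as needed
            self._down(i)
        return removed
```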
Use cases:
Stable matching: need a dynamic set from which we repeatedly extract the next highest-priority element
o I.e. useful in greedy algorithms (which need a global view).
Sorting: insert all n elements, then extract the minimum n times: 2n heap operations of O(log n) each, i.e. O(n log n)
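The sorting use case (n inserts followed by n removals) can be sketched with Python's standard `heapq` module:

```python
import heapq

def heap_sort(items):
    """Sort via a heap: n inserts, then n extract-mins, each O(log n)."""
    h = []
    for x in items:
        heapq.heappush(h, x)       # n inserts
    return [heapq.heappop(h) for _ in range(len(h))]  # n removals, in order
```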
Dijkstra’s algorithm:
From the set of visited (explored) nodes, we move to the next node, as chosen greedily (smallest tentative distance).
Any alternate path to this node must leave the explored set at some earlier point, where it is already at least as long; with non-negative weights it can only get longer.
(Since this argument works for the nth sub-path, it works for all of them.)
This is the same argument as the ‘loop’ argument.
Proof breaks if: negative weights.
Runtime:
Using a priority queue, for n nodes and m edges, you can run the alg in time: O(m log(n)).
o You process each of the m edges once, and every node is extracted once.
o The heap holds at most one entry per node, so each operation costs O(log n).
o (Why isn’t this optimistic? Each edge relaxation triggers at most one heap update, so there
are O(m + n) heap operations in total, and m ≥ n - 1 for a connected graph.)
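A sketch of Dijkstra with a binary heap, using Python's `heapq`. Instead of a decrease-key operation it pushes duplicate entries and skips stale ones on extraction, which keeps the same O(m log n) bound; the adjacency-list format is an assumption for illustration.

```python
import heapq

def dijkstra(adj, source):
    """adj: {node: [(neighbor, weight), ...]} with non-negative weights.
    Returns shortest-path distances from source."""
    dist = {source: 0}
    pq = [(0, source)]                      # (tentative distance, node)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                        # stale entry; skip it
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd                # relax the edge...
                heapq.heappush(pq, (nd, v)) # ...one heap update per relaxation
    return dist
```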
Kruskal’s algorithm:
o List of edges -> sort them by weight, ascending order
o Start with a graph of completely unconnected nodes
o Take each next edge and add it if it doesn’t create a cycle (i.e. its two endpoints must lie
in different components)
o (Alternate version, ‘reverse-delete’: start with the graph and all its connections, and each
iteration delete the most expensive edge whose removal doesn’t disconnect the graph)
Prim’s algorithm:
o Identical in structure to Dijkstra, but the key is different (the single edge cost, not the
accumulated path cost)
o Proof is also the same.
(skip details…)
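To make the Dijkstra/Prim parallel concrete: the loop below is the same as Dijkstra's, except the heap key is the single edge weight rather than the accumulated path cost. A minimal Python sketch; the adjacency format and function name are assumptions.

```python
import heapq

def prim(adj, start):
    """adj: {node: [(neighbor, weight), ...]}, undirected.
    Returns the total weight of an MST of start's component."""
    visited = {start}
    pq = [(w, v) for v, w in adj[start]]    # key = edge weight alone
    heapq.heapify(pq)
    total = 0
    while pq:
        w, u = heapq.heappop(pq)
        if u in visited:
            continue                        # edge leads back into the tree
        visited.add(u)
        total += w
        for v, w2 in adj[u]:
            if v not in visited:
                heapq.heappush(pq, (w2, v))
    return total
```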
Union-Find data structure:
Efficiently finds the identities of the connected components containing two nodes, v and w (to
identify cycles)
Manages disjoint sets and the merging of two sets into a single set.
o Disjoint set => a component. All nodes within that component share a common name.
o Find(u): find name of component containing node u.
Can only be used to maintain components as we add edges – can’t handle edge deletion.
Data structure: array, or hash table, with (key, value) = (node, component name), i.e. Component[node] points to the set’s current name.
Union (function):
Worst case is O(n): relabelling a large set.
Need the amortised run time to be better: for any sequence of k Union operations, desire O(k log k).
o Bound the run time over many operations as opposed to one.
Nodes u and v: say we keep v’s component name, then update u’s representative to point to v.
o Union operation: O(1), Find operation: O(log(n)): each time we follow a pointer upward, the
size of the set below (initially 1) has at least doubled, which can happen at most log(n)
times… provided we always point the smaller set at the larger.
Further improvements:
o After a Find operation: “compress the path we followed after every Find operation by
resetting all pointers along the path to point to the current name of the set.”
o I.e. the 1st Find along a long path is expensive. But after you call it, backtrack along the
path and change all the pointers to point directly at the root as well. This bounds the total
search cost (over k operations)… “very close to linear in n” (the precise bound involves the
inverse Ackermann function).
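The pointer structure with union-by-size and path compression might look like this. A minimal Python sketch; class and method names are my own, not from the textbook.

```python
class UnionFind:
    """Disjoint sets with union by size and path compression."""

    def __init__(self, nodes):
        self.parent = {u: u for u in nodes}   # each node starts as its own set
        self.size = {u: 1 for u in nodes}

    def find(self, u):
        root = u
        while self.parent[root] != root:      # walk up to the set's name
            root = self.parent[root]
        while self.parent[u] != root:         # path compression: repoint the
            self.parent[u], u = root, self.parent[u]  # whole path at the root
        return root

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False                      # already in the same set
        if self.size[ru] > self.size[rv]:     # point the smaller set at the larger
            ru, rv = rv, ru
        self.parent[ru] = rv
        self.size[rv] += self.size[ru]
        return True
```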
Use Find() to check if the edge nodes belong to different (unconnected) components.
If you add the edge, call Union() to join the two sets together.
NOTE: Find() is called a constant number of times (twice) per edge, and Union() at most once per
edge, so the O(m log n) bound makes sense. (We also get this bound from the initial sorting of the edges.)
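Putting the pieces together, Kruskal with a deliberately minimal union-find for the cycle check could look like the sketch below (names are illustrative, not from the textbook; compression and union-by-size are omitted here for brevity).

```python
def kruskal(nodes, edges):
    """edges: list of (weight, u, v). Returns the list of MST edges.
    Sorting dominates: O(m log n); each edge then costs two Finds
    and possibly one Union."""
    parent = {u: u for u in nodes}

    def find(u):                      # minimal union-find, no compression
        while parent[u] != u:
            u = parent[u]
        return u

    mst = []
    for w, u, v in sorted(edges):     # ascending weight order
        ru, rv = find(u), find(v)
        if ru != rv:                  # different components => no cycle
            parent[ru] = rv           # Union
            mst.append((w, u, v))
    return mst
```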
Clustering: p157
K-clustering:
(proof: the solution can’t be “inverted” and improved, because we’re greedy: the shortest edges were considered first, so any alternative clustering must separate two points already joined by one of those short edges).