CSC 172 Midterm

Fall 2013

Hashing (W. 5.1-5.7)

Overview

Performing insertions, deletions, and searches in constant average time
    o Good for when you're doing a lot of these
Ideal: array of some fixed size containing the items
    o Hash function: converts the data item being stored to a location in the hash table
        Ideally simple to compute, and ensures any two distinct keys get mapped to two different cells
Key: data field of the item that is used in the hashing function (like name in an employee class)
tableSize: should be prime; taking the key mod a prime spreads keys more evenly (with a composite size, keys that share a factor with it land on only a fraction of the cells)

Hash Function

If key is an integer, key mod tableSize is usually reasonable (super easy to compute, uniform), but not always ideal (e.g. if all keys are multiples of the tableSize, every key hashes to 0)
String keys: add up the ASCII (or Unicode) values, mod tableSize
    o If keys are fairly short and the table size is relatively large, then keys bunch up near the start of the table and the positions at the end of the table will not be used
    o A good hashing function spreads keys uniformly over the table, but is more costly to compute
        Not necessarily the best (costly computation), but has good distribution
    o Better hashing function: value = value * 37 + key[i], applied through the string, then mod tableSize
        Deal with a negative result (from integer overflow): value += tableSize
        37: just a prime number that's large enough to space out the factors but not so large that computation takes a long time

Collision Resolution Techniques

Separate Chaining: array of lists; when an item is hashed to a location, add it to that list; either linked or array lists can be used; expected list size is lambda (elements/tableSize)
Linear Probing: on a collision, increase the position by one (wrapping around to the start of the table) until an empty spot is found
Quadratic Probing: a quadratic probing function (i^2) is added to the hash until an empty spot is found. Note: a new element can always be added if the table size is prime and the table is at least half empty
Double Hashing: a second hash function is applied to the key, and i * hash2(x) is added to the hash on the i-th probe
Updated: 14 Jan 2014 10:27 AM

    o The second hash function should never evaluate to zero. Good function: hash2(x) = R - (x mod R), with R a prime smaller than tableSize
General Rule: often beneficial to double the table size (and find the next prime) when lambda >= 0.5; lambda should stay below 0.5 for any table that doesn't use separate chaining
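The string hash and the double-hashing probe described above can be sketched as follows. This is a minimal illustration, not the textbook's code: the names `string_hash`, `hash2`, and `insert_double_hashing`, and the default R = 7, are assumptions made for the example.

```python
def string_hash(key: str, table_size: int) -> int:
    """Polynomial hash: value = value * 37 + code of each character."""
    value = 0
    for ch in key:
        value = value * 37 + ord(ch)
    # Python integers never overflow, so the "value += tableSize" fix-up
    # needed with fixed-width integers is not required here.
    return value % table_size


def hash2(x: int, r: int) -> int:
    """Second hash for double hashing; the result is in 1..r, never zero."""
    return r - (x % r)


def insert_double_hashing(table: list, x: int, r: int = 7) -> int:
    """Insert integer key x into an open-addressing table; return its slot.

    Probe i examines (x + i * hash2(x)) mod tableSize.
    Assumes len(table) is prime and r is a prime smaller than len(table).
    """
    size = len(table)
    step = hash2(x, r)
    for i in range(size):
        slot = (x % size + i * step) % size
        if table[slot] is None or table[slot] == x:
            table[slot] = x
            return slot
    raise RuntimeError("no free slot found")
```

With a table of size 11, the keys 22, 33, and 44 all hash to cell 0, but their different hash2 values send them to different cells instead of clustering next to each other.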

Heap and Priority Queues (W 6.1-6.4, 6.9)

Model: set with priorities, comparable with < or >
Operations: insert and delete min or max (depends on min heap or max heap)
To implement, can use a sorted linked list, but not as efficient: O(1) delete, linear insert
Heap: complete binary tree with a partial sort order (the heap-order property holds between each parent and its children, i.e. in a max heap the root of each subtree is bigger than its children; in a min heap it is smaller)
    o Can be represented as an array because the structure is so regular: for the node at position i, leftChild: 2i, rightChild: 2i + 1, parent: floor(i/2)
        Ignore index 0
Heapify: transform a regular array into a heap: start at size/2, decrementing by one, and bubble down along the way; linear
Array implementation: insert is O(log N) in the worst case and delete min/max is O(log N); only finding the min/max is constant
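The array layout and the linear-time heapify step can be sketched like this for a min heap. The function names are illustrative assumptions; index 0 is left unused so that the 2i, 2i + 1, floor(i/2) arithmetic works.

```python
def percolate_down(heap: list, hole: int, size: int) -> None:
    """Bubble heap[hole] down in a 1-indexed min heap holding `size` items."""
    item = heap[hole]
    while 2 * hole <= size:
        child = 2 * hole                      # left child
        if child < size and heap[child + 1] < heap[child]:
            child += 1                        # right child is the smaller one
        if heap[child] < item:
            heap[hole] = heap[child]          # move the child up
            hole = child
        else:
            break
    heap[hole] = item


def build_heap(items: list) -> list:
    """Heapify: start at size/2 and bubble down, decrementing by one."""
    heap = [None] + list(items)               # index 0 is ignored
    size = len(items)
    for i in range(size // 2, 0, -1):
        percolate_down(heap, i, size)
    return heap


def delete_min(heap: list):
    """Remove and return the root: O(log N) because of the bubble down."""
    size = len(heap) - 1
    if size == 0:
        raise IndexError("delete_min from an empty heap")
    smallest = heap[1]
    last = heap.pop()
    if size > 1:
        heap[1] = last
        percolate_down(heap, 1, size - 1)
    return smallest
```

Calling `delete_min` until the heap is empty yields the items in sorted order, which is the idea behind heapsort.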

Sorting (W. 7.1-7.9)

Comparison-Based Sorting: <=> with numbers and the compareTo method
General Lower Bound: using only comparisons, Ω(N log N) comparisons are needed

Sorting Methods

Insertion Sort: N-1 passes; elements 0 to p are in order after pass p. Compares adjacent elements; O(N^2)
Shell sort: compares distant elements, which is what makes subquadratic time possible. Sorts elements that are gap apart and decreases gap until gap = 1. Worst case is still quadratic with Shell's original increments, but O(N^(3/2)) with Hibbard's increments
Heapsort: build a heap in linear time, and keep deleting the min from the heap until it's empty. Each delete is log N time so the runtime is O(N log N)
MergeSort: recursive; merge two sorted lists into a third list. This takes O(N log N)
Quicksort: AVERAGE O(N log N), but worst case is quadratic. Split the list by a pivot and divide into the elements larger than and smaller than the pivot. Picking the pivot is important: don't pick the first element; pick the median of a small sample of elements (sort the sample by a standard algorithm like insertion sort).

    o If the array is very small, then Quicksort is slower than insertion sort because of the overhead of the recursive calls, and this happens a lot (the recursion keeps producing small subarrays). Use insertion or shell sort to sort a small set of data (N <= 20 or so)
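A sketch of Quicksort that combines the two pieces of advice above: a pivot chosen as the median of a small sample (here three elements) and a switch to insertion sort for small subarrays. The cutoff of 10 and all function names are assumptions for illustration.

```python
CUTOFF = 10  # assumed threshold below which insertion sort takes over


def insertion_sort(a: list, lo: int, hi: int) -> None:
    """Sort a[lo..hi] in place by shifting larger neighbours right."""
    for p in range(lo + 1, hi + 1):
        tmp = a[p]
        j = p
        while j > lo and a[j - 1] > tmp:
            a[j] = a[j - 1]
            j -= 1
        a[j] = tmp


def median_of_three(a: list, lo: int, hi: int):
    """Order a[lo], a[mid], a[hi] and return the median value."""
    mid = (lo + hi) // 2
    if a[mid] < a[lo]:
        a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[lo]:
        a[lo], a[hi] = a[hi], a[lo]
    if a[hi] < a[mid]:
        a[mid], a[hi] = a[hi], a[mid]
    return a[mid]


def quicksort(a: list, lo: int = 0, hi: int = None) -> None:
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:
        insertion_sort(a, lo, hi)       # small subarray: skip the recursion
        return
    pivot = median_of_three(a, lo, hi)
    i, j = lo, hi
    while i <= j:                       # partition around the pivot value
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    quicksort(a, lo, j)                 # elements <= pivot
    quicksort(a, i, hi)                 # elements >= pivot
```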

Disjoint Sets (W. 8.1 - 8.7)

Relation: a relation R on a set S is defined so that for any two elements a and b in S, a R b is either true or false
    o Equivalence Relation: reflexive, symmetric and transitive
Unions are usually easy, finds are hard
Union: join two sets
Find: return the name (root) of the set containing an element; two elements are in the same set iff their finds return the same root
Keep an array of size N. Roots have a negative value. Non-negative entries are the index of the element's parent.
Union-by-Size: make the smaller tree a subtree of the larger. The roots then contain the negative of their size
Union-by-height: same as by size, but using the height heuristic (the shallower tree becomes a subtree of the deeper one)
    o The negative of the height, minus an extra one, is stored (a single node has height 0, so it stores -1)
Path Compression
    o Done during find, completely separate from the work done in union; to implement, set s[x] = find(s[x]). Very little extra work and the nodes end up less deep: every node on the path from x to the root ends up pointing directly at the root
    o Not entirely compatible with union-by-height since the heights will change (the stored heights become estimates, called ranks)
    o Compatible with union-by-size since compression doesn't change tree sizes. Not sure if it's worth the extra work on average, since union-by-size alone is already expected to run in linear time for a sequence of M operations
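The array representation, union-by-size, and path compression fit in a few lines. This is a sketch under the conventions listed above (roots store the negative of their tree size); the class and method names are assumptions.

```python
class DisjointSets:
    def __init__(self, n: int):
        # Every element starts as the root of its own tree of size 1.
        self.s = [-1] * n

    def find(self, x: int) -> int:
        """Return the root of x's set, compressing the path on the way."""
        if self.s[x] < 0:
            return x
        self.s[x] = self.find(self.s[x])    # path compression
        return self.s[x]

    def union(self, a: int, b: int) -> bool:
        """Union-by-size; returns False if a and b were already together."""
        root1, root2 = self.find(a), self.find(b)
        if root1 == root2:
            return False
        if self.s[root2] < self.s[root1]:   # root2's tree is larger
            root1, root2 = root2, root1
        self.s[root1] += self.s[root2]      # new size, still negative
        self.s[root2] = root1               # smaller tree hangs off larger
        return True
```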

Graphs (W. 9.1-9.6)

Graph: vertices (nodes) and edges (a.k.a. arcs)
Directed: edges are ordered pairs, edges only go one way
Undirected: unordered, edges go both ways
Weight/cost: the value of the edge
Path: sequence of vertices to get from a to b
    o Length: number of edges (number of vertices on the path - 1)
Connected: an undirected graph is connected if there's a path from every vertex to every other vertex
    o Directed graph with this property: strongly connected
Adjacency Matrix: 2D array of Booleans stating whether (u,v) are connected; good for dense graphs, when there are a lot of edges

    o Weighted adjacency matrix: use the weight/cost of the edge to represent connected vertices, and a sentinel that can't be a real weight (e.g. a negative number when all weights are nonnegative) for non-connected vertices
Adjacency List: each vertex keeps track of all adjacent vertices. Linear space requirement, O(|E| + |V|); weights are also stored in the lists
Topological Sort: ordering of the vertices of a directed acyclic graph where vertices with no parents (indegree 0) come first; they are removed from the graph along with their outgoing edges, and the process continues until all vertices have been removed from the graph
    o Course prerequisites: need to take courses with no prerequisites before advanced courses
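A sketch of the topological sort just described, using an adjacency list stored as a dictionary. The queue-based bookkeeping of indegrees is one common way to implement "remove the vertices with no parents"; the function name and the course names in the usage example are made up.

```python
from collections import deque


def topological_sort(graph: dict) -> list:
    """graph maps every vertex to the list of vertices it points to."""
    indegree = {v: 0 for v in graph}
    for v in graph:
        for w in graph[v]:
            indegree[w] += 1
    # Vertices with no parents can go first.
    queue = deque(v for v in graph if indegree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:              # "remove" v and its outgoing edges
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    if len(order) != len(graph):
        raise ValueError("graph has a cycle, no topological order exists")
    return order


# Hypothetical prerequisite graph: an edge u -> v means u must come before v.
courses = {
    "intro": ["data_structures"],
    "discrete_math": ["data_structures", "theory"],
    "data_structures": ["algorithms"],
    "theory": ["algorithms"],
    "algorithms": [],
}
```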

Shortest Path

Single-source shortest path: start at a specified point, find the shortest path to all other points on the graph
Negative-Cost Cycle: a cycle whose total weight is negative, so the shortest path would loop around that cycle forever; the shortest path is undefined
To keep track of the actual path, set the path field of each adjacent vertex to the vertex being processed whenever its distance is updated (the inner for loop's vertex gets a path to the outer for loop's vertex)
Unweighted shortest path
    o Weighted shortest path with all edges of cost one
    o Breadth-First Search: process the vertices in layers; the ones closest to the start are done first, then it works outwards; analogous to level-order traversal for trees
Dijkstra's Algorithm
    o Initialize everything to be unknown and at infinite distance
    o Set the source's distance to d = 0 (it will be the first vertex to become known)
    o Pick the vertex with the smallest distance among all unknown vertices; this vertex is now known
    o For all adjacent vertices, if their distance would be improved by going through the current vertex, then update it
    o Repeat until all vertices are known
    o Does not work for graphs with negative edge costs
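The steps of Dijkstra's algorithm, including the path bookkeeping, can be sketched as below. Using a binary heap (`heapq`) to find the smallest unknown vertex is an implementation choice made here, and the function names are assumptions.

```python
import heapq


def dijkstra(graph: dict, source):
    """graph maps each vertex to a list of (neighbor, cost) pairs."""
    dist = {v: float("inf") for v in graph}   # everything starts at infinity
    path = {v: None for v in graph}           # previous vertex on best path
    dist[source] = 0
    known = set()
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)            # smallest-distance candidate
        if v in known:
            continue                          # stale heap entry, skip it
        known.add(v)
        for w, cost in graph[v]:
            if d + cost < dist[w]:            # going through v improves w
                dist[w] = d + cost
                path[w] = v
                heapq.heappush(heap, (dist[w], w))
    return dist, path


def route(path: dict, target) -> list:
    """Follow the path links backwards from target to the source."""
    out = []
    while target is not None:
        out.append(target)
        target = path[target]
    return out[::-1]
```

Running this with every edge cost set to one gives the same distances as breadth-first search.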

Max Flow/Min Cut

How much can be sent through a (directed) graph whose edges have capacities
Start at the source, end at the sink; the source can supply and the sink can absorb unlimited (INF) flow
Works with both cyclic and acyclic graphs
    o A cycle would just limit the flow by the edge with the lowest edge capacity

Min cut: the cut (set of edges separating the sink from the source) with the minimum total edge capacity; this minimum is the bottleneck on the flow, meaning that max flow and min cut are exactly equal
Algorithm
    o Initially the flow graph has no flow; by the end it will have the max flow
    o Residual graph: how much more flow can be added on each edge
        An edge in this graph is called a residual edge
    o Each stage:
        Find a path from source to sink in the residual graph (an augmenting path)
        The minimum edge on the path is the amount of flow that can be added along this path
        Do this until the sink can't be reached from the source
    o Need to allow the algorithm to change its mind: for every edge carrying flow in the flow graph, add the reverse edge (switched direction) to the residual graph, with capacity equal to the flow on that edge
    o Optimizations come from choosing which augmenting path to use
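A compact sketch of the augmenting-path algorithm. The notes leave the choice of augmenting path open; picking the path with the fewest edges via breadth-first search is an assumption made here, and the function name is illustrative.

```python
from collections import deque


def max_flow(capacity: dict, source, sink) -> int:
    """capacity[u][v] is the capacity of edge u -> v."""
    # residual[u][v] = how much more flow can still be pushed from u to v.
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            # Reverse edge, so the algorithm can change its mind later.
            residual.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        # Breadth-first search for a source-to-sink path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total                # sink unreachable: flow is maximal
        # The minimum residual edge on the path is the flow we can add.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the flow: shrink forward edges, grow reverse edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck
```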

Minimum Spanning Trees

Spanning tree: tree formed from edges in the graph such that all vertices are included; the one with the minimum total edge cost is the MST
    o Number of edges in the MST is |V| - 1
    o Real life example: wiring a house with a minimum of cable (ignoring all other constraints)
Prim's Algorithm: grow the tree in successive stages
    o One node is selected as the root; at each stage an edge, and thus an associated vertex, is added to the tree
    o The edge that is added is the minimum-cost edge such that it adds a new vertex to the tree
    o O(|V|^2) without heaps, optimal for dense graphs
    o O(|E| log |V|) using binary heaps, good for sparse graphs
Kruskal's Algorithm: continually select edges
    o Edges selected in order of smallest weight
    o Edge accepted iff it does not cause a cycle
    o Maintains a forest and joins trees in the forest until a single tree is left (the correct number of edges, |V| - 1, has been added to the tree)
    o To determine if an edge would cause a cycle, use a DisjointSet. Two vertices belong to the same set if and only if they are in the same tree

    o Each vertex starts in its own set
    o Worst-case running time is O(|E| log |E|), dominated by the heap operations; since |E| = O(|V|^2), it's also O(|E| log |V|)
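Kruskal's algorithm can be sketched with a small disjoint-set structure built in. Note the substitution: the edges are sorted up front here instead of being pulled from a heap, which gives the same order of selection. The function name and the edge format `(cost, u, v)` are assumptions.

```python
def kruskal(num_vertices: int, edges: list) -> list:
    """edges is a list of (cost, u, v) with vertices numbered from 0."""
    parent = list(range(num_vertices))      # each vertex in its own set

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # shorten the path as we climb
            x = parent[x]
        return x

    tree = []
    for cost, u, v in sorted(edges):        # smallest weight first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                # different trees: no cycle
            parent[root_u] = root_v
            tree.append((cost, u, v))
            if len(tree) == num_vertices - 1:
                break                       # |V| - 1 edges: tree complete
    return tree
```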

Introduction to NP-Completeness (W. 9.7)

Easy vs. Hard
    o Undecidable problems: impossible to solve by a computer
        Halting Problem: is it possible to have a compiler that checks for infinite loops?
        Recursively undecidable: when such a checker is run on itself, it must terminate and loop infinitely at the same time
    o NP Class
        Nondeterministic Polynomial Time
        Deterministic machine: at each point in time, it executes an instruction and then goes to the next instruction
        Nondeterministic machine: has a range of options to choose from and will always pick the best one (optimal guessing)
        Nondeterminism does not make undecidable problems solvable
        A problem is in NP if, in polynomial time, we can prove that any "yes" answer is correct (a proposed solution can be checked in polynomial time)
        All problems solvable in polynomial time are in the NP class
        Intuitively, there should be some problems whose answers can be checked in polynomial time but that cannot be solved in polynomial time; none has been proven to be so far, and it's possible (though considered unlikely by experts) that there are none
        Not all decidable problems are in NP
    o NP-Complete
        Hardest problems in NP, a subset of NP
        If a solution is testable in polynomial time, and a known NP-C problem can be reduced to it in polynomial time, then it's NP-C
        No polynomial-time algorithm is known for any NP-C problem; finding one would give polynomial-time solutions for every problem in NP
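To make "a solution can be checked in polynomial time" concrete, here is a sketch of a checker for one well-known NP-complete problem, Hamiltonian cycle (does a graph have a simple cycle visiting every vertex?). The choice of problem and the function name are assumptions for illustration; the notes above do not name a specific problem. Finding such a cycle is hard, but verifying a proposed one is a single pass.

```python
def is_hamiltonian_cycle(graph: dict, cycle: list) -> bool:
    """graph maps each vertex to the set of its neighbors.

    Checks a proposed cycle in time polynomial in the size of the graph.
    """
    n = len(graph)
    # The cycle must visit every vertex exactly once.
    if len(cycle) != n or set(cycle) != set(graph):
        return False
    # Consecutive vertices (wrapping around) must be joined by an edge.
    return all(cycle[(i + 1) % n] in graph[cycle[i]] for i in range(n))
```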

Algorithm Design Techniques (W. 10)

Greedy Algorithms (W. 10.1)

Def.: work in phases; in each phase a decision is made that appears to be good, without looking down the road to see possible consequences
Local optimum: what is used to make the decision; at termination, the local optimum should equal the global one (we hope)
Change for US money works by a greedy algorithm, minimizing coins needed
Scheduling (assuming jobs are sorted shortest to longest running time)
    o How to minimize average completion time?
        Single core: shortest -> longest, ties arbitrarily broken
        Multicore: start at the first processor, proceed round-robin adding the next shortest job to the next processor
    o Minimizing final completion time (only for multiprocessors): only concern is when the last job finishes
        Key: keep all processors always busy
        NP-complete; no known shortcut beats trying all solutions: exponential growth and computational sadness
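The shortest-job-first rule and its round-robin multiprocessor version can be sketched in one function. The function name is an assumption; the job lengths in the usage note are just sample numbers.

```python
def average_completion_time(job_times: list, processors: int = 1) -> float:
    """Greedy schedule: shortest job first, dealt round-robin to processors."""
    finish = [0] * processors               # current finish time per processor
    total = 0
    for i, t in enumerate(sorted(job_times)):
        p = i % processors                  # next processor in rotation
        finish[p] += t                      # job ends when its processor frees up
        total += finish[p]
    return total / len(job_times)
```

For jobs of length 15, 8, 3, and 10 on one processor, running them in the given order averages 25 time units per job, while the greedy order (3, 8, 10, 15) finishes them at times 3, 11, 21, and 36 for an average of 17.75.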

Huffman (W 10.1.2)

Basic file compression
Algorithm works by starting with a forest of trees, each with a single node (letter). Repeatedly join the two trees with the smallest frequencies until a single tree is present.
    o Going left down the tree is a 0, right is a 1; called a prefix code, since the encoding of any character is not a prefix of the encoding of any other
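A sketch of the forest-merging step using a binary heap to find the two smallest trees. Representing a merged tree as a `(left, right)` tuple and the name `huffman_codes` are assumptions; the counter only breaks ties so that trees themselves are never compared.

```python
import heapq
from itertools import count


def huffman_codes(freq: dict) -> dict:
    """freq maps each symbol (a string) to its frequency."""
    tiebreak = count()
    # Forest of single-node trees, ordered by frequency.
    heap = [(f, next(tiebreak), symbol) for symbol, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two smallest trees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix: str) -> None:
        if isinstance(node, tuple):         # internal node
            walk(node[0], prefix + "0")     # left is 0
            walk(node[1], prefix + "1")     # right is 1
        else:
            codes[node] = prefix or "0"     # lone-symbol edge case
    walk(heap[0][2], "")
    return codes
```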

Approximate Bin Packing (W 10.1.3)

    o Online: each item must be placed before the next one can be processed
        Cannot always give an optimal solution, since it can't change its mind
        Next Fit
            Check to see whether the item fits in the same bin as the last one; if so, place it there
            Else create a new bin, add it there
        First Fit
            Scan the bins in order
            Place the item in the first bin large enough to hold it, or make a new one if necessary

        Best Fit
            Place the item in the tightest spot among all bins
    o Offline: processing doesn't begin until the entire input is read
        The issue with online: hard to place large items that come late in the sequence
        Sort the items in decreasing order to solve this
        Can apply either First-Fit Decreasing or Best-Fit Decreasing
        These are only better options and are not going to guarantee optimal packing (minimum bins)

Divide and Conquer (W. 10.2)

Divide: smaller problems are solved recursively
Conquer: final solution is formed from the mini-solutions
Insist the subproblems are disjoint
    o Essentially no overlapping
    o No calculating the same things twice
Quicksort and Mergesort: examples that run at the mathematically optimal O(N log N) (on average, for Quicksort)
Need more than one recursive call (i.e. doing the left and right sub-problems)
T(N) = 2T(N/2) + O(N) (linear extra work in addition to the two recursive calls)
In General: T(N) = aT(N/b) + Θ(N^k), with a >= 1 and b > 1, has solution
    T(N) = O(N^(log_b a))    if a > b^k
    T(N) = O(N^k log N)      if a = b^k
    T(N) = O(N^k)            if a < b^k
Closest points on a plane: sort by both x (list P) and y (list Q). Do a recursive call on the left and right sides of the dividing line, plus linear extra work for the pairs that straddle the line. This is the T(N) = 2T(N/2) + O(N) part. The extra work is kept linear by keeping two lists. Presort both at cost O(N log N), which only adds a constant factor to the big-O, which is dropped, so essentially free. Both lists are passed at each recursive call. Once the dividing line is known, go through the y-sorted list and place each element in the correct left or right sublist. The work to calculate the center distance is done with doubly nested for-loops and is still linear extra time (the inner loop runs only a constant number of times per point).
Selection Problem: find the kth smallest element; linear time
    o Quickselect: very efficient in practice
    o Main difference from Quicksort: only one subproblem in the divide and conquer method
    o To get a linear algorithm
        Use quickselect as a basis, still making only one recursive call
        Ensure the subproblem is a fraction of the original, not only a few elements smaller

        Easy if lots of time is spent choosing the pivot, but the pivot needs to be really quick to find; otherwise it has too big of an impact on the run time
    o Median-of-median-of-five partitioning: good pivot finding method, using a sample of medians
        1. Arrange the N elements into floor(N/5) groups of 5 elements, ignore the (at most four) extra elements
        2. Find the median of each group, totaling floor(N/5) medians, call them M
        3. Find the median of M; this is the pivot
    o Run Time
        Sort 5 elements with 8 comparisons (constant), times floor(N/5) groups is linear (total)
        Median of M: recursive call on the selection algorithm
        Overhead is great so rarely used in practice, but it does eliminate worst-case scenarios
Theoretical Improvements for Arithmetic Problems
    o Multiplying very large numbers is no longer constant time
    o The solutions have huge overhead so are only used for extremely large numbers
    o Integer Multiplication
        Split each number in half, into the most significant digits and the least significant digits, each with N/2 digits, then use algebra to get down to three recursive calls, which gives O(N^(log_2 3))
        Small N: too much overhead
        Big N: far better algorithms exist, still divide and conquer

Dynamic Programming (W. 10.3)

Definition: a technique to translate a recursive algorithm into a non-recursive one which stores data in tables
Table instead of recursion:
    o Store data for the subproblems in a table, one extra dimension, to eliminate recomputing values
Optimal BST: given a list of words and fixed probabilities of their occurrence, arrange them in a BST such that total expected access time is minimized
    o Need to figure out the root that minimizes the total access time, and recursively go down on both left and right until the tree is formed
All Pairs Shortest Path: compute the shortest path between all pairs of vertices in a directed graph

    o Run Time (all-pairs shortest path): O(|V|^3); not an asymptotic improvement over running Dijkstra from every vertex on a dense graph, but faster in practice because the for loops are so tight

Randomized Algorithms (W. 10.4)

At least once during the algorithm, a random number is used to make a decision. Run time depends on both the input and the random numbers used
Worst case is often the same as for non-randomized algorithms
    o Same input twice can give two different runtimes because the random numbers used differ; a nonrandom algorithm will have the same runtime
Quicksort (nonrandom) worst case is quadratic for presorted data (when the first element is the pivot); the randomized algorithm has O(N log N) expected run time on every input
    o An expected running time bound is somewhat stronger than an average-case bound, but weaker than the corresponding worst-case bound
Random Number Generators
    o Pseudorandom: numbers that appear random; what computers can actually do
    o Need to generate a sequence of numbers that appears random even though it is not really random
    o Linear Congruential Generator: x_(i+1) = A * x_i mod M
        Repeats after M - 1 items (for a well-chosen A)
        If M is prime, x_i is never zero
        Initial value x_0 is the seed (don't pick 0 or 1)
        Recommended: M = 2^31 - 1 and A = 48,271
Skip Lists: support both searching and insertion in O(log N) expected time
    o Besides the link to the next node, some nodes contain a link to a node a few down from them
Primality Testing
    o The obvious algorithm (trial division by the odd numbers starting at 3, with a special check for 2) is not good for very large numbers: approx. sqrt(N)/2 divisions, on the order of 10^(d/2), where d is the number of digits in the number
    o Use Fermat's little theorem: if M is prime and 0 < A < M, then A^(M-1) ≡ 1 (mod M). Tests if M is prime: if A^(M-1) mod M is not equal to 1, then M is not prime; if it is equal to 1, then M is probably prime
        Several values of A are used to make the chance of a false positive negligible
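The linear congruential generator and the Fermat test can be sketched together, with the generator supplying the values of A. The recommended constants M = 2^31 - 1 and A = 48,271 come from the notes above; the function names, the default seed, and the number of trials are assumptions. The built-in three-argument `pow` does the modular exponentiation.

```python
LCG_M = 2**31 - 1
LCG_A = 48271


def lcg(seed: int):
    """Yield the sequence x_(i+1) = A * x_i mod M, starting after the seed."""
    x = seed
    while True:
        x = (LCG_A * x) % LCG_M
        yield x


def probably_prime(n: int, trials: int = 5, seed: int = 12345) -> bool:
    """Fermat test: False means certainly composite, True means probably prime."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False                         # special check for even numbers
    rng = lcg(seed)
    for _ in range(trials):
        a = 2 + next(rng) % (n - 3)          # pseudorandom a in [2, n - 2]
        if pow(a, n - 1, n) != 1:            # a^(n-1) mod n
            return False                     # Fermat's theorem violated
    return True
```

A prime always passes, since Fermat's little theorem holds for every choice of A; a composite number passes only if every one of the sampled values happens to satisfy the congruence.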

Backtrack and State-Space Search (W. 10.5)

Many times is a clever exhaustive search
Exhaustive search that rules some options out early and can change its mind, like placing furniture

Pruning: elimination of a large group of possibilities in one step
Turnpike Reconstruction
    o Need to place points on a number line, given the distances between every pair
    o Guess where the points are until a contradiction is reached, then back up
Tic-Tac-Toe
    o Applies to more complicated games like checkers or chess; TTT is easier to think about
    o Assign numbers to the goodness of a position: +1 if the computer can force a win, 0 if a draw, -1 if the computer would lose
    o Terminal position: position where the above value can be determined just by examining the board
    o Minimax strategy: values of nonterminal positions are determined by recursive calls, assuming both players play optimally
        One player tries to minimize (human), the other tries to maximize (comp)
    o Successor Position: a position P_s that is reachable from P by one move
    o In complicated games, the biggest limit is how deep the recursion can go; more depth gives better results
    o Transposition Table: data structure that keeps track of previously evaluated positions. Almost always done with hashing. The time saved allows the recursion to go several levels deeper.
α-β Pruning: most significant improvement
    o Intelligently eliminate branches that cannot affect the value (min or max) at the root
Game Tree: trace of the recursive calls in a hypothetical game to evaluate a hypothetical position; no tree is actually constructed by the algorithm
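The minimax strategy with α-β pruning for tic-tac-toe can be sketched as below. The board representation (a list of nine cells holding "X", "O", or " "), the convention that the computer plays "X", and the function names are assumptions for illustration.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals


def winner(board: list):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def minimax(board: list, player: str, alpha: int = -1, beta: int = 1) -> int:
    """Value of the position for the computer ('X') with `player` to move."""
    won = winner(board)
    if won is not None:                        # terminal position
        return 1 if won == "X" else -1
    if " " not in board:
        return 0                               # terminal position: draw
    best = -1 if player == "X" else 1
    for i in range(9):
        if board[i] != " ":
            continue
        board[i] = player                      # try a successor position
        value = minimax(board, "O" if player == "X" else "X", alpha, beta)
        board[i] = " "                         # undo the move (backtrack)
        if player == "X":                      # computer maximizes
            best = max(best, value)
            alpha = max(alpha, best)
        else:                                  # human minimizes
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            break                              # prune: rest cannot matter
    return best
```

Evaluating the empty board returns 0, since tic-tac-toe is a draw when both players play optimally.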