

CSA 1017 Data Structures and Algorithms 1


Lecturer: John Abela

Algorithms and Complexity



Algorithm: a set of rules for solving a problem in a finite number of steps

Numbers:
- Natural: (1, 2, 3, …)
- Integers: (including negatives and 0)
- Rational: (including fractions)
- Real: (including non-terminating and non-recurring decimals)
- Complex: (including roots of negative numbers)

Set: a collection of objects (not necessarily numbers)

U → Universal Set

|S| → Cardinality → the number of objects in set S


In order to have a mathematical structure, one needs a set and an operator or a relation

<set name>, <operator or relation symbol>


Operator: takes one or more elements from a set, applies an operation, and gives something back

Examples: addition (+), subtraction (−), concatenation, union (∪), intersection (∩), etc.

Unary Operator: works on one object

Binary Operator: works on two objects

Properties of operators:
- Commutative: Consider ⟨S, c⟩. ∀x, y ∈ S: x c y = y c x
- Associative: Consider ⟨S, a⟩. ∀x, y, z ∈ S: (x a y) a z = x a (y a z)
- Distributive: Consider ⟨S, d1, d2⟩. ∀x, y, z ∈ S: x d1 (y d2 z) = (x d1 y) d2 (x d1 z)

Closed Set: a set is closed under an operator when the operator always gives back an object from the same set, therefore:

∀a, b ∈ S: a <operator> b ∈ S

Example: the natural numbers are closed under addition

Identity: when an operator works on an object and the identity element i, the result is the object itself.
Consider ⟨S, o⟩: ∀a ∈ S, a o i = a

If the operator is addition, the identity is 0; if the operator is multiplication, the identity is 1

Relation: acts upon elements from a set and gives back true or false

∀a, b ∈ S: a <relation> b → T/F


Properties of relations:
- Reflexive: Consider ⟨S, r⟩. ∀x ∈ S: x r x → T
- Symmetric: Consider ⟨S, s⟩. ∀x, y ∈ S: if x s y → T then y s x → T
- Transitive: Consider ⟨S, t⟩. ∀x, y, z ∈ S: if x t y → T and y t z → T, then x t z → T


Equivalence: a relation that is reflexive, symmetric and transitive. An equivalence relation partitions the set: every element is part of one (and only one) partition, and the union of all the partitions equals the original set

Function:

[Diagram: Domain → f → Co-Domain]


Notations:

f: <domain> → <co-domain>, e.g. f(x) = x²

f: a ↦ a*a


Cartesian Product: the set S×S (or S²) holds all possible pairs of elements from set S (S³ holds all possible triples, and so on)

Defining Sets: Example: to define a set of even numbers

E = { a | a mod 2 = 0 }

| means such that


Inductive Definition: Example: to define the natural numbers
- Basis: a starting element
- Operation: +1 or succ(), applied to elements already in the set
- Closure: nothing else is in the set


Function Properties:
- One-to-One: every y in the co-domain has at most one x in the domain such that f(x) = y
- Onto: for every y in the co-domain there is an x in the domain such that f(x) = y. Therefore, all elements in the co-domain are used


Types of Functions:
- Linear Function: Example: f(x) = 2x
- Polynomial Function: Example: f(x) = x²
- Logarithmic Function: Example: f(x) = log₂ x
- Exponential Function: Example: f(x) = 2ˣ

Intractable problems: problems that can be solved, but not fast enough for the solution to be useful

[i.e. problems whose only known algorithms are exponential]


Travelling Salesman Problem (TSP): given a list of cities and their pair-wise distances, the task is to find the shortest possible tour that visits each city exactly once
[In this case, to obtain an optimal result, one has to use an exponential algorithm (generate all possible permutations and compare the results). However, this would take billions of years to execute, so we use an approximation instead: an algorithm that generates random tours and keeps the shortest one found. This solution may, or may not, be the shortest distance.]
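
A minimal Python sketch of the random-sampling approximation described above. The distance-table layout (dist[a][b]) and all names are illustrative assumptions, not part of the notes:

```python
import random

def tour_length(tour, dist):
    # Sum the legs of the tour, returning to the starting city at the end.
    legs = list(zip(tour, tour[1:])) + [(tour[-1], tour[0])]
    return sum(dist[a][b] for a, b in legs)

def random_tsp(cities, dist, samples=10_000):
    """Approximate TSP: generate random tours and keep the shortest one seen."""
    best, best_len = None, float("inf")
    for _ in range(samples):
        tour = list(cities)
        random.shuffle(tour)              # a random permutation = a random tour
        length = tour_length(tour, dist)
        if length < best_len:
            best, best_len = tour, length
    return best, best_len                 # may or may not be the true optimum
```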

Space Complexity: the amount of space required by an algorithm to execute, expressed as a function of
the size of the input to the problem

Time Complexity: the amount of time taken by an algorithm to run, expressed as a function of the size
of the input to the problem (cannot be less than space complexity)
- Bubble sort n items: O(n²)
- Quick sort n items: O(n·log₂ n)
- Sequential search through n unsorted items: O(n)
- Binary search through n sorted items: O(log₂ n)


Towers of Hanoi: move the disks to a different pole without ever placing a larger disk on a smaller disk (Time Complexity: O(2ⁿ))

Q: Why do we work out the time complexity?
A: To make sure that the algorithm is not intractable (ie. not exponential)

Consider a program with input n:

Input invariant:
    for i = 1 to 100
        writeln('Students are dumb');

Input variant:
    for i = 1 to n
        writeln('Lecturers are nice people');


[When you work out the time complexity of a program, ignore all code fragments that are
input invariant and add up the time complexities of the code fragments which are input
variant. This means that for the above program, the time complexity is O(n)]

Running Time Function (RTF): a function that, from some point onwards, becomes positive. A function f is an RTF if:

f: ℤ⁺ → ℝ such that f(n) > 0 ∀n > n₀, for some n₀ ∈ ℤ⁺


Examples:
- f(n) = 2n + 1 → n₀ = 0
- f(n) = n → n₀ = 0
- f(n) = n² − 16 → n₀ = 4
- f(n) = sin n + 10 → n₀ = 0
- f(n) = sin n + 1 is not an RTF, since it can reach 0 and 0 is not strictly positive


Theta Relation: consider two RTFs f(n) and g(n)

f(n) is Θ(g(n)) if:

∃c₁, c₂ ∈ ℝ⁺ and n₀ ∈ ℤ⁺ such that c₁·g(n) ≤ f(n) ≤ c₂·g(n) ∀n > n₀


Examples:
- f(n) = n², g(n) = 3n²
  c₁·g(n) ≤ f(n) ≤ c₂·g(n)
  (1/9)·3n² ≤ n² ≤ 1·3n²
  ⇒ n² is Θ(3n²)

- f(n) = n³, g(n) = n²
  c₁·n² ≤ n³ ≤ c₂·n²
  ⇒ n³ is not Θ(n²)
  You cannot find a c₂ such that c₂·n² is greater than n³ for all n > n₀, since once n exceeds c₂, c₂·n² will be smaller than f(n)

Note: if the largest exponent of two functions is the same, you will be able to find c₁ and c₂. Coefficients and other terms are ignored.

Example: n² is Θ(4n² + 3n + 1), and their complexity is O(n²)


Θ is reflexive: f(n) is Θ(f(n))

Proof:
c₁·f(n) ≤ f(n) ≤ c₂·f(n)
let c₁ = c₂ = 1
f(n) ≤ f(n) ≤ f(n)


Θ is symmetric: if f(n) is Θ(g(n)) then g(n) is Θ(f(n))

Proof:
if f(n) is Θ(g(n))
⇒ c₁·g(n) ≤ f(n) ≤ c₂·g(n)
for g(n) to be Θ(f(n)) we need c₃·f(n) ≤ g(n) ≤ c₄·f(n)
consider f(n) ≤ c₂·g(n): (1/c₂)·f(n) ≤ g(n) ⇒ c₃ = 1/c₂
consider c₁·g(n) ≤ f(n): g(n) ≤ (1/c₁)·f(n) ⇒ c₄ = 1/c₁
so (1/c₂)·f(n) ≤ g(n) ≤ (1/c₁)·f(n)

Note that n₀ remains the same

Θ is transitive: if f(n) is Θ(g(n)) and g(n) is Θ(h(n)) then f(n) is Θ(h(n))

Proof:
if f(n) is Θ(g(n)) ⇒ c₁·g(n) ≤ f(n) ≤ c₂·g(n)
if g(n) is Θ(h(n)) ⇒ c₃·h(n) ≤ g(n) ≤ c₄·h(n)
for f(n) to be Θ(h(n)) we need c₅·h(n) ≤ f(n) ≤ c₆·h(n)
consider c₁·g(n) ≤ f(n) and c₃·h(n) ≤ g(n): c₃·c₁·h(n) ≤ c₁·g(n) ≤ f(n) ⇒ c₅ = c₃·c₁
consider f(n) ≤ c₂·g(n) and g(n) ≤ c₄·h(n): f(n) ≤ c₂·g(n) ≤ c₄·c₂·h(n) ⇒ c₆ = c₄·c₂
so c₃·c₁·h(n) ≤ f(n) ≤ c₄·c₂·h(n)

Note that n₀ is the maximum of the two


Properties of the Theta relation:
1. For any c > 0, c·f(n) is Θ(f(n))
2. If f₁(n) is Θ(g(n)) and f₂(n) is Θ(g(n)), then (f₁ + f₂)(n) is Θ(g(n))
3. If f₁(n) is Θ(g₁(n)) and f₂(n) is Θ(g₂(n)), then (f₁·f₂)(n) is Θ((g₁·g₂)(n))

Big O Relation: consider two RTFs f(n) and g(n)
f(n) is O(g(n)) if:

∃c ∈ ℝ⁺ and n₀ ∈ ℤ⁺ such that f(n) ≤ c·g(n) ∀n > n₀

If f(n) is O(g(n)) then the growth rate of f is not larger than that of g

Examples:
- n³ is O(n²) → False
- n² is O(n³) → True


Omega Relation: consider two RTFs f(n) and g(n)
f(n) is Ω(g(n)) if:

∃c ∈ ℝ⁺ and n₀ ∈ ℤ⁺ such that f(n) ≥ c·g(n) ∀n > n₀

If f(n) is Ω(g(n)) then the growth rate of f is not smaller than that of g

Properties of Big O and Omega notations:
1. If f(n) is Ω(g(n)) then g(n) is O(f(n))
2. If f(n) is Ω(g(n)) and f(n) is O(g(n)) then f(n) is Θ(g(n))

Notes:
- <Logarithmic function> is O(n) but is not Ω(n)
  [Logarithmic growth is less than linear]
- <Exponential function> is not O(n) but is Ω(n)
  [Exponential growth is greater than linear]


Non-Computable Problems: problems for which, provably, no one can write an algorithm to find their solution.
Examples:
- Halting Problem: one cannot write an algorithm that works on another program (given as input) and determines whether that program will terminate or not
- Busy Beaver


Computable Problems:
Examples:
- Polynomial: problems for which the solution is polynomial
- Exponential: problems with exponential solutions
- Unknown Algorithms: problems for which no standard algorithm exists. Example: an algorithm to predict the Super 5 of next week


A problem is in NP if:
1. It is a problem where the answer is yes or no [decision problem]
2. It is presented to the oracle and the answer is returned in O(1)
3. The answer is verifiable in polynomial time [polynomial verifiability]

Example: TSP
1. Decision Problem: Is there a tour of length k or less?
2. Oracle: the answer is a list of cities
3. Polynomial Verifiability: compute the distance between the cities in the answer and check that the result is ≤ k

Example: Map Coloring Problem [Given a map of countries, find the least number of colors needed so that no two adjacent countries have the same color]
1. Decision Problem: Can the map be colored with k colors or less?
2. Oracle: the answer is a list of countries and their colors
3. Polynomial Verifiability:
   a. Place a node in the center of each country, label it according to its color, and connect the nodes of adjacent countries
   b. Check that no edge has the same letter (color) at both ends

NP-Completeness (NPC):

Let P be in NP
P is in NPC if all problems in NP can be Turing reduced to P

Turing Reduction: take the input of P` (a very hard problem) and change it to the input of P in polynomial time




Example of Turing reduction:


Vertex Cover: given a graph, a vertex covers an edge if it touches it. Choose the smallest number of vertices that cover all the edges
Set Cover: given a set and a list of its subsets, find the smallest number of subsets whose union is equal to the original set

Change the input of the vertex cover problem into an input of the set cover problem:

- Original set: all edges

- Subsets: one subset per vertex, containing the edges that it covers

Therefore, Vertex Cover ≤ Set Cover (Vertex Cover Turing-reduces to Set Cover)
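
A minimal sketch of the transformation described above, assuming the graph is given as a list of edges (the representation and names are illustrative, not from the notes):

```python
def vertex_cover_to_set_cover(vertices, edges):
    """Turn a vertex-cover instance into a set-cover instance in polynomial time."""
    universe = {frozenset(e) for e in edges}            # original set: all edges
    subsets = {v: {e for e in universe if v in e}       # one subset per vertex:
               for v in vertices}                       # the edges it covers
    return universe, subsets

# Example: the path a-b-c. Picking the subset for b covers both edges.
universe, subsets = vertex_cover_to_set_cover("abc", [("a", "b"), ("b", "c")])
print(len(subsets["b"]))   # 2
```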


Reducing TSP to shell sort (note: not a Turing reduction!):

Create a list of all possible tours and their lengths (still exponential) and sort it using shell sort

TSP is in NPC (i.e. all problems in NP can be reduced to it)


[P ⊆ NP]
P = NP?: can every problem with an exponential solution be solved in polynomial time?

If one NPC problem is reduced to polynomial time, all NPC problems can be reduced to polynomial time!


Data Structures

Graph: a collection of vertices that could be connected together by edges

G = (V, E)

V is a finite set of vertices

E is a subset of V×V

Example: [diagram of a graph on the vertices a–f]

V = {a, b, c, d, e, f}
E = {ab, bc, cd}

Path: an ordered sequence of vertices V₁ … Vₙ such that (Vᵢ, Vᵢ₊₁) ∈ E for all 1 ≤ i < n


Directed Edge: an edge that can only be traversed in one direction: A → B

Cycle: a path V₁ … Vₙ where V₁ = Vₙ
Note: a graph can either be cyclic or acyclic

Connected Graph: ∀a, b ∈ V, ∃ a path from a to b


Snake Relation (~): ⟨V, ~⟩

∀a, b ∈ V: a ~ b if ∃ a path from a to b

This relation partitions the graph into components




Linked List: a logical list, ideal for listing items in order but not for searching

Trees:
- The topmost node is called the root
- The root can have children
- Children can have children
- Nodes without children are called leaves
- A tree is an acyclic graph
- There is exactly one path from the root to every other node


To define the descendants of x:
- Take the sub-tree of which x is the root
- All nodes except x

To define the ancestors of y:
- Take the path from y to the root
- All nodes except y


Height: the length of the longest path from the root of a given tree

Note: height is important because we usually search a tree to find a node, and the longest path is the longest time the search can take. Therefore, the time complexity for searching is the height.

Restrictions on Trees:
- Structure Condition: restriction on the number of children per node
- Order Condition: restriction on the order of values in nodes
- Balance Condition: restriction on the balance of the heights within the tree

Which conditions apply:
- Unrestricted Tree: no conditions
- Binary Tree: structure condition
- BST: structure + order conditions
- AVL: structure + order + balance conditions


Unrestricted Tree: a tree that has no type of restriction whatsoever
- Best Case Height: 1
- Worst Case Height: n − 1

Q: How do you store an unrestricted tree in a data structure?
A: First Child / Next Sibling

Example: [diagram: each node holds its data plus two pointers, First Child and Next Sibling]
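
A minimal Python sketch of the First Child / Next Sibling representation; the class and field names are illustrative assumptions:

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.first_child = None    # leftmost child of this node
        self.next_sibling = None   # next child of this node's parent

def children(node):
    """Yield the children of a node by walking its sibling chain."""
    child = node.first_child
    while child is not None:
        yield child
        child = child.next_sibling

# Example: a root with three children a, b, c.
root, a, b, c = Node("root"), Node("a"), Node("b"), Node("c")
root.first_child, a.next_sibling, b.next_sibling = a, b, c
print([n.data for n in children(root)])   # ['a', 'b', 'c']
```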


Binary Tree: every node has at most two children
- Best Case Height: log₂ n
- Worst Case Height: n − 1

Notes:
- The level number is the log₂ of the number of nodes in that level (a full level l holds 2^l nodes)
- The height of a full tree with n nodes is [log₂(n + 1)] − 1


Proof:
Level 0 → 1 node
Level 1 → 2 nodes
Level 2 → 4 nodes
Level 3 → 8 nodes

Total nodes: 15, and [log₂(15 + 1)] − 1 = 3 = the height



Storing a Binary Tree in an Array:

[Diagram: each array entry holds a node as | Left | Data | Right |, where Left and Right are the indices of the children]


Binary Search Tree (BST): for any node, the values in the left sub-tree must be less than (or equal to) it
and the values in the right sub-tree must be greater than (or equal to) it

Searching a BST:
- Best Case Time Complexity: 1
- Worst Case Time Complexity: height

Efficient searching: [diagram: a BST containing 17, 12, 30, 15, 27]

If you enter nodes in alphabetical (sorted) order, you get an unbalanced tree
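
A minimal sketch of searching a BST; the node class and names are illustrative. The loop follows a single root-to-leaf path, so it runs in O(height):

```python
class BSTNode:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def bst_search(node, target):
    """Return the node holding target, or None if it is not in the tree."""
    while node is not None:
        if target == node.value:
            return node
        # Smaller values live in the left sub-tree, larger ones in the right.
        node = node.left if target < node.value else node.right
    return None

# The example tree above: 17 with sub-trees 12 (15) and 30 (27).
root = BSTNode(17, BSTNode(12, right=BSTNode(15)), BSTNode(30, left=BSTNode(27)))
print(bst_search(root, 15).value)   # 15
print(bst_search(root, 99))         # None
```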


Deleting an element from a sorted binary tree (a sketch follows below):
1. Find the element to delete
2. Choose the subtree on the left or right
3. Find its rightmost or leftmost element
4. Place it instead of the deleted element
5. If necessary, repeat
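
A rough sketch of the deletion steps above, reusing the BSTNode shape from the previous sketch; it always takes the rightmost element of the left sub-tree (one of the two choices in step 2):

```python
def bst_delete(node, target):
    """Delete target from the sub-tree rooted at node; return the new root."""
    if node is None:
        return None
    if target < node.value:
        node.left = bst_delete(node.left, target)
    elif target > node.value:
        node.right = bst_delete(node.right, target)
    else:                                   # 1. found the element to delete
        if node.left is None:
            return node.right               # zero or one child: splice it out
        if node.right is None:
            return node.left
        pred = node.left                    # 2. choose the left sub-tree
        while pred.right is not None:       # 3. find its rightmost element
            pred = pred.right
        node.value = pred.value             # 4. place it instead of the deleted one
        node.left = bst_delete(node.left, pred.value)   # 5. repeat if necessary
    return node
```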


Searching: start at the root and visit as few nodes as possible until you find the desired value, or until you reach a leaf (Time Complexity: height)

Traversal: a way of visiting all nodes (Time Complexity: O(n))

- In-order Traversal: Left-Root-Right

    inorder(T)
        if T is empty then return
        inorder(T.L)
        write(T.root)
        inorder(T.R)

- Pre-order Traversal: Root-Left-Right

    preorder(T)
        if T is empty then return
        write(T.root)
        preorder(T.L)
        preorder(T.R)

- Post-order Traversal: Left-Right-Root

    postorder(T)
        if T is empty then return
        postorder(T.L)
        postorder(T.R)
        write(T.root)
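
The same three traversals as runnable Python, assuming nodes with value, left and right fields (as in the BST sketches above):

```python
def inorder(node):            # Left-Root-Right
    if node is None:
        return
    inorder(node.left)
    print(node.value, end=" ")
    inorder(node.right)

def preorder(node):           # Root-Left-Right
    if node is None:
        return
    print(node.value, end=" ")
    preorder(node.left)
    preorder(node.right)

def postorder(node):          # Left-Right-Root
    if node is None:
        return
    postorder(node.left)
    postorder(node.right)
    print(node.value, end=" ")
```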


Expression Tree: a binary tree where all leaves store operands and all non-leaves store operators. Every node has 0 or 2 children.
- Post-order Traversal: gives the postfix expression
- In-order Traversal (with parentheses): gives the infix expression

    inorder(T)
        if T is a leaf then write(T.root) and return
        write('(')
        inorder(T.L)
        write(T.root)
        inorder(T.R)
        write(')')




Consider the following expression tree: [diagram: * at the root, with left child + (children 3 and 4) and right child 5]

In-order → Infix: ((3+4)*5)

Post-order → Postfix: 34+5*


Constructing an expression tree from a postfix expression:
- Start reading the postfix expression from left to right
- When an operand is found, push it onto the stack as a one-node tree
- When an operator is found, pop two trees from the stack, join them by that operator, and push the result back onto the stack
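
A minimal sketch of the stack-based construction, assuming single-character operands and the operators + - * / (all names are illustrative):

```python
class ExprNode:
    def __init__(self, symbol, left=None, right=None):
        self.symbol, self.left, self.right = symbol, left, right

def build_expression_tree(postfix):
    stack = []
    for token in postfix:                   # read left to right
        if token in "+-*/":
            right = stack.pop()             # operator: pop two trees,
            left = stack.pop()              # join them by the operator,
            stack.append(ExprNode(token, left, right))   # push the result back
        else:
            stack.append(ExprNode(token))   # operand: push a one-node tree
    return stack.pop()

root = build_expression_tree("34+5*")
print(root.symbol, root.left.symbol, root.right.symbol)   # * + 5
```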


Adelson-Velskii and Landis (AVL): a BST with a balance condition: for any node, the heights of the left sub-tree and the right sub-tree differ by at most 1. Example:

[Diagram: an AVL tree containing 17, 12, 30, 15, 20, 18, 40, 25]

If one adds the number 23, it would violate the AVL condition

Rebalancing in constant time:
1. Add the new node (Example: 23)
2. Consider all nodes on the path from the new node to the root and recompute the AVL height for every node. Let the first node that violates the AVL condition be the pivot (in this case 30); go back towards the new node, note the first two directions (in this case Left-Right), and use them to pick the template (Note: finding the pivot takes logarithmic time)


Templates (named by the first two directions noted above): LL, LR, RL, RR

4 Rotations: 4 rebalancing algorithms, each running in constant time O(1). Rotations 1 and 4 are single rotations, while 2 and 3 are double rotations.

Template 1 (LL): [diagram: single rotation on nodes A, B with subtrees X, Y, Z]

Template 4 (RR): [diagram: single rotation, the mirror image of Template 1]

Template 2 (LR): [diagram: double rotation]

Template 3 (RL): [diagram: double rotation]
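
An illustrative sketch of the single rotation for the LL case (assumed here to correspond to Template 1), using AVL nodes that cache their height; the names and the height-of-empty-sub-tree convention are assumptions, not from the notes:

```python
class AVLNode:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right
        self.height = 1 + max(height(left), height(right))

def height(node):
    return node.height if node is not None else -1   # empty sub-tree: height -1

def rotate_right(k2):
    """LL case: the new node was inserted Left-Left of the violating node k2."""
    k1 = k2.left
    k2.left = k1.right            # k1's right sub-tree moves under k2
    k1.right = k2                 # k2 becomes the right child of k1
    # Only these two heights change; recompute them bottom-up in O(1).
    k2.height = 1 + max(height(k2.left), height(k2.right))
    k1.height = 1 + max(height(k1.left), height(k1.right))
    return k1                     # k1 is the new root of this sub-tree
```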


B-Trees: nodes can have many children
- Advantage: faster
- Disadvantage: complex coding

Order M:
- The root has between 2 and M children
- All non-root nodes have between 2 and M children
- All non-leaf nodes have M − 1 data pointers
- All data is in the leaves
- Leaves have M data items


[Diagram: an example B-tree with the keys 17 and 30 in the root, internal keys 10, 20 and 40, 51, and the data items 10, 17, 18, 20, 30, 31, 39, 40, 51, 52 in the leaves]





Abstract Data Type (ADT): we define how they work, not how to implement them; therefore, they are completely independent of the implementation (e.g. stacks and queues)

Priority Queue: a queue where each element is given a priority key, and the first element that goes out is the one with the highest priority

Insertion & Retrieval:
1. Insertion O(1) and Retrieval O(n): both value and priority reside in an array in the order they were entered. For retrieval, one has to search the whole array for the element with the highest priority
2. Insertion O(n) and Retrieval O(1): for every element added, elements are shifted so that the array stays sorted by priority. Therefore, to retrieve an element, the last element of the array is removed
3. Most efficient method (using a heap): O(log₂ n) insertion and retrieval


Heap: a binary tree (not a BST) that helps us implement a priority queue with O(log₂ n) insertion and removal. It is filled in level by level, from left to right, so all leaves are at level l or l − 1. For every node n, n ≥ its children.

[Diagram: a heap with root 17 and elements 11, 7, 6, 1]


Cascade Up: once an element is added, check the value of its parent. If the value of the parent is less than that of the new element, swap them, and repeat up the tree

Cascade Down: since only the root can be deleted, a technique called cascade down is used:
1. Remove the root
2. Put the last element added in place of the deleted root
3. Swap the node with the largest of its children (unless both are smaller)
4. Repeat until the heap condition holds
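
A rough sketch of cascade up and cascade down on an array-based max-heap, using 1-based indexing as in the notes (index 0 is unused); the function names are illustrative:

```python
def cascade_up(heap, i):
    """Restore the heap after inserting at position i (parent is at i // 2)."""
    while i > 1 and heap[i] > heap[i // 2]:
        heap[i], heap[i // 2] = heap[i // 2], heap[i]
        i //= 2

def cascade_down(heap, i, n):
    """Restore the heap after placing a value at position i (children at 2i, 2i+1)."""
    while 2 * i <= n:
        child = 2 * i
        if child + 1 <= n and heap[child + 1] > heap[child]:
            child += 1                              # pick the larger child
        if heap[i] >= heap[child]:
            break                                   # both children are smaller
        heap[i], heap[child] = heap[child], heap[i]
        i = child

heap = [None, 17, 11, 7, 6, 1]                      # the example heap, 1-based
heap.append(20)
cascade_up(heap, len(heap) - 1)
print(heap)                                         # [None, 20, 11, 17, 6, 1, 7]
```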




Implementing a heap using an array (1-based):
- The children of i are at 2i and 2i + 1
- The parent of i is at i/2 (integer division)

[Diagram: the heap above stored in an array: position 1 holds 17, position 2 holds 11, and so on]



Sorting using a heap:
1. Build a heap
2. Put the root in an array (i.e. the largest number)
3. Restore the heap
4. Go to step 2

Time Complexity:
- n·log₂ n to build the heap from the unsorted list
- n·log₂ n to build the sorted list from the heap
- Total: 2n·log₂ n = O(n·log₂ n)
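
For comparison, a compact heap sort using Python's standard heapq module, which implements a binary min-heap (so the result comes out in ascending order) rather than the max-heap described above; the idea is the same:

```python
import heapq

def heap_sort(items):
    heap = list(items)
    heapq.heapify(heap)                    # step 1: build a heap
    # Steps 2-4: repeatedly take the root (the smallest item) and restore the heap.
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heap_sort([5, 1, 4, 2, 8]))          # [1, 2, 4, 5, 8]
```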


Sorting Algorithms

Iterative Sorts:
- Bubble Sort
- Shell Sort
- Insertion Sort


Bubble Sort: compares each element with the adjacent element and swaps if necessary. Repeat until the array is sorted. With every pass, the largest remaining number is put at the end of the list
- Time Complexity: n(n − 1) = n² − n = O(n²)
- Space Complexity: n
- Pseudo code: [most optimized]

    flag
    k = 1
    repeat
        flag = false
        for i = 1 to n-k
            if a[i] > a[i+1]
                then swap and set flag = true
        k = k + 1
    until flag = false
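
A runnable Python version of the optimized bubble sort pseudocode above (in-place; the name bubble_sort is illustrative):

```python
def bubble_sort(a):
    n = len(a)
    k = 1
    swapped = True
    while swapped:                           # until a pass makes no swaps
        swapped = False
        for i in range(n - k):               # the last k items are already in place
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        k += 1
    return a

print(bubble_sort([5, 1, 4, 2, 8]))          # [1, 2, 4, 5, 8]
```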


Shell Sort: similar to bubble sort, but instead of comparing a[i] with a[i + 1], it compares a[i] with a[i + k], where k starts at n/2 and is halved after every pass. This will put the largest number at the end of the list
- Time Complexity: O(n²)
- Space Complexity: n
- Pseudo code:

    flag
    k = n/2
    repeat
        flag = false
        for i = 1 to n-k
            if a[i] > a[i+k]
                then swap and set flag = true
        if k > 1
            then k = k/2 and flag = true
    until k = 1 and flag = false
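
A runnable version of the gap-halving variant in the pseudocode above (note this differs from the textbook shell sort, which runs an insertion sort per gap):

```python
def shell_sort(a):
    n = len(a)
    k = max(n // 2, 1)
    swapped = True
    while not (k == 1 and not swapped):      # until k = 1 and a pass makes no swaps
        swapped = False
        for i in range(n - k):               # compare a[i] with a[i + k]
            if a[i] > a[i + k]:
                a[i], a[i + k] = a[i + k], a[i]
                swapped = True
        if k > 1:
            k //= 2
            swapped = True                   # force at least one more pass
    return a

print(shell_sort([5, 1, 4, 2, 8, 3]))        # [1, 2, 3, 4, 5, 8]
```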


Insertion Sort: divides the array into a sorted and an unsorted region. It then gets the first item of the unsorted region and places it in the right place in the sorted region
- Time Complexity: n(n − 1) = n² − n = O(n²)
- Space Complexity: n
- Pseudo code:

    for x = 2 to n
        for i = x downto 2
            if a[i] < a[i-1]
                then swap
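
A runnable Python version of the insertion sort pseudocode above (the 1-based loops translated to 0-based indices):

```python
def insertion_sort(a):
    for x in range(1, len(a)):               # a[:x] is the sorted region
        i = x
        # Move a[x] left through the sorted region until it is in place.
        while i > 0 and a[i] < a[i - 1]:
            a[i], a[i - 1] = a[i - 1], a[i]
            i -= 1
    return a

print(insertion_sort([5, 1, 4, 2, 8]))       # [1, 2, 4, 5, 8]
```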


Recursive Sorts:
- Quick Sort (most popular sort because of its space complexity)
- Merge Sort (fastest sort)


Quick Sort: partitions the array around a pivot (ideally the middle value) and rearranges the array such that all items before the pivot are less than the pivot, and all items after the pivot are greater than the pivot
- Time Complexity:
  - Best: O(n·log₂ n)
  - Worst: O(n²)
- Space Complexity: n
- Pseudo code:

    if |L| <= 1
        return L
    else
        choose pivot P
        rearrange into L`, P, L``
        quickSort(L`)
        quickSort(L``)
        return L` concatenate P concatenate L``

Rearranging into L`, P, L``:
1. Put the chosen pivot at the end of the list
2. Create two pointers: i points at the beginning of the list, and j points to the last element but one (i.e. excluding the pivot)
3. i moves to the right and stops at the first number greater than the pivot
4. j moves to the left and stops at the first number less than the pivot
5. Swap the values pointed to by i and j and continue moving i and j
6. When i and j overlap, swap the pivot with i
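
A runnable sketch of quick sort following the partition steps above: the pivot (taken from the middle) is moved to the end, i scans right, j scans left, and the pivot is finally swapped into place. All names are illustrative:

```python
def quick_sort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:                                           # |L| <= 1
        return a
    mid = (lo + hi) // 2
    a[mid], a[hi] = a[hi], a[mid]                          # 1. pivot to the end
    pivot = a[hi]
    i, j = lo, hi - 1                                      # 2. the two pointers
    while True:
        while i <= j and a[i] <= pivot:                    # 3. first item > pivot
            i += 1
        while j >= i and a[j] >= pivot:                    # 4. first item < pivot
            j -= 1
        if i >= j:                                         # 6. pointers overlap
            break
        a[i], a[j] = a[j], a[i]                            # 5. swap and continue
    a[i], a[hi] = a[hi], a[i]                              # 6. swap pivot with i
    quick_sort(a, lo, i - 1)                               # sort L`
    quick_sort(a, i + 1, hi)                               # sort L``
    return a

print(quick_sort([5, 1, 4, 2, 8, 3]))                      # [1, 2, 3, 4, 5, 8]
```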


Merge Sort: splits the array into two halves, recursively sorts both halves, and then merges the sorted halves
- Time Complexity: O(n·log₂ n)
- Space Complexity: 2n
- Pseudo code:

    if |L| <= 1
        return L
    else
        split into L` and L``
        mergeSort(L`)
        mergeSort(L``)
        merge L` and L`` into L*
        return L*
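
A runnable Python version of the merge sort pseudocode above (it returns a new list, using roughly 2n space as noted):

```python
def merge_sort(L):
    if len(L) <= 1:
        return L
    mid = len(L) // 2
    left = merge_sort(L[:mid])                # sort L`
    right = merge_sort(L[mid:])               # sort L``
    merged, i, j = [], 0, 0                   # merge L` and L`` into L*
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 1, 4, 2, 8, 3]))         # [1, 2, 3, 4, 5, 8]
```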
