

CSA 1017 Data Structures and Algorithms 1


Lecturer: John Abela

Algorithms and Complexity



Algorithm: a set of rules for solving a problem in a finite number of steps

Numbers:
- Natural: (1, 2, 3, …)
- Integers: (including negatives and 0)
- Rational: (including fractions)
- Real: (including non-terminating and non-recurring decimals)
- Complex: (including roots of negative numbers)

Set: a collection of objects (not necessarily numbers)

U → Universal Set

|S| → Cardinality → the number of objects in set S


In order to have a mathematical structure, one needs a set and an operator or a relation

<set name>, <operator or relation symbol>


Operator: takes one or more elements from a set, applies an operation, and gives something back

Examples: addition (+), subtraction (−), concatenation, union (∪), intersection (∩), etc.

Unary Operator: works on one object

Binary Operator: works on two objects

Properties of operators:
- Commutative: Consider ⟨S, c⟩. ∀x, y ∈ S: x c y = y c x
- Associative: Consider ⟨S, a⟩. ∀x, y, z ∈ S: (x a y) a z = x a (y a z)
- Distributive: Consider ⟨S, d1, d2⟩. ∀x, y, z ∈ S: x d1 (y d2 z) = (x d1 y) d2 (x d1 z)

Closed Set: a set is closed under an operator when the operator always gives back an object from the same set, therefore:

∀a, b ∈ S: a <operator> b ∈ S

Example: the natural numbers are closed under addition

Identity: when an operator works on an object and the identity element i, the result is the object itself.
Consider ⟨S, o⟩: ∀a ∈ S, a o i = a

If the operator is addition, the identity is 0; if the operator is multiplication, the identity is 1

Relation: acts upon elements from a set and gives back true or false

∀a, b ∈ S: a <relation> b → T/F


Properties of relations:
- Reflexive: Consider ⟨S, r⟩. ∀x ∈ S: x r x → T
- Symmetric: Consider ⟨S, s⟩. ∀x, y ∈ S: if x s y → T then y s x → T
- Transitive: Consider ⟨S, t⟩. ∀x, y, z ∈ S: if x t y → T and y t z → T, then x t z → T


Equivalence: a relation that is reflexive, symmetric and transitive. An equivalence relation partitions the set: every element is part of one (and only one) partition, and the union of all the partitions equals the original set

Function:

[Diagram: Domain → f → Co-Domain]


Notations:

f: <domain> → <co-domain>, e.g. f(x) = x²

f: a ↦ a*a


Cartesian Product: the set S×S (or S²) holds all possible pairs of elements from set S (S³ holds all possible triples, and so on)

Defining Sets: Example: to define a set of even numbers

E = { a | a mod 2 = 0 }

| means such that


Inductive Definition: Example: to define the natural numbers
- Basis: a starting element
- Operation: +1 or succ(), applied to elements already in the set
- Closure: nothing else is in the set


Function Properties:
- One-to-One: every y in the co-domain has at most one x in the domain such that f(x) = y
- Onto: for every y in the co-domain there is an x in the domain such that f(x) = y. Therefore, all elements in the co-domain are used


Types of Functions:
- Linear Function: Example: f(x) = 2x
- Polynomial Function: Example: f(x) = x²
- Logarithmic Function: Example: f(x) = log₂ x
- Exponential Function: Example: f(x) = 2ˣ

Intractable problems: problems that can be solved, but not fast enough for the solution to be useful

[i.e. problems whose only known algorithms are exponential]


Travelling Salesman Problem (TSP): given a list of cities and their pair-wise distances, the task is to find the shortest possible tour that visits each city exactly once
[In this case, to obtain an optimal result, one has to use an exponential algorithm (generate all possible permutations and compare the results). However, this would take billions of years to execute, so we use an approximation instead: an algorithm that generates random tours and keeps the shortest one found. This solution may, or may not, be the shortest distance.]
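
A minimal Python sketch of the random-sampling approximation described above. The distance-table layout (dist[a][b]) and all names are illustrative assumptions, not part of the notes:

```python
import random

def tour_length(tour, dist):
    # Sum the legs of the tour, returning to the starting city at the end.
    legs = list(zip(tour, tour[1:])) + [(tour[-1], tour[0])]
    return sum(dist[a][b] for a, b in legs)

def random_tsp(cities, dist, samples=10_000):
    """Approximate TSP: generate random tours and keep the shortest one seen."""
    best, best_len = None, float("inf")
    for _ in range(samples):
        tour = list(cities)
        random.shuffle(tour)              # a random permutation = a random tour
        length = tour_length(tour, dist)
        if length < best_len:
            best, best_len = tour, length
    return best, best_len                 # may or may not be the true optimum
```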

Space Complexity: the amount of space required by an algorithm to execute, expressed as a function of
the size of the input to the problem

Time Complexity: the amount of time taken by an algorithm to run, expressed as a function of the size
of the input to the problem (cannot be less than space complexity)
- Bubble sort n items: O(n²)
- Quick sort n items: O(n·log₂ n)
- Sequential search through n unsorted items: O(n)
- Binary search through n sorted items: O(log₂ n)


Towers of Hanoi: move the disks to a different pole without ever placing a larger disk on a smaller disk (Time Complexity: O(2ⁿ))

Q: Why do we work out the time complexity?
A: To make sure that the algorithm is not intractable (ie. not exponential)

Consider a program with input n:

Input invariant:
    for i = 1 to 100
        writeln('Students are dumb');

Input variant:
    for i = 1 to n
        writeln('Lecturers are nice people');


[When you work out the time complexity of a program, ignore all code fragments that are
input invariant and add up the time complexities of the code fragments which are input
variant. This means that for the above program, the time complexity is O(n)]

Running Time Function (RTF): a function that, from some point onwards, becomes positive. A function f is an RTF if:

f: ℤ⁺ → ℝ such that f(n) > 0 ∀n > n₀, for some n₀ ∈ ℤ⁺


Examples:
- f(n) = 2n + 1 → n₀ = 0
- f(n) = n → n₀ = 0
- f(n) = n² − 16 → n₀ = 4
- f(n) = sin n + 10 → n₀ = 0
- f(n) = sin n + 1 is not an RTF, since it can reach 0 and 0 is not strictly positive


Theta Relation: consider two RTFs f(n) and g(n)

f(n) is Θ(g(n)) if:

∃c₁, c₂ ∈ ℝ⁺ and n₀ ∈ ℤ⁺ such that c₁·g(n) ≤ f(n) ≤ c₂·g(n) ∀n > n₀


Examples:
- f(n) = n², g(n) = 3n²
  c₁·g(n) ≤ f(n) ≤ c₂·g(n)
  (1/9)·3n² ≤ n² ≤ 1·3n²
  ⇒ n² is Θ(3n²)

- f(n) = n³, g(n) = n²
  c₁·n² ≤ n³ ≤ c₂·n²
  ⇒ n³ is not Θ(n²)
  You cannot find a c₂ such that c₂·n² is greater than n³ for all n > n₀, since once n exceeds c₂, c₂·n² will be smaller than f(n)

Note: if the largest exponent of two functions is the same, you will be able to find c₁ and c₂. Coefficients and other terms are ignored.

Example: n² is Θ(4n² + 3n + 1), and their complexity is O(n²)


Θ is reflexive: f(n) is Θ(f(n))

Proof:
c₁·f(n) ≤ f(n) ≤ c₂·f(n)
let c₁ = c₂ = 1
f(n) ≤ f(n) ≤ f(n)


Θ is symmetric: if f(n) is Θ(g(n)) then g(n) is Θ(f(n))

Proof:
if f(n) is Θ(g(n))
⇒ c₁·g(n) ≤ f(n) ≤ c₂·g(n)
for g(n) to be Θ(f(n)) we need c₃·f(n) ≤ g(n) ≤ c₄·f(n)
consider f(n) ≤ c₂·g(n): (1/c₂)·f(n) ≤ g(n) ⇒ c₃ = 1/c₂
consider c₁·g(n) ≤ f(n): g(n) ≤ (1/c₁)·f(n) ⇒ c₄ = 1/c₁
so (1/c₂)·f(n) ≤ g(n) ≤ (1/c₁)·f(n)

Note that n₀ remains the same

Θ is transitive: if f(n) is Θ(g(n)) and g(n) is Θ(h(n)) then f(n) is Θ(h(n))

Proof:
if f(n) is Θ(g(n)) ⇒ c₁·g(n) ≤ f(n) ≤ c₂·g(n)
if g(n) is Θ(h(n)) ⇒ c₃·h(n) ≤ g(n) ≤ c₄·h(n)
for f(n) to be Θ(h(n)) we need c₅·h(n) ≤ f(n) ≤ c₆·h(n)
consider c₁·g(n) ≤ f(n) and c₃·h(n) ≤ g(n): c₃·c₁·h(n) ≤ c₁·g(n) ≤ f(n) ⇒ c₅ = c₃·c₁
consider f(n) ≤ c₂·g(n) and g(n) ≤ c₄·h(n): f(n) ≤ c₂·g(n) ≤ c₄·c₂·h(n) ⇒ c₆ = c₄·c₂
so c₃·c₁·h(n) ≤ f(n) ≤ c₄·c₂·h(n)

Note that n₀ is the maximum of the two


Properties of the Theta relation:
1. For any c > 0, c·f(n) is Θ(f(n))
2. If f₁(n) is Θ(g(n)) and f₂(n) is Θ(g(n)), then (f₁ + f₂)(n) is Θ(g(n))
3. If f₁(n) is Θ(g₁(n)) and f₂(n) is Θ(g₂(n)), then (f₁·f₂)(n) is Θ((g₁·g₂)(n))

Big O Relation: consider two RTFs f(n) and g(n)
f(n) is O(g(n)) if:

∃c ∈ ℝ⁺ and n₀ ∈ ℤ⁺ such that f(n) ≤ c·g(n) ∀n > n₀

If f(n) is O(g(n)) then the growth rate of f is not larger than that of g

Examples:
- n³ is O(n²) → False
- n² is O(n³) → True


Omega Relation: consider two RTFs f(n) and g(n)
f(n) is Ω(g(n)) if:

∃c ∈ ℝ⁺ and n₀ ∈ ℤ⁺ such that f(n) ≥ c·g(n) ∀n > n₀

If f(n) is Ω(g(n)) then the growth rate of f is not smaller than that of g

Properties of Big O and Omega notations:
1. If f(n) is Ω(g(n)) then g(n) is O(f(n))
2. If f(n) is Ω(g(n)) and f(n) is O(g(n)) then f(n) is Θ(g(n))

Notes:
- <Logarithmic function> is O(n) but is not Ω(n)
  [Logarithmic growth is less than linear]
- <Exponential function> is not O(n) but is Ω(n)
  [Exponential growth is greater than linear]


Non-Computable Problems: problems for which, provably, no one can write an algorithm to find their solution.
Examples:
- Halting Problem: one cannot write an algorithm that works on another program (given as input) and determines whether that program will terminate or not
- Busy Beaver


Computable Problems:
Examples:
- Polynomial: problems for which the solution is polynomial
- Exponential: problems with exponential solutions
- Unknown Algorithms: problems for which no standard algorithm exists. Example: an algorithm to predict the Super 5 of next week


A problem is in NP if:
1. It is a problem where the answer is yes or no [decision problem]
2. It is presented to the oracle and the answer is returned in O(1)
3. The answer is verifiable in polynomial time [polynomial verifiability]

Example: TSP
1. Decision Problem: Is there a tour of length k or less?
2. Oracle: the answer is a list of cities
3. Polynomial Verifiability: compute the distance between the cities in the answer and check that the result is ≤ k

Example: Map Coloring Problem [Given a map of countries, find the least number of colors needed so that no two adjacent countries have the same color]
1. Decision Problem: Can the map be colored with k colors or less?
2. Oracle: the answer is a list of countries and their colors
3. Polynomial Verifiability:
   a. Place a node in the center of each country, label it according to its color, and connect the nodes of adjacent countries
   b. Check that no edge has the same letter (color) at both ends

NP-Completeness (NPC):

Let P be in NP
P is in NPC if all problems in NP can be Turing reduced to P

Turing Reduction: take the input of P` (a very hard problem) and change it to the input of P in polynomial time




Example of Turing reduction:


Vertex Cover: given a graph, a vertex covers an edge if it touches it. Choose the smallest number of vertices that cover all the edges
Set Cover: given a set and a list of its subsets, find the smallest number of subsets whose union is equal to the original set

Change the input of the vertex cover problem into an input of the set cover problem:

- Original set: all edges

- Subsets: one subset per vertex, containing the edges that it covers

Therefore, Vertex Cover ≤ Set Cover (Vertex Cover Turing-reduces to Set Cover)
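
A minimal sketch of the transformation described above, assuming the graph is given as a list of edges (the representation and names are illustrative, not from the notes):

```python
def vertex_cover_to_set_cover(vertices, edges):
    """Turn a vertex-cover instance into a set-cover instance in polynomial time."""
    universe = {frozenset(e) for e in edges}            # original set: all edges
    subsets = {v: {e for e in universe if v in e}       # one subset per vertex:
               for v in vertices}                       # the edges it covers
    return universe, subsets

# Example: the path a-b-c. Picking the subset for b covers both edges.
universe, subsets = vertex_cover_to_set_cover("abc", [("a", "b"), ("b", "c")])
print(len(subsets["b"]))   # 2
```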


Reducing TSP to shell sort (note: not a Turing reduction!):

Create a list of all possible tours and their lengths (still exponential) and sort it using shell sort

TSP is in NPC (i.e. all problems in NP can be reduced to it)


[P ⊆ NP]
P = NP?: can every problem with an exponential solution be solved in polynomial time?

If one NPC problem is reduced to polynomial time, all NPC problems can be reduced to polynomial time!


Data Structures

Graph: a collection of vertices that could be connected together by edges

G = (V, E)

V is a finite set of vertices

E is a subset of V×V

Example: [diagram of a graph on the vertices a–f]

V = {a, b, c, d, e, f}
E = {ab, bc, cd}

Path: an ordered sequence of vertices V₁ … Vₙ such that (Vᵢ, Vᵢ₊₁) ∈ E for all 1 ≤ i < n


Directed Edge: an edge that can only be traversed in one direction: A → B

Cycle: a path V₁ … Vₙ where V₁ = Vₙ
Note: a graph can either be cyclic or acyclic

Connected Graph: ∀a, b ∈ V, ∃ a path from a to b


Snake Relation (~): ⟨V, ~⟩

∀a, b ∈ V: a ~ b if ∃ a path from a to b

This relation partitions the graph into components




Linked List: a logical list, ideal for listing items in order but not for searching

Trees:
- The topmost node is called the root
- The root can have children
- Children can have children
- Nodes without children are called leaves
- A tree is an acyclic graph
- There is exactly one path from the root to every other node


To define the descendants of x:
- Take the sub-tree of which x is the root
- All nodes except x

To define the ancestors of y:
- Take the path from y to the root
- All nodes except y


Height: the length of the longest path from the root of a given tree

Note: height is important because we usually search a tree to find a node, and the longest path is the longest time the search can take. Therefore, the time complexity for searching is the height.

Restrictions on Trees:
- Structure Condition: restriction on the number of children per node
- Order Condition: restriction on the order of values in nodes
- Balance Condition: restriction on the balance of the heights within the tree

Which conditions apply:
- Unrestricted Tree: no conditions
- Binary Tree: structure condition
- BST: structure + order conditions
- AVL: structure + order + balance conditions


Unrestricted Tree: a tree that has no type of restriction whatsoever
- Best Case Height: 1
- Worst Case Height: n − 1

Q: How do you store an unrestricted tree in a data structure?
A: First Child / Next Sibling

Example: [diagram: each node holds its data plus two pointers, First Child and Next Sibling]
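
A minimal Python sketch of the First Child / Next Sibling representation; the class and field names are illustrative assumptions:

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.first_child = None    # leftmost child of this node
        self.next_sibling = None   # next child of this node's parent

def children(node):
    """Yield the children of a node by walking its sibling chain."""
    child = node.first_child
    while child is not None:
        yield child
        child = child.next_sibling

# Example: a root with three children a, b, c.
root, a, b, c = Node("root"), Node("a"), Node("b"), Node("c")
root.first_child, a.next_sibling, b.next_sibling = a, b, c
print([n.data for n in children(root)])   # ['a', 'b', 'c']
```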


Binary Tree: every node has at most two children
- Best Case Height: log₂ n
- Worst Case Height: n − 1

Notes:
- The level number is the log₂ of the number of nodes in that level (a full level l holds 2^l nodes)
- The height of a full tree with n nodes is [log₂(n + 1)] − 1


Proof:
Level 0 → 1 node
Level 1 → 2 nodes
Level 2 → 4 nodes
Level 3 → 8 nodes

Total nodes: 15, and [log₂(15 + 1)] − 1 = 3 = the height



Storing a Binary Tree in an Array:

[Diagram: each array entry holds a node as | Left | Data | Right |, where Left and Right are the indices of the children]


Binary Search Tree (BST): for any node, the values in the left sub-tree must be less than (or equal to) it
and the values in the right sub-tree must be greater than (or equal to) it

Searching a BST:
- Best Case Time Complexity: 1
- Worst Case Time Complexity: height

Efficient searching: [diagram: a BST containing 17, 12, 30, 15, 27]

If you enter nodes in alphabetical (sorted) order, you get an unbalanced tree
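
A minimal sketch of searching a BST; the node class and names are illustrative. The loop follows a single root-to-leaf path, so it runs in O(height):

```python
class BSTNode:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def bst_search(node, target):
    """Return the node holding target, or None if it is not in the tree."""
    while node is not None:
        if target == node.value:
            return node
        # Smaller values live in the left sub-tree, larger ones in the right.
        node = node.left if target < node.value else node.right
    return None

# The example tree above: 17 with sub-trees 12 (15) and 30 (27).
root = BSTNode(17, BSTNode(12, right=BSTNode(15)), BSTNode(30, left=BSTNode(27)))
print(bst_search(root, 15).value)   # 15
print(bst_search(root, 99))         # None
```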


Deleting an element from a sorted binary tree (a sketch follows below):
1. Find the element to delete
2. Choose the subtree on the left or right
3. Find its rightmost or leftmost element
4. Place it instead of the deleted element
5. If necessary, repeat
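
A rough sketch of the deletion steps above, reusing the BSTNode shape from the previous sketch; it always takes the rightmost element of the left sub-tree (one of the two choices in step 2):

```python
def bst_delete(node, target):
    """Delete target from the sub-tree rooted at node; return the new root."""
    if node is None:
        return None
    if target < node.value:
        node.left = bst_delete(node.left, target)
    elif target > node.value:
        node.right = bst_delete(node.right, target)
    else:                                   # 1. found the element to delete
        if node.left is None:
            return node.right               # zero or one child: splice it out
        if node.right is None:
            return node.left
        pred = node.left                    # 2. choose the left sub-tree
        while pred.right is not None:       # 3. find its rightmost element
            pred = pred.right
        node.value = pred.value             # 4. place it instead of the deleted one
        node.left = bst_delete(node.left, pred.value)   # 5. repeat if necessary
    return node
```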


Searching: start at the root and visit as few nodes as possible until you find the desired value, or until you reach a leaf (Time Complexity: height)

Traversal: a way of visiting all nodes (Time Complexity: O(n))

- In-order Traversal: Left-Root-Right

    inorder(T)
        if T is empty then return
        inorder(T.L)
        write(T.root)
        inorder(T.R)

- Pre-order Traversal: Root-Left-Right

    preorder(T)
        if T is empty then return
        write(T.root)
        preorder(T.L)
        preorder(T.R)

- Post-order Traversal: Left-Right-Root

    postorder(T)
        if T is empty then return
        postorder(T.L)
        postorder(T.R)
        write(T.root)
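
The same three traversals as runnable Python, assuming nodes with value, left and right fields (as in the BST sketches above):

```python
def inorder(node):            # Left-Root-Right
    if node is None:
        return
    inorder(node.left)
    print(node.value, end=" ")
    inorder(node.right)

def preorder(node):           # Root-Left-Right
    if node is None:
        return
    print(node.value, end=" ")
    preorder(node.left)
    preorder(node.right)

def postorder(node):          # Left-Right-Root
    if node is None:
        return
    postorder(node.left)
    postorder(node.right)
    print(node.value, end=" ")
```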


Expression Tree: a binary tree where all leaves store operands and all non-leaves store operators. Every node has 0 or 2 children.
- Post-order Traversal: gives the postfix expression
- In-order Traversal (with parentheses): gives the infix expression

    inorder(T)
        if T is a leaf then write(T.root) and return
        write('(')
        inorder(T.L)
        write(T.root)
        inorder(T.R)
        write(')')




Consider the following expression tree: [diagram: * at the root, with left child + (children 3 and 4) and right child 5]

In-order → Infix: ((3+4)*5)

Post-order → Postfix: 34+5*


Constructing an expression tree from a postfix expression:
- Start reading the postfix expression from left to right
- When an operand is found, push it onto the stack as a one-node tree
- When an operator is found, pop two trees from the stack, join them by that operator, and push the result back onto the stack
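
A minimal sketch of the stack-based construction, assuming single-character operands and the operators + - * / (all names are illustrative):

```python
class ExprNode:
    def __init__(self, symbol, left=None, right=None):
        self.symbol, self.left, self.right = symbol, left, right

def build_expression_tree(postfix):
    stack = []
    for token in postfix:                   # read left to right
        if token in "+-*/":
            right = stack.pop()             # operator: pop two trees,
            left = stack.pop()              # join them by the operator,
            stack.append(ExprNode(token, left, right))   # push the result back
        else:
            stack.append(ExprNode(token))   # operand: push a one-node tree
    return stack.pop()

root = build_expression_tree("34+5*")
print(root.symbol, root.left.symbol, root.right.symbol)   # * + 5
```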


Adelson-Velskii and Landis (AVL): a BST with a balance condition: for any node, the heights of the left sub-tree and the right sub-tree differ by at most 1. Example:

[Diagram: an AVL tree containing 17, 12, 30, 15, 20, 18, 40, 25]

If one adds the number 23, it would violate the AVL condition

Rebalancing in constant time:
1. Add the new node (Example: 23)
2. Consider all nodes on the path from the new node to the root and recompute the AVL height for every node. Let the first node that violates the AVL condition be the pivot (in this case 30); go back towards the new node, note the first two directions (in this case Left-Right), and use them to pick the template (Note: finding the pivot takes logarithmic time)


Templates (named by the first two directions noted above): LL, LR, RL, RR

4 Rotations: 4 rebalancing algorithms, each running in constant time O(1). Rotations 1 and 4 are single rotations, while 2 and 3 are double rotations.

Template 1 (LL): [diagram: single rotation on nodes A, B with subtrees X, Y, Z]

Template 4 (RR): [diagram: single rotation, the mirror image of Template 1]

Template 2 (LR): [diagram: double rotation]

Template 3 (RL): [diagram: double rotation]
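
An illustrative sketch of the single rotation for the LL case (assumed here to correspond to Template 1), using AVL nodes that cache their height; the names and the height-of-empty-sub-tree convention are assumptions, not from the notes:

```python
class AVLNode:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right
        self.height = 1 + max(height(left), height(right))

def height(node):
    return node.height if node is not None else -1   # empty sub-tree: height -1

def rotate_right(k2):
    """LL case: the new node was inserted Left-Left of the violating node k2."""
    k1 = k2.left
    k2.left = k1.right            # k1's right sub-tree moves under k2
    k1.right = k2                 # k2 becomes the right child of k1
    # Only these two heights change; recompute them bottom-up in O(1).
    k2.height = 1 + max(height(k2.left), height(k2.right))
    k1.height = 1 + max(height(k1.left), height(k1.right))
    return k1                     # k1 is the new root of this sub-tree
```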


B-Trees: nodes can have many children
- Advantage: faster
- Disadvantage: complex coding

Order M:
- The root has between 2 and M children
- All non-root nodes have between 2 and M children
- All non-leaf nodes have M − 1 data pointers
- All data is in the leaves
- Leaves have M data items


[Diagram: an example B-tree with the keys 17 and 30 in the root, internal keys 10, 20 and 40, 51, and the data items 10, 17, 18, 20, 30, 31, 39, 40, 51, 52 in the leaves]





Abstract Data Type (ADT): we define how they work, not how to implement them; therefore, they are completely independent of the implementation (e.g. stacks and queues)

Priority Queue: a queue where each element is given a priority key, and the first element that goes out is the one with the highest priority

Insertion & Retrieval:
1. Insertion O(1) and Retrieval O(n): both value and priority reside in an array in the order they were entered. For retrieval, one has to search the whole array for the element with the highest priority
2. Insertion O(n) and Retrieval O(1): for every element added, elements are shifted so that the array stays sorted by priority. Therefore, to retrieve an element, the last element of the array is removed
3. Most efficient method (using a heap): O(log₂ n) insertion and retrieval


Heap: a binary tree (not a BST) that helps us implement a priority queue with O(log₂ n) insertion and removal. It is filled in level by level, from left to right, so all leaves are at level l or l − 1. For every node n, n ≥ its children.

[Diagram: a heap with root 17 and elements 11, 7, 6, 1]


Cascade Up: once an element is added, check the value of its parent. If the value of the parent is less than that of the new element, swap them, and repeat up the tree

Cascade Down: since only the root can be deleted, a technique called cascade down is used:
1. Remove the root
2. Put the last element added in place of the deleted root
3. Swap the node with the largest of its children (unless both are smaller)
4. Repeat until the heap condition holds
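
A rough sketch of cascade up and cascade down on an array-based max-heap, using 1-based indexing as in the notes (index 0 is unused); the function names are illustrative:

```python
def cascade_up(heap, i):
    """Restore the heap after inserting at position i (parent is at i // 2)."""
    while i > 1 and heap[i] > heap[i // 2]:
        heap[i], heap[i // 2] = heap[i // 2], heap[i]
        i //= 2

def cascade_down(heap, i, n):
    """Restore the heap after placing a value at position i (children at 2i, 2i+1)."""
    while 2 * i <= n:
        child = 2 * i
        if child + 1 <= n and heap[child + 1] > heap[child]:
            child += 1                              # pick the larger child
        if heap[i] >= heap[child]:
            break                                   # both children are smaller
        heap[i], heap[child] = heap[child], heap[i]
        i = child

heap = [None, 17, 11, 7, 6, 1]                      # the example heap, 1-based
heap.append(20)
cascade_up(heap, len(heap) - 1)
print(heap)                                         # [None, 20, 11, 17, 6, 1, 7]
```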




Implementing a heap using an array (1-based):
- The children of i are at 2i and 2i + 1
- The parent of i is at i/2 (integer division)

[Diagram: the heap above stored in an array: position 1 holds 17, position 2 holds 11, and so on]



Sorting using a heap:
1. Build a heap
2. Put the root in an array (i.e. the largest number)
3. Restore the heap
4. Go to step 2

Time Complexity:
- n·log₂ n to build the heap from the unsorted list
- n·log₂ n to build the sorted list from the heap
- Total: 2n·log₂ n = O(n·log₂ n)
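
For comparison, a compact heap sort using Python's standard heapq module, which implements a binary min-heap (so the result comes out in ascending order) rather than the max-heap described above; the idea is the same:

```python
import heapq

def heap_sort(items):
    heap = list(items)
    heapq.heapify(heap)                    # step 1: build a heap
    # Steps 2-4: repeatedly take the root (the smallest item) and restore the heap.
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heap_sort([5, 1, 4, 2, 8]))          # [1, 2, 4, 5, 8]
```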


Sorting Algorithms

Iterative Sorts:
- Bubble Sort
- Shell Sort
- Insertion Sort


Bubble Sort: compares each element with the adjacent element and swaps if necessary. Repeat until the array is sorted. With every pass, the largest remaining number is put at the end of the list
- Time Complexity: n(n − 1) = n² − n = O(n²)
- Space Complexity: n
- Pseudo code: [most optimized]

    flag
    k = 1
    repeat
        flag = false
        for i = 1 to n-k
            if a[i] > a[i+1]
                then swap and set flag = true
        k = k + 1
    until flag = false
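
A runnable Python version of the optimized bubble sort pseudocode above (in-place; the name bubble_sort is illustrative):

```python
def bubble_sort(a):
    n = len(a)
    k = 1
    swapped = True
    while swapped:                           # until a pass makes no swaps
        swapped = False
        for i in range(n - k):               # the last k items are already in place
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        k += 1
    return a

print(bubble_sort([5, 1, 4, 2, 8]))          # [1, 2, 4, 5, 8]
```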


Shell Sort: similar to bubble sort, but instead of comparing a[i] with a[i + 1], it compares a[i] with a[i + k], where k starts at n/2 and is halved after every pass. This will put the largest number at the end of the list
- Time Complexity: O(n²)
- Space Complexity: n
- Pseudo code:

    flag
    k = n/2
    repeat
        flag = false
        for i = 1 to n-k
            if a[i] > a[i+k]
                then swap and set flag = true
        if k > 1
            then k = k/2 and flag = true
    until k = 1 and flag = false
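
A runnable version of the gap-halving variant in the pseudocode above (note this differs from the textbook shell sort, which runs an insertion sort per gap):

```python
def shell_sort(a):
    n = len(a)
    k = max(n // 2, 1)
    swapped = True
    while not (k == 1 and not swapped):      # until k = 1 and a pass makes no swaps
        swapped = False
        for i in range(n - k):               # compare a[i] with a[i + k]
            if a[i] > a[i + k]:
                a[i], a[i + k] = a[i + k], a[i]
                swapped = True
        if k > 1:
            k //= 2
            swapped = True                   # force at least one more pass
    return a

print(shell_sort([5, 1, 4, 2, 8, 3]))        # [1, 2, 3, 4, 5, 8]
```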


Insertion Sort: divides the array into a sorted and an unsorted region. It then gets the first item of the unsorted region and places it in the right place in the sorted region
- Time Complexity: n(n − 1) = n² − n = O(n²)
- Space Complexity: n
- Pseudo code:

    for x = 2 to n
        for i = x downto 2
            if a[i] < a[i-1]
                then swap
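
A runnable Python version of the insertion sort pseudocode above (the 1-based loops translated to 0-based indices):

```python
def insertion_sort(a):
    for x in range(1, len(a)):               # a[:x] is the sorted region
        i = x
        # Move a[x] left through the sorted region until it is in place.
        while i > 0 and a[i] < a[i - 1]:
            a[i], a[i - 1] = a[i - 1], a[i]
            i -= 1
    return a

print(insertion_sort([5, 1, 4, 2, 8]))       # [1, 2, 4, 5, 8]
```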


Recursive Sorts:
- Quick Sort (most popular sort because of its space complexity)
- Merge Sort (fastest sort)


Quick Sort: partitions the array around a pivot (ideally the middle value) and rearranges the array such that all items before the pivot are less than the pivot, and all items after the pivot are greater than the pivot
- Time Complexity:
  - Best: O(n·log₂ n)
  - Worst: O(n²)
- Space Complexity: n
- Pseudo code:

    if |L| <= 1
        return L
    else
        choose pivot P
        rearrange into L`, P, L``
        quickSort(L`)
        quickSort(L``)
        return L` concatenate P concatenate L``

Rearranging into L`, P, L``:
1. Put the chosen pivot at the end of the list
2. Create two pointers: i points at the beginning of the list, and j points to the last element but one (i.e. excluding the pivot)
3. i moves to the right and stops at the first number greater than the pivot
4. j moves to the left and stops at the first number less than the pivot
5. Swap the values pointed to by i and j and continue moving i and j
6. When i and j overlap, swap the pivot with i
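
A runnable sketch of quick sort following the partition steps above: the pivot (taken from the middle) is moved to the end, i scans right, j scans left, and the pivot is finally swapped into place. All names are illustrative:

```python
def quick_sort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:                                           # |L| <= 1
        return a
    mid = (lo + hi) // 2
    a[mid], a[hi] = a[hi], a[mid]                          # 1. pivot to the end
    pivot = a[hi]
    i, j = lo, hi - 1                                      # 2. the two pointers
    while True:
        while i <= j and a[i] <= pivot:                    # 3. first item > pivot
            i += 1
        while j >= i and a[j] >= pivot:                    # 4. first item < pivot
            j -= 1
        if i >= j:                                         # 6. pointers overlap
            break
        a[i], a[j] = a[j], a[i]                            # 5. swap and continue
    a[i], a[hi] = a[hi], a[i]                              # 6. swap pivot with i
    quick_sort(a, lo, i - 1)                               # sort L`
    quick_sort(a, i + 1, hi)                               # sort L``
    return a

print(quick_sort([5, 1, 4, 2, 8, 3]))                      # [1, 2, 3, 4, 5, 8]
```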


Merge Sort: splits the array into two halves, recursively sorts both halves, and then merges the sorted halves
- Time Complexity: O(n·log₂ n)
- Space Complexity: 2n
- Pseudo code:

    if |L| <= 1
        return L
    else
        split into L` and L``
        mergeSort(L`)
        mergeSort(L``)
        merge L` and L`` into L*
        return L*
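
A runnable Python version of the merge sort pseudocode above (it returns a new list, using roughly 2n space as noted):

```python
def merge_sort(L):
    if len(L) <= 1:
        return L
    mid = len(L) // 2
    left = merge_sort(L[:mid])                # sort L`
    right = merge_sort(L[mid:])               # sort L``
    merged, i, j = [], 0, 0                   # merge L` and L`` into L*
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 1, 4, 2, 8, 3]))         # [1, 2, 3, 4, 5, 8]
```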
