3.6.1 Communication Methods
3.6.2 Baud, bit rate, etc.
3.6.3 Asynchronous data transmission
3.6.4 Baseband and broadband
3.6.5 Networks
3.6.6 Network topologies
3.6.7 Networks Part Two
3.6.8 Routers and gateways
3.6.9 Web services
3.6.10 Wireless networking
3.6.11 Server-side scripting
3.6.12 Security
3.6.13 Encryption
Abstraction - simplifying a problem to omit all unnecessary details.
Information hiding - hiding the complexities of the system behind a simple to use interface.
Interface - the boundary between the front end of a system and its
Time complexity - how long an algorithm takes to complete a task with a given input.
Space complexity - how much memory an algorithm needs to complete a task with a given input.
Order of growth - how quickly the complexity of a function grows with growth in

Building on the definition of an algorithm from the AS course, there are some additional key points to remember:
- An algorithm is a sequence of unambiguous instructions.
- An algorithm has a range of legitimate inputs and should produce the correct result for all values within the range.
- Different algorithms can be developed to perform the same task. These algorithms can have different time and space complexities.
- The overall complexity of an algorithm depends on its time complexity and space complexity.
- Some algorithms are more time efficient than others - for example, insertion sort is typically faster than bubble sort.
- Some algorithms are more space efficient than others - some use memory efficiently, others waste RAM.

The table below gives the order of complexity of algorithms. These points should be noted:
Quadratic and cubic time are examples of polynomial time, which takes the form O(n^c) for some constant c. The number of nested loops can dictate the value of c. For example, a bubble sort algorithm contains a loop inside a loop, meaning that it will have quadratic time complexity (c = 2).
Complexity (listed from FASTEST to SLOWEST) with Examples / Comments

Constant time, O(1)
  Is a number odd or even? No matter the size of the input, it will always just be a matter of looking at the last digit.
Logarithmic time, O(log n)
  Binary search. If a list is constantly being split in half with one half rejected, then its time complexity will not grow as fast as linear time.
Linear time, O(n)
  Linear search. If each item in a list has to be checked to see if it is the desired item, then the time complexity will grow with the list size.
Linearithmic time, O(n log n)
  Quick sort. A quick sort splits a list and sorts it recursively. In the best-case scenario it has linearithmic time complexity.
Quadratic time, O(n^2)
  Bubble sort. Two items in a list are looked at together and either swapped or not. Each pass can feature n - 1 swaps and there can be a total of n - 1 passes, giving quadratic time complexity.
Cubic time, O(n^3)
Exponential time, O(2^n)
  The recursive Fibonacci algorithm roughly has this level of time complexity.
Factorial time, O(n!)
  Travelling Salesman Problem. The travelling salesman problem tries to find the optimal tour between a finite number n of nodes. As n grows in size, the problem becomes increasingly more difficult.
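The difference between these orders of growth can be seen by counting steps. The following is a minimal Python sketch, not part of the original notes; the function names are illustrative, and the bubble sort deliberately uses n - 1 passes of n - 1 comparisons to match the description in the table.

```python
def linear_search_steps(items, target):
    """Return the number of comparisons a linear search makes."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps


def binary_search_steps(items, target):
    """Return the number of probes a binary search makes on a sorted list."""
    low, high, steps = 0, len(items) - 1, 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if items[mid] == target:
            break
        if items[mid] < target:
            low = mid + 1      # reject the lower half
        else:
            high = mid - 1     # reject the upper half
    return steps


def bubble_sort_comparisons(items):
    """Bubble sort a copy of items and return the number of comparisons."""
    data = list(items)
    comparisons = 0
    for _ in range(len(data) - 1):          # n - 1 passes
        for i in range(len(data) - 1):      # n - 1 comparisons per pass
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return comparisons
```

On a sorted list of 1024 items, searching for the last item takes 1024 comparisons linearly but only 11 probes with binary search, while bubble sorting 10 items this way always costs 81 comparisons.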
3.1.3 Types of Problem
Non-computable - a problem that does not have an algorithmic solution.
Tractable - a problem that has a reasonable (polynomial) time solution.
Intractable - a problem for which no reasonable time solution has yet been found.
Decision problem - a problem whose answer is always yes or no.
Undecidable - a decision problem that is not computable.
Heuristic solution - a trial and error approach using 'informed guesses' or learned knowledge to find a solution to an intractable problem.
Mealy machine - an FSM that determines its outputs from the present state and inputs. Moore machine - an FSM that determines its outputs from the present state only.
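[Figure: state transition diagrams for a Mealy machine and a Moore machine, each with states S0 and S1. In the Mealy machine the transitions are labelled input|output, for example 'a'|'A' and 'b'|'B'; in the Moore machine the inputs 'a' and 'b' label the transitions and the outputs 'A' and 'B' belong to the states.]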
A finite state machine without an output is known as a finite state automaton (FSA). FSA are restricted to decision problems (they only output yes or no). If a given input causes an FSA to stop at a valid halt state then the output is true. Otherwise, the output is false. Below is a diagram demonstrating a finite state automaton for unlocking a combination lock with the code 2371. Since an input from any given state only corresponds to one transition it can also be called a deterministic FSA or deterministic finite automaton (DFA).
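[Figure: finite state automaton for the combination lock. From the initial state START, the inputs 2, 3, 7 and 1 lead through the states 2, 23 and 237 to the accepting state UNLOCKED; the transitions labelled NOT 2, NOT 3, NOT 7 and NOT 1 handle incorrect digits.]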
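The combination lock automaton can be sketched in a few lines of Python. This is an illustrative sketch rather than the diagram itself: it assumes that any wrong digit returns the machine to the START state and that the machine stays in UNLOCKED once it gets there. The name `unlocks` is hypothetical.

```python
CODE = "2371"  # the combination used in the text


def unlocks(digits):
    """Return True if the automaton halts in the accepting state."""
    state = 0                       # number of correct digits so far (0 = START)
    for d in digits:
        if state == len(CODE):      # assumption: UNLOCKED is never left
            break
        if d == CODE[state]:
            state += 1              # correct digit: move to the next state
        else:
            state = 0               # assumption: a wrong digit returns to START
    return state == len(CODE)       # accepting state reached?
```

Because each state has exactly one transition for each possible input, this is a deterministic automaton: `unlocks("2371")` is true and `unlocks("2317")` is false.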
Non-deterministic FSAs (NFAs) have their uses in pattern matching and can be converted into DFAs.
[Figure: an example NFA (nondeterministic finite automaton) with numbered states and transitions labelled a and b.]
Principle of universality - a universal machine is a machine capable of simulating any other machine. Equivalent Turing machine - all other types of computing machine are reducible to an equivalent Turing machine. Power of a Turing machine - no physical computing device can be more powerful than a Turing machine. If a Turing machine cannot solve a decision problem, nor can any physical computing device.
Alan Turing devised the Turing machine, an abstract computational device, in order to explore the limitations and capabilities of computing machines. Turing machines are the most basic of computing machines (their operations cannot be divided any further) and therefore have the theoretical potential to describe the operation of any computing machine. This is the principle of universality. By reasoning that every machine has an equivalent Turing machine, we can conclude that nothing is more powerful than a Turing machine.
[Figure: state transition diagram for the Turing machine, with each transition labelled input|output.]
The above state transition diagram shows an algorithm which will print "1 0 1 0 1 0" on our theoretical tape infinitely (until there is no more empty tape available).
If we halted the Turing machine as it approaches the end of its fifth pass our tape would appear as shown above. We can display the table of instructions which creates this pattern as shown below:
Current state   Input   Output   Tape head    New state
A               blank   1        Move right   B
B               blank   blank    Move right   C
C               blank   0        Move right   D
D               blank   blank    Move right   A
Another way to express this is using a transition function with the following syntax:
(current state, input symbol) = (next state, output symbol, movement)
These are the appropriate transition functions for our example, writing □ for a blank cell and → for a move to the right:
(A, □) = (B, 1, →)
(B, □) = (C, □, →)
(C, □) = (D, 0, →)
(D, □) = (A, □, →)
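The transition functions can be run directly. The following Python sketch is an illustration under two assumptions: a space character stands in for the blank tape symbol, and the tape has a fixed length so that the machine stops when the head runs off the end. The names `RULES` and `run` are hypothetical.

```python
BLANK = " "  # stand-in for the blank tape symbol

# (current state, input symbol) -> (next state, output symbol, head movement)
RULES = {
    ("A", BLANK): ("B", "1", +1),
    ("B", BLANK): ("C", BLANK, +1),
    ("C", BLANK): ("D", "0", +1),
    ("D", BLANK): ("A", BLANK, +1),
}


def run(tape_length):
    """Run the machine from state A until the head moves off the tape."""
    tape = [BLANK] * tape_length
    state, head = "A", 0
    while head < tape_length:
        # Look up the rule, write the output symbol, then move the head
        state, tape[head], move = RULES[(state, tape[head])]
        head += move
    return "".join(tape)
```

With a tape of eight cells, `run(8)` returns the string `"1 0 1 0 "`, which is the alternating pattern described above.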
Natural language - a real spoken and written language with grammar or syntax rules and ambiguities, such as English and French.
Formal language - a language defined by an alphabet and rules of syntax.
Regular language - any language that an FSM will accept.

Both regular and formal languages can be defined by a regular expression or Backus-Naur form. Natural language is a lot less strict than both regular and formal languages.
Regular Expressions (Regex)
Regular Expressions are used for data validation, file searching, and matching.
Regex    Meaning                                                      Examples
a*       Matches zero or more a's.                                    c(at)* matches cat, c, catat, catatat
a+       Matches one or more a's.                                     ar+t matches art, arrt, arrrt
a?       Matches zero or one a's.                                     colou?r matches colour and color only
a|b      Matches a or b.                                              gr(a|e)y matches gray and grey only
[ab]     Matches a or b - an alternative form of a|b.                 gr[ae]y matches gray and grey only
[a-z]    All lowercase letters.                                       ([a-z])+ matches zbcbs, acdssx
[A-Z]    All UPPERCASE letters.                                       Tom, Harry, London
\d       Matches any single digit.                                    Student5, Student8
\w       Matches any single alphanumeric character.                   Mk19, sC52
\W       Matches any single non-alphanumeric character.               90?, 23$
\s       Matches a single whitespace character, such as a space.      123 2, 123 8
.        Matches any single character.                                hear, fear, dear (for .ear)
^f       Matches an 'f' with nothing before it.                       rebound, rehab (for ^re)
f$       Matches an 'f' with nothing after it.                        cosine, genuine (for ine$)
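The patterns in the table can be tried out in most languages. The sketch below uses Python's standard `re` module, whose syntax agrees with the table for these simple cases; the helper name `matches` and the dictionary of test strings are illustrative only.

```python
import re


def matches(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# Patterns from the table, each with strings it should accept
examples = {
    r"c(at)*": ["c", "cat", "catat"],
    r"ar+t": ["art", "arrt", "arrrt"],
    r"colou?r": ["colour", "color"],
    r"gr(a|e)y": ["gray", "grey"],
    r"gr[ae]y": ["gray", "grey"],
    r"[a-z]+": ["zbcbs", "acdssx"],
}
results = {p: all(matches(p, t) for t in texts) for p, texts in examples.items()}

# Anchors: ^ ties the match to the start of the text, $ to the end
starts_with_f = re.search(r"^f", "fear") is not None
ends_with_f = re.search(r"f$", "fear") is not None
```

Here every entry in `results` is true, `starts_with_f` is true and `ends_with_f` is false, since "fear" begins with an f but does not end with one.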
Backus-Naur Form (BNF)
Backus-Naur form is a notation for expressing the rules for constructing valid strings in a regular language. It can be expressed in a number of ways. Consider a British postcode, for example SQ12 4YA. While "SQ" does not correspond to any city, and therefore the postcode is invalid, it is in fact syntactically correct. We can break down this syntax as follows:

<postcode> ::= <first-bit><optional-space><last-bit>
<first-bit> ::= <letter><letter><digit><digit> | <letter><letter><digit> | <letter><digit><digit> | <letter><digit>
<last-bit> ::= <digit><letter><letter>
<letter> ::= A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
<digit> ::= 0|1|2|3|4|5|6|7|8|9
<optional-space> ::= " "|""

This can be represented on a parse tree.
[Figure: parse tree for a postcode. <postcode> branches into <first-bit>, <optional-space> and <last-bit>; these branch in turn into <letter> and <digit> nodes, and <optional-space> ends in "".]
Anything expressed in triangular brackets, <>, is non-terminal since it is an expression of terminal and/or non-terminal values. A value at the bottom of the tree, such as "", is terminal since it cannot be broken down any further. The same syntax can also be expressed by a series of syntax diagrams.
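Because the postcode grammar describes a regular language, it can also be checked with a regular expression. The following Python sketch is one possible translation of the BNF rules into a pattern; the names `POSTCODE` and `is_valid_postcode` are illustrative and not part of the notes.

```python
import re

# <first-bit>: one or two letters followed by one or two digits
# <optional-space>: a single space or nothing
# <last-bit>: one digit followed by two letters
POSTCODE = re.compile(r"[A-Z]{1,2}[0-9]{1,2} ?[0-9][A-Z]{2}")


def is_valid_postcode(text):
    """Return True if text is syntactically valid under the grammar."""
    return POSTCODE.fullmatch(text) is not None
```

As in the text, `is_valid_postcode("SQ12 4YA")` is true even though no such postcode exists: the check is purely syntactic.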
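[Figure: syntax diagrams for first-bit, optional-space, last-bit and letter.]

Infix expression: 5 + ((1 + 2) * 4) - 3
Move each operator to the end of its bracket, then remove all the brackets: 5 1 2 + 4 * + 3 -
Postfix expression: 5 1 2 + 4 * + 3 -
In both forms the numbers are the operands and the symbols +, * and - are the operators.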
We can convert postfix to infix using a stack which follows a last-in first-out (LIFO) basis.
Input   Operation       Stack      Infix expression
5       Push operand    5
1       Push operand    5, 1
2       Push operand    5, 1, 2
+       Add             5          (1 + 2)
4       Push operand    5, 4       (1 + 2)
*       Multiply        5          ((1 + 2) * 4)
+       Add             -          5 + ((1 + 2) * 4)
3       Push operand    3          5 + ((1 + 2) * 4)
-       Subtract        -          5 + ((1 + 2) * 4) - 3
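The stack-based conversion can be sketched in Python as follows. This is a variant of the table above rather than an exact copy: it keeps each partial infix expression on the stack alongside its numeric value, and it brackets every sub-expression. The function name is hypothetical.

```python
def postfix_to_infix_and_value(tokens):
    """Return (infix string, numeric value) for a list of postfix tokens."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    stack = []                                   # holds (infix text, value) pairs
    for token in tokens:
        if token in ops:
            right_text, right_val = stack.pop()  # last in, first out
            left_text, left_val = stack.pop()
            stack.append((f"({left_text} {token} {right_text})",
                          ops[token](left_val, right_val)))
        else:
            stack.append((token, int(token)))    # push operand
    return stack.pop()


infix, value = postfix_to_infix_and_value("5 1 2 + 4 * + 3 -".split())
```

For the expression used in the text, `infix` is `((5 + ((1 + 2) * 4)) - 3)` and `value` is 14.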
Programming paradigm          Description                                                                                               Example
Functional programming        Treats computation as the evaluation of mathematical functions.                                          Haskell, F#.
Logic programming             Defines a set of facts and rules.                                                                         Prolog.
Event-driven programming      The flow of the program is determined by events such as mouse clicks and key presses which trigger subroutines.
Object-oriented programming   Programmers use instances of a class in order to create objects.
(name not recoverable)        Code is executed one line after the other.
(name not recoverable)        Code can be

The Example column also lists "C, C++, etc." and "Most languages."; the rows these belong to are not recoverable from the source.
Object-oriented programming
Object - an instance of a class.
Instantiation - the action of declaring an instance of a class (an object).
Class definition - a pattern or template that can be used to create objects of that class.
Encapsulation - combining a record with the procedures and functions that manipulate it to form a new data type, a class.
Inheritance - defining a class and then using it to build a hierarchy of descendant classes with each descendant inheriting access to all its ancestors' code and data.
Polymorphism - giving an action one name that is shared up and down a class hierarchy. Each class in the hierarchy implements the action in a way appropriate

A class defines the properties and methods (the procedures and functions that give it its behaviours) that will be used for each instance of that class. Below is an example of a class written in C#.

    using System;

    class Member
    {
        // Private variables, procedures and functions are only
        // accessible inside the class.
        private int MemberShipNo;
        private string Name;
        private string Email;

        // Everything which is public can be accessed outside the class.
        public Member(int newmemberno, string membername, string email)
        {
            MemberShipNo = newmemberno;
            Name = membername;
            Email = email;
        }

        public void AmendMember(string membername, string email)
        {
            Name = membername;
            Email = email;
        }

        public void DisplayMember()
        {
            Console.WriteLine("ID: {0}, Name: {1}, Email: {2}", MemberShipNo, Name, Email);
        }
    }

In this basic example we have a class named "Member". Its constructor creates a new member from three main properties: membership number, name and email address. An instantiation of this class would be written, for example:

    Member mymember = new Member(1, "Frederick Dragonswatter", "fred.dragon@thedragonclub.co.uk");

We can then use the public methods to alter or print the information which we have stored; this usage of public methods to deal with private variables is referred to as encapsulation.
Important points to note: A child class inherits all the properties and methods of its parent class. This may be used for adding more complexity to a basic functional class, for example, a scientific calculator child class with a calculator parent class. A child class can override methods, an idea called polymorphism. This is particularly useful when applying the same method to every object in a list when a different result is desired for different objects.
[Figure: inheritance diagram. Alarm clock "is a" Clock; Watch "is a" Clock.]
Above is a simple inheritance diagram which shows the relationship between a clock, an alarm clock and a watch. We can expand on this diagram with a class diagram as shown below.

TClock CLASS
  Fields: Hours, Minutes
  Methods: SetTime, GetHours, GetMinutes, IncrementTime

TAlarmClock CLASS
  Fields: Hours, Minutes
  Methods: SetTime, GetHours, GetMinutes, IncrementTime

TWatch CLASS
  Fields: Hours, Minutes
  Methods: SetTime, GetHours, GetMinutes, IncrementTime

These fields and methods are automatically inherited by the child class.
Below, again in C#, is an example of inheritance and polymorphism.

    public class Clock
    {
        private int Hours;
        private int Minutes;

        public virtual void Declare()
        {
            Console.WriteLine("I am a clock.");
        }
    }

    public class AlarmClock : Clock
    {
        public override void Declare()
        {
            Console.WriteLine("I am an alarm clock.");
        }
    }

The child class overrides the method defined by the parent class but inherits the other fields.
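The point about applying the same method to every object in a list can be illustrated with a short Python sketch. The class names follow the clock example; the `Watch` class, which inherits `declare` unchanged, and the lowercase method names are assumptions made for this illustration.

```python
class Clock:
    def __init__(self, hours=0, minutes=0):
        self.hours = hours
        self.minutes = minutes

    def declare(self):
        return "I am a clock."


class AlarmClock(Clock):
    def declare(self):              # overrides Clock.declare
        return "I am an alarm clock."


class Watch(Clock):                 # inherits declare and the fields unchanged
    pass


# The same call gives a result appropriate to each object's class
declarations = [c.declare() for c in (Clock(), AlarmClock(), Watch(9, 30))]
```

The list `declarations` holds "I am a clock.", "I am an alarm clock." and "I am a clock." in that order, because only `AlarmClock` overrides the inherited method.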
Recursive routine - a routine defined in terms of itself. General case - the solution in terms of itself for a value n. Base case - a value that has a solution which does not involve any reference to the general case solution. Stack frame - the locations in the stack area used to store the values referring to one invocation of a routine.
Recursive routines can often offer elegant solutions to a problem. Despite this, they are often less efficient in terms of both time and space than iteration. One of the most common examples of a recursive method is the calculation of a factorial value, n!, with the function name Factorial(n). Below is code, in Pascal, for the function.

    Function Factorial (n : Integer) : Integer;
    Begin
      If n = 1
        Then Result := 1                      { base case }
        Else Result := n * Factorial(n - 1)   { general case }
    End;

The line "Then Result := 1" is the base case: since a factorial is defined as n! = n x (n-1) x ... x 1, 1 is the smallest value of n in the recursion. The line "Else Result := n * Factorial(n - 1)" is the general case.
This only works because the routine is called with values passed as parameters. Each invocation needs its own copy of its values, so recursion does not work if the routine relies on global variables instead. For each invocation of a recursive routine, a portion of the stack (the stack frame) is assigned to store the values for that invocation, such as its parameters and return address. This means that, for some inputs, there is a risk of stack overflow. Trying to calculate 0! recursively with the function above will cause such an error since the base case will never be reached. Below is a dry run for calculating 5!:
Call number   Function call   n   Result             Result after return   Return value
1             Factorial(5)    5   5 * Factorial(4)   5 * 24                120
2             Factorial(4)    4   4 * Factorial(3)   4 * 6                 24
3             Factorial(3)    3   3 * Factorial(2)   3 * 2                 6
4             Factorial(2)    2   2 * Factorial(1)   2 * 1                 2
5             Factorial(1)    1   1                  1                     1
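For comparison, here is a Python sketch of the same routine alongside an iterative version. It is a variant rather than a translation of the Pascal code: it treats any n of 1 or less as the base case so that 0! returns 1 instead of overflowing the stack, and it rejects negative input. The function names are illustrative.

```python
def factorial(n):
    """Recursive factorial of a non-negative integer."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n <= 1:                      # base case, which also covers 0! = 1
        return 1
    return n * factorial(n - 1)     # general case


def factorial_iterative(n):
    """Same result using a loop, so only one stack frame is needed."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result
```

Both `factorial(5)` and `factorial_iterative(5)` return 120, matching the dry run; the iterative version avoids creating one stack frame per call.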
Abstract data type (ADT) - a data type whose properties are specified independently of any particular programming language.
List - a collection of elements with an inherent order.
Pointer - a variable that contains an address. The pointer points to the memory location with that address.
Null pointer - a pointer that does not point to anything, usually represented by a special value such as -1.
Dynamic data structure - the memory taken up by the data structure varies at run time.
Static data structure - the memory required to store the data structure is declared before run time.

Lists are a type of abstract data type since they are a collection of elements with an inherent order, but this order is expressed outside of the programming language, with the programmer needing no knowledge of how the list is stored or how it functions (an example of information hiding). There are two key types of lists.
Linear lists
A linear list is a static structure. This means that on declaring the list it is reserved a portion of the heap using adjacent memory locations. In this instance, order is given by the order of the memory locations that the items occupy.

Advantages:
- Linear lists are easy to program.
- If elements are stored in key order, a binary search is possible.

Disadvantages:
- Memory locations may be wasted due to arrays being static.
- Insertion of an element within an ordered list requires moving elements.
- Deletion of an element within a list requires moving elements.
Linked lists
A linked list is a dynamic structure. Since each item in the list points to the next, there is no need for the list to occupy adjacent memory locations in the heap, therefore the list can be as large or small as necessary. Let's say we want to insert the words "one", "two", "three" and "four" into our list.

[Figure: linked list in which Start points to "one", followed by "two", "three" and "four". A null pointer is required to indicate the end of the list.]
If we want to put these into alphabetical order we can simply rearrange the pointers as demonstrated below.
[Figure: the same four items with the pointers rearranged so that chaining from Start gives "four", "one", "three", "two".]

We can program this as an array, with each item connected to a pointer. The table below shows how to express the above linked list as an array.
[Table: the linked list stored as an array of items and pointers, with Start = 3 and NextFree = 4.]
When adding an item in this format, the new item will be placed in the field with the "NextFree" index and the current last item in the list will point to the new item. The "NextFree" value will then be altered. When deleting an item, simply update the pointers and the "Start" or "NextFree" values in order to ensure that it is still possible to chain through the list and to know where the first empty field is.

In dynamic allocation, memory space is only allocated when required at run time. Each time a list requires more memory space, it will be allocated a portion of the heap. If the memory locations used by a dynamic structure are not given back to the heap when they are no longer in use, memory leakage will occur. Eventually there may be no memory left in the heap.
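The array form of a linked list can be sketched in Python as below. This is a minimal sketch under stated assumptions: indexes start at 0, -1 is the null pointer, items are stored in arrival order at the "NextFree" index and spliced into alphabetical order by updating pointers. The class and method names are hypothetical, and there is no check for a full array.

```python
NULL = -1  # null pointer marker


class ArrayLinkedList:
    def __init__(self, size):
        self.data = [None] * size
        self.pointer = [NULL] * size
        self.start = NULL
        self.next_free = 0

    def insert_sorted(self, item):
        """Store item at next_free and splice it into alphabetical order."""
        new = self.next_free
        self.data[new] = item
        self.next_free += 1
        prev, cur = NULL, self.start
        while cur != NULL and self.data[cur] < item:
            prev, cur = cur, self.pointer[cur]
        self.pointer[new] = cur          # new item points at its successor
        if prev == NULL:
            self.start = new             # new item is now first in the list
        else:
            self.pointer[prev] = new

    def traverse(self):
        """Chain through the pointers from start and return the items."""
        items, i = [], self.start
        while i != NULL:
            items.append(self.data[i])
            i = self.pointer[i]
        return items


words = ArrayLinkedList(6)
for w in ["one", "two", "three", "four"]:
    words.insert_sorted(w)
```

After the four insertions, `words.start` is 3 and `words.next_free` is 4, which agrees with the Start and NextFree values given above, and `words.traverse()` returns the words in alphabetical order.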
3.2.4 Queues
Queue - a first-in first-out (FIFO) abstract data-type. Circular queue - when the array element with the largest possible index has been used, the next element to join the queue reuses the vacated location at the beginning of the array. Linear queue - elements join the queue at one end and leave the queue at the other.
[Figure: a queue of people, with a "Join here" sign at the rear and a "VIP" label.]
A queue is a type of list where the first item to be added is the first item to be removed (FIFO). In the programming sense, a queue has two operations: add a new item to the rear of the queue or remove an item from the front of the queue. Some uses of queues in a computing context are these:
- print jobs waiting to be printed
- characters entered at the keyboard and held in a buffer
- jobs waiting to be executed under a batch operating system
- simulations
Array implementation
On the next page is an example of using an array to implement a queue. In this example the items stay static therefore will always remain at the same index.
Index   Data fields
0       Fred
1       Jack
2       Matt
3
4
5
Front = 0, Rear = 2

Index   Data fields
0       Fred
1       Jack
2       Matt
3       Joe
4       Harry
5       Simon
Front = 3, Rear = 5
Adding Joe, Harry and Simon to the queue, and removing Fred, Jack and Matt from the queue, we now have a problem since we have reached the end of the memory locations reserved for this queue.

Shuffle queue
In a shuffle queue, once someone leaves the queue, all the remaining items are moved one position along towards the front. A front pointer is not necessary since the item at the front of the queue will always be the item with the lowest index (in this case, 0).
Index   Data fields
0       Fred
1       Jack
2       Matt
3
4
5
Rear = 2

Index   Data fields
0       Joe
1       Harry
2       Simon
3
4
5
Rear = 2
Circular queue
In a circular queue, vacated entries may be reused. This means that when an index is no longer reachable by chaining through the queue, we should delete the item that was in that position, i.e. when a person leaves the queue, the space they occupied becomes free. The example below shows Jack rejoining the queue.

Index   Data fields
0       Fred
1       Jack
2       Matt
3
4
5
Front = 0, Rear = 2

Index   Data fields
0       Jack
1
2
3       Joe
4       Harry
5       Simon
Front = 3, Rear = 0
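A circular queue can be sketched in Python using modulo arithmetic to wrap the pointers around the end of the array. The class below is an illustrative sketch, not the notes' own implementation; the `count` field is an added assumption used to tell a full queue from an empty one.

```python
class CircularQueue:
    def __init__(self, size):
        self.items = [None] * size
        self.front = 0           # index of the item at the front
        self.rear = -1           # index of the item at the rear
        self.count = 0           # assumption: track size to detect full/empty

    def enqueue(self, item):
        if self.count == len(self.items):
            raise OverflowError("queue is full")
        self.rear = (self.rear + 1) % len(self.items)    # wrap around
        self.items[self.rear] = item
        self.count += 1

    def dequeue(self):
        if self.count == 0:
            raise IndexError("queue is empty")
        item = self.items[self.front]
        self.items[self.front] = None                    # vacate the slot
        self.front = (self.front + 1) % len(self.items)
        self.count -= 1
        return item


q = CircularQueue(6)
for name in ["Fred", "Jack", "Matt", "Joe", "Harry", "Simon"]:
    q.enqueue(name)
left = [q.dequeue() for _ in range(3)]   # Fred, Jack and Matt leave
q.enqueue("Jack")                        # Jack rejoins in vacated slot 0
```

This reproduces the second table: Jack sits at index 0, with Front = 3 and Rear = 0.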
[Figure: linked list implementation of a queue, with the front pointer at Fred, followed by Jack, then Matt at the rear.]

If we add an item to the queue, we simply add a pointer from Matt to the new item and move the rear pointer to point to this new item. Similarly, if we want to remove Fred from the queue, we just need to move the front pointer to point to Jack.
Priority queues
Priority queues effectively take the first-in first-out principle of a queue and adjust it so that every element in the queue has an associated priority. The element in the queue with the highest priority will be the first to leave the queue therefore we essentially insert an element in the queue where they fit in terms of their given priority. Priority queues are especially used in simulations.
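A priority queue can be sketched in Python as follows. Note that this sketch swaps in a different technique from the one described above: rather than inserting each element at its position in an ordered list, it uses a binary heap from the standard `heapq` module, with an arrival counter so that items of equal priority still leave in first-in first-out order. The class name and the example priorities are made up for illustration.

```python
import heapq
import itertools


class PriorityQueue:
    def __init__(self):
        self._heap = []
        self._order = itertools.count()   # tie-breaker preserves FIFO order

    def add(self, item, priority):
        # heapq is a min-heap, so negate the priority to pop the highest first
        heapq.heappush(self._heap, (-priority, next(self._order), item))

    def remove(self):
        """Remove and return the item with the highest priority."""
        return heapq.heappop(self._heap)[2]


pq = PriorityQueue()
pq.add("Fred", 1)
pq.add("Jack", 3)
pq.add("Matt", 1)
pq.add("Joe", 2)
served = [pq.remove() for _ in range(4)]
```

Here `served` is Jack, Joe, Fred, Matt: the highest priority leaves first, and Fred leaves before Matt because he joined earlier with the same priority.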
3.2.5 Stacks
Stack - a last-in first-out (LIFO) abstract data type.

There are two operations which can be performed on a stack. These are adding a new item to the top of the stack (pushing) and removing an item from the top of the stack (popping). Some uses in a computing context are:
- Stacks are used to store the return address, parameters and register contents when a procedure or function call is made. When the procedure or function completes execution, the return address and other data are retrieved from the stack.
- Stacks are used for evaluating expressions in Reverse Polish Notation.
Index   Data fields
0       Fred
1       Harry
2       Joe
3
4
5
TopOfStack = 2
In this instance, Fred, Jack and Matt are added to the stack in that order. While this stack goes downwards, stacks can also go upwards; take heed of the order of the index numbers. If Matt and Jack are popped from the stack and Harry and Joe are pushed onto the stack, the stack will appear as shown in the diagram. Since items are pushed onto and popped from the top of the stack, we only need one pointer.
[Figure: linked list implementation of a stack holding Fred, Jack, Matt and Joe, with the start pointer at the top item, Fred.]

In the linked list implementation of a stack, each item in the stack points to the item below it. If a new item is added to the top of the stack, the new item will point to Fred and the start pointer will point to the new item. Removing an item from the stack simply means moving the start pointer to the next item down the stack.
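The array implementation with a single top-of-stack pointer can be sketched in Python as below. The class name and the use of -1 for an empty stack are assumptions made for this sketch.

```python
class ArrayStack:
    def __init__(self, size):
        self.items = [None] * size
        self.top = -1                 # TopOfStack; -1 means the stack is empty

    def push(self, item):
        if self.top + 1 == len(self.items):
            raise OverflowError("stack is full")
        self.top += 1
        self.items[self.top] = item

    def pop(self):
        if self.top == -1:
            raise IndexError("stack is empty")
        item = self.items[self.top]
        self.items[self.top] = None
        self.top -= 1
        return item


s = ArrayStack(6)
for name in ["Fred", "Jack", "Matt"]:
    s.push(name)
popped = [s.pop(), s.pop()]           # Matt, then Jack
s.push("Harry")
s.push("Joe")
```

After these operations the array holds Fred, Harry and Joe at indexes 0 to 2 and `s.top` is 2, as in the array example above.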
3.2.6 Hashing
Hashing - the process of applying a hash function to a key to generate a hash value.
Hash key - the key that the hash function is applied to.
Hash function - a function H, applied to a key k, which generates a hash value H(k) of range smaller than the domain of values of k.
Hash value - the value generated by the application of a hash function to the key.
Hash table - a table with a column dedicated to the range of hash values that can be generated by applying a hash function to a key. If the hash value
The premise
In databases there may be a table with thousands or even millions of records. This makes searching for a specific record more time-consuming, especially since a linear search may have to search through all records just to find one. A hash table makes reading, writing and deleting records a much quicker process.

Hashing is also used for storing passwords in databases since the process of hashing cannot easily be reversed. This means that the user will be able to input a password which can be verified by passing it through the hash function, but even if someone could access the database table containing the hashed passwords, they would not be able to read the original passwords from it.

According to the pigeonhole principle, a good hash function will be one which generates as many combinations of hash values as there are combinations of hash keys, so as to ensure no collisions by different hash keys.
Collisions
Collision - when two or more different keys hash to the same hash value.
Open hashing - a method in which a collision is resolved by storing the record in the "next available" location.
Closed hashing - all other locations in the table are closed off, therefore a pointer column is added and a linked list of records with the same hash key is created.
Rehashing - when the initial hash results in a collision, the hash value of the key is rehashed to generate a new hash value.
Linear rehashing - the original hash value is incremented by 1 modulo N, 2 modulo N, etc. until an empty slot is found in a table of size N rows.

As explained by the pigeonhole principle, if there are N + 1 items to be inserted into a table of size N rows, then there must be at least one row which will have to contain two items. In hashing terms, hashing two different keys to the same hash value is called a collision. There are two main methods of dealing with collisions.

1. In open hashing, the hash value is rehashed in order to position the record in the next available location. This is especially advantageous if the table has a large number of rows so that collisions are infrequent. When searching, if the hash value is found in the table yet the record is not the desired one, rehashing is used to look for where the desired item could be. If the end of the table is reached and the item is not found, only then can we conclusively say that an item is not in the hash table. If an item is deleted, a special marker must be put in place in order to prevent the search stopping prematurely.

2. In closed hashing, collisions are expected to be common occurrences. A pointer column is introduced in the table and if a collision occurs between two hash keys, a linked list is created. When searching, if the hash value is found but the first item is not the desired one, the search will follow through the linked list until it finds the item. If the end of the linked list is reached and the item has not been found, we can conclusively say that the item is not in the hash table. If an item is deleted there are two options for where a linked list exists. Either the deleted item in the chain is replaced by a special marker or each item is moved up.
A worked example
Give the contents of the hash table that results when you insert items with the keys C O M P U T I N G in that order into an initially empty table of M = 5 rows, using closed hashing. Use the hash function k times 11 mod M to transform the kth letter of the alphabet into a table index (row number), e.g., hash(B) = hash(2) = 22 Mod 5 = 2.

Character   k    k * 11 Mod 5
C           3    33 Mod 5 = 3
O           15   165 Mod 5 = 0
M           13   143 Mod 5 = 3
P           16   176 Mod 5 = 1
U           21   231 Mod 5 = 1
T           20   220 Mod 5 = 0
I           9    99 Mod 5 = 4
N           14   154 Mod 5 = 4
G           7    77 Mod 5 = 2

Index   Character   Pointer
0       O           -> T
1       P           -> U
2       G
3       C           -> M
4       I           -> N
If we have a larger table, we can represent the same information using open hashing. We have nine letters to enter into our hash table so we should let the hash table have nine rows.

Index       0   1   2   3   4   5   6   7   8
Character   O   P   U   C   M   T   I   N   G

Here, we have used linear rehashing, i.e. we keep incrementing the hash value until the row belonging to that index is empty. You can see how, since there are many collisions in this example, the indexes of a lot of the characters do not match their actual hash values, or even values close to them.
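Both versions of the worked example can be reproduced with a short Python sketch. The function names are illustrative: `chained_table` builds the five-row table with a list per row, and `probed_table` uses linear rehashing in a nine-row table. The sketch assumes that the nine-row table still uses k * 11 Mod 5 as its hash function, since that is what reproduces the table shown.

```python
def h(letter, m=5):
    """Hash the k-th letter of the alphabet to (k * 11) mod m."""
    k = ord(letter) - ord("A") + 1
    return (k * 11) % m


def chained_table(keys, m=5):
    """Resolve collisions by keeping a list of keys for each row."""
    table = [[] for _ in range(m)]
    for key in keys:
        table[h(key, m)].append(key)
    return table


def probed_table(keys, rows=9, m=5):
    """Resolve collisions by linear rehashing to the next free row."""
    table = [None] * rows
    for key in keys:
        i = h(key, m)
        while table[i] is not None:      # assumes the table is not full
            i = (i + 1) % rows
        table[i] = key
    return table
```

Calling `chained_table("COMPUTING")` gives the rows O-T, P-U, G, C-M and I-N, and `probed_table("COMPUTING")` gives O P U C M T I N G, matching the two tables.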
Graph - a diagram consisting of circles, called vertices, joined by lines, called edges or arcs; each edge joins exactly two vertices. Neighbours - two vertices are neighbours if they are connected by an edge. Degree (of a vertex) - the number of neighbours for that vertex. Labelled or weighted graph - a graph in which the edges are labelled or given a value called a weight. Automation - turning an abstraction into a form that can be processed by a computer. Directed graph or digraph - a diagram consisting of circles, called vertices, joined by directed lines, called edges. Simple graph - an undirected graph without multiple edges and in which each edge connects two different vertices. Closed path or cycle - a sequence of edges that starts and ends at the same vertex and such that any two successive edges in the sequence share a vertex.
[Figure: two drawings of a graph with vertices A to F: a simple graph on the left and a directed graph on the right.]
Above are two diagrams which represent the same graph. The graph on the right is a directed graph or digraph form of the simple graph on the left. A simple graph cannot contain loops since these edges do not connect two different vertices. If we look again at the above diagrams, we can form a closed path or circuit from the graph by travelling on the path C-D-F-E-C. Here, we have visited different vertices sequentially and returned to the node we began at.

In computing, a graph often represents an abstraction of a problem. For example, a company may have different business plans for generating profit and may want to discover which route would give them the most profit year on year. The London Underground map is a typical example of abstraction since it only keeps the important details and does not stay true to the actual geography of the stations.
Adjacency matrix
An adjacency matrix of size n by n for a graph with n vertices stores whether or not two vertices are directly connected. We can use 0s and 1s to represent this information, 1 meaning that two vertices are neighbours and 0 meaning that they are not.

    1  2  3  4  5
1   0  1  1  1  0
2   1  0  0  1  0
3   1  0  0  1  1
4   1  1  1  0  1
5   0  0  1  1  0
For an undirected graph there will always be a symmetrical pattern as shown in the above matrix, since a_ij = a_ji, where a is the cell in the adjacency matrix and i and j are two distinct vertices. Notice that the matrix tells us that vertex 1 is not adjacent to vertex 1.
    1  2  3  4  5
1   0  1  1  0  0
2   0  0  0  1  0
3   0  0  0  1  1
4   1  0  0  0  1
5   0  0  0  0  0
When the graph becomes directed, the matrix is no longer symmetrical. If we read off a_31, we have a value of 0, meaning that we cannot travel from vertex 3 to vertex 1. However, reading off a_13 tells us that we can travel from vertex 1 to vertex 3. Therefore, in general, a_ij ≠ a_ji. You should fill in the matrix row by row: the cell in row i, column j represents whether it is possible to travel from vertex i to vertex j.
[Figure: a weighted graph on vertices 1 to 5 and its distance matrix. The individual weights are not recoverable from the source.]

For a weighted or labelled graph, we can no longer use 0 in our adjacency matrix since it could easily be a valid distance between two vertices. For that reason, we may use the infinity symbol (∞) instead. An adjacency matrix for a labelled graph may be called a distance matrix.
Adjacency list
An adjacency list represents the same connections in a different way to an adjacency matrix: for each vertex, it lists the vertices adjacent to it.

Vertex   Adjacent vertices
1        2, 3, 4
2        1, 4
3        1, 4, 5
4        1, 2, 3, 5
5        3, 4
Similarly, if we have a directed graph we fill in the list from the view of the vertex in the vertex column, i.e. which vertices can we go to from that node?
Vertex   Adjacent vertices
1        2, 3
2        4
3        4, 5
4        1, 5
5
If we want to fill in the table with distances the following format can be used (this time we do not need to use the infinity symbol because we only include adjacent vertices):
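[Figure and table: the weighted graph with its adjacency list, in which each adjacent vertex is listed together with the weight of the connecting edge. The individual weights are not recoverable from the source.]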
Matrix or list?
- Adjacency matrix: if many vertex pairs are connected by edges, then the adjacency matrix doesn't waste much space and it indicates whether an edge exists with one access (rather than following a list).
- Adjacency list: if the graph is sparse, so not many of its vertex pairs have edges between them, the adjacency list is preferable.
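The two representations can be compared in a short Python sketch. It stores the undirected example graph as a list of lists and derives the adjacency list from it; the helper names `to_adjacency_list` and `is_symmetric` are illustrative.

```python
# Adjacency matrix of the undirected example graph (vertices 1 to 5)
matrix = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 0, 1],
    [0, 0, 1, 1, 0],
]


def to_adjacency_list(m):
    """Convert a matrix into {vertex: [neighbours]} using 1-based labels."""
    return {i + 1: [j + 1 for j, cell in enumerate(row) if cell]
            for i, row in enumerate(m)}


def is_symmetric(m):
    """An undirected graph always has a symmetric adjacency matrix."""
    n = len(m)
    return all(m[i][j] == m[j][i] for i in range(n) for j in range(n))


adjacency = to_adjacency_list(matrix)
```

The matrix always needs n x n cells, whereas the list stores one entry per direction of each edge, which is why the list is preferable for sparse graphs.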
Trees
If a non-directed connected graph has no cycles then we can identify it as being a tree. In a tree, there is just one path between each pair of vertices. 28
[Figure: a tree and a rooted tree, with the root node, an internal node and a leaf node labelled]
If a tree has a designated root, with every edge directed away from this root, it is called a rooted tree.
Algorithms
Graphs can be used to represent mazes by placing a node on each decision point of the maze, i.e. each place where a path splits into two or more paths, as well as at the start and end positions. We can then assign three Boolean flags to each node. These flags are:
Undiscovered - is the node yet to be found?
Discovered - has the node been found?
Processed (completely explored) - have we visited all the incident edges of the node?
[Figure: a maze with decision points labelled A to P, and the graph abstracted from it]
In this abstraction, all the dead ends of the maze have been included to ensure that each path of the maze can be fully explored as it might be in real life. Since the graph that we have abstracted from our maze has no cycles, it is a tree therefore we can transform it into a rooted tree by choosing A as our root node. On a rooted tree we can apply both breadth-first and depth-first searching algorithms in order to fully explore the whole of the maze.
[Figure: the rooted maze tree shown twice, with nodes numbered in breadth-first and in depth-first order of discovery]
In breadth-first searching we explore all the undiscovered neighbours of our root, and then all the undiscovered neighbours of these nodes, until all the nodes are fully explored. In depth-first searching we keep following the leftmost unexplored branch until we find a leaf, then we backtrack to that leaf's parent and explore to the right.
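As a minimal sketch of the two searches, the Python below runs both on a small rooted tree. The tree itself is hypothetical (it is not the maze tree), and children are listed left to right.

```python
from collections import deque

# Hypothetical rooted tree: each node maps to its children, left to right.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"], "D": [], "E": [], "F": []}


def breadth_first(tree, root):
    """Visit the root, then its neighbours, then their neighbours, and so on."""
    order = []
    discovered = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)            # node is now fully processed
        for child in tree[node]:
            if child not in discovered:
                discovered.add(child)
                queue.append(child)
    return order


def depth_first(tree, node, order=None):
    """Follow the leftmost branch to a leaf, then backtrack and go right."""
    if order is None:
        order = []
    order.append(node)
    for child in tree[node]:
        depth_first(tree, child, order)
    return order


print(breadth_first(TREE, "A"))  # ['A', 'B', 'C', 'D', 'E', 'F']
print(depth_first(TREE, "A"))    # ['A', 'B', 'D', 'E', 'C', 'F']
```

Breadth-first search relies on a queue, whereas depth-first search relies on a stack (here the call stack of the recursion).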
[Figure: binary tree with root +, a left subtree (* with leaves A and C) and a right subtree (/ with leaves E and G)]
Each node in a binary tree is the root node of a sub-tree in its own right. If we consider a branch connecting "A" and another node then "A" will be the root of its own sub-tree with that node. Whether this node is on the left or right of "A", we will still need to explore this sub-tree as if there were a node on the other side. There are three different ways of traversing a binary tree (a tree where each node has at most two children).
Pre-order Traversal
Visit the root
Traverse the left sub-tree in pre-order
Traverse the right sub-tree in pre-order
Our recursive definition tells us that we should always pick up the value of the root of a sub-tree before we explore that sub-tree. We then always go to the left first.
Result: +*AC/EG

Post-order Traversal
Traverse the left sub-tree in post-order
Traverse the right sub-tree in post-order
Visit the root
This is effectively the same as pre-order but with the visit to the root moved to the end. Remember, the left is always visited before the right.
Result: AC*EG/+

In-order Traversal
Traverse the left sub-tree in in-order
Visit the root
Traverse the right sub-tree in in-order
Result: A*C+E/G
For in-order traversal, the values from the leaves are always separated by the values from the nodes.
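The three recursive definitions can be sketched in Python as follows. The nested-tuple representation (item, left, right) is an assumption made for illustration; the tree is the expression tree with root + used in the results above.

```python
# Each node is (item, left_subtree, right_subtree); None marks a missing child.
def leaf(item):
    return (item, None, None)


TREE = ("+", ("*", leaf("A"), leaf("C")), ("/", leaf("E"), leaf("G")))


def pre_order(node):
    if node is None:
        return ""
    item, left, right = node
    return item + pre_order(left) + pre_order(right)     # root, left, right


def in_order(node):
    if node is None:
        return ""
    item, left, right = node
    return in_order(left) + item + in_order(right)       # left, root, right


def post_order(node):
    if node is None:
        return ""
    item, left, right = node
    return post_order(left) + post_order(right) + item   # left, right, root


print(pre_order(TREE))   # +*AC/EG
print(in_order(TREE))    # A*C+E/G
print(post_order(TREE))  # AC*EG/+
```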
31
1
LeftPoint er
Index
RightPoint er
2 4 + 5
Item
3 6 7
4 0 z 0 0
5 6 0
Null pointer
6 0 y 0 0
7 3 0
Above is an abstraction of a tree to show the information that we have to represent and store. You will notice that each leaf has a left and right pointer with a value of 0. This is a null pointer since there is nothing to the left or right of these leaves.
Index   Left pointer   Item   Right pointer
1       2              *      3
2       4              +      5
3       6              -      7
4       0              z      0
5       0              6      0
6       0              y      0
7       0              3      0
Using this method of storing the tree, it is then possible to apply any of the aforementioned algorithms in order to traverse the tree.
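A minimal sketch of traversing the stored tree, assuming each row of the table reads index, left pointer, item, right pointer, with 0 as the null pointer. Three parallel Python lists stand in for the table columns; position 0 is unused so that indices match the table.

```python
# Parallel arrays indexed 1..7; entry 0 is a dummy so 0 can act as null.
LEFT = [0, 2, 4, 6, 0, 0, 0, 0]
ITEM = ["", "*", "+", "-", "z", "6", "y", "3"]
RIGHT = [0, 3, 5, 7, 0, 0, 0, 0]


def in_order(index):
    """In-order traversal following the stored pointers."""
    if index == 0:          # null pointer: nothing to traverse
        return ""
    return in_order(LEFT[index]) + ITEM[index] + in_order(RIGHT[index])


def post_order(index):
    """Post-order traversal following the stored pointers."""
    if index == 0:
        return ""
    return post_order(LEFT[index]) + post_order(RIGHT[index]) + ITEM[index]


print(in_order(1))    # z+6*y-3
print(post_order(1))  # z6+y3-*
```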
Linear search - this search method starts at the beginning of the list and compares each element in turn with the required value until a match is found or the end of the list is reached.
Bubble sort - during a pass through the list, neighbouring values are compared and swapped if they are not in the correct order. Several passes are made until one pass does not require any further swaps.
Bubble sort
Bubble sort was covered in the AS specification but it is important to remember exactly how it works and to recognise the complexity of this sorting algorithm. First, we should consider a list which has been sorted into reverse alphabetical order. This was actually a mistake and we would now like to take this sorted list and re-sort it into alphabetical order. This is an example of the worst case scenario for the bubble sort.
Pass # (Swaps)   List after the pass
Start            Zebra  Yak  Tiger  Snake  Sloth  Mouse  Lion  Cow  Bird  Ant
1 (9)            Yak  Tiger  Snake  Sloth  Mouse  Lion  Cow  Bird  Ant  Zebra
2 (8)            Tiger  Snake  Sloth  Mouse  Lion  Cow  Bird  Ant  Yak  Zebra
3 (7)            Snake  Sloth  Mouse  Lion  Cow  Bird  Ant  Tiger  Yak  Zebra
4 (6)            Sloth  Mouse  Lion  Cow  Bird  Ant  Snake  Tiger  Yak  Zebra
5 (5)            Mouse  Lion  Cow  Bird  Ant  Sloth  Snake  Tiger  Yak  Zebra
6 (4)            Lion  Cow  Bird  Ant  Mouse  Sloth  Snake  Tiger  Yak  Zebra
7 (3)            Cow  Bird  Ant  Lion  Mouse  Sloth  Snake  Tiger  Yak  Zebra
8 (2)            Bird  Ant  Cow  Lion  Mouse  Sloth  Snake  Tiger  Yak  Zebra
9 (1)            Ant  Bird  Cow  Lion  Mouse  Sloth  Snake  Tiger  Yak  Zebra
10 (0)           Ant  Bird  Cow  Lion  Mouse  Sloth  Snake  Tiger  Yak  Zebra
Here we have the bubble sort results for each pass of the algorithm. Since at the start of each pass the first item is higher in alphabetical value than all the other unsorted items, it takes part in every comparison of that pass until it reaches its final position. Since we can have up to n - 1 swaps in each of up to n - 1 passes (the last pass just confirms that there will be no more swaps), we can say that the bubble sort has a time complexity of O(n²).
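The worst case can be reproduced with a short Python sketch that counts the swaps made in each pass. The function name and return shape are illustrative choices, not part of the specification.

```python
def bubble_sort(items):
    """Return the sorted list and the number of swaps made in each pass."""
    items = list(items)            # work on a copy
    swaps_per_pass = []
    swapped = True
    while swapped:                 # stop after a pass with no swaps
        swapped = False
        swaps = 0
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swaps += 1
                swapped = True
        swaps_per_pass.append(swaps)
    return items, swaps_per_pass


animals = ["Zebra", "Yak", "Tiger", "Snake", "Sloth",
           "Mouse", "Lion", "Cow", "Bird", "Ant"]
result, swaps = bubble_sort(animals)
print(result)
print(swaps)  # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```

The swap counts sum to 45 = n(n - 1)/2 for n = 10, which is where the quadratic growth comes from.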
Searching
Linear search is the most straightforward of search algorithms. Given a list of length n, each item will be looked at from the start of the list until the desired item is either found, or the end of the list is reached. In the worst case scenario, we may have to look at all n items therefore the algorithm is of O(n). If we have a sorted list we can use the more efficient binary search. In binary search, we look at the middle term of the list and compare it with the item that we are looking for. We then reject one half of the list based on whether the item we are looking for would be higher or lower in the list. Below we are searching for Dave in the list.
The list is: 1 Ant, 2 Bird, 3 Cow, 4 Lion, 5 Mouse, 6 Sloth, 7 Snake, 8 Tiger, 9 Yak, 10 Zebra.
(10 + 1)/2 = 5.5, therefore we should look at the 6th item in the list. Sloth > Dave so we reject items 6-10, leaving 1 Ant, 2 Bird, 3 Cow, 4 Lion, 5 Mouse.
(1 + 5)/2 = 3, therefore we should look at the 3rd item. Cow < Dave so we reject items 1-3, leaving 4 Lion, 5 Mouse.
(4 + 5)/2 = 4.5, therefore we should look at the 5th item. Mouse > Dave so we reject item 5, leaving 4 Lion.
Only item 4 remains. However, Lion ≠ Dave, therefore Dave is not in the list.
This is an example of the worst case scenario for a binary search since the item is in fact not in the list. We have made 4 comparisons for a 10-item list. Because each comparison roughly halves the part of the list still to be searched, the order of complexity for the binary search is O(log₂ n).
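A minimal Python sketch of both searches, each returning the position found (or -1) together with the number of items examined. Rounding the midpoint up mirrors the hand-worked example (5.5 becomes the 6th item); that choice and the function names are assumptions for illustration.

```python
def linear_search(items, target):
    """Look at each item in turn; O(n) in the worst case."""
    for position, item in enumerate(items):
        if item == target:
            return position, position + 1
    return -1, len(items)


def binary_search(items, target):
    """Search a sorted list by repeatedly rejecting half of it."""
    low, high = 0, len(items) - 1
    comparisons = 0
    while low <= high:
        mid = (low + high + 1) // 2      # midpoint, rounded up
        comparisons += 1
        if items[mid] == target:
            return mid, comparisons
        if items[mid] < target:
            low = mid + 1                # reject the lower half
        else:
            high = mid - 1               # reject the upper half
    return -1, comparisons


animals = ["Ant", "Bird", "Cow", "Lion", "Mouse",
           "Sloth", "Snake", "Tiger", "Yak", "Zebra"]
print(binary_search(animals, "Dave"))  # (-1, 4)
print(linear_search(animals, "Dave"))  # (-1, 10)
```

Positions are zero-based here, so the item the notes call the 6th is at index 5.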
Insertion sort
For an insertion sort algorithm we divide a list into a sorted part and an unsorted part. We take the first item in the list that we want to sort and insert it into our sorted part. We then compare each item in the unsorted part to the items in the sorted part and insert them where they fit, rearranging the other items as appropriate.
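The description above can be sketched in Python as follows; this is one common in-place formulation, offered as an illustration rather than the only way to write it.

```python
def insertion_sort(items):
    """Sort by growing a sorted part at the front of the list."""
    items = list(items)                  # work on a copy
    for i in range(1, len(items)):       # items[:i] is the sorted part
        current = items[i]               # first item of the unsorted part
        j = i - 1
        # Shift larger sorted items one place right to make room.
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current           # insert where it fits
    return items


print(insertion_sort([54, 23, 15, 74, 19, 22]))  # [15, 19, 22, 23, 54, 74]
```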
Quicksort
In a quicksort we split a list into two sub-lists with a "pivot" acting as the value that all items within that sub-list are compared to. By continually splitting the list in two into smaller sub-lists and selecting the middle value as the pivot, we are able to sort an entire list.
54 23 15 74 19 22 14 3 11 64 27 35
Above is a list of 12 items. We should choose the 7th item of the list (14) as our pivot and compare each value either side with it. We want to sort this list into ascending numerical order so we should place all the numbers of lesser value to the left of the pivot and all the numbers of greater value to the right. Within each side, make sure to keep the values in their original order.
3 11 | 14 | 54 23 15 74 19 22 64 27 35
We then select new pivots in the two sub-lists that have been created by splitting about our initial pivot. The list after each subsequent stage is:
Stage 1:  3 11 14 15 19 54 23 74 22 64 27 35
Stage 2:  3 11 14 15 19 22 54 23 74 64 27 35
Stage 3:  3 11 14 15 19 22 54 23 27 35 64 74
Stage 4:  3 11 14 15 19 22 23 27 54 35 64 74
Stage 5:  3 11 14 15 19 22 23 27 35 54 64 74
Stage 6:  3 11 14 15 19 22 23 27 35 54 64 74
Only once all the items have themselves been pivots can we conclusively say that the list has been sorted into the correct order. The quicksort algorithm is an example of recursive programming at its best. However, its worst case scenario is a complexity of O(n²). This makes it seem as if quicksort is only as efficient as bubble sort. Nevertheless, quicksort has a much better average case complexity of O(n log n).
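A minimal recursive sketch in Python, assuming the pivot is the middle item (index len // 2, which picks the 7th of 12 items) and that items keep their original order within each sub-list. It builds new lists rather than sorting in place, which is a simplification for readability.

```python
def partition(values):
    """Split values into (smaller, equal, larger) around the middle pivot."""
    pivot = values[len(values) // 2]
    smaller = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    larger = [v for v in values if v > pivot]
    return smaller, equal, larger


def quicksort(values):
    """Sort by partitioning about a pivot and sorting each side recursively."""
    if len(values) <= 1:                 # base case: nothing left to split
        return list(values)
    smaller, equal, larger = partition(values)
    return quicksort(smaller) + equal + quicksort(larger)


numbers = [54, 23, 15, 74, 19, 22, 14, 3, 11, 64, 27, 35]
print(partition(numbers))  # ([3, 11], [14], [54, 23, 15, 74, 19, 22, 64, 27, 35])
print(quicksort(numbers))
```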
3.2.9 Simulations
Model - an abstraction of an entity in a real world or in the problem that enables an automated solution. The abstraction is a representation of the problem that leaves out unnecessary detail.
State history - consists of state descriptions at each of a chronological succession of instants.
Entities - the components that make up a system.
Attributes - a property of an object, e.g. an object car has attributes make,
Simulation is the imitation of a process of a real system. An example of the purpose of a simulation comes in the form of trying to understand the effect of a new supermarket being built, which means a reconfiguration of the current road. Although many of the proposed designs will never be realised, simulating them shows the effects they would have if they were carried out.
Consider a queue. A queue consists of three entities:
Customer
Queue
Server
The customer could be waiting in the queue -- this is therefore one of its attributes. The state of a system at any instant is determined by where the entities are, what they are doing and their attributes. A succession of recorded new identifiable states results in a state history. Some possible states of a queue are:
nobody in the queue, server waiting
nobody in the queue, a customer being served
customers in the queue, a customer being served.
Some possible events in this system are:
a customer arriving
end of serving.
Some possible activities are:
the serving of a customer
the time between customers arriving.
If we consider a queue system where a customer arrives at the end of every 3 minutes and serving takes 4 minutes, running a simulation for 20 minutes using time-driven simulation will give the following hand simulation.
Master clock   Customer being served   Customers in queue   Server status
1              -                       0                    Idle
2              -                       0                    Idle
3              -                       0                    Idle
4              Customer1               0                    Serving
5              Customer1               0                    Serving
6              Customer1               1                    Serving
7              Customer1               1                    Serving
8              Customer2               0                    Serving
9              Customer2               1                    Serving
10             Customer2               1                    Serving
11             Customer2               1                    Serving
12             Customer3               1                    Serving
13             Customer3               1                    Serving
14             Customer3               1                    Serving
15             Customer3               2                    Serving
16             Customer4               1                    Serving
17             Customer4               1                    Serving
18             Customer4               2                    Serving
19             Customer4               2                    Serving
20             Customer5               1                    Serving
Each state represents the end of a minute. Since a customer only arrives at the end of every three minutes, the first three ticks of the master clock see no activity besides the server being at an idle state.
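A time-driven simulation of this queue can be sketched in Python as below. Two assumptions are made so that the output lines up with the hand simulation: each customer occupies the server for 4 ticks (as the table shows), and a customer who arrives to find the server idle goes straight to the server and starts being served in the next minute. All names are illustrative.

```python
def simulate_queue(total_minutes=20, arrival_interval=3, service_time=4):
    """Return a state history: (minute, customer served, queue length, status)."""
    waiting = []          # customers in the queue
    at_counter = None     # customer who arrived while the server was idle
    serving = None        # customer currently being served
    minutes_left = 0
    next_id = 1
    history = []
    for minute in range(1, total_minutes + 1):
        # Start of the minute: an idle server takes the next customer.
        if serving is None:
            if at_counter is not None:
                serving, at_counter = at_counter, None
            elif waiting:
                serving = waiting.pop(0)
            if serving is not None:
                minutes_left = service_time
        served_now = serving
        if serving is not None:
            minutes_left -= 1
            if minutes_left == 0:        # end-of-serving event
                serving = None
        # End of the minute: arrival event every arrival_interval minutes.
        if minute % arrival_interval == 0:
            customer = f"Customer{next_id}"
            next_id += 1
            if served_now is None and not waiting:
                at_counter = customer
            else:
                waiting.append(customer)
        status = "Serving" if served_now else "Idle"
        history.append((minute, served_now, len(waiting), status))
    return history


for state in simulate_queue():
    print(state)
```

Because arrivals (every 3 minutes) outpace service (4 minutes each), the queue length drifts upwards the longer the simulation runs.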
Real number - a number with a fractional part.
Significant digits - those digits that carry meaning contributing to the accuracy of a number. This includes all digits except leading and trailing zeroes where they serve merely as placeholders to indicate the scale of the number.
Floating-point notation - a real number represented by a sign, some significant digits, and a power of 2.
Precision - the maximum number of significant digits that can be represented.
Absolute error - the difference between the actual number and the nearest representable number.
Relative error - the absolute error divided by the actual number.
Underflow - the value is too small to be represented using the available number of bits.
Overflow - the value is too large to be represented using the available number of bits.
In the AS specification, we dealt only with fixed point numbers. This meant that there was a fixed way of representing a real number, with the binary point after a certain number of bits. In floating-point notation, real numbers are represented in the following way: a sign, some significant digits expressed as a number with a fractional part, and an integer power of two. Some examples are: 4.6 x 2^6, -3.12 x 2^5, 6.2 x 2^-3. This gives us a general form of m x 2^e where the significant digits are called the mantissa (m) and the power of 2 is called the exponent (e). In the exam, you will have to convert from a normalised two's complement floating-point number to its denary real number equivalent and vice versa. In the following format we have a number of size 16 bits with the 10 most significant bits reserved for the mantissa and 6 bits reserved for the exponent.
0 1 0 0 0 0 0 0 0 0   1 0 0 0 0 0
This represents the smallest positive normalised value. It is normalised since the first two bits are of opposite polarity, i.e. 01 or 10. The mantissa is 0.1 in binary (2^-1) and the exponent value is -32, so the number represented is 2^-1 x 2^-32 = 2^-33.
Worked example
Using the above format of normalised floating-point representation, convert -23.375 into binary.
First, assess the number. We need -32, which is equal to -2^5, since 32 is the nearest larger power of 2 to 23. Then 32 - 23.375 = 8.625, so we can break -23.375 down to -32 + 8 + 1/2 + 1/8 = -2^5 + 2^3 + 2^-1 + 2^-3 = 101000.101 in two's complement binary. Padding to 10 bits gives the mantissa:
1010001010
Now that we've filled in our mantissa, we need to calculate the exponent. The implied binary point of a normalised mantissa sits after the first bit (1.010001010), which is 5 places to the left of our desired binary point (101000.1010), therefore our exponent should be 5 = 4 + 1 = 000101 in binary.
1010001010 000101
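Conversions in this format can be checked with a short Python sketch. It assumes the layout described above: a 10-bit two's complement mantissa with the binary point after the first bit, followed by a 6-bit two's complement exponent. The function names are illustrative.

```python
def twos_complement(bits):
    """Interpret a string of 0s and 1s as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":                 # leading 1 means a negative number
        value -= 1 << len(bits)
    return value


def decode(bits, mantissa_bits=10):
    """Convert a floating-point bit pattern to its denary value."""
    mantissa_part = bits[:mantissa_bits]
    exponent_part = bits[mantissa_bits:]
    # Binary point after the first bit: divide by 2^(mantissa_bits - 1).
    mantissa = twos_complement(mantissa_part) / (1 << (mantissa_bits - 1))
    exponent = twos_complement(exponent_part)
    return mantissa * 2.0 ** exponent


print(decode("0100000000" + "100000") == 2 ** -33)  # True
print(decode("1010001010" + "000101"))              # -23.375
```

The second call decodes a mantissa of 1010001010 (-0.73046875) with an exponent of 5, since -0.73046875 x 32 = -23.375.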
Why normalise?
Maximises precision and accuracy for a given number of bits. Creates a unique representation for every number (allowing equality to be checked more simply). The length of the mantissa increases the precision. The length of the exponent increases the range.
System program - a program that manages the operation of a computer. Operating system - the software that supports a computer's basic functions, such as scheduling tasks, executing applications, and controlling peripherals. Virtual machine - the apparent machine that the operating system presents to the user, achieved by hiding the complexities of the hardware behind layers of operating system software. Application programming interface - a layer of software that allows application programs to call on the services of the operating system.
Computer software can be divided into system programs, which manage the operation of the computer, and application programs, which solve problems for their users. The most fundamental of all the system programs is the operating system. An operating system has these roles:
Hide the complexities of the hardware from the user.
Manage the hardware resources to give orderly and controlled allocation of the processors, memories and input/output (I/O) devices among the various programs competing for them, and manage data and storage.
Managing resources
In a general-purpose computer, one purpose of an operating system is to manage the hardware so that a satisfactory performance is achieved. The operating system programs may be classified according to the resources they manage.
Key resource           OS program
Processors             Processor scheduling
Storage                Memory management
Input/output devices   I/O management
Data                   File management
The following are examples of the functions that an operating system has to be able to perform:
Allocating a processor 'time slot' for each programming task that is running.
Managing the priorities for each program task that is running.
Allocating and keeping track of the memory used for storing programs and data.
Managing the transfer of data between memory and storage.
Handling input operations from the user and from other input devices.
Handling output operations.
Managing the system security.
Virtual machine
An operating system hides from the user all the details of how the hardware works so that the user is presented with a virtual machine which is easier to use. These details are progressively hidden by placing layers of software on top of the hardware.
[Figure: the layers of the virtual machine, labelled User interface, Application programming interface, I/O management, File management, Memory management, Processor management, Device drivers, Kernel and Hardware, with an arrow marked "More complex layers"]
Application programming interface
A standard application programming interface (API) allows a software developer to write an application on one computer and have a high degree of confidence that it will run on another computer of the same type, even if the other computer has a different specification.
User interface
Command line interface
In a command line interface (CLI), a user responds to a prompt to enter commands by typing a single command word, followed by zero or more parameters on a single line, before pressing the enter key. An example of such a command is ipconfig.
Graphical user interface
A graphical user interface (GUI) is made up of windows. One window has the focus at any moment. GUIs are event-driven, with events being mouse button clicks, key presses or mouse movements. The operating system detects an event and correlates it with the current mouse position and the window currently in focus, in order to select an action to carry out.
Note: The notes on this section contain a lot of detail; mainly familiarise yourself with the key terms.
Interactive operating system - an operating system in which the user and the computer are in direct two-way communication.
Real-time operating system - inputs are processed in a timely manner so that the output can affect the source of inputs.
Network operating system - a layer of software is added to the operating system of a computer connected to the network. This layer intercepts commands that reference resources elsewhere on the network, e.g. a file server, then redirects them in a manner completely transparent to the user.
Sandbox - a tightly controlled set of resources for guest programs to run in.
Embedded computer system - a dedicated computer system with limited or non-existent user interface and designed to operate largely or completely autonomously from within other machinery.
Desktop operating system - an operating system that allows a user to carry out a broad range of general-purpose tasks.
Client-server system - a system in which some computers, the clients, request services provided by other computers, the servers.
Server operating system - an operating system optimised to provide one or more specialised services to networked clients.
they can be updated easily. If they are found vulnerable to a security threat, then an update can be deployed to counter the threat.
They can support sophisticated GUIs.
  o However, their memory footprint is very large and load times can be significant.
They provide a virtual machine which allows the user to perform tasks more easily than if they had to interact directly with the hardware.
Desktop operating systems often act as the client operating system in a client-server system.
  o In a client-server system some computers, the clients, request services provided by other computers, the servers.
Device
The way a device works can usually be changed by altering the code of the operating system.
  o This means that the physical circuits of the device need not be rewired when new functionality is required; instead, the code of the operating system can be rewritten.
The OS code should be layered or modular with clear interfaces between these layers. This makes changing the interface as simple as altering the interface layer or module.
Not all computers have operating systems.
  o A washing machine is an example of a device that uses such a computer. The input of this computer is simple (since all settings are preset), the process to be performed is equally simple, and it is not necessary for the computer to complete more than one task at a time. In this case including an operating system would add complexity where none is required, which would increase the development and manufacturing costs. The computer in this case would run a single firmware program all the time.
Computer-operated devices which have to carry out more than one task benefit from these things that an operating system allows:
  o The device can multi-task.
  o The device can operate in real time with critical timing constraints observed, if required.
  o The hardware can be changed or upgraded without the need to change application code that runs on the hardware.
  o New applications can be added fairly easily.
  o Changes to basic functionality can be achieved by upgrading operating system code that runs on the hardware.
  o Applications can be developed in situ on the device or can be easily installed if developed on a more powerful machine.
  o The entire OS can be replaced by a different OS where the new OS allows a much greater range of software changes to be made.
  o Open Source operating systems can be used. The source code for an Open Source OS is available, therefore applications that will work on devices running this OS can be designed easily.
Operating systems for mobile devices need to consider the kind of resources available to these devices. For example,
  o the amount of energy provided
  o the amount of memory available
Smartphones
A smartphone is a mobile phone that offers advanced capabilities beyond a typical mobile phone, often with PC-like functionality. This means running complete operating system software, which provides a standardised interface and platform for application developers.
Regular mobile phones typically only support sandboxed applications.
  o A sandbox is a tightly controlled set of resources for guest programs to run in, such as scratch space on disk and memory.
  o Network access and the ability to inspect the host system or read from input devices are usually disallowed or heavily restricted in sandboxed systems.
Applications for smartphones may be developed by anyone (including the manufacturer of the device, the network operator or any other third-party software developer) since the operating system is open.
Personal digital assistants
A personal digital assistant (PDA) is a hand-held portable computer that can accomplish quite specific tasks and can take on the role of a personal assistant. PDA functionality has recently been included in smartphones, therefore the sales of PDAs have declined. The operating system of a PDA takes on the tasks of a basic input/output system (BIOS) and has to be designed to run on processors with low clock frequency and a main memory of limited capacity. The OS must use various techniques to save energy and must cater for short reaction times.
Examples:
Airline reservation system: up to 1000 messages per second can arrive from any one of 11000 to 12000 terminals situated all over the world. The response time must be less than 3 seconds.
Process control system: up to 1000 signals per second can arrive from sensors attached to the system being controlled. The response time must be less than 0.001 second.
Real-time operating systems that are used to control machinery typically have limited user-interface capability and no end-user utilities. They must perform a task quickly whenever signalled to do so in a specific amount of time. Some RTOS manage the resources of the computer so that a particular operation executes in precisely the same amount of time every time it occurs. In a complex machine, this can be catastrophic and unnecessary.
a server are available to the client computer, exactly as if they resided on that client computer's system. Remote drives (perhaps N: and P:) are usually available to all client computers connected to a network.
Exam note: Operating systems are a good basis for an extended answer question. Consider the differences between certain types of operating systems. A question on the definition of an operating system could equally be a short (2 mark) answer question or a longer (4-6 mark) answer question.
3.5 DATABASES
3.5.1 Conceptual data modelling
Database - a structured collection of data.
Database management system - a software system that enables the definition, creation and maintenance of a database and which provides controlled access to this database.
Data model - a method of describing the data, its structure, the way it is interrelated and the constraints that apply to it for a given system or organisation.
Conceptual model - a representation of the data requirements of an organisation constructed in a way that is independent of any software used to construct the database.
Entity - an object, person, event or thing of interest to an organisation and about which data is recorded.
Relationship - an association or link between two entities.
Degree of relationship between two entities - the number of entity
Relationships between entities can be represented using Entity-Relationship diagrams. See the example on the next page. Here is a scenario which involves a college enrolling students for AS and A2 courses:
Each course is assigned a unique course code and has a course name.
Each student is assigned a unique student ID and has their name, address and date of birth recorded.
Each student enrols on one or more courses.
The students enrolled on a course will be assigned to one of several sets taught by different teachers.
Teachers are assigned unique initials.
If we want to model the three entities of "Set", "Student" and "Course", we get the diagram below.
[Figure: entity-relationship diagram linking the entities Set, Student and Course through the relationships "is assigned to", "belongs to" and "is enrolled on", with the many-to-many relationship labelled]
A many-to-many relationship is not clear; most of the time such a relationship can be made a lot clearer by adding a link table. We can analyse the many-to-many relationship between the "Student" and "Course" entities and decide to add an "Enrolment" entity.
[Figure: the same entity-relationship diagram with an Enrolment entity added between Student and Course; a Student "makes" an Enrolment and each Enrolment is "for" a Course]
One-to-one: Husband and Wife. One husband has one wife (conventionally).
One-to-many: Area and Resident. One area has many residents.
Many-to-many: Newspaper and Reader. One reader reads many newspapers. One newspaper has many readers.
3.5.2 Database design
Relation - a set of attributes and tuples, modelling an entity (a table). Attribute - a property or characteristic of an entity (a named column in a table). Tuple - a set of attribute variables (a row in a table). Primary key - an attribute which uniquely identifies a tuple. Relational database - a collection of tables. Composite key - a combination of attributes that uniquely identify a tuple. Foreign key - an attribute in one table that is a primary key in another table. Referential integrity - if a value appears in a foreign key in one table, it must also appear in the primary key in another table. Normalised entities - a set of entities that contain no redundant data. Normalisation - a technique used to produce a set of normalised entities.
A relation consists of a heading and a body. A heading is a set of attributes. A body is a set of tuples. A relation must have an identifier, an attribute that uniquely identifies a tuple.
Normalisation
The important thing about database design is that the correct attributes are grouped into the correct tables in order to minimise duplication of data. If there is no duplicated data, the possibility of inconsistencies is eliminated.
OnlineOrder(OrderNumber, CustomerID, DeliveryAddress, EmailAddress, OrderDate, ItemCode, Description, OrderQuantity, UnitPrice)

Order Number   Customer ID   Delivery Address                     Email Address         Order Date   Item Code   Description   Order Quantity   Unit Price
012367         BLF1          Fred Bloggs 1, High Street Anytown   FredBloggs@NT.co.uk   01/05/09     1234        Ring binder   3                1.50
                                                                                                     8967        Divider       4                0.50
                                                                                                     3456        Stapler       1                2.99
034231         SMJ2          Joe Smith 7, The Lane Anytown        JoeSmith@NT.co.uk     03/05/09     9684        Scissors      2                1.99
                                                                                                     3456        Stapler       4                2.99
1NF: atomic data test
Given a table that has a primary key, it is in first normal form (1NF) if all of the data values are atomic values. That is, the table does not contain repeating groups of attributes. To put a table into first normal form, we move any repeating attributes, with a copy of the primary key, to a separate table.

OnlineOrder(OrderNumber, CustomerID, DeliveryAddress, EmailAddress, OrderDate)

Order Number   Customer ID   Delivery Address                     Email Address         Order Date
012367         BLF1          Fred Bloggs 1, High Street Anytown   FredBloggs@NT.co.uk   01/05/09
034231         SMJ2          Joe Smith 7, The Lane Anytown        JoeSmith@NT.co.uk     03/05/09
ItemOrder(OrderNumber, ItemCode, Description, OrderQuantity, UnitPrice)

Order Number   Item Code   Description   Order Quantity   Unit Price
012367         1234        Ring binder   3                1.50
012367         8967        Divider       4                0.50
012367         3456        Stapler       1                2.99
034231         9684        Scissors      2                1.99
034231         3456        Stapler       4                2.99
OrderNumber alone is not sufficient as a primary key for ItemOrder since it does not act as a unique identifier for a tuple. We therefore have to create a composite key that also includes ItemCode.
2NF: partial key dependence test
A table is in second normal form (2NF) if it is in first normal form and contains no partial key dependencies. This means that if some parts of the row depend only on one half of the composite key, we should move these parts into a new table with the relevant primary key.

ItemOrder(OrderNumber, ItemCode, OrderQuantity)

Order Number   Item Code   Order Quantity
012367         1234        3
012367         8967        4
012367         3456        1
034231         9684        2
034231         3456        4
3NF: non-key dependence test
A table is in third normal form (3NF) if it is in second normal form and contains no non-key dependencies.
CustomerID is a foreign key.

Customer ID   Delivery Address                     Email Address
BLF1          Fred Bloggs 1, High Street Anytown   FredBloggs@NT.co.uk
SMJ2          Joe Smith 7, The Lane Anytown        JoeSmith@NT.co.uk
By splitting the original OnlineOrder table into two tables containing only the required information, we have successfully created a fully normalised database. A database is fully normalised if every attribute is a fact about the key, the whole key, and nothing but the key (so help me Codd).
PRIMARY
Note that the end of every line, excluding the final line, is followed by a comma. You must specify which field is the primary key and which type each field is. There are many types which would be appropriate.

Field type        Purpose
INTEGER or INT    Stores integer types.
TEXT or VARCHAR   Stores text strings.
FLOAT or REAL     Stores real number types.
DATE              Stores dates.
BOOLEAN           Stores Boolean values (true or false).

You can specify the length or precision of data types using brackets e.g. Description VARCHAR(200). For a table with a composite key, we can use the following syntax:

CREATE TABLE ItemOrder (
    OrderNumber INTEGER(6),
    ItemCode INTEGER(4),
    OrderQuantity INTEGER,
    PRIMARY KEY(OrderNumber, ItemCode)
)
DML
After having created a database and tables, DML commands can be used to query and manipulate the tables. For the exam you will need to know how to select from, insert into, update and delete values in a database.
Querying a database

SELECT <fieldnames>
FROM <tables>
WHERE <conditions>
ORDER BY <fieldnames>

e.g.

SELECT Description, UnitPrice, OrderQuantity
FROM Item, ItemOrder, OnlineOrder
WHERE Item.ItemCode = ItemOrder.ItemCode
AND ItemOrder.OrderNumber = OnlineOrder.OrderNumber
AND CustomerID = "BLF1"
ORDER BY Item.ItemCode ASC

Use ASC for ascending order and DESC for descending order. Our example code will provide us with the following table:

Description   Unit Price   Order Quantity
Ring binder   1.50         3
Stapler       2.99         1
Divider       0.50         4
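One way to experiment with such queries is Python's built-in sqlite3 module, sketched below. The Item table definition (ItemCode, Description, UnitPrice) is an assumption inferred from the query; the data comes from the order tables above. Order numbers are written without their leading zero because Python does not allow integer literals such as 012367, and single quotes are used for strings since that is what SQLite expects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.executescript("""
CREATE TABLE OnlineOrder (OrderNumber INTEGER PRIMARY KEY,
                          CustomerID TEXT, OrderDate TEXT);
CREATE TABLE Item (ItemCode INTEGER PRIMARY KEY,      -- assumed table
                   Description TEXT, UnitPrice REAL);
CREATE TABLE ItemOrder (OrderNumber INTEGER, ItemCode INTEGER,
                        OrderQuantity INTEGER,
                        PRIMARY KEY (OrderNumber, ItemCode));
""")
conn.executemany("INSERT INTO OnlineOrder VALUES (?, ?, ?)",
                 [(12367, "BLF1", "01/05/09"), (34231, "SMJ2", "03/05/09")])
conn.executemany("INSERT INTO Item VALUES (?, ?, ?)",
                 [(1234, "Ring binder", 1.50), (8967, "Divider", 0.50),
                  (3456, "Stapler", 2.99), (9684, "Scissors", 1.99)])
conn.executemany("INSERT INTO ItemOrder VALUES (?, ?, ?)",
                 [(12367, 1234, 3), (12367, 8967, 4), (12367, 3456, 1),
                  (34231, 9684, 2), (34231, 3456, 4)])

rows = conn.execute("""
SELECT Description, UnitPrice, OrderQuantity
FROM Item, ItemOrder, OnlineOrder
WHERE Item.ItemCode = ItemOrder.ItemCode
  AND ItemOrder.OrderNumber = OnlineOrder.OrderNumber
  AND CustomerID = 'BLF1'
ORDER BY Item.ItemCode ASC
""").fetchall()
print(rows)  # [('Ring binder', 1.5, 3), ('Stapler', 2.99, 1), ('Divider', 0.5, 4)]
```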
Syntax points:
No symbols are required for numbers.
Quotation marks are required for strings - either double or single quotes.
Single or double quotes or hashes are used for dates.

Inserting, updating and deleting data

1. INSERT INTO <tablename> VALUES ( <listofvalues> )
   INSERT INTO OnlineOrder VALUES (034931, "SMJ2", "13/05/09")
2. UPDATE <tablename> SET <newvalues> WHERE <conditions>
   UPDATE Customer SET EmailAddress = "FredBloggs@RealComputing.com" WHERE CustomerID = "BLF1"
3. DELETE FROM <tablename> WHERE <conditions>
   DELETE FROM ItemOrder WHERE OrderNumber = 012367 AND ItemCode = 8967

1. Inserts a new order into the online order table.
2. Changes customer BLF1's email address to "FredBloggs@RealComputing.com".
3. Deletes an item from order number 012367.
Exam note: Writing SQL is an inevitable part of your exam so know it well. Experiment with setting up your own databases and selecting fields from different tables in one query. Instead of using a WHERE clause to link tables, you may want to use an inner join. This will accomplish the same task but is not necessary for you to get the marks. Stick with what you know well.
Data transmission - movement of data from one place to another. Serial data transmission - single bits are sent one after another along a single wire. Parallel data transmission - bits are sent down several wires simultaneously.
Data transmission
Data transmission occurs between a transmitter and receiver over some transmission medium (often called a communication channel) which can either be guided or unguided. The data to be transmitted is encoded as electromagnetic signals. Guided communication channels are physical cables, e.g. twisted pairs, coaxial cables and optic fibres. Unguided communication channels are media such as air, a vacuum and sea water, through which signals travel as radio waves. In both guided and unguided transmission media, the transmitted signal will decrease in strength with distance.
[Figure: parallel connection between a computer and a printer, showing the ready/busy, strobe, data and return wires.]
Parallel data transmission was used for printers before the arrival of USB connections. When the printer was ready to receive information it would set the ready/busy wire to "ready". The computer would then send a signal down the strobe wire to alert the printer that data was going to be sent. The ready/busy wire would be simultaneously set to "busy" and data would be sent down the 8 data wires. Parallel data transmission is used over short distances because if the resistance in the wires differs, signals can arrive at different times, leading them to be read incorrectly; a problem called skew. In addition, it would be expensive to have 8 data wires stretched over a long distance. For this reason, parallel data transmission is restricted to computer-to-printer connections and computer buses.
Baud rate - the rate at which signals on a wire or line may change. 1 baud - one signal change per second. Bit rate - the number of bits transmitted per second. Bandwidth - the range of signal frequencies that a transmission medium may transmit. Latency - the time delay between the moment something is initiated and the moment its first effect begins.
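Decimal number | Binary number
0 | 00
1 | 01
2 | 10
3 | 11
[Figure: a signal using four voltage levels (V), where each level represents one of the two-bit values 00, 01, 10 and 11.]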
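The link between baud rate and bit rate can be sketched in a few lines of Python. This is an illustration under the assumption that the number of voltage levels is a power of two, so that each signal carries a whole number of bits; the function names are made up for the example.

```python
def bits_per_signal(levels):
    """Bits carried by one signal when `levels` distinct voltages are used.

    Assumes `levels` is a power of two (2, 4, 8, ...).
    """
    return (levels - 1).bit_length()


def bit_rate(baud_rate, levels):
    """Bit rate = signal changes per second x bits carried by each signal."""
    return baud_rate * bits_per_signal(levels)


def encode(bits, levels=4):
    """Split a bit string into groups and give the level number for each group."""
    size = bits_per_signal(levels)
    return [int(bits[i:i + size], 2) for i in range(0, len(bits), size)]
```

With four voltage levels each signal carries two bits, so a 1200 baud line gives bit_rate(1200, 4) = 2400 bits per second, and encode("1001") gives levels 2 and 1.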
Bandwidth
It is important to note that the range of signal frequencies which constitute a medium's bandwidth must not undergo a significant reduction in strength from one end of the wire to the other. Bandwidth is measured in hertz (Hz), a unit of frequency equal to the number of cycles per second. There is a direct relationship between bit rate and bandwidth: the greater the bandwidth, the higher the bit rate that can be transmitted. If the data rate of the digital signal is W bits per second then a very good representation can be achieved with a bandwidth of 2W Hz.
Asynchronous serial data transmission - the arrival of data cannot be predicted by the receiver; so a start bit is used to signal the arrival of data and to synchronise the transmitter and receiver temporarily. Communication protocol - a set of pre-agreed signals, codes and rules to be used for data and information exchange between computers, or a computer and a peripheral device such as a printer, that ensure that the communication is successful. Handshaking protocol - the sending and receiving devices exchange signals to establish that the receiving device is connected and ready to receive. Then the sending device coordinates the sending of the data, informing the receiver that it is sending. Finally, the receiver indicates it has received the data and is ready to receive again.
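[Figure: asynchronous serial transmission from Computer A to Computer B, showing the idle state, the start bit and the data bits from the least significant bit (LSB) to the most significant bit (MSB).]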
An example of the necessity of asynchronous data transmission is typing on a keyboard. Keys are not pressed at precise and regular intervals, so the receiving system does not know when to expect transmissions from the transmitting system. In order to synchronise the two systems, each binary word sent from the transmitting system must begin with a start bit and end with a stop bit. These two bits must be of opposite polarity so that the receiver can recognise when the next word is being sent. Between the start and stop bits the two systems will be in sync, so the receiving system will be able to read each bit. Errors can be detected by adding a parity bit to the binary word. If even parity is agreed on then there must be an even number of 1s in the word, including the parity bit. Similarly, if odd parity is agreed on then there must be an odd number of 1s. The receiver relies on parity to detect errors so that replacement bytes can be requested.
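The framing described above can be sketched in Python. The details are assumptions chosen for illustration and are not taken from the notes: a start bit of 0, a stop bit of 1, eight data bits sent least significant bit first, and the parity bit placed after the data bits.

```python
def parity_bit(data_bits, even=True):
    """Return the parity bit that makes the count of 1s even (or odd)."""
    odd_count = sum(data_bits) % 2  # 1 when the data holds an odd number of 1s
    return odd_count if even else 1 - odd_count


def frame_byte(value, even=True):
    """Wrap one byte as: start bit, 8 data bits (LSB first), parity bit, stop bit."""
    data = [(value >> i) & 1 for i in range(8)]
    return [0] + data + [parity_bit(data, even)] + [1]


def check_frame(frame, even=True):
    """Return True if the start bit, stop bit and parity of a frame are valid."""
    start, data, parity, stop = frame[0], frame[1:9], frame[9], frame[10]
    return start == 0 and stop == 1 and parity == parity_bit(data, even)
```

Flipping any single data bit in a frame makes check_frame return False. Flipping two bits goes unnoticed, which is the known limitation of a single parity bit.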
Handshaking
The table below summarises the handshaking protocol which is used by COM1 serial ports.
Message | Signal
 | Clear to send (P pin 8)
 | Request to send (C pin 7)
Here it is | Start bit
Busy | Clear to send (P pin 8)
That's it | Stop bit
I'm ready again | Clear to send (P pin 8)
The example uses C to represent a computer and P to represent a printer.
1. The computer checks the voltage on its request to send (RTS) pin when it wishes to send.
2. If the printer is ready to receive, the computer starts sending.
3. When the printer receives the start bit, it sets its clear to send (CTS) pin to busy.
4. When the printer receives the stop bit, it sets its CTS pin to ready. This signals to the computer that it can send the next byte.
Baseband system - a system that uses a single data channel in which the whole bandwidth of the transmission medium is dedicated to one data channel at a time. Broadband system - a multiple data channel system in which the bandwidth of the transmission medium carries several data streams at the same time.
A LAN can operate in baseband mode. In this instance, the whole bandwidth of the transmission medium is dedicated temporarily to one sending station and one receiving station (one data channel). This means that data channels must take it in turns to use the bandwidth. Baseband systems are used over short distances such as LANs, where they offer high performance at low cost. WANs use broadband media and operate in broadband mode so that two or more data streams may be carried at the same time. Several data channels are combined onto a carrier signal so that the bandwidth of the transmission medium can be shared by several data channels. Since long-distance communication media are expensive to install, it would be wasteful to let only one data channel use the medium at a time. For this reason, broadband systems are used for long-distance communication.
3.6.5 Networks
Local area network - linked computers in close proximity. Stand-alone computer - a computer that is not networked. It requires its own software and peripherals since it does not share any with other computers. Wide area network - a set of links that connect geographically remote computers and local area networks.
Network adapter
In order for computers to be connected together as a LAN, each computer will require a network adapter or network interface card. The network card converts computer data into a form that can be transmitted over the network. Data that is received needs to be converted into a form that can be understood by the receiving computer. A network adapter receives data to be transmitted from the motherboard of a computer into an area of memory called a buffer. A checksum value is calculated for the block of data and address information (source and destination) is added. The block is now known as a frame. The network adapter transmits the frame one bit at a time onto the network cable.
[Figure: structure of a frame, made up of address information, data and a checksum.]
The frame is transmitted serially, bit by bit, onto the network cable.
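A rough Python sketch of how a frame might be assembled and then sent bit by bit follows. The field order, the 6-byte addresses and the simple sum-of-bytes checksum are assumptions made for illustration; real Ethernet adapters use a CRC rather than a simple sum.

```python
ADDRESS_LENGTH = 6  # assumed address size in bytes, for illustration only


def checksum(data):
    """Illustrative checksum: the sum of the data bytes modulo 256."""
    return sum(data) % 256


def build_frame(source, destination, data):
    """Add address information and a checksum to a block of data."""
    return destination + source + data + bytes([checksum(data)])


def frame_to_bits(frame):
    """Serialise the frame: one bit at a time, most significant bit first."""
    return [(byte >> i) & 1 for byte in frame for i in range(7, -1, -1)]


def verify_frame(frame):
    """Recalculate the checksum at the receiver and compare it with the one sent."""
    data = frame[2 * ADDRESS_LENGTH:-1]
    return checksum(data) == frame[-1]
```

The receiving adapter repeats the checksum calculation; a mismatch means the data was damaged in transit.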
Topology - the shape, layout, configuration or structure of the connections that connect devices to a network.
[Figure: bus and star topology diagrams; the bus diagram shows the data flow along the backbone.]
Bus topology
Bus topology is the most basic networking topology and the basis for modern Ethernet LANs. All devices share a common cable for connection known as the backbone. When a device wishes to share information with another device on the network it simply sends a broadcast message onto the backbone cable. This message can be seen by all other devices on the network but will only be received and processed by the intended recipient.
Advantages:
The amount of cable needed is minimal.
It is easy to add and remove nodes without affecting the network as a whole.
Disadvantages:
Performance degrades as traffic increases and as the cable length increases. (Signal boosters can be used to combat deterioration over long distances.)
A single fault in the backbone cable will cause the whole network to fail. If the cable is long then the fault will take a long time to isolate and repair.
If two computers try to send data at the same time the signals will collide and the bus will become unusable for the duration of the transmissions of both computers. To reduce this duration, each device is limited to one frame of pulses per transmission. A minimum number of pulses per frame is enforced so that it is possible to detect rises in pulse voltages caused by collisions. A computer can therefore send any number of pulses in a single frame between the minimum and maximum.
If each connected computer follows a protocol when transmitting, it is possible to operate the bus system correctly even when collisions do occur. A commonly used bus protocol is Carrier Sense Multiple Access with Collision Detection (CSMA/CD). The rules for CSMA/CD are:
1. If the bus is quiet, transmit a frame.
2. If the bus is busy, continue to listen until the bus is idle then transmit immediately.
3. While transmitting, monitor the bus for a collision. If a collision is detected, transmit a brief jamming signal to let all computers know that there has been a collision then stop transmitting.
4. After transmitting the jamming signal, wait a random amount of time, then attempt to transmit again, starting from step 1.
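The CSMA/CD rules can be illustrated with a small, heavily simplified simulation in Python. It assumes that time is divided into slots, that one frame fits exactly in one slot (so the bus is always idle at the start of a slot), and that the random wait is a whole number of slots. None of these details come from the notes, and the function and parameter names are hypothetical.

```python
import random


def simulate_csma_cd(stations, seed=0, max_wait=8, max_slots=10_000):
    """Simulate stations that each have one frame to send on a shared bus."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    next_attempt = {name: 0 for name in stations}  # slot of each next attempt
    delivered = []
    collisions = 0
    for slot in range(max_slots):
        if not next_attempt:
            break  # every frame has been delivered
        # Stations whose wait has expired find the bus quiet and transmit.
        transmitting = [n for n, t in next_attempt.items() if t <= slot]
        if len(transmitting) == 1:
            name = transmitting[0]
            delivered.append(name)  # no collision: the frame gets through
            del next_attempt[name]
        elif len(transmitting) > 1:
            collisions += 1  # collision detected, jamming signal sent
            for name in transmitting:
                # Each station waits a random time before trying again.
                next_attempt[name] = slot + 1 + rng.randrange(max_wait)
    return delivered, collisions
```

Because every station starts in slot 0, the first slot always produces a collision; the random waits then spread the retries out until each frame gets through on its own.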
Star topology
In star topology each workstation is connected to a central computer or switch, which regenerates the data signals it receives and transmits them to the destination device. All data is passed through the switch but only the intended recipient computer will receive and act on the data.
Advantages:
A cable failure on one branch of the network will only affect that branch.
Adding workstations to the network or removing workstations from the network is simple and will not interfere with the rest of the network.
Collisions will not occur if two computers transmit data at the same time.
Workstations cannot intercept messages since all messages go through the central switch.
Disadvantages:
For a greater number of machines, a greater amount of cable is required.
The cost of cable can escalate, and concealing large amounts of cable adds further expense.
A number of star topology networks can be connected to form a distributed star. Each star is connected to the others via switches, with each star operating independently until a message needs to be sent to a node on another star.
Network segment - in Ethernet, a run of Ethernet cable to which a number of workstations are attached. Thin-client network - a network where all processing takes place in a central server; the clients are dumb terminals with little or no processing power or local hard disk storage. Peer-to-peer network - a network that has no dedicated servers. All computers are equal, so they are called peers. Server-based network - a network in which resources, security, administration and other functions are provided by dedicated servers.
Segmentation
Segmentation is one solution to congestion, which can occur on an Ethernet bus network when many workstations are transmitting data that may collide. Segmentation involves splitting a larger non-switched network into two or more network segments linked by bridges or routers, which ensure communication between segments is possible. Since fewer workstations are competing to transmit data, fewer collisions will occur. A bridge holds a table of Ethernet interface card addresses, one for each machine connected to the segments joined by the bridge. A router holds a table of IP (Internet Protocol) addresses.
Thin-client networking
In a thin-client network, all processing takes place in a central server; the workstations connected to the central server have very little processing power and no hard disk storage. The central server consists of:
A file server which stores users' files.
An application server which runs application programs such as word processors and drawing packages.
A domain controller which validates users when they initiate a login session.
User commands travel along the network and are executed in the central server which displays the result of its processing on a video monitor.
NB: A thick-client network is just the opposite of a thin-client network. This means that all client workstations will have local processing power and will run their own applications.
Peer-to-peer networking
A pure peer-to-peer network does not have the notion of clients or servers but only equal peer nodes that simultaneously function as clients and servers to the other nodes on the network. The user of each computer determines which resources can be shared but cannot decide which specific users can use which resources, i.e. if a directory is not shared with the whole peer-to-peer network then it will only be available on the computer of the user whose directory it is. A peer-to-peer LAN is an appropriate choice when:
There are fewer than 10 users.
The users are all located in the same area and the computers will be located at user desks.
Security is not an issue, so users may act as their own administrators to plan their own security.
The organisation and the network will have limited growth over the foreseeable future.
Peer-to-peer (P2P) operation in WANs such as the Internet is used to share files among a large number of users connected temporarily. P2P protocols such as BitTorrent are used on the Internet to distribute large files. After preparing the file by splitting it up into smaller pieces, a source will send a piece to each of the clients in the group. These clients will then in turn become sources so that the source only has to send each piece once. In this way, each client gets the rest of the file from other clients. Using the BitTorrent P2P protocol, each client is capable of preparing, requesting and transmitting any type of computer file over a
Server-based networking
Server-based networks, in which resources, security, administration and other functions are provided by dedicated servers, are used for large networks where peer-to-peer networks do not suffice. A dedicated server is a server optimised for its purpose that is not used as a client or workstation. Dedicated servers provide quick responses to requests from network clients and ensure the security of files and directories. Client computers are usually less powerful than server computers. A server stores a list of client user IDs and associated passwords so that it can authenticate users logging on at client workstations. Servers are used for services such as file storage and printing. Web servers and FTP servers are examples of server-based systems. In a school network, a central domain controller will typically store user accounts and a central file server will store users' work and some applications that users download onto the client machines they work at.
Router - a device that receives packets or datagrams from one host (computer) or router and uses the destination IP address the packets contain to pass them, correctly formatted, to another host (computer) or router. Gateway - a device used to connect networks using different protocols so that information can be passed from one system to another.
Routers
Routers are packet switches. The route chosen is determined by the destination IP address; on this route a datagram may pass through several routers before reaching its destination. Each router maintains a table of routes to various destinations; for example, a router in England will know which router to send a datagram to next if the destination IP address indicates a destination in Poland. A router will know about its sub-networks and which IP range they use but it will not know about its super-networks.
[Figure: routers arranged in a hierarchy below the NSP backbones.]
When a router receives a packet it will look at the destination address of the packet and determine whether the destination sub-network is within its range of IP addresses. If the destination is not in the router's table of routes, it will send the packet on a predefined default route, usually up the hierarchy to the next router. The routers connected to the NSP backbones hold the largest routing tables and will therefore be able to send the packet on towards its correct destination.
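The lookup described above can be sketched in Python using the standard ipaddress module. The table entries and hop names are made up for the example, and the sketch simply takes the first matching entry, which is a simplification of how real routers choose between overlapping routes.

```python
import ipaddress


def next_hop(routing_table, default_route, destination):
    """Return where to forward a packet addressed to `destination`.

    `routing_table` is a list of (network, hop) pairs. If no network
    contains the destination, the predefined default route is used.
    """
    address = ipaddress.ip_address(destination)
    for network, hop in routing_table:
        if address in ipaddress.ip_network(network):
            return hop
    return default_route


# Hypothetical table for a router that knows about two sub-networks.
EXAMPLE_TABLE = [
    ("195.168.0.0/24", "deliver on local LAN"),
    ("210.5.0.0/24", "forward to router B"),
]
```

A destination of 210.5.0.24 matches the second entry, while an address that matches nothing in the table is passed to the default route, up the hierarchy.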
IPv4 example: 210.25.0.48
IPv4 addresses are 32-bit numbers expressed as 4 octets in dotted decimal notation. IPv6 addresses are 128-bit numbers expressed using hexadecimal strings. IANA's role is to allocate IP addresses from the pool of currently unallocated addresses to the RIRs (regional Internet registries), which are responsible for overseeing the allocation and registration of Internet numbers within a particular region of the world.
Routable (public) IP addresses
Public or routable IP addresses are assigned by RIPE NCC in Europe. RIPE carefully manages the allocation of IP addresses and offers a WHOIS feature to look up the owners of IP addresses.
Non-routable (private) IP addresses
Non-routable IP addresses are used for home, office, school and college networks. These IP addresses are set aside to be used when it is neither necessary nor desirable to have a public IP address. They are especially useful where multiple computers are connected to a single proxy server, firewall or router. These IP addresses were originally allocated to delay IPv4 exhaustion and take the following IP ranges:
10.0.0.0 to 10.255.255.255
172.16.0.0 to 172.31.255.255
192.168.0.0 to 192.168.255.255
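Whether an address is non-routable can be checked with a short Python sketch. It uses the three standard private IPv4 blocks (10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16) written in prefix notation; the function name is made up for the example.

```python
import ipaddress

# The three private IPv4 blocks, written in prefix (CIDR) notation.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),      # 10.0.0.0 to 10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.0.0 to 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),  # 192.168.0.0 to 192.168.255.255
]


def is_non_routable(address):
    """Return True if the IPv4 address lies in one of the private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_RANGES)
```

For example, is_non_routable("10.1.2.3") is True, while is_non_routable("210.5.0.24") is False.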
[Figure: two LANs joined by two routers. One LAN contains the host 195.168.0.19 and a router at 195.168.0.1; the other contains the host 210.5.0.24 and a router at 210.5.0.1.]
A packet is sent from a computer with IP address 195.168.0.19 to a computer with IP address 210.5.0.24. It will pass through two routers which each have two IP addresses since each router has two network cards. One network card connects the router to the LAN and the other connects the router to the other router.
Gateways
A gateway is a device used to connect networks using different protocols so that information can be passed from one system to another. LANs use a protocol that is very different from the protocol used on the Internet, which is a WAN. In this instance, a gateway will do the job of translating the LAN frame into its equivalent WAN frame or datagram, and vice versa. Sending a packet from one LAN to another over a WAN will require the packet to pass through two gateways. Both gateways have two network cards, one for the port used for the LAN and one for the port used for the WAN. Each of these cards is assigned an IP address.
LAN Details
When setting up a LAN there are certain details which have to be provided in order for the LAN to operate correctly.
Subnet mask
The subnet mask of a network defines its size. It also helps to tell a computer which LAN it is connected to, and hence the addresses to which it can send packets directly and which packets it needs to send to the gateway. The standard subnet mask for computers on small LANs is 255.255.255.0.
Gateway
The gateway, or router address, is the IP address of the machine that connects a computer to the next hop on the Internet. In a LAN using private addresses, this address will be the internal IP address of the machine that directs traffic between the LAN and the connection to the Internet.
DNS servers
A network of Domain Name System (DNS) servers keeps track of the assignment of IP addresses to website domains so that it is possible to reach these websites by typing their domain name into a web browser.
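The way a computer uses its subnet mask to choose between direct delivery and the gateway can be sketched in Python. The addresses in the usage note are examples chosen for illustration, and the function names are hypothetical.

```python
def to_int(dotted):
    """Convert dotted decimal notation (4 octets) into a 32-bit integer."""
    value = 0
    for part in dotted.split("."):
        value = (value << 8) | int(part)
    return value


def same_lan(source, destination, subnet_mask):
    """Two addresses are on the same LAN if their network parts match."""
    mask = to_int(subnet_mask)
    return (to_int(source) & mask) == (to_int(destination) & mask)


def first_hop(source, destination, subnet_mask, gateway):
    """Send directly to a machine on the same LAN, otherwise to the gateway."""
    if same_lan(source, destination, subnet_mask):
        return destination
    return gateway
```

With a mask of 255.255.255.0, a packet from 192.168.0.19 to 192.168.0.121 is sent directly, whereas a packet to 210.5.0.24 goes to the gateway first.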
3.6.9 Web services
Web 2.0 - software that becomes a service accessed over the internet. Web services - self-contained, modular applications that can be described, published, located and invoked over a network, generally the Web. Ajax - a web technology that allows only the part of a web page that needs updating to be fetched from the web server.
Web 2.0
Web 2.0 is a set of principles and practices that tie together a different approach to the use of the World Wide Web and the Internet. In Web 2.0, software becomes a service which customers pay for directly or indirectly that is accessed over the Internet. In general, Web 2.0 refers to the phenomenon of social media whereby users have the ability to publish and customise content unlike websites where users are merely passive viewers (consumers).
Web services
Web services are self-contained, modular applications that can be described, published, located and invoked over a network, generally the Web. Software as a service (SaaS) is a model of software deployment where an application is hosted as a service provided to customers across the Internet.
SaaS eliminates the need to install and run the application on the customer's own computer. This means that the software vendor is now responsible for keeping the application up to date and the user does not have to download updates. This also means that the customer effectively relinquishes control over whether to accept an update. As opposed to paying a one-time fee to purchase the application, the user will pay each time they use the web service. From the software vendor's standpoint, SaaS has the attraction of providing stronger protection of its intellectual property (since the user does not have access to the source code) and establishing an ongoing revenue stream. The SaaS software vendor may host the application on its own web server, or this function may be handled by a third-party application service provider (ASP).
Ajax
Ajax is a technology used in Web 2.0 which allows pages to update sections of their content using programs or data held on a web server but without reloading the entire page. This makes the experience of using the software smoother and more similar to that of a thick client, by reducing the delay between operations since fewer bytes need to be transferred from web server to web browser.
3.6.10 Wireless networking
Wireless network - any type of LAN in which the nodes (computers or computing devices, often portable devices) are not connected by wires but use radio waves to transmit data between them. Wi-Fi - trademarked IEEE 802.11 technologies that support wireless networking of home and business networks. Bluetooth - a wireless protocol for exchanging data over short distances from fixed and mobile devices.
A wireless access point (WAP) allows devices operating wirelessly to connect to a wired network. It allows data to be relayed between the wireless devices (such as computers or printers) and wired devices on the network. Wireless networks allow devices to be added to a network using little or no extra cabling.
Wi-Fi
Wi-Fi is the standard for wirelessly connecting computers. It is the trademark for the popular wireless technology used in home and business networks, mobile phones and other electronic devices that require some form of wireless networking capability. Wireless networks are:
Typically slower than networks connected using Ethernet cable.
More vulnerable, because anyone can intercept the radio broadcasts that carry the data between wirelessly networked computers.
Wi-Fi Security
Wired Equivalent Privacy (WEP) was introduced to give wireless LANs an equivalent level of security to wired LANs. Since its introduction, serious weaknesses have been identified and therefore Wi-Fi Protected Access (WPA) was introduced. WPA provides more security than a WEP security set-up. In WEP, anyone wishing to access the wireless network must provide a passphrase which is stored in the wireless access point (WAP).
Bluetooth
Bluetooth allows users to send data from fixed and mobile devices over short distances, creating a personal area network (PAN). Bluetooth achieves a gross data transfer rate of 1 Mbps using a short-range ISM band at 2.4 GHz.
Web server extension - a program written in native code, i.e. an executable or a script that is interpreted by an interpreter running on the web server, which extends the functionality of the web server and allows it to generate content at the time of the HTTP request.
Common Gateway Interface - a gateway between a web server and a web server extension that tells the server how to send information to a web server extension, and what the server should do after receiving information from a web server extension.
Dynamic web page content - content that is generated when the web browser
A web browser uses the HTTP application protocol to fetch web pages from a web server by sending a request message. A web server listens on port 80 for such requests. The response from the server also uses the HTTP protocol to simply transfer a file of bytes representing a web page or image or sound file back to the web browser. However, if the web browser requires information that has the potential to change with time then the server needs to have its functionality extended; this extension is called a web server extension.
[Figure: a web browser and a web server exchanging an HTTP request message and response message through the input and output buffers of their network cards; the web server listens on port 80.]
The Common Gateway Interface (CGI) is a gateway between the web server and a web server extension. The CGI specification tells the server how to send information to a web server extension, and what the server should do after receiving information from a web server extension.
Query string
An example of a query string is: Get /webpage.asp?myname=Fred&age=6. Here, we have a get command with a query string which consists of two names, myname and age, with the corresponding values of those names, separated by an equals symbol. The ampersand (&) separates two name-value pairs.
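The way a query string breaks down into name-value pairs can be sketched with Python's standard urllib.parse module. The function name is made up for the example, and the request line is the one quoted above.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_request_line(request_line):
    """Split a request such as 'Get /page?name=value' into its parts."""
    method, target = request_line.split(" ", 1)
    parts = urlsplit(target)
    # parse_qsl splits on '&' and then on '=' to give (name, value) pairs.
    fields = dict(parse_qsl(parts.query))
    return method, parts.path, fields
```

Calling parse_request_line("Get /webpage.asp?myname=Fred&age=6") gives the method "Get", the path "/webpage.asp" and the fields {"myname": "Fred", "age": "6"}. Note that the values arrive as strings.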
Post method
If the Post method is used by the browser or HTTP application, then the data is passed in the message body and the command is simply Post /webpage.asp, for example. Dynamic web page content is created by writing a web page in a mixture of HTML and a scripting language, for example PHP. This script will then be interpreted by the web server extension and sent to the client in HTML. The web server extension can use database connection components to connect to a database management system (DBMS), which in turn accesses data in a database. There are five steps to make a connection to a database:
1. Create a connection.
2. Select a database.
3. Perform a database query.
4. Use the returned data, if any.
5. Close the connection.
MySQL is a commonly used DBMS, and a scripting language will incorporate MySQL queries in its code in order to perform a database query. You will not be required to write any PHP or ASP but you will need to know how to write SQL. For examples, see section 5 on databases.
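The five steps can be sketched in Python with the built-in sqlite3 module standing in for MySQL. That substitution is an assumption made so the example runs without a database server; with SQLite, creating the connection and selecting the database happen in one call. The table and data are made up for the example.

```python
import sqlite3


def lookup_email(customer_id):
    """Walk through the five connection steps against a throwaway database."""
    # Steps 1 and 2: create a connection and select a database.
    # ":memory:" gives a temporary database that starts empty.
    conn = sqlite3.connect(":memory:")
    # Sample data, needed only because the temporary database is empty.
    conn.execute(
        "CREATE TABLE Customer (CustomerID VARCHAR(4) PRIMARY KEY, "
        "EmailAddress VARCHAR(100))"
    )
    conn.execute(
        "INSERT INTO Customer VALUES (?, ?)",
        ("BLF1", "FredBloggs@RealComputing.com"),
    )
    # Step 3: perform a database query.
    cursor = conn.execute(
        "SELECT EmailAddress FROM Customer WHERE CustomerID = ?",
        (customer_id,),
    )
    # Step 4: use the returned data, if any.
    row = cursor.fetchone()
    email = row[0] if row else None
    # Step 5: close the connection.
    conn.close()
    return email
```

The ? placeholders keep the customer ID separate from the SQL text, which is the usual way to pass values taken from a web form into a query.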
3.6.12 Security
Virus - a small program attached to another program or data file. It replicates itself by attaching itself to other programs. Spam - unsolicited junk e-mails. Worm - a small program that exploits a network security weakness (security hole) to replicate itself through computer networks. It may attack computers. Remote login - when someone connects to a computer via the Internet. Trojan - a program that hides in or masquerades as desirable software, such as a utility or a game, but attacks computers it infects. Phishing - when someone tries to get you to give them your personal information. Pharming - when a phisher changes DNS server information so that customers are directed to another site. No doubt if you have a computer and a connection to the Internet, you will be familiar with the problems that arise. Familiarise yourself with the formal definitions of words that you may recognise. Before the Internet, viruses were usually spread through infected files on floppy disks. When the infected file was opened, the virus program would be executed and would attack the computer usually based on an event such as a specific date. Viruses can erase and damage files. When the Internet was introduced, viruses could be spread through the medium of email and therefore the presence of the virus could increase exponentially by sending itself from email address to email address. Similarly, spam is unwanted email which is sent to thousands of email addresses by redirecting the email messages through the SMTP server of an unsuspecting host; this is called SMTP session hijacking. A worm is a malicious computer program that replicates itself through networks. It uses up computer time and drastically increases network traffic. It may also attack the computers and servers of the networks it moves through. If an unprotected computer is connected to the Internet, someone could connect to it through remote login and control the computer, erasing files or executing programs. 
The main difference between a virus and a Trojan is that a Trojan either hides in or masquerades as desirable software. When the program is executed, the Trojan will commence attacking the computer it is stored on. Phishing is a type of scam that operates generally through email. An example is when the sender pretends to be a reputable bank asking for someone to confirm their bank details. Some software will allow the sender to alter the 'from' line so the email will often look legitimate. The email will include a URL to a website which looks equally legitimate, with a form for the user to complete. Alternatively, the email may include a Trojan which will record the keystrokes that a user enters on a genuine website. Spyware is a type of malicious software that collects information about users without their knowledge. Spyware can track the website history of the computer it is on and this can be used by phishers to create appropriate phishing scenarios.
Firewalls
A firewall is a hardware device or a program that controls traffic between the Internet and a private network (such as a school network) or computer system (such as a home computer). Firewalls can be customised and rules can be set up that control which data packets should be allowed through and which should not. Traffic can be blocked from specific IP addresses, domain names or port numbers. Firewalls can also be set up to search data packets for exact matches of text.
Packet filtering
In packet filtering, the firewall analyses the packets that are sent against a set of filters (firewall rules). Packets are either allowed through or blocked.
Proxy server
Using a proxy server, the computer which is requesting information from a server does not come immediately into contact with the response. This allows the information to be filtered before it is passed on to the client.
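Packet filtering can be sketched as a function that checks each packet against a set of rules. The Packet fields and the rule sets are assumptions made for this illustration and do not describe any particular firewall product.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """A much simplified data packet, for illustration only."""
    source_ip: str
    destination_port: int
    payload: str


def allow(packet, blocked_ips, blocked_ports, blocked_text):
    """Return True if the packet passes every filter, False if it is blocked."""
    if packet.source_ip in blocked_ips:
        return False  # traffic from this address is not allowed
    if packet.destination_port in blocked_ports:
        return False  # this port is closed to outside traffic
    if any(text in packet.payload for text in blocked_text):
        return False  # the packet contains an exact match of banned text
    return True
```

Each rule is checked in turn and a packet is only allowed through if none of the rules blocks it.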
Virus detection
Virus detection software, often called an antivirus scanner, checks files against a dictionary of known viruses. This means that computer users must regularly update their software in order to ensure that this dictionary stays up to date. If an infected file is found, the antivirus scanner will try to delete the virus from the file. If this fails, the infected file will be quarantined, i.e. kept in a separate area of the hard disk where it can't infect other files. As a last resort, the software may delete the infected file.
Computer security
Authentication
Authentication of legitimate users can be achieved by using passwords, biometric data, security tokens or digital certificates. For example, an organisation may only accept emails which have been digitally signed. This digital signature must be authenticated through a digital certificate issued by a trusted third party such as a certification authority.
Authorisation
An authorised user will have a user ID and password that have been recorded on the system. This user will be able to access certain features according to permissions granted by the system administrator. Passwords and encryption are used to keep data secret from unauthorised persons.
Accounting
Systems will keep an account of every activity carried out on a network or individual computer. This means that security breaches can be detected as soon as possible and any compromised parts can be identified. This can be applied to Internet activity, where a log is created storing the IP address of every website that has been visited.
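The three ideas above (authentication, authorisation and accounting) can be illustrated together. The sketch below is an assumption-laden toy: the user table, the permission names and the in-memory `AUDIT_LOG` are invented for the example, and a real system would store these securely. Passwords are stored as salted PBKDF2 hashes rather than as plain text.

```python
import hashlib
import hmac
import os
from datetime import datetime, timezone


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted hash so the password itself is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


_salt = os.urandom(16)
# Hypothetical user table set up by the system administrator.
USERS = {
    "alice": {
        "salt": _salt,
        "hash": hash_password("correct horse", _salt),
        "permissions": {"read_reports"},
    },
}

AUDIT_LOG = []  # accounting: one entry per activity


def record(user: str, event: str) -> None:
    AUDIT_LOG.append((datetime.now(timezone.utc).isoformat(), user, event))


def authenticate(user: str, password: str) -> bool:
    """Check the user ID and password against the recorded details."""
    entry = USERS.get(user)
    ok = entry is not None and hmac.compare_digest(
        entry["hash"], hash_password(password, entry["salt"])
    )
    record(user, "login ok" if ok else "login failed")
    return ok


def authorise(user: str, permission: str) -> bool:
    """Check whether the administrator has granted this permission."""
    allowed = permission in USERS.get(user, {}).get("permissions", set())
    record(user, f"{permission}: {'granted' if allowed else 'denied'}")
    return allowed
```

Every call to `authenticate` or `authorise` adds an entry to `AUDIT_LOG`, so failed logins and denied requests can be reviewed afterwards.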
3.6.13 Encryption
Encryption - using an algorithm and a key to convert message data into a form that is not understandable without the key to decrypt the text.
Plain text - message data before it is encrypted.
Cipher text - message data after it has been encrypted.
Decryption - using an algorithm and a key to convert encrypted message data into its plain text equivalent.
Cryptography - the science of designing cipher systems.
Cryptanalysis - trying to find the plain text from the cipher text without the decryption key.
Break the code - find the plain text from the cipher text by guessing or deducing the key.
The main uses of encryption are to store information securely and to transmit messages so that only the sender and the legitimate recipient can read them.
Symmetric encryption
Symmetric encryption is the most straightforward encryption technique. The same key is used to encrypt and to decrypt, so anyone who knows the encryption algorithm and the key can decrypt a message.
Plain text:  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cipher text: W X Y Z A B C D E F G H I J K L M N O P Q R S T U V
Above is a simple substitution cipher, created by shifting the alphabet four places: each plain text letter is replaced by the letter four places before it, wrapping round at the start of the alphabet (so A becomes W). We can encrypt "CIPHER" to "YELDAN" and decrypt "AJYNULPEKJ" to "ENCRYPTION". The cipher, however, can easily be broken by analysing the frequency of letters in passages of text and looking for expected words.
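A minimal Python sketch of this substitution cipher is given below, assuming a key of 4 and upper-case letters only; the function names are chosen for illustration. Encryption moves each letter four places back in the alphabet and decryption moves it four places forward, so the same key serves both directions.

```python
import string

ALPHABET = string.ascii_uppercase
KEY = 4  # number of places the alphabet is shifted


def shift_letters(text: str, shift: int) -> str:
    """Shift every letter by `shift` places, wrapping round the alphabet."""
    result = []
    for ch in text.upper():
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)  # leave spaces and punctuation unchanged
    return "".join(result)


def encrypt(plain_text: str) -> str:
    return shift_letters(plain_text, -KEY)   # A -> W, B -> X, ...


def decrypt(cipher_text: str) -> str:
    return shift_letters(cipher_text, KEY)   # W -> A, X -> B, ...


print(encrypt("CIPHER"))       # YELDAN
print(decrypt("AJYNULPEKJ"))   # ENCRYPTION
```

Because there are only 25 useful keys, the cipher can also be broken simply by trying every shift in turn.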
T R A N S
P O S I T
I O N C I
P H E R S
C A N B E
U S E F U
L T O O Z
In a transposition cipher, the plain text is written into a grid row by row, leaving out the spaces between words, and any remaining cells are filled with a Z. The cipher text is then produced by reading the grid contents column by column. The key is the number of columns used; here we have used a key of 5. We can encrypt the message "TRANSPOSITION CIPHERS CAN BE USEFUL TOO" into "TPIPCULROOHASTASNENEONICRBFOSTISEUZ", making the message unrecognisable.
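The same procedure can be written as a short Python sketch. The function names are illustrative, and the code assumes that spaces are dropped and that Z is used as the padding letter, as in the example above.

```python
def transposition_encrypt(plain_text: str, columns: int, pad: str = "Z") -> str:
    """Write the letters row by row into a grid, then read column by column."""
    letters = [ch for ch in plain_text.upper() if ch.isalpha()]
    while len(letters) % columns != 0:
        letters.append(pad)                      # fill the last row
    rows = [letters[i:i + columns] for i in range(0, len(letters), columns)]
    return "".join(row[c] for c in range(columns) for row in rows)


def transposition_decrypt(cipher_text: str, columns: int) -> str:
    """Rebuild the columns, then read the grid row by row."""
    n_rows = len(cipher_text) // columns
    cols = [cipher_text[i:i + n_rows] for i in range(0, len(cipher_text), n_rows)]
    return "".join(cols[c][r] for r in range(n_rows) for c in range(columns))


cipher = transposition_encrypt("TRANSPOSITION CIPHERS CAN BE USEFUL TOO", 5)
print(cipher)                             # TPIPCULROOHASTASNENEONICRBFOSTISEUZ
print(transposition_decrypt(cipher, 5))   # TRANSPOSITIONCIPHERSCANBEUSEFULTOOZ
```

Decryption returns the letters without spaces and with the padding Z still attached; the recipient has to restore the word breaks by reading the text.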
Asymmetric encryption
In asymmetric encryption, or public key encryption, both parties who want to communicate securely have a pair of keys: a private key and a public key. The private key is kept secret and the public key is freely available to anyone. The encryption algorithm is also publicly available. A message encrypted with a private key can only be decrypted with the corresponding public key, and vice versa.
Consider a scenario with two users, A and B.
If A encrypts a message with A's private key, then B (and anyone else who intercepts the message) can decrypt the message with A's public key.
If A encrypts a message with A's public key, only A can decrypt the message with A's private key.
If A encrypts a message with B's public key, only B can decrypt the message with B's private key.
Asymmetric encryption involves complicated calculations, so encryption and decryption are slow. For this reason it is normally used only to exchange a symmetric key, which then protects the rest of the conversation. Secure web browsing and e-commerce use a protocol known as Secure Sockets Layer (SSL), since superseded by Transport Layer Security (TLS). The website accessed by the browser sends its public key to the browser. The browser creates a symmetric key (known as a session key) and sends it to the website encrypted with the website's public key, so only the website can decrypt the symmetric key. This symmetric key is then used for the rest of the session.
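The relationship between the two keys in a pair can be demonstrated with textbook RSA and very small primes. This is a toy sketch only: the primes, the exponent and the helper name `apply_key` are assumptions chosen so the arithmetic is easy to follow, and keys this small offer no security. The modular inverse via `pow(e, -1, phi)` needs Python 3.8 or later.

```python
# Toy RSA key pair (tiny numbers, for illustration only).
p, q = 61, 53
n = p * q                      # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (modular inverse of e)

PUBLIC_KEY = (e, n)            # freely available
PRIVATE_KEY = (d, n)           # kept secret


def apply_key(value: int, key) -> int:
    """Encrypt or decrypt a number: raise it to the key's exponent mod n."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


message = 65                   # a message must be a number smaller than n

# Encrypted with the public key: only the private key recovers it.
for_owner = apply_key(message, PUBLIC_KEY)
assert apply_key(for_owner, PRIVATE_KEY) == message

# Encrypted with the private key: anyone with the public key recovers it.
from_owner = apply_key(message, PRIVATE_KEY)
assert apply_key(from_owner, PUBLIC_KEY) == message
```

The second case gives no secrecy, since the public key is available to everyone, but it shows that the message must have come from the holder of the private key.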
1. B sends A its public key.
2. A uses B's public key to create a symmetric key.
Since both A and B can produce an identical symmetric key, A can use it to encrypt any information that needs to be kept secret. B holds its own private key and A's public key, so it is able to decrypt the cipher text sent to it.
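One way to picture the exchange of a session key is sketched below, following the SSL description: A picks a random symmetric key, sends it wrapped with B's public key, and both sides then use it for fast symmetric encryption. Everything here is an assumption made for illustration: B's key pair is toy RSA with tiny primes, and the symmetric cipher is a simple XOR against a SHA-256 keystream, not a cipher used by any real protocol.

```python
import hashlib
import secrets

# B's toy RSA key pair (far too small to be secure).
B_N = 61 * 53
B_E = 17                              # public exponent
B_D = pow(B_E, -1, 60 * 52)           # private exponent, known only to B


def keystream(session_key: int, length: int) -> bytes:
    """Stretch the session key into `length` pseudo-random bytes."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(f"{session_key}:{counter}".encode()).digest()
        counter += 1
    return out[:length]


def symmetric(data: bytes, session_key: int) -> bytes:
    """XOR with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(session_key, len(data))))


# 1. B sends A its public key (B_E, B_N).
# 2. A creates a session key and wraps it with B's public key.
session_key = secrets.randbelow(B_N - 2) + 2
wrapped = pow(session_key, B_E, B_N)

# 3. Only B can unwrap it, using B's private key.
unwrapped = pow(wrapped, B_D, B_N)

# 4. Both sides now hold the same symmetric key.
cipher_text = symmetric(b"order: 3 books", session_key)
plain_text = symmetric(cipher_text, unwrapped)
```

The slow asymmetric step is used once, on a short key, and the bulk of the data is handled by the faster symmetric step.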
To prove that a message is genuine, sender A can digitally sign the message. This makes it possible to detect whether the message has been tampered with, and the signature is proof that it has been sent by A.
The processes required before A's message is sent to B are as follows:
1. The message is hashed to produce a message digest.
2. The message digest is encrypted with A's private key; this becomes the signature.
3. The signature is appended to the message.
4. The message is encrypted using B's public key.
5. The encrypted message is sent to B.
The processes required to ensure that the message received by B is genuinely from A are as follows:
1. B decrypts the message with B's private key.
2. B decrypts the signature with A's public key to retrieve the original message digest.
3. The decrypted message is hashed again to reproduce the message digest.
4. If the decrypted digest equals the reproduced digest, the message has not been tampered with.
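The signing and checking steps above can be traced in a toy Python sketch. The key pairs are textbook RSA with tiny primes, the digest is SHA-256 reduced to fit the small modulus, and the message is encrypted one byte at a time; all of these are simplifying assumptions for illustration, and none would be acceptable in a real system. The function names are hypothetical.

```python
import hashlib


def make_keys(p: int, q: int, e: int):
    """Return a toy RSA (public, private) key pair."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (e, n), (d, n)


A_PUBLIC, A_PRIVATE = make_keys(61, 53, 17)
B_PUBLIC, B_PRIVATE = make_keys(89, 97, 5)


def digest(message: bytes, modulus: int) -> int:
    """Hash the message and shrink the result to fit the toy key."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % modulus


def rsa(values, key):
    exponent, modulus = key
    return [pow(v, exponent, modulus) for v in values]


def send(message: bytes):
    """A's side: hash, sign with A's private key, append, encrypt for B."""
    signature = rsa([digest(message, A_PUBLIC[1])], A_PRIVATE)[0]
    return rsa(list(message) + [signature], B_PUBLIC)


def receive(encrypted):
    """B's side: decrypt with B's private key, then check the signature."""
    *message_values, signature = rsa(encrypted, B_PRIVATE)
    message = bytes(message_values)
    original_digest = rsa([signature], A_PUBLIC)[0]
    return message, original_digest == digest(message, A_PUBLIC[1])


message, genuine = receive(send(b"Pay Bob 10"))
```

Here `genuine` is `True` because the digest recovered from the signature matches the digest B recomputes. If the message or the signature were altered in transit, the two digests would be expected to differ, although with a modulus this small an accidental match is possible.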