
Computing

COMP3 Revision Guide


CONTENTS
3.1 Problem Solving
    3.1.1 Abstraction and Information Hiding
    3.1.2 Comparing Algorithms
    3.1.3 Types of Problem
    3.1.4 Finite State Machines
    3.1.5 Language and Notation
3.2 Programming Concepts
    3.2.1 Programming Paradigms
    3.2.3 Abstract Data Types
    3.2.4 Queues
    3.2.5 Stacks
    3.2.6 Hashing
    3.2.7 Graphs and trees
    3.2.8 Searching and sorting
    3.2.9 Simulations
3.3 Real Numbers
    3.3.1 Real numbers
3.4 Operating Systems
    3.4.1 Role of an operating system
    3.4.2 Operating system classification
3.5 Databases
    3.5.1 Conceptual data modelling
    3.5.2 Database design
    3.5.2 Structured Query Language (SQL)
3.6 Communication and Networking
    3.6.1 Communication Methods
    3.6.2 Baud, bit rate, etc.
    3.6.3 Asynchronous data transmission
    3.6.4 Baseband and broadband
    3.6.5 Networks
    3.6.6 Network topologies
    3.6.7 Networks Part Two
    3.6.8 Routers and gateways
    3.6.9 Web services
    3.6.10 Wireless networking
    3.6.11 Server-side scripting
    3.6.12 Security
    3.6.13 Encryption

3.1 PROBLEM SOLVING


3.1.1 Abstraction and Information Hiding

Abstraction - simplifying a problem to omit all unnecessary details.
Information hiding - hiding the complexities of the system behind a simple to use interface.
Interface - the boundary between the front end of a system and its

An example of information hiding


Consider a pocket calculator. Anyone familiar with the device is likely to be able to operate a calculator that they haven't seen before. The keypad and screen are the interface that is used for communication with the user. The workings of the calculator are normally behind plastic cases and are hidden from the user. The user does not need to know how the calculator works in order to use it - they just need to be familiar with the standard interface.

3.1.2 Comparing Algorithms

Time complexity - how long an algorithm takes to complete a task with a given input.
Space complexity - how much memory an algorithm needs to complete a task with a given input.
Order of growth - how quickly the complexity of a function grows with growth in

Building on the definition of an algorithm from the AS course, there are some additional key points to remember:
- An algorithm is a sequence of unambiguous instructions.
- An algorithm has a range of legitimate inputs and should produce the correct result for all values within the range.
- Different algorithms can be developed to perform the same task. These algorithms can have different time and space complexities.
- The overall complexity of an algorithm depends on its time complexity and space complexity.
- Some algorithms are more time efficient than others: insertion sort typically does less work than bubble sort, even though both have quadratic time complexity in the worst case.
- Some algorithms are more space efficient than others: some use memory efficiently, others waste RAM.

The table below gives the order of complexity of algorithms. These points should be noted:

Quadratic and cubic time are examples of polynomial time, which takes the form O(n^c) for some constant c. The number of nested loops can dictate the value of c. For example, a bubble sort algorithm contains a loop inside a loop, meaning that it will have quadratic time complexity (c = 2).

Order of complexity, from fastest to slowest:

Constant time, O(1) (FASTEST)
    Is a number odd or even? No matter the size of the input, it will always just be a matter of looking at the last digit.

Logarithmic time, O(log n)
    Binary search. If a list is constantly being split in half with one half rejected, then its time complexity will not grow as fast as linear time.

Linear time, O(n)
    Linear search. If each item in a list has to be checked to see if it is the desired item, then the time complexity will grow with the list size.

Linearithmic time, O(n log n)
    Quick sort. A quick sort splits a list and sorts it recursively. In the best-case scenario it has linearithmic time complexity.

Quadratic time, O(n^2)
    Bubble sort. Two items in a list are looked at together and either swapped or not. Each pass can feature n - 1 swaps and there can be a total of n - 1 passes, giving quadratic time complexity.

Cubic time, O(n^3)

Exponential time, O(2^n)
    The recursive Fibonacci algorithm roughly has this level of time complexity.

Factorial time, O(n!) (SLOWEST)
    Travelling Salesman Problem. The travelling salesman problem tries to find the optimal tour between a finite number n of nodes. As n grows in size, the problem becomes increasingly more difficult.
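The difference between linear and logarithmic growth can be seen by counting comparisons. The following is a minimal sketch in Python; the function names are illustrative and not part of the guide.

```python
def linear_search_steps(items, target):
    """Count the comparisons a linear search makes before it stops."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps


def binary_search_steps(items, target):
    """Count the probes a binary search makes on a sorted list."""
    low, high = 0, len(items) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if items[mid] == target:
            break
        if items[mid] < target:
            low = mid + 1   # reject the lower half
        else:
            high = mid - 1  # reject the upper half
    return steps


for n in (16, 1024, 65536):
    data = list(range(n))
    last = n - 1  # the last item is the worst case for a linear search
    print(n, linear_search_steps(data, last), binary_search_steps(data, last))
```

As n grows, the linear count grows in step with n while the binary count grows by roughly one each time n doubles.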

3.1.3 Types of Problem

Non-computable - a problem that does not have an algorithmic solution.
Tractable - a problem that has a reasonable (polynomial) time solution.
Intractable - a problem for which no reasonable time solution has yet been found.
Decision problem - a problem whose answer is always yes or no.
Undecidable - a decision problem that is not computable.
Heuristic solution - a trial and error approach using 'informed guesses' or learned knowledge to find a solution to an intractable problem.

The Travelling Salesman Problem


The travelling salesman problem is an example of an intractable problem. The most straightforward method of finding the shortest tour between a number n of nodes is to try all possible permutations, and the number of permutations grows factorially with n, so exhaustive search is only practical for small networks. This is why heuristic methods, such as starting from one node and repeatedly choosing the smallest distance to an unvisited node (an adaptation of Prim's algorithm), are often devised to find a good, though not necessarily optimal, tour.
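The contrast between exhaustive search and a heuristic can be sketched in Python. The distance matrix below is made up for illustration, and the function names are hypothetical; the nearest-neighbour routine is one simple heuristic of the kind described, not a guaranteed-optimal method.

```python
from itertools import permutations

# Hypothetical symmetric distances between four nodes (illustrative values only).
DIST = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]


def tour_length(tour):
    """Length of a closed tour that returns to its first node."""
    return sum(DIST[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))


def brute_force(n):
    """Try every ordering of the other n - 1 nodes: (n - 1)! tours."""
    best = min(((0,) + p for p in permutations(range(1, n))), key=tour_length)
    return list(best), tour_length(best)


def nearest_neighbour(n, start=0):
    """Heuristic: always move to the closest node not yet visited."""
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        nxt = min(unvisited, key=lambda node: DIST[tour[-1]][node])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour, tour_length(tour)


print(brute_force(4))
print(nearest_neighbour(4))
```

On this small matrix the heuristic happens to find a shortest tour, but in general it only promises a reasonable one; its advantage is that it needs far fewer steps than trying every permutation.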

The Halting Problem


The halting problem is a famous non-computable and hence undecidable problem. It asks the question: is it possible to create a program which takes another program as an input and determines whether it will halt or whether it will loop infinitely? Through reductio ad absurdum (proof by contradiction), Alan Turing proved that such a program is impossible.

3.1.4 Finite State Machines

Mealy machine - an FSM that determines its outputs from the present state and inputs. Moore machine - an FSM that determines its outputs from the present state only.

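[Figure: state transition diagrams for a Mealy machine and a Moore machine, each with states S0 and S1. In the Mealy machine, transitions are labelled input|output, e.g. 'a'|'A' and 'b'|'B': inputs and outputs on transitions. In the Moore machine, transitions are labelled with inputs 'a' and 'b' and the outputs 'A' and 'B' are attached to the states: inputs on transitions, outputs on states.]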

A finite state machine without an output is known as a finite state automaton (FSA). FSA are restricted to decision problems (they only output yes or no). If a given input causes an FSA to stop at a valid halt state then the output is true. Otherwise, the output is false. Below is a diagram demonstrating a finite state automaton for unlocking a combination lock with the code 2371. Since an input from any given state only corresponds to one transition it can also be called a deterministic FSA or deterministic finite automaton (DFA).
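[Figure: state transition diagram for the combination lock. From the initial state START, the inputs 2, 3, 7 and 1 lead in turn through the states 2, 23 and 237 to the accepting state UNLOCKED. Transitions labelled NOT 2, NOT 3, NOT 7 and NOT 1 cover all other inputs.]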
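The combination lock automaton can be sketched in Python. This is a minimal illustration under two assumptions that are not stated in the text: any wrong digit returns the machine to the START state, and once the UNLOCKED state is reached further input is ignored. The names are hypothetical.

```python
CODE = "2371"
START, UNLOCKED = 0, len(CODE)


def next_state(state, digit):
    """Transition function: exactly one next state per (state, input) pair."""
    if state == UNLOCKED:
        return UNLOCKED  # assumption: stay unlocked once accepted
    if digit == CODE[state]:
        return state + 1  # one more correct digit entered
    return START          # assumption: a wrong digit resets the lock


def unlocks(digits):
    """Return True if the input leaves the machine in the accepting state."""
    state = START
    for digit in digits:
        state = next_state(state, digit)
    return state == UNLOCKED


print(unlocks("2371"))
print(unlocks("2372"))
```

The state here is just the number of correct digits entered so far, which corresponds to the states START, 2, 23, 237 and UNLOCKED in the diagram.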

Non-deterministic FSAs have their uses in pattern matching and can be converted into DFAs.

[Figure: an NFA (Nondeterministic Finite Automaton) alongside an equivalent DFA (Deterministic Finite Automaton), with numbered states and transitions labelled a and b.]

Principle of universality - a universal machine is a machine capable of simulating any other machine.
Equivalent Turing machine - all other types of computing machine are reducible to an equivalent Turing machine.
Power of a Turing machine - no physical computing device can be more powerful than a Turing machine. If a Turing machine cannot solve a decision problem, nor can any physical computing device.

Alan Turing devised the Turing machine, an abstract computational device, in order to explore the limitations and capabilities of computer machines. Turing machines are the most basic of computing machines (their operations cannot be divided any further) and therefore have the theoretical potential to describe the operation of any computing machine. This is the principle of universality. By reasoning that every machine has an equivalent Turing machine, we can conclude that nothing is more powerful than a Turing machine.

Example Turing Machine


[Figure: state transition diagram for the example Turing machine. Each transition is labelled input | output | movement.]

The above state transition diagram shows an algorithm which will print "1 0 1 0 1 0" on our theoretical tape infinitely (until there is no more empty tape available).

If we halted the Turing machine as it approaches the end of its fifth pass our tape would appear as shown above. We can display the table of instructions which creates this pattern as shown below:

Current state   New state   Input     Output    Tape head
A               B           (blank)   1         Move right
B               C           (blank)   (blank)   Move right
C               D           (blank)   0         Move right
D               A           (blank)   (blank)   Move right

Another way to express this is using a transition function with the following syntax:

δ(current state, input symbol) = (next state, output symbol, movement)

These are the appropriate transition functions for our example, where □ denotes a blank cell and → a move of one cell to the right:

δ(A, □) = (B, 1, →)
δ(B, □) = (C, □, →)
δ(C, □) = (D, 0, →)
δ(D, □) = (A, □, →)
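The example machine can be simulated with a short Python sketch. It assumes, as the table of instructions indicates, that every rule reads a blank cell and moves the head one cell to the right; the blank is written here as an underscore and all names are illustrative.

```python
BLANK = "_"

# (current state, input symbol) -> (next state, output symbol, movement)
RULES = {
    ("A", BLANK): ("B", "1", +1),
    ("B", BLANK): ("C", BLANK, +1),
    ("C", BLANK): ("D", "0", +1),
    ("D", BLANK): ("A", BLANK, +1),
}


def run(steps, state="A"):
    """Apply the rules for a fixed number of steps and return the tape so far."""
    tape = {}  # cell index -> symbol; cells not present are blank
    head = 0
    for _ in range(steps):
        symbol = tape.get(head, BLANK)
        state, output, move = RULES[(state, symbol)]
        tape[head] = output
        head += move
    return "".join(tape.get(i, BLANK) for i in range(head))


print(run(12))
```

The machine itself never halts, so the sketch stops after a chosen number of steps, much like halting the machine part-way through to inspect the tape.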

The Busy Beaver Function


The function B(n) is defined as the largest number of ones an n-state Turing machine can write on an initially empty tape and still stop. Busy beavers are machines which produce B(n) marks with n states. The function B(n) can be easily defined but remains non-computable.

Turing also conceived a universal Turing machine (UTM). The following description details a UTM which uses a single one-dimensional tape. The instructions of the Turing machine, M, are placed on the tape followed by the data, D, to be processed by M. The UTM, U, processes M and D by starting with its read/write head positioned on M and then moving between M and D as M is executed. U may be a lot slower than M but it acts as an interpreter would, identifying the next instruction to be executed, and then executing it.

3.1.5 Language and Notation

Natural language - a real spoken and written language with grammar or syntax rules and ambiguities, such as English and French.
Formal language - a language defined by an alphabet and rules of syntax.
Regular language - any language that an FSM will accept.

Regular languages can be defined by a regular expression, and Backus-Naur form can define these as well as more general formal languages. Natural language is a lot less strict than both regular and formal languages.

Regular Expressions (Regex)
Regular expressions are used for data validation, file searching, and matching.

Regex   Meaning                                               Example          Matches
a*      Matches zero or more a's.                             c(at)*           c, cat, catat, catatat
a+      Matches one or more a's.                              ar+t             art, arrt, arrrt
a?      Matches zero or one a's.                              colou?r          colour and color only
a|b     Matches a or b.                                       gr(a|e)y         gray and grey only
[ab]    Matches a or b - an alternative form of a|b.          gr[ae]y          gray and grey only
[a-z]   All lowercase letters.                                ([a-z])+         zbcbs, acdssx
[A-Z]   All UPPERCASE letters.                                [A-Z]([a-z])+    Tom, Harry, London
\d      Matches any single digit.                             Student\d        Student5, Student8
\w      Matches any single alphanumeric character.            \w\w\d\d         Mk19, sC52
\W      Matches any single non-alphanumeric character.        \d\d\W           90?, 23$
\s      Matches a single space.                               123\s\d          123 2, 123 8
.       Matches any single character.                         .ear             hear, fear, dear
^f      Matches an 'f' with nothing before it.                ^re              rebound, rehab
f$      Matches an 'f' with nothing after it.                 ine$             cosine, genuine
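Several of the patterns in the table can be checked with Python's standard re module. This is a small sketch for experimenting; the dictionary and helper names are illustrative.

```python
import re

# Each pattern from the table with strings it is expected to match in full.
EXAMPLES = {
    r"c(at)*": ["c", "cat", "catat"],
    r"ar+t": ["art", "arrt", "arrrt"],
    r"colou?r": ["colour", "color"],
    r"gr(a|e)y": ["gray", "grey"],
    r"gr[ae]y": ["gray", "grey"],
    r"[A-Z]([a-z])+": ["Tom", "Harry", "London"],
    r"Student\d": ["Student5", "Student8"],
    r"\w\w\d\d": ["Mk19", "sC52"],
    r"\d\d\W": ["90?", "23$"],
    r".ear": ["hear", "fear", "dear"],
}


def all_match(table):
    """True if every string matches its pattern from start to end."""
    return all(
        re.fullmatch(pattern, text)
        for pattern, strings in table.items()
        for text in strings
    )


print(all_match(EXAMPLES))

# ^ and $ anchor a pattern to the start and end of the string.
print(bool(re.search(r"^re", "rebound")), bool(re.search(r"ine$", "cosine")))

# a+ needs at least one 'a', so r+ rejects a string with no 'r' at all.
print(re.fullmatch(r"ar+t", "at"))
```

re.fullmatch requires the whole string to match, which suits data validation; re.search looks for the pattern anywhere in the string, which suits file searching.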

Backus-Naur Form (BNF)
Backus-Naur form is a notation for expressing the rules for constructing valid strings in a formal language. It can be expressed in a number of ways. Consider a British postcode, for example SQ12 4YA. While "SQ" does not correspond to any city and therefore the postcode is invalid, it is in fact syntactically correct. We can break down this syntax as follows:

<postcode> ::= <first-bit><optional-space><last-bit>
<first-bit> ::= <letter><letter><digit><digit>|<letter><letter><digit>|<letter><digit><digit>|<letter><digit>
<last-bit> ::= <digit><letter><letter>
<letter> ::= A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
<digit> ::= 0|1|2|3|4|5|6|7|8|9
<optional-space> ::= " "|""

This can be represented on a parse tree.

<postcode>
    <first-bit>
        <letter> <letter> <digit> <digit>
    <optional-space>
        " "
    <last-bit>
        <digit> <letter> <letter>

Anything expressed in angle brackets, <>, is non-terminal since it is an expression of terminal and/or non-terminal values. A value such as " " is terminal since it cannot be broken down any further.

The same syntax can also be expressed by a series of syntax diagrams.

[Figure: syntax diagrams for postcode (first-bit, then optional-space, then last-bit), first-bit, optional-space, last-bit, digit (0, 1, 2, 3, etc.) and letter (A, B, C, D, etc.).]

Terminal values are represented in ellipses.
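The postcode grammar can be turned into a small checker. The Python sketch below follows the BNF rules directly, assuming the third rule defines the last part of the postcode; the letter L and digit D shorthand and all function names are illustrative.

```python
import string

LETTERS = set(string.ascii_uppercase)  # <letter> ::= A|B|...|Z
DIGITS = set(string.digits)            # <digit> ::= 0|1|...|9

# The four alternatives for <first-bit>, written as L (letter) and D (digit).
FIRST_BIT_FORMS = ["LLDD", "LLD", "LDD", "LD"]
LAST_BIT_FORM = "DLL"


def fits(text, form):
    """True if text has exactly the letters and digits that form describes."""
    if len(text) != len(form):
        return False
    return all(
        ch in (LETTERS if kind == "L" else DIGITS)
        for ch, kind in zip(text, form)
    )


def is_postcode(text):
    """Check text against <first-bit><optional-space><last-bit>."""
    for form in FIRST_BIT_FORMS:
        first, rest = text[:len(form)], text[len(form):]
        if not fits(first, form):
            continue
        for space in (" ", ""):  # <optional-space> ::= " "|""
            if rest.startswith(space) and fits(rest[len(space):], LAST_BIT_FORM):
                return True
    return False


print(is_postcode("SQ12 4YA"))
print(is_postcode("SQ12 4Y"))
```

Like the grammar, this only checks syntax: SQ12 4YA is accepted even though no such postcode area exists.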

Reverse Polish Notation (Postfix Notation)

In reverse Polish notation (RPN), operands are followed by operators, rather than there being operators between the operands as we see in infix notation. There are many advantages of this notation of expressions:
- No need for parentheses to avoid ambiguity.
- Calculations occur as soon as an operator is specified.
- RPN calculators have no limit on the complexity of the expressions that can be evaluated.
- No equals key needs to be included for an expression to be evaluated.

We can convert an expression using infix notation to a postfix expression like so:
1. Take our infix expression, e.g. 5 + ((1 + 2) * 4) - 3
2. Add all implicit parentheses: {[5 + ((1 + 2) * 4)] - 3}. You should ensure that each set of brackets only encloses one operator.
3. Shift all operators to the right side of their containing brackets: {[5 ((1 2 +) 4 *) +] 3 -}
4. Remove all the brackets: 5 1 2 + 4 * + 3 -

[Figure: the infix expression 5 + ((1 + 2) * 4) - 3 and its postfix form 5 1 2 + 4 * + 3 -, with an operand and an operator labelled.]

We can convert postfix to infix using a stack which follows a last-in first-out (LIFO) basis.

Input   Operation      Stack      Infix expression
5       Push operand   5
1       Push operand   5, 1
2       Push operand   5, 1, 2
+       Add            5          1+2
4       Push operand   5, 4       1+2
*       Multiply       5          (1+2)*4
+       Add            (empty)    ((1+2)*4)+5
3       Push operand   3          ((1+2)*4)+5
-       Subtract       (empty)    ((1+2)*4)+5-3
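A stack-based treatment of postfix expressions can be sketched in Python as follows. This version keeps intermediate results on the stack, which is one common way to do it and differs slightly from the layout of the table; the function names are illustrative.

```python
import operator

OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate_rpn(expression):
    """Evaluate a space-separated postfix expression using a LIFO stack."""
    stack = []
    for token in expression.split():
        if token in OPERATORS:
            right = stack.pop()  # the most recent operand is the right-hand one
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            stack.append(float(token))
    return stack.pop()


def to_infix(expression):
    """Rebuild a fully bracketed infix string from a postfix expression."""
    stack = []
    for token in expression.split():
        if token in OPERATORS:
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left} {token} {right})")
        else:
            stack.append(token)
    return stack.pop()


print(evaluate_rpn("5 1 2 + 4 * + 3 -"))
print(to_infix("5 1 2 + 4 * + 3 -"))
```

Because each operator acts as soon as it is read, no brackets or equals key are needed; the brackets only reappear when the expression is written back in infix form.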

3.2 PROGRAMMING CONCEPTS


3.2.1 Programming Paradigms
Structured Programming - reminder from COMP1
Structured programming is the preferred approach to writing code. Its aim is to create programs which are easy to understand and work efficiently. Outlined below are the main approaches towards following structured programming principles:
- Meaningful identifiers: When choosing what to name a variable you should consider what purpose the variable has. Ensure consistency for the style of identifier name (e.g. use camelCase throughout).
- Indentation: Indenting code ensures that it is easy to identify where a function begins and ends and that the code appears a lot neater and easier to read.
- Procedures and functions: Breaking tasks down into subtasks allows each function or procedure to deal with its own subtask only. Programs are made more readable if functions have only one task, and it also becomes much easier to adjust the code since all functions have distinct operations.
- Avoid global variables: Since the value of a global variable can be changed at any point in the program, it quickly becomes difficult to keep track of the variable's value. Especially for other programmers looking at another person's code, it is a lot better to pass variables as arguments and avoid global variables altogether.

Programming paradigm, with examples and description:

Functional programming (e.g. Haskell, F#)
    Treats computation as the evaluation of mathematical functions.

Logic programming (e.g. Prolog)
    Defines a set of facts and rules.

Event-driven programming (e.g. most languages)
    The flow of the program is determined by events such as mouse clicks and key presses which trigger subroutines.

Object-oriented programming (e.g. C++, Java, etc.)
    Programmers use instances of a class in order to create objects.

Procedural/imperative programming (e.g. C, C++, etc.)
    Code is executed one line after the other. Code can be

Object-oriented programming

Object - an instance of a class.
Instantiation - the action of declaring an instance of a class (an object).
Class definition - a pattern or template that can be used to create objects of that class.
Encapsulation - combining a record with the procedures and functions that manipulate it to form a new data type, a class.
Inheritance - defining a class and then using it to build a hierarchy of descendant classes with each descendant inheriting access to all its ancestors' code and data.
Polymorphism - giving an action one name that is shared up and down a class hierarchy. Each class in the hierarchy implements the action in a way appropriate

A class defines the properties and methods (procedures and functions, i.e. behaviours) that will be used for each instance of that class. Below is an example of a class written in C#.

    using System;

    class Member
    {
        // Private variables, procedures and functions are only
        // accessible inside the class.
        private int MemberShipNo;
        private string Name;
        private string Email;

        // Everything which is public can be accessed outside the class.
        public Member(int newmemberno, string membername, string email)
        {
            MemberShipNo = newmemberno;
            Name = membername;
            Email = email;
        }

        public void AmendMember(string membername, string email)
        {
            Name = membername;
            Email = email;
        }

        public void DisplayMember()
        {
            Console.WriteLine("ID: {0}, Name: {1}, Email: {2}", MemberShipNo, Name, Email);
        }
    }

In this basic example we have a class named "Member". We initialise this class by saying that we can create a new member with three main properties: membership number, name, and email address. An instantiation of this class would be written, for example:

    Member mymember = new Member(1, "Frederick Dragonswatter", "fred.dragon@thedragonclub.co.uk");

We can then use the public methods to alter or print the information which we have stored; this usage of public methods to deal with private variables is referred to as encapsulation.

Important points to note:
- A child class inherits all the properties and methods of its parent class. This may be used for adding more complexity to a basic functional class, for example, a scientific calculator child class with a calculator parent class.
- A child class can override methods, an idea called polymorphism. This is particularly useful when applying the same method to every object in a list when a different result is desired for different objects.

[Figure: inheritance diagram. Alarm clock "is a" Clock; Watch "is a" Clock.]

Above is a simple inheritance diagram which shows the relationship between a clock, an alarm clock and a watch. We can expand on this diagram with a class diagram as shown below.

TClock CLASS
    Fields: Hours, Minutes
    Methods: SetTime, GetHours, GetMinutes, IncrementTime

TAlarmClock CLASS
    Fields: Hours, Minutes
    Methods: SetTime, GetHours, GetMinutes, IncrementTime
    Additional fields: AlarmHours, AlarmMinutes
    Additional methods: SetAlarmTime, GetAlarmHours, GetAlarmMinutes

TWatch CLASS
    Fields: Hours, Minutes
    Methods: SetTime, GetHours, GetMinutes, IncrementTime
    Additional fields: DayOfWeek, DayNumber
    Additional methods: SetDay, SetDayNumber, GetDay, GetDayNumber

The Hours and Minutes fields and the SetTime, GetHours, GetMinutes and IncrementTime methods are automatically inherited by the child class.

Below, again in C#, is an example of inheritance and polymorphism.

    public class Clock
    {
        private int Hours;
        private int Minutes;

        public virtual void Declare()
        {
            Console.WriteLine("I am a clock.");
        }
    }

    public class AlarmClock : Clock
    {
        public override void Declare()
        {
            Console.WriteLine("I am an alarm clock.");
        }
    }

The child class overrides the method defined by the parent class but inherits the other fields.

3.2.2 Recursive Techniques

Recursive routine - a routine defined in terms of itself.
General case - the solution in terms of itself for a value n.
Base case - a value that has a solution which does not involve any reference to the general case solution.
Stack frame - the locations in the stack area used to store the values referring to one invocation of a routine.

Recursive routines can often offer elegant solutions to a problem. Despite this, they are often less efficient in terms of both time and space than iteration. One of the most common examples of a recursive method is the calculation of a factorial value, n!, with the function name Factorial(n). Below is code, in Pascal, for the function.

    Function Factorial (n : Integer) : Integer;
    Begin
      If n = 1
        Then Result := 1                      { base case }
        Else Result := n * Factorial(n - 1)   { general case }
    End;

The line "If n = 1 Then Result := 1" is the base case: since a factorial is defined as n! = n x (n-1) x ... x 1, 1 is the smallest value of n in the recursion. The line "Else Result := n * Factorial(n - 1)" is the general case since it is the recursive definition for every value except the base value.

This only works because the routine is called with values passed as parameters, so each invocation has its own copy of n. Recursion of this kind does not work if n is held in a single global variable, since every call would share and overwrite it. For each invocation of a recursive routine, a portion of the stack (the stack frame) is assigned to store the values for that invocation. This means that, for some inputs, there is a risk of stack overflow. Trying to calculate 0! with this function will cause such an error since the base case n = 1 will never be reached. Below is a dry run for calculating 5!:

Call number   Function call   n   Result               Result    Return value
1             Factorial(5)    5   5 * Factorial(4)     5 * 24    120
2             Factorial(4)    4   4 * Factorial(3)     4 * 6     24
3             Factorial(3)    3   3 * Factorial(2)     3 * 2     6
4             Factorial(2)    2   2 * Factorial(1)     2 * 1     2
5             Factorial(1)    1   1                    1         1
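The same base case and general case can be sketched in Python. The guard for values below 1 is an addition that is not in the Pascal listing; it is included here so that an input such as 0 fails with a clear message instead of recursing until the stack overflows.

```python
def factorial(n):
    """Recursive factorial with 1 as the base case."""
    if n < 1:
        # Added guard (an assumption of this sketch): without it the
        # base case n == 1 would never be reached for n = 0.
        raise ValueError("n must be at least 1 for this definition")
    if n == 1:
        return 1                 # base case
    return n * factorial(n - 1)  # general case


for value in range(1, 6):
    print(value, factorial(value))
```

Each call waits for the call below it to return, so factorial(5) holds five stack frames at its deepest point, matching the five rows of the dry run.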

3.2.3 Abstract Data Types

Abstract data type (ADT) - a data type whose properties are specified independently of any particular programming language.
List - a collection of elements with an inherent order.
Pointer - a variable that contains an address. The pointer points to the memory location with that address.
Null pointer - a pointer that does not point to anything, usually represented by Ø or -1.
Dynamic data structure - the memory taken up by the data structure varies at run time.
Static data structure - the memory required to store the data structure is declared before run time.

Lists are a type of abstract data type since they are a collection of elements with an inherent order, but this order is specified independently of the programming language, with the programmer needing no knowledge of how the list is stored or how it functions (an example of information hiding). There are two key types of lists.

Linear lists
A linear list is a static structure. This means that when the list is declared, a block of adjacent memory locations is reserved for it. In this instance, order is given by the order of the memory locations that the items occupy.

Advantages:
- Linear lists are easy to program.
- If elements are stored in key order, a binary search is possible.

Disadvantages:
- Memory locations may be wasted due to arrays being static.
- Insertion of an element within an ordered list requires moving elements.
- Deletion of an element within a list requires moving elements.

Linked lists
A linked list is a dynamic structure. Since each item in the list points to the next, there is no need for the list to occupy adjacent memory locations in the heap, therefore the list can be as large or small as necessary. Let's say we want to insert the words "one", "two", "three" and "four" into our list.

[Figure: linked list in which Start points to the first item and each item points to the next: one, two, three, four. A null pointer is required to indicate the end of the list.]

If we want to put these into alphabetical order we can simply rearrange the pointers as demonstrated below.

[Figure: the same four items with the pointers rearranged so that chaining from Start visits them in alphabetical order.]

We can program this as an array, with each item connected to a pointer. The table below shows how to express the above linked list as an array.

Index   Data field   Pointer field
0       one          2
1       two          (null)
2       three        1
3       four         0
4

Start = 3
NextFree = 4

When adding an item in this format, the new item will be placed in the field with the "NextFree" index and the current last item in the list will point to the new item. The "NextFree" value will then be altered. When deleting an item, simply update the pointers and the "Start" or "NextFree" values in order to ensure that it is still possible to chain through the list and to know where the first empty field is.

In dynamic allocation, memory space is only allocated when required at run time. Each time a list requires more memory space, it will be allocated a portion of the heap. If the memory locations used by a dynamic structure are not given back to the heap when they are no longer in use, memory leakage will occur. Eventually there may be no memory left in the heap.
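The array form of a linked list can be sketched in Python. This version inserts each new item in alphabetical position by adjusting pointers, which is one possible policy, and it treats NextFree as a simple counter with no reuse of deleted rows; the class and attribute names are illustrative, and -1 stands for the null pointer.

```python
NULL = -1  # null pointer


class ArrayLinkedList:
    def __init__(self, capacity):
        self.data = [None] * capacity     # data fields
        self.pointer = [NULL] * capacity  # pointer fields
        self.start = NULL
        self.next_free = 0

    def insert_in_order(self, item):
        """Store item in the next free row and link it in alphabetical order."""
        if self.next_free >= len(self.data):
            raise OverflowError("list is full")
        new = self.next_free
        self.data[new] = item
        self.next_free += 1
        # Chain through the list to find the items either side of the new one.
        previous, current = NULL, self.start
        while current != NULL and self.data[current] < item:
            previous, current = current, self.pointer[current]
        self.pointer[new] = current
        if previous == NULL:
            self.start = new
        else:
            self.pointer[previous] = new

    def traverse(self):
        """Follow the pointers from Start and return the items in list order."""
        items, current = [], self.start
        while current != NULL:
            items.append(self.data[current])
            current = self.pointer[current]
        return items


words = ArrayLinkedList(5)
for word in ["one", "two", "three", "four"]:
    words.insert_in_order(word)
print(words.traverse(), words.start, words.next_free)
```

Note that no data is moved when an item is inserted: the items stay in the rows where they were first stored and only the pointers change.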

3.2.4 Queues

Queue - a first-in first-out (FIFO) abstract data type.
Circular queue - when the array element with the largest possible index has been used, the next element to join the queue reuses the vacated location at the beginning of the array.
Linear queue - elements join the queue at one end and leave the queue at the other.

[Figure: a queue of people, labelled with where to join and with the front of the queue.]

A queue is a type of list where the first item to be added is the first item to be removed (FIFO). In the programming sense, a queue has two operations: add a new item to the rear of the queue, or remove an item from the front of the queue. Some uses of queues in a computing context are these:
- print jobs waiting to be printed
- characters entered at the keyboard and held in a buffer
- jobs waiting to be executed under a batch operating system
- simulations

Array implementation
Below is an example of using an array to implement a queue. In this example the items stay static and therefore will always remain at the same index.

Index   Data field
0       Fred
1       Jack
2       Matt
3
4
5

Front = 0, Rear = 2

Index   Data field
0       Fred
1       Jack
2       Matt
3       Joe
4       Harry
5       Simon

Front = 3, Rear = 5

Adding Joe, Harry and Simon to the queue, and removing Fred, Jack and Matt from the queue, we now have a problem since we have reached the end of the memory locations reserved for this queue.

Shuffle queue
In a shuffle queue, once an item leaves the queue, all the remaining items are moved along to the next position. A front pointer is not necessary since the item at the front of the queue will always be the item with the lowest index (in this case, 0).

Index   Data field
0       Fred
1       Jack
2       Matt
3
4
5

Rear = 2

Index   Data field
0       Joe
1       Harry
2       Simon
3
4
5

Rear = 2

Circular queue
In a circular queue, vacated entries may be reused. This means that when an index is no longer reachable by chaining through the queue, we should delete the item that was in that position, i.e. when a person leaves the queue, the space they occupied becomes free. The example below shows Jack rejoining the queue.

Index   Data field
0       Fred
1       Jack
2       Matt
3
4
5

Front = 0, Rear = 2

Index   Data field
0       Jack
1
2
3       Joe
4       Harry
5       Simon

Front = 3, Rear = 0
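A circular queue over a fixed-size array can be sketched in Python. The class name, the size counter and the use of None for a vacated slot are choices made for this sketch, not details taken from the guide.

```python
class CircularQueue:
    def __init__(self, capacity):
        self.items = [None] * capacity
        self.front = 0
        self.rear = -1  # no item has joined yet
        self.size = 0

    def enqueue(self, item):
        """Add an item at the rear, wrapping round to index 0 if necessary."""
        if self.size == len(self.items):
            raise OverflowError("queue is full")
        self.rear = (self.rear + 1) % len(self.items)
        self.items[self.rear] = item
        self.size += 1

    def dequeue(self):
        """Remove the item at the front and free its slot for reuse."""
        if self.size == 0:
            raise IndexError("queue is empty")
        item = self.items[self.front]
        self.items[self.front] = None
        self.front = (self.front + 1) % len(self.items)
        self.size -= 1
        return item


queue = CircularQueue(6)
for name in ["Fred", "Jack", "Matt", "Joe", "Harry", "Simon"]:
    queue.enqueue(name)
for _ in range(3):
    queue.dequeue()      # Fred, Jack and Matt leave
queue.enqueue("Jack")    # Jack rejoins in the vacated slot at index 0
print(queue.items, queue.front, queue.rear)
```

The modulo operation is what makes the queue circular: after the largest index has been used, the rear pointer wraps back to the start of the array.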

Linked list implementation


In the linked list implementation of a linear queue, each item in the queue points to the item after it. We therefore need two extra pointers: one to point to the front of the queue and another to point to the rear.

[Figure: linked list Fred, Jack, Matt, with the Front pointer at Fred and the Rear pointer at Matt.]

If we add an item to the queue, we simply add a pointer from Matt to the new item and move the rear pointer to point to this new item. Similarly, if we want to remove Fred from the queue, we just need to move the front pointer to point to Jack.

Priority queues
Priority queues effectively take the first-in first-out principle of a queue and adjust it so that every element in the queue has an associated priority. The element in the queue with the highest priority will be the first to leave the queue therefore we essentially insert an element in the queue where they fit in terms of their given priority. Priority queues are especially used in simulations.
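Inserting each element where it fits in terms of priority can be sketched in Python with an ordered list. The class and method names are illustrative, and the rule that equal priorities leave in arrival order is an assumption of this sketch.

```python
class PriorityQueue:
    def __init__(self):
        # (priority, item) pairs, kept with the highest priority at the front.
        self._items = []

    def join(self, item, priority):
        """Insert behind every element of equal or higher priority."""
        position = 0
        while (position < len(self._items)
               and self._items[position][0] >= priority):
            position += 1
        self._items.insert(position, (priority, item))

    def leave(self):
        """Remove and return the element at the front of the queue."""
        return self._items.pop(0)[1]


queue = PriorityQueue()
queue.join("A", 1)
queue.join("B", 3)
queue.join("C", 1)
queue.join("D", 2)
order = [queue.leave() for _ in range(4)]
print(order)
```

Leaving is still always from the front; all of the priority handling happens at the moment an element joins.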

3.2.5 Stacks
Stack - a last-in first-out (LIFO) abstract data type.

There are two operations which can be performed on a stack. These are adding a new item to the top of the stack (pushing) and removing an item from the top of the stack (popping). Some uses in a computing context are:
- Stacks are used to store the return address, parameters and register contents when a procedure or function call is made. When the procedure or function completes execution, the return address and other data are retrieved from the stack.
- Stacks are used for evaluating expressions in Reverse Polish Notation.

As with queues, there are two main ways of representing stacks.

Array implementation

Index   Data field
0       Fred
1       Jack
2       Matt
3
4
5

TopOfStack = 2

Index   Data field
0       Fred
1       Harry
2       Joe
3
4
5

TopOfStack = 2

In this instance, Fred, Jack and Matt are added to the stack in that order, as shown in the first table. While this stack goes downwards, stacks can also go upwards; take heed of the order of the index numbers. If Matt and Jack are popped from the stack and Harry and Joe are pushed onto the stack, the stack will appear as shown in the second table. Since items are pushed onto and popped from the top of the stack, we only need one pointer.
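The array implementation with a single top-of-stack pointer can be sketched in Python. The class name and the use of -1 for an empty stack are choices made for this sketch.

```python
class ArrayStack:
    def __init__(self, capacity):
        self.items = [None] * capacity
        self.top_of_stack = -1  # -1 means the stack is empty

    def push(self, item):
        """Place an item on top of the stack."""
        if self.top_of_stack + 1 == len(self.items):
            raise OverflowError("stack is full")
        self.top_of_stack += 1
        self.items[self.top_of_stack] = item

    def pop(self):
        """Remove and return the item on top of the stack."""
        if self.top_of_stack == -1:
            raise IndexError("stack is empty")
        item = self.items[self.top_of_stack]
        self.items[self.top_of_stack] = None
        self.top_of_stack -= 1
        return item


stack = ArrayStack(6)
for name in ["Fred", "Jack", "Matt"]:
    stack.push(name)
popped = [stack.pop(), stack.pop()]  # Matt, then Jack
stack.push("Harry")
stack.push("Joe")
print(stack.items, stack.top_of_stack, popped)
```

Both operations touch only the slot indicated by the pointer, which is why one pointer is enough.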

Linked list implementation


[Figure: linked list Fred, Jack, Matt, Joe, with the Start pointer at Fred, the top of the stack.]

In the linked list implementation of a stack, each item in the stack points to the item below it. If a new item is added to the top of the stack, the new item will point to Fred and the start pointer will point to the new item. Removing an item from the stack simply means moving the start pointer to point to the next highest item on the stack.

3.2.6 Hashing

Hashing - the process of applying a hash function to a key to generate a hash value.
Hash key - the key that the hash function is applied to.
Hash function - a function H, applied to a key k, which generates a hash value H(k) of range smaller than the domain of values of k.
Hash value - the value generated by the application of a hash function to the key.
Hash table - a table with a column dedicated to the range of hash values that can be generated by applying a hash function to a key. If the hash value

The premise
In databases there may be a table with thousands or even millions of records. This makes searching for a specific record time-consuming, especially since a linear search may have to look through every record just to find one. A hash table makes reading, writing and deleting records a much quicker process.

Hashing is also used for storing passwords in databases, since a suitable hash function cannot feasibly be reversed. The user's password can be verified by passing it through the hash function and comparing the result with the stored hash value, but even if someone could access the database table containing the hashes, they would not be able to recover the passwords needed to access the accounts.

By the pigeonhole principle, collisions can only be ruled out entirely if there are at least as many possible hash values as there are hash keys, so a good hash function spreads the keys over as many distinct hash values as possible.

Collisions

Collision - when two or more different keys hash to the same hash value.
Open hashing - a method in which a collision is resolved by storing the record in the "next available" location.
Closed hashing - all other locations in the table are closed off, therefore a pointer column is added and a linked list of records with the same hash value is created.
Rehashing - when the initial hash results in a collision, the hash value of the key is rehashed to generate a new hash value.
Linear rehashing - the original hash value is incremented by 1 modulo N, 2 modulo N, etc. until an empty slot is found in a table of size N rows.

As explained by the pigeonhole principle, if there are N + 1 items to be inserted into a table of N rows, then at least one row will have to contain two items. In hashing terms, hashing two different keys to the same hash value is called a collision. There are two main methods of dealing with collisions.

1. In open hashing, the hash value is rehashed so that the record is placed in the next available location. This is especially advantageous if the table has a large number of rows, so that collisions are infrequent. When searching, if the hash value is found in the table yet the record is not the desired one, rehashing is used to look for where the desired item could be. Only if an empty row is reached, or every row has been checked, without finding the item can we conclusively say that it is not in the hash table. If an item is deleted, a special marker must be put in its place in order to prevent a later search stopping prematurely.
2. In closed hashing, collisions are expected to be fairly common occurrences. A pointer column is introduced in the table and if a collision occurs between two hash keys, a linked list is created. When searching, if the hash value is found but the first item is not the desired one, the search will follow through the linked list until it finds the item. If the end of the linked list is reached and the item has not been found, we can conclusively say that the item is not in the hash table. If an item is deleted from a linked list there are two options: either the deleted item in the chain is replaced by a special marker or each following item is moved up.

A worked example
Give the contents of the hash table that results when you insert items with the keys C O M P U T I N G in that order into an initially empty table of M = 5 rows, using closed hashing. Use the hash function k * 11 Mod M to transform the kth letter of the alphabet into a table index (row number), e.g. hash(B) = hash(2) = 22 Mod 5 = 2.

Character   k     k * 11 Mod 5
C           3     33 Mod 5 = 3
O           15    165 Mod 5 = 0
M           13    143 Mod 5 = 3
P           16    176 Mod 5 = 1
U           21    231 Mod 5 = 1
T           20    220 Mod 5 = 0
I           9     99 Mod 5 = 4
N           14    154 Mod 5 = 4
G           7     77 Mod 5 = 2

Index   Character   Pointer
0       O           T
1       P           U
2       G
3       C           M
4       I           N

If we have a larger table, we can represent the same information using open hashing. We have nine letters to enter into our hash table so we should let the hash table have nine rows, keeping the same hash function.

Index       0   1   2   3   4   5   6   7   8
Character   O   P   U   C   M   T   I   N   G

Here, we have used linear rehashing, i.e. we keep incrementing the hash value until the row belonging to that index is empty. Since there are many collisions in this example, a lot of the characters end up at an index that is neither their actual hash value nor close to it.
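Both versions of the worked example can be reproduced with a short Python sketch. The function names are hypothetical; the hash function is the k * 11 Mod 5 rule from the question, and the second function assumes linear rehashing in a nine-row table. Note that other textbooks often call the pointer-based method "separate chaining" and the next-available-slot method "open addressing".

```python
def hash_letter(letter, m=5):
    """k * 11 mod M, where k is the letter's position in the alphabet."""
    k = ord(letter) - ord("A") + 1
    return (k * 11) % m


def insert_chained(keys, m=5):
    """What this guide calls closed hashing: one list of keys per row."""
    table = [[] for _ in range(m)]
    for key in keys:
        table[hash_letter(key, m)].append(key)
    return table


def insert_linear(keys, rows=9, m=5):
    """What this guide calls open hashing, using linear rehashing."""
    table = [None] * rows
    for key in keys:
        index = hash_letter(key, m)
        while table[index] is not None:     # collision: try the next row
            index = (index + 1) % rows
        table[index] = key
    return table


chained = insert_chained("COMPUTING")
linear = insert_linear("COMPUTING")
print(chained)
print(linear)
```

The sketch assumes the table is never asked to hold more keys than it has rows; a fuller version would need to detect a full table before probing.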

3.2.7 Graphs and trees


Graph - a diagram consisting of circles, called vertices, joined by lines, called edges or arcs; each edge joins exactly two vertices.
Neighbours - two vertices are neighbours if they are connected by an edge.
Degree (of a vertex) - the number of neighbours for that vertex.
Labelled or weighted graph - a graph in which the edges are labelled or given a value called a weight.
Automation - turning an abstraction into a form that can be processed by a computer.
Directed graph or digraph - a diagram consisting of circles, called vertices, joined by directed lines, called edges.
Simple graph - an undirected graph without multiple edges and in which each edge connects two different vertices.
Closed path or cycle - a sequence of edges that starts and ends at the same vertex and such that any two successive edges in the sequence share a vertex.

[Figure: a simple graph (left) and a directed form of the same graph (right) on the vertices A to F]

Above are two diagrams which represent the same graph. The graph on the right is a directed graph (digraph) form of the simple graph on the left. A simple graph cannot contain loops, since a loop is an edge that does not connect two different vertices. If we look again at the above diagrams, we can form a closed path or cycle by travelling along the path C-D-F-E-C: we visit different vertices in sequence and return to the vertex we began at. In computing, a graph often represents an abstraction of a problem. For example, a company may have different business plans for generating profit and may want to discover which route would give them the most profit year on year. The London Underground map is a typical example of abstraction since it keeps only the important details and does not stay true to the actual geography of the stations.

Data representation of a graph


A graph with multiple edges can be represented using an adjacency matrix or adjacency list.


Adjacency matrix
An adjacency matrix of size n by n for a graph with n vertices stores whether or not two vertices are directly connected. We can use 0s and 1s to represent this information, 1 meaning that two vertices are neighbours and 0 meaning that they are not.

      1   2   3   4   5
1     0   1   1   1   0
2     1   0   0   1   0
3     1   0   0   1   1
4     1   1   1   0   1
5     0   0   1   1   0

For an undirected graph there will always be a symmetrical pattern as shown in the above matrix, since a_ij = a_ji, where a_ij is the cell in row i and column j of the adjacency matrix and i and j are two distinct vertices. Notice that the matrix tells us that vertex 1 is not adjacent to vertex 1.

      1   2   3   4   5
1     0   1   1   0   0
2     0   0   0   1   0
3     0   0   0   1   1
4     1   0   0   0   1
5     0   0   0   0   0

When the graph becomes directed, the matrix is no longer necessarily symmetrical. If we read off a_31, we have a value of 0, meaning that we cannot travel directly from vertex 3 to vertex 1. However, reading off a_13 tells us that we can travel from vertex 1 to vertex 3. Therefore a_ij ≠ a_ji in general. You should fill in the matrix so that row i, column j represents whether it is possible to travel directly from vertex i to vertex j.
      1    2    3    4    5
1     ∞    19   20   ∞    ∞
2     ∞    ∞    ∞    23   ∞
3     ∞    ∞    ∞    9    0
4     11   ∞    ∞    ∞    12
5     ∞    ∞    ∞    ∞    ∞

For a weighted or labelled graph, we can no longer use 0 in our adjacency matrix to mean "not connected", since 0 could easily be a valid distance between two vertices. For that reason, we may use the infinity symbol (∞) instead. An adjacency matrix for a labelled graph may be called a distance matrix.


Adjacency list
An adjacency list specifies which vertices are connected in a different way to an adjacency matrix: for each vertex it lists only the vertices adjacent to it.

Vertex   Adjacent vertices
1        2, 3, 4
2        1, 4
3        1, 4, 5
4        1, 2, 3, 5
5        3, 4

Similarly, if we have a directed graph we fill in the list from the view of the vertex in the vertex column, i.e. which vertices can we go to from that node?
Vertex   Adjacent vertices
1        2, 3
2        4
3        4, 5
4        1, 5
5

If we want to fill in the table with distances the following format can be used (this time we do not need to use the infinity symbol because we only include adjacent vertices):
Vertex   Adjacent vertices (vertex, distance)
1        2, 19; 3, 20
2        4, 23
3        4, 9; 5, 0
4        1, 11; 5, 12
5

Matrix or list?
- Adjacency matrix
  o If many vertex pairs are connected by edges, then the adjacency matrix doesn't waste much space and it indicates whether an edge exists with one access (rather than following a list).
- Adjacency list
  o If the graph is sparse, so not many of its vertex pairs have edges between them, the adjacency list is preferable.
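The two representations can be compared in a few lines of Python. This sketch uses the undirected five-vertex example from this section; the helper names `matrix_to_list` and `is_symmetric` are hypothetical.

```python
# Adjacency matrix for the undirected example (vertices numbered 1 to 5).
matrix = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 0, 1],
    [0, 0, 1, 1, 0],
]


def matrix_to_list(matrix):
    """Build an adjacency list (a dictionary) from an adjacency matrix."""
    return {
        i + 1: [j + 1 for j, cell in enumerate(row) if cell == 1]
        for i, row in enumerate(matrix)
    }


def is_symmetric(matrix):
    """True when a_ij == a_ji for every pair, as for any undirected graph."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


adjacency = matrix_to_list(matrix)
print(adjacency)
print(is_symmetric(matrix))
```

The matrix always stores n * n cells, whereas the list stores one entry per edge end, which is why the list suits sparse graphs.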

Trees
If an undirected connected graph has no cycles then we can identify it as being a tree. In a tree, there is just one path between each pair of vertices.

[Figure: a tree (left) and the same tree drawn as a rooted tree (right), with the root node, an internal node and a leaf node labelled]

If a tree has a designated root, with every edge being directed away from this root, it is called a rooted tree.

Algorithms
Graphs can be used to represent mazes by placing a node on each decision point of the maze, i.e. each place where a path splits into two or more paths, as well as at the start and end positions. We can then assign three Boolean flags to each node. These flags are:
- Undiscovered - is the node yet to be found?
- Discovered - has the node been found yet?
- Processed or completely explored - have we visited all the incident edges of the node?

[Figure: a maze and the graph abstracted from it, with nodes labelled A to P]

In this abstraction, all the dead ends of the maze have been included to ensure that each path of the maze can be fully explored as it might be in real life. Since the graph that we have abstracted from our maze has no cycles, it is a tree therefore we can transform it into a rooted tree by choosing A as our root node. On a rooted tree we can apply both breadth-first and depth-first searching algorithms in order to fully explore the whole of the maze.


[Figure: the rooted tree for the maze drawn twice, with the nodes numbered in the order in which they are visited by a breadth-first search (left) and a depth-first search (right)]

In breadth-first searching we explore all the undiscovered neighbours of our root, then all the undiscovered neighbours of these nodes, and so on until all the nodes are fully explored. In depth-first searching we keep following the leftmost unexplored branch until we find a leaf, then we backtrack to that leaf's parent and explore the next branch to the right.
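The two search orders can be sketched in Python as below. The small tree used here is hypothetical (it is not the maze tree from the figure), and the function names are chosen only for illustration.

```python
from collections import deque

# Hypothetical rooted tree: each node maps to its children, listed left to right.
tree = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [], "E": [], "F": [],
}


def breadth_first(tree, root):
    """Visit the root, then its children, then their children, level by level."""
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])    # newly discovered nodes wait their turn
    return order


def depth_first(tree, root):
    """Follow the leftmost branch down to a leaf before backtracking."""
    order = [root]
    for child in tree[root]:
        order.extend(depth_first(tree, child))
    return order


bfs_order = breadth_first(tree, "A")
dfs_order = depth_first(tree, "A")
print(bfs_order)
print(dfs_order)
```

Because the input is a tree, no node can be reached twice, so the sketch does not need the discovered/processed flags; a general graph with cycles would.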

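[Figure: a binary tree with root +, whose left subtree has root * with children A and C, and whose right subtree has root / with children E and G]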

Each node in a binary tree is the root node of a sub-tree in its own right. If we consider a branch connecting "A" and another node then "A" will be the root of its own sub-tree with that node. Whether this node is on the left or right of "A", we will still need to explore this sub-tree as if there were a node on the other side. There are three different ways of traversing a binary tree (a tree where each node has at most two children).

Pre-order traversal
1. Visit the root
2. Traverse the left sub-tree in pre-order
3. Traverse the right sub-tree in pre-order
Our recursive definition tells us that we should always pick up the value of the root of a sub-tree before we explore that sub-tree. We then always go to the left first.
Result: +*AC/EG

Post-order traversal
1. Traverse the left sub-tree in post-order
2. Traverse the right sub-tree in post-order
3. Visit the root
This is effectively the same as pre-order but with the visit to the root flipped to the end. But remember, the left is always visited before the right.
Result: AC*EG/+

In-order traversal
1. Traverse the left sub-tree in in-order
2. Visit the root
3. Traverse the right sub-tree in in-order
Result: A*C+E/G

For in-order traversal, the values from the leaves are always separated by the values from the internal nodes.
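The three recursive definitions translate almost directly into code. The sketch below builds the example tree (root +, left sub-tree * with A and C, right sub-tree / with E and G); the class and function names are hypothetical.

```python
class Node:
    def __init__(self, item, left=None, right=None):
        self.item = item
        self.left = left
        self.right = right


def pre_order(node):
    if node is None:
        return ""
    return node.item + pre_order(node.left) + pre_order(node.right)


def in_order(node):
    if node is None:
        return ""
    return in_order(node.left) + node.item + in_order(node.right)


def post_order(node):
    if node is None:
        return ""
    return post_order(node.left) + post_order(node.right) + node.item


tree = Node("+",
            Node("*", Node("A"), Node("C")),
            Node("/", Node("E"), Node("G")))
print(pre_order(tree), in_order(tree), post_order(tree))
```

The only difference between the three functions is where the root's item is placed relative to the two recursive calls.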

Storing binary trees


Given that any node in a binary tree has at most a left child and a right child connected to it, we can use a standardised data type to represent each node in a binary tree. All we need to achieve this is to write a class definition like so (written in Python):

    class NodeType:
        def __init__(self, left, item, right):
            self.LeftPointer = left
            self.Item = item
            self.RightPointer = right

We can then create a list of objects (or indeed a list of records) where each node's position in the list is the index that the pointers of other nodes use to refer to it.


[Figure: a seven-node binary tree in which each node is drawn as a box showing its Index, LeftPointer, Item and RightPointer; the leaves hold null pointers (0)]

Above is an abstraction of a tree to show the information that we have to represent and store. You will notice that each leaf has a left and right pointer with a value of 0. This is a null pointer since there is nothing to the left or right of these leaves.

Root node pointer: 1

Index   LeftPointer   Item   RightPointer
1       2             *      3
2       4             +      5
3       6             -      7
4       0             z      0
5       0             6      0
6       0             y      0
7       0             3      0

Using this method of storing the tree, it is then possible to apply any of the aforementioned algorithms in order to traverse the tree.
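As a sketch of how a traversal works on the stored form, the table can be held as a Python list of (LeftPointer, Item, RightPointer) records. Leaving position 0 of the list unused, so that 0 can serve as the null pointer, is an assumption made for this example.

```python
# Position 0 is left unused so that a pointer value of 0 can mean "null".
nodes = [
    None,
    (2, "*", 3),
    (4, "+", 5),
    (6, "-", 7),
    (0, "z", 0),
    (0, "6", 0),
    (0, "y", 0),
    (0, "3", 0),
]
root_pointer = 1


def in_order(index):
    """In-order traversal that follows stored pointers rather than object links."""
    if index == 0:              # null pointer: nothing to visit
        return []
    left, item, right = nodes[index]
    return in_order(left) + [item] + in_order(right)


items = in_order(root_pointer)
print(" ".join(items))
```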

3.2.8 Searching and sorting

Linear search - this search method starts at the beginning of the list and compares each element in turn with the required value until a match is found or the end of the list is reached.
Bubble sort - during a pass through the list, neighbouring values are compared and swapped if they are not in the correct order. Several passes are made until one pass does not require any further swaps.

Bubble sort
Bubble sort was covered in the AS specification but it is important to remember exactly how it works and to recognise the complexity of this sorting algorithm. First, we should consider a list which has been sorted into reverse alphabetical order. This was actually a mistake and we would now like to take this sorted list and re-sort it into alphabetical order. This is an example of the worst case scenario for the bubble sort.


Start         Zebra  Yak    Tiger  Snake  Sloth  Mouse  Lion   Cow    Bird   Ant

Pass (swaps)
1 (9)         Yak    Tiger  Snake  Sloth  Mouse  Lion   Cow    Bird   Ant    Zebra
2 (8)         Tiger  Snake  Sloth  Mouse  Lion   Cow    Bird   Ant    Yak    Zebra
3 (7)         Snake  Sloth  Mouse  Lion   Cow    Bird   Ant    Tiger  Yak    Zebra
4 (6)         Sloth  Mouse  Lion   Cow    Bird   Ant    Snake  Tiger  Yak    Zebra
5 (5)         Mouse  Lion   Cow    Bird   Ant    Sloth  Snake  Tiger  Yak    Zebra
6 (4)         Lion   Cow    Bird   Ant    Mouse  Sloth  Snake  Tiger  Yak    Zebra
7 (3)         Cow    Bird   Ant    Lion   Mouse  Sloth  Snake  Tiger  Yak    Zebra
8 (2)         Bird   Ant    Cow    Lion   Mouse  Sloth  Snake  Tiger  Yak    Zebra
9 (1)         Ant    Bird   Cow    Lion   Mouse  Sloth  Snake  Tiger  Yak    Zebra
10 (0)        Ant    Bird   Cow    Lion   Mouse  Sloth  Snake  Tiger  Yak    Zebra

Here we have the bubble sort results for each pass of the algorithm. Since in each pass the first item is higher in alphabetical value than all the other unsorted items, it features in every swap of that pass. The passes here need 9 + 8 + ... + 1 = 45 swaps in total, which is n(n - 1)/2 for n = 10. More generally, since we can have up to n - 1 swaps in each of up to n - 1 passes (the last pass just confirms that there are no more swaps), the amount of work grows with n², so we can say that the bubble sort is O(n²).
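The swap counts in the table can be checked with a short Python sketch. The function name and the decision to return the number of swaps made in each pass are choices made for this illustration.

```python
def bubble_sort(items):
    """Return the sorted list and the number of swaps made in each pass."""
    items = list(items)             # work on a copy
    swaps_per_pass = []
    while True:
        swaps = 0
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swaps += 1
        swaps_per_pass.append(swaps)
        if swaps == 0:              # a pass with no swaps means we are done
            break
    return items, swaps_per_pass


animals = ["Zebra", "Yak", "Tiger", "Snake", "Sloth",
           "Mouse", "Lion", "Cow", "Bird", "Ant"]
sorted_animals, swaps_per_pass = bubble_sort(animals)
print(sorted_animals)
print(swaps_per_pass, sum(swaps_per_pass))
```

This simple version compares every neighbouring pair in every pass; a common refinement is to stop each pass one item earlier than the last, since the end of the list is already in place.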

Searching
Linear search is the most straightforward of search algorithms. Given a list of length n, each item will be looked at from the start of the list until the desired item is either found, or the end of the list is reached. In the worst case scenario, we may have to look at all n items therefore the algorithm is of O(n). If we have a sorted list we can use the more efficient binary search. In binary search, we look at the middle term of the list and compare it with the item that we are looking for. We then reject one half of the list based on whether the item we are looking for would be higher or lower in the list. Below we are searching for Dave in the list.


List: 1 Ant, 2 Bird, 3 Cow, 4 Lion, 5 Mouse, 6 Sloth, 7 Snake, 8 Tiger, 9 Yak, 10 Zebra

1. Items 1-10: (10 + 1)/2 = 5.5, therefore we should look at the 6th item in the list. Sloth > Dave so we reject items 6-10.
2. Items 1-5 (Ant, Bird, Cow, Lion, Mouse): (1 + 5)/2 = 3, therefore we should look at the 3rd item. Cow < Dave so we reject items 1-3.
3. Items 4-5 (Lion, Mouse): (4 + 5)/2 = 4.5, therefore we should look at the 5th item. Mouse > Dave so we reject item 5.
4. Only item 4 (Lion) remains. However, Lion ≠ Dave, therefore Dave is not in the list.

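This is an example of the worst case scenario for a binary search since the item is in fact not in the list. We have made 4 comparisons for a 10-item list: each comparison roughly halves the number of items left to consider, so about log₂ n comparisons are needed at most (log₂ 10 ≈ 3.3, rounded up to 4). This makes the order of complexity for the binary search O(log₂ n).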

Insertion sort
For an insertion sort algorithm we divide a list into a sorted part and an unsorted part. We take the first item in the list that we want to sort and insert it into our sorted part. We then compare each item in the unsorted part to the items in the sorted part and insert them where they fit, rearranging the other items as appropriate.

Position              1      2      3      4      5      6      7
Start (Snake sorted)  Snake  Yak    Lion   Zebra  Tiger  Ant    Sloth
After inserting Yak   Snake  Yak    Lion   Zebra  Tiger  Ant    Sloth
After inserting Lion  Lion   Snake  Yak    Zebra  Tiger  Ant    Sloth
After inserting Zebra Lion   Snake  Yak    Zebra  Tiger  Ant    Sloth
After inserting Tiger Lion   Snake  Tiger  Yak    Zebra  Ant    Sloth
After inserting Ant   Ant    Lion   Snake  Tiger  Yak    Zebra  Sloth
After inserting Sloth Ant    Lion   Sloth  Snake  Tiger  Yak    Zebra
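The successive states of the list can be generated with a short Python sketch of insertion sort. Recording a copy of the list after each insertion is an addition made here so the output can be compared with the example.

```python
def insertion_sort(items):
    """Return the list of states: the start state plus one state per insertion."""
    items = list(items)
    states = [list(items)]
    for i in range(1, len(items)):
        current = items[i]          # first item of the unsorted part
        j = i - 1
        # Shift larger sorted items one place right to make room.
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current
        states.append(list(items))
    return states


animals = ["Snake", "Yak", "Lion", "Zebra", "Tiger", "Ant", "Sloth"]
states = insertion_sort(animals)
for state in states:
    print(state)
```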

Quicksort
In a quicksort we split a list into two sub-lists with a "pivot" acting as the value that all items within that sub-list are compared to. By continually splitting the list in two into smaller sub-lists and selecting the middle value as the pivot, we are able to sort an entire list. 54 23 15 74 19 22 14 3 11 64 27 35

Above is a list of 12 items. We choose the 7th item of the list (14) as our pivot and compare each of the other values with it. We want to sort this list into ascending numerical order so we place all the numbers of lesser value to the left of the pivot and all the numbers of greater value to the right. Make sure to keep the values on each side in their original relative order.


3   11   14   54   23   15   74   19   22   64   27   35

We then select new pivots in the two sub-lists that have been created by splitting about our initial pivot.

3   11   14   15   19   54   23   74   22   64   27   35
3   11   14   15   19   22   54   23   74   64   27   35
3   11   14   15   19   22   54   23   27   35   64   74
3   11   14   15   19   22   23   27   54   35   64   74
3   11   14   15   19   22   23   27   35   54   64   74
3   11   14   15   19   22   23   27   35   54   64   74

Only once all the items have themselves been pivots can we conclusively say that the list has been sorted into the correct order. The quicksort algorithm is an example of recursive programming at its best. However, its worst case scenario is a complexity of O(n²). This makes it seem as if quicksort is only as efficient as bubble sort. Nevertheless, quicksort has a much better average case complexity of O(n log n).
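A recursive Python sketch of this version of quicksort is shown below. It assumes the pivot rule implied by the example (the item at 0-based position len // 2, i.e. the 7th of 12 items) and keeps the items on each side in their original order. It builds new lists rather than sorting in place, which is a simplification made for clarity.

```python
def partition(items):
    """Split items about the middle value, preserving the order on each side."""
    pivot_index = len(items) // 2
    pivot = items[pivot_index]
    rest = items[:pivot_index] + items[pivot_index + 1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return smaller, pivot, larger


def quicksort(items):
    if len(items) <= 1:             # a list of 0 or 1 items is already sorted
        return list(items)
    smaller, pivot, larger = partition(items)
    return quicksort(smaller) + [pivot] + quicksort(larger)


data = [54, 23, 15, 74, 19, 22, 14, 3, 11, 64, 27, 35]
first_split = partition(data)
result = quicksort(data)
print(first_split)
print(result)
```

The worst case arises when the chosen pivot is repeatedly the smallest or largest value, so that one sub-list is empty at every level of the recursion.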

3.2.9 Simulations

Model - an abstraction of an entity in the real world or in the problem that enables an automated solution. The abstraction is a representation of the problem that leaves out unnecessary detail.
State history - consists of state descriptions at each of a chronological succession of instants.
Entities - the components that make up a system.
Attributes - a property of an object, e.g. an object car has attributes make,

Simulation is the imitation of a process of a real system. An example of the purpose of a simulation comes in the form of trying to understand the effect of a new supermarket being built, which means a reconfiguration of the current road. Although many of the proposed designs will never be realised, simulating them shows what the effects would be if they were to be carried out.

Consider a queue. A queue consists of three entities:
- Customer
- Queue
- Server

The customer could be waiting in the queue, which is therefore one of its attributes. The state of a system at any instant is determined by where the entities are, what they are doing and their attributes. A succession of recorded new identifiable states results in a state history.

Some possible states of a queue are:
- nobody in the queue, server waiting
- nobody in the queue, a customer being served
- customers in the queue, a customer being served.

Some possible events in this system are:
- a customer arriving
- end of serving.

Some possible activities are:
- the serving of a customer
- the time between customers arriving.

If we consider a queue system where a customer arrives at the end of every 3 minutes and serving takes 2 minutes, running a simulation for 20 minutes using time-driven simulation will give the following hand simulation.

Master clock   Customer arriving   Customer being served   Customers in queue   Server status
1                                                          0                    Idle
2                                                          0                    Idle
3              Customer1                                   0                    Idle
4                                  Customer1               0                    Serving
5                                  Customer1               0                    Serving
6              Customer2           Customer1               1                    Serving
7                                  Customer1               1                    Serving
8                                  Customer2               0                    Serving
9              Customer3           Customer2               1                    Serving
10                                 Customer2               1                    Serving
11                                 Customer2               1                    Serving
12             Customer4           Customer3               1                    Serving
13                                 Customer3               1                    Serving
14                                 Customer3               1                    Serving
15             Customer5           Customer3               2                    Serving
16                                 Customer4               1                    Serving
17                                 Customer4               1                    Serving
18             Customer6           Customer4               2                    Serving
19                                 Customer4               2                    Serving
20                                 Customer5               1                    Serving

Minutes in queue for all customers: 0 0 0 0 1 1 1 2 3 3 4 5 6 7 8 9 11 12

Each state represents the end of a minute. Since a customer only arrives at the end of every three minutes, the first three ticks of the master clock see no activity besides the server being at an idle state.
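A time-driven simulation of this queue can be sketched in Python as below. The function and variable names are hypothetical. The default `service_time` of 4 is an assumption chosen because the "Customer being served" column of the hand simulation lists each customer for four consecutive ticks; pass a different value to model a different serving time. The sketch also assumes that a customer who arrives while the server is idle goes straight to the server on the next tick without being counted in the queue, which is how the hand simulation treats Customer1.

```python
def simulate(ticks=20, arrival_interval=3, service_time=4):
    """Return one (clock, arriving, being_served, queue_length) tuple per tick."""
    waiting = []            # customer numbers currently in the queue
    serving = None          # customer served during the current minute
    minutes_left = 0        # minutes of service still owed to `serving`
    next_up = None          # customer who arrived while the server was idle
    next_customer = 1
    history = []
    for clock in range(1, ticks + 1):
        # Start of the minute: a free server picks up the next customer.
        if minutes_left == 0:
            if next_up is not None:
                serving, next_up = next_up, None
                minutes_left = service_time
            elif waiting:
                serving = waiting.pop(0)
                minutes_left = service_time
            else:
                serving = None
        if serving is not None:
            minutes_left -= 1
        # End of the minute: a customer may arrive.
        arriving = None
        if clock % arrival_interval == 0:
            arriving = next_customer
            next_customer += 1
            if serving is None:
                next_up = arriving          # server idle: no need to queue
            else:
                waiting.append(arriving)
        history.append((clock, arriving, serving, len(waiting)))
    return history


history = simulate()
for row in history:
    print(row)
```

Each tuple corresponds to one row of the hand simulation, recorded at the end of that minute.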

3.3 REAL NUMBERS


3.3.1 Real numbers


Real number - a number with a fractional part.
Significant digits - those digits that carry meaning contributing to the accuracy of a number. This includes all digits except leading and trailing zeroes where they serve merely as placeholders to indicate the scale of the number.
Floating-point notation - a real number represented by a sign, some significant digits, and a power of 2.
Precision - the maximum number of significant digits that can be represented.
Absolute error - the difference between the actual number and the nearest representable number.
Relative error - the absolute error divided by the actual number.
Underflow - the value is too small to be represented using the available number of bits.
Overflow - the value is too large to be represented using the available number of bits.

In the AS specification, we dealt only with fixed point numbers. This meant that there was a fixed way of representing a real number, with the binary point after a certain number of bits. In floating-point notation, real numbers are represented in the following way: a sign, some significant digits expressed as a number with a fractional part, and an integer power of two. Some examples are: 4.6 x 2^6, -3.12 x 2^5, 6.2 x 2^-3. This gives us a general form of m x 2^e where the significant digits are called the mantissa (m) and the power of 2 is called the exponent (e).

In the exam, you will have to convert from a normalised two's complement floating-point number to its denary real number equivalent and vice versa. In the following format we have a number of size 16 bits with the 10 most significant bits reserved for the mantissa and 6 bits reserved for the exponent.

Mantissa: 0 1 0 0 0 0 0 0 0 0     Exponent: 1 0 0 0 0 0

This represents the smallest positive normalised value. It is normalised since the first two bits are of opposite polarity, i.e. 01 or 10. The mantissa is 0.1 in binary (2^-1) and the exponent value is -32. This therefore makes the value 2^-1 x 2^-32 = 2^-33, which in binary is 0.00000000 00000000 00000000 00000000 1 (32 zeros after the binary point, then a 1).


Worked example
Using the above format of normalised floating point representation, convert -23.375 into binary.

First, assess the number. We need -32, which is equal to -2^5, since 32 is the nearest larger power of 2 to 23. Then 32 - 23.375 = 8.625, so we can break -23.375 down into -32 + 8 + 1/2 + 1/8 = -2^5 + 2^3 + 2^-1 + 2^-3 = 101000.101 in binary.

Mantissa: 1010001010

Now that we've filled in our mantissa, we need to calculate the exponent. The implied binary point of the mantissa (after its first bit) is 5 places to the left of our desired binary point, therefore our exponent should be 5 = 4 + 1 = 000101 in binary.

Result: 1010001010 000101
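A conversion in the other direction, from the 16-bit pattern back to denary, can be sketched in Python. The function names are hypothetical; the format assumed is the one described above (10-bit two's complement mantissa with the binary point after its first bit, followed by a 6-bit two's complement exponent).

```python
def twos_complement(bits):
    """Interpret a string of 0s and 1s as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":                  # leading 1 means the value is negative
        value -= 1 << len(bits)
    return value


def decode(bits, mantissa_bits=10):
    """Convert a floating-point bit string to its denary value."""
    # Dividing by 2^(mantissa_bits - 1) places the binary point after the first bit.
    mantissa = twos_complement(bits[:mantissa_bits]) / 2 ** (mantissa_bits - 1)
    exponent = twos_complement(bits[mantissa_bits:])
    return mantissa * 2 ** exponent


smallest = decode("0100000000100000")
example = decode("1010001010000101")
print(smallest, example)
```

The mantissa 1010001010 decodes to -0.73046875, and multiplying by 2^5 gives -23.375, which is a quick way to check a hand conversion.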

Why normalise?
Maximises precision and accuracy for a given number of bits. Creates a unique representation for every number (allowing equality to be checked more simply). The length of the mantissa increases the precision. The length of the exponent increases the range.

Underflow and overflow


Dividing a very small number by a number greater than 1 may make a value too small for it to be represented by a given number of bits. In this case, it will be stored as zero. This is known as underflow. Multiplying two large numbers together may make a value too large for it to be represented by a given number of bits. This is known as overflow.

3.4 OPERATING SYSTEMS


3.4.1 Role of an operating system

System program - a program that manages the operation of a computer. Operating system - the software that supports a computer's basic functions, such as scheduling tasks, executing applications, and controlling peripherals. Virtual machine - the apparent machine that the operating system presents to the user, achieved by hiding the complexities of the hardware behind layers of operating system software. Application programming interface - a layer of software that allows application programs to call on the services of the operating system.


Computer software can be divided into system programs, which manage the operation of the computer, and application programs, which solve problems for their users. The most fundamental of all the system programs is the operating system. An operating system has these roles:
- Hide the complexities of the hardware from the user.
- Manage the hardware resources to give orderly and controlled allocation of the processors, memories and input/output (I/O) devices among the various programs competing for them, and manage data and storage.

Managing resources
In a general-purpose computer, one purpose of an operating system is to manage the hardware so that a satisfactory performance is achieved. The operating system programs may be classified according to the resources they manage.

Key resource            OS program
Processors              Processor scheduling
Storage                 Memory management
Input/output devices    I/O management
Data                    File management

The following are examples of the functions that an operating system has to be able to carry out:
- Allocating a processor 'time slot' for each program task that is running.
- Managing the priorities for each program task that is running.
- Allocating and keeping track of the memory used for storing programs and data.
- Managing the transfer of data between memory and storage.
- Handling input operations from the user and from other input devices.
- Handling output operations.
- Managing the system security.


Virtual machine

An operating system hides from the user all the details of how the hardware works so that the user is presented with a virtual machine which is easier to use. These details are progressively hidden by placing layers of software on top of the hardware.

[Figure: the layers of the virtual machine, from the user down to the more complex layers: User interface, Application programming interface, I/O management, File management, Memory management, Processor management, Device drivers, Kernel, Hardware]

Application programming interface
A standard application programming interface (API) allows a software developer to write an application on one computer and have a high degree of confidence that it will run on another computer of the same type, even if the other computer has a different specification.

User interface
Command line interface
In a command line interface (CLI), a user responds to a prompt to enter commands by typing a single command word, followed by zero or more parameters on a single line, before pressing the enter key. An example of such a command is ipconfig.

Graphical user interface
A graphical user interface (GUI) is made up of windows. One window has the focus at any moment. GUIs are event-driven, with events being mouse button clicks, key presses or mouse movements. The operating system detects an event and correlates it with the current mouse position and the window currently in focus, in order to select an action to carry out.


3.4.2 Operating system classification

Note: The notes on this section contain a lot of detail, mainly familiarise yourself with the key terms. Interactive operating system - an operating system in which the user and the computer are in direct two-way communication. Real-time operating system - inputs are processed in a timely manner so that the output can affect the source of inputs. Network operating system - a layer of software is added to the operating system of a computer connected to the network. This layer intercepts commands that reference resources elsewhere on the network, e.g. a file server, then redirects in a manner completely transparent to the user. Sandbox - a tightly controlled set of resources for guest programs to run in. Embedded computer system - a dedicated computer system with limited or non-existent user interface and designed to operate largely or completely autonomously from within other machinery. Desktop operating system - an operating system that allows a user to carry out a broad range of general-purpose tasks. Client-server system - a system in which some computers, the clients, request services provided by other computers, the servers. Server operating system - an operating system optimised to provide one or more specialised services to networked clients.

Server operating system


A server operating system is an operating system optimised to provide one or more specialised services to networked clients such as: file storage, domain control, running applications. Since they are specialised, their performance is optimal as little general-purpose processing is needed.

Desktop operating system


- All desktop computers have operating systems which must support a broad range of general-purpose tasks.
- Examples of desktop operating systems are the Windows family by Microsoft, the Macintosh family by Apple and the UNIX/Linux family developed by collaborators.
- They are very sophisticated since they have to deal with many types of hardware and software.
  o Modern PCs have large main memory capacities, multiple processors, huge disk storage capacities, various types of optical disks, and flash memory drives. They have real-time requirements for multimedia applications. They must support a wide range of network protocols. They are written in a layered or modular fashion so that they can be updated easily. If they are found vulnerable to a security threat, then an update can be deployed to counter the threat. They can support sophisticated GUIs.
  o However, their memory footprint is very large and load times can be significant.
- They provide a virtual machine which allows the user to perform tasks more easily than if they had to interact directly with the hardware.
- Desktop operating systems often act as the client operating system in a client-server system.
  o In a client-server system some computers, the clients, request services provided by other computers, the servers.

Embedded computer systems


- Embedded computer systems are those which are embedded in machinery other than PCs, such as cars, telephones, appliances and peripherals for computer systems. Today's motor cars may have 12 or more embedded computer systems.
- These systems
  o have a dedicated purpose
  o have a limited or non-existent user interface
  o are designed to operate completely or largely autonomously within other machinery (e.g. an engine management system)
  o have a limited memory capacity.
- As embedded systems have become more complex and have developed more features, their applications increasingly require an operating system to keep the development time reasonable and to manage multiple tasks or threads that need to meet specific time constraints.
  o A low-level piece of code switches between tasks or threads based on a timer (connected to an interrupt).
  o At this level a system is considered to have an operating system kernel.
- Any code can potentially damage the data of another task. Therefore
  o programs must be carefully designed and tested.
  o access to shared data must be controlled by some synchronisation strategy.
- Operating systems for embedded systems are designed to work with the constraints of
  o limited memory size
  o limited processor performance
- If the system is portable, the operating system must also take account of limited battery life.
- In addition to the core operating system, many of these systems have additional upper-layer software components.
  o These consist of networking protocol stacks such as TCP/IP, FTP, HTTP and HTTPS.
  o They also include storage capabilities such as flash memory management systems.
  o If the embedded device has audio and video capabilities, then the appropriate drivers and codecs will be present in the system.

Device

The way a device works can usually be changed by altering the code of the operating system.
  o This means that the physical circuits of the device need not be rewired when new functionality is required; instead, the code of the operating system can be rewritten.
The OS code should be layered or modular, with clear interfaces between the layers. This makes changing the interface as simple as altering the interface layer or module.
Not all computers have operating systems.
  o The computer in a washing machine is an example. Its input is simple (since all settings are preset), the process to be performed is equally simple, and it is not necessary for the computer to complete more than one task at a time. In this case, including an operating system would add complexity where none is required, which would increase the development and manufacturing costs. The computer would instead run a single firmware program all the time.
Computer-operated devices which have to carry out more than one task benefit from the following things that an operating system allows:
  o The device can multi-task.
  o The device can operate in real time with critical timing constraints observed, if required.
  o The hardware can be changed or upgraded without the need to change application code that runs on the hardware.
  o New applications can be added fairly easily.
  o Changes to basic functionality can be achieved by upgrading operating system code that runs on the hardware.
  o Applications can be developed in situ on the device or can be easily installed if developed on a more powerful machine.
  o The entire OS can be replaced by a different OS where the new OS allows a much greater range of software changes to be made.
  o Open Source operating systems can be used. The source code for an Open Source OS is available, so applications that will work on devices running this OS can be designed easily.
Operating systems for mobile devices need to take account of the resources available to these devices, for example
  o the amount of energy provided
  o the amount of memory available.

Smartphones
A smartphone is a mobile phone that offers advanced capabilities beyond a typical mobile phone, often with PC-like functionality. This means running complete operating system software, which provides a standardised interface and platform for application developers. Regular mobile phones typically only support sandboxed applications.
  o A sandbox is a tightly controlled set of resources for guest programs to run in, such as scratch space on disk and memory.
  o Network access and the ability to inspect the host system or read from input devices are usually disallowed or heavily restricted in sandboxed systems.


Applications for smartphones may be developed by anyone (including the manufacturer of the device, the network operator or any other third-party software developer) since the operating system is open.

Personal digital assistants
A personal digital assistant (PDA) is a hand-held portable computer that can accomplish quite specific tasks and can take on the role of a personal assistant. PDA functionality has since been included in smartphones, so sales of PDAs have declined. The operating system of a PDA takes on the tasks of a basic input/output system (BIOS) and has to be designed to run on processors with a low clock frequency and a main memory of limited capacity. The OS must use various techniques to save energy and must cater for short reaction times.

Real-time operating system


In a real-time operating system inputs are processed in a timely manner so that the output can affect the source of the inputs. Real-time operating systems are characterised by four requirements:
1. They have to support application programs which are non-sequential in nature, i.e. programs which do not have a START-PROCESS-END structure.
2. They have to deal with a number of events which happen in parallel and at unpredictable moments.
3. They have to carry out processing and produce a response within a specific time interval.
4. Some systems are safety-critical, so they must be fail-safe and guarantee a response within a specified time interval.

Examples:
  o Airline reservation system: up to 1000 messages per second can arrive from any one of 11000 to 12000 terminals situated all over the world. The response time must be less than 3 seconds.
  o Process control system: up to 1000 signals per second can arrive from sensors attached to the system being controlled. The response time must be less than 0.001 second.
Real-time operating systems that are used to control machinery typically have limited user-interface capability and no end-user utilities. Whenever signalled to do so, they must perform a task within a specific amount of time. Some RTOS manage the resources of the computer so that a particular operation executes in precisely the same amount of time every time it occurs. In a complex machine, a part moving more quickly just because system resources happen to be available can be as catastrophic as it not moving at all because the system is busy.

Network operating system


In a network operating system, a layer of software is added to the operating system of a computer connected to the network. This layer intercepts commands that reference resources elsewhere on the network, e.g. a file server. The network layer then redirects the request to the remote resource in a manner completely transparent to the user. In this way, files which reside on a server are available to the client computer, exactly as if they resided on that client computer's system. Remote drives (perhaps N: and P:) are usually available to all client computers connected to a network.

Exam note: Operating systems are a good basis for an extended answer question. Consider the differences between certain types of operating systems. A question on the definition of an operating system could equally be a short (2 mark) answer question or a longer (4-6 mark) answer question.


3.5 DATABASES
3.5.1 Conceptual data modelling

Database - a structured collection of data.
Database management system - a software system that enables the definition, creation and maintenance of a database and which provides controlled access to this database.
Data model - a method of describing the data, its structure, the way it is interrelated and the constraints that apply to it for a given system or organisation.
Conceptual model - a representation of the data requirements of an organisation constructed in a way that is independent of any software used to construct the database.
Entity - an object, person, event or thing of interest to an organisation and about which data is recorded.
Relationship - an association or link between two entities.
Degree of relationship between two entities - the number of entity occurrences of one entity that are associated with a single occurrence of the other (one-to-one, one-to-many or many-to-many).

Relationships between entities can be represented using Entity-Relationship diagrams. Here is a scenario which involves a college enrolling students for AS and A2 courses:
  o Each course is assigned a unique course code and has a course name.
  o Each student is assigned a unique student ID and has their name, address and date of birth recorded.
  o Each student enrols on one or more courses.
  o The students enrolled on a course will be assigned to one of several sets taught by different teachers.
  o Teachers are assigned unique initials.
If we want to model the three entities "Set", "Student" and "Course", we get the diagram below.
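[Entity-relationship diagram: the entities Set, Student and Course, joined by the relationships "belongs to", "is assigned to" and "is enrolled on". Student and Course are linked by a many-to-many relationship.]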
A many-to-many relationship is hard to work with directly; in most cases it can be made much clearer by adding a link table. We can analyse the many-to-many relationship between the "Student" and "Course" entities and decide to add an "Enrolment" entity.


[Entity-relationship diagram: as before, with an Enrolment entity added between Student and Course. A Student "makes" an Enrolment and each Enrolment is "for" a Course.]

One-to-one: Husband - Wife
One husband has one wife (conventionally).

One-to-many: Area - Resident
One area has many residents.

Many-to-many: Newspaper - Reader
One reader reads many newspapers. One newspaper has many readers.

3.5.2 Database design

Relation - a set of attributes and tuples, modelling an entity (a table).
Attribute - a property or characteristic of an entity (a named column in a table).
Tuple - a set of attribute values (a row in a table).
Primary key - an attribute which uniquely identifies a tuple.
Relational database - a collection of tables.
Composite key - a combination of attributes that uniquely identifies a tuple.
Foreign key - an attribute in one table that is a primary key in another table.
Referential integrity - if a value appears in a foreign key in one table, it must also appear in the primary key in another table.
Normalised entities - a set of entities that contain no redundant data.
Normalisation - a technique used to produce a set of normalised entities.

A relation consists of a heading and a body. A heading is a set of attributes. A body is a set of tuples. A relation must have an identifier, an attribute that uniquely identifies a tuple.
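
Referential integrity can be checked mechanically by a database engine. The following is a minimal sketch using Python's built-in sqlite3 module. The table layout (a Customer table and an OnlineOrder table with CustomerID as a foreign key) is borrowed from the order example in this section, and insert_order is a hypothetical helper introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Customer (CustomerID TEXT PRIMARY KEY, EmailAddress TEXT)")
conn.execute(
    "CREATE TABLE OnlineOrder ("
    " OrderNumber INTEGER PRIMARY KEY,"
    " CustomerID TEXT REFERENCES Customer(CustomerID))"
)
conn.execute("INSERT INTO Customer VALUES ('BLF1', 'FredBloggs@NT.co.uk')")


def insert_order(order_number, customer_id):
    """Return True if the row is accepted, False if referential integrity is violated."""
    try:
        conn.execute("INSERT INTO OnlineOrder VALUES (?, ?)", (order_number, customer_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(insert_order(12367, "BLF1"))  # BLF1 exists in Customer, so the order is accepted
print(insert_order(34231, "SMJ2"))  # SMJ2 is not in Customer, so the order is rejected
```

The second insert fails because the foreign key value SMJ2 does not appear as a primary key in Customer.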

Normalisation
The important thing about database design is that the correct attributes are grouped into the correct tables in order to minimise duplication of data. If no data is duplicated, inconsistencies between copies of the same data cannot arise.


OnlineOrder(OrderNumber, CustomerID, DeliveryAddress, EmailAddress, OrderDate, ItemCode, Description, OrderQuantity, UnitPrice)

Order Number Customer ID Delivery Address                     Email Address        Order Date Item Code Description  Order Quantity Unit Price
012367       BLF1        Fred Bloggs, 1 High Street, Anytown  FredBloggs@NT.co.uk  01/05/09   1234      Ring binder  3              1.50
                                                                                              8967      Divider      4              0.50
                                                                                              3456      Stapler      1              2.99
034231       SMJ2        Joe Smith, 7 The Lane, Anytown       JoeSmith@NT.co.uk    03/05/09   9684      Scissors     2              1.99
                                                                                              3456      Stapler      4              2.99

1NF: atomic data test
Given a table that has a primary key, it is in first normal form (1NF) if all of the data values are atomic values. That is, the table does not contain repeating groups of attributes. To put a table into first normal form, we move any repeating attributes, with a copy of the primary key, to a separate table.

OnlineOrder(OrderNumber, CustomerID, DeliveryAddress, EmailAddress, OrderDate)

Order Number Customer ID Delivery Address                     Email Address        Order Date
012367       BLF1        Fred Bloggs, 1 High Street, Anytown  FredBloggs@NT.co.uk  01/05/09
034231       SMJ2        Joe Smith, 7 The Lane, Anytown       JoeSmith@NT.co.uk    03/05/09

ItemOrder(OrderNumber, ItemCode, Description, OrderQuantity, UnitPrice)

Order Number Item Code Description  Order Quantity Unit Price
012367       1234      Ring binder  3              1.50
012367       8967      Divider      4              0.50
012367       3456      Stapler      1              2.99
034231       9684      Scissors     2              1.99
034231       3456      Stapler      4              2.99

In the ItemOrder table, OrderNumber alone is not sufficient as a primary key since it no longer uniquely identifies a tuple. We therefore have to create a composite key that also includes ItemCode.


2NF: partial key dependence test
A table is in second normal form (2NF) if it is in first normal form and contains no partial key dependencies. This means that if some attributes in a row depend on only part of the composite key, we should move them into a new table with the relevant primary key. Here Description and UnitPrice depend only on ItemCode, so they move to an Item table.

ItemOrder(OrderNumber, ItemCode, OrderQuantity)

Order Number Item Code Order Quantity
012367       1234      3
012367       8967      4
012367       3456      1
034231       9684      2
034231       3456      4

Item(ItemCode, Description, UnitPrice)

Item Code Description  Unit Price
1234      Ring binder  1.50
8967      Divider      0.50
3456      Stapler      2.99
9684      Scissors     1.99

3NF: non-key dependence test
A table is in third normal form (3NF) if it is in second normal form and contains no non-key dependencies, i.e. no attribute depends on another attribute that is not part of the key. In OnlineOrder, DeliveryAddress and EmailAddress depend on CustomerID rather than on OrderNumber, so they move to a Customer table. CustomerID remains in OnlineOrder as a foreign key.

OnlineOrder(OrderNumber, CustomerID, OrderDate)

Order Number Customer ID Order Date
012367       BLF1        01/05/09
034231       SMJ2        03/05/09

Customer(CustomerID, DeliveryAddress, EmailAddress)

Customer ID Delivery Address                     Email Address
BLF1        Fred Bloggs, 1 High Street, Anytown  FredBloggs@NT.co.uk
SMJ2        Joe Smith, 7 The Lane, Anytown       JoeSmith@NT.co.uk

By splitting the original OnlineOrder table into two tables containing only the required information, we have successfully created a fully normalised database. A database is fully normalised if every attribute is a fact about the key, the whole key, and nothing but the key (so help me Codd).
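
The decomposition above can be reproduced with a few set projections. This is only an illustrative sketch in plain Python: the variable names are made up for the example, the address and email columns are left out for brevity, and the rows are the five item lines from the unnormalised OnlineOrder table.

```python
# One tuple per item line of the unnormalised table:
# (OrderNumber, CustomerID, OrderDate, ItemCode, Description, OrderQuantity, UnitPrice)
flat = [
    ("012367", "BLF1", "01/05/09", 1234, "Ring binder", 3, 1.50),
    ("012367", "BLF1", "01/05/09", 8967, "Divider", 4, 0.50),
    ("012367", "BLF1", "01/05/09", 3456, "Stapler", 1, 2.99),
    ("034231", "SMJ2", "03/05/09", 9684, "Scissors", 2, 1.99),
    ("034231", "SMJ2", "03/05/09", 3456, "Stapler", 4, 2.99),
]

# Facts about an order depend on OrderNumber only.
online_order = sorted({(o, c, d) for o, c, d, _, _, _, _ in flat})
# Facts about an item depend on ItemCode only, so each item is stored once.
item = sorted({(code, desc, price) for _, _, _, code, desc, _, price in flat})
# The quantity depends on the whole composite key (OrderNumber, ItemCode).
item_order = sorted((o, code, qty) for o, _, _, code, _, qty, _ in flat)

print(len(flat), len(online_order), len(item), len(item_order))  # 5 2 4 5
```

The Stapler description and price appear twice in the flat table but only once in the Item projection, which is the duplication that normalisation removes.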


3.5.2 Structured Query Language (SQL)

DDL
DDL (Data Definition Language) is used to create a database structure; that is, to define which attributes belong in which tables. It also allows you to create users and grant access rights to users. For the exam, you will only need to know how to create tables using DDL. The following notation is used:

  CREATE TABLE <tablename>
  (
    <fieldname1> <type1>,
    <fieldname2> <type2>,
    ...
  )

e.g. for the Item table:

  CREATE TABLE Item
  (
    ItemCode INTEGER PRIMARY KEY,
    Description TEXT,
    UnitPrice FLOAT
  )

Note that every field definition except the last is followed by a comma. You must specify which field is the primary key and which type each field is. There are many types which would be appropriate.

Field type         Purpose
INTEGER or INT     Stores integer types.
TEXT or VARCHAR    Stores text strings.
FLOAT or REAL      Stores real number types.
DATE               Stores dates.
BOOLEAN            Stores Boolean values (true or false).

You can specify the length or precision of data types using brackets, e.g. Description VARCHAR(200). For a table with a composite key, we can use the following syntax:

  CREATE TABLE ItemOrder
  (
    OrderNumber INTEGER(6),
    ItemCode INTEGER(4),
    OrderQuantity INTEGER,
    PRIMARY KEY(OrderNumber, ItemCode)
  )
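
To see what a composite key enforces, the ItemOrder definition can be tried out in a real engine. The sketch below assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for whatever database system is used in class; the length hints such as INTEGER(6) are left out because SQLite ignores them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ItemOrder ("
    " OrderNumber INTEGER,"
    " ItemCode INTEGER,"
    " OrderQuantity INTEGER,"
    " PRIMARY KEY (OrderNumber, ItemCode))"
)
conn.execute("INSERT INTO ItemOrder VALUES (12367, 1234, 3)")
# Same order, different item: the pair (OrderNumber, ItemCode) is still unique.
conn.execute("INSERT INTO ItemOrder VALUES (12367, 8967, 4)")

try:
    # Same order AND same item: the composite key is duplicated.
    conn.execute("INSERT INTO ItemOrder VALUES (12367, 1234, 9)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```

Neither attribute has to be unique on its own; only the combination does.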


DML
After having created a database and tables, DML commands can be used to query and manipulate the tables. For the exam you will need to know how to select from, insert into, update and delete values in a database.

Querying a database

  SELECT <fieldnames>
  FROM <tables>
  WHERE <conditions>
  ORDER BY <fieldnames>

e.g.

  SELECT Description, UnitPrice, OrderQuantity
  FROM Item, ItemOrder, OnlineOrder
  WHERE Item.ItemCode = ItemOrder.ItemCode
  AND ItemOrder.OrderNumber = OnlineOrder.OrderNumber
  AND CustomerID = "BLF1"
  ORDER BY Item.ItemCode ASC

Use ASC for ascending order and DESC for descending order. Every table named in the WHERE conditions must be listed after FROM. Our example code will provide us with the following table:

Description  Unit Price  Order Quantity
Ring binder  1.50        3
Stapler      2.99        1
Divider      0.50        4

Syntax points:
  o No symbols are required for numbers.
  o Quotation marks are required for strings - either double or single quotes.
  o Single or double quotes or hashes are used for dates.

Inserting, updating and deleting data

1  INSERT INTO <tablename>
   VALUES ( <listofvalues> )

   INSERT INTO OnlineOrder
   VALUES (034931, "SMJ2", "13/05/09")

2  UPDATE <tablename>
   SET <newvalues>
   WHERE <conditions>

   UPDATE Customer
   SET EmailAddress = "FredBloggs@RealComputing.com"
   WHERE CustomerID = "BLF1"

3  DELETE FROM <tablename>
   WHERE <conditions>

   DELETE FROM ItemOrder
   WHERE OrderNumber = 012367 AND ItemCode = 8967

1. Inserts a new order into the online order table.
2. Changes customer BLF1's email address to "FredBloggs@RealComputing.com".
3. Deletes an item from order number 012367.
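
The effect of the UPDATE and DELETE statements can be checked by running them against a small test database. This is a sketch only, assuming SQLite through Python's sqlite3 module; order numbers are stored as text here so that the leading zero in 012367 is kept, which is a choice made for this example rather than something the guide prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerID TEXT PRIMARY KEY, EmailAddress TEXT)")
conn.execute(
    "CREATE TABLE ItemOrder (OrderNumber TEXT, ItemCode INTEGER, OrderQuantity INTEGER,"
    " PRIMARY KEY (OrderNumber, ItemCode))"
)
conn.execute("INSERT INTO Customer VALUES ('BLF1', 'FredBloggs@NT.co.uk')")
conn.executemany(
    "INSERT INTO ItemOrder VALUES (?, ?, ?)",
    [("012367", 1234, 3), ("012367", 8967, 4), ("012367", 3456, 1)],
)

# Statement 2: change one customer's email address.
conn.execute(
    "UPDATE Customer SET EmailAddress = 'FredBloggs@RealComputing.com' "
    "WHERE CustomerID = 'BLF1'"
)
# Statement 3: remove one item line from one order.
conn.execute("DELETE FROM ItemOrder WHERE OrderNumber = '012367' AND ItemCode = 8967")

email = conn.execute(
    "SELECT EmailAddress FROM Customer WHERE CustomerID = 'BLF1'"
).fetchone()[0]
remaining = [row[0] for row in conn.execute("SELECT ItemCode FROM ItemOrder ORDER BY ItemCode")]
print(email, remaining)  # FredBloggs@RealComputing.com [1234, 3456]
```

Without a WHERE clause, both statements would affect every row in the table.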

Exam note: Writing SQL is an inevitable part of your exam so know it well. Experiment with setting up your own databases and selecting fields from different tables in one query. Instead of joining tables in a WHERE clause, you may want to use an inner join. This will accomplish the same task but is not necessary for you to get the marks. Stick with what you know well.
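
For anyone who wants to experiment as the exam note suggests, the sketch below loads the example data into an in-memory SQLite database (an assumed stand-in for any SQL system) and runs the customer BLF1 query twice, once with the join written in the WHERE clause and once with INNER JOIN. Order numbers are stored as text to keep their leading zeros.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OnlineOrder (OrderNumber TEXT PRIMARY KEY, CustomerID TEXT, OrderDate TEXT);
CREATE TABLE Item (ItemCode INTEGER PRIMARY KEY, Description TEXT, UnitPrice REAL);
CREATE TABLE ItemOrder (OrderNumber TEXT, ItemCode INTEGER, OrderQuantity INTEGER,
                        PRIMARY KEY (OrderNumber, ItemCode));
INSERT INTO OnlineOrder VALUES ('012367', 'BLF1', '01/05/09'), ('034231', 'SMJ2', '03/05/09');
INSERT INTO Item VALUES (1234, 'Ring binder', 1.50), (8967, 'Divider', 0.50),
                        (3456, 'Stapler', 2.99), (9684, 'Scissors', 1.99);
INSERT INTO ItemOrder VALUES ('012367', 1234, 3), ('012367', 8967, 4), ('012367', 3456, 1),
                             ('034231', 9684, 2), ('034231', 3456, 4);
""")

# Join expressed through WHERE conditions, as in the guide.
where_rows = conn.execute("""
    SELECT Description, UnitPrice, OrderQuantity
    FROM Item, ItemOrder, OnlineOrder
    WHERE Item.ItemCode = ItemOrder.ItemCode
      AND ItemOrder.OrderNumber = OnlineOrder.OrderNumber
      AND CustomerID = 'BLF1'
    ORDER BY Item.ItemCode ASC
""").fetchall()

# The same query written with INNER JOIN.
join_rows = conn.execute("""
    SELECT Description, UnitPrice, OrderQuantity
    FROM Item
    INNER JOIN ItemOrder ON Item.ItemCode = ItemOrder.ItemCode
    INNER JOIN OnlineOrder ON ItemOrder.OrderNumber = OnlineOrder.OrderNumber
    WHERE CustomerID = 'BLF1'
    ORDER BY Item.ItemCode ASC
""").fetchall()

print(where_rows == join_rows)  # True
for row in join_rows:
    print(row)
```

Both forms return the three item lines of order 012367, sorted by item code.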

3.6 COMMUNICATION AND NETWORKING


3.6.1 Communication Methods

Data transmission - movement of data from one place to another. Serial data transmission - single bits are sent one after another along a single wire. Parallel data transmission - bits are sent down several wires simultaneously.

Data transmission
Data transmission occurs between a transmitter and receiver over some transmission medium (often called a communication channel) which can be either guided or unguided. The data to be transmitted is encoded as electromagnetic signals. Guided communication channels are physical cables, e.g. twisted pairs, coaxial cables and optic fibres. Unguided communication channels carry signals such as radio waves through media such as air, a vacuum or sea water. In both guided and unguided transmission media, the transmitted signal will decrease in strength with distance.

Serial data transmission


Serial data transmission is used for long-distance communication since it is easier to regenerate a single signal. Serial data transmission makes it easier to route signals through telecommunication switches. It also saves on the cost of cabling.

Parallel data transmission


[Figure: a COMPUTER connected to a PRINTER by ready/busy, strobe, data and return wires.]

Parallel data transmission was used for printers before the arrival of USB connections. When the printer was ready to receive information it would set the ready/busy wire to "ready". The computer would then send a signal down the strobe wire to alert the printer that data was going to be sent. The ready/busy wire would simultaneously be set to "busy" and data would be sent down the 8 data wires. Parallel data transmission is used over short distances because, if the resistance in the wires differs, signals can arrive at different times, leading them to be read incorrectly; this problem is called skew. In addition, it would be expensive to have 8 data wires stretched over a long distance. For this reason, parallel data transmission is restricted to computer-to-printer connections and computer buses.

3.6.2 Baud, bit rate, etc.

Baud rate - the rate at which signals on a wire or line may change. 1 baud - one signal change per second. Bit rate - the number of bits transmitted per second. Bandwidth - the range of signal frequencies that a transmission medium may transmit. Latency - the time delay between the moment something is initiated and the moment its first effect begins.

Baud rate vs. bit rate


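Baud rate and bit rate may appear to be essentially the same thing. This is true for transmissions which use only two voltage levels, where each signal change carries one bit. If more voltage levels are used, each signal can encode more than one bit, so the bit rate exceeds the baud rate. The example below uses four voltage levels, so each signal carries two bits: a baud rate of 1 baud gives a bit rate of 2 bits per second.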

Signal level (volts)  Decimal number  Binary number
0                     0               00
2.5                   1               01
5                     2               10
7.5                   3               11

[Figure: signal voltage (V) plotted against time, with each voltage level labelled by the pair of bits it encodes.]
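
The relationship between baud rate and bit rate in this example can be written as bit rate = baud rate x bits per signal. The short sketch below works this through for the four-level table above; the function names and the sample voltage sequence are invented for the illustration.

```python
# Voltage levels from the table above and the bit pair each one encodes.
LEVEL_TO_BITS = {0.0: "00", 2.5: "01", 5.0: "10", 7.5: "11"}


def bits_per_signal(levels):
    """Whole number of bits one signal can carry with `levels` distinct voltage levels."""
    return levels.bit_length() - 1  # floor(log2(levels)) for a positive integer


def bit_rate(baud_rate, levels):
    """Bits per second for a given number of signal changes per second."""
    return baud_rate * bits_per_signal(levels)


def decode(voltages):
    """Turn a sequence of received voltage levels back into a bit string."""
    return "".join(LEVEL_TO_BITS[v] for v in voltages)


print(bit_rate(1, 2))   # 1: two levels, bit rate equals baud rate
print(bit_rate(1, 4))   # 2: four levels, two bits per signal change
print(decode([5.0, 2.5, 7.5, 0.0]))  # 10011100
```

Doubling the number of voltage levels adds one bit per signal, not double the bits.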

Bandwidth
It is important to note that the range of signal frequencies which constitutes a medium's bandwidth must not undergo a significant reduction in strength from one end of the wire to the other. Bandwidth is measured in hertz (Hz), a unit of frequency equal to the number of cycles per second. There is a direct relationship between bit rate and bandwidth: the greater the bandwidth, the higher the bit rate that can be transmitted. If the data rate of the digital signal is W bits per second then a very good representation can be achieved with a bandwidth of 2W Hz.

3.6.3 Asynchronous data transmission


Asynchronous serial data transmission - the arrival of data cannot be predicted by the receiver; so a start bit is used to signal the arrival of data and to synchronise the transmitter and receiver temporarily. Communication protocol - a set of pre-agreed signals, codes and rules to be used for data and information exchange between computers, or a computer and a peripheral device such as a printer, that ensure that the communication is successful. Handshaking protocol - the sending and receiving devices exchange signals to establish that the receiving device is connected and ready to receive. Then the sending device coordinates the sending of the data, informing the receiver that it is sending. Finally, the receiver indicates it has received the data and is ready to receive again.

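[Figure: an asynchronous frame sent between Computer A and Computer B. The line is idle, then carries a start bit, the data bits (LSB to MSB), a parity bit and a stop bit, and then returns to the idle state.]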

An example of the necessity of asynchronous data transmission is typing on a keyboard. Keys are not pressed at precise and regular intervals, so the receiving system does not know when to expect transmissions from the transmitting system. In order to synchronise the two systems, each binary word sent from the transmitting system must begin with a start bit and end with a stop bit. These two bits must be of opposite polarity so that the receiver can recognise when the next word is being sent. Between the start and stop bits the two systems will be in sync, so the receiving system will be able to read each bit. For error detection, a parity bit is added to the binary word. If even parity is agreed on then the word, including the parity bit, must contain an even number of 1s. Similarly, if odd parity is agreed on then there must be an odd number of 1s. The receiver relies on parity to detect errors so that replacement bytes can be requested.
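
Framing and parity checking can be sketched in a few lines. The code below is an illustration under stated assumptions rather than a description of any particular serial standard: the start bit is taken to be 0 and the stop bit 1 (the text only requires opposite polarity), data bits are sent least significant bit first, and the parity bit is placed just before the stop bit. All function names are invented for the example.

```python
START_BIT, STOP_BIT = "0", "1"  # assumed polarities; they only need to differ


def parity_bit(data_bits, even=True):
    """Parity bit that makes the total number of 1s even (or odd)."""
    ones_are_odd = data_bits.count("1") % 2 == 1
    if even:
        return "1" if ones_are_odd else "0"
    return "0" if ones_are_odd else "1"


def build_frame(data_bits, even=True):
    """Wrap a binary word as: start bit, data (LSB first), parity bit, stop bit."""
    lsb_first = data_bits[::-1]
    return START_BIT + lsb_first + parity_bit(data_bits, even) + STOP_BIT


def frame_is_valid(frame, even=True):
    """Check the start/stop bits and the parity of the data plus parity bit."""
    if frame[0] != START_BIT or frame[-1] != STOP_BIT:
        return False
    ones = frame[1:-1].count("1")
    return ones % 2 == (0 if even else 1)


frame = build_frame("1000011")        # 7-bit ASCII code for 'C'
print(frame)                          # 0110000111
print(frame_is_valid(frame))          # True
corrupted = frame[:2] + "0" + frame[3:]  # flip one data bit in transit
print(frame_is_valid(corrupted))      # False
```

A single flipped bit changes the count of 1s from even to odd, which is what the receiver detects. Two flipped bits would cancel out and go unnoticed.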

Handshaking
The table below summarises the handshaking protocol which is used by COM1 serial ports. The example uses C to represent a computer and P to represent a printer.

Message            Signal
Are you ready?     Request to send (C pin 7)
Yes I am           Clear to send (P pin 8)
Here it is         Start bit
Busy               Clear to send (P pin 8)
That's it          Stop bit
I'm ready again    Clear to send (P pin 8)

1. The computer checks the voltage on its request to send (RTS) pin when it wishes to send.
2. If the printer is ready to receive, the computer starts sending.
3. When the printer receives the start bit, it sets its clear to send (CTS) pin to busy.
4. When the printer receives the stop bit, it sets its CTS pin to ready. This signals to the computer that it can send the next byte.

3.6.4 Baseband and broadband

Baseband system - a system that uses a single data channel in which the whole bandwidth of the transmission medium is dedicated to one data channel at a time.
Broadband system - a multiple data channel system in which the bandwidth of the transmission medium carries several data streams at the same time.

A LAN can operate in baseband mode. In this instance, the whole bandwidth of the transmission medium is dedicated temporarily to one sending station and one receiving station (one data channel). This means that data channels must take it in turns to use the bandwidth. Baseband systems are used over short distances such as LANs, where they offer high performance at low cost. WANs use broadband media and operate in broadband mode so that two or more data streams may be carried at the same time. Several data channels are combined onto a carrier signal so that the bandwidth of the transmission medium can be shared by several data channels. Since long-distance communication media are expensive to install, it would be wasteful to give only one data channel at a time the use of the medium. For this reason, broadband systems are used for long-distance communication.

3.6.5 Networks

Local area network - linked computers in close proximity. Stand-alone computer - a computer that is not networked. It requires its own software and peripherals since it does not share any with other computers. Wide area network - a set of links that connect geographically remote computers and local area networks.


Local area network (LAN)


Local area networks emerged as a substitute for large mainframe computers. They connect a number of computers in a small geographic area (such as a single building) so that they can share data and peripherals. Since LANs are small and link computers in close proximity, their communication links offer higher speeds and lower error rates than those of WANs.

Wide area network (WAN)


Wide area networks (WANs) were invented to solve the problem of connecting a LAN to a distant workstation or another remote LAN. Since companies can often have branches all over a country it is necessary to share resources over long distances. Expressed simply, a wide area network is a set of connections between geographically remote local area networks. These connections may use one or more of the following:
  o the public switched telephone network
  o high-speed, high-bandwidth dedicated leased lines
  o high-speed fibre-optic cable
  o microwave transmission links
  o satellite links
  o radio waves
  o the Internet
When two LANs are interconnected by a WAN so that computers or nodes on one network are able to communicate with computers or nodes on the other network, and vice versa, the two LANs are said to be internetworked or to form an internet. The Internet is the largest example of these.

Network adapter
In order for computers to be connected together as a LAN, each computer will require a network adapter or network interface card. The network card converts computer data into a form that can be transmitted over the network. Data that is received needs to be converted into a form that can be understood by the receiving computer. A network adapter receives data to be transmitted from the motherboard of a computer into an area of memory called a buffer. A checksum value is calculated for the block of data and address information (source and destination) is added. The block is now known as a frame. The network adapter transmits the frame one bit at a time onto the network cable.


[Figure: a network adapter card holding a frame made up of address information, data and a checksum. The frame is transmitted serially, bit by bit, onto the network cable.]
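
The steps a network adapter performs can be mimicked in software. The sketch below is deliberately simplified and its layout is an assumption made for the example: one-byte source and destination addresses, and a checksum that is just the sum of the data bytes modulo 256. Real Ethernet frames use longer addresses and a more robust check, so treat this only as an illustration of the idea.

```python
def checksum(data):
    """Toy checksum: sum of the data bytes, kept to one byte."""
    return sum(data) % 256


def build_frame(source, destination, data):
    """Frame layout assumed here: destination, source, data, checksum."""
    return bytes([destination, source]) + data + bytes([checksum(data)])


def to_bits(frame):
    """Yield the frame one bit at a time, as it would go onto the cable."""
    for byte in frame:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def frame_is_intact(frame):
    """Receiver side: recompute the checksum over the data and compare."""
    data, received = frame[2:-1], frame[-1]
    return checksum(data) == received


frame = build_frame(source=1, destination=2, data=b"Hi")
bits = list(to_bits(frame))
print(frame.hex())             # 02014869b1
print(len(bits))               # 40
print(frame_is_intact(frame))  # True
```

If any data byte is altered in transit, the recomputed checksum will usually differ from the transmitted one and the receiver can reject the frame.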

3.6.6 Network topologies

Topology - the shape, layout, configuration or structure of the connections that connect devices to a network.

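[Figure: a bus network, in which devices are attached to a backbone along which data flows, and a star network, in which devices are attached to a central computer or switch.]

Bus topology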

Bus topology is the most basic networking topology and the basis for modern Ethernet LANs. All devices share a common cable for connection known as the backbone. When a device wishes to share information with another device on the network it simply sends a broadcast message onto the backbone cable. This message can be seen by all other devices on the network but will only be received and processed by the intended recipient.
Advantages:
  o The amount of cable needed is minimal.
  o It is easy to add and remove nodes without affecting the network as a whole.
Disadvantages:
  o Performance degrades as traffic increases and as the cable length increases. (Signal boosters can be used to combat deterioration over long distances.)
  o A single fault in the backbone cable will cause the whole network to fail. If the cable is long then the fault will take a long time to isolate and repair.
If two computers try to send data at the same time the signals will collide and the bus will become unusable for the duration of the transmissions of both computers. To reduce this duration, each device is limited to one frame of pulses per transmission. A minimum number of pulses per frame is enforced so that it is possible to detect rises in pulse voltages caused by collisions. A computer can therefore send any number of pulses in a single frame between the minimum and maximum.
If each connected computer follows a protocol when transmitting, it is possible to operate the bus system correctly even when collisions do occur. A commonly used bus protocol is Carrier Sense Multiple Access with Collision Detection (CSMA/CD). The rules for CSMA/CD are:
1. If the bus is quiet, transmit a frame.
2. If the bus is busy, continue to listen until the bus is idle then transmit immediately.
3. While transmitting, monitor the bus for a collision. If a collision is detected, transmit a brief jamming signal to let all computers know that there has been a collision then stop transmitting.
4. After transmitting the jamming signal, wait a random amount of time, then attempt to transmit again, starting from step 1.
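
The effect of the random wait in rule 4 can be seen in a small time-slot simulation. This is a heavily simplified sketch, not a model of real Ethernet timing: each frame is assumed to take exactly one slot, waiting stations simply count down rather than sensing the bus, and the function name, the maximum wait and the random seed are arbitrary choices made for the example.

```python
import random


def simulate_csma_cd(stations, max_wait=4, seed=1, max_slots=10_000):
    """Each station has one frame to send; return a log of collisions and sends."""
    rng = random.Random(seed)
    waiting = {name: 0 for name in stations}  # slots each station must still wait
    log = []
    for _ in range(max_slots):
        if not waiting:
            break
        ready = [name for name, wait in waiting.items() if wait == 0]
        if len(ready) == 1:
            # Only one station transmits in this slot: the frame gets through.
            log.append(("sent", ready[0]))
            del waiting[ready[0]]
        elif len(ready) > 1:
            # Several stations transmit at once: collision, then random back-off.
            log.append(("collision", tuple(ready)))
            for name in ready:
                waiting[name] = rng.randint(1, max_wait)
        # Stations that were already backing off count down by one slot.
        for name in waiting:
            if name not in ready:
                waiting[name] -= 1
    return log


log = simulate_csma_cd(["A", "B", "C"])
for event in log:
    print(event)
```

All three stations start at the same moment, so the first event is always a collision. Because each station then waits a different random time with high probability, the frames eventually get through one at a time.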

Star topology
In star topology each workstation is connected to a central computer or switch which regenerates the data signals it receives and transmits them to the destination device. All data is passed through the switch but only the intended recipient computer will receive and act on the data.
Advantages:
  o A cable failure on one branch of the network will only affect that branch.
  o Adding workstations to the network or removing workstations from the network is simple and will not interfere with the rest of the network.
  o Collisions will not occur if two computers transmit data at the same time.
  o Workstations cannot intercept messages since all messages go through the central switch.
Disadvantages:
  o For a greater number of machines, a greater amount of cable is required. The cost of cable can escalate, and concealing large amounts of cable adds further expense.
A number of star topology networks can be connected to form a distributed star. The central switches are connected to one another, with each star operating independently until a message needs to be sent to a node on another star.

3.6.7 Networks Part Two


Network segment - in Ethernet, a run of Ethernet cable to which a number of workstations are attached. Thin-client network - a network where all processing takes place in a central server; the clients are dumb terminals with little or no processing power or local hard disk storage. Peer-to-peer network - a network that has no dedicated servers. All computers are equal, so they are called peers. Server-based network - a network in which resources, security, administration and other functions are provided by dedicated servers.

Segmentation
Segmentation is one solution to congestion, which can occur on an Ethernet bus network when many workstations are transmitting data that may collide. Segmentation involves splitting a larger non-switched network into two or more network segments linked by bridges or routers, which ensure that communication between segments is possible. Since fewer workstations are competing to transmit data on each segment, fewer collisions will occur. A bridge holds a table of Ethernet interface card addresses, one for each machine connected to the segments joined by the bridge. A router holds a table of IP (Internet Protocol) addresses.

Thin-client networking
In a thin-client network, all processing takes place in a central server; the workstations connected to the central server have very little processing power and no hard disk storage. The central server consists of:
  o A file server which stores users' files.
  o An application server which runs application programs such as word processors and drawing packages.
  o A domain controller which validates users when they initiate a login session.
User commands travel along the network and are executed in the central server, which sends the result of its processing back to be displayed on the user's monitor.
NB: A thick-client network is just the opposite of a thin-client network. This means that all client workstations will have local processing power and will run their own applications.

Peer-to-peer networking
A pure peer-to-peer network does not have the notion of clients or servers but only equal peer nodes that simultaneously function as clients and servers to the other nodes on the network. The user of each computer determines which resources can be shared but cannot decide which specific users can use which resources, i.e. if a directory is not shared with the whole peer-to-peer network then it will only be available on the computer of the user whose directory it is. A peer-to-peer LAN is an appropriate choice when:
  o There are fewer than 10 users.
  o The users are all located in the same area and the computers will be located at user desks.
  o Security is not an issue, so users may act as their own administrators to plan their own security.
  o The organisation and the network will have limited growth over the foreseeable future.

Peer-to-peer (P2P) operation in WANs such as the Internet is used to share files among a large number of users connected temporarily. P2P protocols such as BitTorrent are used on the Internet to distribute large files. After preparing the file by splitting it up into smaller pieces, a source will send a piece to each of the clients in the group. These clients will then in turn become sources so that the original source only has to send each piece once. In this way, each client gets the rest of the file from other clients. Using the BitTorrent P2P protocol, each client is capable of preparing, requesting and transmitting any type of computer file over a

Server-based networking
Server-based networks, in which resources, security, administration and other functions are provided by dedicated servers, are used for large networks where peer-to-peer networks do not suffice. A dedicated server is a server optimised for its purpose that is not used as a client or workstation. Dedicated servers provide quick responses to requests from network clients and ensure the security of files and directories. Client computers are usually less powerful than server computers. A server stores a list of client user IDs and associated passwords so that it can authenticate users logging on at client workstations. Servers are used for services such as file storage and printing. Web servers and FTP servers are examples of server-based systems. In a school network, a central domain controller will typically store user accounts, and a central file server will store users' work and some applications that users download onto the client machines they work at.

3.6.8 Routers and gateways

Router - a device that receives packets or datagrams from one host (computer) or router and uses the destination IP address the packets contain to pass them, correctly formatted, to another host (computer) or router.
Gateway - a device used to connect networks using different protocols so that information can be passed from one system to another.


Routers
Routers are packet switches. The route chosen is determined by the destination IP address; along this route a datagram may pass through several routers before reaching its destination. Each router maintains a table of routes to various destinations; for example, a router in England will know which router to send a datagram to next if the destination IP address indicates a destination in Poland. A router will know about its sub-networks and which IP ranges they use, but it will not know about its super-networks.

[Diagram: routers linking local area networks, regional ISP backbones, a NAP and the NSP backbones.]

When a router receives a packet it will look at the destination address of the packet and determine whether the destination sub-network is within its range of IP addresses. If the destination is not in the router's table of routes, the router will send the packet on a predefined default route, usually up the hierarchy to the next router. The routers connected to the NSP backbones hold the largest routing tables and can therefore send the packet on towards its correct destination.
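The table lookup with a default route can be sketched as follows. The routing table entries, the /24 prefix lengths and the wording of the actions are assumptions chosen for illustration; the sketch uses Python's standard `ipaddress` module and picks the most specific matching entry.

```python
import ipaddress

# Hypothetical routing table: (network prefix, what to do with the packet).
ROUTING_TABLE = [
    ("195.168.0.0/24", "deliver on local network 195.168.0.0"),
    ("210.5.0.0/24", "forward to the router for 210.5.0.0"),
]
DEFAULT_ROUTE = "forward up the hierarchy to the next router"


def next_hop(destination):
    """Return the action for a destination IP address.

    The most specific (longest prefix) matching entry wins; if nothing
    matches, the packet follows the predefined default route.
    """
    address = ipaddress.ip_address(destination)
    best = None
    for prefix, action in ROUTING_TABLE:
        network = ipaddress.ip_network(prefix)
        if address in network and (best is None or network.prefixlen > best[0]):
            best = (network.prefixlen, action)
    return best[1] if best else DEFAULT_ROUTE


print(next_hop("210.5.0.24"))
print(next_hop("8.8.8.8"))
```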

Routable and non-routable IP addresses


An IP address defines where a host is on the Internet. The Internet Assigned Numbers Authority (IANA) is responsible for global coordination of the IP addressing systems and for routing Internet traffic. IP addresses are generally assigned in a hierarchical manner by ISPs.


IPv4: 210.25.0.48
Each section can represent a number from 0 to 255 (a byte of data).

IPv6: 2001:0cd8:56a7:0000:0000:3d8a:0170:7554
Each digit represents a number from 0 to 15 (one hexadecimal digit).

IPv4 addresses are 32-bit numbers expressed as 4 octets in dotted decimal notation. IPv6 addresses are 128-bit numbers expressed using hexadecimal strings. IANA's role is to allocate IP addresses from the pool of currently unallocated addresses to the Regional Internet Registries (RIRs), which are responsible for overseeing the allocation and registration of Internet numbers within a particular region of the world.
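To see why four octets make a 32-bit number, the short sketch below converts between dotted decimal notation and the underlying integer. The function names are made up for this example.

```python
def ipv4_to_int(dotted):
    """Pack four octets (each 0 to 255) into one 32-bit number."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError("not a valid IPv4 address: " + dotted)
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # shift left one byte, then add the next octet
    return value


def int_to_ipv4(value):
    """Unpack a 32-bit number back into dotted decimal notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


number = ipv4_to_int("210.25.0.48")
print(number, format(number, "032b"), int_to_ipv4(number))
```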

The five RIRs are RIPE NCC, ARIN, APNIC, AfriNIC and LACNIC.

Routable (public) IP addresses
Public or routable IP addresses are assigned by RIPE NCC in Europe. RIPE carefully manages the allocation of IP addresses and offers a WHOIS feature to look up the owners of IP addresses.

Non-routable (private) IP addresses
Non-routable IP addresses are used for home, office, school and college networks. These IP addresses are set aside to be used when it is neither necessary nor desirable to have a public IP address. They are especially useful where multiple computers are connected to a single proxy server, firewall or router. These IP addresses were originally allocated to delay IPv4 exhaustion and take the following IP ranges:
10.0.0.0 to 10.255.255.255
172.16.0.0 to 172.31.255.255
192.168.0.0 to 192.168.255.255
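A quick way to check whether an address falls inside one of the three private ranges listed above is sketched below. The function name is an assumption for this example; the comparison relies on the fact that `ipaddress.IPv4Address` objects can be ordered.

```python
import ipaddress

# The three non-routable ranges given in the text (first address, last address).
PRIVATE_RANGES = [
    ("10.0.0.0", "10.255.255.255"),
    ("172.16.0.0", "172.31.255.255"),
    ("192.168.0.0", "192.168.255.255"),
]


def is_non_routable(address):
    """Return True if the IPv4 address lies in one of the private ranges."""
    candidate = ipaddress.IPv4Address(address)
    return any(
        ipaddress.IPv4Address(first) <= candidate <= ipaddress.IPv4Address(last)
        for first, last in PRIVATE_RANGES
    )


for sample in ("192.168.1.20", "172.32.0.1", "210.5.0.24"):
    print(sample, is_non_routable(sample))
```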

Connecting two LANs by routers

[Diagram: a LAN with network ID 195.168.0.0, containing the host 195.168.0.19, is connected through two routers to a LAN with network ID 210.5.0.0, containing the host 210.5.0.24.]

A packet is sent from a computer with IP address 195.168.0.19 to a computer with IP address 210.5.0.24. It will pass through two routers which each have two IP addresses since each router has two network cards. One network card connects the router to the LAN and the other connects the router to the other router.

Gateways
A gateway is a device used to connect networks using different protocols so that information can be passed from one system to another. LANs use a protocol that is very different from the protocol used on the Internet, which is a WAN. In this instance, a gateway will do the job of translating the LAN frame into its equivalent WAN frame or datagram, and vice versa. Sending a packet from one LAN to another over a WAN will require the packet to pass through two gateways. Both gateways have two network cards, one for the port used for the LAN and one for the port used for the WAN. Each of these cards is assigned an IP address.

LAN Details
When setting up a LAN there are certain details which have to be provided in order for the LAN to operate correctly.
Subnet mask
The subnet mask of a network defines its size. It also helps to tell a computer which LAN it is connected to, and hence the addresses to which it can send packets directly and which packets it needs to send to the gateway. The standard subnet mask for computers on small LANs is 255.255.255.0.
Gateway


The gateway, or router address, is the IP address of the machine that connects a computer to the next hop on the Internet. In a LAN using private addresses, this address will be the internal IP address of the machine that directs traffic between the LAN and the connection to the Internet.
DNS servers
A network of Domain Name System (DNS) servers keeps track of the assignment of IP addresses to website domains, so that it is possible to reach these websites by typing their domain name into a web browser.
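The way a computer uses its subnet mask to decide between direct delivery and the gateway can be sketched as below. The addresses reuse the 195.168.0.0 network from the earlier example, and the gateway address 195.168.0.1 and the function names are assumptions made for illustration.

```python
import ipaddress


def same_lan(source, destination, mask):
    """Two hosts are on the same LAN if their addresses agree in every masked bit."""
    src = int(ipaddress.IPv4Address(source))
    dst = int(ipaddress.IPv4Address(destination))
    bits = int(ipaddress.IPv4Address(mask))
    return (src & bits) == (dst & bits)


def first_hop(source, destination, mask, gateway):
    """Send directly to a host on the same LAN, otherwise hand the packet to the gateway."""
    return destination if same_lan(source, destination, mask) else gateway


MASK = "255.255.255.0"
GATEWAY = "195.168.0.1"  # assumed gateway address for this sketch
print(first_hop("195.168.0.19", "195.168.0.121", MASK, GATEWAY))
print(first_hop("195.168.0.19", "210.5.0.24", MASK, GATEWAY))
```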

3.6.9 Web services

Web 2.0 - software that becomes a service accessed over the Internet.
Web services - self-contained, modular applications that can be described, published, located and invoked over a network, generally the Web.
Ajax - a web technology that allows only the part of a web page that needs updating to be fetched from the web server.

Web 2.0
Web 2.0 is a set of principles and practices that tie together a different approach to the use of the World Wide Web and the Internet. In Web 2.0, software becomes a service, paid for directly or indirectly by customers, that is accessed over the Internet. In general, Web 2.0 refers to the phenomenon of social media, whereby users have the ability to publish and customise content, unlike websites where users are merely passive viewers (consumers).

Web services
Web services are self-contained, modular applications that can be described, published, located and invoked over a network, generally the Web. Software as a service (SaaS) is a model of software deployment where an application is hosted as a service provided to customers across the Internet.

SaaS eliminates the need to install and run the application on the customer's own computer. This means that the software vendor is now responsible for keeping the application up to date and the user does not have to download updates. This also means that the customer effectively relinquishes control over whether to accept an update. As opposed to paying a one-time fee to purchase the application, the user will pay each time they use the web service. From the software vendor's standpoint, SaaS has the attraction of providing stronger protection of its intellectual property (since the user does not have access to the source code) and establishing an ongoing revenue stream. The SaaS software vendor may host the application on its own web server, or this function may be handled by a third-party application service provider (ASP).


Ajax
Ajax is a technology used in Web 2.0 which allows pages to update sections of their content using programs or data held on a web server, without reloading the entire page. This makes the experience of using the software smoother and more similar to that of a thick client, because fewer bytes need to be transferred from web server to web browser and so the delay between operations is reduced.

3.6.10 Wireless networking

Wireless network - any type of LAN in which the nodes (computers or computing devices, often portable devices) are not connected by wires but use radio waves to transmit data between them.
Wi-Fi - trademarked IEEE 802.11 technologies that support wireless networking of home and business networks.
Bluetooth - a wireless protocol for exchanging data over short distances from fixed and mobile devices.

A wireless access point (WAP) allows devices operating wirelessly to connect to a wired network. It allows data to be relayed between the wireless devices (such as computers or printers) and wired devices on the network. Wireless networks allow devices to be added to a network using little or no extra cabling.

Wi-Fi
Wi-Fi is the standard for wirelessly connecting computers. It is the trademark for the popular wireless technology used in home and business networks, mobile phones and other electronic devices that require some form of wireless networking capability. Wireless networks are:
Typically slower than networks connected using Ethernet cable.
More vulnerable, because anyone can intercept the radio broadcasts that carry the data between wirelessly networked computers.

Wi-Fi Security
Wired Equivalent Privacy (WEP) was introduced to give wireless LANs an equivalent level of security to wired LANs. Since its introduction, serious weaknesses have been identified and therefore Wi-Fi Protected Access (WPA) was introduced. WPA provides more security than a WEP security set-up. In WEP, anyone wishing to access the wireless network must provide a passphrase which is stored in the wireless access point (WAP).

Bluetooth


Bluetooth allows users to send data from fixed and mobile devices over short distances, creating a personal area network (PAN). Bluetooth achieves a gross data transfer rate of 1 Mbps using a short-range ISM band at 2.4 GHz.

3.6.11 Server-side scripting

Web server extension - a program written in native code, i.e. an executable, or a script that is interpreted by an interpreter running on the web server, which extends the functionality of the web server and allows it to generate content at the time of the HTTP request.
Common Gateway Interface - a gateway between a web server and a web server extension that tells the server how to send information to a web server extension, and what the server should do after receiving information from a web server extension.
Dynamic web page content - content that is generated when the web browser request is received.

A web browser uses the HTTP application protocol to fetch web pages from a web server by sending a request message. A web server listens on port 80 for such requests. The response from the server also uses the HTTP protocol to simply transfer a file of bytes representing a web page, image or sound file back to the web browser. However, if the web browser requires information that has the potential to change with time then the server needs to have its functionality extended; this extension is called a web server extension.

[Diagram: the client's web browser (port no. 2985) and the server's web server (port no. 80) exchange HTTP request and response messages through the network cards and the input and output buffers of the two machines.]

The Common Gateway Interface (CGI) is a gateway between the web server and a web server extension. The CGI specification tells the server how to send information to a web server extension, and what the server should do after receiving information from a web server extension.


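[Diagram: the client's web browser sends the request message Get /webpage.asp?myname=Fred to the web server; the web server passes the request through CGI to the web server extension, which uses a DBMS to access a database; the response message is returned to the web browser.]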

Query string
An example of a query string is: Get /webpage.asp?myname=Fred&age=6. Here, we have a get command with a query string which consists of two names, myname and age, with the corresponding values of those names, separated by an equals symbol. The ampersand (&) separates two name-value pairs.
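As an illustration, the request line above can be taken apart with Python's standard `urllib.parse` module. The helper name `parse_request_line` is an assumption for this sketch; a real web server does considerably more work than this.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_request_line(request_line):
    """Split a request line into the command, the page path and the name-value pairs."""
    command, target = request_line.split(" ", 1)
    parts = urlsplit(target)            # separates the path from the query string at '?'
    pairs = dict(parse_qsl(parts.query))  # splits on '&', then on '='
    return command, parts.path, pairs


command, path, pairs = parse_request_line("Get /webpage.asp?myname=Fred&age=6")
print(command, path, pairs)
```

Note that every value arrives as text, so the age here is the string "6" until the server-side script converts it.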

Post method
If the Post method is used by the browser or HTTP application, then the data is passed in the message body and the command is simply Post /webpage.asp, for example.

Dynamic web page content is created by writing a web page in a mixture of HTML and a scripting language, for example PHP. This script will then be interpreted by the web server extension and the result sent to the client as HTML. The web server extension can use database connection components to connect to a database management system (DBMS), which in turn accesses data in a database. There are five steps to make a connection to a database:
1. Create a connection.
2. Select a database.
3. Perform a database query.
4. Use the returned data, if any.
5. Close the connection.
MySQL is a commonly used DBMS, and a scripting language will incorporate MySQL queries in its code in order to perform a database query. You will not be required to write any PHP or ASP but you will need to know how to write SQL. For examples, see section 3.5 on databases.
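The five connection steps listed above can be sketched with Python's built-in `sqlite3` module. This swaps in SQLite for MySQL so that the example runs without a database server; the table, column names and data are invented for illustration. With SQLite, steps 1 and 2 collapse into one call, because the file (or `:memory:`) that is opened is the database.

```python
import sqlite3


def names_aged_at_least(min_age):
    # Steps 1 and 2: create a connection and select the database.
    connection = sqlite3.connect(":memory:")
    try:
        cursor = connection.cursor()
        # Set-up for the sketch only: a real script would find the data already stored.
        cursor.execute("CREATE TABLE person (name TEXT, age INTEGER)")
        cursor.executemany(
            "INSERT INTO person (name, age) VALUES (?, ?)",
            [("Fred", 6), ("Ann", 11)],
        )
        # Step 3: perform a database query (the ? placeholder keeps user input out of the SQL text).
        cursor.execute(
            "SELECT name FROM person WHERE age >= ? ORDER BY name", (min_age,)
        )
        # Step 4: use the returned data, if any.
        return [row[0] for row in cursor.fetchall()]
    finally:
        # Step 5: close the connection.
        connection.close()


print(names_aged_at_least(5))
```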

3.6.12 Security


Virus - a small program attached to another program or data file. It replicates itself by attaching itself to other programs.
Spam - unsolicited junk e-mails.
Worm - a small program that exploits a network security weakness (security hole) to replicate itself through computer networks. It may attack computers.
Remote login - when someone connects to a computer via the Internet.
Trojan - a program that hides in or masquerades as desirable software, such as a utility or a game, but attacks computers it infects.
Phishing - when someone tries to get you to give them your personal information.
Pharming - when a phisher changes DNS server information so that customers are directed to another site.

No doubt if you have a computer and a connection to the Internet, you will be familiar with the problems that arise. Familiarise yourself with the formal definitions of words that you may recognise.

Before the Internet, viruses were usually spread through infected files on floppy disks. When the infected file was opened, the virus program would be executed and would attack the computer, usually triggered by an event such as a specific date. Viruses can erase and damage files. When the Internet was introduced, viruses could be spread through the medium of email, and therefore the presence of the virus could increase exponentially as it sent itself from email address to email address.

Similarly, spam is unwanted email which is sent to thousands of email addresses by redirecting the email messages through the SMTP server of an unsuspecting host; this is called SMTP session hijacking.

A worm is a malicious computer program that replicates itself through networks. It uses up computer time and drastically increases network traffic. It may also attack the computers and servers of the networks it moves through.

If an unprotected computer is connected to the Internet, someone could connect to it through remote login and control the computer, erasing files or executing programs.
The main difference between a virus and a Trojan is that a Trojan either hides in or masquerades as desirable software. When the program is executed, the Trojan will commence attacking the computer it is stored on.

Phishing is a type of scam that generally operates through email. An example is when the sender pretends to be a reputable bank asking someone to confirm their bank details. Some software allows the sender to alter the 'from' line so that the email looks legitimate. The email will include a URL to a website which looks equally legitimate, with a form for the user to complete. Alternatively, the email may include a Trojan which will record the keystrokes that a user enters on a truly genuine website.

Spyware is a type of malicious software that collects information about users without their knowledge. Spyware can track the website history of the computer it is on, and this can be used by phishers to create appropriate phishing scenarios.

Firewalls
A firewall is a hardware device or a program that controls traffic between the Internet and a private network (such as a school network) or computer system (such as a home computer). Firewalls can be customised, and rules can be set up that control which data packets should be allowed through and which should not. Traffic can be blocked from specific IP addresses, domain names or port numbers. Firewalls can also be set up to search data packets for exact matches of text.
Packet filtering
In packet filtering, the firewall analyses the packets that are sent against a set of filters (firewall rules). Packets are either allowed through or blocked.
Proxy server
Using a proxy server, the computer which is requesting information from a server does not come directly into contact with the response. This allows the information to be filtered before it is passed on to the client.
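A packet filter of the kind described above can be sketched as a function that checks each packet against a set of rules. The `Packet` fields, the rule sets and the sample addresses are all hypothetical values chosen for this example; real firewalls inspect many more fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    source_ip: str
    destination_port: int
    payload: str


# Hypothetical firewall rules.
BLOCKED_SOURCE_IPS = {"203.0.113.7"}
BLOCKED_PORTS = {23}
BLOCKED_TEXT = ["forbidden phrase"]


def allow(packet):
    """Return True if the packet passes every filter, False if any rule blocks it."""
    if packet.source_ip in BLOCKED_SOURCE_IPS:
        return False
    if packet.destination_port in BLOCKED_PORTS:
        return False
    # Exact text match inside the packet's data.
    if any(text in packet.payload for text in BLOCKED_TEXT):
        return False
    return True


print(allow(Packet("198.51.100.4", 80, "GET /index.html")))
print(allow(Packet("203.0.113.7", 80, "GET /index.html")))
```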

Virus detection
Virus detection software, often called an antivirus scanner, checks files against a dictionary of known viruses. This means that computer users must regularly update their software to ensure that this dictionary stays up to date. If an infected file is found, the antivirus scanner will try to delete the virus from the file. If this fails, the infected file will be quarantined, i.e. kept in a separate area of the hard disk where it cannot infect other files. As a last resort, the software may delete the infected file.
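The dictionary check can be sketched as a search for known byte patterns. The signature names and byte patterns below are invented for the example, and real scanners use far more sophisticated matching; this only shows the scan, disinfect, quarantine order described above.

```python
# Hypothetical dictionary of known virus signatures (name -> byte pattern).
KNOWN_SIGNATURES = {
    "demo-virus-a": b"\xde\xad\xbe\xef",
    "demo-virus-b": b"EVIL-PAYLOAD",
}


def scan(data):
    """Return the names of all known signatures found in the file contents."""
    return sorted(name for name, pattern in KNOWN_SIGNATURES.items() if pattern in data)


def handle(data):
    """Try to remove any virus found; quarantine the file if removal does not work."""
    found = scan(data)
    if not found:
        return "clean", data
    cleaned = data
    for name in found:
        cleaned = cleaned.replace(KNOWN_SIGNATURES[name], b"")
    if scan(cleaned):
        return "quarantine", data  # removal failed, so keep the original isolated
    return "disinfected", cleaned


print(handle(b"hello world"))
print(handle(b"abEVIL-PAYLOADcd"))
```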

Computer security
Authentication
Authentication of legitimate users can be achieved by using passwords, biometric data, security tokens or digital certificates. For example, an organisation may only accept emails which have been digitally signed. This digital signature must be authenticated through a digital certificate issued by a trusted third party such as a certification authority.
Authorisation
An authorised user will have a user ID and password which have been recorded as belonging to an existing user. This user will be able to access certain features according to permissions granted by the system administrator. Passwords and encryption are used to keep data secret from unauthorised persons.
Accounting
Systems will create an account of every activity carried out on a network or individual computer. This means that security breaches can be detected as soon as possible and any compromised parts can be identified. This can be applied to Internet activity, where a log will be created storing the IP address of every website that has been visited.

3.6.13 Encryption

Encryption - using an algorithm and a key to convert message data into a form that is not understandable without the key to decrypt the text.
Plain text - message data before it is encrypted.
Cipher text - message data after it has been encrypted.
Decryption - using an algorithm and a key to convert encrypted message data into its plain text equivalent.
Cryptography - the science of designing cipher systems.
Cryptanalysis - trying to find the plain text from the cipher text without the decryption key.
Break the code - find the plain text from the cipher text by guessing or deducing the key.

The main uses of encryption are to store information securely and to transmit messages so that only the sender and the legitimate recipient can read them. Encryption is the process of using an encryption algorithm and an encryption key to convert plain text into a form that is not understandable without the key to decrypt the text.

Symmetric encryption
Symmetric encryption is the most straightforward of encryption techniques: it is possible to decrypt a message if the encryption algorithm and key are known.

Plain text:  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cipher text: W X Y Z A B C D E F G H I J K L M N O P Q R S T U V

Above is a simple substitution cipher which is created by moving each letter of the alphabet four places along. We can encrypt "CIPHER" to "YELDAN" and decrypt "AJYNULPEKJ" to "ENCRYPTION". The cipher, however, can be easily discovered by analysing the frequency of letters in passages of text and looking for expected words.
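The substitution table above can be reproduced with a short sketch. The function names are assumptions; the key is the shift of four, and the same key is used for both directions, which is what makes the cipher symmetric.

```python
import string

ALPHABET = string.ascii_uppercase
KEY = 4  # each plain text letter maps to the letter four places earlier (A -> W)


def encrypt(plain_text, key=KEY):
    """Replace every letter using the shifted alphabet; other characters are left alone."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) - key) % 26] if ch in ALPHABET else ch
        for ch in plain_text.upper()
    )


def decrypt(cipher_text, key=KEY):
    """Reverse the shift using the same key."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + key) % 26] if ch in ALPHABET else ch
        for ch in cipher_text.upper()
    )


print(encrypt("CIPHER"))      # YELDAN
print(decrypt("AJYNULPEKJ"))  # ENCRYPTION
```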

T R A N S
P O S I T
I O N C I
P H E R S
C A N B E
U S E F U
L T O O Z


A transposition cipher features the plain text written into a grid, row by row, with any remaining spaces being filled with a Z. We then produce the cipher text by reading the table contents column by column. The key is the number of columns used; here we have used a key of 5. We can encrypt the message "TRANSPOSITION CIPHERS CAN BE USEFUL TOO" into "TPIPCULROOHASTASNENEONICRBFOSTISEUZ", making the message unrecognisable.
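The grid method can be sketched as follows. The function names are assumptions; spaces are dropped before the grid is filled, as in the worked example, and decryption returns the text with its padding letter still attached.

```python
def transposition_encrypt(plain_text, columns, pad="Z"):
    """Write the letters into a grid row by row, then read them off column by column."""
    letters = [ch for ch in plain_text.upper() if ch.isalpha()]
    while len(letters) % columns != 0:
        letters.append(pad)  # fill the remaining spaces in the last row
    rows = [letters[i:i + columns] for i in range(0, len(letters), columns)]
    return "".join(row[col] for col in range(columns) for row in rows)


def transposition_decrypt(cipher_text, columns):
    """Rebuild the grid from its columns, then read it row by row."""
    row_count = len(cipher_text) // columns
    cols = [cipher_text[i:i + row_count] for i in range(0, len(cipher_text), row_count)]
    return "".join(cols[col][row] for row in range(row_count) for col in range(columns))


cipher = transposition_encrypt("TRANSPOSITION CIPHERS CAN BE USEFUL TOO", 5)
print(cipher)
print(transposition_decrypt(cipher, 5))
```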

Asymmetric encryption
In asymmetric encryption, or public key encryption, both parties who want to communicate securely have a pair of keys, a private key and a public key. The private key is kept secret and the public key is freely available to anyone. The encryption algorithm is also publicly available. A message encrypted with a private key can only be decrypted with the corresponding public key and vice versa.

Consider a scenario with two users A and B. If A encrypts a message with A's private key, then B (and anyone else who intercepts the message) can decrypt the message with A's public key. If A encrypts a message with A's public key, only A can decrypt the message with A's private key. If A encrypts a message with B's public key, only B can decrypt the message with B's private key.

Asymmetric encryption involves complicated calculations, so encryption and decryption are slow. Secure web browsing and e-commerce use a protocol known as Secure Sockets Layer (SSL). The website accessed by the browser will send its public key to the browser. The browser creates a symmetric key (known as a session key) that it sends to the website encrypted with the website's public key, so only the website can decrypt the symmetric key. This symmetric key is then used for the rest of the session.
1. B sends A its public key.
2. A uses B's public key to create a symmetric key.

[Diagram: B's public key and A's private key are combined to produce the symmetric key.]

Since both A and B can make an identical symmetric key, this can be used by A to encrypt any information that it needs to keep secret. B has its own private key and A's public key; therefore it will be able to decrypt the cipher text sent to it.
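The behaviour of a public and private key pair can be demonstrated with a toy version of RSA, one well-known asymmetric algorithm. The source does not name an algorithm, so RSA is an assumption here, and the tiny primes make this sketch completely insecure; it only shows that what one key of a pair encrypts, the other key decrypts. It needs Python 3.8 or later for `pow(e, -1, phi)`.

```python
def make_key_pair(p, q, e=17):
    """Build a toy RSA key pair from two small primes (for illustration only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # private exponent: the modular inverse of e
    return (e, n), (d, n)  # (public key, private key)


def apply_key(value, key):
    """Encrypt or decrypt a number smaller than the modulus with the given key."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


a_public, a_private = make_key_pair(61, 53)
b_public, b_private = make_key_pair(89, 97)

message = 65  # a message encoded as a small number

# A encrypts with B's public key: only B's private key recovers the message.
secret = apply_key(message, b_public)
print(secret, apply_key(secret, b_private))

# A encrypts with A's private key: anyone with A's public key can decrypt it.
signed = apply_key(message, a_private)
print(signed, apply_key(signed, a_public))
```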

Digital signatures and digital certificates

71

To prove that a message is genuine, sender A can digitally sign the message. This makes it possible to detect whether the message has been tampered with, and the signature is proof that it has been sent by A.

The processes required before A's message is sent to B are as follows:
1. The message is hashed to produce a message digest.
2. The message digest is encrypted with A's private key; this becomes the signature.
3. The signature is appended to the message.
4. The message is encrypted using B's public key.
5. The encrypted message is sent to B.

The processes required to ensure that the message received by B is genuinely from A are as follows:
1. B decrypts the message with B's private key.
2. B decrypts the signature with A's public key to retrieve the original message digest.
3. The decrypted message is hashed again to reproduce the message digest.
4. If the decrypted digest equals the reproduced digest, the message has not been tampered with.
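The signing and checking steps above can be sketched with a hash from Python's `hashlib` and textbook RSA arithmetic. Several things here are assumptions made for the demonstration: SHA-256 as the hash, RSA as the asymmetric algorithm, and a key built from two well-known Mersenne primes, which is not a safe way to choose real keys. The step that encrypts the whole message with B's public key is left out so that the sketch stays focused on the signature.

```python
import hashlib

# A's toy key pair (demonstration only; real keys use secret random primes).
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q                          # modulus shared by both of A's keys
E = 65537                          # A's public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # A's private exponent (Python 3.8+)


def digest_of(message):
    """Hash the message to produce a message digest, held as a number."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def sign(message):
    """Encrypt the digest with A's private key; the result is the signature."""
    return pow(digest_of(message), D, N)


def verify(message, signature):
    """Decrypt the signature with A's public key and compare it with a fresh digest."""
    return pow(signature, E, N) == digest_of(message)


original = b"Pay Fred 6 pounds"
signature = sign(original)
print(verify(original, signature))                # genuine message
print(verify(b"Pay Fred 600 pounds", signature))  # tampered message
```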
