http://www.hunterthinks.com/GRE Hunter Hogan

Table of Contents

Computer Science GRE Study Guide
Table of Contents
Document Organization
Comments?
Study Guide
    I. Software Systems and Methodology – 40%
    II. Computer Organization and Architecture – 15%
    III. Theory and Mathematical Background – 40%
    IV. Other Topics – 5%
Appendix
    Greek letters
    Mathematics
Index
Begging Copyright
Sources
Acknowledgements
Authority
Version and Updates

Document Organization

I have organized the information using the topics listed on the insert dated July 2003 that
comes with the test registration booklet. The website lists slightly different topics:
http://www.gre.com/subdesc.html#compsci

I refer to the practice test and its instructions on multiple occasions. I highly encourage
you to have a copy of the practice test available: ftp://ftp.ets.org/pub/gre/15255.pdf

A few topics that do not fit into this hierarchy are appended (e.g. Greek letters).

Bolded items are in place to allow you to skim the study guide. I have not had time to

bold everything that I think is important.

Some topics are presented in a less than ideal order. For example, we talk about NP-

completeness and reference Turing machines before we explain what a Turing machine

is. I do not plan to cross-reference the sections. I suggest using the index or find feature.

You may find the index at the end useful. On the other hand, I stopped adding new

entries to the index recently. I suspect that it includes 90% of all terms.

Comments?

Please send me your comments. CompSciGre@hunterthinks.com

© 2003 Hunter Hogan. Usage rights are detailed in this document.

Computer Science GRE Study Guide Version 1.0

Study Guide

I. Software Systems and Methodology – 40%

A. Data organization

1. Data types

a. Definition: Data types are assigned to blocks of information. The type tells

the program how to handle the specific information. For example, an integer

data type can be multiplied, but a Boolean data type cannot.

b. In C, we have a data type called a struct that allows us to create records with
fields. Declare a struct type using the form

struct node { int age;

char gender; };

If n is a variable of type struct node, access a field using the form “n.age”. If p is a
pointer to a struct node, use the form “p->age”.

c. A pointer is an unusual type. The value that it holds is a memory address. We
then use the memory address to manipulate the data held at that memory
address. In Pascal, to declare a pointer type use:

type Link = ↑type-name;

(In source code, the ↑ is typed as ^.) The ↑ before a type name means “pointer to”,
so ↑T is the type of pointers to values of type T. If you want to manipulate the data
that a pointer variable points at, then use the ↑ after the variable name. So, H↑ means
“the data that H points at”. Remember that we read English left to right; this might
help you remember what the ↑ operator does in which position. In C, to
declare a pointer, use the * operator: int *p;. To store the address of a variable
v in the pointer p use the statement p = &v; The ampersand (&) is the operator
that returns the address instead of the value. Finally, to store a value at the
address held in the pointer, use the * operator again; for example, *p = v; will
store the value of v at the location that p points to.
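The C rules above can be sketched in a few lines. This is only an illustration: the helper names set_age and store are made up for this guide and are not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* The record type from item b. */
struct node {
    int age;
    char gender;
};

/* Hypothetical helper: change a field through a pointer to a record. */
void set_age(struct node *p, int age)
{
    assert(p != NULL);
    p->age = age;          /* p->age is shorthand for (*p).age */
}

/* Hypothetical helper: store v at the address held in p. */
void store(int *p, int v)
{
    assert(p != NULL);
    *p = v;                /* writes to the location p points to */
}
```

To try it, declare a struct node n and an int v, then call set_age(&n, 31) and store(&v, 7). The & operator supplies the address, and the helpers write through it.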

2. Data structures and implementation techniques

a. Definition: A data structure is a way of organizing one or more data types

into groups. For example, many languages do not have a string data type. To

manipulate a string, you actually create a data structure that organizes multiple

character data types into one string data type.

b. There are times when we place items somewhere and then retrieve them later.
If you always retrieve the oldest item still stored, then we call it First-in/First-out
or FIFO (a queue). If you always retrieve the most recently stored item, then we call
it Last-in/First-out or LIFO (a stack).

c. A linked list is a simple data structure that uses cells (or elements). Each cell

has at least two data fields. At least one field holds the information that you

want stored in the list. The other field holds a pointer to the next cell. We

often call this simple linked list a singly linked list. A doubly linked list has

two pointers - one pointer points to the next cell, the other points to the

previous cell. If the last cell points to the first cell, then we call it a circularly

linked list. Interesting and important properties of lists include:

i. You cannot move backwards on a singly linked list

ii. In a singly linked list, to see any given cell you must move through all

previous cells

iii. The first and last cells present unique problems and you must deal with

them to avoid errors.

d. In Pascal, we declare linked lists like this:

type Link = ↑Cell;

Cell = record

Info : integer;

Next : Link

end;

The Link type is a pointer. The Cell is a record that has two parts - an

information field and a pointer (field) to the next cell. To use linked lists, use

the rules for pointers. If the list is sorted, then we call it a sorted list.
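For comparison, here is a rough C version of the same singly linked list. It is a sketch under my own naming (push_front, nth, free_list are not standard functions), and it shows two of the properties listed above: you reach a cell only by walking through the cells before it, and the first and last cells need special care.

```c
#include <assert.h>
#include <stdlib.h>

/* One cell: an information field and a pointer to the next cell. */
struct cell {
    int info;
    struct cell *next;
};

/* Put a new cell in front of the list and return the new first cell.
   If no memory is available, the list is returned unchanged. */
struct cell *push_front(struct cell *head, int info)
{
    struct cell *c = malloc(sizeof *c);
    if (c == NULL)
        return head;
    c->info = info;
    c->next = head;
    return c;
}

/* Return cell number n (counting from 0), or NULL if the list is shorter.
   Reaching cell n means stepping through every cell before it. */
struct cell *nth(struct cell *head, int n)
{
    assert(n >= 0);
    while (head != NULL && n > 0) {
        head = head->next;
        n--;
    }
    return head;
}

/* Release every cell; the last cell is recognized by its NULL next pointer. */
void free_list(struct cell *head)
{
    while (head != NULL) {
        struct cell *next = head->next;
        free(head);
        head = next;
    }
}
```

An empty list is just a NULL head pointer, which is why every function checks for NULL before following a pointer.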

e. We use a hash table to store elements so that we can search them in
constant average time. Unlike binary search trees, hash tables do not keep their
elements in sorted order. The table consists of an array of locations. To store an
element, we hash the value of the element using a hash function.

f. Hash functions use some property of the element to figure out where to store it
in the hash table. For example, we could take the numerical value of the
element (remember that all data in a computer is ultimately a binary number)
and divide by the size of the table (the number of locations in the array). We
then discard the quotient and keep the remainder as the result (the modulo
operation). We may also hash on a key that identifies the element instead of on
its whole value.

g. When programming hash functions, remember that a string treated as one number
is very large and will likely overflow before you reach the modulo operator. You must
take care to use a function and execution order that will produce valid and
well-distributed results. The result of the hash function is an index to our array. We
place the element in that location (i.e. Array[hash index value]). It is possible that
more than one element will hash to the same location (a collision). To make collisions
less likely, we try to make the hash table sufficiently large. When two values do hash
to the same location, we have many choices. The easiest choice is to place the
second element in the next available space (linear probing). Quadratic
probing is more complicated and yields better results because it spreads colliding
elements out instead of letting them cluster in neighboring locations.
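Here is a minimal sketch of items e through g in C. The table size of 11, the multiplier 31, and the function names are my own choices for illustration. The point is the execution order: the modulo is applied at every step, so the running value stays small, and a collision is resolved by linear probing.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 11   /* hypothetical size, chosen only for this sketch */

/* Hash a string. Taking the remainder at every step keeps the running
   value below table_size, so the string never becomes one huge number. */
size_t hash_string(const char *s, size_t table_size)
{
    assert(table_size > 0);
    size_t h = 0;
    for (; *s != '\0'; s++)
        h = (h * 31 + (unsigned char)*s) % table_size;
    return h;
}

/* Insert with linear probing. Empty locations hold NULL.
   Returns the index used, or -1 if the table is full. */
int insert(const char *table[], size_t table_size, const char *key)
{
    size_t start = hash_string(key, table_size);
    for (size_t i = 0; i < table_size; i++) {
        size_t slot = (start + i) % table_size;   /* wrap around at the end */
        if (table[slot] == NULL) {
            table[slot] = key;
            return (int)slot;
        }
    }
    return -1;
}
```

If two keys hash to the same index, the second one lands in the next free location, wrapping around to index 0 after the last location.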

h. Dynamically allocated data structures (e.g. linked lists) are given memory from
the heap. From time to time, a memory area becomes unreachable because no
pointer references it any longer. When this happens, we would like the memory to be
deallocated, but the program has no way to make that happen with code, since it can
no longer reach the memory. In systems that provide it, the process of garbage
collection is run by the system to deallocate this memory.

i. Binary trees are hierarchical graphs that have exactly one root and no more
than two branches per node. The height or depth of a tree is the number of
links on the path from the root to the leaf that is farthest away. The breadth of the
tree refers to the nodes that are the same distance from the root. The
top level of the tree is considered level 0. The maximum number of nodes you
can have at level n is 2^n.

j. If a node has no sub nodes, then we call it a leaf.

k. A strictly binary tree is a tree where all non-leaf nodes have two sub nodes.

l. A complete binary tree is a strictly binary tree of height h where all of the leaf

nodes are at level h.

m. A binary tree is a binary search tree if all of the keys (values of the nodes)

on the left side of a node are less than the node and all of the keys on the right

side of the node are greater than the node.

n. Three common methods for traversing the nodes of the tree are preorder,

inorder, and postorder. The practice test lists the precise order of traversal for

each in the instructions. It is useful to recognize that inorder traversal will

properly order the values for a binary search tree. Preorder traversal starts at

the top and ends in the lower right. Postorder traversal starts in the lower left

and ends at the top.
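The three traversals differ only in where the node itself is visited. The following C sketch uses my own names (struct tnode and the three functions are illustrative); each function appends keys to an array and returns how many keys have been written so far.

```c
#include <assert.h>
#include <stddef.h>

struct tnode {
    int key;
    struct tnode *left, *right;
};

/* Preorder: node, then left subtree, then right subtree. */
size_t preorder(const struct tnode *t, int out[], size_t n)
{
    assert(out != NULL);
    if (t == NULL)
        return n;
    out[n++] = t->key;
    n = preorder(t->left, out, n);
    return preorder(t->right, out, n);
}

/* Inorder: left subtree, then node, then right subtree. */
size_t inorder(const struct tnode *t, int out[], size_t n)
{
    assert(out != NULL);
    if (t == NULL)
        return n;
    n = inorder(t->left, out, n);
    out[n++] = t->key;
    return inorder(t->right, out, n);
}

/* Postorder: left subtree, then right subtree, then node. */
size_t postorder(const struct tnode *t, int out[], size_t n)
{
    assert(out != NULL);
    if (t == NULL)
        return n;
    n = postorder(t->left, out, n);
    n = postorder(t->right, out, n);
    out[n++] = t->key;
    return n;
}
```

For a binary search tree with root 2, left child 1, and right child 3, inorder produces 1 2 3 (sorted), preorder produces 2 1 3, and postorder produces 1 3 2.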

o. A balanced binary search tree is a search tree whose height is kept
proportional to log n, where n is the number of nodes in the tree.

p. An AVL tree is a balanced binary search tree with an additional rule. From any
node, the height of the left and right sub trees can differ by no more than 1.

q. A priority queue is any data structure that is only concerned with the highest

priority item. It only provides 3 operations: insert, locate the highest priority

item, and remove the highest priority item.

r. Since you do not need to organize all of the elements to make a priority queue,
it makes no sense to use highly ordered data structures like binary search
trees. A binary heap is a tree where the highest priority node is placed at the
top of the tree, and no node has a higher priority than its parent. You can think of
this as throwing the highest priority item on
the top of the heap. Without going into painful detail, it is easy to implement a
tree like this using an array. The elements in the array start at the root, then the
first level, then the second level, etc. When you remove the highest priority
item, you do not have to reorder all of the elements, just some of them.
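To make the array layout concrete, here is a small sketch in C. It assumes that a smaller number means a higher priority (a min-heap) and uses a fixed capacity; the names and the capacity are mine. With the root at index 0, the children of index i sit at 2i+1 and 2i+2, and its parent at (i-1)/2.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_CAP 16   /* hypothetical fixed capacity for this sketch */

struct heap {
    int a[HEAP_CAP];  /* level by level: root, first level, second level, ... */
    size_t n;         /* number of items currently stored */
};

static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Insert: place the item at the end, then move it up past lower-priority parents. */
void heap_insert(struct heap *h, int v)
{
    assert(h->n < HEAP_CAP);
    size_t i = h->n++;
    h->a[i] = v;
    while (i > 0 && h->a[(i - 1) / 2] > h->a[i]) {
        swap(&h->a[(i - 1) / 2], &h->a[i]);
        i = (i - 1) / 2;
    }
}

/* Remove the highest priority item: move the last item to the root,
   then push it down one path only. The rest of the array is untouched. */
int heap_remove_min(struct heap *h)
{
    assert(h->n > 0);
    int top = h->a[0];
    h->a[0] = h->a[--h->n];
    size_t i = 0;
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < h->n && h->a[l] < h->a[m]) m = l;
        if (r < h->n && h->a[r] < h->a[m]) m = r;
        if (m == i)
            break;
        swap(&h->a[i], &h->a[m]);
        i = m;
    }
    return top;
}
```

Inserting 5, 3, 8, 1 and then removing four times returns 1, 3, 5, 8. Each operation touches only one root-to-leaf path, which is the "just some of them" mentioned above.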

B. Program control and structure

1. Iteration and recursion

a. Definition: Iteration is the process of performing the same actions multiple

times. Loops are the best examples of iteration.

b. Definition: Any function that calls itself is recursive. It looks a lot like
iteration, but a loop normally has a built-in way to end (i.e. a conditional check).
In a recursive function, you must write that check yourself; a poorly constructed
recursive function could leave it out and never stop.

c. We use loops or iteration to perform the same sequence of actions repeatedly.

d. There are 3 basic loops in C and Pascal: fixed repetition, pretest, and posttest.

i. Fixed repetition in Pascal uses the form for <assignment> to <value> do
<statement>. The statement is the set of actions that the program repeats.
The for/to section controls the loop. Since this loop only counts and does not
test any condition that you write, it will repeat a fixed number of times.

ii. The pretest style loop will perform a conditional check before it executes

the statement. Pascal uses the form while <conditional check> do

<statement>.

iii. The posttest loop performs the conditional check at the end of the
statement. This has a major implication: this loop will always perform at least one
iteration. The Pascal form of the posttest loop is repeat <statement> until
<conditional check>.
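The Pascal forms above have C counterparts: for, while, and do/while. The sketch below uses function names of my own. One caveat: the C for loop is really a pretest loop with a counter attached, so "fixed repetition" in C is a convention that holds only if the body leaves the counter alone.

```c
#include <assert.h>

/* Fixed repetition: the body runs exactly n times (for n >= 0). */
int sum_for(int n)
{
    assert(n >= 0);
    int sum = 0;
    for (int i = 1; i <= n; i++)
        sum += i;
    return sum;
}

/* Pretest: the condition is checked first, so the body may run zero times. */
int passes_while(int n)
{
    int passes = 0;
    while (n > 0) {
        n /= 2;
        passes++;
    }
    return passes;
}

/* Posttest: the body runs before the first check, so it always runs at least once. */
int passes_do_while(int n)
{
    int passes = 0;
    do {
        n /= 2;
        passes++;
    } while (n > 0);
    return passes;
}
```

Calling both counting functions with 0 shows the difference: the pretest loop makes no passes, while the posttest loop makes one.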

e. Recursive functions are functions that call themselves. We use them because
they are often easy to code: they translate very easily from a recursive
mathematical function. There are four rules to remember when implementing
recursive functions [1]: 1) always have a base case that is solved without calling
itself, 2) progress towards that base case, 3) have faith (believe that the
recursive call will work), and 4) avoid duplication of work: do not solve the
same instance of a case more than once. If you leave out the base case, then
the recursion will never terminate. If you break rule 4, then you will probably have an
algorithm with exponential time complexity. Remember that loops are sometimes
more efficient and easier than recursive functions.
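The Fibonacci numbers are the usual illustration of rule 4. The sketch below is mine, not taken from Weiss: fib_naive follows the mathematical definition directly and solves the same instances over and over, while fib_memo remembers each answer in a table so that every instance is solved once. The limit of 90 is an assumption chosen so that results fit in a long long.

```c
#include <assert.h>

#define MAX_N 90   /* assumed limit so that results fit in a long long */

/* Breaks rule 4: fib_naive(n - 2) is recomputed inside fib_naive(n - 1),
   so the number of calls grows exponentially with n. */
long long fib_naive(int n)
{
    assert(n >= 0);
    if (n < 2)                 /* base case, solved without a recursive call */
        return n;
    return fib_naive(n - 1) + fib_naive(n - 2);
}

static long long memo[MAX_N + 1];   /* 0 means "not computed yet" */

/* Follows rule 4: each instance is solved once and then looked up. */
long long fib_memo(int n)
{
    assert(n >= 0 && n <= MAX_N);
    if (n < 2)
        return n;
    if (memo[n] != 0)
        return memo[n];
    memo[n] = fib_memo(n - 1) + fib_memo(n - 2);
    return memo[n];
}
```

Both functions agree on small inputs, but only fib_memo is practical for something like n = 50.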

f. Each call to a recursive function creates a structure called an activation record.

The major issue with having the same function called more than once is that

each instance of the function needs to keep its local variables separate from

the other calls. When the function is called, the activation record details

certain conditions for the call. When the function ends, the computer needs to

know the memory location to execute next (the return address). In a stack-

based programming language, the computer keeps track of recursive calls by

placing each activation record on a stack. More specifically, the stack holds

pointers to each of the activation records (which hold the information about

how to access the function and related non-local variables and types).

g. Since recursive functions require activation records and other resources, they

are usually slower than the equivalent iteration.

2. Procedures, functions, methods, and exception handlers

a. Definition: A procedure and a function are very similar animals. They allow

the programmer to call them from multiple places in the program. If the called

subroutine returns a value, then we typically call it a function.

b. Definition: In object-oriented programming, we have a special type of

subroutine called a method. A method is defined in a class to work only on

objects in that same class.

c. Definition: From time-to-time, the software or hardware may want or need to

change the normal execution of the program. To do this, an exception is

generated. Your program or operating system must have some way to deal

with the exception; the code that processes the exception is called the

exception handler.

d. A macro is very similar to a function or procedure. Originally, when you used

a macro, the compiler literally substituted the text of the macro for the name

of the macro. Therefore, if your program called the macro twice then the

compiler inserted the code into the program twice. Compare this to a function

call; if you call a function twice, the compiler still only inserts the code once

into the program.
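A small C sketch makes the text substitution visible. The names here are mine. Because the macro pastes its argument text wherever the parameter appears, an argument with a side effect gets evaluated once per appearance, while a function argument is evaluated once per call.

```c
#include <assert.h>

/* Macro: the preprocessor pastes this text in at every use. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Function: the code exists once; each call evaluates the argument once. */
int square_func(int x) { return x * x; }

static int calls = 0;

/* Returns 1, 2, 3, ... and counts how often it has been called. */
int next_value(void) { return ++calls; }

/* How many times does next_value() run when passed to the function? */
int evaluations_with_function(void)
{
    calls = 0;
    int sq = square_func(next_value());
    assert(sq == 1);           /* 1 * 1 */
    return calls;
}

/* How many times does next_value() run when passed to the macro? */
int evaluations_with_macro(void)
{
    calls = 0;
    int sq = SQUARE_MACRO(next_value());   /* ((next_value()) * (next_value())) */
    assert(sq == 2);           /* 1 * 2: the call text was pasted in twice */
    return calls;
}
```

The extra parentheses in the macro body matter as well: without them, SQUARE_MACRO(1 + 2) would expand to 1 + 2 * 1 + 2.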

3. Concurrency, communication, and synchronization

[1] Weiss

a. Definition: Concurrency refers to the capability of a system to run multiple

programs at the same time. This is commonly referred to as multi-tasking.

This is very different from multiple processors.

b. Definition: Coordinating multiple sub-routines that may pass information to

each other can be a tricky task. There are many techniques for communication

and synchronization.

c. We often produce data with one function and then use it in another function.
We say that these two functions have a producer-consumer relationship.
This relationship can generate problems if they operate at different rates. To
solve this, we implement a buffer. The producing function places the data in
the buffer and the consuming function takes the oldest data out of the buffer.
(The buffer is FIFO.) Unfortunately, the buffer itself generates new issues that
we must deal with. There are four basic implementation techniques for the
buffer.

i. The most basic is that both functions have direct access to the buffer.
This is poor design. It means that any function that produces must make
sure that the buffer is not full. When it is full, there might be multiple
functions waiting to place data in the buffer. Since all functions are
independent of each other, there are no rules of precedence. Furthermore, it is
relatively difficult to make sure that the buffer stays FIFO when both the
producer and consumer could be accessing the buffer at the same time.

ii. The second way to implement the buffer is to use a semaphore. The semaphore
acts as a synchronizer and does not allow simultaneous access to the buffer.
Semaphores can force something called busy-waiting: when the buffer
is full or empty, the semaphore forces the appropriate function to loop in place
waiting for a change. This looping consumes processor cycles and is
inefficient.

iii. Monitors are the third implementation technique. Monitors also
allow only one function to have access to the buffer, but they do so in a more
stringent way. The problem is that with this extra security, if one function gets
control of the monitor and cannot continue (e.g. a producer function when the
buffer is full), then the process has to abort and the next process is offered
control. Remember that monitors can be inefficient.

iv. The most modern method for buffers is to make the buffer a totally separate
function. With this technique, the producer does not even attempt to produce if the
buffer is full and the consumer does not attempt to take something off the buffer
when the buffer is empty. This is the most efficient technique we currently have.
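Here is a sketch of the FIFO buffer itself in C, written as a circular array. The names and the capacity of 4 are my own. It only shows the bookkeeping: a put is refused when the buffer is full and a get is refused when it is empty. It deliberately leaves out the synchronization (semaphore, monitor, or separate buffer process) that real concurrent code would need, so it is safe only when one function at a time touches the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUF_CAP 4   /* hypothetical capacity for this sketch */

struct fifo {
    int items[BUF_CAP];
    size_t head;    /* index of the oldest item */
    size_t count;   /* number of items currently stored */
};

/* Producer side: returns false (and stores nothing) if the buffer is full. */
bool fifo_put(struct fifo *b, int item)
{
    assert(b != NULL);
    if (b->count == BUF_CAP)
        return false;
    b->items[(b->head + b->count) % BUF_CAP] = item;
    b->count++;
    return true;
}

/* Consumer side: returns false if the buffer is empty,
   otherwise hands back the oldest item through out. */
bool fifo_get(struct fifo *b, int *out)
{
    assert(b != NULL && out != NULL);
    if (b->count == 0)
        return false;
    *out = b->items[b->head];
    b->head = (b->head + 1) % BUF_CAP;
    b->count--;
    return true;
}
```

Putting 1, 2, 3, 4 fills the buffer, a fifth put is refused, and the gets then return the items in the order they went in.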

d. There are times when two (or more) functions are in deadlock. For example,

imagine two functions that send information to each other, but cannot because

all buffers are full. Neither function will attempt to empty a buffer, because

they want to place something in a buffer. There are many ways to deal with

deadlock. You could require that a process request exclusive access to the

resources necessary to complete the operation before beginning the operation.

If the resources are not available, the process waits. You can prioritize the

resources and require that all processes request the resources in order. You

can have processes time out and restart if they wait too long. You can have the

operating system restart processes if the wait queues back up.

C. Programming languages and notation

1. Constructs for data organization and program control

a. Definition: In this section, data organization is not referring to anything like

data structures. There are many ways to think about information and

programming. Object-oriented programming has come to dominate the way

we organize data and our programs. Older languages used imperative

programming or functional programming. Prolog is a niche language that
depends on a logic programming organization. The test seems to focus on

imperative languages like C and Pascal, with a few questions about object-

oriented design and Prolog.

b. Website: A great Pascal tutorial can be found here.

http://web.mit.edu/taoyue/www/tutorials/pascal/

c. Website: A decent C tutorial can be found here.

http://members.tripod.com/~johnt/c.html (If you know of a better one, please

email me.)

2. Scope, binding, and parameter passing

a. Definition: All data and procedures in a program have a name. The scope of

that name refers to which parts of the program can access the name. For

example, if a variable is local to a function, then it can only be used in that

function.

b. Definition: When a name is attached to a specific piece of data or function

then we say they are bound together.

c. Definition: If you call a procedure and provide values for the called procedure

to use, then we call it parameter passing. The values are the parameters.

d. We use names to identify variables, types, functions, etc. With lexical scope (or
static binding), a name is resolved at compile time from the structure of the
program text. With dynamic scope, a name is resolved at run time, at the time the
call is made to the name.

e. There are four ways to pass parameters between functions (procedures).
Call-by-value passes only the value of the parameter. In call-by-value, the original
variable and its value become disconnected from the copy, so changes made by the
called function do not reach the original. Call-by-reference passes the
memory location of the original variable. So, the new variable (in the called
function) holds a memory location and not a value. In call-by-reference, if
we change the new variable, then the original variable is also changed.
Call-by-value-result (also known as copy-in/copy-out) also changes the original
variable, but it only changes the variable at the end of the called function.
Call-by-name passes the name of the argument. This differs from call-by-reference,
because it does not pass a memory location. Arguments to macros
are a good example of call-by-name.
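C itself only has call-by-value, so the sketch below imitates the other styles by passing a pointer. Treat it as an approximation under that assumption; the function names are mine.

```c
#include <assert.h>
#include <stddef.h>

/* Call-by-value: n is a copy, so the caller's variable is not changed. */
void increment_copy(int n)
{
    n = n + 1;
    (void)n;               /* the updated copy is simply discarded */
}

/* Imitated call-by-reference: the parameter holds a memory location,
   and every change goes straight to the original variable. */
void increment_in_place(int *n)
{
    assert(n != NULL);
    *n = *n + 1;
}

/* Imitated call-by-value-result: copy in, work on the copy,
   and copy the result out only at the end of the function. */
void increment_copy_back(int *n)
{
    assert(n != NULL);
    int local = *n;        /* copy in */
    local = local + 1;
    *n = local;            /* copy out */
}
```

Starting from v = 5, increment_copy(v) leaves v at 5, increment_in_place(&v) makes it 6, and increment_copy_back(&v) makes it 7.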

3. Expression evaluation

a. Definition: Any set of code that returns a value when it terminates is an

expression. Mathematical formulas are the best example of this (e.g. 3+4).

The order of operations of a mathematical expression is an example of

expression evaluation in a programming language.

D. Software engineering

1. Formal specifications and assertions

a. Definition: Whatever. (If you think I should put something here, email me.)

2. Verification techniques

a. Definition: Whatever. (If you think I should put something here, email me.)

3. Software development models, patterns, and tools

a. Definition: Whatever. (If you think I should put something here, email me.)

E. Systems

1. Compilers, interpreters, and run-time systems

a. Definition: Programming languages are for the convenience of the

programmer, not the computer. A compiler converts the code of the

programming language into either assembly language or machine code.

b. Definition: It is possible to directly execute the code of a programming
language without compiling it; just use an interpreter. Think of the command

line in an operating system; it is often called the command interpreter because

the code is executed without compiling it first.

c. Definition: Most programs are designed to run in a specific operating system

on a specific platform. When this is the case, the programmer (and compiler)

do not have to worry about certain things like memory allocation and program

startup – the run-time system will handle these and other general issues.

2. Operating systems, including resource management and protection/security

a. Definition: An operating system is the software that provides an environment

for other programs to run in. Through various run-time systems, applications

gain access to other hardware and software resources.

b. Some operating systems allow multiple users to use the computer resources at

the same time (e.g. mainframes). Requiring each program to have its own

memory space for each user that is executing the program is inefficient. If the

program never modifies itself then we call it reentrant. Reentrant programs

are more efficient because all users can use the exact same set of instructions,

thereby saving memory.

c. Memory-mapped I/O is a method of communicating with I/O devices using

the same bus as the main memory. The alternative to using the same bus is to

have a separate bus and instruction set for devices. To access one of the

devices, you may need to use registers, but often the address is still the

primary concept. We often say that the memory address used to access the I/O

device is the I/O port.

3. Networking, Internet, and distributed systems

a. Definition: If you connect multiple computers together, then we say they are

networked. The Internet is the largest and most well known network.

b. Definition: There are certain programs that are actually running on multiple

computers at the same time. I do not mean 10,000 different people playing

10,000 different games of minesweeper. I mean 10,000 computers running

what amounts to one program. The best and most well known example of this

is the DNS (domain name system) service used on the Internet. Thousands of

computers work together to provide what amounts to one massive program

called DNS. This is an example of a distributed system.

4. Databases

a. Definition: A database is a bunch of data structures in one place. We usually
refer to something as a database if it organizes the data in some efficient
manner and provides a way to view and update the data.

5. System analysis and development tools

a. Definition: Whatever. (If you think I should put something here, email me.)

II. Computer Organization and Architecture – 15%

A. Digital logic design

1. Implementation of combinational and sequential circuits

a. Definition: Sequential circuits have memory, and the output depends on the

input and the memory. Combinational circuits do not have memory and the

output is based only on the input.

b. A flip-flop (or latch) is a simple memory element that we use to store a value.

The flip-flop has two inputs and one output. (One of the inputs is usually the

clock.) When the clock is asserted, then you can change the stored value of the

flip-flop (the latch is said to be open).

2. Optimization and analysis

a. Definition: Whatever. (If you think I should put something here, email me.)

B. Processors and control units

1. Instruction sets

a. Definition: An instruction set is all of the machine instructions that a

processor understands.

b. An instruction set is all of the machine language instructions that the

designers of a processor provide for computation. For a general-purpose

computer, the instruction set should include instructions to perform arithmetic,

logical operations, data transfers, conditional branches, and unconditional

jumps. An individual instruction is literally made up of a certain number of

bits in a specific order. The Intel x86 architecture has a huge set of

instructions and the instructions can have variable bit lengths. Other

instruction sets keep the instruction length the same for all instructions to keep

the hardware simple and fast. When the processor receives an instruction,
it must first decide what type of instruction it is. This can be based on the
length of the instruction and/or certain flag bits in the instruction.

2. Computer arithmetic and number representation

a. Definition: Since computers use binary and cannot perform arithmetic on

infinite numbers, it is important to understand the limitations and processes of

a system.

b. When humans write numbers, the smallest digit is the rightmost digit. Since

computers do not need to represent numbers from right to left, we use the

terms least significant bit and most significant bit instead.

c. There are multiple methods to represent the sign of a given number. Sign and
magnitude is the most obvious: we simply use one bit to
represent positive or negative. This system has major problems; modern
computers do not use it for integers. Now imagine a computer that did not use any
sign, and you try to subtract a large number from a small one. The result is that you
will borrow from a string of zeros leaving a string of ones. This makes it
natural to represent positive numbers with a bunch of leading zeros and
negative numbers with a bunch of leading ones. We call this system two’s
complement. Complementing a binary number means changing a 0 to a 1 and
a 1 to a 0. To negate a two’s complement number, complement every bit and then
add 1.

d. One’s complement is not used very often. To form a negative number, start
with the positive version and then complement the entire number. This system
causes some unusual problems like operating on negative zero.

e. Computer addition works a lot like addition with pen and paper. The amount

that is carried to the next operation is called the carry out. The amount

carried from the previous operation is called the carry in. We use an end-

around carry when performing some addition functions on one’s

complement numbers, but not two’s complement numbers.

f. If you add two positive numbers together and get a negative number, then

overflow has occurred.
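The representations in items c through f can be tried out on 8-bit values in C. This is a sketch with names of my own; it stores the bit patterns in uint8_t so that the wrap-around is well defined. The overflow check uses a standard trick: signed overflow happened exactly when both operands have the same sign bit and the sum has a different one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One's complement negation: complement every bit. */
uint8_t ones_complement_negate(uint8_t x)
{
    return (uint8_t)~x;
}

/* Two's complement negation: complement every bit, then add 1. */
uint8_t twos_complement_negate(uint8_t x)
{
    uint8_t neg = (uint8_t)(~x + 1u);
    assert((uint8_t)(x + neg) == 0);   /* x + (-x) wraps around to zero */
    return neg;
}

/* True if adding a and b as 8-bit two's complement numbers overflows. */
bool add_overflows(uint8_t a, uint8_t b)
{
    uint8_t sum = (uint8_t)(a + b);
    return ((a ^ sum) & (b ^ sum) & 0x80u) != 0;
}
```

For example, negating 00000101 (5) gives 11111010 in one's complement and 11111011 in two's complement. Negating zero in one's complement gives 11111111, the negative zero mentioned above. Adding 100 and 100 overflows because the 8-bit result has its leading bit set and reads as a negative number.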

3. Register and ALU organization

a. Definition: Registers are extremely fast memory in the processor that is used

during calculations performed by the instruction set.

b. Definition: The arithmetic logic unit (ALU) is the part of the processor that

actually performs addition, multiplication, and Boolean operations.

c. Caveat: I am weak in the area of ALU and processor architecture. I find it

boring and plan to skip any questions about this on the test. I am guessing that

as many as three questions will relate to an understanding of exactly how

instructions are processed. I recommend you look over your computer

architecture texts.

4. Data paths and control sequencing

a. Definition: A processor has two general parts: the data path and the control.

The data path moves the information around the processor and performs

calculations; the control directs the actions of the data path.

b. We implement job-scheduling queues at many levels in the machine. The

simplest and most obvious is FIFO (first in/first out); this processes the oldest

job first. LIFO is the opposite; it processes the youngest job first. Priority

queuing processes the job with the highest priority first. You can also process

the shortest job first. Shortest job first produces the smallest average

turnaround time. Lastly, you can force the jobs to share the system resources

by using round-robin. Round-robin uses an interval timer to force processing

of a program to suspend and allow others to run.
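A small worked example shows why shortest job first gives the smallest average turnaround time. The C sketch below uses my own function names and assumes that all jobs arrive at time 0 and run back to back without interruption. Under that assumption, a job's turnaround time is simply the time at which it finishes.

```c
#include <assert.h>
#include <stdlib.h>

/* Average turnaround when the jobs run in array order (FIFO),
   assuming every job arrives at time 0. */
double average_turnaround(const int length[], int n)
{
    assert(n > 0);
    long finish = 0, total = 0;
    for (int i = 0; i < n; i++) {
        finish += length[i];   /* this job ends after all earlier jobs */
        total += finish;
    }
    return (double)total / n;
}

/* Comparison function for qsort: shorter jobs first. */
static int by_length(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Shortest job first: sort the jobs by length (in place), then run them in order. */
double average_turnaround_sjf(int length[], int n)
{
    assert(n > 0);
    qsort(length, (size_t)n, sizeof length[0], by_length);
    return average_turnaround(length, n);
}
```

With job lengths 8, 2, 2, FIFO finishes the jobs at times 8, 10, and 12, for an average of 10. Shortest job first finishes them at 2, 4, and 12, for an average of 6, because the long job no longer delays the two short ones.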

C. Memories and their hierarchies

1. Performance, implementation, and management

a. Definition: Because fast memory is significantly more expensive than slow

memory, you cannot load up your system with fast memory. So, you must

have different speeds of memory. Managing that memory is a sophisticated

operation.

2. Cache, main, and secondary storage

a. Definition: In older systems, the processor would only talk directly to main

memory. This memory was too small and lost data when it lost power, so we

created secondary storage like hard drives.

Page 10 of 29

© 2003 Hunter Hogan. Usage rights are detailed in this document.

Computer Science GRE Study Guide Version 1.0

http://www.hunterthinks.com/GRE Hunter Hogan

b. Definition: Main memory is often so slow and so large (which makes it

slower to find anything), that system performance is horrible. Cache memory

is smaller and faster. It helps speed up transfers between main memory and

the processor.

c. We use cache to speed up access to instructions and data by the processor.

Cache is usually very fast, low capacity memory that is between the processor

and the main physical memory. Modern processors have level 1 cache that is

very small and on the same die as the processor, and level 2 cache that is in

the same package as the processor but larger than the level 1 cache. Since

cache cannot typically hold all of the data and instructions that we want access

to, we try to make sure that the ones we will want next are in the cache. When

the processor makes a memory request, it should first check whether the

requested information is already in cache. Addressing for cache is usually rather

simple. Since the size of main memory is a power-of-two multiple of the cache size, we can

use a system called direct mapping. For example, if the address of the

information in memory is 101011010101, then the address location in cache

might be 0101. In this system, when we put information from memory into

cache, then we only place it into the cache location that has the same ending

address as the original location. Each cache location has an additional field (the tag)

that stores the leading bits of the original memory address (10101101 in this example), so we know what is currently

stored in the cache location. The advantage of this system is that processor

accesses to cache are extremely fast. If the processor first had to look up the

cache location, then it would add overhead to every memory access. The

disadvantage is that many memory locations map to the same single

cache location. It is impossible to have two conflicting locations in cache at

the same time. So, when the processor performs a memory access, it first

looks in cache. It starts by looking at the cache location that corresponds with

the memory location. If the information is not in cache (a cache miss), then

the information is read from memory and placed into cache. Why place it into

cache? When a program executes, a piece of data or an instruction that is used

is often reused very soon afterwards (think of loops). We call this temporal

locality. The processor might also copy the information near the requested

memory address into cache. Why copy information that has not been

explicitly requested? Programs are largely linear, meaning that most programs

run an instruction and then run the very next instruction (instead of jumping to

a different location in the program). So, we often copy the next memory

address to cache to take advantage of this spatial locality.
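
The direct mapping scheme above can be sketched as follows. This is only a toy model: the 16-location cache, the class name, and the memory contents are assumptions made for the example.

```python
INDEX_BITS = 4                      # 16 cache locations (assumed size)


class DirectMappedCache:
    def __init__(self):
        # Each location holds (tag, data), or None when it is empty.
        self.lines = [None] * (1 << INDEX_BITS)
        self.hits = 0
        self.misses = 0

    def read(self, address, memory):
        index = address & ((1 << INDEX_BITS) - 1)   # low-order bits pick the location
        tag = address >> INDEX_BITS                 # remaining high-order bits
        line = self.lines[index]
        if line is not None and line[0] == tag:
            self.hits += 1
            return line[1]
        self.misses += 1                            # cache miss: fetch and replace
        data = memory[address]
        self.lines[index] = (tag, data)
        return data


memory = {addr: addr * 2 for addr in range(1 << 12)}   # toy 12-bit memory
cache = DirectMappedCache()
a = 0b101011010101      # the address from the text; it maps to location 0101
b = 0b111111110101      # a different address that maps to the same location
cache.read(a, memory)   # miss
cache.read(a, memory)   # hit (temporal locality)
cache.read(b, memory)   # miss, and it evicts a
cache.read(a, memory)   # miss again because of the conflict
print(cache.hits, cache.misses)
```

The last two reads show the disadvantage of direct mapping: two addresses that share a cache location keep evicting each other.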

d. An alternative to direct mapping is set-associative cache. Instead of forcing

the information to only one cache location, we allow it to go to any of a set of

locations. However, we still do not let the information be placed in any

location at all (that is fully associative cache). Since the location of the

information in cache is still related to the location of the information in

physical memory, we gain a performance advantage over a fully associative

cache.

3. Virtual memory, paging, and segmentation

a. Definition: Virtual memory is secondary storage that is logically treated as

main memory. Performance can be terrible, but it is often necessary to present

the illusion of a very large main memory space. Since the processor cannot

talk directly to secondary storage, it is necessary to move information between

the main memory and the virtual memory (on the secondary storage). If you

move a fixed amount of information each time, it is called paging.

Segmentation allows you to move variable amounts.

b. Virtual memory is the system that allows us to have memory space on

secondary storage (the hard drive). In most situations, it is not possible to fit

all instructions and data into physical memory, especially when we are

running multiple programs or manipulating large data structures (databases).

Virtual memory uses almost identical technology and techniques as caching.

However, we have developed a different vernacular for describing virtual

memory.

c. We move instructions and data to the disk in units called pages. We tend to

make pages large to take advantage of spatial locality (Think KB instead of

bytes). When a requested page is not in main (physical) memory, and we must

retrieve it from the disk (virtual memory), then we call it a page fault. Keep

in mind that a disk access can take up to one million instruction cycles. This

fact creates a massive penalty for page faults.

d. Since page faults are very expensive, we use a fully associative paging table,

we sometimes track the least recently used (LRU) pages in physical memory,

and we mark unchanged pages so that we do not have to write them back to

disk.

e. We might also track the frequency of use of each page and page the least

frequently used (LFU) pages back to disk first. In a fully associative page

table system, we store the actual location for every possible virtual page. In

the page table, we only store the base address of the page in memory. When a

program wants to access a location in memory, it will construct the call using

the virtual page and the offset from the base of that page. Look at question

#18 on the practice test. The job makes the call [0, 514]. The 0 refers to the

virtual page, not the actual page. The virtual memory system translates the 0

into 3 using the page table. Since each page has 1024 locations, the actual

base address is 3072. Now add the offset (514). The actual address is 3586.
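
The translation in this example can be written as a short sketch. The page table below contains only the one mapping mentioned above (virtual page 0 to actual page 3); the function name is an assumption for illustration.

```python
PAGE_SIZE = 1024        # locations per page, as in the example


def translate(virtual_page, offset, page_table):
    """Turn a (virtual page, offset) call into an actual address."""
    if not 0 <= offset < PAGE_SIZE:
        raise ValueError("offset is outside the page")
    actual_page = page_table[virtual_page]   # a missing entry would model a page fault
    return actual_page * PAGE_SIZE + offset  # base address plus offset


page_table = {0: 3}     # only the mapping given in the text
print(translate(0, 514, page_table))   # 3 * 1024 + 514 = 3586
```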

D. Networking and communications

1. Interconnect structures (e.g. buses, switches, routers)

a. Caveat: I do not have time to put any information in this section. According

to people that took the test in November 2003, there are networking questions

on the test.

2. I/O systems and protocols

a. Caveat: I do not have time to put any information in this section. According

to people that took the test in November 2003, there are networking questions

on the test.

3. Synchronization

a. Caveat: I do not have time to put any information in this section. According

to people that took the test in November 2003, there are networking questions

on the test.

E. High-performance architectures

1. Pipelining superscalar and out-of-order execution processors

a. Definition: If tasks must be executed in a specific order, then we say they are

scalar. However, when trying to complete the same task multiple times, it may

be possible to perform one step from each of the iterations at the same time.

b. Most modern processors use pipelining: overlapping the stages (fetch, decode, execute, and so on) of several instructions so that multiple instructions are in progress at

the same time.

2. Parallel and distributed architectures

a. Definition: Many programs lend themselves to multiple calculations being

performed simultaneously. If we have multiple processors accessing the same

main memory, then we tend to say they are parallel. If each processor has its

own main memory, then we tend to say the system is distributed.

b. It is possible and often beneficial to have multiple processors in the same

computer. This parallel processing often improves throughput but it does not

affect the speed of computing a single instruction. Furthermore, because

certain instructions are dependent on other instructions, many functions do not

benefit from the additional processors.

c. Throughput is the measure of how much work is done in a certain time. We

use execution time (or response time) to measure how long it takes to

perform a specific instruction or set of instructions. The contrast is subtle.

Think of a public laundry. The execution time measures how long it takes to

get your dirty clothes clean and folded. The throughput is the measure of how

many loads of dirty clothes were cleaned and folded in a given time.

d. In a computer, designing the processor to handle multiple instructions

simultaneously requires the processor to spend instruction cycles organizing

the parallel processing; we call this overhead.

III. Theory and Mathematical Background – 40%

A. Algorithms and complexity

1. Exact and asymptotic analysis of specific algorithms

a. Definition: When looking at an algorithm and evaluating it, we might want to

precisely define how long it will take the program to run (exact analysis), but

we usually just approximate the time (asymptotic analysis).

b. The worst-case analysis of an algorithm is not the same as the upper bound.

Upper bound is a property of the equations that represent the algorithm. The

worst-case analysis finds the worst-case scenario and expresses the time or

space to solve. On the other hand, the concepts are so closely related that we

often use the terms interchangeably.

c. Sorting methods

i. http://www.cs.ubc.ca/spider/harrison/Java/sorting-demo.html

ii. Insertion sort

♦ Simple to code

♦ Good for a small number of elements

♦ Starts with one element, then inserts the next element in the correct

position by scanning until the successor is found (or the end of the list)

♦ O(n^2)

♦ If the information is already sorted, then it runs at O(n)
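
A minimal sketch of insertion sort as described in the bullets above (the function name is just for illustration):

```python
def insertion_sort(items):
    """Return a sorted copy of items using insertion sort."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one place right until current fits.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


print(insertion_sort([5, 2, 4, 6, 1, 3]))
```

On input that is already sorted, the inner while loop never runs, which is where the O(n) behavior comes from.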

iii. Shellsort

♦ Named after Donald Shell

♦ Also known as diminishing gap sort

♦ Simple to code

♦ Sorts elements that are not next to each other – this is what makes it

possible to perform better than O(n^2)

♦ Elements are sorted only relative to items that are k elements away. So

at the 6-sort, all of the elements that are 6 away from each other are

treated as a set and sorted using insertion sort

♦ The rate and method for diminishing the gap (the size of the k-sort) is

important to the performance of the sort. Empirically, it appears that

the best way to decrement the gap is to divide the gap by 2.2 at each step

♦ Using the best known design we can get O(n^(5/4))
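
Here is one way to sketch Shellsort with the divide-by-2.2 gap rule. Starting the gap at half the list length and forcing a final 1-sort are assumptions of this sketch, not something the notes above specify.

```python
def shellsort(items):
    """Return a sorted copy of items using a diminishing gap sort."""
    result = list(items)
    gap = len(result) // 2                 # assumed starting gap
    while gap > 0:
        # Insertion sort on the elements that are gap positions apart.
        for i in range(gap, len(result)):
            current = result[i]
            j = i
            while j >= gap and result[j - gap] > current:
                result[j] = result[j - gap]
                j -= gap
            result[j] = current
        # Shrink the gap; make sure the last pass uses a gap of 1.
        gap = 1 if gap == 2 else int(gap / 2.2)
    return result


print(shellsort([9, 8, 7, 6, 5, 4, 3, 2, 1, 0]))
```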

iv. Merge sort

♦ Divide-and-conquer

♦ O(n log n): there are about log n levels of division, and each level needs O(n) work to merge

♦ The array is divided into two sub arrays, and each sub array is sorted

recursively. Then the first element in each sub array is compared (e and f),

and the smaller element (e) is copied to the new third sorted array. Repeat

using e+1 and f until everything has been merged.

♦ Often avoided for sorting arrays in main memory because of the O(n) extra

memory and copying necessary
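
The merge step is easier to follow in code. In this sketch, e and f are the positions of the current first elements of the two sub arrays, as in the description above; everything else is an illustrative assumption.

```python
def merge_sort(items):
    """Return a sorted copy of items using merge sort."""
    if len(items) <= 1:
        return list(items)
    middle = len(items) // 2
    left = merge_sort(items[:middle])      # sort each half recursively
    right = merge_sort(items[middle:])
    merged = []                            # the new third array
    e = f = 0
    while e < len(left) and f < len(right):
        if left[e] <= right[f]:
            merged.append(left[e])
            e += 1
        else:
            merged.append(right[f])
            f += 1
    merged.extend(left[e:])                # copy whatever is left over
    merged.extend(right[f:])
    return merged


print(merge_sort([38, 27, 43, 3, 9, 82, 10]))
```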

v. Quicksort

♦ Divide-and-conquer

♦ Fastest known general-purpose sorting algorithm in practice (on average)

♦ Average case O(n log n)

♦ Worst case O(n^2) (If you choose the smallest or largest element as the pivot every time.)

♦ Uses recursion

1. If the number of elements in the set is 0 or 1, then return

2. Choose any element in the set to be a pivot

3. The remaining elements are divided into two groups, left and right,

based on whether they are smaller or larger than the pivot

4. Perform the recursion on the two groups left and right

♦ The choice of pivot is important to running time – try to choose the

middle element

♦ Not great for small arrays
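
A short sketch of the recursive steps above. It builds new lists instead of partitioning in place, and it uses the middle element as the pivot; both are simplifications chosen for readability.

```python
def quicksort(items):
    """Return a sorted copy of items using quicksort."""
    items = list(items)
    if len(items) <= 1:                       # step 1: nothing to sort
        return items
    pivot_index = len(items) // 2             # step 2: middle element as pivot
    pivot = items[pivot_index]
    rest = items[:pivot_index] + items[pivot_index + 1:]
    left = [x for x in rest if x < pivot]     # step 3: split around the pivot
    right = [x for x in rest if x >= pivot]
    return quicksort(left) + [pivot] + quicksort(right)   # step 4: recurse


print(quicksort([3, 6, 1, 8, 2, 9, 2]))
```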

vi. Selection sort

♦ Finds the smallest item and puts it in the correct place

♦ Not efficient as it must look at all remaining items

♦ O(n^2)

vii. Heapsort

♦ O(n log n)

♦ Place all of the items into a heap

♦ Grab the largest one and put it in place

♦ Time efficient; building a separate heap costs O(n) extra space, although heapsort can also be done in place

viii. Bubble sort

♦ Compare two items and swap if necessary; repeat through all items

♦ O(n^2)

♦ But, if the list is sorted, then O(n)

2. Algorithmic design techniques (e.g. greedy, dynamic programming, divide and

conquer)

a. Definition: Since we really do not know how to make algorithms, we have

created artistic guidelines.

b. Divide-and-conquer is an effective algorithmic technique that requires

recursion. First recursively divide the problem into multiple sub-problems

(until you get to the base case), then combine the solutions of the sub-problems to

find the answer.

c. Dynamic programming is something like being spontaneous on a tour of

England. When making any decision, you do not think about what you did in

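the past; you only think about what you want to do right this second and

what you want to do later. In practice, dynamic programming solves each overlapping sub-problem once and stores the answer so that it can be reused.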

d. A greedy algorithm always makes the choice that is best for that moment and

does not consider the future consequences.
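
To contrast the greedy and dynamic programming techniques, here is a sketch using the coin-change problem (make an amount with as few coins as possible). The problem and the coin values are our own example, not something from the test.

```python
def greedy_coins(coins, amount):
    """Always take the largest coin that fits; returns the number of coins used."""
    count = 0
    for coin in sorted(coins, reverse=True):
        count += amount // coin
        amount %= coin
    return count if amount == 0 else None


def dp_coins(coins, amount):
    """Dynamic programming: best[v] is the fewest coins needed to make v."""
    infinity = float("inf")
    best = [0] + [infinity] * amount
    for value in range(1, amount + 1):
        for coin in coins:
            if coin <= value and best[value - coin] + 1 < best[value]:
                best[value] = best[value - coin] + 1   # reuse a stored answer
    return best[amount] if best[amount] != infinity else None


coins = [1, 3, 4]                 # hypothetical coin values
print(greedy_coins(coins, 6))     # greedy picks 4 + 1 + 1
print(dp_coins(coins, 6))         # dynamic programming finds 3 + 3
```

The greedy choice is best for the moment but not overall, while the dynamic programming version reuses the stored answers for smaller amounts.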

3. Upper and lower bounds on the complexity of specific problems

a. Definition: Complexity refers to how much time or memory (space) it takes

to solve a problem.

b. The time complexity of an algorithm talks about how long it takes to solve

problems using that algorithm. The space complexity of an algorithm talks

about how much memory it takes to solve a problem using that algorithm.

c. We typically measure the upper bound of time complexity with Big-O

notation.

d. The formal definition for Big-O notation is: given two functions f and g (that

have their domain in natural numbers and their range in real numbers), then

f(n) = O(g(n)) if we can find a positive constant c and a starting point n0

such that f(n) ≤ c·g(n) for all n ≥ n0. In other words, there exists a multiple of g(n) that is

greater than or equal to f(n) for all sufficiently large n. Keep in mind that the constant c is not

important when we express complexity (i.e. drop all constants in the

expression).

e. We measure the lower bound of time complexity with the notation f(n) =

Ω(g(n)). This means that the function f will take at least as long as some

multiple of g(n).

f. We express a tight bound on time complexity with the notation f(n) =

Θ(g(n)). This means that f(n) is both O(g(n)) and Ω(g(n)): the function f grows at the same rate as some

multiple of g(n). Θ is often used informally when describing the average case.

g. For space complexity, we use the same notation, but we usually explicitly let

the reader know that we are talking about space complexity and not time

complexity. For example, we might write something like “f(n) = O(log n)

space” to mean that it takes logarithmic space to run the algorithm.

h. Let us order different complexities.

Shortest   O(1)             Constant time
           O(log n)         Logarithmic time
           O(n)             Linear time
           O(n log n)       n log n time
           O(n^2)           Quadratic time
           O(n^x)           Polynomial time
           O(x^n) (x > 1)   Exponential time
Longest    O(n!)            Factorial time

i. Imagine you have a sorted array of 1000 numbers. How would you find a

specific element in that array? A sequential search of that array is an easy

choice. On average, this will take Θ(n) time. The worst case is O(n) time (to

find the last element, you have to examine all elements). A binary search

starts with the middle element. If the middle element is bigger than the

element you are looking for, then you go to the element that is half way

between the first element and the middle element. If the middle element is

smaller than the element you are looking for, then you go to the element that

is half way between the last element and the middle element. Repeat until you

find the element you are looking for. Result: Θ(log n) and O(log n).
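
The binary search described above can be sketched like this. The step counter is only there to show the logarithmic behavior, and the array contents are made up.

```python
def binary_search(sorted_items, target):
    """Return (index of target or -1, number of elements examined)."""
    low, high = 0, len(sorted_items) - 1
    steps = 0
    while low <= high:
        steps += 1
        middle = (low + high) // 2
        if sorted_items[middle] == target:
            return middle, steps
        if sorted_items[middle] > target:
            high = middle - 1        # look in the lower half
        else:
            low = middle + 1         # look in the upper half
    return -1, steps


numbers = list(range(0, 2000, 2))    # a sorted array of 1000 numbers
print(binary_search(numbers, 1998))  # the last element
```

Even for the last element, the search looks at no more than about log2(1000), or roughly 10, elements, where a sequential search would examine all 1000.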

4. Computational complexity, including NP-completeness

a. Definition: Computability refers to whether or not a certain type of machine

can solve a certain problem.

b. Definition: NP-complete problems are special problems in modern

computing. The name comes from the fact that you can use a

Nondeterministic Polynomial Turing machine to solve these problems.

c. If any problem can be solved in polynomial time, then it is part of the class P.

All CFL are in P. The complement of a P language is also in P.

d. If a proposed solution to a problem can be verified in polynomial time, then we say the problem is in class NP,

whether or not we know how to solve it in polynomial time (so every problem in P is also in NP). We say NP because

we know how to make nondeterministic polynomial time Turing machines to

represent the problems.

e. There exist problems in NP that all other NP problems are reducible to. We

say that these problems are NP-complete. Classic NP-complete problems

include:

i. The Hamiltonian path problem (does a directed graph have a path that

travels each node exactly once?)

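ii. Composites (is a number a composite?). Note: this problem is in NP, but it is not known to be NP-complete; it is in fact solvable in polynomial time.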

iii. Clique (does a graph have a clique of a given size?)

iv. Subset-sum (given a set of numbers and target number, does the set have a

subset that will add up to the target?)

v. Satisfiability (is a given Boolean expression satisfiable?)

vi. Vertex cover (given a graph, is there a set of k nodes that all edges touch?)

f. The complement of an NP language is in coNP. We do not know much about

this group of languages; it is not known whether they are different from NP.

g. Other problems are not known to be in NP, but all NP problems are polynomial

time reducible to them. We say these problems are NP-hard. The halting problem

is a classic example of an NP-hard problem that is not in NP. (Integer factorization is not known to be NP-hard.)

B. Automata and language theory

1. Models of computation (finite automata, Turing machines)

a. Definition: Before we made computers from sand, people designed models on

paper that could calculate. These hypothetical machines are the basis for all

modern computers.

b. A deterministic finite automaton (DFA) is a model of a machine that has a

finite number of states and inputs. Some DFA also have output. Question #11

in the practice test shows an output-producing DFA. The arrows between the

states show the input and output of the transition between the states. The

number on the left of the slash is input, the other number is output. (Memory

tip: English is read left-to-right; the input happens before the output, so it is on

the left.) If a set of input ends on an accept state, then we say that the machine

recognizes the input. The start state is the state that the machine begins to

process the input from. The set of all inputs that the machine recognizes is the

language of the machine.
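
A DFA is simple enough to simulate in a few lines. The machine below (it accepts binary strings that end in 0) is a made-up example and is not the machine from question #11.

```python
def dfa_accepts(transitions, start, accept_states, text):
    """Run the DFA on text and report whether it ends on an accept state."""
    state = start
    for symbol in text:
        state = transitions[(state, symbol)]   # exactly one next state per input
    return state in accept_states


# Hypothetical DFA: q1 means "the last symbol read was 0".
transitions = {
    ("q0", "0"): "q1",
    ("q0", "1"): "q0",
    ("q1", "0"): "q1",
    ("q1", "1"): "q0",
}

print(dfa_accepts(transitions, "q0", {"q1"}, "1010"))
print(dfa_accepts(transitions, "q0", {"q1"}, "101"))
```

The language of this machine is the set of all binary strings for which the function returns True.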

c. A nondeterministic finite automaton (NDFA) (or NFA) is like a DFA, except that it can

be in multiple states at one time. We can convert any NDFA

into a DFA.

d. A Pushdown automaton (PDA) has states and transitions like a DFA, but it

also has a feature called a stack. The stack can hold information in a LIFO

order. The stack allows a PDA to recognize a wider range of languages than a

DFA.

e. A Turing machine is the most powerful model of computation that we

discovered that actually represents a real world device. It has transitions and states

like PDA and DFA, but it uses a tape as both input and as working memory.

The tape is of infinite length and can be accessed by the machine in any order.

We have discovered that these two features are the key to this model’s power.

Modern computers are based on the Turing machine.

f. A nondeterministic Turing machine is a Turing machine that can branch into

multiple states (like a NDFA). Any nondeterministic Turing machine can be converted into an equivalent

deterministic Turing machine, but we do not know of any way to do so without a possibly exponential increase in running time. There is no

real world example of a nondeterministic Turing machine.

g. The Church-Turing thesis says that every “effective function” can be

computed by a Turing machine. For a function to be effective, it must be

possible to solve the function in a finite number of steps with only paper and

pencil. We sometimes say that this is a mechanical function. Think about

calculating the first 100,000 digits of PI. It is possible for humans with paper,

pencil, and the formula to accomplish this. (It would take a long time, but it is

possible.) If there were not functions like this, then computers would not be

possible. Unfortunately, this is more philosophy than math and cannot be

proven using mathematical concepts.

2. Formal languages and grammars (regular and context free)

a. Definition: The user's manual for the computational machines (see III.B.1.a)

consists of the things you are allowed to do to them. To say it another way, all

of the inputs you are allowed to use on a machine are said to be the machine’s

language.

b. We studied context free grammars (CFG) originally to describe human

languages, but they are useful for understanding computer operations that

need to parse (e.g. compilers). A CFG is a series of statements in the form

α→β | t. We call this statement a production. α and β are variables and t is

a terminal. Starting from the start variable, we repeatedly replace variables using the productions until

only terminals remain. Here is a very

simple example.

A -> Aa | B

B -> BB | b

This CFG can produce a string that has at least one “b” followed by any number of

“a” (including none). Note that it is common to represent variables as capital letters and

terminals as lower case. We also often call the first variable on the first line

the start variable.

c. Backus-Naur Form (BNF) is a very common metasyntax for CFGs. We

enclose non-terminals in angle brackets < > (See question #12). We read the

series of symbols ::= as “can be”, and the pipe symbol | as “or”. We use the

normal meaning for ellipsis (…); in question #12, it represents a series of

letters and numbers.

d. Strings are also elements of a grammar or a language. When expressing

strings in languages we sometimes use regular expressions. Boolean

operators are legal in regular expressions and they have the typical meaning.

To concatenate two sides of an expression we can use the symbol ∘ or we can

use the same shorthand as multiplication. For example, we can write (0 ∪ 1) ∘

0 or we can just use the shorthand (0 ∪ 1)0. The star operator * means that we

can repeat that section of the expression any number of times (including zero). For example, 0*1

means as many zeros as we want (including none) followed by exactly one 1.

We can also use it in complex situations: (0 ∪ 1)*0 means any number of 0 or

1 followed by exactly one 0.
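
These two expressions can be tried out with Python's re module. Note that Python writes union as | instead of ∪, and that fullmatch is used so that the whole string must match the expression.

```python
import re

zeros_then_one = re.compile(r"0*1")        # 0*1
ends_in_zero = re.compile(r"(0|1)*0")      # (0 ∪ 1)*0


def in_language(pattern, text):
    """True if the entire string is described by the regular expression."""
    return pattern.fullmatch(text) is not None


for text in ["1", "0001", "0010", "0110"]:
    print(text, in_language(zeros_then_one, text), in_language(ends_in_zero, text))
```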

e. Regular languages ⊂ Context-free languages ⊂ Decidable languages ⊂

Recursively enumerable languages.

f. A regular language is a language that can be expressed by a regular

expression and recognized by a DFA. We can use the pumping lemma to

prove that a language is not regular (it cannot prove that a language is regular). Regular languages are closed under union,

concatenation, and the star operation. The complement of any regular language

is also regular.

g. A context-free language (CFL) is any language that can be generated by a CFG or, equivalently, recognized by

a pushdown automaton (PDA).

h. We use a different pumping lemma to show that a language is not

a CFL (it cannot show that a language is a CFL). CFL are closed under concatenation, union, and the star operation. The complement of a

CFL is not necessarily a CFL.

3. Decidability

a. Definition: A machine cannot solve some problems. If the machine can solve

it, then we say it is decidable on that machine.

b. Some problems cannot be solved by an algorithm. For example, it is

impossible to create a program that can always decide whether any given program

will halt on any given input. On the other hand, some problems (languages)

are decidable.

c. All CFL are decidable. And some languages used on Turing machines are

decidable.

d. A language is decidable if and only if it and its complement are Turing-

recognizable.

e. Some important undecidable problems

i. Accepting Turing machine: We cannot calculate if a certain Turing

machine will accept a given input.

ii. The halting problem: It is impossible to decide if a given Turing machine

will halt on a certain input.

iii. Empty Turing machines: It is impossible to decide if a certain Turing

machine will accept any input at all.

iv. DFA equivalent: It is impossible to compute if a given Turing machine has

a DFA equivalent.

v. Equivalent Turing machines: We cannot calculate if two Turing machines

recognize equivalent languages.

C. Discrete structures

1. Mathematical logic

a. Definition: Mathematical logic is based on formulas that are always true or

false.

b. Three Americans (a scientist, a mathematician, and a logician) are traveling

on a train in China. The scientist sees a green cow in the middle of a field. He

says, “Wow, a green cow, the cows in China must be green.” The

mathematician corrects him, “No, all we know is that there is one cow in

China that is green.” The logician wakes up from his nap and corrects them

both, “All we know is that there exists at least one side of one cow that is

green in China.” The symbol ∃ is used to mean “there exists”. We use the

symbol ∀ to mean “for all”. An example of using these symbols is ∀n ∃m (m = 2n).

Read: For all n, there exists a number m that is equal to 2n.

c. Boolean logic is based on T/F (true/false) statements. Truth tables are the

most basic way to interpret Boolean functions. There are many basic operators

in Boolean logic:

Operator                          Symbol    Meaning
OR (inclusive or)                 ∨ or +    One, the other, or both are true
AND                               ∧ or •    Both must be true
NOT                               ¬         Negates
Exclusive OR (XOR)                ⊕         One or the other, not both
Implies                           ⇒         If P then Q
Bidirectional implication (IFF)   ⇔         P if and only if Q

Truth tables for one and two inputs:

P | ¬P
T |  F
F |  T

P Q | P ∨ Q | P ∧ Q | P ⊕ Q | P ⇒ Q | P ⇔ Q
T T |   T   |   T   |   F   |   T   |   T
T F |   T   |   F   |   T   |   F   |   F
F T |   T   |   F   |   T   |   T   |   F
F F |   F   |   F   |   F   |   T   |   T

Truth tables for three inputs:

P Q R | P ∨ Q ∨ R | P ∧ Q ∧ R | P ⊕ Q ⊕ R | (P ⇒ Q) ⇒ R | P ⇔ Q ⇔ R
T T T |     T     |     T     |     T     |      T      |     T
T T F |     T     |     F     |     F     |      F      |     F
T F T |     T     |     F     |     F     |      T      |     F
T F F |     T     |     F     |     T     |      T      |     T
F T T |     T     |     F     |     F     |      T      |     F
F T F |     T     |     F     |     T     |      F      |     T
F F T |     T     |     F     |     T     |      T      |     T
F F F |     F     |     F     |     F     |      F      |     F

d. We may express all Boolean operators using other operators. Said differently,

there are logical equivalents for each Boolean operator using other operators.

The result is that one does not really need all of the Boolean operators to

express Boolean functions. It is possible to use just a subset of the operators.

Any subset of Boolean operators that can express all Boolean functions is

complete. You will note that it is impossible to have a complete set of

operators without including at least one negating operator. Here are some

examples of logical equivalence (footnote 2):

i. P ∨ Q = ¬(¬P ∧ ¬Q)

ii. P ⇒ Q = ¬P ∨ Q

iii. P ⇔ Q = (P ⇒ Q) ∧ (Q ⇒ P) (of course!)

iv. P ⊕ Q = ¬(P ⇔ Q)
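
Since there are only four combinations of P and Q, we can check these equivalences by brute force. This sketch defines implies from its truth table so that the check of equivalence ii is not circular; the helper names are assumptions.

```python
from itertools import product

# Truth table for "implies", taken straight from the definition.
IMPLIES = {(True, True): True, (True, False): False,
           (False, True): True, (False, False): True}


def implies(p, q):
    return IMPLIES[(p, q)]


def iff(p, q):
    return p == q


def xor(p, q):
    return p != q


equivalences = {
    "i":   lambda p, q: (p or q) == (not ((not p) and (not q))),
    "ii":  lambda p, q: implies(p, q) == ((not p) or q),
    "iii": lambda p, q: iff(p, q) == (implies(p, q) and implies(q, p)),
    "iv":  lambda p, q: xor(p, q) == (not iff(p, q)),
}

# An equivalence holds if both sides agree for every combination of inputs.
results = {name: all(check(p, q) for p, q in product([True, False], repeat=2))
           for name, check in equivalences.items()}
print(results)
```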

e. If you have a statement that always evaluates to true, then it is a tautology (or

is valid). If you create a statement that always evaluates as false, then it is a

contradiction. We say that two statements p and q are logically equivalent,

written p ≡ q, if p ⇔ q is a tautology. If a statement has at least one way to evaluate to true,

then we say it is satisfiable.

Logical Equivalences

Equivalence                          Name
p ∧ True ≡ p                         Identity
p ∨ False ≡ p
p ∨ True ≡ True                      Domination
p ∧ False ≡ False
p ∧ p ≡ p                            Idempotent (unchanged)
p ∨ p ≡ p
¬(¬p) ≡ p                            Double negation
p ∧ q ≡ q ∧ p                        Commutative
p ∨ q ≡ q ∨ p
p ∧ (q ∧ z) ≡ (p ∧ q) ∧ z            Associative
p ∨ (q ∨ z) ≡ (p ∨ q) ∨ z
p ∨ (q ∧ z) ≡ (p ∨ q) ∧ (p ∨ z)      Distributive
p ∧ (q ∨ z) ≡ (p ∧ q) ∨ (p ∧ z)
¬(p ∧ q) ≡ ¬p ∨ ¬q                   De Morgan’s Laws
¬(p ∨ q) ≡ ¬p ∧ ¬q
p ∨ (p ∧ q) ≡ p                      Absorption
p ∧ (p ∨ q) ≡ p
p ∨ ¬p ≡ True                        Negation
p ∧ ¬p ≡ False

f. Logic gates are graphical representations of Boolean logic. The symbols are

in the instructions for the test. They include graphical representations for

AND, OR, XOR, NOT, NAND, and NOR. NAND is an AND followed by a

NOT. NOR is an OR followed by a NOT. A circuit is a bunch of logic gates

connected together. We can represent any circuit using a Boolean statement

(although it may be a very complex statement).

2. Elementary combinatorics and graph theory

a. Definition: Combinatorics is the study of counting and sets.

Footnote 2: Sipser, p. 15

b. Definition: Graphs help us to study various types of problems. In this case, a

graph is a set of vertices together with the lines (edges) that connect them.

c. Prefix notation (also called Polish notation) is a way of writing

mathematical formulas that accurately show how a computer parses and

executes the expression. Prefix notation does not use parentheses or rules of

precedence. The order of execution is entirely dependent on the order of the

expression. When evaluating a prefix expression, imagine that the values and

operators of the expression are nodes in a binary tree. Reading the

tree in preorder gives the prefix expression. Alternatively, once the expression is

in prefix notation, you can evaluate it by finding the rightmost operator.

Evaluate the expression that uses that operator and the two values

immediately to the right of the operator. For example

* + + 2 3 4 * * 5 + + 6 7 8 9

6 + 7 = 13

* + + 2 3 4 * * 5 + 13 8 9

13 + 8 = 21

* + + 2 3 4 * * 5 21 9

5 * 21 = 105

* + + 2 3 4 * 105 9

…

d. Postfix notation (also called reverse Polish notation) traverses the tree in

postorder. Alternatively, evaluate the expression starting with the leftmost

operator and the two values immediately to the left of that operator.
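
Both notations are easy to evaluate with a stack. The sketch below assumes that tokens are separated by spaces and that the values are integers; the function names are illustrative.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_prefix(tokens):
    """Evaluate prefix notation by scanning from right to left."""
    stack = []
    for token in reversed(tokens):
        if token in OPS:
            a = stack.pop()              # left operand (pushed most recently)
            b = stack.pop()              # right operand
            stack.append(OPS[token](a, b))
        else:
            stack.append(int(token))
    return stack.pop()


def eval_postfix(tokens):
    """Evaluate postfix notation by scanning from left to right."""
    stack = []
    for token in tokens:
        if token in OPS:
            b = stack.pop()              # right operand (pushed most recently)
            a = stack.pop()              # left operand
            stack.append(OPS[token](a, b))
        else:
            stack.append(int(token))
    return stack.pop()


print(eval_prefix("* + + 2 3 4 * * 5 + + 6 7 8 9".split()))   # the prefix example above
print(eval_postfix("2 3 + 4 *".split()))                      # (2 + 3) * 4
```

Scanning the prefix expression from the right reaches the rightmost operator first, so the intermediate values (13, 21, 105, and so on) come out in the same order as in the worked example.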

e. There are two basic rules when counting. We call the first the product rule. If

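a task P can be divided into two parts P1 and P2 that are done one after the other, and P1 can be done in x ways and P2

can be done in y ways, then the task P can be done in xy ways. The sum rule says that if a

task Q can be done either as Q1 or as Q2, and the two alternatives do not overlap, then the number of ways to do Q

is the number of ways to do Q1 plus the number of ways to do Q2 (Q = Q1 + Q2).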

f. The most basic type of graph consists of nodes (or vertices) and edges. An

edge connects two nodes together. If the edges point from one node to

another, then they are directed, and we say it is a directed graph.

g. A clique is a graph where all of the nodes are connected directly to each other.

h. A path in a graph is the set of nodes and edges you travel to get from one node

to another node.

i. The degree of a node is the number of edges that connect to it.

3. Discrete probability, recurrence relations, and number theory

a. Definition: Discrete means that it can be separated from other things. Digital

clocks are discrete because 9 o’clock is a very specific state. Analog (face)

clocks are not discrete because there is a range of states that are considered to

be 9 o’clock.

b. Definition: For thousands of years, geeks and nerds all over the world have

looked and found patterns in numbers. Before computers, most of these

patterns had little value, so we called it number theory.

c. Sets are unordered collections of elements. We need to know how to describe

the members of the set and we need to know operators on sets.


d. We list the members of a set B inside of a pair of braces {}. If an element x is a
member of the set, we write x ∈ B. If an element is not a member of the set,
we write x ∉ B. If the elements of a set follow from some sort of function,
then we write {x | f(x)} (read this as all x such that f(x)). If two sets B and D
have the same elements, then B = D. If a set has no elements, then it is the
empty set or null set, written ∅. If all of the elements in B are also in D, then
B is a subset of D, written B ⊆ D. If B ⊆ D and there is at least one element in D
that is not in B, then B is a proper subset of D, written B ⊂ D. The power set of a
given set is the set of all possible subsets of the original set. For example, the
power set of {0,1} is {∅, {0}, {1}, {0,1}}. You should note that the
number of elements (which are actually subsets) in the power set of a set with n
elements is 2^n. When counting the number of elements in a set, we say it
has cardinality n if there are n elements in the set. Sets are unordered: the
order of the elements is not important. If the order of the elements
is important, we usually call it an ordered set or a sequence.

e. The complement of B, written B̄ (B with a bar over it), has all of the elements
that are not in B but are in the universal set (the set that is made up of B and
B̄ together). The union of two sets S and T, written S ∪ T, is a new set that has
every element that is in S, in T, or in both. The
intersection of S and T, written S ∩ T, is a new set that has only the elements
that are in both S and T. The Cartesian product (or cross product) of A and
B, written A × B, is {(a, b) | a ∈ A ∧ b ∈ B}, meaning the new set has all of the
ordered pairs with one element from A and one element from B.
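Python's built-in set type covers most of these operators, so here is a short sketch. The sets S and T and the helper power_set are my own examples; the power set helper is built from itertools because the standard library has no ready-made function for it.

```python
from itertools import chain, combinations, product

S = {0, 1, 2}
T = {2, 3}

print(sorted(S | T))  # union: [0, 1, 2, 3]
print(sorted(S & T))  # intersection: [2]
print({2} <= S, {2} < S, S < S)  # subset, proper subset, S is not a proper subset of itself

# Cartesian product: all ordered pairs (a, b) with a in S and b in T.
pairs = set(product(S, T))
print(len(pairs))  # 3 * 2 = 6


def power_set(s):
    """All subsets of s, returned as a set of frozensets."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(c) for c in subsets}


print(len(power_set({0, 1})))  # 2 ** 2 = 4
```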

f. Strings are elements in a specific order. Again, we need to know how to

express them and operate on them.

g. A string that has no elements is called the empty string, written λ or ε.

Strings are often constructed using summation notation.

h. Two strings S and T are concatenated, written ST, if the elements in S are
followed immediately by the elements in T. S^n is the set of strings that have
exactly n elements drawn from S; n is called the length of such a string. T* is
a new set that includes the empty string and all of the strings made up of
elements from T, of every length. We could write T* = {ε} ∪ T^1 ∪ T^2 ∪ T^3
∪ …. T+ is similar to T* except that it does not include the empty string. The
reversal of a string B, written B^R, is the same elements but in the opposite
order.
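A sketch of the string operators using ordinary Python strings. The function names are mine, and because T* is infinite, star only lists the strings up to a length limit that I pass in as max_len.

```python
from itertools import product


def concat(s, t):
    """ST: the elements of s followed immediately by the elements of t."""
    return s + t


def reversal(s):
    """The same elements in the opposite order."""
    return s[::-1]


def star(alphabet, max_len):
    """Strings over `alphabet` of length 0..max_len (a finite slice of T*)."""
    result = []
    for n in range(max_len + 1):
        # repeat=0 yields one empty tuple, which becomes the empty string.
        result.extend("".join(p) for p in product(sorted(alphabet), repeat=n))
    return result


print(concat("ab", "cd"))   # abcd
print(reversal("abc"))      # cba
print(star({"a", "b"}, 2))  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Dropping the first entry (the empty string) from the result gives the matching slice of T+.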

i. A matrix is a set of numbers arranged in two dimensions.

$$\begin{bmatrix} 1 & 0 & 4 \\ 2 & 1 & 1 \\ 3 & 1 & 0 \\ 0 & 2 & 2 \end{bmatrix}$$

j. We calculate the product AB of two matrices A and B by producing a new

matrix that has the height of the first matrix (A) and the width of the second

matrix (B). We can only perform this calculation on matrices where the first

matrix’s width is equal to the second matrix’s height. Each value in the new
matrix is the sum of the products of the paired values in the first matrix’s row


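and the second matrix’s column. For example, take two matrices A and B.

$$A = \begin{bmatrix} 1 & 0 & 4 \\ 2 & 3 & 1 \\ 3 & 1 & 0 \\ 0 & 2 & 2 \end{bmatrix} \qquad B = \begin{bmatrix} 1 & 4 \\ 1 & 1 \\ 3 & 0 \end{bmatrix}$$

The value in the upper left hand corner of the new matrix is 1×1 + 0×1 + 4×3
= 13. The new matrix is

$$AB = \begin{bmatrix} 13 & 4 \\ 8 & 11 \\ 4 & 13 \\ 8 & 2 \end{bmatrix}$$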
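To check the example by machine, here is a minimal sketch of the row-times-column rule. The function name mat_mul is my own, and I take A and B to be the 4-by-3 and 3-by-2 matrices of the example with entries as I read them from the text, so treat those values as an assumption.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("width of the first matrix must equal height of the second")
    return [
        # Entry (i, j) is the sum of products of row i of a and column j of b.
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Assumed entries of the example matrices (4 x 3 and 3 x 2).
A = [[1, 0, 4],
     [2, 3, 1],
     [3, 1, 0],
     [0, 2, 2]]
B = [[1, 4],
     [1, 1],
     [3, 0]]

print(mat_mul(A, B))  # [[13, 4], [8, 11], [4, 13], [8, 2]]
```

The result has the height of A (4) and the width of B (2), as the rule says.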

k. We use the modulo (mod) operator to find the remainder of a division

operation. For example, 17 mod 5 is 2. When you divide 17 by 5 you get 3

remainder 2. All by itself, this is a silly operation, but it turns out that there are

many applications in things like cryptography and hash tables. So, we need to

know some of the special properties of modulo arithmetic. Take two integers

A and B and a positive integer M. If (A-B)/M is an integer, then we say that A

is congruent to B modulo M, written A ≡ B (mod M). Furthermore, A ≡ B

(mod M) ⇔ A mod M = B mod M. Meaning that the remainder after dividing

A/M will equal the remainder of B/M. Also, A ≡ B (mod M) ⇔ A = B + kM,

for some integer k. This is intuitive if you look closely. We already know that
A ≡ B (mod M) ⇔ (A-B)/M is an integer, so there must be some multiple of M
that we can add to B to get A. Just as there is an infinite set of numbers that are
divisible by M, there is an infinite set of numbers that are congruent to A
modulo M. All of the

integers that are congruent to A modulo M are in the congruence class of A

modulo M. We also have some very interesting properties when we deal with

congruence classes. If A ≡ B (mod M) and C ≡ D (mod M), then A + C ≡ B +

D (mod M) and AC ≡ BD (mod M). The multiplication of AC and BD being

congruent is intuitive. Since each factor is congruent, then the product should

be congruent (remember that modulo is just weird division, and that division

is just multiplication having a bad-hair day). I do not consider the addition

property very intuitive. You should try a couple of problems just to see that it

works. Also, here is a little proof. If A ≡ B (mod M) then A = B + kM for
some integer k. If C ≡ D (mod M) then C = D + jM for some integer j.
Therefore, A + C = (B + kM) + (D + jM). Rearrange stuff a little and you get
A + C = (B + D) + (k + j)M. Remember that A ≡ B (mod M) ⇔ A = B + kM;
treat A + C as the A, (B + D) as the B, and (k + j) as the k, and you will
see that A + C ≡ B + D (mod M).
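If you want to try a couple of problems quickly, here is a sketch in Python. The helper congruent and the numbers 17, 2, 23, and 8 are my own choices; Python's % operator plays the role of mod.

```python
def congruent(a, b, m):
    """True if a ≡ b (mod m), that is, (a - b) is a multiple of m."""
    return (a - b) % m == 0


print(17 % 5)  # 2

M = 5
A, B = 17, 2  # 17 = 2 + 3 * 5
C, D = 23, 8  # 23 = 8 + 3 * 5

assert congruent(A, B, M) and congruent(C, D, M)
assert A % M == B % M              # same remainder
assert congruent(A + C, B + D, M)  # addition property
assert congruent(A * C, B * D, M)  # multiplication property

# A few members of the congruence class of 2 modulo 5: 2 + k * 5 for integer k.
print([B + k * M for k in range(-2, 3)])  # [-8, -3, 2, 7, 12]
```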

IV. Other Topics – 5%

A. Example areas include numerical analysis, artificial intelligence, computer graphics,

cryptography, security, and social issues.

1. Cryptography is the study of encrypting information. Integer factorization
has a very long average-case running time (no efficient algorithm for it is known).


Note: Students are assumed to have mathematical background in the areas of calculus and

linear algebra as applied to computer science.

Appendix

Greek letters

Name     Symbol  Meaning
Alpha    α
Beta     β
Epsilon  ε       Empty string (the related symbol ∈ means member of a set)
Lambda   λ       Empty string
Theta    Θ       Tight bound on complexity (loosely, the average case)
Omicron  Ο       Upper bound on complexity (loosely, the worst case), usually just called Big-O
Omega    Ω       Lower bound on complexity (loosely, the best case)

Mathematics

1. I am of the opinion that many of the questions on the sample test are really a test

of mathematical skills. For example, practice questions 17, 31, and 32 do not

really require computer knowledge to solve - they mostly involve mathematical

skills. I recommend reviewing your discrete mathematics text as preparation for

this test.

2. Binary is the language and number system of the computer. Binary uses base 2
instead of base 10. The easiest decimal numbers to represent using binary are
powers of 2. Practice test question #2 requires you to realize that 0.5 = 2^-1. I
highly recommend looking at this website that describes how to do basic binary
counting on your fingers: http://www.johnselvia.com/binary/binary.html
Furthermore, in the real world, there are certain numbers that appear regularly. It
is useful to memorize this table.

Power  Value            Description
2^8    256              Byte
2^10   1,024            Kilo-
2^16   65,536           2 Bytes
2^20   ~1,000,000       Mega-
2^24   ~16.7 million    3 Bytes
2^30   ~1,000,000,000   Giga-
2^32   ~4,000,000,000   4 Bytes

3. An equivalence relation must have three properties.

a. A relation R is reflexive if xRx for every x.

b. A relation R is symmetric if xRy ⇔ yRx for every x and y.

c. A relation R is transitive if (xRy and yRz) ⇒ xRz for every x, y, and z.
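Here is a brute-force sketch that tests the three properties on a small finite set. All of the function names are my own, and the two sample relations (same remainder mod 3, and less-than-or-equal) are just illustrations.

```python
from itertools import product


def is_reflexive(elements, rel):
    return all(rel(x, x) for x in elements)


def is_symmetric(elements, rel):
    return all(rel(x, y) == rel(y, x) for x, y in product(elements, repeat=2))


def is_transitive(elements, rel):
    return all(
        rel(x, z)
        for x, y, z in product(elements, repeat=3)
        if rel(x, y) and rel(y, z)
    )


def is_equivalence(elements, rel):
    return (is_reflexive(elements, rel)
            and is_symmetric(elements, rel)
            and is_transitive(elements, rel))


def same_mod_3(x, y):
    return (x - y) % 3 == 0


def less_or_equal(x, y):
    return x <= y


numbers = range(-6, 7)
print(is_equivalence(numbers, same_mod_3))     # True
print(is_equivalence(numbers, less_or_equal))  # False (not symmetric)
```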

4. ! is the factorial function. The notation x! (read as x factorial) represents
the product of all the integers from 1 to x. For example, 6! = 6 * 5 * 4 * 3 * 2 *
1 = 720.


5. ⌊x⌋ is a unary function called the floor function. We apply this function to real
numbers and always create an integer. It forces the real number to
round down. We will see this most often when trying to find a minimum integer
value for a function. For example, ⌊log2 10⌋ = 3 because 2^3 = 8 is the largest
power of 2 that does not exceed 10.

6. ⌈x⌉ is a unary function called the ceiling function. We apply this function to real
numbers and always create an integer. It forces the real number to
round up. We will see this most often when trying to find a maximum integer
value for a function. For example, ⌈log2 10⌉ = 4 because 2^4 = 16 is the smallest
power of 2 that is at least 10.
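Both functions are in Python's standard math module, so the log2 10 example is easy to check. The negative-number line is an extra illustration of my own.

```python
import math

x = math.log2(10)     # about 3.32
print(math.floor(x))  # 3, because 2**3 = 8 <= 10
print(math.ceil(x))   # 4, because 2**4 = 16 >= 10

# Floor always rounds down, so for negative numbers it differs from
# simply chopping off the fraction (which is what int() does).
print(math.floor(-2.5), math.ceil(-2.5), int(-2.5))  # -3 -2 -2
```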

7. We often use hexadecimal (base 16) to make it easier to write long binary

numbers. Base 16 converts easily to base 2 (binary).

Hex Binary Decimal

0 0000 0

1 0001 1

2 0010 2

3 0011 3

4 0100 4

5 0101 5

6 0110 6

7 0111 7

8 1000 8

9 1001 9

A 1010 10

B 1011 11

C 1100 12

D 1101 13

E 1110 14

F 1111 15
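A short sketch of why base 16 converts so easily to base 2: each hex digit stands for exactly four bits. The 16-bit value below is an arbitrary example of mine.

```python
value = 0b1010111100110101  # an arbitrary 16-bit binary number

print(hex(value))             # 0xaf35
print(format(value, "016b"))  # 1010111100110101

# Split the hex form into digits and show the four bits behind each one.
hex_digits = format(value, "04X")
nibbles = [format(int(d, 16), "04b") for d in hex_digits]
print(hex_digits, nibbles)    # AF35 ['1010', '1111', '0011', '0101']

print(int("FF", 16), int("11111111", 2))  # 255 255
```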


Begging Copyright

No one has to pay to use this for his or her own personal use. You cannot resell this or

use it to teach a class without buying it from me. If you reprint any part of this, please

provide a citation and a link to my website. Finally, if you find this study guide useful, I

encourage you to donate $3 to me.

http://www.amazon.com/paypage/P3FXTGMULEY7KP or take a look on my website for

the donation box. All other rights are reserved.

Sources

For each question in the practice exam (ftp://ftp.ets.org/pub/gre/15255.pdf), I have tried

to provide all of the information necessary to answer it and questions that may be like it.

As of now, I have skipped instruction #13, and questions #51, #58, and #60. I have also

tried to provide enough information to answer most of the questions on the old practice

test. http://groups.yahoo.com/group/grecs/files/CompSci_Booklet_1999.pdf (You may

have to join the yahoo group.) I did skip #24, #30, #31, #32, and #35.

1. Introduction to the Theory of Computation by Michael Sipser, 1997 PWS

Publishing Company

2. Discrete Mathematics and its Applications, 5th edition by Kenneth H. Rosen, 2003

McGraw Hill


3. Computer Organization and Design, 2nd edition by Hennessy and Patterson, 1998

Morgan Kaufmann Publishers, Inc

4. Programming Languages, 2nd edition by Ravi Sethi, 1996 Addison-Wesley

5. Data Structures and Problem Solving using JAVA, 2nd edition by Mark Allen

Weiss, 2002 Addison-Wesley

Acknowledgements

1. Thanks to Jack for being the first to subscribe to the email list.

2. Thanks to Kristi for her support.

3. Thank you to the person that donated $20 (wow!) to help me pay for the website.

4. Thanks to Xin for sending the first email comments.

5. Thanks to Richard Conlan for many great suggestions.

6. Thanks to Joni Laserson for your suggestion.

7. Thanks to Wood for his many great suggestions.

Authority

I am not an authority on computer science; I am a student. However, I do have experience

in the computer industry and experience as an educator of computer concepts. My

credentials include MCSE, MCSA, A+, Network+, 2 years of teaching those

certifications, and many years in technical support, web design, and general debauchery.

However, if your professor and this document disagree, I suggest you believe your

professor.

To receive an email when I post updates, subscribe to the email list.

http://www.hunterthinks.com/solo/announce/subscribe.html Since I have already taken

the test, I may not update this study guide. If people like this study guide enough to

donate money, then I will continue to update the study guide. Donate on my website:

http://www.hunterthinks.com/gre/
