
Table of Contents

0) Definition of algorithms
1) Brief intro to algorithmic analysis
   1.1) Formal versus empirical
   1.2) An example of execution efficiency in real life
2) Analysis of algorithms
   2.1) Terminologies that you might encounter in analysis of algorithms
   2.2) Algorithmic efficiency continued
   2.3) Computational complexity continued
   2.4) Asymptotic computational complexity
   2.5) Analysis of algorithms (full version)
3) General rules for estimation of the run-time of an algorithm’s implementation
4) Big O notation
5) Common complexity (growth rate)
6) Space-time tradeoff
0) Definition of algorithms
What is an algorithm in the field of mathematics and computer science? It is a
finite sequence of well-defined, computer-implementable instructions, typically
used to solve a class of problems or to perform a computation. Algorithms are
unambiguous specifications for performing calculation, data processing, automated
reasoning, and other tasks.
As an effective method (a procedure for solving a problem for a specific class of
problems), an algorithm can be expressed within a finite amount of space and time,
and in a well-defined formal language for calculating a function.

1) Brief intro to algorithmic analysis


It is frequently important to know how much of a particular resource (such as time
or storage) is theoretically required for a given algorithm. Methods have been
developed for the analysis of algorithms to obtain such quantitative answers
(estimates);
For example, an algorithm that finds the largest number in a list has a time
requirement of O(n), using big O notation with n as the length of the list. At all
times the algorithm only needs to remember two values: the largest number found so
far, and its current position in the input list. Therefore, it is said to have a
space requirement of O(1) if the space required to store the input numbers is not
counted, or O(n) if it is counted.
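A minimal Python sketch of that largest-number scan (the function name is illustrative):

```python
def find_largest(numbers):
    # Remember only two things: the largest value seen so far and the
    # current loop position -> O(1) extra space, O(n) time.
    largest = numbers[0]
    for x in numbers:
        if x > largest:
            largest = x
    return largest

print(find_largest([3, 1, 4, 1, 5, 9, 2, 6]))  # 9
```

The list itself accounts for the O(n) figure when input storage is counted; the function adds only a constant amount on top.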
Different algorithms may complete the same task with a different set of
instructions in less or more time, space, or 'effort' than others. For example, a
binary search algorithm (with cost O(log n)) outperforms a sequential search (cost
O(n)) when used for table lookups* on sorted lists or arrays.
*A lookup table is an array that replaces runtime computation with a simpler array indexing
operation. The savings in terms of processing time can be significant, since retrieving a value
from memory is often faster than undergoing an "expensive" computation or input/output
operation. The tables may be precalculated and stored in static program storage, calculated (or
"pre-fetched") as part of a program's initialization phase (memoization), or even stored in
hardware in application-specific platforms (more on Wikipedia)
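The binary-versus-sequential comparison above can be sketched in Python as follows (function names are illustrative):

```python
def sequential_search(items, target):
    # O(n): examine elements one by one until the target is found.
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1

def binary_search(sorted_items, target):
    # O(log n): halve the search interval each step; requires sorted input.
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

data = list(range(0, 1000, 2))  # a sorted table of 500 even numbers
print(binary_search(data, 358), sequential_search(data, 358))
```

On a sorted table of n entries, binary search performs O(log n) comparisons while the sequential scan may examine all n.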

1.1) Formal versus empirical


The analysis and study of algorithms is a discipline of computer science, and is
often practiced abstractly without the use of a specific programming language or
implementation. In this sense, algorithm analysis resembles other mathematical
disciplines in that it focuses on the underlying properties of the algorithm and not
on the specifics of any particular implementation.
Usually pseudocode is used for analysis, as it is the simplest and most general
representation. However, most algorithms are ultimately implemented on particular
hardware/software platforms, and their algorithmic efficiency is eventually put to
the test using real code.
For the solution of a "one-off" problem (one limited to a single time, occasion, or
instance), the efficiency of a particular algorithm may not have significant
consequences (unless n is extremely large), but for algorithms designed for fast
interactive, commercial, or long-life scientific usage it may be critical.
Scaling from small n to large n frequently exposes inefficient algorithms that are
otherwise benign.
Empirical testing is useful because it may uncover unexpected interactions that
affect performance. Benchmarks may be used to compare before/after potential
improvements to an algorithm after program optimization. Empirical tests cannot
replace formal analysis, though, and are not trivial to perform in a fair manner.

1.2) An example of execution efficiency in real life


To illustrate the potential improvements possible even in well-established
algorithms, a recent significant innovation, relating to FFT algorithms (used
heavily in the field of image processing), can decrease processing time up to 1,000
times for applications like medical imaging. In general, speed improvements
depend on special properties of the problem, which are very common in practical
applications. Speedups of this magnitude enable computing devices that make
extensive use of image processing (like digital cameras and medical equipment) to
consume less power.

2) Analysis of algorithms
2.1) Terminologies that you might encounter in analysis of
algorithms
a) The computational complexity (or simply complexity) of an algorithm
is the amount of resources required for running it (a property unrelated to
“complexity” in the conventional sense).
As the amount of needed resources varies with the input, the complexity is
generally expressed as a function n → f(n), where n is the size of the input and
f(n) is either the worst-case complexity (the maximum of the amount of resources
that are needed over all inputs of size n) or the average-case complexity (the
average of the amount of resources over all inputs of size n).
Generally, when "complexity" is used without being further specified, it is the
worst-case time complexity that is considered.
b) The computational complexity (or simply complexity) of a problem is
the minimum of the complexities of all possible algorithms for this problem
(including the unknown algorithms). The study of the complexity of explicitly
given algorithms is called analysis of algorithms, while the study of the complexity
of problems is called computational complexity theory. Clearly, both areas are
strongly related, as the complexity of an algorithm is always an upper bound on the
complexity of the problem solved by this algorithm.
c) Worst-case complexity (usually denoted in asymptotic notation) measures the
resources (e.g. running time, memory) an algorithm requires in the worst case. It
gives an upper bound on the resources required by the algorithm.
In the case of running time, the worst-case time-complexity indicates the longest
running time performed by an algorithm given any input of size n, and thus this
guarantees that the algorithm finishes on time. Moreover, the order of growth of
the worst-case complexity is used to compare the efficiency of two or more
algorithms.
d) Average case complexity of an algorithm is the amount of some computational
resource (typically time) used by the algorithm, averaged over all possible inputs.
It is frequently contrasted with worst-case complexity which considers the
maximal complexity of the algorithm over all possible inputs.
There are three primary motivations for studying average-case complexity:
• First, although some problems may be intractable in the worst case, the inputs
which elicit this behavior may rarely occur in practice, so the average-case
complexity may be a more accurate measure of an algorithm's performance.
• Second, average-case complexity analysis provides tools and techniques to
generate hard instances of problems, which can be utilized in areas such as
cryptography and derandomization.
• Third, average-case complexity allows discriminating the most efficient
algorithm in practice among algorithms of equivalent best-case complexity
(for instance Quicksort).
Average-case analysis requires a notion of an "average" input to an algorithm,
which leads to the problem of devising a probability distribution over inputs.
Alternatively, a randomized algorithm can be used. The analysis of such
algorithms leads to the related notion of an expected complexity. (need to
research more for this!)
e) Computational resources: The simplest computational resources are
computation time, the number of steps necessary to solve a problem, and memory
space, the amount of storage needed while solving the problem, but many more
complicated resources have been defined. Resources needed to solve a problem are
described in terms of asymptotic analysis, by identifying the resources as a
function of the length or size of the input. Resource usage is often partially
quantified using Big O notation.
f) Algorithmic efficiency is a property of an algorithm which relates to the amount
of computational resources used by the algorithm. An algorithm must be analyzed
to determine its resource usage, and the efficiency of an algorithm can be measured
based on the usage of different resources.
For maximum efficiency we wish to minimize resource usage. However, different
resources such as time and space complexity cannot be compared directly, so
which of two algorithms is considered to be more efficient often depends on which
measure of efficiency is considered most important.
For example, bubble sort and timsort are both algorithms to sort a list of items
from smallest to largest. Bubble sort sorts the list in time proportional to the
number of elements squared, O(n²), but only requires a small amount of extra
memory which is constant with respect to the length of the list, O(1). Timsort sorts
the list in linearithmic time (proportional to a quantity times its logarithm) in the
list's length, O(n log n), but has a space requirement linear in the length of the
list, O(n). If large lists must be sorted at high speed for a given application,
timsort is a better choice; however, if minimizing the memory footprint of the
sorting is more important, bubble sort is a better choice.
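A sketch of the trade-off in Python: bubble sort works in place with O(1) extra space but O(n²) time, while Python's built-in `sorted` is a Timsort implementation, O(n log n) time with O(n) auxiliary space:

```python
def bubble_sort(a):
    # O(n^2) time, O(1) extra space: repeatedly swap adjacent
    # out-of-order pairs until the list is sorted in place.
    n = len(a)
    for i in range(n):
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

data = [5, 2, 9, 1, 5, 6]
# sorted() uses Timsort: faster asymptotically, but needs O(n) space.
print(bubble_sort(data[:]) == sorted(data))
```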
2.2) Algorithmic efficiency continued
a) Brief overview
An algorithm is considered efficient if its resource consumption, also known as
computational cost, is at or below some acceptable level. Roughly speaking,
'acceptable' means: it will run in a reasonable amount of time or space on an
available computer, typically as a function of the size of the input.
There are many ways in which the resources used by an algorithm can be
measured: the two most common measures are speed and memory usage; other
measures could include transmission speed, temporary disk usage, long-term disk
usage, power consumption, total cost of ownership, response time to external
stimuli, etc. Many of these measures depend on the size of the input to the
algorithm, i.e. the amount of data to be processed. They might also depend on the
way in which the data is arranged; for example, some sorting algorithms perform
poorly on data which is already sorted, or which is sorted in reverse order.
In practice, there are other factors which can affect the efficiency of an algorithm,
such as requirements for accuracy and/or reliability. As detailed below, the way in
which an algorithm is implemented can also have a significant effect on actual
efficiency, though many aspects of this relate to program/software optimization
issues.
b) Theoretical analysis
In the theoretical analysis of algorithms, the normal practice is to estimate their
complexity in the asymptotic sense. The most commonly used notation to describe
resource consumption or "complexity" is Donald Knuth's Big O notation,
representing the complexity of an algorithm as a function of the size of the input n.
Big O notation is an asymptotic measure of function complexity, where f(n) =
O(g(n)) roughly means the time requirement for an algorithm is proportional to
g(n), omitting lower-order terms that contribute less than g(n) to the growth of the
function as n grows arbitrarily large. This estimate may be misleading when n is
small, but is generally sufficiently accurate when n is large, as the notation is
asymptotic.
For example, bubble sort may be faster than merge sort when only a few items are
to be sorted; however either implementation is likely to meet performance
requirements for a small list. Typically, programmers are interested in algorithms
that scale efficiently to large input sizes, and merge sort is preferred over bubble
sort for lists of length encountered in most data-intensive programs.
Implementation issues can also have an effect on efficiency, such as the choice of
programming language, the way in which the algorithm is actually coded, the
choice of a compiler for a particular language, the compilation options used, or
even the operating system being used.
c) Measures of resource usage
Measures are normally expressed as a function of the size of the input.
The two most common measures are:
• Time: how long does the algorithm take to complete?
• Space: how much working memory (typically RAM) is needed by the
algorithm? This has two aspects: the amount of memory needed by the code
(auxiliary space usage), and the amount of memory needed for the data on
which the code operates (intrinsic space usage).

For computers whose power is supplied by a battery (e.g. laptops and
smartphones), or for very long/large calculations (e.g. supercomputers), other
measures of interest are:
• Direct power consumption: power needed directly to operate the computer.
• Indirect power consumption: power needed for cooling, lighting, etc.
As of 2018, power consumption is growing in importance as a metric for
computational tasks of all types and at all scales, ranging from embedded Internet
of things devices to system-on-chip devices to server farms. This trend is often
referred to as green computing.

Less common measures of computational efficiency may also be relevant in some
cases:
• Transmission size: bandwidth could be a limiting factor. Data compression can
be used to reduce the amount of data to be transmitted. Displaying a picture or
image (e.g. the Google logo) can result in transmitting tens of thousands of bytes
(48K in this case) compared with transmitting six bytes for the text "Google".
This is important for I/O bound computing tasks.
• External space: space needed on a disk or other external memory device; this
could be for temporary storage while the algorithm is being carried out, or it
could be long-term storage needed to be carried forward for future reference.
• Response time (latency): this is particularly relevant in a real-time application
when the computer system must respond quickly to some external event.
• Total cost of ownership: particularly if a computer is dedicated to one particular
algorithm.

Time measure in theory: analyze the algorithm, typically using time complexity
analysis to get an estimate of the running time as a function of the size of the input
data. The result is normally expressed using Big O notation. This is useful for
comparing algorithms, especially when a large amount of data is to be processed.
More detailed estimates are needed to compare algorithm performance when the
amount of data is small, although this is likely to be of less importance. Algorithms
which include parallel processing may be more difficult to analyze.

Time measure in practice:
Use a benchmark to time the use of an algorithm. Many programming languages
have an available function which provides CPU time usage. For long-running
algorithms the elapsed time could also be of interest. Results should generally be
averaged over several tests.
Run-based profiling can be very sensitive to hardware configuration and the
possibility of other programs or tasks running at the same time in a multi-processing
and multi-programming environment.
This sort of test also depends heavily on the selection of a particular programming
language, compiler, and compiler options, so algorithms being compared must all
be implemented under the same conditions.
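As a sketch of such a benchmark in Python, using the standard library's CPU-time clock and averaging over several runs (the helper name and repeat count are illustrative):

```python
import time

def benchmark(func, arg, repeats=5):
    # Average the CPU time of func(arg) over several runs, since a
    # single measurement is noisy in a multi-tasking environment.
    total = 0.0
    for _ in range(repeats):
        start = time.process_time()   # CPU time, not wall-clock time
        func(arg)
        total += time.process_time() - start
    return total / repeats

avg = benchmark(sorted, list(range(100_000, 0, -1)))
print(f"average CPU time: {avg:.6f} s")
```

For elapsed (wall-clock) time, `time.perf_counter()` would be used instead of `time.process_time()`.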

Space measure

This section is concerned with the use of memory resources (registers, cache,
RAM, virtual memory, secondary memory) while the algorithm is being executed.
As for time analysis above, analyze the algorithm, typically using space
complexity analysis to get an estimate of the run-time memory needed as a
function of the size of the input data. The result is normally expressed using Big O
notation.

There are up to four aspects of memory usage to consider:
• The amount of memory needed to hold the code for the algorithm.
• The amount of memory needed for the input data.
• The amount of memory needed for any output data.
• The amount of memory needed as working space during the calculation
(including local variables and any stack space needed by routines called during
a calculation; this stack space can be significant for algorithms which use
recursive techniques).

Some algorithms, such as sorting, often rearrange the input data and don't need any
additional space for output data. This property is referred to as "in-place" operation.

***Further reading on implementation concerns for algorithms and benchmarking:
https://en.wikipedia.org/wiki/Algorithmic_efficiency#Benchmarking:_measuring_performance and
https://en.wikipedia.org/wiki/Algorithmic_efficiency#Implementation_concerns

2.3) Computational complexity continued


a) Time complexity
In computer science, the time complexity is the computational complexity that
describes the amount of time it takes to run an algorithm. Time complexity is
commonly estimated by counting the number of elementary operations performed
by the algorithm, supposing that each elementary operation takes a fixed amount of
time to perform.
Since an algorithm's running time may vary among different inputs of the same
size, one commonly considers the worst-case time complexity. Less common, and
usually specified explicitly, is the average-case complexity (this makes sense
because there are only a finite number of possible inputs of a given size). In both
cases, the time complexity is generally expressed as a function of the size of the
input. Since this function is generally difficult to compute exactly, and the running
time for small inputs is usually not consequential, one commonly focuses on the
behavior of the complexity when the input size increases—that is, the asymptotic
behavior of the complexity. Therefore, the time complexity is commonly expressed
using big O notation, typically O(n), O(n log n), O(nᵃ), O(2ⁿ), etc., where n is the
input size in units of bits needed to represent the input.
The resource that is most commonly considered is the time. When "complexity" is
used without being qualified, this generally means time complexity.
The usual units of time are not used in complexity theory, because they are too
dependent on the choice of a specific computer and on the evolution of
technology. Therefore, instead of the real time, one generally considers the
elementary operations that are done during the computation. These operations are
supposed to take a constant time on a given machine, and are often called steps.
In sorting and searching, the resource that is generally considered is the number of
comparisons between entries. This is generally a good measure of the time
complexity if data are suitably organized.
Constant time
An algorithm is said to be constant time (also written as O(1) time) if the value of
T(n), its running time on inputs of size n, is bounded by a value that does not
depend on the size of the input.
For example, accessing any single element in an array takes constant time, as only
one operation has to be performed to locate it. In a similar manner, finding the
minimal value in an array sorted in ascending order takes constant time: it is the
first element. However, finding the minimal value in an unordered array is not a
constant time operation, as scanning over each element in the array is needed in
order to determine the minimal value. Hence it is a linear time operation, taking
O(n) time.
If the number of elements is known in advance and does not change, however,
such an algorithm can still be said to run in constant time.
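Both cases in a small Python sketch (function names are illustrative):

```python
def first_element(sorted_asc):
    # O(1): the minimum of an ascending-sorted array is its first element.
    return sorted_asc[0]

def minimum(unordered):
    # O(n): every element must be examined to be sure.
    smallest = unordered[0]
    for x in unordered:
        if x < smallest:
            smallest = x
    return smallest

print(first_element([1, 3, 7]), minimum([7, 3, 1]))  # 1 1
```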
b) Space complexity
In computer science, the space complexity of an algorithm or a computer program
is the amount of memory space required to solve an instance of the computational
problem as a function of the size of the input. It is the memory required by an
algorithm to execute a program and produce output.
Similar to time complexity, space complexity is often expressed asymptotically in
big O notation, such as O(n), O(n log n), O(nᵃ), O(2ⁿ), etc., where n is the input
size in units of bits needed to represent the input.
c) Others: arithmetic complexity, bit complexity (which in some cases may be
larger than the arithmetic complexity)
2.4) Asymptotic computational complexity
In computational complexity theory, asymptotic computational complexity is the
usage of asymptotic analysis for the estimation of computational complexity of
algorithms and computational problems, commonly associated with the usage of
the big O notation.
Why use Big O notation?
It is generally difficult to compute precisely the worst-case and the average-case
complexity. In addition, these exact values have little practical application, as any
change of computer or of model of computation would change the complexity
somewhat. Moreover, resource use is not critical for small values of n, so for small
n the ease of implementation is generally more interesting than low complexity.
For these reasons, one generally focuses on the behavior of the complexity for
large n, that is, on its asymptotic behavior as n tends to infinity. Therefore, the
complexity is generally expressed using big O notation.
Scope
With respect to computational resources, asymptotic time complexity and
asymptotic space complexity are commonly estimated. Other asymptotically
estimated behavior include circuit complexity and various measures of parallel
computation, such as the number of (parallel) processors.
Since the ground-breaking 1965 paper by Juris Hartmanis and Richard E. Stearns
and the 1979 book by Michael Garey and David S. Johnson on NP-completeness,
the term "computational complexity" (of algorithms) has commonly come to refer
to asymptotic computational complexity.
Further, unless specified otherwise, the term "computational complexity" usually
refers to the upper bound for the asymptotic computational complexity of an
algorithm or a problem, which is usually written in terms of big O notation, e.g.
O(n³). Other types of (asymptotic) computational complexity estimates are lower
bounds ("big Omega" notation; e.g., Ω(n)) and asymptotically tight estimates,
when the asymptotic upper and lower bounds coincide (written using "big
Theta"; e.g., Θ(n log n)).
2.5) Analysis of algorithms (full version)
In computer science, the analysis of algorithms is the process of finding the
computational complexity of algorithms – the amount of time, storage, or other
resources needed to execute them. Usually, this involves determining a function
that relates the length of an algorithm's input to the number of steps it takes (its
time complexity) or the number of storage locations it uses (its space complexity).
An algorithm is said to be efficient when this function's values are small, or grow
slowly compared to a growth in the size of the input.
When not otherwise specified, the function describing the performance of an
algorithm is usually an upper bound, determined from the worst case inputs to the
algorithm.
In theoretical analysis of algorithms it is common to estimate their complexity in
the asymptotic sense, i.e., to estimate the complexity function for arbitrarily large
input. Big O notation, Big-omega notation and Big-theta notation are used to this
end.
For instance, binary search is said to run in a number of steps proportional to the
logarithm of the length of the sorted list being searched, or in O(log n),
colloquially "in logarithmic time".
Usually asymptotic estimates are used because different implementations of the
same algorithm may differ in efficiency. However the efficiencies of any two
"reasonable" implementations of a given algorithm are related by a constant
multiplicative factor called a hidden constant.
Exact (not asymptotic) measures of efficiency can sometimes be computed but
they usually require certain assumptions concerning the particular implementation
of the algorithm, called model of computation. A model of computation may be
defined in terms of an abstract computer, e.g., Turing machine, and/or by
postulating that certain operations are executed in unit time. For example, if the
sorted list to which we apply binary search has n elements, and we can guarantee
that each lookup of an element in the list can be done in unit time, then at most
log₂ n + 1 time units are needed to return an answer.
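This bound can be checked empirically with a Python sketch that counts the lookups binary search performs (names are illustrative):

```python
import math

def binary_search_steps(sorted_items, target):
    # Count the element lookups binary search performs.
    lo, hi, steps = 0, len(sorted_items) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_items[mid] == target:
            return steps
        elif sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps

n = 1000
data = list(range(n))
# Try every target, including ones not in the list, and take the worst case.
worst = max(binary_search_steps(data, t) for t in range(-1, n + 1))
print(worst, math.log2(n) + 1)  # the observed worst case stays within the bound
```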
a) Order of growth
Informally, an algorithm can be said to be of order O(f(n)) if, for input sizes n
greater than some n₀, there is a constant c such that the running time of that
algorithm is never larger than c × f(n). This concept is frequently expressed using
Big O notation. For example, since the run-time of insertion sort grows
quadratically as its input size increases, insertion sort can be said to be of
order O(n²).
Big O notation is a convenient way to express the worst-case scenario for a given
algorithm, although it can also be used to express the average case; for example,
the worst-case scenario for quicksort is O(n²), but the average-case run-time is
O(n log n).
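A sketch illustrating the quicksort claim: a naive first-element-pivot quicksort (not the in-place variant; names are illustrative) counts one pivot comparison per remaining element, hits its quadratic worst case on already-sorted input, and stays far below that on shuffled input:

```python
import random

def quicksort(a, counter):
    # First-element-pivot quicksort; counter[0] accumulates comparisons.
    if len(a) <= 1:
        return a
    pivot, rest = a[0], a[1:]
    counter[0] += len(rest)  # one pivot comparison per remaining element
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return quicksort(left, counter) + [pivot] + quicksort(right, counter)

random.seed(0)
n = 200
worst = [0]
quicksort(list(range(n)), worst)           # sorted input: the worst case
avg = [0]
quicksort(random.sample(range(n), n), avg)  # shuffled input: typical case
print(worst[0], avg[0])  # n(n-1)/2 versus roughly n log n
```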
b) Evaluating run-time complexity
https://en.wikipedia.org/wiki/Analysis_of_algorithms#Evaluating_run-time_complexity

***Further reading on cost models and run-time analysis using empirical metrics:
https://en.wikipedia.org/wiki/Analysis_of_algorithms#Cost_models

3) General rules for estimation of the run-time of an algorithm’s
implementation
a) Sequence of statements
Statement 1; // takes T1 amount of time
Statement 2; // takes T2 amount of time
...
Statement n; // takes Tn amount of time
The total time is calculated as follows: total time T = T1 + T2 + … + Tn.
If each statement is "simple" and only involves basic operations like arithmetic
operations, assignments, tests (x == 0), and reads and writes (of a primitive type:
integer, float, character, boolean), then the time for each statement is constant and
the total time is also constant: O(1).
b) If-then-else
if (cond) then
block 1 (sequence of statements)
else
block 2 (sequence of statements)
end if;
Here, either block 1 will execute, or block 2 will execute. Therefore, the worst-case
time is the slower of the two possibilities: max(time(block 1), time(block 2)).
c) Loops
for i in 1 .. N loop
sequence of statements
end loop;
The loop runs N times, so the sequence of statements inside it also executes N
times. If the time needed to execute the sequence of statements is T, then the total
time is N × T.
d) Nested loops
for I in 1 .. N loop
for J in 1 .. M loop
sequence of statements
end loop;
end loop;
The outer loop runs N times. Every time the outer loop runs, the inner loop runs M
times. As a result, the statements in the inner loop run a total of N × M times.
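The N × M rule can be confirmed by counting in Python (the function name is illustrative):

```python
def count_inner_executions(n, m):
    # Count how many times the innermost statement actually runs.
    count = 0
    for i in range(n):        # outer loop: N iterations
        for j in range(m):    # inner loop: M iterations per outer pass
            count += 1        # the "sequence of statements"
    return count

print(count_inner_executions(7, 11))  # 77, i.e. 7 * 11
```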

4) Big O notation (see more in MIT doc)


Different implementations of the same algorithm in different programming
languages, compilers, and hardware may differ in efficiency, so Big O notation
(asymptotic analysis) is used to give an estimation for a particular algorithm. The
letter O is used because the rate of growth of a function is also called its order.
For example, suppose the time (or the number of steps) a particular algorithm takes
to complete a problem of size n is given by T(n) = 6n³ − 4n² + 3. If we ignore
constants (which makes sense because those depend on the particular hardware the
program is run on) and slower-growing terms, we could say "T(n) grows on the
order of n³" and write T(n) = O(n³).
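A quick numerical check in Python that the n³ term dominates:

```python
def T(n):
    # The exact step count from the example above.
    return 6 * n**3 - 4 * n**2 + 3

# As n grows, T(n) / n^3 approaches the leading constant 6, so the
# lower-order terms -4n^2 + 3 become negligible and T(n) = O(n^3).
for n in (10, 100, 1000):
    print(n, T(n) / n**3)
```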
Here are the rules of using Big-O notation when analyzing algorithms:
a) Different steps get added
function something() {
    doStep1(); // O(a)
    doStep2(); // O(b)
}
The total estimation is O(a+b)
b) Drop constants
function minMax(array) {
    min, max = null;
    foreach e in array  // O(n)
        min = MIN(e, min);
    foreach e in array  // O(n)
        max = MAX(e, max);
}
The total time is not O(2n) but O(n), because the constant factor 2 is dropped
under this rule.
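A runnable Python version of the same two-pass min/max sketch (the function name is illustrative):

```python
def min_max(array):
    # Two separate O(n) passes: O(n) + O(n) = O(2n), which is O(n).
    smallest = array[0]
    for e in array:           # first pass: O(n)
        if e < smallest:
            smallest = e
    largest = array[0]
    for e in array:           # second pass: O(n)
        if e > largest:
            largest = e
    return smallest, largest

print(min_max([4, 8, 1, 6]))  # (1, 8)
```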
c) Different inputs => Different variables
int intersectionSize(arrayA, arrayB) {
    int count = 0;
    foreach a in arrayA
        foreach b in arrayB
            if (a == b) count++;
    return count;
}
The total time depends on both the length of arrayA and the length of arrayB
=> O(a*b)
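A runnable Python version of the intersection count (names are illustrative):

```python
def intersection_size(array_a, array_b):
    # Nested loops over two *different* inputs: O(a * b),
    # where a = len(array_a) and b = len(array_b).
    count = 0
    for a in array_a:
        for b in array_b:
            if a == b:
                count += 1
    return count

print(intersection_size([1, 2, 3], [2, 3, 4]))  # 2
```

Because the two loops range over different inputs, the cost needs two variables; collapsing it to O(n²) would be wrong when the arrays differ in size.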
d) Drop insignificant terms
i := 0;
while i < 3*n do
    j := 10;
    while j <= 50 do
        j++;
    j := 0;
    while j < n*n*n do
        j += 2;
    i++;
The constant inner loop runs 41 times and the cubic inner loop runs n³/2 times on
each of the 3n outer iterations, so the total time is f(n) = 3n·(41 + n³/2) = 123n + 1.5n⁴
=> O(f(n)) = O(n⁴)
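The count can be verified in Python by instrumenting the same loops (the function name is illustrative):

```python
def count_steps(n):
    # Count inner-loop iterations of the fragment above.
    steps = 0
    i = 0
    while i < 3 * n:          # outer loop: 3n iterations
        j = 10
        while j <= 50:        # 41 iterations, a constant
            j += 1
            steps += 1
        j = 0
        while j < n * n * n:  # n^3 / 2 iterations
            j += 2
            steps += 1
        i += 1
    return steps

n = 20
print(count_steps(n), 3 * n * 41 + 3 * n * (n**3 // 2))  # both 242460
```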

5) Common complexity (growth rate)
Algorithms can be classified by the amount of time they need to complete
compared to their input size:
• O(1): constant time (e.g. array indexing)
• O(log n): logarithmic (e.g. binary search)
• O(n): linear (e.g. a single scan of a list)
• O(n log n): linearithmic (e.g. merge sort, timsort)
• O(n²): quadratic (e.g. bubble sort)
• O(2ⁿ): exponential (e.g. brute-force subset search)
• O(n!): factorial (e.g. brute-force permutation search)
6) Space-time tradeoff
A space-time tradeoff is a situation in which an algorithm trades increased memory
usage for reduced running time, or vice versa; the lookup tables described earlier
are a classic example, since precomputing results costs space but saves time.
