
PARALLEL PROGRAMMING

Ivan Stanimirović

Arcler Press

www.arclerpress.com
Parallel Programming
Ivan Stanimirović

Arcler Press
2010 Winston Park Drive,
2nd Floor
Oakville, ON L6H 5R7
Canada
www.arclerpress.com
Tel: 001-289-291-7705
001-905-616-2116
Fax: 001-289-291-7601
Email: orders@arclereducation.com

e-book Edition 2020

ISBN: 978-1-77407-389-6 (e-book)

This book contains information obtained from highly regarded resources. Reprinted material
sources are indicated and copyright remains with the original owners. Copyright for images and
other graphics remains with the original owners as indicated. A wide variety of references is
listed. Reasonable efforts have been made to publish reliable data. Authors, editors, and the
publisher are not responsible for the accuracy of the information in the published chapters or
for the consequences of their use. The publisher assumes no responsibility for any damage or
grievance to persons or property arising out of the use of any materials, instructions, methods,
or ideas in the book. The authors, editors, and the publisher have attempted to trace the copyright
holders of all material reproduced in this publication and apologize to copyright holders if
permission has not been obtained. If any copyright holder has not been acknowledged, please
write to us so we may rectify it.

Notice: Registered trademarks of products or corporate names are used only for explanation and
identification, without intent to infringe.

© 2020 Arcler Press

ISBN: 978-1-77407-227-1 (Hardcover)

Arcler Press publishes a wide variety of books and eBooks. For more information about
Arcler Press and its products, visit our website at www.arclerpress.com
ABOUT THE AUTHOR

Ivan Stanimirović gained his PhD from the University of Niš, Serbia, in 2013.
His work spans from multi-objective optimization methods to applications of
generalized matrix inverses in areas such as image processing, computer
graphics, and visualization. He is currently working as an assistant professor
at the Faculty of Sciences and Mathematics of the University of Niš on computing
generalized matrix inverses and their applications.
TABLE OF CONTENTS

List of Figures ........................................................................................................xi


List of Tables .......................................................................................................xiii
List of Abbreviations ............................................................................................xv
Preface ..............................................................................................xvii

Chapter 1 Introduction .............................................................................................. 1


1.1. Background ........................................................................................ 2
1.2. Definition of the Problem ................................................................... 3
1.3. Main Objectives ................................................................................. 3
1.4. Justification ......................................................................................... 4
1.5. Cloud Computing ............................................................................... 4
1.6. FDTD Method ..................................................................................... 5
1.7. Computational Parallelism .................................................................. 5
1.8. Parallel Programming Models ............................................................. 7

Chapter 2 Creating and Managing Clusters for FDTD Computational
Simulations with Meep Package on the EC2 Service for Amazon
Web Services............................................................................................. 9
2.1. Starcluster ......................................................................................... 10
2.2. Meep Parallel Package ...................................................................... 11
2.3. GWT – Google Web Toolkit .............................................................. 12
2.4. Ganglia............................................................................................. 12
2.5. Architecture ...................................................................................... 14
2.6. Creating and Configuring a Public AMI ............................................ 16
2.7. Efficiency Test (FDTD Problem) ......................................................... 22
2.8. Efficiency Tests (Using Amazon EC2 Platform) ................................... 22
2.9. Analysis Of The Results ..................................................................... 31
Chapter 3 Parallel Algorithm Designed by Technique “PCAM” ............................... 33
3.1. Partition ............................................................................................ 34
3.2. Domain Decomposition ................................................................... 34
3.3. Functional Decomposition................................................................ 35
3.4. List Partitions Design ......................................................................... 36
3.5. Communication ................................................................................ 36
3.6. Agglomeration .................................................................................. 37
3.7. Reducing Costs of Software Engineering ........................................... 40
3.8. Load Balancing Algorithms ............................................................... 41
3.9. Task Scheduling Algorithms............................................................... 41
3.10. Allocation List Design ..................................................................... 42
3.11. Model of The Atmosphere ............................................................... 42
3.12. Agglomeration ................................................................................ 45
3.13. Load Distribution ............................................................................ 47

Chapter 4 Parallel Computer Systems ...................................................................... 51


4.1. History.............................................................................................. 53
4.2. Parallel Computing ........................................................................... 53
4.3. Background ...................................................................................... 54
4.4. Types Of Parallelism .......................................................................... 59
4.5. Hardware ......................................................................................... 61
4.6. Applications ..................................................................................... 67
4.7. History.............................................................................................. 68

Chapter 5 Parallelization of Web Compatibility Tests In Software Development .... 71


5.1. Web Compatibility Tests .................................................................... 72
5.2. Proposed Technique .......................................................................... 73
5.3. Results ............................................................................................. 76
5.4. Conclusion ....................................................................................... 76

Chapter 6 Theoretical Framework ........................................................................... 79


6.1. Definition of Process......................................................................... 80
6.2. Analysis of Key Processes.................................................................. 84
6.3. Review Process ................................................................................. 88

6.4. Statistical Tools ................................................................................. 95
6.5. Methodological Framework .............................................................. 99

Chapter 7 Modular Programming .......................................................................... 101


7.1. Programs And Judgments ................................................................ 102
7.2. Modularity Linguistics..................................................................... 102
7.3. Normal And Pathological Connections ........................................... 103
7.4. How To Achieve Minimum Cost Systems ........................................ 103
7.5. Complexity In Human Terms........................................................... 104
7.6. Cohesion ........................................................................................ 116

Chapter 8 Recursive Programming ........................................................................ 125


8.1. Classification Of Recursive Functions ............................................. 126
8.2. Design Recursive Functions ............................................................ 127
8.3. Bubble Method ............................................................................... 138
8.4. Sorting By Direct Selection ............................................................. 140
8.5. Method Binary Insertion ................................................................. 141
8.6. Method Quicksort (Quicksort)......................................................... 141
8.7. Mixing Method (Merge Sort) ........................................................... 143
8.8. Sequential Search ........................................................................... 146
8.9. Binary Search (Binary Search) ......................................................... 147
8.10. Seeking The Maximum And Minimum .......................................... 149
8.11. Greedy Method ............................................................................ 152
8.12. Optimal Storage On Tape (Optimal Storage On Tapes) .................. 152
8.13. The Knapsack Problem .................................................................. 153
8.14. Single Source Shortest Paths (Shortest Route From an Origin) ........ 156

Chapter 9 Dynamic Programming ......................................................................... 161


9.1. Optimality Principle ....................................................................... 162
9.2. Multistage Graphs (Multistage Graphs) ........................................... 163
9.3. Traveling Salesman Problem (TSP) ................................................... 166
9.4. Return On The Same Route (Backtracking) .................................. 170
9.5. The Eight Queens Puzzle (8-Queens) .............................................. 171
9.6. Hamiltonian Cycles (Hamiltonian Path) .......................................... 177

Chapter 10 Branch And Bound ................................................................................ 181
10.1. General Description ..................................................................... 182
10.2. Pruning Strategies.............................................................. 184
10.3. Branching Strategies...................................................................... 184
10.4. The Traveling Salesman Problem (TSP) .......................................... 187

Chapter 11 Turing’s Hypothesis ............................................................................... 201


11.1. The Church–Turing Hypothesis ...................................... 202
11.2. Complexity ................................................................................... 202
11.3. Thesis Sequential Computability ................................................... 203
11.4. NP Problems................................................................................. 204

Conclusions ........................................................................................... 211

Bibliography .......................................................................................... 213

Index .................................................................................................... 217

LIST OF FIGURES

Figure 1.1. Distributed memory architecture


Figure 1.2. Message passing between two computers via MPI
Figure 2.1. Accessing shared files via NFS
Figure 2.2. Functional diagram of Ganglia
Figure 2.3. Result of post-processing the ring resonator problem
(animatedfigure.gif)
Figure 2.4. Nodes vs. time (minutes) for the ring resonator exercise
Figure 2.5. Diagram of the ring transmission simulation
Figure 2.6. Spectral flux vs. frequency (input port)
Figure 2.7. Spectral flux vs. frequency (pass port)
Figure 2.8. Spectral flux vs. frequency (extraction port)
Figure 2.9. Nodes vs. time (minutes) for the transmission exercise with the
ring
Figure 2.10. Nodes vs. time (minutes) for the transmission exercise with the
ring, using Harminv
Figure 2.11. Nodes vs. time (minutes) for the transmission exercise without
the ring, using Harminv
Figure 3.1. The task and channel structure for a two-dimensional finite
difference computation with a nine-point stencil, assuming one grid point
for each processor. Only the channels used by the shaded task are shown.
Figure 3.2. Using agglomeration to reduce communication requirements in
the atmosphere model. (a) Each task is responsible for a single point and
therefore must obtain data from eight other tasks to apply the nine-point
stencil. (b) Granularity is increased to 2x2 points per task, reducing the
communication required
Figure 4.1. The Cray-2 supercomputer – the fastest in the world from 1985 to 1989
Figure 6.1. Outline of a process
Figure 6.2. Histogram
Figure 6.3. Pareto diagram

Figure 6.4. Sample control chart
Figure 8.1. Johann Gutenberg (1398–1468)
Figure 8.2. Al Khwarizmi (lived between 780 and 850 AD)
Figure 8.3. Leonardo of Pisa (1170–1250)
Figure 8.4. The growth of the main low-complexity functions
Figure 8.5. An example for the shortest path algorithm
Figure 8.6. An example for the shortest time
Figure 9.1. A graph with 5 stages
Figure 9.2. Graph corresponding to the 3-stage project problem
Figure 9.3. Recursion tree for the traveling salesman problem
Figure 9.4. Positions that the queen can attack
Figure 9.5. Example of two threatened queens on a 4 by 4 board
Figure 9.6. Scheme of the reduced solution tree
Figure 9.7. A decision tree for the 4-queens program
Figure 9.8. Example of a Hamiltonian cycle
Figure 10.1. FIFO branching strategies
Figure 10.2. LIFO branching strategies
Figure 10.3. State tree for a traveling salesman problem with n = 4 and i0 =
i4 = 1
Figure 10.4. Possible paths
Figure 11.1. Computable and non-computable problems
Figure 11.2. A quantum Turing machine

LIST OF TABLES

Table 2.1. Geometric data for the ring resonator structure


Table 2.2. Geometric data for the optical ring resonator structure
Table 2.3. Example tabulation of results for the optical ring resonator
Table 2.4. Example tabulation of results from Harminv
Table 2.5. Example tabulation of results from Harminv
Table 8.1. Algorithmic complexity
Table 8.2. Time in seconds required to perform f(n) operations
Table 8.3. Knapsack Problem
Table 8.4. Cost Matrix
Table 9.1. Traveling Salesman Problem
LIST OF ABBREVIATIONS

AOG AND/OR graph decision problem


ASIC application-specific integrated circuit
BOINC Berkeley open infrastructure for network computing
CDP clique decision problem
CN chromatic number
CSS code and style sheets
DFD data flow diagram
DHC directed Hamiltonian cycle
DOM document object model
DRR database Round-Robin
EBS elastic block storage
EC2 elastic compute cloud
FDTD finite difference time domain
FPGA field-programmable gate array
GPGPU general-purpose computing on graphics processing units
HPC high-performance computing
MIMD multiple-instruction-multiple-data
MISD multiple-instruction single-data
MPI message passing interface
MPP massively parallel processor
MTND non-deterministic Turing machine
NOT involving operators negation
NUMA non-uniform memory access architecture
RRD round-robin database
SIMD single-instruction-multiple-data
SISD single-instruction-single-data
SMP symmetric multiprocessor
SSE streaming SIMD extensions
TCS total clear sky
TSP traveling salesman problem
UI user interface
UMA uniform memory access systems
VAC value customer
VAO organizational activities that add value
VLSI very-large-scale integration (computer-chip manufacturing
technology)

PREFACE

Performing electromagnetic simulations is often a task of high computational
complexity: generating a solution can require a very large amount of processing.
A good alternative for processing efficiently and reducing the time required is
to rely on a cloud computing platform and its tools.
This book also deals with the notion of algorithmic complexity. This complexity
does not refer to the number of lines of code; rather, it refers to the execution
time of a specific algorithm. There are intractable problems, and many algorithms
have the property of being recursive, so the reader should have a clear notion of
recursion, both in mathematics and in programming.
Among other topics, this book describes the implementation of a tool for creating
and managing computational clusters for FDTD simulations on the Amazon EC2 web
service. The details of the problem are provided, along with its background, the
objectives achieved, and the justification for the implementation.
The theoretical framework is also described, including the cloud computing
platform, the FDTD method of computational electromagnetics, and the underlying
concepts and architecture. The tools used both for efficient management of the
cluster and for the electromagnetic simulations themselves, namely the GWT
framework, StarCluster, the Ganglia monitoring tool, and the parallel Meep
package, are described as well.
The architecture and implementation of the final tool are detailed, covering the
installation, configuration, and development of each of its components. Various
tests, performance comparisons on the cloud computing platform, and the efficient
management of the available resources are also presented.
Chapter 1

Introduction

CONTENTS
1.1. Background ........................................................................................ 2
1.2. Definition of the Problem ................................................................... 3
1.3. Main Objectives ................................................................................. 3
1.4. Justification ......................................................................................... 4
1.5. Cloud Computing ............................................................................... 4
1.6. FDTD Method ....................................................................................... 5
1.7. Computational Parallelism .................................................................. 5
1.8. Parallel Programming Models ............................................................. 7

At present, the standard approach for electromagnetic simulations is the
FDTD method. This method is based on solving mathematical equations
(Maxwell's equations) by means of finite difference equations.
At the software level, there are applications that implement this algorithm
[1], allow these simulations to be run, and, in addition, provide visualization
of their results. One of the difficulties is that some simulations are very
complex because of the large number of calculations they involve.
In addition, researchers often need to run multiple simulations with different
parameter values within the structure of a problem, with the aim of comparing
results across different settings.
The problem is that doing this on a local machine can consume considerable
resources and time; the best option is to use a high-performance cluster.
A good alternative for efficient processing, lower costs, and shorter times
is the cloud computing platform and its respective tools.
This chapter describes the implementation of a tool for creating and
managing computational clusters for FDTD simulations on the AWS EC2
service.
It also provides background information on the objectives, a thorough
explanation of the problem, and the justification for this work.

1.1. BACKGROUND
Around 1870, Maxwell formulated the partial differential equations of
electrodynamics. They represent the fundamental unification of the electric
and magnetic fields, predicting the phenomenon of electromagnetic waves,
which the Nobel laureate Richard Feynman called the most outstanding
achievement of nineteenth-century science [2].
The finite-difference time-domain (FDTD) method solves Maxwell's
equations by directly modeling the propagation of electromagnetic waves
within a volume. This numerical modeling technique for electrodynamics
was introduced in 1966 by Kane Yee.
At first, it was almost impossible to implement this method computationally,
probably due to the lack of computing resources. However, with the advent
of more powerful and more accessible modern equipment, and with further
improvements in the algorithm, the FDTD method has become a standard
tool for solving problems of this type. Today, the method brings together a
set of numerical techniques for solving Maxwell's equations in the time
domain and allows the electromagnetic analysis of a wide range of problems
[3]. Scientists and engineers now use computers to solve these equations in
order to investigate electromagnetic fields.

1.2. DEFINITION OF THE PROBLEM


Currently, the FDTD method is one of the techniques used for calculations
in computational electromagnetics. There is an implementation of this
technique for Linux called Meep. However, one of the main problems of the
method is the amount of time and resources required to solve a problem,
especially if the simulations involve three-dimensional structures. This is
due to the heavy processing performed when generating the solution.
Today, these types of problems are solved more easily thanks to the
emergence of what is now called a cluster: a set of powerful computers that
function as a single system and can improve performance for massive
processing [4]. The drawback of using clusters is the cost involved in
acquiring one and the difficulty of managing and configuring it for operation.
The "System for creation and management of computer clusters for FDTD
simulations with the Meep package on the Elastic Compute Cloud (EC2)
service" has been created in order to facilitate the administration and
deployment of computer clusters on the Amazon EC2 cloud computing
service, to solve FDTD problems, improve performance, and optimize the
time it takes to build the solution using the cloud computing services offered
by the Amazon EC2 platform.

1.3. MAIN OBJECTIVES


The "System for creation and management of computer clusters for FDTD
simulations with the Meep package on the EC2 service" was conceived to
provide a tool for monitoring and running multiple FDTD simulations. In
order to achieve this, the following objectives were set:
• Provide a public AMI that allows the creation and management
of computer clusters for FDTD simulations using distributed
processing.
• Integrate a web tool that monitors the resources used by the
clusters and graphs the results.
• Implement a web interface for managing the public AMI.

1.4. JUSTIFICATION
The main justification for the development of the "System for creation and
management of computer clusters for FDTD simulations with the Meep
package on the EC2 service" is to decrease the time it takes to perform
FDTD simulations, with better performance, and to monitor the resources
used while multiple FDTD simulations are executed in parallel. In this way,
users can check the status of their jobs while they are being solved through
the Meep package.

1.5. CLOUD COMPUTING


Cloud computing is a technology that provides computing services over the
Internet on a pay-as-you-go basis. Its operation is based on applications and
services hosted externally on the web [5].
Thanks to this technology, everything a computer system can offer is offered
as a service, so that users can access what is available "in the cloud" without
being experts in managing the resources they use.
There is no need to know the underlying infrastructure: in the cloud,
applications and services can be scaled easily and efficiently without
knowing the details of their operation and installation. Examples include
Amazon EC2, Google App Engine, eyeOS, and Microsoft Azure.

1.5.1. Amazon EC2


To implement our tool, we use the cloud computing service offered by
Amazon EC2, a web service that provides computing capacity in the cloud
that can grow according to user requirements. It is designed to make
web-scale computing easier for developers.

1.5.2. Functionality
EC2 presents a virtual computing environment, allowing us to use web
service interfaces to request machines for use, load our own application
environment, manage permissions and network access, and run the image
on as many systems as required.
It is designed to be used together with other Amazon services such as
Amazon S3, Amazon EBS, Amazon SimpleDB, and Amazon SQS to provide
a complete computing solution.
It also provides safety, since it has numerous security mechanisms such as
firewalls, access control, network configurations, etc.

1.6. FDTD METHOD


The FDTD method simulates the evolution of the electromagnetic field in
a region of interest, and also allows changes to be made in its structure. The
original formulation proposed by Yee studies the behavior of the
electromagnetic field in a vacuum. It also allows some simple boundary
conditions to be defined, such as electric and magnetic walls, making it
possible to apply the method to the study of resonant problems.
The method starts from the Maxwell equations that describe the evolution
in time and space of the magnetic field B and the electric field E.
These partial differential equations are replaced by a set of finite difference
equations. A finite difference is a mathematical expression that approximates
the solutions of differential equations.
The FDTD technique is based on dividing the electromagnetic fields into
discrete pieces, both in space and in time, and on approximating the partial
derivatives appearing in the curl Maxwell equations, expressed in the time
domain, by finite difference quotients. The result is an explicit algebraic
problem that allows the value of the electric (magnetic) field at each point
in space to be calculated at successive instants from the electric (magnetic)
field at the same point at the preceding time instant and from the values of
the magnetic (electric) field at adjacent nodes at the preceding time instant.
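As a one-dimensional illustration in vacuum (a standard form of the Yee scheme, not taken from this book), discretizing the two curl equations for $E_z(x,t)$ and $H_y(x,t)$ on a staggered grid with steps $\Delta x$ and $\Delta t$ gives the explicit leapfrog updates

$$H_y^{\,n+1/2}(i+\tfrac{1}{2}) = H_y^{\,n-1/2}(i+\tfrac{1}{2}) + \frac{\Delta t}{\mu_0 \Delta x}\left[E_z^{\,n}(i+1) - E_z^{\,n}(i)\right],$$

$$E_z^{\,n+1}(i) = E_z^{\,n}(i) + \frac{\Delta t}{\varepsilon_0 \Delta x}\left[H_y^{\,n+1/2}(i+\tfrac{1}{2}) - H_y^{\,n+1/2}(i-\tfrac{1}{2})\right],$$

so each field value at a new time step is obtained from its own previous value and from the neighboring values of the other field at the intermediate time step, exactly as described above.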
It took nine years until the original FDTD method was suitably modified
to solve a scattering problem [6].

1.7. COMPUTATIONAL PARALLELISM


Parallel computing is the simultaneous use of multiple computing resources
to solve a computational problem.
A problem is divided into different parts that can be solved simultaneously.
Each part is broken down into a series of instructions, and the instructions
are executed simultaneously on different CPUs.
The computing resources may include a single computer with multiple
processors, an arbitrary number of computers connected by a network, or a
combination of the two. The computational problem usually shows features
such as the ability to be divided into discrete pieces of work that can be
solved simultaneously, to run multiple program instructions at any point in
time, and to be solved in less time with multiple computing resources than
with a single one.
Parallelism has been employed for many years, especially for
high-performance computing (HPC) [7]. HPC systems are those that use
supercomputers and computer clusters to solve advanced calculations, and
they are usually used for scientific research [8].

1.7.1. Memory Architecture: Distributed Memory


A distributed memory system comprises several independent processors,
each with its own local memory, connected via a network; i.e., each
processor uses its own memory when performing a process. This type of
architecture makes it easy to scale computing power, but it also requires a
means of communication between nodes so that they can synchronize. In
the case of our project, this communication task is handled by the Meep
package (Figure 1.1).
This type of architecture brings many advantages, such as:
• Each processor can access its memory directly, without interfering
with or overburdening the others.
• It scales with the number of processors connected, limited only
by the network.
• There are no cache coherency problems, because each processor
works on its own data and does not have to worry about local
copies.

Figure 1.1: Distributed memory architecture.


The main difficulty of this architecture is the communication between
processors, since if a processor requires information from another, that
information must be sent through messages. The time needed to build and
send a message from one processor to another, and the interruption of a
receiving processor to handle messages sent by other processors, are the
two main sources of overhead [7].

1.8. PARALLEL PROGRAMMING MODELS


We have already discussed the parallel memory architecture we use in our
project. In this section, we explain the parallel programming model that we
have implemented.
Usually, a parallel programming model is based on some memory
architecture; in our case, for the development of our application, we used
the distributed memory architecture.
A parallel programming model is simply a set of algorithms, processes, or
software tools that allow the creation of applications, communication
systems, and parallel I/O.
To implement distributed applications, developers must know how to choose
an appropriate parallel programming model, or in some cases a combination
of them, that matches the type of problem to be solved.
Our tool includes and implements an application that is based on the
message passing interface (MPI) model.

1.8.1. MPI (Message Passing Interface)


Message passing is a technique used in parallel programming to exchange
information through communication based on sending and receiving
messages. Message passing can be synchronous or asynchronous: when the
sender waits for the message to be received before continuing its execution,
we say it is synchronous, whereas when the sending process does not wait
for the message to be received and continues its execution, we say it is
asynchronous.
Information transfer requires synchronization between processes to achieve
good performance. Its main feature is that each process uses its own local
memory during execution, with no shared memory.
MPI has become a standard for communication between the nodes running
a particular problem within a distributed system (Figure 1.2).

Figure 1.2: Message passing between two computers via MPI.
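As a minimal illustration of this model (not part of the book's tool, and assuming the mpi4py package is available), the following Python script sends a small data structure from one process to another; it would be launched with something like "mpirun -np 2 python demo.py":

    from mpi4py import MPI

    comm = MPI.COMM_WORLD          # communicator containing all launched processes
    rank = comm.Get_rank()         # identity of this process within the communicator

    if rank == 0:
        data = {"step": 1, "values": [0.0, 1.5, 3.0]}
        comm.send(data, dest=1, tag=11)      # blocking send: returns once the message is handed off
        print("process 0 sent", data)
    elif rank == 1:
        data = comm.recv(source=0, tag=11)   # blocks until the message arrives
        print("process 1 received", data)

    # Non-blocking (asynchronous) variants exist as comm.isend/comm.irecv, which
    # return request objects that are completed later with wait().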
Chapter 2

Creating and Managing Clusters for
FDTD Computational Simulations with
Meep Package on the EC2 Service for
Amazon Web Services

CONTENTS
2.1. Starcluster ......................................................................................... 10
2.2. Meep Parallel Package ...................................................................... 11
2.3. GWT – Google Web Toolkit .............................................................. 12
2.4. Ganglia............................................................................................. 12
2.5. Architecture ...................................................................................... 14
2.6. Creating and Configuring a Public AMI ............................................ 16
2.7. Efficiency Test (FDTD Problem) ......................................................... 22
2.8. Efficiency Tests (Using Amazon EC2 Platform) ................................... 22
2.9. Analysis Of The Results ..................................................................... 31

In this chapter, we will look at the details of the tools we have used to
develop the project and explain the important concepts and features
provided by each of them.

2.1. STARCLUSTER
StarCluster is a utility that enables the creation, management, and monitoring
of computer clusters hosted on the Amazon Elastic Compute Cloud (EC2)
service, all through a master instance. Its main objective is to minimize the
administration associated with configuring and using the computer clusters
employed in research laboratories or in general applications of distributed
computing.
To use this tool, a configuration file is created containing the Amazon Web
Services (AWS) account information, the type of AMI to use, and any
additional features we want in the cluster; an illustrative sketch of such a
file is shown below. Subsequently, the tool is run by means of commands [10].
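For orientation only, the INI-style configuration file that StarCluster reads (normally placed under ~/.starcluster/) looks roughly like the sketch below. The key names follow the StarCluster documentation of that period, and every value shown is a placeholder rather than a setting taken from this book:

    [aws info]
    AWS_ACCESS_KEY_ID = <your access key>
    AWS_SECRET_ACCESS_KEY = <your secret key>
    AWS_USER_ID = <your account id>

    [key mykey]
    KEY_LOCATION = ~/.ssh/mykey.rsa

    [cluster smallcluster]
    KEYNAME = mykey
    CLUSTER_SIZE = 2
    NODE_IMAGE_ID = ami-xxxxxxxx
    NODE_INSTANCE_TYPE = m1.small

    [global]
    DEFAULT_TEMPLATE = smallcluster

A cluster defined this way would then be started and stopped with commands of the form "starcluster start mycluster" and "starcluster terminate mycluster".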

2.1.1. StarCluster Features


• It is used through commands to perform the tasks of creating,
managing, and monitoring one or more clusters on EC2.
• It provides a public AMI pre-configured with everything needed
to install StarCluster.
• It supports cloud storage services such as Elastic Block Storage
(EBS) and the Simple Storage Service (S3), also offered by Amazon.
• The public AMI includes tools such as OpenMPI, ATLAS,
LAPACK, NumPy, and SciPy.
• All nodes in the cluster are automatically configured with the NFS,
SGE, and OpenMPI services.
For our project, we will use the NFS and OpenMPI services, so we detail
the concepts of each below:

2.1.2. Network File System (NFS)


NFS allows different clients connected to the same network to access remote
shared files as if they were part of their local file system. This application
protocol works in a client/server environment, where the server specifies
the directories it wants to share and the clients mount those directories into
their own file systems.
This can be used in distributed processing as a shared storage environment
(Figure 2.1).

Figure 2.1: Accessing shared files via NFS.

2.1.3. OpenMPI
OpenMPI is a project combining technologies and resources from other
projects (FT-MPI, LA-MPI, LAM/MPI, and PACX-MPI) to build an MPI
library. OpenMPI is an open-source implementation of the MPI-1 and MPI-2
standards [11].

2.2. MEEP PARALLEL PACKAGE


Meep is a simulation software package developed to model electromagnetic
systems. Meep implements the finite-difference time-domain (FDTD)
algorithm of computational electromagnetics. The algorithm divides space
into a mesh and follows how the fields evolve over time using discrete time
steps; in this way the solution of the continuous equations is approximated,
and many practical problems can be simulated.
The parallel Meep package provides support for distributed memory
parallelism and can work on very large problems (problems in 3D space),
so that they can be solved in a distributed manner.
The problem must be large enough to benefit from many processors [12].
To achieve this, the parallel Meep package divides the computational cell
of the simulation into "chunks" that are allocated among the processors.
Each "chunk" is stepped forward in time, and the processors are responsible
for communicating the boundary values using MPI.
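As a rough sketch of this workflow (using Meep's Python interface with an arbitrary toy setup, rather than the Scheme CTL files described later in this chapter), the same script runs serially or, when launched with for example "mpirun -np 8 python sim.py", in parallel; Meep itself splits the cell into chunks across the MPI processes:

    import meep as mp

    # a deliberately sizable empty 3D cell, so that splitting it across processes pays off
    sim = mp.Simulation(
        cell_size=mp.Vector3(16, 16, 16),
        boundary_layers=[mp.PML(1.0)],
        sources=[mp.Source(mp.GaussianSource(frequency=0.15, fwidth=0.1),
                           component=mp.Ez,
                           center=mp.Vector3())],
        resolution=10)

    sim.run(until_after_sources=50)

    if mp.am_master():        # only the master rank prints, to avoid duplicated output
        print("simulation finished")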

2.3. GWT – GOOGLE WEB TOOLKIT

2.3.1. Introduction
GWT, or Google Web Toolkit, is a framework created by Google that
facilitates the use of AJAX technology. It solves the big problem of client
code (HTML, JavaScript) compatibility between browsers, enabling the
developer to build an application without the need to test it in various
browsers. The concept of Google Web Toolkit is quite simple: basically,
you write the code in Java using any Java development environment (IDE),
and the compiler translates it to HTML and JavaScript.

2.3.2. GWT Platform


GWT has four main components: a Java-to-JavaScript compiler, a hosted
web browser, and two class libraries:
• GWT Java-to-JavaScript Compiler: the function of this
component is to translate code developed in Java into the
JavaScript language.
• Hosted Web Browser: this component runs the Java application
without translating it into JavaScript, using the Java Virtual
Machine in hosted mode.
• JRE Emulation Library: JavaScript implementations of the most
commonly used class libraries in Java, such as java.lang, java.util,
etc.
• GWT Web UI Class Library: a set of user interface (UI) elements
that allows the creation of objects such as text labels, text boxes,
images, and buttons [13].

2.4. GANGLIA

2.4.1. Introduction
Monitoring a computer cluster requires proper management of resources;
with this information the administrator can spend less time detecting,
investigating, and troubleshooting the failures that happen, and can also
prepare a contingency plan.
Ganglia is a scalable distributed system for monitoring computer clusters
and grids in real time. It is a robust implementation that has been adapted
to many different computer architectures and operating systems and is
currently used in thousands of clusters around the world, including
universities, laboratories, and government and business research centers.
Ganglia was initially developed by the computer science department of the
University of California, Berkeley, to link campus clusters, and is currently
hosted on SourceForge.
It is based on a hierarchical clustering scheme and is configured through
XML and XDR files on each node, which allows extensibility and portability.
It is completely open-source and does not contain any proprietary
components. Ganglia links clusters together, so this kind of distributed
computing is known as "cluster of clusters" [14].

2.4.2. Functioning
Ganglia is organized in a hierarchical scheme. It is based on communication
through a send/receive multicast protocol to monitor the status of the cluster,
and it uses a tree of point-to-point connections between levels of cluster
nodes to report their status. Ganglia uses status messages in a multicast
environment as the basis of its communication protocol. To maintain
communication, each node sends its state at a given time interval; in this
way it announces that it is active, and when it stops sending, the node is no
longer involved in the monitoring.

Figure 2.2: Functional diagram of Ganglia.


In addition, each node monitors its local resources and sends multicast
packets with status information each time an update occurs. All nodes in the
same cluster always have an approximate view of the entire cluster state,
and this state is easily reconstructed if a crash occurs (Figure 2.2).

2.4.3. Architecture Ganglia


The main components of Ganglia are two daemons, gmond and gmetad.
The Ganglia Monitoring Daemon (gmond) is the cornerstone of the tool: a
multi-threaded daemon that runs on each of the cluster nodes to be
monitored. Installation is very easy; there is no need for a common NFS file
system or a database. Gmond has its own distributed database and its own
redundancy.
The Ganglia Meta Daemon (gmetad) obtains the information via XML at
regular intervals from the nodes, stores it in a round-robin database (RRD),
and concatenates the XML from the nodes to share the information with the
web server or any other front-end that uses gmetad.
Another major component is integrated into the web application: Ganglia
uses a round-robin database (RRD) to store and query historical information
about the cluster and to present metrics over time. RRDtool is a popular
system for storing and graphing time-series data, which uses a specially
designed compact format for storing data in time series. RRDtool generates
graphs that show the trend of the metrics versus time; these graphs are then
exported to be displayed in the front-end.
The front-end template can be customized or used as provided by default.
Ganglia separates content from presentation: the content is an XML file
that, if desired, can be accessed directly by some other application [14].

2.5. ARCHITECTURE

2.5.1. Input Files


The input files for the application are CTL (Control Type Language) files,
which are used by the Meep package to perform the simulations. A CTL file
specifies the geometry of the problem, the sources used, the outputs, and
everything else necessary to perform the calculation. This file is written in
a scripting language and allows us to define the structure of a problem as a
sequence.
The CTL file is based on the libctl library, a set of tools based on the Scheme
language. The CTL file can be written at any of three levels:
• Scheme, a programming language developed at MIT. Expressions
take the form (function arguments...) and can be executed under
a GNU Guile interpreter;
• libctl, a library built on Guile, which simplifies communication
between Scheme and scientific computing software. Libctl defines
the basic interface and a host of useful features; and
• Meep, a CTL interface based on Meep itself, which defines all the
interfaces specific to FDTD calculations.

2.5.2. Web Application “StarMeep”


The "StarMeep" web application improves the management of the
computational cluster for the user, since it provides benefits such as:
• availability through a web browser;
• configuration through forms rather than through a console;
• display of the processing progress of an electromagnetic simulation;
• access to the simulation result files.
The development is based on the Java programming language together with
the GWT framework for the front-end of the application. GWT allows us to
create a simple UI and to make requests to the server via AJAX technology.
The back-end is the main access point for sending commands to the instances
or nodes in the cluster. This was done through the SSH protocol, so the JSch
library, which lets you create different types of connections such as SSH,
SCP, and others from Java, was used.

2.5.3. Monitoring
Monitoring of the resources used by the cluster nodes is performed by the
Ganglia tool. Usage information is sent from the cluster via XML files and
is processed by the tool for viewing.
Through its web front-end, Ganglia provides administrators and cluster
users with a real-time graphical view of the resources consumed.

2.5.4. Master Node and Slave Nodes


In the processing of the FDTD simulations, the cluster nodes are responsible
for processing the input file in parallel through the Meep package. The
simulation is divided into equal tasks, and each task is delivered to a node
for processing. Each node performs its tasks synchronously, communicating
with the others via the MPI communication protocol.
Once the tasks are performed by the nodes, the results are written by the
HDF5 library and placed in a shared NFS directory of the cluster. This
directory may be within the same instance, or a storage service such as
Amazon S3 can be used.
The information on resource usage within the cluster is collected by a gmond
daemon that resides on each of the nodes. The nodes to be monitored are
specified in the gmetad daemon found on the master node.
Finally, the simulation results and resource information are accessed through
the master node.

2.5.5. Output File Storage


The output files generated by the simulation are in the HDF5 format, a
standard format used by many scientific visualization tools such as MATLAB,
GNU Octave, and others. Apart from the HDF5 files, we also find files that
tell us whether there were any problems at runtime and the status of the
simulation through its time steps.
The results generated by the simulation can likewise be stored using an EBS
block volume or AWS S3 storage.
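As an aside (not part of the tool itself), such HDF5 output can be inspected from Python with the h5py library; the file name below is hypothetical:

    import h5py

    # list every dataset stored in one of the simulation output files
    with h5py.File("ez-000200.00.h5", "r") as f:
        for name, dset in f.items():
            print(name, dset.shape, dset.dtype)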

2.6. CREATING AND CONFIGURING A PUBLIC AMI
This section explains the installation and configuration of the various
applications used by our tool. The details of these applications were given
in the previous sections.

2.6.1. Installing StarCluster


To install this tool, we use as a basis the AMI provided by the StarCluster
community, which includes some services and dependencies that allow a
quick installation. The official StarCluster website specifies the AMI
identifier, and we use it to launch a corresponding instance. It has the
following characteristics:
• 1.7 GB RAM;
• 2 virtual cores at 2.5 GHz;
• 350 GB hard drive;
• Ubuntu 9.04 32-bit.
Before installing StarCluster, we describe some of its most important
dependencies.
• Python (2.4): an interpreted programming language in which a
program can be divided into modules for reuse in other programs.
• boto (1.9b+): a Python module integrated to handle current and
future services offered by the AWS infrastructure.
• paramiko (1.7.6+): another Python module, which implements
the SSH2 protocol (encryption and authentication) for connections
to remote computers.
To install StarCluster, we downloaded the latest development version from
the Git repository (version control software), then compiled and installed it
via Python. The following steps describe the process:
• Download the StarCluster installer from the repository.
• Once downloaded, move into the StarCluster directory that was
created.
• Compile and install StarCluster with Python.
• Once the installation is complete, check that it runs.
When you run this command, StarCluster prompts for a configuration file;
it will not be created from here, but through the StarMeep web application
discussed later.

2.6.2. Meep-OpenMPI Installation Package


Before installing the Meep package, we review the main packages required
by this software.
• guile-1.8-libs: Guile is an implementation of Scheme (a functional
programming language) designed for extension programming.
• libctl3: the implementation of a free library based on Guile, used
for scientific simulations.
• libhdf5: a library that allows Meep to write its output in HDF5
format so that it can later be processed as required by the user.
• libmeep-openmpi2: a library that allows Meep to solve FDTD
problems in parallel using OpenMPI.
For the installation of Meep-OpenMPI we use what the Python-Meep
package offers. The installation was performed as follows:
• First, we must add to the Ubuntu repository list the two new
addresses from which the Meep-OpenMPI package is downloaded;
the repository file found in "/etc/apt/sources.list" was modified
with an editor to add them.
• Then we update the Ubuntu package index as follows:
• After updating the repositories, we proceeded to install two
additional packages for Meep.
• Finally, we proceed to install the Meep-OpenMPI package with
the command:

2.6.3. Installing Apache


This server hosts the Ganglia web application used to monitor the hardware
resources of the running nodes.
For the installation we must do the following:
• First, install PHP version 5 with its respective modules so that
Apache can support applications written in this language,
performing the following in the terminal:
• The Apache installation is then performed by running:
• Usually, if you want to make changes to the web server
configuration, you can do so in "/etc/apache2/apache2.conf". We
leave the default settings.

2.6.4. Apache-Tomcat Installation


The Apache Tomcat web server can host applications written in Java. It
contains the StarMeep application, which helps with the administration and
execution of FDTD jobs.
The installation consists of the following:
• Download the Apache Tomcat installer from the official website;
open the terminal and run:
• Then unzip the downloaded file to generate the Apache Tomcat
directory, using the following command:
• Then move the generated directory to the installation location. In
this case, we chose to place it in "/usr/local".
• Finally, an executable file "tomcat.sh" that can start or stop the
server is placed in "/etc/init.d/" so that Apache Tomcat is brought
up whenever a configured node starts.

2.6.5. Installing and Configuring Ganglia


To install and configure the Ganglia monitor, we performed the following
steps.
• First, we install the package dependencies that allow Ganglia to
work properly, running in a terminal:
• Once the dependencies are installed, we download the installer
from the official Ganglia website and then run:
• The downloaded installer is compressed, so we proceed to
decompress it:
• Decompression automatically generates a directory with the name
"ganglia-3.1.7"; we switch to it and run the command to configure
the installer.
• Note that when configuring the installer we pass the parameter
"--with-gmetad", which causes the gmetad daemon to be installed
as well. We also pass as a parameter the address where we want
the installation directory to be created, in this case "/etc/ganglia".
Now, to build the installer with the chosen parameters and execute it, we
perform the following:
• Additionally, Ganglia also provides an application to monitor
resources from the web. For this, we must move the "web"
directory of the Ganglia package to the "/var/www" directory of
the Apache server, and then simply rename the folder to "ganglia"
so we can call the application with that name. Run the following
commands to perform these actions:
• Once the installation is complete, we perform the configuration
of the gmetad and gmond daemons. For gmond, we generate the
default configuration file with the following command:
• As recommended, you only need to change the information in the
"cluster" section of the gmond configuration file.
• Now we turn to the configuration of the gmetad daemon; here we
also use a template that comes in the Ganglia installation package
and move it to the Ganglia installation directory.
• The gmetad configuration file is where we add the names of the
nodes we want to monitor. This file is edited by the StarMeep web
application, because it manages the launching and running of the
nodes.
• Finally, we define the directory where the RRD files required by
the Ganglia web monitor will be created, and change their owner
to the "ganglia" user so that it can read them.

2.7. EFFICIENCY TEST (FDTD PROBLEM)


For this test, we have taken an example of an FDTD problem to be solved
with Meep. When run on a single machine, it shows the following error:

This happens when we need to solve a problem with a high degree of
processing, such as large structures or higher-resolution images involving
large memory usage. This severely limits users, since everything always
depends on the speed and memory of a single machine.
Thanks to our tool, these types of problems are solved, because the problem
is divided into smaller jobs among the nodes, minimizing the processing
and memory required on each one.

2.8. EFFICIENCY TESTS (USING AMAZON EC2 PLATFORM)
The problems listed below were solved on the Amazon EC2 platform using
different numbers of nodes, in order to demonstrate the improvement offered
by our tool.
We tested three different types of problems, detailed in the following
subsections.

2.8.1. Ring Resonator


One of the common tasks performed in FDTD simulations is to examine
the behavior of the electromagnetic field of an object when it is excited by
a power source. When solving these types of problems, the output data
undergo post-processing to generate an image where we can see how the
field behaves.
The objective of this exercise is the aforementioned "ring resonator," which
is merely a waveguide bent into a circle. The result is an animated image in
which the resonances of the electromagnetic field can be appreciated.
The structure of the ring resonator example is given by the following steps
(a code sketch follows Table 2.1):
• Define the parameters to be used in the problem.
• Draw the geometry of the problem, namely the ring in which the
field resonates; for this, two cylinder objects are created, one
defined as dielectric material and the other defined as air, so as to
form a ring.
• Define the Gaussian pulse that makes the ring resonate and thereby
excites its electromagnetic field.
• Finally, run the simulation; the basic idea is to run until the sources
are finished and then add a period during which we calculate the
signals and produce images of how the electromagnetic field
behaves.
Table 2.1 shows the data of the geometric structure.

Table 2.1: Geometric Data for the Ring Resonator Structure

Waveguide index 3.4
Waveguide width (microns) 1
Inner radius of the ring (microns) 1
Space between the waveguide and the PML layer (microns) 4
PML thickness (microns) 2
Pulse width 0.15
Pulse frequency 0.1
Resolution 40
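The following sketch shows how these steps and the parameters of Table 2.1 look in Meep's Python interface. The book's actual runs use a Scheme CTL file, so this is only an illustrative reconstruction; in particular, following the standard Meep ring example, the two pulse parameters are interpreted as a center frequency of 0.15 and a width of 0.1:

    import meep as mp

    n = 3.4       # waveguide index
    w = 1         # waveguide width (microns)
    r = 1         # inner radius of the ring (microns)
    pad = 4       # space between the waveguide and the PML layer
    dpml = 2      # PML thickness
    sxy = 2 * (r + w + pad + dpml)          # size of the square cell

    # the ring: a dielectric cylinder with an air cylinder removed from its middle
    geometry = [mp.Cylinder(radius=r + w, material=mp.Medium(index=n)),
                mp.Cylinder(radius=r, material=mp.air)]

    fcen, df = 0.15, 0.1                    # Gaussian pulse: center frequency and width
    sources = [mp.Source(mp.GaussianSource(fcen, fwidth=df),
                         component=mp.Ez,
                         center=mp.Vector3(r + 0.1))]   # placed just inside the ring

    sim = mp.Simulation(cell_size=mp.Vector3(sxy, sxy),
                        geometry=geometry,
                        sources=sources,
                        boundary_layers=[mp.PML(dpml)],
                        resolution=40)

    # run until the source has finished plus 300 more time units,
    # periodically writing snapshots of Ez for the animation
    sim.run(mp.at_beginning(mp.output_epsilon),
            mp.to_appended("ez", mp.at_every(0.6, mp.output_efield_z)),
            until_after_sources=300)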

After the execution of the problem, the result obtained is post-processed
with the following commands:

Running the problem with five different numbers of nodes produced five
results; the relevant comparison was made to verify their validity, and we
found that all of them generate exactly the same output. Thus, the result is
the same no matter how many nodes are used.

Figure 2.3 shows the final result of the exercise.

Figure 2.3: Result of post-processing the ring resonator problem.

The next thing we will do is observe the time it took to complete the exercise
with each number of nodes (see the graph in Figure 2.4).

Figure 2.4: Nodes vs. time (minutes) for the ring resonator exercise.
As we can see in Figure 2.4, we executed the problem five times with
different numbers of nodes, and the time improves when the exercise is
solved with more nodes. This holds only up to a point; in this case, with 9
nodes the time goes back up.

2.8.2. 3D Simulation of an Optical Ring Resonator for the Transmission Spectrum
Another of the most common FDTD simulations is the study of the
transmission spectrum of the electromagnetic flux.
The following exercise shows how the resonance is obtained from the
transmission spectrum in a system of two waveguides and a ring.
One guide is excited by a power source and transmits this energy to the ring
due to the resonance; the ring, in turn, can transmit energy to the other
waveguide.
To do this, the exercise has to be run twice: the first time with the ring
present and the second time without the ring. Then the results are
post-processed.
Figure 2.5 illustrates this system.

Figure 2.5: Diagram of the ring transmission simulation.


The result of the problem is three graphs showing the behavior of the
transmission spectrum.
The structure of the exercise code is given by the following steps (a sketch
of the flux setup follows Table 2.2):
• Define the parameters or variables to be used in the problem.
• Draw the geometry of the problem: the two waveguides defined
as dielectric material and, if required, the dielectric resonator ring.
• Define the Gaussian pulse source that drives the ring into
resonance.
• Write the output condition, in this case when the pulse has been
turned off and 150 further time units have elapsed, and then define
what we collect, in this case the flux spectrum.
Table 2.2 shows the data of the geometric structure.

Table 2.2: Geometric Data for the Optical Ring Resonator Structure

Refractive index 3.03
Substrate refractive index 1.67
Inner radius of the ring (microns) 2
Outer radius of the ring (microns) 2.5
Width of the waveguide (microns) 0.55
Length of the waveguide (microns) 0.405
Space between the waveguides and the ring (microns) 0.2
Distance from the substrate to the guide (microns) 0.75
Substrate width (microns) 0.6
PML thickness (microns) 1
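A sketch of how such flux spectra are declared and collected in Meep's Python interface is shown below. It is only illustrative: the geometry is omitted, and the positions of the source and flux planes are hypothetical placeholders rather than values derived from Table 2.2:

    import meep as mp

    fcen, df, nfreq = 0.36, 0.1, 500    # center frequency, width, number of flux points (illustrative)

    sim = mp.Simulation(cell_size=mp.Vector3(12, 8),
                        geometry=[],    # the waveguides and ring of Table 2.2 would go here
                        sources=[mp.Source(mp.GaussianSource(fcen, fwidth=df),
                                           component=mp.Ez,
                                           center=mp.Vector3(-4, -2))],
                        boundary_layers=[mp.PML(1.0)],
                        resolution=10)

    # flux planes at the input, pass, and extraction ports (placeholder positions)
    input_flux = sim.add_flux(fcen, df, nfreq,
                              mp.FluxRegion(center=mp.Vector3(-3, -2), size=mp.Vector3(0, 1)))
    pass_flux = sim.add_flux(fcen, df, nfreq,
                             mp.FluxRegion(center=mp.Vector3(4, -2), size=mp.Vector3(0, 1)))
    drop_flux = sim.add_flux(fcen, df, nfreq,
                             mp.FluxRegion(center=mp.Vector3(4, 2), size=mp.Vector3(0, 1)))

    # run until 150 time units after the pulse has turned off, then print the spectra
    sim.run(until_after_sources=150)
    for f, a, b, c in zip(mp.get_flux_freqs(input_flux),
                          mp.get_fluxes(input_flux),
                          mp.get_fluxes(pass_flux),
                          mp.get_fluxes(drop_flux)):
        print(f, a, b, c)      # one line per frequency, as tabulated in Table 2.3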

As mentioned, the problem must be run twice. Each run generates an output
file whose results contain lines like those shown below:

Then we tabulate the results of both runs for post-processing, as shown in
Table 2.3.

Table 2.3: Example Tabulation of Results for the Optical Ring Resonator

Spectral flux
Frequency            Input port               Pass port                Extraction port
0.36                 1.87190375615421e-8      5.33633135330676e-9      2.25302194395918e-10
0.360200100050025    1.91956223787719e-8      5.74839347857691e-9      2.32817150153588e-10

Since there are two executions, at the end we obtain two tables like the one shown above. The corresponding data in each column, except the frequency, are then divided: the results of the run with the ring are divided by the results of the run without the ring. At the end we obtain a single table, from which we proceed to plot the frequency against each spectral flux, as shown in Figure 2.6. A minimal sketch of this normalization is given below.
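A small illustrative sketch of the normalization step, assuming the two runs produced plain-text files with the columns of Table 2.3 (the file names and column layout are assumptions, not the author's actual scripts):

    import numpy as np
    import matplotlib.pyplot as plt

    # Columns assumed: frequency, input, pass, extraction (see Table 2.3).
    with_ring = np.loadtxt("flux_with_ring.dat")
    no_ring = np.loadtxt("flux_without_ring.dat")

    freq = with_ring[:, 0]
    normalized = with_ring[:, 1:] / no_ring[:, 1:]   # divide every column except the frequency

    for idx, label in enumerate(["input", "pass", "extraction"]):
        plt.plot(freq, normalized[:, idx], label=label)
    plt.xlabel("frequency")
    plt.ylabel("normalized spectral flux")
    plt.legend()
    plt.show()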

Figure 2.6: Spectrum vs. frequency graph (input port).


From Figure 2.6, we can observe the behavior of the transmission spectrum between the source and the first waveguide. The graph indicates that most of the fluctuation occurs around the unit value, between frequencies 0.3 and 0.5 (Figure 2.7).

Figure 2.7: Spectrum vs. frequency graph (pass port).


In Figure 2.7, we can observe the behavior of the spectrum during the transition between the first waveguide and the ring. The valleys that we see are due to the resonance of the ring. We must also take numerical errors into account; in this case, between 0.3 and 0.4, several fluctuations appear in the response curve (Figure 2.8).

Figure 2.8: Spectrum vs. frequency graph (extraction port).


From Figure 2.8, we can observe the behavior of the spectrum for the extracted energy: because the source end does not generate power, energy begins to be extracted from the ring.
As before, when running the problem with different numbers of nodes, five results were obtained, all of which generated exactly the same values. Finally, we look at the times in which the problem was completed; since the execution is done in two steps, we get the two graphs shown in Figure 2.9.

Figure 2.9: Graph of nodes vs. time (minutes) for the transmission exercise with the ring.

As we can see in Figure 2.9, we executed each problem five times with different numbers of nodes. The first execution, with the ring, takes a little longer than the execution of the problem without the ring; however, the interesting point is that the proportion by which time decreases as more nodes are used is very similar in the two cases.

2.8.3. Calculation of the Resonance Frequency and Quality Factor of a 3D Resonant Ring
After seeing how the electromagnetic field behaves through a graph and calculating the transmission spectrum to find resonances, it is also important to find numerical values that identify the frequencies and decay rates, as well as the quality factor. This can be done with the harmonic inversion method provided by the Harminv package.
The aim of this exercise is to process the electromagnetic signals to calculate the frequencies and quality factor of the resonance system used previously.
For this problem we also study the two existing cases: the system with the ring and the system without the ring. The result of the problem shows several values that indicate the resonance frequency, the quality factor Q, the amplitudes, and the margin of error.
The structure of the code and the data are the same as in the previous exercise. All that changes is that at the end, instead of obtaining the flux spectra, we process the signals; for this, Harminv receives four parameters: the electric field component Ez, the position at which to analyze it, and the frequency range. In both cases of the problem (with ring and without ring), finishing the run generates an output file which shows all the processing steps and the final result (the Harminv values). At the end the final result is tabulated; Table 2.4 shows how the data are tabulated, and this is done with the results of both runs.
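As an illustration only, a minimal sketch of this kind of signal processing with the Python interface of MEEP is shown below. The monitor position, cell, and frequency window are assumptions and not the author's exact values.

    import meep as mp

    fcen, df = 0.4, 0.3                     # assumed frequency window to search for modes
    cell = mp.Vector3(16, 12, 0)
    ring = [mp.Cylinder(radius=2.5, material=mp.Medium(index=3.03)),
            mp.Cylinder(radius=2.0, material=mp.Medium(index=1))]
    src = [mp.Source(mp.GaussianSource(frequency=fcen, fwidth=df),
                     component=mp.Ez, center=mp.Vector3(2.2, 0))]

    sim = mp.Simulation(cell_size=cell, geometry=ring, sources=src,
                        boundary_layers=[mp.PML(1.0)], resolution=40)

    # Harminv analyzes Ez at a point inside the ring over the band [fcen - df/2, fcen + df/2].
    h = mp.Harminv(mp.Ez, mp.Vector3(2.2, 0), fcen, df)

    # Record the signal only after the Gaussian source has turned off.
    sim.run(mp.after_sources(h), until_after_sources=300)

    for mode in h.modes:
        # Each mode carries the real frequency, decay rate, Q, amplitude, and error estimate.
        print(mode.freq, mode.decay, mode.Q, abs(mode.amp), mode.err)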

Table 2.4: Example Tabulation of Results Harminv

            Freq. Real   Freq. Imaginary   Q          |Amp|    Amplitude            Error
harminv0:   0.4621       1.76E-04          -1329.10   0.002    -0.0095-0.0012i      2.69E-04
harminv0:   0.4935       -0.0016           149.42     0.049    0.017+0.04874i       3.63E-05
harminv0:   0.5065       -5.20E-04         490.13     0.065    -0.037-0.05496i      1.40E-05
harminv0:   0.5189       -0.0027           94.93      0.059    0.0519+0.013851i     1.15E-04
harminv0:   0.5225       -3.66E-04         723.34     0.134    0.06928+0.11025i     2.31E-05


To process the final tabulated data, we look for the frequencies at which there is resonance; however, not all values should be considered, because, as we can see, there is a margin of error to take into account. To find out whether each frequency value is correct, a comparison is performed: the absolute value of the imaginary frequency must be greater than the margin of error. In Table 2.5, we note which candidates are correct values.
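A minimal sketch of this filtering rule, using the values from Table 2.5 below (the data layout as Python tuples is an illustration, not the author's script):

    # (real freq, imaginary freq, Q, |amp|, error) rows, as in Table 2.5.
    modes = [
        (0.46,  1.76e-4, -1329.10, 0.002, 2.69e-4),
        (0.49, -0.0016,    149.42, 0.049, 3.63e-5),
        (0.50, -5.20e-4,   490.13, 0.065, 1.40e-5),
        (0.51, -0.0027,     94.93, 0.059, 1.15e-4),
        (0.52, -3.66e-4,   723.34, 0.134, 2.31e-5),
    ]

    # A mode is accepted only if |Im(f)| is greater than the reported error.
    valid = [m for m in modes if abs(m[1]) > m[4]]
    for freq, imag, q, amp, err in valid:
        print(f"f = {freq:.2f}, Q = {q:.2f}")   # the first row (0.46) is rejected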

Table 2.5: Example Tabulation of Results Harminv

            Freq. Real   Freq. Imaginary   Q          |Amp|    Amplitude            Error
harminv0:   0.46         1.76E-04          -1329.10   0.002    -0.0095-0.0012i      2.69E-04
harminv0:   0.49         -0.0016           149.42     0.049    0.017+0.04874i       3.63E-05
harminv0:   0.50         -5.20E-04         490.13     0.065    -0.037-0.05496i      1.40E-05
harminv0:   0.51         -0.0027           94.93      0.059    0.0519+0.013851i     1.15E-04
harminv0:   0.52         -3.66E-04         723.34     0.134    0.06928+0.11025i     2.31E-05

Reviewing the results generated according to the number of nodes used, you can verify that the values obtained are the same.
Finally, we note the times in which the problem was completed; as it consists of two parts (one with the ring and one without the ring), we obtain the two graphs in Figures 2.10 and 2.11.

Figure 2.10: Graph of nodes vs. time (minutes) for the transmission exercise with the ring, using Harminv.

Figure 2.11: Graph of nodes vs. time (minutes) for the transmission exercise without the ring, using Harminv.

As we can see in Figures 2.10 and 2.11, running each case five times with different numbers of nodes, we notice that in both cases the time decreases as we use more nodes.

2.9. ANALYSIS OF THE RESULTS


Each graph shows the performance behavior under the three conditions, using StarMeep resources. In general, it is observed that with a larger number of nodes there is less processing time. However, the effective use of the number of nodes depends on the type of problem solved and on the output files that are generated as a result.
Regarding the type of problem, there are several influencing factors, the most important of which are the resolution, the output files to be generated, and the size of the structure on which the simulation is to be performed.
As for the output files, the amount of information in gigabytes that the problem requires has a strong influence, because writing to disk is usually time-consuming.
Chapter 3

Parallel Algorithm Designed


by Technique “PCAM”

CONTENTS
3.1. Partition ............................................................................................ 34
3.2. Domain Decomposition ................................................................... 34
3.3. Functional Decomposition................................................................ 35
3.4. List Partitions Design ......................................................................... 36
3.5. Communication ................................................................................ 36
3.6. Agglomeration .................................................................................. 37
3.7. Reducing Costs of Software Engineering ........................................... 40
3.8. Load Balancing Algorithms ............................................................... 41
3.9. Task Scheduling Algorithms............................................................... 41
3.10. Allocation List Design ..................................................................... 42
3.11. Model of The Atmosphere ............................................................... 42
3.12. Agglomeration ................................................................................ 45
3.13. Load Distribution ............................................................................ 47

In order to design a parallel algorithm, we first have to turn the problem into an algorithm that exposes concurrency, scalability, and locality, taking into account that such algorithms are not simple because they require integrated thinking. However, a methodical approach lets us maximize the range of options considered, provides mechanisms to evaluate alternatives, and helps avoid the high costs of bad decisions.
At each stage, the concept will be presented with a brief description and a respective example.

3.1. PARTITION
The partitioning stage of a design is intended to expose opportunities for parallel execution. Therefore, the focus is on defining a large number of small tasks in order to produce a fine-grained decomposition of a problem. A fine-grained decomposition of a problem, just as fine sand, is easier to pour than a pile of bricks, and it provides the most flexibility in terms of possible parallel algorithms. In later design stages, evaluation of communication requirements, the target architecture, or software engineering issues may lead us to forego opportunities for parallel execution identified at this stage.
A good partition divides into small pieces both the computation associated with a problem and the data on which this computation operates. When designing a partition, programmers most commonly focus first on the data associated with a problem, then determine an appropriate partition for the data, and finally work out how to associate computation with data. This partitioning technique is known as domain decomposition. The alternative approach, first decomposing the computation to be performed and then dealing with the data, is termed functional decomposition. These are complementary techniques which may be applied to different components of a single problem, or even applied to the same problem to obtain alternative parallel algorithms.

3.2. DOMAIN DECOMPOSITION


In the domain decomposition approach to partitioning a problem, we seek first to decompose the data associated with the problem. If possible, these data are divided into small pieces of approximately equal size. Next we partition the computation to be performed, typically by associating each operation with the data on which it operates. This partitioning yields a number of tasks, each consisting of some data and a set of operations on that data. An operation may require data from several tasks; in this case, communication is required to move data between tasks. This requirement is addressed in the next phase of the design process.
The data that are decomposed may be the program input, the output computed by the program, or intermediate values maintained by the program. Different partitions may be possible, based on different data structures. Good rules of thumb are to focus first on the largest data structure, or on the data structure that is accessed most frequently.
Different phases of the computation may operate on different data structures or demand different decompositions for the same data structures. In this case, we treat each phase separately and then determine how the decompositions and parallel algorithms developed for each phase fit together. The issues that arise in this situation are discussed in Chapter 4.
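As a simple illustration (not part of the original text), a minimal sketch of a one-dimensional block domain decomposition, splitting an array of grid points evenly among tasks:

    def block_decompose(n_points, n_tasks):
        """Return the (start, end) index range owned by each task for a 1D domain."""
        base, extra = divmod(n_points, n_tasks)
        ranges = []
        start = 0
        for t in range(n_tasks):
            size = base + (1 if t < extra else 0)   # spread the remainder over the first tasks
            ranges.append((start, start + size))
            start += size
        return ranges

    # Example: 10 grid points divided among 3 tasks -> [(0, 4), (4, 7), (7, 10)]
    print(block_decompose(10, 3))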

3.3. FUNCTIONAL DECOMPOSITION


Functional decomposition represents a different and complementary way of thinking about problems. In this approach, the initial focus is on the computation to be performed rather than on the data manipulated by that computation. If we succeed in dividing this computation into disjoint tasks, we proceed to examine the data requirements of these tasks.
These data requirements may be disjoint, in which case the partition is complete. Alternatively, they may overlap significantly, in which case considerable communication is required to avoid data replication. This is often a sign that a domain decomposition approach should be considered instead.
While domain decomposition is the basis for most parallel algorithms, functional decomposition is valuable as a different way of thinking about problems. For this reason alone, it should be considered when exploring possible parallel algorithms. A focus on the computations to be performed can sometimes reveal structure in a problem, and hence opportunities for optimization, that would not be obvious from a study of the data alone.
Consider, as an example of a problem for which functional decomposition is most appropriate, a search that explores a tree looking for nodes that correspond to "solutions." The algorithm has no obvious data structure that can be decomposed. However, a fine-grained partition can be obtained as described below.

Initially, one task is created for the root of the tree. A task evaluates its node and then, if that node is not a leaf, creates a new task for each child (subtree).

3.4. LIST PARTITIONS DESIGN


The partitioning phase of a design should produce one or more possible decompositions of a problem. Before assessing the communication requirements, the following checklist is used to ensure that the design has no obvious flaws. In general, all these questions should be answered in the affirmative.
1.	Does your partition define at least an order of magnitude more tasks than there are processors in the target computer? If not, you have little flexibility in subsequent design stages.
2.	Does your partition avoid redundant computation and storage requirements?
3.	Are tasks of comparable size?
4.	Does the number of tasks scale with problem size? Ideally, an increase in problem size should increase the number of tasks rather than the size of individual tasks.
5.	Have you identified several alternative partitions?

3.5. COMMUNICATION
The generated tasks are intended to run independently, but in general they cannot: the computation in one task requires data associated with other tasks. Tasks therefore require communication, modeled as channels linking tasks, over which one task can send messages and the other can receive them. The communication associated with an algorithm can thus be specified in two phases. In the first phase we define the channel structure that links, directly or indirectly, tasks that require data (consumers) with tasks that have that data (producers). In the second phase we specify the messages that are to be sent and received on these channels. Everything ultimately depends on the implementation technology.
In domain decomposition problems, communication requirements may be difficult to determine. Remember that this strategy first produces tasks by partitioning the data structures into disjoint subsets and then associates with each datum the operations that operate solely on that datum. This part of the design is usually simple. However, some operations that require data from several tasks usually remain. Communication is then necessary to manage the data transfer required for these tasks to proceed. Organizing this communication efficiently can be challenging; even simple decompositions may have complex communication structures.
In order to be clearer, we classify communication as follows:
•	In local communication, each task communicates with a small set of other tasks (its neighbors);
•	In global communication, each task needs to communicate with many tasks;
•	In structured communication, a task and its neighbors form a regular structure, such as a tree or a grid;
•	In static communication, the identity of the communication partners does not change over time;
•	In dynamic communication, the identity of the partners may be determined by data computed at runtime and may be highly variable;
•	In synchronous communication, producers and consumers execute in a coordinated way, with producer/consumer pairs cooperating in data transfer operations;
•	In asynchronous communication, the consumer may need to obtain data without the cooperation of the producer.

3.6. AGGLOMERATION
After the first two phases, the resulting algorithm is not considered complete, in the sense that it is not specialized for efficient execution on any particular parallel machine and may indeed be very inefficient. For example, if many more tasks are created than there are processors, the processors of the target computer may not be designed for the efficient execution of many small tasks.
Therefore, we revisit the decisions made in the partitioning and communication phases, focusing on obtaining an algorithm that can be implemented efficiently on some class of parallel computer. In particular, we judge whether it is useful to combine, or agglomerate, tasks identified by the partitioning phase in order to provide a smaller number of larger tasks. We also determine whether it is useful to replicate data or computations.
Three examples describing this phase are:
1.	The size of the tasks is increased by reducing the dimension of the decomposition from three to two.
2.	Adjacent tasks are combined to produce a three-dimensional decomposition of higher granularity.
3.	Substructures are joined in a "divide and conquer" structure, and nodes in a tree algorithm are combined.

Despite the reduced number of tasks produced by this stage, the design is still somewhat abstract, since issues relating to the allocation of tasks to processors remain unresolved. On the other hand, we can choose at this stage to reduce the number of tasks to exactly one per processor. We might do this because our goal is to create a parallel program within an environment that requires an SPMD program.

This phase focuses on the general issues that arise when the granularity of tasks is increased. There are three possibly conflicting objectives that guide agglomeration and replication decisions: (i) reducing communication costs by increasing computation and communication granularity; (ii) maintaining flexibility with respect to scalability and mapping decisions; and (iii) reducing software engineering costs.
Regarding increased granularity, we have the following:
•	A critical issue affecting parallel performance is communication cost; an improvement can clearly be achieved by sending less data, or by using fewer messages even if the same amount of data is sent.
•	Another concern is the cost of task creation.
The following figures show the same problem in fine-grained and coarse-grained form; in each case, a single task is exploded to show its outgoing messages (dark shading) and incoming messages (light shading).

In the fine-grained version, the grid is divided into 8x8 = 64 tasks, each responsible for a single point; 64x4 = 256 communications are required, transferring 256 data values. In the coarse-grained version, partitioned into 2x2 = 4 tasks, each responsible for 16 points, only 4x4 = 16 communications and only 16x4 = 64 data values are required.
If the number of communication partners per task is small, we can often reduce both the number of communication operations and the total communication volume by increasing the granularity of our partition, that is, by agglomerating several tasks into one.
In other words, the communication requirements of a task are proportional to the surface of the subdomain on which it operates, while the computational requirements are proportional to the volume of the subdomain. Therefore, the amount of communication per unit of computation (the communication-to-computation ratio) decreases as the task size increases. This effect appears whenever a partition is obtained using the domain decomposition technique; a small sketch quantifying this surface-to-volume effect is given below.
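As a simple illustration (not from the original text), the following sketch computes the communication-to-computation ratio for square subdomains of increasing size, assuming a stencil in which each task exchanges one value per boundary point:

    def comm_to_comp_ratio(block_side):
        """Ratio of boundary points (communication) to total points (computation)
        for a square block of side `block_side` in a 2D stencil computation."""
        surface = 4 * block_side      # points on the block boundary to exchange
        volume = block_side ** 2      # points the task must update
        return surface / volume

    for side in (1, 2, 4, 8, 16):
        print(side, comm_to_comp_ratio(side))   # ratio shrinks as the block grows: 4.0, 2.0, 1.0, 0.5, 0.25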
•	It is important not to agglomerate a multidimensional data structure down to one dimension, as this is inconvenient when porting the program to a parallel structure.
•	The ability to create a variable number of tasks is essential for a program to be portable and scalable.
•	Flexibility does not necessarily imply that a design always creates a large number of tasks; the granularity can be controlled by a compile-time or run-time parameter.

3.7. REDUCING COSTS OF SOFTWARE ENGINEERING

An additional concern, which can be particularly important when parallelizing existing sequential code, is the relative development cost associated with different partitioning strategies.
Another software engineering issue to be considered is the distribution of data used by other program components. For example, the best algorithm for some program component may require that an input matrix data structure be decomposed in three dimensions, while a previous computation step generates a two-dimensional decomposition. Either one or both algorithms must be changed, or an explicit restructuring phase must be incorporated into the computation. Each approach has different performance characteristics.
The final stage of parallel algorithm design specifies where each task is to execute. This mapping problem does not arise on uniprocessors or on shared-memory computers that provide automatic task scheduling. On such computers, a set of tasks and their associated communication requirements is a sufficient specification for a parallel algorithm; operating system or hardware mechanisms can be relied upon to schedule the executable tasks onto the available processors. Unfortunately, general-purpose mapping mechanisms have yet to be developed for scalable parallel computers, beyond the commonly used allocation mechanisms. In general, the mapping problem must be explicitly addressed when designing parallel algorithms.
The goal in developing mapping algorithms is to minimize the total execution time. We use two strategies in order to achieve this goal:
1.	Place tasks that can run simultaneously on different processors, in order to increase concurrency.
2.	Place tasks that communicate frequently on the same processor, in order to increase locality.
Clearly, these two strategies often conflict, in which case the design of our algorithm involves trade-offs. In addition, resource limitations tend to restrict the number of tasks that can be allocated to a single processor.

3.8. LOAD BALANCING ALGORITHMS


A wide variety of both general and specific implementations of load
balancing techniques have been proposed for parallel algorithms, based
on domain decomposition techniques. Several representative approaches
are reviewed here, namely recursive bisection methods, local algorithms,
probabilistic methods, and cyclic mappings. These techniques are all intended to agglomerate the fine-grained tasks defined in an initial partition so as to yield one coarse-grained task per processor. Alternatively, we can think of them as partitioning our computational domain to yield one subdomain for each processor; for this reason, they are often referred to as partitioning algorithms.

3.9. TASK SCHEDULING ALGORITHMS


Task scheduling algorithms can be used when a functional decomposition approach produces many tasks, each with weak locality requirements. A centralized or distributed task pool is maintained, into which new tasks are placed and from which tasks are taken for allocation to processors. In effect, we reformulate the parallel algorithm so that the problem is solved by a set of worker tasks, typically one per processor.

3.10. ALLOCATION LIST DESIGN


Whenever possible, a static mapping scheme that assigns each task to a single processor is used. However, when the number or size of tasks is unknown until runtime, we can use a dynamic load-balancing scheme, or reformulate the problem so that a task-scheduling structure can be used to schedule the computation.
The following questions can serve as the basis for an informal assessment of the mapping design.
1.	Can we consider an SPMD design for a complex problem?
2.	Can a design based on dynamic task creation and deletion be considered?
3.	Can a centralized load-balancing scheme be used?
4.	Have we assessed the relative costs of different strategies, and included the implementation costs in our analysis?
5.	Is there a sufficiently large number of tasks to ensure reasonable load balancing? Typically, at least ten times as many tasks as processors are required.

3.11. MODEL OF THE ATMOSPHERE


The model of the atmosphere is a program that simulates the atmospheric processes (clouds, wind, precipitation, etc.) that influence the climate. It can be used to study the evolution of tornadoes, to predict tomorrow's weather, or to study the impact on the climate of increased concentrations of atmospheric carbon dioxide. Like many numerical models of physical processes, the model of the atmosphere solves a set of partial differential equations, in this case describing the basic fluid-dynamical behavior of the atmosphere.
The continuous behavior described by these equations is approximated by their behavior on a finite set of regularly spaced points in that space. Usually, these points lie on a latitude-longitude grid of size Nx x Ny x Nz, with Nz in the range of 15 to 30 and Nx, Ny in the range of 50 to 500.
This grid is periodic in the x and y dimensions, meaning that grid point (1, j, k) is regarded as being adjacent to (Nx, j, k), and similarly in the y dimension. A vector of values is maintained at each grid point, representing quantities such as pressure, temperature, wind speed, and humidity.
At this stage, let us explain the example with each of the techniques: Partition, Communication, Agglomeration, and Mapping. The grid used to represent the state of the atmosphere model is a natural candidate for domain decomposition. Decompositions of the x, y, and/or z dimensions are possible.

In the finest-grained decomposition, each task maintains as its state the various values associated with a single grid point and is responsible for the computation required to update that state at each time step. Therefore, we have a total of Nx x Ny x Nz tasks, each with O(1) data and computation per time step.
First, we consider the communication requirements. We identify three distinct kinds of communication, as depicted in Figure 3.1.

Figure 3.1: Task and channel structure for a two-dimensional finite difference computation with a nine-point stencil, assuming one grid point per processor. Only the channels used by the shaded task are shown.

1.	Finite difference stencils. If we assume a fine-grained decomposition in which each task encapsulates a single grid point, the nine-point stencil used in the horizontal dimension requires that each task obtain values from eight neighboring tasks. The corresponding channel structure is illustrated in Figure 3.1. Similarly, the three-point stencil used in the vertical dimension requires that each task obtain values from two neighbors.
2.	Global operations. The atmosphere model periodically computes the total mass of the atmosphere, in order to verify that the simulation is proceeding correctly. This quantity is defined as follows:

	Total Mass = Σ_{i=0..Nx−1} Σ_{j=0..Ny−1} Σ_{k=0..Nz−1} M_{ijk}

	where M_{ijk} denotes the mass at grid point (i, j, k). This sum can be computed using one of the parallel algorithms presented in Section 2.4.1; a minimal reduction sketch is given after this list.
3.	Physics computations. If each task encapsulates a single grid point, then the physics component of the model requires significant communication. For example, the total clear sky (TCS) at level k is defined as

	TCS_k = Π_{i=1..k} (1 − cld_i) = TCS_{k−1} · (1 − cld_k)

	where level 0 is the top of the atmosphere and cld_i is the cloud fraction at level i. This is a prefix-product (scan) operation. In total, the physics component of the model requires on the order of 30 communications per grid point per time step.
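As an illustration only (the original text gives no code for this), a minimal sketch of the total-mass reduction using mpi4py, assuming each process holds the masses of its own subdomain in a NumPy array:

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD

    # Each process owns part of the grid; here we fake its masses with random values.
    local_masses = np.random.rand(1000)

    # Sum the local contributions, then combine them across all processes.
    local_total = local_masses.sum()
    total_mass = comm.allreduce(local_total, op=MPI.SUM)

    if comm.rank == 0:
        print("total mass of the atmosphere:", total_mass)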
The communication associated with the finite difference stencil is distributed, as is the communication necessary for the global operation. (We might also consider performing the global operation less frequently, since its value is intended only for diagnostic purposes.) The one component of our algorithm's communication structure that is problematic is the physics. However, as we shall see, the need for this communication can be avoided by agglomeration (Figure 3.2).

Figure 3.2: Using agglomeration to reduce communication requirements in the atmosphere model. (a) Each task is responsible for a single point and therefore must obtain data from eight other tasks to apply the nine-point stencil. (b) Granularity is increased to 2x2 points per task, reducing the number of tasks from which data must be obtained.

3.12. AGGLOMERATION
Our fine-grained domain decomposition of the atmosphere model created Nx x Ny x Nz tasks: between 10^5 and 10^7, depending on the size of the problem. This is likely to be many more than are needed, and some degree of agglomeration can be considered. We identify three reasons for agglomeration:
1.	A small amount of agglomeration (from one to four grid points per task) can reduce the communication requirements associated with the nine-point stencil from eight to four messages per task per time step.
2.	Communication requirements in the horizontal dimension are relatively small: a total of four messages containing eight data values. In contrast, the vertical dimension requires communication not only for the finite difference stencil (two messages, two data values) but also for various other computations. These communications can be avoided by agglomerating tasks within each vertical column.
3.	Agglomeration in the vertical dimension is also desirable from a software engineering standpoint. Horizontal dependencies are restricted to the dynamics component of the model; the physics component operates within individual columns only. Therefore, a two-dimensional horizontal decomposition allows the existing sequential physics code to be reused in a parallel program without modification.
This analysis makes it appear sensible to refine our parallel algorithm to use a two-dimensional horizontal decomposition of the model grid in which each task encapsulates at least four grid points. Communication requirements are then reduced to those associated with the nine-point stencil and the summation operation. Note that this algorithm creates at most Nx x Ny / 4 tasks: between 10^3 and 10^5, depending on the size of the problem. This number is likely to be sufficient for most practical purposes.


It is evident from the figure that further agglomeration can be performed; in the limit, each processor can be assigned a single task responsible for many columns, thereby giving an SPMD program.
This allocation strategy is efficient if each grid-column task performs the same amount of computation at each time step. This assumption is valid for many finite difference problems but turns out to be invalid for some models of the atmosphere. The reason is that the cost of the physics computations can vary significantly depending on the state variables of the model. For example, radiation calculations are not performed at night, and clouds form only when the humidity exceeds a certain threshold.
The simple mapping strategy can nevertheless be used.

3.13. LOAD DISTRIBUTION

Load distribution in a model of the atmosphere with a 64x128 grid. The figure above shows the computational load at each grid point for a single time step, with a histogram giving the relative frequency of the different load values. The left image shows a time step in which the radiation computations are performed, and the right image an ordinary time step.

Load distribution in the physics component of the atmosphere model in the absence of load balancing. At the top of the figure, shading is used to indicate the computational load on each of 16x32 processors. Strong spatial variation is evident. This effect is due to the day/night cycle (radiation calculations are performed only in sunlight).

In many circumstances, this loss of performance can be considered acceptable. However, if a model is widely used, it is worth spending time to improve its efficiency. One approach is to use a form of cyclic assignment: for example, assigning to each processor tasks from the western and eastern and from the northern and southern hemispheres. The image shows the reduction in load imbalance that can be achieved with this technique; this reduction must be weighed against the resulting increase in communication costs.
Chapter 4

Parallel Computer Systems

CONTENTS
4.1. History.............................................................................................. 53
4.2. Parallel Computing ........................................................................... 53
4.3. Background ...................................................................................... 54
4.4. Types Of Parallelism .......................................................................... 59
4.5. Hardware ......................................................................................... 61
4.6. Applications ..................................................................................... 67
4.7. History.............................................................................................. 68

Parallel systems are those that have the ability to perform multiple operations simultaneously. Generally, these systems handle large amounts of information, on the order of terabytes, and can process hundreds of requests per second. Parallel systems are composed of several systems sharing information, resources, and memory in some way. Parallel systems are multiprocessor systems with more than one processor; in strongly coupled systems, the processors share memory and clock, and communication is usually done through shared memory. The advantages are gained through an increase in reliability (Figure 4.1).

Figure 4.1: The Cray-2 supercomputer, the fastest in the world from 1985 to 1989.
Parallel computing is a programming technique in which many instructions are executed simultaneously. It is based on the principle that large problems can be divided into smaller parts that can be solved concurrently ("in parallel"). There are several types of parallel computing: bit-level parallelism, instruction-level parallelism, data parallelism, and task parallelism. For many years, parallel computing has been employed in high-performance computing (HPC), but interest in it has increased in recent years due to the physical constraints preventing further frequency scaling. Parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multicore processors. Recently, however, the power consumption of parallel computers has become a concern.
Parallel computers can be classified according to the level of parallelism that their hardware supports: multicore and multiprocessor computers have multiple processing elements in a single machine, while clusters, MPPs, and grids use multiple computers to work on the same task.
Parallel computer programs are more difficult to write than sequential ones, because concurrency introduces new types of software errors. Communication and synchronization between the different subtasks are typically the greatest barriers to achieving good performance in parallel programs. The increase in speed achieved as a result of parallelizing a program is given by Amdahl's law.

4.1. HISTORY
Software has traditionally been oriented toward serial computing. To solve a problem, an algorithm is built and implemented as a serial instruction stream. These instructions are executed on the central processing unit of a computer; when one instruction is completed, the next one runs.
Parallel computing, in contrast, uses multiple processing elements simultaneously to solve a problem. This is achieved by dividing the problem into independent parts so that each processing element can execute its part of the algorithm at the same time as the others. Processing elements can be diverse and include resources such as a single computer with many processors, multiple networked computers, specialized hardware, or a combination thereof.

4.2. PARALLEL COMPUTING


Parallel computing is a form of computation in which many instructions are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel"). There are several different forms of parallel computing: bit-level parallelism, instruction-level parallelism, data parallelism, and task parallelism. Parallelism has been used for many years, mainly in high-performance computing (HPC), but interest in it has grown in recent years due to the physical constraints preventing frequency scaling. Parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multicore processors. However, in recent years, energy consumption by parallel computers has become a concern.
Parallel computers can be roughly classified according to the level at which the hardware supports parallelism, with multicore and multiprocessor computers having multiple processing elements within one machine, while clusters, MPPs, and grids use multiple computers to work on the same task.
Parallel computer programs are more difficult to write than sequential ones, because concurrency introduces several new classes of potential software bugs, of which race conditions are the most common. Communication and synchronization between the different subtasks are typically among the biggest barriers to getting good parallel program performance. The speedup of a program as a result of parallelization is given by Amdahl's law.

4.3. BACKGROUND
Traditionally, software has been written for serial computation. To solve a problem, an algorithm is constructed which produces a serial stream of instructions. These instructions are executed on the central processing unit of a computer; only one instruction may execute at a time, and after that instruction finishes, the next is executed.
Parallel computing, on the other hand, uses multiple processing elements simultaneously to solve a problem. This is accomplished by breaking the problem into separate parts so that each processing element can execute its part of the algorithm simultaneously with the others. Processing elements can be diverse and include resources such as a single computer with multiple processors, several networked computers, specialized hardware, or any combination of the above.
Frequency scaling was the dominant reason for improvements in computer performance from the mid-eighties until 2004. The runtime of a program is equal to the number of instructions multiplied by the average time per instruction. Keeping everything else constant, increasing the clock frequency decreases the average time it takes to execute an instruction; an increase in frequency thus decreases the runtime of all compute-bound programs.
However, the power consumption of a chip is given by the equation P = C x V^2 x F, where P is power, C is the capacitance switched per clock cycle (proportional to the number of transistors whose inputs change), V is voltage, and F is the processor frequency (cycles per second). Increases in frequency therefore increase the amount of power used in a processor. Increasing processor power consumption ultimately led to Intel's May 2004 cancellation of its Tejas and Jayhawk processors, which is usually cited as the end of frequency scaling as the dominant paradigm in computer architecture.
Moore's law is the empirical observation that transistor density in a microprocessor doubles every 18 to 24 months. Despite power consumption issues, and repeated predictions of its end, Moore's law is still in effect. With the end of frequency scaling, these additional transistors (which are no longer used for frequency scaling) can be used to add extra hardware for parallel computing.

4.3.1. Amdahl’s Law and Gustafson’s Law


Theoretically, the speedup from parallelization should be linear: doubling the number of processing elements should halve the runtime, and doubling it a second time should halve the runtime again. However, very few parallel algorithms achieve optimal speedup. Most have a near-linear speedup for small numbers of processing elements, which flattens out to a constant value for large numbers of processing elements.
The potential speedup of an algorithm on a parallel computing platform is given by Amdahl's law, originally formulated by Gene Amdahl in the 1960s. It states that the small portion of the program that cannot be parallelized will limit the overall speedup available from parallelization. Any large mathematical or engineering problem typically consists of several parallelizable parts and several non-parallelizable (sequential) parts. This relationship is given by the equation:

S = 1 / (1 − P)

where S is the speedup of the program (as a factor of its original sequential runtime) and P is the fraction that is parallelizable. If the sequential portion of a program accounts for 10% of the runtime, we cannot get more than a 10x speedup, regardless of how many processors are added. This puts an upper limit on the usefulness of adding more parallel execution units. "When a task cannot be partitioned because of sequential constraints, the application of more effort has no effect on the schedule. The bearing of a child takes nine months, no matter how many women are assigned."
Gustafson's law is another law in computer engineering, closely related to Amdahl's law. It can be formulated as:

S(P) = P − α(P − 1)

where P is the number of processors, S is the speedup, and α the non-parallelizable part of the process [11]. Amdahl's law assumes a fixed problem size and that the size of the sequential section is independent of the number of processors, whereas Gustafson's law does not make these assumptions.
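A minimal numerical sketch (not from the original text) comparing the two bounds for an assumed 10% sequential fraction; the Amdahl function below uses the more general form with an explicit processor count N, which approaches the 1 / (1 − P) bound as N grows:

    def amdahl_speedup(p_fraction, n_procs):
        """Amdahl: fixed problem size; p_fraction is the parallelizable fraction."""
        return 1.0 / ((1.0 - p_fraction) + p_fraction / n_procs)

    def gustafson_speedup(alpha, n_procs):
        """Gustafson: scaled problem size; alpha is the non-parallelizable fraction."""
        return n_procs - alpha * (n_procs - 1)

    for n in (1, 4, 16, 64, 256):
        print(n, round(amdahl_speedup(0.9, n), 2), round(gustafson_speedup(0.1, n), 2))
    # Amdahl saturates near 1 / (1 - 0.9) = 10, while Gustafson keeps growing with n.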

4.3.2. Dependencies
Understanding data dependencies is essential to implementing parallel algorithms. No program can run faster than the longest chain of dependent calculations (known as the critical path), since calculations that depend on previous calculations in the chain must be executed in order. However, most algorithms do not consist of just one long chain of dependent calculations; there are usually opportunities to execute independent calculations in parallel.

Let Pi and Pj be two fragments of a program, with input sets Ii, Ij and output sets Oi, Oj. Bernstein's conditions describe when the two are independent and can be executed in parallel; they require that Ij ∩ Oi = ∅, Ii ∩ Oj = ∅, and Oi ∩ Oj = ∅.
Violation of the first condition introduces a flow dependency, corresponding to the first fragment producing a result used by the second. The second condition represents an anti-dependency, where the second fragment would overwrite a variable needed by the first. The third and final condition represents an output dependency: when two fragments write to the same location, the final value must come from the second fragment [13].
Consider the following functions, which demonstrate several kinds of dependencies:

Function Dep(a, b)
    c := a · b
    d := 2 · c
End Function

The second operation of Dep(a, b) cannot be executed before (or even in parallel with) the first, because it uses a result of the first operation. It violates the first condition, and thus introduces a flow dependency.

Function NoDep(a, b)
    c := a · b
    d := 2 · b
    e := a + b
End Function

In this example, there are no dependencies between the instructions, so they can all be run in parallel.
Bernstein's conditions do not allow memory to be shared between different processes. For that, some means of enforcing an ordering between accesses is necessary, such as semaphores, barriers, or some other synchronization method.
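As a small illustration (not from the original text), a sketch that checks Bernstein's conditions for the two fragments above, using explicit read/write sets:

    def bernstein_independent(reads_i, writes_i, reads_j, writes_j):
        """True if the two fragments satisfy Bernstein's conditions (no flow,
        anti-, or output dependency) and may therefore run in parallel."""
        return (not (writes_i & reads_j)        # no flow dependency
                and not (reads_i & writes_j)    # no anti-dependency
                and not (writes_i & writes_j))  # no output dependency

    # Dep(a, b): fragment 1 is c := a*b, fragment 2 is d := 2*c  -> flow dependency.
    print(bernstein_independent({"a", "b"}, {"c"}, {"c"}, {"d"}))   # False

    # NoDep(a, b): c := a*b and d := 2*b touch disjoint outputs -> independent.
    print(bernstein_independent({"a", "b"}, {"c"}, {"b"}, {"d"}))   # True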

4.3.3. Race Conditions, Mutual Exclusion, Synchronization, and Parallel Slowdown

The subtasks in a parallel program are often called threads. Some parallel computer architectures use smaller, lightweight versions of threads known as fibers, while others use larger versions known as processes. However, "threads" is generally accepted as a generic term for subtasks. Threads will often need to update some variable that is shared between them. The instructions of the two threads may be interleaved in any order. For example, consider the following program:
Thread A                          Thread B
1A: Read variable V               1B: Read variable V
2A: Add 1 to variable V           2B: Add 1 to variable V
3A: Write back to variable V      3B: Write back to variable V
If instruction 1B is executed between 1A and 3A, or if instruction 1A is executed between 1B and 3B, the program will produce incorrect data. This is known as a race condition. The programmer must use a lock to provide mutual exclusion. A lock is a programming language construct that allows one thread to take control of a variable and prevent other threads from reading or writing it until that variable is unlocked. The thread holding the lock is free to execute its critical section (the section of a program that requires exclusive access to some variable) and to unlock the data when it is finished. Therefore, to guarantee correct program execution, the program above can be rewritten to use locks:
Thread A                          Thread B
1A: Lock variable V               1B: Lock variable V
2A: Read variable V               2B: Read variable V
3A: Add 1 to variable V           3B: Add 1 to variable V
4A: Write back to variable V      4B: Write back to variable V
5A: Unlock variable V             5B: Unlock variable V
Here, one thread will successfully lock variable V, while the other thread will be locked out, unable to proceed until V is unlocked again. This guarantees the correct execution of the program. Locks, while necessary to ensure correct program execution, can greatly slow a program.
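As an illustration (not from the original text), a minimal Python sketch of the same idea: without the lock the two threads may interleave their read-modify-write sequences and lose updates; with the lock the final value is always correct.

    import threading

    counter = 0
    lock = threading.Lock()

    def worker(iterations):
        global counter
        for _ in range(iterations):
            with lock:               # lock -> read -> add 1 -> write back -> unlock
                counter += 1

    threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(counter)  # always 200000 with the lock; without it, updates may be lost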
Locking multiple variables using non-atomic locks introduces the possibility of program deadlock. An atomic lock locks several variables all at once: either it locks all of them, or it locks none of them. If two threads each need to lock the same two variables using non-atomic locks, it is possible that one thread will lock the first variable and the second thread will lock the second variable. In this case, neither thread can finish, and the result is deadlock.
Many parallel programs require that their subtasks act in synchrony. This requires the use of a barrier. Barriers are typically implemented using a software lock. One class of algorithms, known as lock-free and wait-free algorithms, avoids the use of locks and barriers altogether. However, this approach is generally difficult to implement and requires properly designed data structures.
Not all parallelization results in speedup. Generally, as a task is divided into more and more threads, those threads spend an ever-increasing portion of their time communicating with each other. Eventually, the overhead of communication dominates the time spent solving the problem, and further parallelization (that is, splitting the workload over even more threads) increases rather than decreases the amount of time required to finish. This is known as parallel slowdown.

4.3.4. Fine-Grained, Coarse-Grained, and Embarrassing Parallelism

Applications are often classified by how many times their subtasks need to synchronize or communicate with each other. An application exhibits fine-grained parallelism if its subtasks must communicate many times per second; it exhibits coarse-grained parallelism if they do not communicate many times per second, and it is embarrassingly parallel if they rarely or never have to communicate. Embarrassingly parallel applications are considered the easiest to parallelize.

4.3.5. Consistency Models


Parallel programming languages and parallel computers must have a consistency model (also known as a memory model). The consistency model defines rules for how operations on computer memory occur and how results are produced.
One of the first consistency models was Leslie Lamport's sequential consistency model. Sequential consistency is the property of a parallel program that its parallel execution produces the same results as a sequential program. Specifically, a program is sequentially consistent if "the results of any execution are the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program" [14].
Software transactional memory is a common type of consistency model. It borrows from database theory the concept of atomic transactions and applies them to memory accesses.
Mathematically, these models can be represented in several ways. Petri nets, which were introduced in Carl Adam Petri's 1962 doctoral thesis, were an early attempt to codify the rules of consistency models. Dataflow theory was later built on these, and dataflow architectures were created to physically implement the ideas of dataflow theory. Beginning in the late 1970s, process calculi such as the calculus of communicating systems and communicating sequential processes were developed to permit algebraic reasoning about systems composed of interacting components. More recent additions to the process calculus family, such as the π-calculus, have added the capability of reasoning about dynamic topologies. Logics such as Lamport's TLA+, and mathematical models such as traces and actor event diagrams, have also been developed to describe the behavior of concurrent systems.

4.3.6. Flynn’s Taxonomy


Michael J. Flynn created one of the earliest classification systems for parallel (and sequential) computers and programs, now known as Flynn's taxonomy. Flynn classified programs and computers by whether they operated using a single set or multiple sets of instructions, and whether or not those instructions used a single set or multiple sets of data.

Flynn's taxonomy:
                    Single Instruction    Multiple Instructions
Single Data               SISD                   MISD
Multiple Data             SIMD                   MIMD
The single-instruction-single-data (SISD) classification is equivalent to an entirely sequential program. The single-instruction-multiple-data (SIMD) classification is analogous to performing the same operation repeatedly over a large data set; this is commonly done in signal-processing applications. Multiple-instruction-single-data (MISD) is a rarely used classification. While computer architectures to deal with it were devised (such as systolic arrays), few applications that fit this class materialized. Multiple-instruction-multiple-data (MIMD) programs are by far the most common type of parallel programs.
According to David Patterson and John L. Hennessy: "Some machines are hybrids of these categories, of course, but this classic model has survived because it is simple, easy to understand, and gives a good first approximation. It is also, perhaps because of its understandability, the most widely used scheme."

4.4. TYPES OF PARALLELISM

4.4.1. Bit-Level Parallelism


The advent of very-large-scale integration (VLSI) computer-chip fabrication technology, from the 1970s until about 1986, accelerated computer architecture through the doubling of computer word size: the amount of information the processor can manipulate per cycle [16]. Increasing the word size reduces the number of instructions the processor must execute to perform an operation on variables whose sizes are greater than the length of the word. For example, where an 8-bit processor must add two 16-bit integers, it must first add the 8 lower-order bits of each integer using the standard addition instruction, and then add the 8 higher-order bits using an add-with-carry instruction together with the carry bit from the lower-order addition. An 8-bit processor thus requires two instructions to complete a single operation, where a 16-bit processor would be able to complete the operation with a single instruction; a small sketch of this is given below.
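A toy sketch (not from the original text) of the two-step addition described above, emulating 8-bit arithmetic with masking:

    def add16_with_8bit_ops(a, b):
        """Add two 16-bit integers using only 8-bit additions plus a carry."""
        lo = (a & 0xFF) + (b & 0xFF)                         # standard 8-bit add of the low bytes
        carry = lo >> 8
        hi = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF) + carry   # add-with-carry on the high bytes
        return ((hi & 0xFF) << 8) | (lo & 0xFF)

    print(add16_with_8bit_ops(0x12F0, 0x0A34) == (0x12F0 + 0x0A34) & 0xFFFF)   # True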
Historically, 4-bit microprocessors were replaced by 8-bit, then 16-bit, then 32-bit microprocessors. This trend generally ended with the introduction of 32-bit processors, which were a standard in general-purpose computing for two decades. Only recently (circa 2003–2004), with the advent of x86-64 architectures, have computers with 64-bit processors become commonplace.

4.4.2. Instruction-Level Parallelism


A computer program is essentially a stream of instructions executed by a processor. These instructions can be reordered and combined into groups which are then executed in parallel without changing the result of the program. This is known as instruction-level parallelism. Advances in instruction-level parallelism dominated computer architecture from the mid-1980s until the mid-1990s.
Modern processors have multi-stage instruction pipelines. Each stage in the pipeline corresponds to a different action the processor performs on the instruction in that stage; a processor with an N-stage pipeline can have up to N different instructions at different stages of completion. The canonical example of a pipelined processor is a RISC processor, with five stages: instruction fetch, decode, execute, memory access, and write back. The Pentium 4 processor had a 35-stage pipeline.
In addition to the instruction-level parallelism available from pipelining, some processors can issue more than one instruction at a time. These are known as superscalar processors. Instructions can be grouped together only if there is no data dependency between them. Scoreboarding and the Tomasulo algorithm (which is similar to scoreboarding but makes use of register renaming) are two of the most common techniques for implementing out-of-order execution and instruction-level parallelism.

4.4.3. Data Parallelism


Data parallelism is parallelism inherent in program loops, which focuses on distributing the data across different computing nodes to be processed in parallel. Parallelizing loops often leads to similar (not necessarily identical) sequences of operations being performed on the elements of a large data structure. Many scientific and engineering applications exhibit data parallelism.
A loop-carried dependence is the dependence of a loop iteration on the output of one or more previous iterations. Loop-carried dependencies prevent the parallelization of loops. For example, consider the following pseudocode that computes the first few Fibonacci numbers:
PREV2 := 0
PREV1 := 1
CUR := 1
do:
    CUR := PREV1 + PREV2
    PREV2 := PREV1
    PREV1 := CUR
while (CUR < 10)
This loop cannot be parallelized because CUR depends on itself (via PREV1) and on PREV2, which are computed in each iteration of the loop. Since each iteration depends on the result of the previous one, they cannot be performed in parallel. As the size of a problem gets bigger, the amount of available data parallelism usually grows as well [20].
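As a contrast (an illustrative sketch, not from the original text), a loop with no loop-carried dependence, where each element can be processed by a separate worker:

    from multiprocessing import Pool

    def square(x):
        # Each call depends only on its own input, so iterations are independent.
        return x * x

    if __name__ == "__main__":
        with Pool(processes=4) as pool:
            results = pool.map(square, range(10))   # iterations run in parallel
        print(results)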

4.4.4. Task Parallelism


Task parallelism is the characteristic of a parallel program in which "entirely different calculations can be performed on either the same or different sets of data" [19]. This contrasts with data parallelism, where the same calculation is performed on the same or different sets of data. Task parallelism does not usually scale with the size of the problem.

4.5. HARDWARE

4.5.1. Memory and Communication


The main memory in a parallel computer is either shared memory (shared between all processing elements in a single address space) or distributed memory (in which each processing element has its own local address space). The term distributed memory refers to memory that is logically distributed, but it often implies memory that is physically distributed as well. A combination of the two approaches is also possible, where each processing element has its own local memory and can also access memory on non-local processors. Accesses to local memory are typically faster than accesses to non-local memory.
Computer architectures in which all of main memory can be accessed with equal latency and bandwidth are known as Uniform Memory Access (UMA) systems. Typically, that can be achieved only by a shared memory system in which the memory is not physically distributed. A system that does not have this property is known as a Non-Uniform Memory Access (NUMA) architecture. Distributed memory systems have non-uniform memory access.
Computer systems make use of caches: small, fast memories located close to the processor that store temporary copies of memory values (nearby in both the physical and logical sense). Parallel computer systems have difficulties with caches that may store the same value in more than one location, with the possibility of incorrect program execution. These computers require a cache coherency system, which keeps track of cached values and strategically purges them, thus ensuring correct program execution. Bus snooping is one of the most common methods for keeping track of which values are being accessed (and thus should be purged). Designing large, high-performance cache coherency systems is a very difficult problem in computer architecture. As a result, shared-memory computer architectures do not scale as well as distributed memory systems do [21].
Processor-processor and processor-memory communication can be implemented in hardware in several ways, including via shared (possibly multiplexed or multi-ported) memory, a crossbar switch, a shared bus, or an interconnect network in any of a myriad of topologies, including star, ring, tree, hypercube, fat hypercube (a hypercube with more than one processor at a node), or n-dimensional mesh. Parallel computers based on interconnect networks need some kind of routing to enable the passing of messages between nodes that are not directly connected. The medium used for communication between the processors is likely to be hierarchical in large multiprocessor machines.

4.5.2. Classes of Parallel Computers


Parallel computers can be roughly classified according to the level at which the hardware supports parallelism. This classification is broadly analogous to the distance between basic computing nodes. The classes are not mutually exclusive; for example, clusters of symmetric multiprocessors (SMPs) are relatively common.

4.5.2.1. The Multicore Computing


A multicore processor is a processor that includes multiple execution units ("cores"). These processors differ from superscalar processors, which can issue multiple instructions per cycle from a single instruction stream (thread); by contrast, a multicore processor can issue multiple instructions per cycle from multiple instruction streams. Each core in a multicore processor can potentially be superscalar as well; that is, on every cycle, each core can issue multiple instructions from one instruction stream.
Simultaneous multithreading (of which Intel's Hyper-Threading is the best known) was an early form of pseudo-multicoreism. A processor capable of simultaneous multithreading has only one execution unit ("core"), but when that execution unit is idle (e.g., during a cache miss), it is used to process a second thread. Intel's Core and Core 2 processor families are Intel's first true multicore architectures. The IBM Cell microprocessor, designed for use in the Sony PlayStation 3, is another prominent multicore processor.

4.5.2.2. Symmetric Multiprocessing


An SMP is a computer system with multiple identical processors that share memory and connect via a bus. Bus contention prevents bus architectures from scaling. Consequently, SMPs generally do not comprise more than 32 processors. "Because of the small size of the processors and the significant reduction in the requirements for bus bandwidth achieved by large caches, such symmetric multiprocessors are extremely cost-effective, provided that a sufficient amount of memory bandwidth exists."

4.5.2.3. Distributed Computing


A distributed computer (also known as a distributed memory multiprocessor) is a distributed memory computer system in which the processing elements are connected by a network. Distributed computers are highly scalable.

4.5.2.4. Cluster Computing


A cluster is a group of loosely coupled computers that work together closely, so that in some respects they can be regarded as a single computer. Clusters are composed of multiple standalone machines connected by a network. While machines in a cluster do not have to be symmetric, load balancing is more difficult if they are not. The most common type of cluster is the Beowulf cluster, which is a cluster built from multiple identical commercially available computers connected with a TCP/IP Ethernet local area network. Beowulf technology was originally developed by Thomas Sterling and Donald Becker. The vast majority of TOP500 supercomputers are clusters.

4.5.2.5. Massively Parallel Processing


A massively parallel processor (MPP) is a single computer with many networked processors. MPPs have many of the same characteristics as clusters, but are generally larger, typically having "far more" than 100 processors [27]. In an MPP, "each CPU contains its own memory and a copy of the operating system and application. Each subsystem communicates with the others via a high-speed interconnect" [28].
Blue Gene/L, the fastest supercomputer in the world according to the TOP500 ranking, is an MPP.

4.5.2.6. Grid Computing


Grid computing is a distributed form of parallel computing. It makes use of computers communicating over the Internet to work on a given problem. Because of the low bandwidth and extremely high latency of the Internet, grid computing typically deals only with embarrassingly parallel problems. Many grid computing applications have been created, of which SETI@home and Folding@home are the best-known examples.
Most grid computing applications use middleware, software that sits between the operating system and the application, to manage network resources and standardize the software interface. The most common grid computing middleware is the Berkeley Open Infrastructure for Network Computing (BOINC). Often, grid computing software makes use of "spare cycles," performing computations at times when a computer is idle.

4.5.2.7. Specialized Parallel Computers


Within parallel computing, there are specialized parallel devices that remain niche areas of interest. While not domain-specific, they tend to be applicable to only a few kinds of parallel problems.

4.5.2.8. Reconfigurable Computing with Field-Programmable Gate Arrays (FPGAs)


Reconfigurable computing is the use of a field-programmable gate array (FPGA) as a co-processor to a general-purpose computer. An FPGA is, in essence, a computer chip that can rewire itself for a given task.
FPGAs can be programmed with hardware description languages such as VHDL or Verilog. However, programming in these languages can be tedious.
Several vendors have created C to HDL languages that attempt to emulate the syntax and/or semantics of the C programming language, with which most programmers are familiar. The best-known C to HDL languages are Mitrion-C, Impulse C, DIME-C, and Handel-C.
AMD's decision to open its HyperTransport technology to third-party vendors has become the enabling technology for high-performance reconfigurable computing. According to Michael R. D'Amour, CEO of DRC Computer Corporation, "when I first walked into AMD, they called us 'the socket stealers.' Now they call us their partners."

4.5.2.9. General-Purpose Computing on Graphics Processing Units (GPGPU)


General-purpose computing on graphics processing units (GPGPU) is a fairly recent trend in computer engineering research. GPUs are co-processors that have been heavily optimized for computer graphics processing. Computer graphics processing is a field dominated by data parallel operations, particularly linear algebra matrix operations.
In the early days, GPGPU programs used the normal graphics APIs for executing programs. More recently, several new programming languages and platforms have been built to do general-purpose computation on GPUs, with both Nvidia and AMD releasing programming environments, CUDA and CTM respectively. Other GPU programming languages include BrookGPU, PeakStream, and RapidMind. Nvidia has also released specific products for computation in its Tesla series.

4.5.2.10. Application-Specific Integrated Circuits


Various application-specific integrated circuit (ASIC) approaches have been devised for dealing with parallel applications.
Because an ASIC is (by definition) specific to a given application, it can be fully optimized for that application. Consequently, for a given application, an ASIC tends to outperform a general-purpose computer. However, ASICs are created using lithography. This process requires a mask set, which can be extremely expensive. A single mask set can cost over a million US dollars (the smaller the transistors required for the chip, the more expensive the mask will be). Meanwhile, performance increases in general-purpose computing over time (as described by Moore's law) tend to wipe out these gains in only one or two chip generations. The high initial cost, and the tendency to be overtaken by Moore's-law-driven general-purpose computing, has made ASICs unfeasible for most parallel computing applications. However, some have been built. One example is the petaflop RIKEN MDGRAPE-3 machine, which uses custom ASICs for molecular dynamics simulation.

4.5.2.11. Vector Processors


A vector processor is a CPU or computer system that can execute the same instruction on large sets of data. "Vector processors have high-level operations that work on linear arrays of numbers or vectors. An example of a vector operation is A = B × C, where A, B, and C are vectors of 64-bit elements [35]." They are closely related to Flynn's SIMD classification [35].
Cray computers became famous for their vector-processing computers in the 1970s and 1980s; however, vector processors, both as CPUs and as full computer systems, have generally disappeared. Modern processor instruction sets do include some vector processing instructions, for example, AltiVec and Streaming SIMD Extensions (SSE).

4.5.3. Parallel Programming Languages


Concurrent programming languages, libraries, APIs, and parallel programming models have been created for programming parallel computers. These can generally be divided into classes based on the assumptions they make about the underlying memory architecture: shared memory, distributed memory, or shared distributed memory. Shared memory programming languages communicate by manipulating shared memory variables. Distributed memory uses message passing. POSIX Threads and OpenMP are two of the most widely used shared memory APIs, whereas Message Passing Interface (MPI) is the most widely used message-passing API. One concept used in programming parallel programs is the future concept, where one part of a program promises to deliver a required datum to another part of the program at some future time.
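A minimal Java sketch of the future concept (illustrative only), using the standard CompletableFuture class: one part of the program promises a value, another part continues working and only blocks when it finally needs the result.

    import java.util.concurrent.CompletableFuture;

    public class FutureExample {
        public static void main(String[] args) {
            // The future "promises" a datum that another part of the program
            // will need later.
            CompletableFuture<Long> promised =
                    CompletableFuture.supplyAsync(FutureExample::expensiveComputation);

            doOtherWork();                       // proceeds while the future runs

            // join() blocks only if the promised value is not ready yet.
            System.out.println("result = " + promised.join());
        }

        static long expensiveComputation() {
            long s = 0;
            for (long i = 0; i < 50_000_000L; i++) s += i;
            return s;
        }

        static void doOtherWork() {
            System.out.println("doing unrelated work while the future completes...");
        }
    }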

4.5.4. Automatic Parallelization


Automatic parallelization of a sequential program by a compiler is the "holy grail" of parallel computing. Despite decades of work by compiler researchers, automatic parallelization has had only limited success [36].
Mainstream parallel programming languages remain either explicitly parallel or (at best) partially implicit, in which a programmer gives the compiler directives for parallelization. A few fully implicit parallel programming languages exist, such as SISAL, Parallel Haskell, and (for FPGAs) Mitrion-C, but these are niche languages that are not widely used.
The larger and more complex a computer is, the more that can go wrong and the shorter the mean time between failures. Application checkpointing is a technique whereby the computer system takes a "snapshot" of the application: a record of all current resource allocations and variable states, akin to a core dump; this information can be used to restore the program if the computer fails. Application checkpointing means that the program has to restart only from its last checkpoint rather than from the beginning. For an application that may run for months, that is critical. Application checkpointing can also be used to facilitate process migration.

4.6. APPLICATIONS
As parallel computers become larger and faster, it becomes feasible to solve problems that previously took too long to run. Parallel computing is used in a wide range of fields, from bioinformatics (protein folding) to economics (simulation in mathematical finance). Common types of problems found in parallel computing applications are:
● Dense linear algebra;
● Sparse linear algebra;
● Spectral methods (e.g., Cooley-Tukey Fast Fourier transform);
● N-body problems (for example, Barnes-Hut simulation);
● Structured grid problems (e.g., Lattice Boltzmann methods);
● Unstructured grid problems (such as finite element analysis);
● Monte Carlo simulation;
● Combinational logic (for example, brute-force cryptographic techniques);
● Graph traversal (for example, sorting algorithms);
● Dynamic programming;
● Branch and bound methods;
● Graphical models (e.g., detection of Hidden Markov Models and construction of Bayesian networks);
● Finite automaton simulation.

4.7. HISTORY
The origins of true (MIMD) parallelism go back to Federico Luigi, Conte Menabrea, and his "Sketch of the Analytical Engine Invented by Charles Babbage." In 1954, IBM introduced the 704, through a project in which Gene Amdahl was one of the principal architects. It became the first commercially available computer to use fully automatic floating-point arithmetic commands. In 1958, IBM researchers John Cocke and Daniel Slotnick discussed the use of parallelism in numerical calculations for the first time.
In 1962, Burroughs Corporation introduced a four-processor computer that had access to 16 memory modules through a crossbar switch. In 1967, Amdahl and Slotnick published a debate about the feasibility of parallel processing at the American Federation of Information Processing Societies Conference. It was during this debate that Amdahl's law was coined to define the limit of the speed-up attainable through parallelism.
In 1969, the US company Honeywell introduced its first Multics system, a symmetric multiprocessor system capable of running up to eight processors in parallel. C.mmp, a multiprocessor built in the 1970s at Carnegie Mellon University, was "among the first multiprocessors with more than a few processors." "The first bus-connected multiprocessor with snooping caches was the Synapse N+1 in 1984."
SIMD parallel computers can be traced back to the 1970s. The motivation behind early SIMD computers was to amortize the gate delay of the processor's control unit over multiple instructions. In 1964, Slotnick had proposed building a massively parallel computer for the Lawrence Livermore National Laboratory. His design was funded by the US Air Force, in what was the earliest SIMD parallel-computing effort, the ILLIAC IV. The key to its design was a fairly high degree of parallelism, with up to 256 processors, which allowed the machine to work on large datasets in what would later be known as vector processing.
However, ILLIAC IV was called "the most infamous of supercomputers," because the project was only one-fourth completed, yet it took 11 years and cost almost four times the original estimate. When it was finally ready to run its first real application in 1976, it was outperformed by existing commercial supercomputers such as the Cray-1.
Chapter 5

Parallelization of Web Compatibility


Tests in Software Development

CONTENTS
5.1. Web Compatibility Tests .................................................................... 72
5.2. Proposed Technique .......................................................................... 73
5.3. Results ............................................................................................. 76
5.4. Conclusion ....................................................................................... 76

The following section provides a brief theoretical framework on web compatibility tests. Then, the proposed technique, the algorithms used, and the results of implementing the technique are presented. Finally, the chapter outlines the conclusions and future work.

5.1. WEB COMPATIBILITY TESTS


A website compatible with all browsers means that it looks the same (or very similar) in all of them. Web compatibility tests are performed to verify this. Some authors consider it sufficient if the site can be perceived by the user with the same characteristics in browsers such as Internet Explorer, Firefox, Chrome, Opera, Safari, and Mozilla.
The problem is that not all browsers interpret HTML code and style sheets (CSS) in the same way [11]. Some of these differences are so important that they cause the site to malfunction or to lose its intended appearance. One of the goals of building a website is that it can be visited by the largest possible number of people (and that they see it correctly). Therefore, it is very important that the site works in as many browsers as possible [10]. Compatibility tests, or cross-browser testing [12], are tests performed on a particular application to check its compatibility with all the Internet browsers on the market. For this, there are different techniques:
• Verification of compliance with standards (such as W3C or
ECMA): It consists of analyzing the graphics components of
the website in different browsers, to verify that they follow the
guidelines and specifications of the standards. This technique
is preferably used in intermediate stages of the development
process; it being understood that the actual design will be carried
out with the consideration of corresponding standards [13, 14].
• User interface testing (UI): This is the most common of the techniques. It can be performed manually or with specialized software (automated testing). The purpose of this type of test is to check the visual content of the website through the navigation of its pages in the different browsers [15, 16].
• Analysis of the Document Object Model (DOM): This is a dynamic technique that compares the behavior of a web application in different browsers, identifying the differences as defects. The comparison is made by combining a structural analysis of the information in the DOM with a visual analysis of the page in question [17].

• Image Comparison: This technique is based on taking a screenshot of the site in one browser and comparing it with a capture of the same site in a different browser. If both images match, then the site is compatible between the two browsers [18].
Of all of them, the image comparison technique is chosen by several authors [18–22] as the most suitable for testing web compatibility, due to the ease of implementing it.

5.2. PROPOSED TECHNIQUE


In a continuous development environment, automated functional tests play a key role, since they allow rapid regression testing within the small development cycles of each change introduced in the software product. With the emergence of continuous integration servers, it is possible to detect the introduction of defects in the source code as soon as possible [23]. Often, incompatibility problems are detected by performing manual exploratory tests on different browsers, and development teams rely on specialized software to perform image comparison.
At present, there are a large number of non-proprietary tools that allow image comparison algorithms to be implemented [24]. However, the problem with these tools is that none of them can be coupled to the execution of functional tests on websites [19].
The proposed technique is, therefore, to complement the functional testing process of a website with an automated image comparison algorithm, and in this way accelerate the detection of incompatibilities between different browsers.
The environment was developed using Java and consists of the following
components:
• A tool for interacting with different browsers.
• A tool to verify compliance with conditions (existence of
elements, expected behaviors, etc.)
• A tool for processing images.
• An algorithm for performing image comparison.
• A mechanism for running parallel tests on different Operating
systems and browsers.
• A reporting tool to display the results of the performance of tests.

The following facts were considered:
1. The tests are run through Maven. This tool is responsible for compiling the code and for instructing the main node to begin execution.
2. The main node is a Selenium Grid hub, a server that allows test execution threads to be triggered on multiple nodes in parallel. This server instantiates the browsers configured on each node. The tests are implemented using Selenium WebDriver.
3. During the execution of the tests, screenshots are taken and stored on a hard disk to which all nodes have synchronized access. The tests conclude when all functional validations are completed. These validations are performed using TestNG.
4. The image comparison algorithm takes the generated screenshots and analyzes them in pairs, generating resulting images which it then stores on the same disk where the captures are found.
5. When the process is finished, the final report with the results of the test execution is generated in an HTML file.

5.2.1. Screen Capture Process


The interaction with the website is the responsibility of the Selenium WebDriver tool. The same test script is run in different browsers in parallel, through a Selenium Grid configuration that allows an execution thread to be instantiated for each virtual machine (node) of the environment. As the tests go from one page of the site to another, a screenshot is taken.
One of the challenges was to find the best way to obtain the different screenshots with the same dimensions. One alternative was to use a function provided by the Selenium WebDriver tool to obtain screenshots of the site. But this caused problems with the captures from Mozilla Firefox, which contained not only what would be seen on the screen, but all the content of the page that is reached by scrolling down/up. The second option was to use a Java API that allows screenshots of the computer to be taken. However, while the results were images with the same dimensions, they all contained the toolbar of each browser, and the image compatibility tests always failed.
Finally, the algorithm consists of three steps (a minimal Java sketch is given after the list):
1. Put the browser in “full screen” mode;
2. Get the full-screen dimension using the Java Toolkit API; and
3. Generate the screenshot through the Java Robot API.
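A minimal Java sketch of these three steps (illustrative only; it assumes a recent Selenium version in which Window.fullscreen() is available and an already-started WebDriver instance):

    import java.awt.Rectangle;
    import java.awt.Robot;
    import java.awt.Toolkit;
    import java.awt.image.BufferedImage;
    import java.io.File;
    import javax.imageio.ImageIO;
    import org.openqa.selenium.WebDriver;

    public class ScreenCapture {

        public static void capture(WebDriver driver, File output) throws Exception {
            // 1. Put the browser in full-screen mode so no toolbars are captured.
            driver.manage().window().fullscreen();

            // 2. Get the full-screen dimensions through the Java Toolkit API.
            Rectangle screen = new Rectangle(Toolkit.getDefaultToolkit().getScreenSize());

            // 3. Generate the screenshot through the Java Robot API.
            BufferedImage image = new Robot().createScreenCapture(screen);
            ImageIO.write(image, "png", output);
        }
    }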

5.2.2. Image Comparison Algorithm


The captured images are stored on the hard disk of a computer, with a nomenclature that encodes the identifier of the test, the name of the page on which the capture was taken, the browser, and the name and version of the operating system of the node.
The tool used to verify the expected results (TestNG) provides a mechanism for executing instructions upon completion of the tests. This mechanism is defined by the @AfterSuite/@AfterTest/@AfterClass annotations. It also allows the execution of pre-test instructions.
To perform the image comparison, the @AfterSuite annotation is used. Within the method that carries this annotation, the call to the image comparison algorithm is made.
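A minimal sketch of such a hook (illustrative only; the shared path and the way captures are handed to the comparison step are assumptions, not details of the environment described here):

    import java.io.File;
    import org.testng.annotations.AfterSuite;

    public class CompatibilitySuiteHooks {

        // TestNG invokes this method once, after every test in the suite has
        // finished, i.e., when all screenshots are already on the shared disk.
        @AfterSuite
        public void runImageComparison() {
            File dir = new File("/shared/screenshots");   // hypothetical shared path
            File[] captures = dir.listFiles((d, name) -> name.endsWith(".png"));
            if (captures == null) return;
            for (File png : captures) {
                // Here each capture would be paired with its counterpart from
                // the other node and passed to the comparison algorithm (5.2.2).
                System.out.println("queued for comparison: " + png.getName());
            }
        }
    }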
The first step in this stage is to get all the images corresponding to a certain node, using the browser and operating system specified in the name of the images. Then, the images corresponding to a second node are obtained. Once the images corresponding to the first two nodes have been obtained, an algorithm takes an image from one node and searches for its counterpart (according to the identifier and name of the page) among the captures of the other node. The images are then compared in pairs.
Finally, the image comparison algorithm is executed:
1. It is verified that both images have the same dimensions. If they are not equal, the comparison is skipped and a fault is reported.
2. An array of pixels is obtained from the base image, and all pixels are traversed one by one, comparing each with its counterpart in the other image. If they differ, the position of the differing pixel is saved.
3. If there were no differences, the compatibility test has completed successfully. In the event of a fault, a third image resulting from the comparison is generated: it is painted with dots at each position corresponding to the differing pixels. This image is saved, in a heat map format, next to the pair of captures that were analyzed.
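A minimal Java sketch of this comparison (illustrative only; the highlight color and the way the result is returned are assumptions):

    import java.awt.image.BufferedImage;
    import java.util.ArrayList;
    import java.util.List;

    public class ImageComparison {

        /** Returns null when the captures are identical, or a copy of the base
         *  image with the differing pixels highlighted when they are not. */
        public static BufferedImage compare(BufferedImage base, BufferedImage other) {
            // 1. Different dimensions: report a fault instead of comparing.
            if (base.getWidth() != other.getWidth()
                    || base.getHeight() != other.getHeight()) {
                throw new IllegalArgumentException("captures have different dimensions");
            }

            // 2. Traverse the pixels one by one, saving the differing positions.
            List<int[]> diffs = new ArrayList<>();
            for (int y = 0; y < base.getHeight(); y++) {
                for (int x = 0; x < base.getWidth(); x++) {
                    if (base.getRGB(x, y) != other.getRGB(x, y)) {
                        diffs.add(new int[] { x, y });
                    }
                }
            }
            if (diffs.isEmpty()) {
                return null;   // no differences: the compatibility check passes
            }

            // 3. On failure, generate a third image marking the differing pixels.
            BufferedImage result = new BufferedImage(base.getWidth(), base.getHeight(),
                    BufferedImage.TYPE_INT_RGB);
            result.getGraphics().drawImage(base, 0, 0, null);
            for (int[] p : diffs) {
                result.setRGB(p[0], p[1], 0xFF0000);   // paint the difference in red
            }
            return result;
        }
    }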
Finally, the result of the comparisons can be observed in the final report. The final report is generated with ReportNG, as a complement to TestNG. For the compatibility testing section, an HTML report has been developed in which the details of each comparison fault can be accessed by clicking on it.

5.3. RESULTS
This technique has been implemented in a real continuous development environment, and the results demonstrate its efficiency. On the one hand, the implementation of the algorithm was a very simple task for the development team to carry out. The technique has accelerated the execution time of the compatibility tests, and therefore the whole test stage, by 92%. A conventional compatibility test performed by a team member takes approximately 10 minutes, depending on the page. A compatibility test on the same page with the proposed technique takes approximately 1 minute, the time required to observe the heat map with the results. A decrease in the total time of the release process of each version of the site has been observed since the implementation of the proposed technique.

5.4. CONCLUSION
The proposed technique is an initiative to automate compatibility testing through an image comparison algorithm. To validate it, it has been implemented in a large company that works with continuous software development. The results show that the tool accelerates testing through the automation of web compatibility tests.
On the one hand, test execution times decreased by 82%, and this also reduced the total time of the release process of the site versions. In addition, it allows reports to be generated in an HTML format that is very easy to understand.
With the implementation of this approach, a comprehensive visualization of components in search of incompatibilities across the different browsers is obtained. In this way, manual tasks are reduced simply to observing the reports. Likewise, it allows more defects to be detected that could be missed when the tests are performed manually.
However, there were drawbacks in comparing sites that contain changing advertisements and/or pop-up elements, which produce false positives in the tests.
As future work it is proposed, firstly, to improve the implemented algorithm in order to avoid comparisons of elements that are not part of the site (such as ads). Secondly, virtualization tools (such as Docker) will be used to run the functional tests. Finally, techniques will be investigated to avoid the manual work required to observe the results, so as to make the process fully automated. Ways for the algorithm to avoid raising alarms because of small differences between browsers should also be investigated.
Chapter 6

Theoretical Framework

CONTENTS
6.1. Definition of Process......................................................................... 80
6.2. Analysis of Key Processes.................................................................. 84
6.3. Review Process ................................................................................. 88
6.4. Statistical Tools ................................................................................. 95
6.5. Methodological Framework .............................................................. 99

6.1. DEFINITION OF PROCESS


A process can be defined as a series of interrelated activities, actions, or decisions aimed at a specific result, obtained as a consequence of the value added by each of the activities carried out at the different stages of the process (Figure 6.1).1

Figure 6.1: Outline of a process.


In general, any process can be represented by a flow chart. Also, its
performance should be measurable.

6.1.1. Maintaining the Process Under Control


It can be argued that a process is "under control" when its variability is controlled; that is, when the cause of the variability is known and it stays within the established parameters (limits). For this, the process must be kept understood, documented, and measured.

6.1.2. Process Understanding


It is considered that a process is understood when everyone involved knows the following:
• What the purpose and basic description of the process is;
• Who its customers are;
• Who its suppliers are;
• Who its owner is;
• What performance is being obtained.

1 Strategic Process Management, Juan B. Roure, Manuel Monino, Miguel Rodriguez, p. 9.

6.1.3. Documented
Some of the aspects that the documentation of a process should include are as follows:
a. A process flow diagram, including possible interactions with other processes.
b. Performance measures of the different phases of the process (usually abbreviated PPM, for Process Performance Measurement).
c. The process owner's name.
d. The members of the process management team.
The narrative of the process steps must be clear, concise, operational, and easy to communicate, so that it is useful for training and analysis. Besides the flowchart, the use of checklists, performance criteria, and a classification of the inputs and outputs of the process is helpful2.

6.1.4. Measured
The process must be measured in order to know its level of performance relative to the expectations of its internal or external customers, and to act accordingly.
The performance measures of a process, or PPMs, should be a clear indicator of its health. Such measures have to be few and highly representative of the "health" of the process. They should be an indicator of the value added, both to business operations and to customer satisfaction.
It is also important to establish a hierarchy among the metrics used throughout the process so that, ultimately, the satisfaction of customer requirements can be ensured.

6.1.5. Process Design


The design of a process will follow a simple methodology based on the stakeholders/customers/users and on their needs and expectations3.

2 Strategic Process Management, Juan B. Roure, Manuel Monino, Miguel Rodriguez, pp. 27–29.
3 Process Management; Braulio García Mejía MD-MSP, Fifth Edition, ECO-E Editions (2007).

6.1.6. Stakeholders/Clients/Users
For the mapping of the processes of an organization, the stakeholders, clients, and users must be identified.
• Interest group: all those who have an interest in an organization, its activities, and its achievements. These can include customers, partners, employees, shareholders, owners, management, and legislators4.
• Customer/user: a person who regularly uses the services of a professional or business.
To identify stakeholders/customers/users, the Unit/Service must hold a work session starting with an initial brainstorm. Then, for each of the possible groups of clients and users, a discussion is held in which each one is clearly identified and related to the following points: the services required, the needs they would have in the view of the Unit or according to previously conducted studies and analyses, the expectations that customers might have about our current or future services, etc.

6.1.7. Needs and Expectations of Customers/Users


We understand by "needs" those services that are required by customers/users.
We understand by "expectations" the features or benefits that customers/users expect to obtain from the services they demand.
The key processes aim to meet the needs of customers/users. "Expectations" mark the level of customer satisfaction: depending on how well the expectations of customers/users are met, a higher or lower degree of satisfaction will be obtained.

6.1.8. Identification of Services


At this point, each of the services provided by the Unit will be identified and analyzed, together with its identifying characteristics and the stakeholders, customers, and users for whom it is intended. For each of the services identified in the previous section, a small data sheet shall be drawn up recording the characteristics of the service as it is supplied and all those points that could contribute to its definition5.
4 Competitive advantage; Michael Porter, Editorial Continental (1997).
5 Process Management; Braulio García Mejía MD-MSP, Fifth Edition, ECO-E Editions (2007).

6.1.9. Needs and Expectations


Finally, a contrast is made between the needs and expectations identified for the stakeholders/clients/users and those which can be covered by the services identified and described above.
Work on processes is aimed at improvement, and this section is a first reflection on the points where improvement is needed and where, therefore, the efforts of the Unit or Service should be concentrated.

6.1.10. Stakeholder/Client/User Sheet


The previous work will be reflected in a client/user sheet. The sheet indicates the following points:
• Identification of the stakeholder/customer/user.
• Current services.
• Studies or contacts made to identify their needs and expectations.
• Procedures currently used to measure the level of satisfaction (the degree to which expectations are met).
• Needs that are or could be covered by the unit.
• Expectations of each of the stakeholders/customers/users.6

6.1.11. Process Map


A process is a set of interrelated activities and resources that transform input elements into output elements, providing added value for the customer or user. Resources may include personnel, finance, facilities, technical equipment, methods, etc.
The purpose of any process is to provide the customer/user with a good service that meets their needs and expectations, with the highest level of performance in cost, service, and quality.
A method is the specific way of carrying out a process or a part of it.
The desired results of processes depend on the resources, ability, and motivation of those involved in them, while procedures are just a series of detailed instructions for a person or group of persons to follow7.

6 Process Management; Braulio García Mejía MD-MSP, Fifth Edition, ECO-E Editions (2007).
7 Strategic Process Management, Juan B. Roure, Manuel Monino, Miguel Rodriguez.

A process map is a value map: a graphical inventory of the processes of an organization.
The process map provides a global-local perspective, forcing each process to be "positioned" with respect to the value chain. At the same time, it relates the purpose of the organization to the processes it manages, and it is also used as a tool for consensus and learning.

6.1.12. Key Operational Processes


Key operational processes are those directly linked to the services provided, and therefore oriented to the client/user and their requirements. As a result, their outcome is perceived directly by the customer/user (their focus is on bringing value to the customer).
These processes generally involve several functional areas in their implementation and are the ones that may consume the most resources.
In summary, the key processes form the value-added sequence of the service, from understanding the needs and expectations of the customer/user to the delivery of the service, with the satisfaction of the customer/user as their ultimate goal8.

6.1.13. Strategic Processes


Strategic processes are those established by senior management; they define how the business operates and how value is created for the customer/user and for the organization.
They support decision-making on planning, strategies, and organizational improvements, and they provide guidance and performance limits to the other processes.

6.1.14. Support Processes


The supporting processes are those which serve to support the key processes. Without them, neither the key nor the strategic processes would be possible. These processes can, in many cases, be decisive for achieving the objectives of the processes aimed at meeting the needs and expectations of customers/users.

6.2. ANALYSIS OF KEY PROCESSES


This is the initial and most delicate stage of process management. Its aim is to break down the identified processes, compiling a record for each one which shall include, as basic elements, its inputs, its outputs, and its process (or control) and result indicators9.
The analysis starts from the map of processes previously developed. For each of the identified processes, the starting point is the moment the service is delivered to the customer/user. From that point, working backwards through the process, the steps, tasks, inputs and outputs, responsible parties, etc., are identified.
It is important that this work is done in detail, dedicating the necessary time to it.
The analysis of each process culminates with the preparation of its flowchart, the process details, the identification of monitoring and result indicators, and finally the organization of the documentation.
8 Time and motion study, Fred Meyers, Pearson Education Mexico (2000).

6.2.1. Flowchart
The flowchart is one of the most widespread process analysis tools. The graphical view of a process facilitates a comprehensive understanding of it and the detection of points for improvement. The flow diagram is the graphical representation of the process. There is an extensive bibliography, and there are standards, for the preparation of flowcharts. However, it is advisable to use a few very simple conventions that are easily assimilated by all members of the unit or service. Once the flowchart has been developed, it can be used to identify opportunities for improvement or simple adjustments and, on it, to perform process optimization. In these cases, the flow chart is used to visualize the sequence of changes to be executed.
The flow chart should be drawn up while the process is being described; this facilitates the work of the team and the understanding of the process. It should start by setting the starting point and end point of the process. Subsequently, the different activities that make up the process are identified and classified, along with the interrelationships between all the areas, the decision points, etc. This whole network is represented with predefined symbols according to the type of diagram.
Flowcharts use a number of predefined symbols to represent the flow of operations with their relationships and dependencies. The format of the flowchart is not fixed; there are different types employing different symbology. An important aspect before making the flowchart is to establish how deep the description of activities is intended to go, always trying to maintain the same level of detail.

9 Time and motion study, Fred Meyers, Pearson Education Mexico (2000).

6.2.2. Identification of Indicators and Indicator Sheets


Identifying indicators is another complicated and important task in orienting management towards processes.
Measurement is a requirement of management: what is not measured cannot be managed and, therefore, cannot be improved. This applies to any organization, including public institutions, municipalities, and government agencies in general.
An indicator is a magnitude associated with a characteristic (of the result of the process, of the activities, of the structure, etc.) which allows, through its measurement in successive periods and its comparison, that characteristic to be evaluated periodically and compliance with the targets set to be verified10.
Depending on the nature of the object to be measured, the following types of indicators can be distinguished:

6.2.2.1. Indicators of Results


These indicators directly measure the degree of effectiveness or the direct impact on the customer/user. They are the indicators most closely related to the purposes and tasks of the unit or service itself11.
Other names by which these indicators are known:
• Objective indicators;
• Impact indicators;
• Indicators of effectiveness (effectiveness and efficiency);
• Satisfaction indicators.
Examples of indicators:
• Number of people attending exhibitions relative to the number of inhabitants.
• Percentage of cases resolved per month.
• Degree of coverage of the information campaign in the media.
• Citizens' satisfaction with the results of a service.

10 Fundamentals of Quality Control and Improvement, Amitava Mitra, New Jersey, (1998).
11 Fundamentals of Quality Control and Improvement, Amitava Mitra, New Jersey, (1998).

6.2.2.2. Process Indicators


Process indicators assess aspects related to the activities. They are directly related to the approach called process management, and they refer to measurements of the effectiveness and efficiency of the process, usually measures related to cycle time, error rate, or queue length.
Examples of process indicators may be:
• Record resolution time.
• Queuing time.
• Percentage of requests for opening licenses subject to environmental assessment.
• Waiting list in days.
• Queue indicator for records.
• Extent of use of computer equipment.
The working group will identify at most two or three indicators per process. The limitation is imposed by the requirement for subsequent monitoring of the indicators. If a process already uses more indicators without incurring any extra workload for the service or unit, it is wise to keep them.
Given the complexity and work involved in monitoring indicators, it is appropriate to reflect on which indicators should be defined for each process. Collecting the information for calculating indicators can be an arduous and difficult task, so simple indicators should be chosen which, at the same time, reflect the progress and results of the processes. For the choice of indicators, those already defined in the corresponding Service Charter should be taken into account, since they can simultaneously serve for process control and for the service charter12.

6.2.3. Management of Documentation


Documentation management is one of the most important aspects of Quality Systems and provides a clear indication of the level of organization of a Unit or Service.
Documents: written records containing reliable data, or data that can be used as such to prove something and, therefore, to be used as evidence.

12 Fundamentals of Quality Control and Improvement, Amitava Mitra, New Jersey, (1998).

Document Management: document management is the use of technology and procedures enabling unified management of, and access to, the information generated in the organization:
• Personnel of the Unit or Service;
• Customers/users and suppliers.
The key benefits of these practices are:
• Establishing a new shared workspace between Unit/Service and Client/User.
• Increasing the value of the organization's information.
• Avoiding duplication and reducing the time spent searching for internal information.
• Increasing the quality of service and productivity.
Process management facilitates the analysis of the documentation generated by the processes and allows units or services to address document management organized according to their own processes.

6.3. REVIEW PROCESS


According to James Harrington, in his book on improving business processes, the review process is the process by which data and indicators are reviewed and the necessary modifications are made to improve their results. Depending on the complexity of the process, the review will be more or less complex. During the review process, it is very important to bear in mind that processes are related to other processes of the organization, and that a simple change in one process could cause significant problems in related processes. That is why changes in processes should be treated with extreme caution and transparency.

6.3.1. Modernization Process


Modernization involves reducing waste and excess, and paying attention to every single detail that can lead to improved performance and quality.
There are 12 modernization tools, which are detailed below13.

6.3.1.1. Bureaucracy Reduction


Bureaucracy is an impediment to the organized, systematic, and company-wide execution of business concepts and process improvement methods. Bureaucracy is often associated with departments that have large numbers of officials fighting for their individual advancement and for their areas, creating useless jobs and rigid, incomprehensible rules.
13 Improving business processes, James Harrington, McGraw-Hill, Bogota, 1994.
Bureaucracy generates excessive paperwork in the office. Managers spend between 40% and 50% of their time writing and reading material related to their work; 60% of all administrative work time is used in activities such as reviewing, archiving, locating, and selecting information, while only 40% is spent on important tasks related to the process.
The downsides of bureaucracy are innumerable, so delays must be evaluated and minimized in order to eliminate them. Excessive bureaucracy can be identified by asking the following questions:
a. Are unnecessary checks and balances made?
b. Does someone else test or approve the work of the activity?
c. Is more than one signature required?
d. Are multiple copies needed?
e. Are copies stored without any apparent reason?
f. Are copies sent to people who do not need the information?
g. Are there individuals or entities that hinder the effectiveness and efficiency of the process?
h. Is unnecessary correspondence written?
i. Do existing organizational procedures regularly prevent the effective, efficient, and timely execution of tasks?
j. Does anyone have to approve what is already approved?
Management must lead an attack against the excessive bureaucracy that has infiltrated the systems controlling the entity. Many activities do not contribute to the content of the process output; they exist only for information and protection purposes, and every effort should be made to minimize them14.
The attack on bureaucracy must begin with guidelines informing management and employees that the company will not tolerate unnecessary bureaucracy; that each approval signature and each review activity must be justified financially; that total cycle time reduction is a key purpose of the enterprise; and that any non-value-added activity that slows the process will be targeted for elimination.

14 Improving business processes, James Harrington, McGraw-Hill, Bogota, 1994.

The heads responsible for bureaucratic activities should justify the costs and delays related to the activity. Often the boss will try to set the issue aside, saying: "I only spend two or three seconds signing the document. This will not cost the company anything." The answer to such a remark would be: "Well, if you do not read the document, you would not have to sign it."

6.3.1.2. Eliminating Duplication


If in a process the same activity is carried out in different parts of the process, or is performed by different individuals within it, we must consider whether both activities are necessary. Often the same, or similar, information is generated in different parts of the process, sometimes by different organizations. This not only adds to the total cost of the process but also creates the possibility of conflicting data that unbalance the process. It often happens that a department within the process produces certain information while a supplier generates similar information for a different department.
Neither duplication nor the confusion generated when two data sources differ can be sustained. The integrity of the data is of great importance for the processes within the company. We cannot keep duplicate data sources whose mutual integrity has to be reviewed; a single source should be sought.
There are cases of redundancy because the working groups do not know that the activity has already been performed, or because the processes are not designed to link the user organizations with the previous output. This again gives us the opportunity to improve the overall effectiveness of the corporation.

6.3.1.3. Assessment of Value Added


Added value is an essential principle of the modernization process. The technique is simple, direct, and very effective. To understand the importance of the tool, we initially explore the concept of added value through a simplified analogy with the manufacture of a product15.
Added value can often be measured in economic terms, mainly in manufacturing companies. In the case of business services, added value can be measured in terms of compliance with the characteristics or requirements that the customer has set out, either as a measure of quality or as part of their needs. A review is made of each of the activities that are part of the processes selected for change, and they are classified into the following groups for added value analysis:
15 Improving business processes, James Harrington, McGraw-Hill, Bogota, 1994.
• Activities that add value for the customer;
• Organizational activities that add value;
• Activities that do not add value.
Activities that add value for the customer (VAC) are those that allow customer expectations to be met.
Organizational activities that add value (VAO) are the activities required by the business or organization to obtain the product or service. They do not contribute directly to customer requirements; these activities generally correspond to legal regulations, controls, among others.
Activities that do not add value (SVA) are activities that are not necessary for obtaining the product or service; they can be eliminated without producing an adverse effect on the functionality or quality of the product or service.
The results of this review are transferred into a new classification, for which an added value matrix is used.
The added value matrix is a tool used in process improvement activities to analyze the process along two dimensions:
• whether or not it adds value to the process;
• whether or not it is necessary to the process.
The combinations of these two dimensions determine quadrants on the basis of which the following may be established:
• Add value and improve, if it adds value and is necessary.
• Optimize, if it does not add value but is necessary.
• Transfer to another area, if it adds value but is not necessary.
• Delete, if it does not add value and is not necessary.
We must take into account that not all activities that do not provide added value are unnecessary; they may be support, control, regulatory, or legal activities16.

6.3.1.4. Simplification
The increasing complexity creates increasing difficulties everywhere, as the activities, decisions, relationships, and essential information become more difficult to understand and manage. In an era of rapid and increasing complexity, it is essential to commit actively and continuously to simplification in order to counter it.
When simplification is applied to business processes, all elements are evaluated in an effort to make them less complex, simpler, and less demanding of data from other elements. When the organization does not make ongoing simplification efforts an important part of the management process, difficulties and poor performance are encouraged17.
16 Improving business processes, James Harrington, McGraw-Hill, Bogota, 1994.
List of activities to be evaluated for simplification:
• Duplication and/or fragmentation of tasks;
• Complex flows and bottlenecks;
• Memos and other correspondence;
• Meetings;
• Reduction of similar activities;
• Reducing the amount of handling;
• Delete unused data;
• Remove copies;
• Use standard reports.

6.3.1.5. Process Cycle Time Reduction


The company and its critical processes should follow the rule that time is money. A process that takes a long time consumes valuable resources. Prolonged cycles hinder the delivery of the product to our customers and increase storage costs18.
Some typical ways to reduce cycle time are:
• Serial activities versus parallel activities;
• Changing the sequence of activities;
• Reducing interruptions;
• Improving timing;
• Reducing movement of the output;
• Analysis of location;
• Setting priorities.
17 Improving business processes, James Harrington, McGraw-Hill, Bogota, 1994.
18 Time and motion study, Fred Meyers, Pearson Education Mexico (2000).

6.3.1.6. Mistake Proofing


This tool helps to make the improper performance of an activity difficult.
It is very easy to make a mistake, so we must make mistakes difficult to commit. The most common methods for error proofing are:
• Using different paper forms for different jobs;
• Using computer programs to verify writing;
• Encouraging effective communication by asking subordinates to repeat instructions;
• Using cross-checking when conducting reviews;
• Asking the question: "If I wanted to do harm, how would I do this work?"

6.3.1.7. Efficient Use of Equipment


The capital goods and facilities that the company has should be used effectively to improve overall performance. For this, ergonomic studies should be carried out to improve processes, promoting the efficiency of people (also called technical vitality). Training and education are an investment made in the staff and in the organization, and they pay high dividends in terms of loyalty and performance.

6.3.1.8. Simple Language


In modern enterprises, written material is often difficult to understand and hard to locate in the files where it is stored; the writing is highly redundant, indirect, vague, and complex. Below are some important factors that simplify communication within the company:
• Determine the level of understanding of the audience to which the written communication is directed, so that the message to be conveyed is easy to understand.
• Ask whether the audience is familiar with the technical terms and abbreviations used.
• Acronyms should be used carefully, repeating the full phrase when necessary even if it occupies more space than the abridged version.
• Writing does not need to be extensive to be good.

6.3.1.9. Standardization
An easy way to perform an activity should be chosen, ensuring that everyone involved in the process carries it out in exactly the same way every time.
Standardization is very important because it ensures that all current and future workers use the best practices related to successful procedures, which should therefore:
• Be realistic;
• Define responsibilities;
• Establish limits of authority;
• Cover emergencies;
• Not be open to different interpretations;
• Be easy to understand.
These procedures often include a flow chart and instructions.

6.3.1.10. Alliance with Suppliers


The output of a process depends on the quality of the input that the process receives; the overall performance of any process improves when the input from its suppliers improves. Processes depend on outsiders who provide information, materials, and/or ideas. Regarding the quality of the input, the following questions should be answered:
• Is the input really needed by the process?
• Does it enter at the right place?
• Is its timing correct?
• Is the format in which it is received the best one?
• Is more received than what is needed?
Since a process is a provider of products and services to its customers, the people who provide an input to the process become its suppliers, and in this relationship both sides have responsibilities19.
The customer should never ask for more than is needed, or more than will be used, as everything has a cost to the organization.

6.3.1.11. Improving the General Framework


It is used when the previous tools have not worked; it involves changing processes significantly and is an effective means of achieving substantial change in the way business is carried out, since the following are created:
• New concepts;
• A new view of the process;
• New factors for success;
• Development of new options;
• Overcoming organizational barriers.
19 Improving business processes, James Harrington, McGraw-Hill, Bogota, 1994.

6.3.1.12. Automation and/or Mechanization


It consists of applying tools, equipment, and computers to routine, time-consuming activities in order to relieve employees of that workload. When developing flow charts, several activities that can be automated may be detected. To decide what can be automated, look for:
• Repetitive operations;
• Operations that would improve if physically isolated people could communicate faster;
• Operations for which standardized computer system components exist.
Computer systems, Internet communication, and social networks can be used to facilitate communications between customers and the company.

6.4. STATISTICAL TOOLS

6.4.1. Histogram
Histograms are a statistical tool that allows a variable to be graphed using bars. The values on the vertical axis represent the frequencies of the values shown on the horizontal axis. For this purpose, classes are defined, which are simply the intervals into which the observations fall, so that: the base of each rectangle represents the interval width, and its height is determined by the frequency (Figure 6.2).20 A short sketch after the figure shows how such class frequencies can be computed.

20 Histogram – http://www.ucv.cl/web/estadistica/histogr.htm.

[Figure: example histogram ("Ejemplo de Histograma") of variable C1; vertical axis: frequency (Frecuencia), horizontal axis: class intervals from –13.5 to 13.5.]

Figure 6.2: Histogram.
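As an illustration of how the frequencies behind such a histogram can be computed, the following C sketch bins a set of observations into equally wide classes. The sample data, the number of classes, and the helper name build_histogram are hypothetical choices made only for this example.

#include <stdio.h>

#define NUM_CLASSES 5   /* hypothetical number of classes (intervals) */

/* Count how many observations fall into each equally wide class. */
void build_histogram(const double *data, int n, double min, double max,
                     int counts[NUM_CLASSES]) {
    double width = (max - min) / NUM_CLASSES;   /* base of each rectangle */
    for (int i = 0; i < NUM_CLASSES; i++)
        counts[i] = 0;
    for (int i = 0; i < n; i++) {
        int k = (int)((data[i] - min) / width);
        if (k == NUM_CLASSES)        /* the maximum value falls in the last class */
            k = NUM_CLASSES - 1;
        counts[k]++;                 /* height of the rectangle = frequency */
    }
}

int main(void) {
    double data[] = { 1.2, 3.4, 2.2, 4.8, 0.5, 3.9, 2.7, 4.1 };
    int counts[NUM_CLASSES];
    build_histogram(data, 8, 0.0, 5.0, counts);
    for (int i = 0; i < NUM_CLASSES; i++)
        printf("class %d: %d\n", i, counts[i]);
    return 0;
}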

6.4.2. Pareto Chart


According to the article at http://es.wikipedia.org/wiki/Diagrama_de_Pareto, the Pareto chart, also called the 80–20 or ABC distribution, is a graph where the data are organized as bars in descending order from left to right. It is based on the Pareto principle ("vital few and trivial many"), that is, there are a few issues of vital importance and many problems that can be given lower priority.
The diagram facilitates the comparative study of many processes within industries or commercial enterprises, as well as social or natural phenomena. It must be taken into account that the distribution of effects and their possible causes is not linear: roughly 20% of the total causes originate 80% of the effects.
According to the article published at http://www.slideshare.net/ligoneLiga/diagrama-de-pareto-12711505, the steps to build a Pareto chart are:
1. Determine the problem or effect to study.
2. Investigate the factors or causes of this problem and how to collect data concerning them.
3. Note the magnitude (e.g., cost in EUR, number of defects, etc.) of each factor. Factors whose magnitude is very small compared to the others are grouped into an "Other" category (e.g., see Figure 6.3; a short sketch after the figure illustrates the computation).

[Figure: Pareto chart of C1 ("Diagrama de Pareto de C1"); bars show the count C2 per category, with the cumulative percentage line on the secondary axis. Underlying data:]
Category (C1)    Count (C2)    Percentage    Cumulative %
Corte            60            42.3          42.3
Pulido           45            31.7          73.9
Pegado           20            14.1          88.0
Lacado           11             7.7          95.8
Otro              6             4.2         100.0

Figure 6.3: Pareto chart.
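To illustrate steps 2 and 3, the following C sketch sorts a set of causes by magnitude and computes the individual and cumulative percentages used in a Pareto chart. The category names and counts are simply the ones shown in Figure 6.3; everything else is a hypothetical sketch.

#include <stdio.h>

struct cause { const char *name; double magnitude; };

int main(void) {
    /* Data taken from Figure 6.3 */
    struct cause c[] = { {"Corte", 60}, {"Pulido", 45}, {"Pegado", 20},
                         {"Lacado", 11}, {"Otro", 6} };
    int n = 5;

    /* Sort in descending order of magnitude (simple selection sort). */
    for (int i = 0; i < n - 1; i++)
        for (int j = i + 1; j < n; j++)
            if (c[j].magnitude > c[i].magnitude) {
                struct cause tmp = c[i]; c[i] = c[j]; c[j] = tmp;
            }

    double total = 0.0, accumulated = 0.0;
    for (int i = 0; i < n; i++)
        total += c[i].magnitude;

    /* Print the percentage and cumulative percentage of each cause. */
    for (int i = 0; i < n; i++) {
        double pct = 100.0 * c[i].magnitude / total;
        accumulated += pct;
        printf("%-8s %6.1f %6.1f%% %6.1f%%\n",
               c[i].name, c[i].magnitude, pct, accumulated);
    }
    return 0;
}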

6.4.3. Control Charts


The Minitab 15 help defines: control charts, also known as Shewhart charts or process behavior charts, are tools used in statistical process control to determine when a process is in control.
A control chart consists of:
• Points representing observations of a quality characteristic in samples taken at different times.
• The statistical average calculated with all the observations taken, whose value represents the centerline.
• Upper and lower control limits. Typically they are plotted at values representing three times the standard deviation from the centerline.
Control charts are used to track process statistics over time and to detect the presence of special causes; a sketch of how the centerline and limits can be computed follows.
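A minimal C sketch, assuming the control limits are simply the mean of the plotted statistic plus or minus three sample standard deviations (in practice, limits for subgrouped data are often estimated with other methods, such as range-based constants); the data values are hypothetical.

#include <stdio.h>
#include <math.h>

int main(void) {
    /* Hypothetical sample means of a quality characteristic */
    double x[] = { 6.1, 6.3, 5.9, 6.2, 6.0, 6.4, 5.8, 6.1, 6.2, 6.0 };
    int n = 10;

    double mean = 0.0, var = 0.0;
    for (int i = 0; i < n; i++)
        mean += x[i];
    mean /= n;                               /* centerline */
    for (int i = 0; i < n; i++)
        var += (x[i] - mean) * (x[i] - mean);
    double sd = sqrt(var / (n - 1));         /* sample standard deviation */

    double ucl = mean + 3.0 * sd;            /* upper control limit */
    double lcl = mean - 3.0 * sd;            /* lower control limit */
    printf("CL = %.3f  UCL = %.3f  LCL = %.3f\n", mean, ucl, lcl);
    return 0;
}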

6.4.3.1. Structure of a Control Chart


Special causes lead to variations that can be detected and controlled.

Examples include differences in supplier, shift, or day of the week. Common cause variation, on the other hand, is inherent in the process. A process is under control when only common causes, and no special causes, affect the process output.
A process is under control when the points are within the control limits and do not show non-random patterns.
When a process is under control, you can use the control chart to estimate the process parameters needed to determine its capability.
Control charts for variables with subgroups plot statistics of continuous measured data, such as length or pressure, for subgrouped data. Control charts for variables with individual observations, time-weighted charts, and multivariate charts also illustrate measurement data. Attribute control charts plot count data, such as the number of defects or defective units. Control charts for variables with individuals plot statistics of continuous measured data, such as length or pressure, for individual data.
Attribute control charts plot count data, such as the number of defects or defective units; they have a structure similar to that of control charts for variables, except that the plotted statistics come from count data instead of measurement data. For example, you can compare products with reference to a standard and classify them as defective or not defective. Products can also be ranked by number of defects.
As with control charts for variables, a process statistic, such as the number of defects, is plotted versus sample number or time. The attribute control charts are:
Control charts for defectives: you can compare a product with a standard and classify it as defective or not defective. For example, a certain length of wire either meets the resistance requirements or does not. Control charts for defectives are as follows:
• P chart, which plots the proportion of defectives in each subgroup.
• NP chart, which plots the number of defectives in each subgroup.
Control charts for defects: when a product is complex, a defect does not always imply that the product is defective. Sometimes it is easier to classify a product by its number of defects. For example, you could count the number of scratches on the surface of an artifact. Control charts for defects are as follows:
• C chart, which plots the number of defects in each subgroup. Use the C chart when the subgroup size is constant.
• U chart, which plots the number of defects per unit sampled in each subgroup. Use the U chart when the subgroup size varies.
For example, if you were counting the number of defects on the inner surface of television screens, the C chart would plot the actual number of defects, while the U chart would plot the number of defects per square inch sampled (Figure 6.4).

[Figure: X-bar chart of C5 ("Gráfica Xbarra de C5"); the sample means of 10 samples are plotted against the centerline X̄ = 6.120, with UCL = 6.792 and LCL = 5.447.]

Figure 6.4: Example of an X-bar control chart.

6.4.4. Correlation
In probability and statistics, correlation indicates the strength and direction of a linear relationship and proportionality between two statistical variables. Two quantitative variables are considered correlated when their values vary systematically with respect to each other: if we have two variables (A and B), there is correlation if increasing values of A correspond to increasing values of B, and vice versa. Correlation between two variables does not, by itself, imply any causal relationship.
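A minimal C sketch of the usual (Pearson) correlation coefficient between two variables A and B, assuming two samples of equal length; the value lies between –1 and 1, and its sign gives the direction of the linear relationship. The sample data are hypothetical.

#include <stdio.h>
#include <math.h>

/* Pearson correlation coefficient of two samples of size n. */
double correlation(const double *a, const double *b, int n) {
    double ma = 0.0, mb = 0.0;
    for (int i = 0; i < n; i++) { ma += a[i]; mb += b[i]; }
    ma /= n; mb /= n;

    double cov = 0.0, va = 0.0, vb = 0.0;
    for (int i = 0; i < n; i++) {
        cov += (a[i] - ma) * (b[i] - mb);
        va  += (a[i] - ma) * (a[i] - ma);
        vb  += (b[i] - mb) * (b[i] - mb);
    }
    return cov / sqrt(va * vb);
}

int main(void) {
    double a[] = { 1, 2, 3, 4, 5 };
    double b[] = { 2, 4, 5, 4, 6 };
    printf("r = %.3f\n", correlation(a, b, 5));
    return 0;
}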

6.5. METHODOLOGICAL FRAMEWORK


In developing the methodological framework, fieldwork was carried out by verifying the processes detailed above; at the same time, the staff time required to perform their respective duties was recorded.

Since in many cases the times are long, they are shown in minutes or hours depending on the case.
Once the information regarding time had been collected, we proceeded to gather data on the costs of each of the activities in order to compare approximate costs against the cycle time of each process, resulting in a Cycle Time vs. Cost diagram. The costs of these activities contribute to the total cost of the service, and they show how these activities provide value to the organization in the provision of the full service.
Additionally, the activities of each process were analyzed in order to establish whether they provide value to the customer or to the organization, or whether, in fact, they add no value to the delivery of the service.
Once the process charts were obtained, the causes, effects, and solutions of each of the problems encountered were reviewed so as to improve the productivity of the company, following the steps of the Harrington methodology.
Finally, as part of the model, the proposed flow diagrams to follow for the execution of each of the processes were established.
Chapter 7

Modular Programming

CONTENTS
7.1. Programs And Statements .............................................................. 102
7.2. Modularity Linguistics..................................................................... 102
7.3. Normal And Pathological Connections ........................................... 103
7.4. How To Achieve Minimum Cost Systems ........................................ 103
7.5. Complexity In Human Terms........................................................... 104
7.6. Cohesion ........................................................................................ 116

7.1. PROGRAMS AND STATEMENTS


A computer program is a system with components and interrelationships. Let us try to determine what the components are and how they relate.
First, we define the concepts of program and computer program.
A program can be defined as a "precise and orderly sequence of instructions and groups of instructions which, as a whole, define, describe, or characterize the performance of a task."
A computer program is simply a program which, possibly via a transformation, tells the computer how to perform a task.
At the most basic level, we note that a computer program consists of statements or instructions arranged in a sequence. We can identify the instructions as components and the sequence as a relationship. This is the classic or "algorithmic" vision of a program.
In this vision, the effort to develop a program is focused on finding a solution method and transcribing it into statements. For the purposes of our study, we will consider a statement to be a line of code that the programmer writes.

7.2. MODULARITY LINGUISTICS


Prior to defining a program module, we will make some observations. Suppose a set of statements is represented as in Figure 7.1; we then say that A1 and A2 are the boundaries of the aggregate of statements called A. Statement B is within A, and C is outside A.

Figure 7.1. The sequence of statements.



The statements appear in the order in which they will be fed into a compiler. This order is known as the lexicographical order of a program. For our study, the term lexicographical will always mean "as written," i.e., the order in which the statements appear in a compiler listing of the program. Returning to the example, we say that statement C is lexicographically after A2.
It is important to note that the lexicographical order almost never corresponds to the order of execution of the statements.
One purpose of the boundary elements (A1 and A2 in the example) is to delimit the extent to which identifiers and their associated objects (variables) are defined.
We are now able to define the term program module, or simply module:
A module is a lexicographically contiguous sequence of statements, enclosed between boundary elements, and having an aggregate identifier.
In other words, a module is a group of contiguous statements having a single identifier by which they are referenced.
This general definition covers language-specific implementations such as "paragraphs," "sections," and "subprograms" in COBOL, "functions" in C, "procedures" in Pascal, etc.
Whether a particular type of module can be executed depends on the specific linguistic constructions the programming language provides for defining and activating such modules.

7.3. NORMAL AND PATHOLOGICAL CONNECTIONS
We say that a normal connection occurs between two modules when the connection is made to the identifier of the invoked module as a whole. In contrast, if the connection is made to the identifier of an internal element of the invoked module, we say that it is a pathological connection.

7.4. HOW TO ACHIEVE MINIMUM COST SYSTEMS


When it comes to a design problem where, for example, a system can be developed in a couple of weeks, there should be no major difficulty: the developer can hold all the elements of the problem "in mind" at once. When working on large-scale projects, however, it is difficult for one person to carry out all the tasks and keep all the items in mind at once. Successful design is based on an old principle: divide and conquer. Specifically, we will say that the cost of implementing a computer system can be minimized when it can be separated into parts that are:
• Manageably small;
• Solvable separately.
Of course, the interpretation of "manageably small" varies from person to person. On the other hand, many attempts to partition systems into small parts have produced systems with increased implementation times, primarily because the second condition was not met: the parts could not be solved separately.
Similarly, we can say that maintenance costs can be minimized when the parts of a system are:
• Easily relatable to the application;
• Manageably small;
• Correctable separately.
Often the person making the modification is not the one who designed the system.
It is important that the parts of a system be manageably small in order to simplify maintenance. The work needed to find and correct an error in a "piece" of 1,000 lines of code is far greater than with parts of 20 lines. Not only do small parts reduce the time needed to locate the fault, but if the change is very cumbersome, you can also rewrite the piece completely. This is the concept of the "disposable module," which has been used successfully many times.
Moreover, to minimize maintenance costs, we must ensure that each part is independent of the others. In other words, we must be able to make changes to module A without introducing unwanted effects in module B.

7.5. COMPLEXITY IN HUMAN TERMS


In the previous section, we analyzed the impact of complexity on costs and how to manage it by subdividing a problem into smaller problems. We saw that many of our problems in design and programming are due to the limited capacity of the human mind to deal with complexity. The question now is:
• What is complex for humans?
In other words:

• What aspects of system design and programs are considered complex by the designer?
And by extension:
• What can we do to make systems less complex?
First, we can say that the size of a module is a simple measure of complexity. Generally, a module of 1,000 statements is more complex than one of 10. Obviously, the real issue is that some statements are more complex than others.
For instance, decision statements are one of the first factors contributing to the complexity of a module. Another important factor is the "lifespan" or scope of data elements, i.e., the number of statements during which the state and value of a data item must be remembered by the programmer in order to understand what the module does.
Another aspect related to complexity is the span of control flow, i.e., the number of lexicographically contiguous statements that must be examined before finding a black-box section of code with one entry point and one exit point. It is interesting to note that the theory behind structured programming provides the means to reduce this span to a minimum by organizing the logic as combinations of the operations "sequence," "decision," and "iteration."
The complexity of procedures, as perceived by humans, is highly influenced by the size of the module.
Three factors, implicit in the previous approach, have been identified as affecting the complexity of statements:
• the amount of information that must be properly understood;
• the accessibility of the information;
• the structure of the information.
These factors determine the likelihood of human error in processing information of all kinds.
While the complexity of all types of program statements can be evaluated according to these criteria, we will focus on those that establish intermodule relationships.
Quantity: By amount of information we mean the number of bits of data, in the sense assigned by information theory, that the programmer must handle in order to understand the interface.
In the simplest terms, this relates to the number of arguments or parameters that are passed in the call to the module. For example, a call to a subroutine with 100 parameters will be more complex than one involving only three parameters.
When a programmer sees a reference to one module in the middle of another, he or she should know how the reference will be resolved and what kind of transformation will be carried out.
Consider the following example: a call to the procedure SQRT. If the call is SQRT(X), the programmer infers that X works as an input parameter and that the value is returned by the function itself, and it is very likely that this is so.
Now, if the call is SQRT(X, Y), the programmer infers that X works as an input parameter and that the result is returned in Y. Now suppose we have SQRT(X, Y, Z). The programmer may infer that X is an input parameter, that the result is returned in Y, and that Z is returned as an error code. This may be true, but the probability is high that the order of the parameters is different.
It can be seen that, as the number of parameters increases, the possibility of error grows. It can be argued that this is solved with proper documentation, but the reality is that in most cases programs are not well documented.
Also notice that a module with several parameters is probably performing more than one specific function, and it can be decomposed into simpler, more functional modules with fewer arguments.
Observe the following example: suppose we have a DIST function that calculates the distance between two points. The mathematical formula for this calculation is:
DIST = SQRT((y1 – y0)^2 + (x1 – x0)^2)
We consider the following interfaces:
Option 1. CALL DIST (X0, Y0, X1, Y1, DISTANCE)
Option 2. CALL DIST (ORIGIN, END, DISTANCE)
Option 3. CALL DIST (XCOORDS, YCOORDS, DISTANCE)
Option 4. CALL DIST (LINE, DISTANCE)
Option 5. CALL DIST (LINETABLE)
Option 6. CALL DIST ()
Let us try to determine which of the interfaces is the least complex.
At first glance, we might think Option 1 is the most complex, because it involves the largest number of parameters; however, Option 1 presents the parameters directly.
In contrast, Option 2 presents the information indirectly: in order to understand the interface, we must go to another part of the program and verify that ORIGIN is defined in terms of the subelements X0 and Y0, and END in terms of X1 and Y1.
Option 3, besides presenting the information indirectly, also presents it in a non-standard form, which complicates the interface.
Option 4 has the same disadvantage as Options 2 and 3, presenting values remotely. Option 5 is even more complex: the LINETABLE identifier is obscure.
Option 6, unlike the previous ones, does not present the parameters locally at all; they must be obtained remotely.
Unfortunately, some languages, such as COBOL, do not allow calls to modules with parameters within a single program. In addition, some people have an aversion to using parameterized calls, founded mainly on the following:
• Parameterizing an interface requires more work;
• The parameterization process itself can introduce errors;
• Program speed is generally lower than when global variables are used.
Structure: Finally, we note that the structure of the information can be a key factor in complexity.
The first observation is that information is less complex if presented linearly and more complex if presented in nested form.
The second observation is that information is less complex if presented in an affirmative or positive manner, and more complex if presented in negative form.
Both concepts have their primary application in writing program code. For example, constructions of nested IF statements are more complex to understand than a simple sequence of several IF statements.
Similarly, logical expressions involving the negation operator (NOT) are harder to understand than those without it.
These philosophies of linear and positive thinking are also important in intermodule references. Consider the following statement:
DISTANCE = SQRT(sum(square(diff(Y1, Y0)), square(diff(X1, X0))))
Ordinary programmers normally find this expression difficult to read. If instead we break down the expression into smaller parts, we get more linear elements and a reduction in nesting. The resulting sequence of expressions is easier to read:
• A = diff (Y1, Y0); B = diff (X1, X0).
• A2 = square (A); B2 = square (B).
• DISTANCE = SQRT (sum (A2, B2)).
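The same idea can be sketched in C. The helper names diff, square, and sum are the hypothetical ones used in the text; the nested one-liner and the flattened, easier-to-read sequence compute the same value.

#include <stdio.h>
#include <math.h>

double diff(double p, double q) { return p - q; }
double square(double v)         { return v * v; }
double sum(double p, double q)  { return p + q; }

int main(void) {
    double xA = 0.0, yA = 0.0, xB = 3.0, yB = 4.0;

    /* Nested form: harder to read */
    double d1 = sqrt(sum(square(diff(yB, yA)), square(diff(xB, xA))));

    /* Linear form: same result, less nesting */
    double a  = diff(yB, yA);
    double b  = diff(xB, xA);
    double a2 = square(a);
    double b2 = square(b);
    double d2 = sqrt(sum(a2, b2));

    printf("%f %f\n", d1, d2);   /* both print 5.000000 */
    return 0;
}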
Many aspects of modularization can be understood only if modules are discussed in relation to one another. First, consider the concept of independence. We say that two modules are completely independent if each can work entirely without the presence of the other. This implies that there are no interconnections between the modules, and that the value on the scale of "dependency" is zero.
In general, the greater the number of interconnections between two modules, the less independence they have.
The concept of functional independence is a direct derivation of modularity and of the concepts of abstraction and information hiding.
The question here is: how much do we need to know about one module in order to understand another module? The more we must know about module B in order to understand module A, the less independent A is of B.
The sheer number of connections between modules is not a complete measure of functional independence. Functional independence is measured with two qualitative criteria: coupling and cohesion. We will study the first one first.
Highly "coupled" modules are joined by strong interconnections, loosely coupled modules have few, weak interconnections, while "uncoupled" modules have no interconnections between them and are independent.

Coupling is an abstract concept that indicates the degree of interdependence between modules.

In practice, we can materialize it as the probability that, while coding, debugging, or modifying one module, the programmer will need some knowledge about another module. If two modules are strongly coupled, there is a high probability that the developer will need to know about one of them in order to attempt modifications to the other.

Clearly, the overall cost of the system will be strongly influenced by the
degree of coupling between modules.

7.5.1. Factors Influencing the Coupling


The four main factors influencing the coupling between modules are:
• Type of connection between modules: systems with normal connections typically have lower coupling than those with pathological connections.
• Complexity of the interface: this is approximately equal to the number of different items passed (not the amount of data). The more items, the higher the coupling.
• Type of information flow in the connection: systems with data coupling have lower coupling than systems with control coupling, which in turn have lower coupling than those with hybrid coupling.
• Binding time of the connection: connections bound to fixed references at execution time produce lower coupling than those bound at load time; binding at load time produces lower coupling than binding at link-edit time, which in turn produces lower coupling than binding at compile time, and all of these produce lower coupling than binding at coding time.

7.5.2. Types of Connections between Modules


A connection in a program is a reference to some element: a name, an address, or another identifier.
An intermodule connection occurs when the referenced element belongs to a module different from the referencing one.
The referenced element defines an interface, a boundary of the module, through which data and control flow.
The interface can be regarded as residing in the referenced element; it may be thought of as a plug (socket) into which the connection from the referencing element is inserted.
Every interface in a module represents something that must be known, understood, and appropriately connected by the other modules of the system.
We seek to minimize the complexity of the system, in part, by minimizing the number and complexity of interfaces per module. Every module must have at least one interface in order to be defined and linked to the rest of the system. But is a single identity interface sufficient for systems that work properly? The question here is: what purposes do interfaces serve?
Only control and data can flow between the modules of a programming system. An interface can serve the following four distinct functions:
• transmit data to a module as input parameters;
• receive data from a module as output results;
• be a name by which control is received;
• be a name by which control is transferred.
A module can be activated through a single, simple identity interface. We can also pass data to a module without adding other interfaces, by making the identity interface capable of accepting data as well as control. This requires that data items be passed dynamically as arguments (parameters) as part of the activation sequence which gives control to the module; any static data reference may introduce new interfaces.
It is also required that the identity interface of the module serve to transfer control back to the calling module. This can be accomplished by making the transfer of control from the caller a conditional transfer. In addition, a mechanism must be implemented to transmit data back from the called module to the caller. A special value may be associated with the activation of the module, which can be used contextually in the caller; such is the case of logical functions. Alternatively, parameters can be transmitted that define the locations where the called module returns values to the caller.
If all the connections of a system are restricted to being completely parameterized (with respect to their inputs and outputs), and the conditional transfer of control to each module is performed through a single, unique identity, we say that the system is minimally connected.
We say that a system is normally connected when it meets the conditions of a minimally connected system, except for some of the following considerations:
• There is more than one entry point to the same module.
• The activating (calling) module can specify, as part of the activation process, a return point other than the next statement in execution order.
• Control is transferred to an entry point of a module by some mechanism other than an explicit call (e.g., PERFORM THRU in COBOL).

Using multiple entry points guarantees that the system has more than the minimum number of interconnections. Even so, if each entry point determines a function with minimal connection to other modules, the system behaves similarly to a minimally connected one.
However, the presence of multiple entry points to the same module may be an indication that the module is performing more than one specific function. It is also a tempting opportunity for the programmer to partially overlap the code of the actions included within the same module, coupling those functions by content.
Similarly, alternate return points are often useful within the spirit of normally connected systems. This is the case when a module is to continue running at a point that depends on the value resulting from a decision made by a previously invoked subordinate module. With a minimal connection, the subordinate module returns the value as a parameter, which must then be tested again in the upper module. However, the upper module may instead indicate directly, by some means, the point at which program execution should continue (a relative +/– displacement from the calling instruction, or a parameter with an explicit address).
We will say that a connection involves control coupling if the upper module communicates to the subordinate module information that governs its execution. This information can be passed as data used as signals or "flags," or as memory addresses for conditional jump instructions (branch addresses). These control elements are "disguised" as data.
Data coupling is minimal, and no system can function without it. Data communication is necessary for system operation; control communication, however, is an undesirable and dispensable characteristic, yet one that occurs very frequently in programs. It can be minimized if only data is transmitted through the system interfaces.
Control coupling includes all forms of connection that communicate control elements. It not only involves the transfer of control (addresses or flags), but may also involve passing data that change, regulate, or synchronize the execution of another module.
This form of indirect or secondary control coupling is known as coordination. Coordination involves one module in the procedural context of another. This can be understood with the following example: suppose module A calls module B and supplies it with discrete data elements. The function of module B is to group the data items into a compound item for module A (the upper one). Module B will send module A signals or flags indicating that it needs another elementary item, or indicating that it is returning the compound item. These flags are used within module A to coordinate with B and supply what its operation requires.
When a module modifies the procedural content of another module, we say that there is hybrid coupling. Hybrid coupling is an intermodule modification of statements. In this case, for the destination (modified) module the coupling is seen as control, while for the caller (modifying) module it is considered data.
The degree of interdependence between two modules connected with hybrid coupling is very strong. Fortunately, it is a practice in decline, reserved almost exclusively for assembler programmers.

7.5.3. Binding Time of Intermodule Connections


"Binding" is a term commonly used in the field of data processing to refer to the process of fixing or resolving the values of identifiers within a system.
The binding of variable values or, more generically, the resolution of references to specific identifiers, can take place at different stages or periods in the evolution of a system. The time history of a system can be thought of as a line extending from the moment the source code is written until the moment of execution. This line can be subdivided, at different levels of refinement, for the various combinations of computer/language/compiler/operating system.
Thus, binding can take place:
• when the programmer writes a statement in the source code editor;
• when a module is compiled or assembled;
• when the object code (compiled or assembled) is processed by the "link-editor" or "link-loader" (this process is usually known as linking in most systems);
• when the "memory-image" code is loaded into main memory and finally executed.
The importance of binding time is that the later the values of the variables within a piece of code are fixed, the more easily modifiable the system is and the better it adapts to changing requirements.

As an example, consider how the page length used by the printing routines can be bound (a brief C sketch follows this list). Alternatives:
1. We write the literal "72" in all the printing routines of all programs (bound at writing time).
2. We replace the literal with the manifest constant LONG_PAG, to which we assign the value "72" in every program (bound at compile time).
3. We put the LONG_PAG constant in an external file included by the programs (bound at compile time).
4. Our language does not allow the declaration of constants, so we define a global variable LONG_PAG to which we assign the initialization value "72" (bound at link-edit time).
5. We define a system parameter file with a LONG_PAG field to which the value "72" is assigned. This value is read along with other parameters when the system starts (bound at runtime).
6. We define one record per terminal in the system parameter file and customize the value of the LONG_PAG field according to the printer attached to each terminal. Thus, terminals with 12-inch printers print 72 lines per page, and a terminal with an inkjet printer using legal-size paper prints 80 (bound at runtime).
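A minimal C sketch of two of these alternatives, assuming a hypothetical parameter file params.txt: the manifest constant binds LONG_PAG at compile time (alternatives 2 and 3), while reading the value from the parameter file binds it at runtime (alternative 5), so it can change without recompiling.

#include <stdio.h>

#define LONG_PAG 72          /* alternatives 2-3: bound at compile time */

int main(void) {
    int long_pag = LONG_PAG;

    /* Alternative 5: bound at runtime, read from a parameter file. */
    FILE *f = fopen("params.txt", "r");   /* hypothetical parameter file */
    if (f != NULL) {
        if (fscanf(f, "%d", &long_pag) != 1)
            long_pag = LONG_PAG;          /* fall back to the compiled value */
        fclose(f);
    }

    printf("Lines per page: %d\n", long_pag);
    return 0;
}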
We now consider the relationship between binding time and intermodule connections, as it affects the degree of coupling between modules.
Again, an intermodule reference fixed at the time the referenced object is written or defined will produce stronger coupling than a reference fixed at translation time or even later.
The possibility of compiling a module separately from the rest of the system facilitates maintenance and modification, compared with having to compile all the modules together. Similarly, if the link-editing of a module is deferred until just before execution, the implementation of changes will be simplified.
There is a particular form of coupling derived from the lexicographical structure of the program; in this case we speak of content coupling. Two forms of content coupling can be distinguished:
• Lexicographical inclusion: occurs when a module is lexicographically included in another; it is the lesser form of content coupling. Generally, the modules cannot be run separately. This is the case in which the subordinate module is activated inline within the context of the upper module.
• Partial overlap: this is an extreme case of content coupling. Part of the code of one module intersects that of the other. Fortunately, most modern high-level languages do not allow such structures.
In terms of usage, maintenance, and modification, the consequences of content coupling are worse than those of control coupling. Content coupling means that the modules cannot function without each other. This is not so with control coupling, in which a module, although it receives controlling information, can still be invoked from different points in the system.

7.5.4. Common Environment Coupling (Common-Environment Coupling)
Whenever two or more modules interact with a common data environment, we say that these modules are in common environment coupling. Examples of a common environment are global data areas, such as the DATA DIVISION in COBOL, or a shared disk file.
Common environment coupling is a second-order form of coupling, different from those discussed above. The severity of the coupling depends on the number of modules that simultaneously access the common environment. In the extreme case of only two modules, where one uses as input the data generated by the other, we speak of input-output coupling.
The point is that common environment coupling is not necessarily bad and to be avoided at all costs; on the contrary, there are certain circumstances in which it is a valid option.

7.5.5. Decoupling
Decoupling is any systematic method or technique for making program modules more independent.
Each type of coupling generally suggests a method of decoupling. For example, coupling caused by binding can be removed by parameterizing appropriately, as seen in the page-length example of the printing programs.
Decoupling from the functional point of view can rarely be done except in the early design phase.
As a general rule, the most effective approach is a design discipline that favors input-output coupling and control coupling over content coupling and hybrid coupling, and that seeks to limit the scope of common environment coupling.

Other techniques to reduce coupling are:
• Convert implicit references into explicit references. What can be seen more easily is easier to understand.
• Standardize connections.
• Use "buffers" for the elements communicated in a connection. If a module can be designed from the start assuming that a buffer mediates each communication stream, then issues of timing, speed, frequency, etc. within one module will not affect the design of the others.
• Use localization to reduce common environment coupling. This consists of dividing the common area into regions, so that modules have access only to strictly the data that concern them.
Normal data coupling: every connection is made explicitly through the minimum number of parameters, the data being of primitive or elemental type.
Normal stamp coupling (by data structure): occurs when the pieces of data exchanged are composites, such as structures or arrays. When assessing the level of coupling of a module, each composite datum is counted as one, not as the number of elements of the structure.
Normal packaging coupling: unrelated elements are grouped into a structure for the sole purpose of reducing the number of parameters. When assessing the level of coupling, each element of the composite datum is counted as one.
Normal control coupling: a module exchanges information with another module with the intention of altering its internal logic. The information expressly indicates the action to be performed by the other module.
Hybrid coupling: the modules exchange information that is interpreted differently by the calling module and the called module. For the called module the parameter is seen as control, while for the caller it is seen as data.
Common coupling: two modules exchange information through global variables. Since a global variable is not protected by any module, any piece of code can change its value, making the behavior of the modules unpredictable (see the sketch after this list).
Pathological coupling: two modules have the potential to affect each other's data through programming errors. This may occur, for example, when a module writes over the data of another module through misuse of pointers or explicit memory addressing to modify variables.
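A minimal C sketch contrasting normal data coupling with common coupling, as mentioned in the list above; the function and variable names are hypothetical.

#include <stdio.h>

/* Common coupling: both modules depend on a shared global variable. */
int page_length = 72;                 /* global, unprotected */

void print_header_common(void) {
    printf("page of %d lines\n", page_length);   /* reads the global */
}

/* Normal data coupling: the value travels explicitly as a parameter. */
void print_header_data(int length) {
    printf("page of %d lines\n", length);
}

int main(void) {
    page_length = 80;                 /* any code can silently change the global */
    print_header_common();
    print_header_data(80);            /* the dependency is visible at the call */
    return 0;
}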

7.6. COHESION

7.6.1. Functional Relationship


We have seen that the determination of the modules in a system is not arbitrary. The manner in which the system is physically divided into parts (particularly with respect to the structure of the problem) can significantly affect the structural complexity of the resulting system, as well as the overall number of intermodule references.
Adapting the system design to the structure of the problem (or the structure of the application or problem domain) is a very important design philosophy. We often find that processing elements of the problem domain that are highly related translate into highly interconnected code. Structures that group highly interrelated problem elements tend to be modularly effective.
Imagine that we have a measure of the degree of functional relationship between pairs of modules. In such terms, the most effective modular system is the one in which the sum of the functional relationships between pairs of elements belonging to different modules is minimal. Among other things, this tends to minimize the number of required intermodule connections and the intermodule coupling.
This intramodule functional relationship is known as cohesion. Cohesion is the qualitative measure of how closely related the internal elements of a module are.
Other terms often used are "modular strength," "binding," and "functionality."
In practice, a single isolated processing element may be functionally associated to varying degrees with other such elements. As a result, different designers, with different "visions" or interpretations of the same problem, can arrive at different modular structures with different levels of cohesion and coupling. The drawback is that it is often difficult to assess the degree of functional relationship of one element relative to others.
Modular cohesion can be seen as the cement that amalgamates the processing elements within a module. It is the most crucial factor in structured design and the most important attribute of an effective modular design. This concept is the main technique a designer has to keep the design semantically close to the real issue, or problem domain.
Clearly, the concepts of cohesion and coupling are closely related: greater cohesion implies lower coupling. Maximizing the level of intramodular cohesion across the system results in a minimization of intermodule coupling.
Mathematically, the calculation of the intramodular functional relationship (cohesion) involves fewer pairs of elements to which the measure must be applied than the calculation of the intermodule functional relationship (coupling).
Both are excellent measures of effective modular design, but of the two the more important and broadly useful one is cohesion.
An important question is how to recognize the functional relationship.
The cohesion principle can be applied by introducing the notion of an associative principle.
When deciding to put certain processing elements in one module, the designer uses the principle that the elements are related by some characteristic or property they possess. That is, the designer will put object Z in the same module as X and Y because X, Y, and Z share the same property. Thus, the associative principle is relational, and it is usually verifiable in such terms (e.g., "it is correct to put Z with X and Y because it has the same property they have") or in terms of membership of a group (e.g., "Z is right to be next to X and Y because they all belong to the same set").
For example, the individual statements found in module B, which is invoked from module A, do not count toward the cohesion of module A. However, the overall processing (the function) obtained by the call to module B clearly is a processing element of the calling module, and it therefore participates in the cohesion of module A.

7.6.2. Cohesion Levels


Different associative principles were developed over the years through experimentation, theoretical arguments, and the practical experience of many designers. There are seven distinct levels of cohesion, corresponding to seven associative principles. These are listed below in order of increasing degree of cohesion, from lower to higher functional relationship:

• Casual cohesion (the worst);
• Logical cohesion (next worst);
• Temporal cohesion (moderate to poor);
• Procedural cohesion (moderate);
• Communication cohesion (moderate to good);
• Sequential cohesion;
• Functional cohesion (the best).
We can visualize the degree of cohesion as a spectrum ranging from a
maximum to a minimum.

7.6.2.1. Casual Cohesion (The Worst)


Casual cohesion occurs when there is little or no relationship between the elements of a module.
Casual cohesion establishes the zero point on the scale of cohesion.
It is very difficult to find purely casual modules. They may appear as a result of modularizing an already written program, in which the programmer finds a particular sequence of instructions repeated here and there and therefore decides to group it into a routine.
Another factor that often produced casually cohesive modules was the bad practice of "structured programming" by programmers who poorly understood it as merely replacing GOTO statements with subroutine calls.
Finally, although in practice it is difficult to find modules that are casually cohesive in their entirety, it is common to have casually cohesive elements. Such is the case when initialization and termination operations are put together in an upper module.
We should note that, while casual cohesion is not necessarily harmful (indeed, a casually cohesive program is preferable to a purely linear one), it obstructs modification and maintenance of the code.

7.6.2.2. Logical Cohesion (Next Worst)


The elements of a module are logically associated if they can be thought of as belonging to the same logical class of functions, namely those that logically go together. For example, all processing elements that fall into the class of "inputs," which covers all input operations, can be combined into a single module.
We could have a module that reads control parameters from the console, erroneous transaction records from a file on tape, valid transaction records from another file on tape, and the previous master records from a file on disk. This module, which could be called "readings" and which groups all input operations, is logically cohesive.
Logical cohesion is stronger than casual cohesion, because it represents a minimum of association between the problem and the elements of the module. However, we can see that a logically cohesive module does not perform one specific function, but rather encompasses a number of functions.

7.6.2.3. Temporal Cohesion (Moderate to Poor)


Temporal cohesion means that all the processing elements of a collection occur in the same period of time during the execution of the system. Because such processing must (or can) be performed in the same time period, the temporally associated elements may be combined into a single module that runs them at that time.
There is a relationship between temporal and logical cohesion; however, logical cohesion does not imply any time relation between the processing elements. Temporal cohesion is stronger than logical cohesion, since it implies one more level of association: the time factor. However, temporal cohesion is still a poor level of cohesion and carries drawbacks for maintenance and system modification.
As in the previous cases, for a module to be said to have only procedural cohesion, the processing elements must be elements of an iteration, decision, or sequence, but must not be linked by any associative principle of higher order. Procedural cohesion associates processing elements on the basis of their algorithmic or procedural relationships.
This level of cohesion commonly results from deriving a modular structure from process models such as flowcharts or Nassi-Shneiderman diagrams.

7.6.2.4. Communication Cohesion (Moderate to Good)


None of the cohesion levels discussed previously is particularly strongly linked to the structure of the problem. Communication cohesion is the lowest level of cohesion at which we find a relationship between the processing elements that is intrinsically dependent on the problem. Saying that a set of processing elements is linked by communication means that all the elements operate on the same set of input or output data.

In such a diagram, we can see that processing elements 1, 2, and 3 are associated by communication on the incoming input data, while 2, 3, and 4 are linked by the output data.
The data flow diagram (DFD) is an objective means of determining whether the elements of a module are associated by communication.
Communication relationships represent an acceptable degree of cohesion. Communication cohesion is common in commercial applications. Typical examples include:
• A module that prints or records a transaction file.
• A module that receives data from different sources and transforms and assembles them into a print line.

7.6.2.5. Sequential Cohesion


The next level of cohesion on the scale is sequential association. Here, the output data (results) of one processing element serve as input to the next processing element.
In terms of a DFD of a problem, sequential cohesion combines a linear chain of successive data transformations.
Clearly, this is an associative principle related to the problem domain.

7.6.2.6. Functional Cohesion (Best)


At the upper limit of the spectrum of functional relationship is functional cohesion. In a fully functional module, each processing element is integral and essential to the realization of a single function.
In practical terms, we can say that functional cohesion is that which is not sequential, communication, procedural, temporal, logical, or casual.
The clearest and most understandable examples come from the field of mathematics. A module for calculating a square root is certainly highly cohesive, and probably fully functional: it is unlikely to contain anything superfluous beyond what is absolutely essential for the mathematical function, and it is unlikely that processing elements can be added without somehow altering the calculation.
In contrast, a module that calculates both the square root and the cosine is unlikely to be fully functional (two unrelated functions must be performed).

7.6.3. Criteria for Establishing the Degree of Cohesion


A useful technique for determining whether a module is functionally bound is to write a sentence describing the function (purpose) of the module and then examine the sentence. You can apply the following tests:
1. If the sentence turns out to be a compound statement, contains a comma, or contains more than one verb, the module probably performs more than one function; therefore, it probably has sequential or communication binding.
2. If the sentence contains words relating to time, such as "first," "then," "next," "when," "at the beginning," etc., then the module probably has temporal or sequential binding.
3. If the predicate of the sentence does not contain a single specific object following the verb, the module is probably bound logically. For example, "edit all data" indicates logical binding; "edit source statement" may indicate functional binding.
4. Words such as "initialize," "clean up," etc. imply temporal binding.
A functionally bound module can always be described in terms of its elements using a compound sentence. But if the language above cannot be avoided while still giving a full description of the module's function, then the module is probably not functionally bound.
It is important to note that it is not necessary to determine the exact level of cohesion. Instead, the important thing is to try to achieve high cohesion and to recognize low cohesion, so that the software design can be modified to obtain greater functional independence.

7.6.4. Cohesion Measurement


A module rarely satisfies a single associative principle; its elements may be related by a mixture of the seven levels of cohesion. This leads to a continuous scale for the degree of cohesion rather than a discrete scale with seven points.
Where more than one relationship holds between a pair of processing elements, the maximum level achieved applies. Therefore, if a module has logical cohesion between all pairs of processing elements and in turn also presents communication cohesion among all those pairs, then the module is considered to have communication cohesion.
Now, what is the cohesion of a module if it contains a pair of completely unrelated elements? In theory, it should be some sort of average between communication and casual cohesion. For the purposes of debugging, maintenance, and modification, however, a module behaves as if it were "only as strong as its weakest link."
The effect on programming costs is closer to that of the lowest level of cohesion applicable within the module than to that of the highest level.
The cohesion of a module is approximately the highest level of cohesion that is applicable to all processing elements within the module.
A module may consist of several complete, logically related functions. This is definitely more cohesive than a module logically linking fragments of various functions.
Deciding which level of cohesion is applicable to a module requires some human judgment. Some criteria are:
• Sequential cohesion is closer to the functional optimum than its predecessor, communication cohesion.
• Similarly, there is a major leap between casual and logical cohesion, and between logical and temporal cohesion.
We can use the following scale of values to assist the designer in rating the levels:
• 0: casual;
• 1: logical;
• 3: temporal;
• 5: procedural;
• 7: communication;
• 9: sequential;
• 10: functional.
In any case, this is not a fixed rule but a guideline.
The designer's obligation is to understand the effects of variations in cohesion, especially in terms of modularity, in order to make trade-offs that favor one aspect over another.
Chapter 8

Recursive Programming

CONTENTS
8.1. Classification Of Recursive Functions ............................................. 126
8.2. Design Recursive Functions ............................................................ 127
8.3. Bubble Method ............................................................................... 138
8.4. Sorting By Direct Selection ............................................................. 140
8.5. Method Binary Insertion ................................................................. 141
8.6. Method Quicksort (Quicksort)......................................................... 141
8.7. Mixing Method (Merge Sort) ........................................................... 143
8.8. Sequential Search ........................................................................... 146
8.9. Binary Search (Binary Search) ......................................................... 147
8.10. Seeking The Maximum And Minimum .......................................... 149
8.11. Greedy Method ............................................................................ 152
8.12. Optimal Storage On Tape (Optimal Storage On Tapes) .................. 152
8.13. The Knapsack Problem .................................................................. 153
8.14. Single Source Shortest Paths (Shortest Route From an Origin) ........ 156

The programming area is very large and full of details. In the C programming language, as in other programming languages, a technique called recursion can be applied. Memory allocation can be static or dynamic, and at any time a combination of the two can be employed.
Recursion is a technique in which a problem is solved by substituting it with a problem of the same form but simpler.
An example is the definition of the factorial for N >= 0: N! = N * (N – 1)! for N > 0, with 0! = 1.
Certain problems lend themselves naturally to recursive solutions.

8.1. CLASSIFICATION OF RECURSIVE FUNCTIONS
Recursive functions are classified according to how the recursive call is made:
• Direct recursion: the function calls itself.
• Indirect recursion: function A calls function B, and function B calls A.
Depending on the number of recursive calls generated at runtime:
• Linear or simple recursion: a single internal call is generated.
• Nonlinear or multiple recursion: two or more internal calls are generated.
According to the point where the recursive call is made, a recursive function can be:
• Tail recursive: the recursive call is the last instruction executed within the function.
• Non-tail recursive: some operation is performed after returning from the recursive call.
Tail-recursive functions are usually more efficient (by a multiplicative constant in time, and especially in terms of memory space) than non-tail-recursive ones. (Some compilers can optimize these functions automatically by converting them to iterative form.)
An example of tail recursion is the Euclidean algorithm for calculating the greatest common divisor of two positive integers, sketched below.
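A minimal C sketch of that algorithm; the recursive call is the last action performed, which is what makes it tail recursion.

/* Greatest common divisor of two positive integers (Euclid's algorithm). */
int gcd(int m, int n) {
    if (n == 0)
        return m;            /* trivial case: solved directly */
    return gcd(n, m % n);    /* tail call: nothing happens after it returns */
}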

8.2. DESIGN RECURSIVE FUNCTIONS


A problem can be solved recursively when:
• The original problem can be transformed into a similar but simpler problem.
• We have some direct way of solving the "trivial cases."
For a recursive module to be correct, the following must be verified:
• A case analysis of the problem: there is at least one termination condition in which a recursive call is NOT necessary; these trivial cases are solved directly. If n = 0 or n = 1, the factorial is 1.
• Convergence of the recursive calls: each recursive call is made with smaller data, so that the termination condition is eventually reached. Factorial(n) = n * factorial(n – 1).
• If the recursive calls work correctly, the entire module works correctly (induction principle). Factorial(0) = 1 and factorial(1) = 1; for n > 1, assuming the calculation of factorial(n – 1) is correct, factorial(n) = n * factorial(n – 1).
Graphically, the recursive calls for factorial(n) form a chain that descends to the base case and then multiplies the results on the way back.
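A direct C sketch of this definition, with the trivial cases solved directly and the recursive call made on smaller data:

/* Factorial of a non-negative integer n. */
long factorial(int n) {
    if (n == 0 || n == 1)
        return 1;                      /* termination condition */
    return n * factorial(n - 1);       /* convergence: n decreases at each call */
}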

An important requirement for a recursive algorithm to be correct is that it does not generate an infinite sequence of calls to itself.

8.2.1. Advantages and Disadvantages of Recursion


A recursive function should express the problem in a natural, simple, comprehensible, and elegant way. For example, given a non-negative integer, write its binary encoding.

#include <stdio.h>

void binary(int n) {
    if (n < 2)
        printf("%d", n);
    else {
        binary(n / 2);         /* emit the higher-order bits first */
        printf("%d", n % 2);   /* then the bit for this level */
    }
}
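As a usage note for the sketch above: a call such as binary(13) prints 1101, since the recursive calls emit the higher-order bits before the remainder of the current division by 2.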

Another advantage is the ease of checking and verifying that the solution is correct (by mathematical induction).
Generally, however, recursive solutions are less efficient in time and space than their iterative versions, due to the subroutine calls, the creation of dynamic variables on the recursion stack, and the duplication of variables. Another disadvantage is that some recursive calculations repeat solutions unnecessarily. For example, the calculation of the nth term of the Fibonacci sequence:

int fib(int n) {
    if (n < 2)
        return 1;
    return fib(n - 1) + fib(n - 2);   /* the same subproblems are recomputed many times */
}

In general, any recursive function can be transformed into an iterative function.
• Advantage of the iterative function: more efficient in time and space.
• Disadvantage of the iterative function: in some cases it is very complicated, and auxiliary data structures are often required.
If efficiency is a critical parameter and the function is to be executed often, an iterative solution should be written. Recursion can be simulated using stacks to transform a recursive program into an iterative one. The stacks are used to store the values of the parameters, the values of the local variables, and the results of the function.
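As a minimal sketch of such a transformation, the exponential recursive fib above can be replaced by an iterative version that only keeps the last two values; in this particular case no auxiliary stack is even needed.

/* Iterative Fibonacci: same results as the recursive fib, in linear time. */
int fib_iter(int n) {
    int prev = 1, curr = 1;            /* fib(0) = fib(1) = 1, as in the text */
    for (int i = 2; i <= n; i++) {
        int next = prev + curr;
        prev = curr;
        curr = next;
    }
    return curr;
}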
Exercises:
1. Draw the recursion tree for Fibonacci(5).
2. Draw the recursion tree for Hanoi(4, o, d, a), where Hanoi(N, Origin, Destination, Auxiliary) is defined as follows (a C sketch is given after this exercise list):
   If N = 1
       Print "Move disk from" Origin "to" Destination
   Else
       Hanoi(N – 1, Origin, Auxiliary, Destination)
       Print "Move disk from" Origin "to" Destination
       Hanoi(N – 1, Auxiliary, Destination, Origin)
3. The Ackermann function is defined as:
A (M, N) = N + 1, if M = 0
A (M, N) = A (M – 1, 1), if N = 0
A (M, N) = A (M – 1, A (M, N – 1)), otherwise.
Draw the recursion tree for A (2, 2).
4. Euclid’s algorithm for calculating the greatest common divisor is
defined as:
GCD (M, N) = M, if N = 0
GCD (M, N) = GCD (N, M MOD N), if N > 0.
Draw the recursion trees for GCD (15, 4) and GCD (15, 3).
5. Write a program that prints a word backwards (regardless of its
length) without using arrays or dynamic memory.
Two ideas changed the world. In 1448, in the city of Mainz, a goldsmith
named Johann Gutenberg discovered the way to print books by putting
together movable metal type. With this, the Dark Ages began to dissipate,
the human intellect was freed, science and technology triumphed, and the
seed that led to the Industrial Revolution began to ferment. Several historians
suggest that we owe all of this to typography. Imagine a world in which only
an elite could read these lines. Others insist, however, that the key development
was not typography; it was algorithmics (Figure 8.1).
Figure 8.1: Johann Gutenberg (1398–1468).


Nowadays we are used to writing numbers in decimal notation, and it is
easy to forget that Gutenberg would have written 1448 in Roman notation as
MCDXLVIII. How can you add two Roman numerals? The most Gutenberg
could do was add small numbers on his fingers; for anything more
complicated he would have to consult an abacus.
The decimal system, invented in India around 600 A.D., produced a
revolution in quantitative reasoning: using only 10 symbols, even very large
numbers could be written down without complications, and any arithmetic
operation could be carried out on them without complications. Nevertheless,
these ideas took a long time to spread, hindered by ignorance, tradition,
distance, and language barriers. The most influential means of transmitting
this knowledge was a book written by a ninth-century Arab who lived in
Baghdad: Al Khwarizmi. He gave the basic methods for addition, subtraction,
multiplication, division, extracting square roots, and even calculating digits
of π. The procedures were precise, unambiguous, mechanical, efficient, and
correct: they were algorithms, a term coined in honor of this wise man. The
decimal system was adopted in Europe several centuries later.
Once decimal notation arrived in Europe, technology, science, commerce,
industry, and later computation could develop fully. Scientists around the
world have since developed increasingly complex algorithms for all kinds of
problems and investigated novel applications that have changed the world
radically (Figure 8.2).
Figure 8.2: Al Khwarizmi (lived between 780 and 850 A.D.).


Al Khwarizmi’s work became established in Europe thanks to the efforts of
a thirteenth-century Italian mathematician known as Leonardo Fibonacci,
who recognized the potential of the positional system and worked hard for
its development and future spread.
But today, Fibonacci is best known for the famous sequence of numbers:
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, …
Formally speaking, the Fibonacci numbers Fn are generated by a single
rule (Figure 8.3): Fn = Fn–1 + Fn–2 for n > 1, with F1 = 1 and F0 = 0.

Figure 8.3: Leonardo of Pisa (1170–1250).


No other sequence of numbers has been studied so extensively, or applied
to more fields of knowledge: biology, demography, art, architecture, and
music, to name only a few. Together with the powers of two, it is one of
computer science’s favorite sequences.
In fact, the Fibonacci sequence grows almost as fast as the powers of two: for
example, F30 exceeds one million, and F100 is already about 21 digits long.
In general, Fn ≈ 2^(0.694 n).
But what is the exact value of F100, or of F200? Fibonacci never knew
the answer. To find out, we need an algorithm to compute the nth Fibonacci
number.
One idea is to use the recursive definition of Fn directly. The pseudocode is
shown below:

function fib(n) {
    if n = 0 return 0
    if n = 1 return 1
    return fib(n – 1) + fib(n – 2)
}

Every time you have an algorithm, there are questions to answer:


1. Is it correct?
2. How long does it take to produce the result, as a function of n?
3. Can it be improved?
The first question needs little discussion here, since the algorithm is precisely
the definition of Fn. But the second question demands an answer. Let T (n)
be the number of computational steps required to compute fib (n); what can
we say about this function? If n < 2, the procedure ends almost immediately,
after just a couple of steps. Thus: T (n) ≤ 2 for n ≤ 1.
For larger n there are recursive invocations of fib (), as follows:
T (0) = 1, one check of the first conditional,
T (1) = 2, both conditionals checked,
T (2) = both conditionals checked, plus the two recursive invocations and the
final addition: T (2) = T (1) + T (0) + 3 = 2 + 1 + 3 = 6,
T (3) = T (2) + T (1) + 3 = 6 + 2 + 3 = 11,
T (4) = T (3) + T (2) + 3 = 11 + 6 + 3 = 20,
and in general:
T (n) = T (n – 1) + T (n – 2) + 3 for n > 1.

The execution time of the algorithm grows as fast as the Fibonacci numbers:
T (n) is exponential in n, which implies that the algorithm is impractical
except for very small values of n.
A demonstration of the complexity of the algorithm can be given as follows.
Consider a naive approach to calculating the complexity of the recursive
Fibonacci function. Let S (n) be the number of additions needed to compute
F (n). For the first values we have:
S (1) = 0 = S (2), S (3) = 1, S (4) = 2, S (5) = 4, S (6) = 7.
In general, the number of additions needed to compute F (n) satisfies
S (n) = S (n – 1) + S (n – 2) + 1, and by induction F (n – 2) < S (n) is obtained.


But how fast it grows Fibonacci function? We can make an analogy
F (2) = F (1) + F (0) X2 = X + 1
That is, the roots of X2-X–1 = 0
This being a characteristic equation and its roots is known as the golden
(Golden Ratio), and its exact value is c = (1+√5)/2.
{Cn} geometric progression satisfies the same equation in Fibonacci
recurrence function, this is cn = cn – 1 + cn–2 has an equivalence with F (n)
= F (n – 1) + F (n–2). Now, by induction as
c0 = 1 = F (2), c = c1 <2 = F (3), we obtain cn–2 <F (n)
In conclusion, the Fibonacci function at least grows exponentially. It
Inequalities linking the two, for all n we have:
Cn–4 <F (n–2) <S (n)
And the S (n) function increases at least exponentially.
For example, to calculate F(200), the fib () function needs T (200) ≥ F(200)
elementary steps. How long does that take? It depends on the computer used.
At this time, the fastest computer in the world is the NEC Earth Simulator,
which executes 400 billion steps per second. Even for this machine, fib (200)
would take at least 2^92 seconds. This means that, if we started the
computation today, it would still be running long after the sun becomes a
red giant star.
But technology improves: computing speeds have been roughly doubling
every 18 months, a phenomenon known as Moore’s law. With this
extraordinary growth, the fib function might well run much faster next year.
Let us check: the running time of fib (n) is roughly 2^(0.694 n) ≈ (1.6)^n, so
it takes about 1.6 times longer to compute F(n + 1) than F(n). Under Moore’s
law, computing power grows by about a factor of 1.6 each year. So if we can
just barely compute F(100) with today’s technology, next year we will be
able to compute F(101), the year after that F(102), and so on: only one more
Fibonacci number each year. This is the behavior of exponential time.

#include <stdio.h>

int main() {
    int i, n = 5, fibn_2, fibn_1, fibn;

    fibn_2 = fibn_1 = 1;
    for (i = 2; i <= n; i++) {
        fibn = fibn_2 + fibn_1;
        printf("fib(%d) = %d\n", i, fibn);
        fibn_2 = fibn_1;
        fibn_1 = fibn;
    }
    getchar();
    return 0;
}

How long does this algorithm take? The loop body has only a few steps
and is executed n – 1 times, so the algorithm is linear in n. From exponential
time we have moved to polynomial time: great progress in running time. It is
now reasonable to compute F(200), or even F(200000).
When analyzing an algorithm, you must first determine which operations are
employed and what they cost.
Some operations can be bounded by a constant time; for example, a
comparison between two characters can be done in fixed time.
Comparing two strings, in contrast, depends on the length of the strings
(although that time can also be bounded).
To determine the running time of an algorithm, both a priori and a posteriori
analyses can be used. In an a priori analysis, a function that bounds the
running time of the algorithm is obtained; in an a posteriori analysis,
statistics about the time and space consumed by the algorithm are collected
while it is running.
An a priori analysis ignores the type of machine and the language, and
focuses only on determining the order of the execution frequency:

f (n) = O (g (n)) if there exist two constants c and n0 such that
| f (n) | ≤ c | g (n) | for all n > n0.
Saying that an algorithm has computational time O (g (n)) indicates that, if
the algorithm is run on a computer with the same type of data but with a
larger n, the time taken will be less than some constant times | g (n) |.
If A (n) = am n^m + … + a1 n + a0, then A (n) = O (n^m),
where A (n) is the number of steps of the given algorithm.
If an algorithm consists of k steps whose orders of magnitude are c1 n^m1,
c2 n^m2, c3 n^m3, …, ck n^mk, then the order of the algorithm is O (n^m),
where m = max {mi}, 1 ≤ i ≤ k.
As an example to understand the order of magnitude, suppose there are two
algorithms for the same task, one requiring O (n^2) operations and the other
O (n log n). If n = 1024, the first algorithm requires 1,048,576 operations
and the second 10,240 operations. If the computer takes one microsecond to
perform each operation, the first algorithm requires approximately 1.05
seconds and the second about 0.0102 seconds for the same input.
The most common times are:
O (1) < O (log n) < O (n) < O (n log n) < O (n^2) < O (n^3) < O (2^n).
Note: the base of the logarithm is two (log_b N = log_a N / log_a b).
O (1): the number of basic operations is fixed, so the time is bounded by
a constant.
O (n), O (n^2), and O (n^3) are of polynomial type.
O (2^n) is exponential.
Algorithms with complexity greater than O (n log n) are sometimes impractical.
An exponential algorithm is practical only for small values of n.
Example (Table 8.1):
Table 8.1: Algorithmic Complexity

log(n)   n    n log(n)   n^2     n^3      2^n
0        1    0          1       1        2
1        2    2          4       8        4
2        4    8          16      64       16
3        8    24         64      512      256
4        16   64         256     4096     65536
5        32   160        1024    32768    4294967296

Another comparison table is as follows in which it is assumed that the


computer can do a million operations per second (Table 8.2):

Table 8.2: Time in Seconds It Takes to Perform f (n) Operations

f(n) \ n    10          20          30          40           50           70           100
n           0.00001 s   0.00002 s   0.00003 s   0.00004 s    0.00005 s    0.00007 s    0.0001 s
n log(n)    0.00003 s   0.00008 s   0.00014 s   0.00021 s    0.00028 s    0.00049 s    0.0006 s
n^2         0.0001 s    0.0004 s    0.0009 s    0.0016 s     0.0025 s     0.0049 s     0.01 s
n^3         0.001 s     0.008 s     0.027 s     0.064 s      0.125 s      0.343 s      1 s
n^4         0.01 s      0.16 s      0.81 s      2.56 s       6.25 s       24 s         1.6 min
n^5         0.1 s       3.19 s      24.3 s      1.7 min      5.2 min      28 min       2.7 hours
n^log(n)    0.002 s     0.41 s      17.7 s      5 min 35 s   1 h 4 min    2.3 days     224 days
2^n         0.001 s     1.04 s      17 min      12 days      35.6 yr      37 My        2.6 MAU
3^n         0.059 s     58 min      6.5 yr      385,495 yr   22,735 My    —            —
n!          3.6 s       77,054 yr   564 MAU     —            —            —            —
n^n         2.7 hours   219 AU      —           —            —            —            —

(yr = years; My = millions of years; AU = ages of the universe; MAU = millions of ages of the universe.)

In Figure 8.4 you can graphically observe the growth of the main
time-complexity functions.

Figure 8.4: The growth of the main time-complexity functions.


The notation O (big O) is used to indicate an upper bound. A lower bound
can also be established:
f (n) = Ω (g (n)) iff there exist constants c and n0 such that
| f (n) | ≥ c | g (n) | for all n > n0.
If f (n) = Ω (g (n)) and f (n) = O (g (n)), then:
f (n) = Θ (g (n)) iff there exist positive constants c1, c2, and n0 such that for
all n > n0, c1 | g (n) | ≤ | f (n) | ≤ c2 | g (n) |.
This indicates that the worst and best cases take the same amount of time.
Example: an algorithm that searches for the maximum of n unordered
elements always performs n – 1 iterations, so it is Θ (n).
Searching for a value in an array is O (n) and Ω (1).
Exercises
1. Given the Towers of Hanoi algorithm shown in the section on
recursion, determine its order, assuming there are 63 rings.
2. Investigate the complexity of the Ackermann algorithm and
compare it with that of the Towers of Hanoi algorithm.

8.3. BUBBLE METHOD


It is the simplest sorting algorithm. The list is repeatedly traversed, comparing
adjacent pairs of elements. If an item is greater than the one in the next
position, they are exchanged.
The algorithm in C is:
for (i = 1; i < TAM; i++)
    for (j = 0; j < TAM - 1; j++)
        if (lista[j] > lista[j + 1]) {
            temp = lista[j];
            lista[j] = lista[j + 1];
            lista[j + 1] = temp;
        }
where lista is the given list, TAM is a constant that determines the size of
the list, and i and j are counters.
Let’s see an example. This is our list: 4-3-5-2-1. We have 5 elements, so
TAM takes the value 5. We begin by comparing the first element with the second.
Since 4 is greater than 3, they are exchanged. Now we have:
3-4-5-2-1
Now we compare the second with the third: 4 is less than 5, so we do nothing.
We continue with the third and fourth: 5 is greater than 2. They are exchanged
and we get:
3-4-2-5-1
Comparing the fourth and fifth: 5 is greater than 1. They are exchanged again:
3-4-2-1-5.
Repeating this process, the following results are obtained:
3-2-1-4-5
2-1-3-4-5
1-2-3-4-5
This is the analysis for the non-optimized version of the algorithm:
• Stability: this algorithm never exchanges records with equal keys;
therefore, it is stable.
• Memory requirements: this algorithm requires only one additional
variable for the exchanges.
• Execution time: the inner loop is executed n times for a list of
n elements, and the outer loop is also executed n times; that is, the
complexity is O(n^2). The average-case behavior depends on the
order of the input data, but it is only slightly better than the worst
case and remains O(n^2).
Advantages:
• Easy implementation;
• It does not require additional memory.
Disadvantages:
• Very slow;
• Makes numerous comparisons;
• Performs numerous exchanges.
Calculation of complexity:
for (i = 1; i < n; i++)                 C1   n
    for (j = 0; j < n - 1; j++)         C2   n^2
        if (lista[j] > lista[j + 1])    C3   n^2
            temp = lista[j];            C4   n^2
            lista[j] = lista[j + 1];    C5   n^2
            lista[j + 1] = temp;        C6   n^2
So that T(N) = N + 5N^2, and the complexity is therefore O(n^2).
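A complete, runnable version of the fragment above might look as follows (the sample data and the value of TAM are illustrative assumptions of this sketch):

#include <stdio.h>
#define TAM 5

int main() {
    int lista[TAM] = {4, 3, 5, 2, 1};
    int i, j, temp;

    /* Bubble sort: adjacent elements out of order are exchanged. */
    for (i = 1; i < TAM; i++)
        for (j = 0; j < TAM - 1; j++)
            if (lista[j] > lista[j + 1]) {
                temp = lista[j];
                lista[j] = lista[j + 1];
                lista[j + 1] = temp;
            }

    for (i = 0; i < TAM; i++)
        printf("%d ", lista[i]);
    printf("\n");
    return 0;
}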

8.4. SORTING BY DIRECT SELECTION


The basic idea of this algorithm is to find the smallest element of the
array and place it in the first position, then find the second smallest element
and place it in the second position, and so on until all the array elements
have been ordered. The method is based on the following principles:
• Select the smallest element of the array;
• Exchange that element with the first one;
• Repeat the above steps with the remaining (n – 1), (n – 2), … elements,
and so on until only the largest element is left.
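A minimal C sketch of this method (the function and variable names are illustrative assumptions, not taken from the text):

/* Direct selection sort: on each pass the smallest remaining element
   is located and exchanged with the first unsorted position. */
void selection_sort(int a[], int n) {
    int i, j, min, temp;
    for (i = 0; i < n - 1; i++) {
        min = i;                       /* index of smallest so far */
        for (j = i + 1; j < n; j++)
            if (a[j] < a[min])
                min = j;
        temp = a[i];                   /* exchange with position i */
        a[i] = a[min];
        a[min] = temp;
    }
}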
The analysis of the direct selection method is relatively simple. Note that
the number of comparisons between elements is independent of the initial
arrangement of the elements in the array. In the first pass (n – 1) comparisons
are performed, in the second pass (n – 2) comparisons, and so on, down to
two and one comparisons in the penultimate and last passes, respectively.
So that:
C = (n – 1) + (n – 2) + … + 2 + 1
Now, using Gauss’s trick for the sum of the first n natural numbers, we have:
Sn = 1 + 2 + 3 + 4 + … + (n – 3) + (n – 2) + (n – 1) + n
Sn = n + (n – 1) + (n – 2) + (n – 3) + … + 4 + 3 + 2 + 1
2 Sn = (n + 1) + (n + 1) + … + (n + 1)   (n terms)
2 Sn = n * (n + 1), therefore Sn = n * (n + 1)/2.
Using the same idea for the sum up to (n – 1):
Sn = 1 + 2 + 3 + … + (n – 3) + (n – 2) + (n – 1)
Sn = (n – 1) + (n – 2) + (n – 3) + (n – 4) + … + 2 + 1
2 Sn = n + n + … + n   ((n – 1) terms)
2 Sn = n * (n – 1), therefore Sn = n * (n – 1)/2.
Thus, we have: C = n * (n – 1)/2 = (n^2 – n)/2
So the algorithm execution time is proportional to O (n^2).
8.5. BINARY INSERTION METHOD


The binary insertion sorting method performs a binary search instead of a
sequential search to find where to insert an element in the left (already
sorted) part of the array.
Analyzing this method, an unnatural effect is noted: the method makes the
fewest comparisons when the array is completely disordered and the most
when it is already ordered.
It can be assumed that, where a sequential search needs K comparisons to
insert an item, a binary search needs half of those K comparisons. Therefore,
the average number of comparisons of the binary insertion sort can be
calculated as:
C = 1/2 + 2/2 + 3/2 + … + (n – 1)/2 = (n * (n – 1))/4 = (n^2 – n)/4
Therefore, the execution time of the algorithm remains proportional to
O (n^2).
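A short C sketch of binary insertion sort (the function and variable names are illustrative assumptions):

/* Insertion sort that uses a binary search to locate the insertion
   point in the already sorted left part of the array. */
void binary_insertion_sort(int a[], int n) {
    int i, j, low, high, mid, key;
    for (i = 1; i < n; i++) {
        key = a[i];
        low = 0;
        high = i - 1;
        while (low <= high) {            /* binary search for position */
            mid = (low + high) / 2;
            if (key < a[mid])
                high = mid - 1;
            else
                low = mid + 1;
        }
        for (j = i - 1; j >= low; j--)   /* shift elements to the right */
            a[j + 1] = a[j];
        a[low] = key;
    }
}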

8.6. QUICKSORT METHOD (QUICKSORT)


The quicksort method is currently one of the most efficient and fastest
methods of internal sorting. It is a substantial improvement in the speed with
which the array elements are ordered. The central idea of this algorithm is
as follows:
1. An element X in some fixed position is taken.
2. X is placed in its correct position in the array, so that all items to
its left are less than or equal to X and all items to its right are
greater than or equal to X.
3. The above steps are repeated, but now for the data sets that lie to
the left and to the right of the position of X in the array.
4. The process ends when all the elements are in their correct
position in the array.
In this version of the algorithm, the element X is the first item in the list.
The array is first traversed from right to left, checking whether the elements
are greater than or equal to X. If an element does not meet this condition,
the two elements are exchanged and the position of the exchanged element
is stored in a variable that bounds the array on the right. The traversal then
starts again, but now from left to right, checking whether the elements are
less than or equal to X. If an element does not meet this condition, the two
elements are exchanged and the position of the exchanged element is stored
in another variable that bounds the array on the left. The above steps are
repeated until the element X finds its correct position in the array. The
algorithm is now shown in C.

#include <stdio.h>
#define N 10

void quicksort(int [], int, int);

int main() {
    int a[] = {10, 8, 7, 2, 1, 3, 5, 4, 6, 9}, i;

    quicksort(a, 0, N - 1);
    for (i = 0; i < 10; i++)
        printf("%d ", a[i]);
    putchar('\n');
    getchar();
    return 0;
}
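The quicksort() function itself does not appear in the listing above. A sketch consistent with the partitioning scheme described earlier (first element as pivot, right-to-left scan followed by a left-to-right scan) might look as follows; the local names i, j, and temp are assumptions of this sketch:

void quicksort(int a[], int left, int right) {
    int i, j, pivot, temp;

    if (left >= right)                       /* zero or one element   */
        return;
    i = left; j = right; pivot = a[left];
    while (i < j) {
        while (i < j && a[j] >= pivot)       /* scan from the right   */
            j--;
        while (i < j && a[i] <= pivot)       /* scan from the left    */
            i++;
        if (i < j) {                         /* exchange the pair     */
            temp = a[i]; a[i] = a[j]; a[j] = temp;
        }
    }
    temp = a[left]; a[left] = a[i]; a[i] = temp;   /* place the pivot */
    quicksort(a, left, i - 1);
    quicksort(a, i + 1, right);
}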

Quicksort is the fastest method of internal sorting that exists today.
This is surprising, because the method has its origin in the method of direct
exchange, the worst of all the direct methods. Various studies of its behavior
show that if, on each pass, the chosen element occupies the center position
of the data set being analyzed, and if the array size is a power of 2, then
(n – 1) comparisons are performed in the first pass, (n – 1)/2 comparisons in
the second pass (but in two different sets), (n – 1)/4 comparisons in the third
pass (but in four different sets), and so on. This produces a recursive binary
tree. Thus:
Recursion tree (K indicates the level of the tree):
K = 0. The sum is: (n – 1)
K = 1. The sum is: 2 * (n – 1)/2
K = 2. The sum is: 4 * (n – 1)/4
C = (n – 1) + 2 * (n – 1)/2 + 4 * (n – 1)/4 + … + (n – 1) * (n – 1)/(n – 1)


which is the same as: C = (n – 1) + (n – 1) + … + (n – 1).
If each component of the sum is considered as a term, and the number of
terms of the summation is equal to k, where k is the depth of the recursive
binary tree, we have:
C = (n – 1) * k
Since the number of terms of the sum (k) is the number of levels of the
binary tree, the number of array elements can be written as 2^k = n, so
log2 2^k = log2 n, that is, k log2 2 = log2 n (remembering that log m (m) = 1),
hence k = log2 n, and the above expression becomes:
C = (n – 1) * log2 n
However, finding the element that occupies the center position of the data set
to be analyzed is a difficult task, because there is only a 1/n chance of
achieving it. Moreover, the average performance of the method is about
(2 * ln 2) times worse than the optimal case, so Hoare, the author of the
method, proposed as a solution that the element X be chosen arbitrarily or
selected from a relatively small sample of elements of the array.
The worst case occurs when the array elements are already sorted, or
when they are sorted in reverse order. Suppose you must sort the following
one-dimensional array, which is already ordered:
A: 08 12 15 16 27 35 44 67
If the first element (08) is chosen arbitrarily, then the array is partitioned into
two halves, one with 0 elements and the other with (n – 1) elements.
If the ordering process continues and the first element (12) of the data set
being analyzed is selected again, the array is divided into two new subsets,
again one with 0 and the other with (n – 2) elements. Therefore, the number
of comparisons that will be made is:
Cmax = n + (n – 1) + (n – 2) + … + 2 = n * (n + 1)/2 – 1
which is equal to:
Cmax = (n^2 + n)/2 – 1
It can be said that the average execution time of the algorithm is proportional
to O (n log n). In the worst case, it is proportional to O (n^2).

8.7. MERGING METHOD (MERGE SORT)


This algorithm sorts the elements and has the property that its worst-case
complexity is O (n log n). The items are sorted in increasing order. Given n
items, they are divided into two subsets; each subset is sorted and the results
are merged to produce a single ordered sequence of elements. The C code is
the following:
#include <stdio.h>
#define N 10

void MergeSort(int [], int, int);
void merge(int [], int, int, int);

int main() {
    int i, a[N] = {9, 7, 10, 8, 2, 4, 6, 5, 1, 3};

    MergeSort(a, 0, 9);
    for (i = 0; i < 10; i++)
        printf("%d ", a[i]);
    getchar();
    return 0;
}

void MergeSort(int a[], int low, int high)
{
    int mid;
    if (low < high) {
        mid = (low + high) / 2;
        MergeSort(a, low, mid);
        MergeSort(a, mid + 1, high);
        merge(a, low, mid, high);
    }
}

void merge(int a[], int low, int mid, int high)
{
    int b[N], h, i, j, k;
    h = low; i = low; j = mid + 1;
    while (h <= mid && j <= high) {
        if (a[h] <= a[j]) {
            b[i] = a[h]; h++;
        }
        else {
            b[i] = a[j]; j++;
        }
        i++;
    }
    if (h > mid)
        for (k = j; k <= high; k++) {
            b[i] = a[k]; i++;
        }
    else
        for (k = h; k <= mid; k++) {
            b[i] = a[k]; i++;
        }
    for (k = low; k <= high; k++)
        a[k] = b[k];
}

Consider the array of ten elements A = (310, 285, 179, 652, 351, 423, 861,
254, 450, 520). MergeSort starts by splitting it into two subarrays of size
five. The elements A (1: 5) are in turn divided into subarrays of sizes three
and two. Then the elements A (1: 3) are divided into two subarrays of sizes
two and one. The two values in A (1: 2) are divided into single-element
subarrays and the merging begins. So far, no element has been moved.
Pictorially, the arrangement can be seen as follows:
(310 | 285 | 179 | 652, 351 | 423, 861, 254, 450, 520)
Vertical bars show the boundaries of the subarrays. A (1) and A (2) are
merged to produce:
(285, 310 | 179 | 652, 351 | 423, 861, 254, 450, 520)
Then A (3) is merged with A (1: 2) to give:
(179, 285, 310 | 652, 351 | 423, 861, 254, 450, 520)
Elements A (4) and A (5) are merged:
(179, 285, 310 | 351, 652 | 423, 861, 254, 450, 520)
Following the merge of A (1: 3) and A (4: 5) we have:
(179, 285, 310, 351, 652 | 423, 861, 254, 450, 520)
At this point the algorithm has returned to the first invocation of MergeSort
and the second recursive call is performed. The recursive calls on the right
subarray produce the following arrangements:
(179, 285, 310, 351, 652 | 423 | 861 | 254 | 450, 520)
A (6) and A (7) are merged, and then A (8) is merged with A (6: 7) to give:
(179, 285, 310, 351, 652 | 254, 423, 861 | 450, 520)
T (n) = 2T (n/2) + cn = 2 (2T (n/4) + cn/2) + cn = 4T (n/4) + 2cn
       = 4 (2T (n/8) + cn/4) + 2cn
T (n) = 8T (n/8) + 3cn
…
T (n) = 2^k T (1) + k cn = an + kcn      // since n = 2^k and T (1) = a
T (n) = an + cn log2 n                   // because n = 2^k implies k = log2 n
If 2^k < n ≤ 2^(k + 1), then T (n) ≤ T (2^(k + 1)). Thus:
T (n) = O (n log n)
This algorithm is a classic example of the “divide and conquer” paradigm
explained previously.

8.8. SEQUENTIAL SEARCH


Sequential search consists of examining element after element until the
information sought is found or the end of the available data set is reached.
When a search is successful, we are normally interested in knowing in which
position the sought item was found. This idea can be generalized to all
search methods.
The sequential search algorithm over unordered arrays is presented next, in
C code.

#include <stdio.h>
#define N 10

int main() {
    int x, i = 0, a[N] = {9, 7, 10, 8, 2, 4, 6, 5, 1, 3};

    scanf("%d", &x);
    while (i < N && a[i] != x)
        i++;
    if (i > N - 1)
        printf("Data not found\n");
    else
        printf("The data is in position %d\n", i);
    getchar();
    getchar();
    return 0;
}

If there are two or more occurrences of the same value, the first one is found.
However, the algorithm can be modified to report all occurrences of the
sought value. A variant of this algorithm is presented next, using recursion
instead of iteration.
#include <stdio.h>
#define N 10

void sequential(int [], int, int, int);

int main() {
    int x, a[N] = {9, 7, 10, 8, 2, 4, 6, 5, 1, 3};

    scanf("%d", &x);
    sequential(a, N, x, 0);
    getchar();
    getchar();
    return 0;
}

void sequential(int a[], int n, int x, int i) {
    if (i > n - 1)
        printf("Data not found\n");
    else if (a[i] == x)
        printf("Data located at position %d\n", i);
    else
        sequential(a, n, x, i + 1);
}

The number of comparisons is one of the most important factors in
determining the complexity of sequential search methods, so the most
favorable and unfavorable cases must be established.
When searching for an item in an unordered N-component array, it can
happen that the value is not found; in that case N comparisons are made
while walking the array. On the other hand, if the element is in the array, it
can be in the first position, the last, or somewhere in between. The algorithm
therefore has complexity O (n).

8.9. BINARY SEARCH (BINARY SEARCH)


Binary search involves dividing the search interval into two parts, comparing
the sought element with the one that occupies the center position of the
array. If they are not equal, the endpoints of the interval are redefined
according to whether the central element is greater or less than the desired
element, thereby reducing the search space. The process ends when the
element is found or when the search interval becomes empty.

#include <stdio.h>
#define N 10

int main() {
    int x, low, high, mid, j, a[] = {1, 2, 3, 5, 6, 7, 8, 9, 10, 13};

    low = 0;
    high = N - 1;
    j = -1;
    scanf("%d", &x);
    while (low <= high) {
        mid = (low + high) / 2;
        if (x < a[mid])
            high = mid - 1;
        else if (x > a[mid])
            low = mid + 1;
        else {
            j = mid;
            break;
        }
    }
    if (j == -1)
        printf("Data not found\n");
    else
        printf("Data found at position %d\n", j);
    getchar();
    getchar();
    return 0;
}

Given a function to compute on n inputs, the divide-and-conquer strategy
suggests splitting the input into k subsets, 1 < k ≤ n, producing k
subproblems. These subproblems are solved, and then a method must be
found to combine the subsolutions into a solution of the whole. If a
subproblem is still too large, the strategy can be reapplied. Frequently, the
subproblems resulting from a divide-and-conquer design are of the same
type as the original problem; in that case the principle of divide and conquer
is naturally expressed by a recursive procedure. As smaller and smaller
subproblems of the same kind are obtained, eventually subproblems are
produced that are small enough to be solved without further splitting.
A way of seeing the overall model is:

void DandC(int a[], int p, int q)
{
    // a[] is the array to work on
    // p and q delimit the subspace to split
    int m;
    if (small(p, q))
        return G(a, p, q);
    else {
        m = divide(p, q);
        return combine(DandC(a, p, m), DandC(a, m + 1, q));
    }
}
In this case, the function small determines whether a specific stopping
condition for the algorithm is satisfied; if not, the space is subdivided into
two smaller subspaces.
f (n) is the time spent by the divide and combine functions.
Three algorithms seen previously belong to this paradigm:
• Binary search;
• Merge sort;
• Quicksort.

8.10. SEEKING THE MAXIMUM AND MINIMUM


The problem is to find the maximum and the minimum of a set of n unordered
elements. A direct algorithm would be:
#include <stdio.h>

int main() {
    int max, min, i, a[] = {4, 2, 1, 5, 7, 9, 8, 6, 3, 10};

    max = min = a[0];
    for (i = 1; i < 10; i++) {
        if (a[i] > max)
            max = a[i];
        if (a[i] < min)
            min = a[i];
    }
    printf("The maximum value is %d and the minimum value is %d\n", max, min);
    getchar();
    return 0;
}
The procedure requires 2 (n – 1) comparisons in the best, average, and worst
cases. It can be improved by changing the body of the loop as follows:
if (a[i] > max)
    max = a[i];
else if (a[i] < min)
    min = a[i];
Now the best case occurs when the elements are in increasing order, in which
case only n – 1 comparisons are required, while the worst case still requires
2 (n – 1) comparisons. The average is:
[2 (n – 1) + (n – 1)]/2 = 3n/2 – 1
Next, a recursive algorithm that finds the maximum and minimum of a set of
elements using the divide-and-conquer strategy is shown. This algorithm
takes four parameters besides the array; the first two are passed by value and
the last two are passed by reference. The second and third parameters indicate
the subset to be analyzed, and the last two parameters are used to return the
minimum and maximum of that subset. After the recursion, the minimum
and maximum of the given set are obtained. The algorithm is shown in C:

#include <stdio.h>

void MaxMin(int [], int, int, int *, int *);
int max(int, int);
int min(int, int);

int main() {
    int fmax, fmin, a[] = {4, 2, 10, 5, -7, 9, 80, 6, 3, 1};

    MaxMin(a, 0, 9, &fmax, &fmin);
    printf("The maximum value is %d and the minimum value is %d\n", fmax, fmin);
    getchar();
    return 0;
}

void MaxMin(int a[], int i, int j, int *fmax, int *fmin) {
    int gmax, gmin, hmax, hmin, mid;

    if (i == j)
        *fmax = *fmin = a[i];
    else if (i == j - 1) {
        if (a[i] < a[j]) {
            *fmax = a[j];
            *fmin = a[i];
        }
        else {
            *fmax = a[i];
            *fmin = a[j];
        }
    }
    else {
        mid = (i + j) / 2;
        MaxMin(a, i, mid, &gmax, &gmin);
        MaxMin(a, mid + 1, j, &hmax, &hmin);
        *fmax = max(gmax, hmax);
        *fmin = min(gmin, hmin);
    }
}

int max(int g, int h) {
    if (g > h)
        return g;
    return h;
}

int min(int g, int h) {
    if (g < h)
        return g;
    return h;
}

T (n) = T (⌊n/2⌋) + T (⌈n/2⌉) + 2   for n > 2
T (n) = 1   for n = 2
T (n) = 0   for n = 1

T (n) = 2T (n/2) + 2 = 2 (2T (n/4) + 2) + 2 = 4T (n/4) + 4 + 2
      = 4 (2T (n/8) + 2) + 4 + 2
T (n) = 8T (n/8) + 8 + 4 + 2 = 8 (2T (n/16) + 2) + 8 + 4 + 2
      = 16T (n/16) + 16 + 8 + 4 + 2 …
T (n) = 2^(k – 1) T (2) + Σ (1 ≤ i ≤ k – 1) 2^i = 2^(k – 1) + 2^k – 2 = 3n/2 – 2

Does this mean it is better in practice? Not necessarily. In terms of storage it
is worse, because it requires a stack to store i, j, fmax, and fmin. Given n
elements, log n + 1 levels of recursion are required, and 5 values plus the
return address must be saved at each level. MaxMin is inefficient insofar as a
stack and recursion are involved.

8.11. GREEDY METHOD


Most of these problems have n inputs and require finding a subset that
satisfies certain constraints. Any subset that satisfies these constraints is
called a feasible solution. The problem requires finding a feasible solution
that minimizes or maximizes a given objective function. There is usually an
obvious way to find a feasible solution, but not necessarily an optimal one.
The greedy method suggests an algorithm that works in steps, considering
one input at a time. At each step a decision is made: if including the next
input keeps the partial solution feasible, the input is added to it; otherwise,
it is discarded. The greedy strategy is shown below in pseudocode:
procedure GREEDY(A, n)
    // A(1: n) contains the n inputs.
    solution ← φ
    for i ← 1 to n do
        x ← SELECT(A)
        if FEASIBLE(solution, x) then
            solution ← UNION(solution, x)
        endif
    repeat
    return (solution)
end GREEDY

8.12. OPTIMAL STORAGE ON TAPE (OPTIMAL STORAGE ON TAPES)
There are n files to be stored on a tape of length L. Associated with each file
i is a length Li, 1 ≤ i ≤ n. All files can be stored on the tape if the sum of the
file lengths is at most L. If the files are stored in the order I = i1, i2, …, in,
then the time required to retrieve file ij is tj = Σ (1 ≤ k ≤ j) L(ik). If every file
is retrieved with the same frequency, then the mean retrieval time (MRT) is
(1/n) Σ (1 ≤ j ≤ n) tj. In the optimal storage problem, we must find a
permutation of the n files such that the MRT is minimized. Minimizing the
MRT is equivalent to minimizing D (I) = Σ (1 ≤ j ≤ n) Σ (1 ≤ k ≤ j) L(ik).

Example:
n = 3, (L1, L2, L3) = (5, 10, 3).
There are n! = 6 possible orderings:
1, 2, 3    5 + (5 + 10) + (5 + 10 + 3) = 38
1, 3, 2    5 + (5 + 3) + (5 + 3 + 10) = 31
2, 1, 3    10 + (10 + 5) + (10 + 5 + 3) = 43
2, 3, 1    10 + (10 + 3) + (10 + 3 + 5) = 41
3, 1, 2    3 + (3 + 5) + (3 + 5 + 10) = 29
3, 2, 1    3 + (3 + 10) + (3 + 10 + 5) = 34
The optimal order is (3, 1, 2).


In this algorithm, the greedy method requires the files to be stored in
increasing order of length. This ordering can be obtained with a sorting
algorithm such as merge sort, so it requires O (n log n) time.
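A small C sketch of this greedy rule, using the example lengths above (the array name and the use of qsort are assumptions of this sketch): the files are sorted by increasing length and the total retrieval time D(I) is accumulated.

#include <stdio.h>
#include <stdlib.h>

/* Greedy optimal storage on tape: store the files in increasing order
   of length and compute the total and mean retrieval times. */
static int cmp(const void *x, const void *y) {
    return *(const int *)x - *(const int *)y;
}

int main() {
    int len[] = {5, 10, 3};                /* example from the text  */
    int n = 3, i, prefix = 0, total = 0;

    qsort(len, n, sizeof(int), cmp);       /* increasing lengths     */
    for (i = 0; i < n; i++) {
        prefix += len[i];                  /* time to reach file i   */
        total  += prefix;
    }
    printf("D(I) = %d, mean retrieval time = %.2f\n",
           total, (double)total / n);      /* prints D(I) = 29       */
    return 0;
}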

8.13. THE KNAPSACK PROBLEM


There are n objects and a knapsack. Object i has a weight Wi and the
knapsack has a capacity M. If a fraction Xi, 0 ≤ Xi ≤ 1, of object i is placed
in the knapsack, it contributes a gain of Pi Xi. The goal is to fill the knapsack
so as to maximize the total gain.
The problem is:
Max Σ (1 ≤ i ≤ n) Pi Xi (1)
subject to
Σ (1 ≤ i ≤ n) Wi Xi ≤ M (2)
0 ≤ Xi ≤ 1, 1 ≤ i ≤ n (3)
A feasible solution is any set (X1, X2, …, Xn) satisfying (2) and (3). An
optimal solution is a feasible solution that maximizes the gain (1) (Table 8.3).
Example: n = 3, M = 20, (P1, P2, P3) = (25, 24, 15), (W1, W2, W3) = (18,
15, 10).
Four possible solutions are:
(X1, X2, X3)       Σ Wi Xi     Σ Pi Xi
(1/2, 1/3, 1/4)    16.5        24.25
(1, 2/15, 0)       20          28.2
(0, 2/3, 1)        20          31
(0, 1, 1/2)        20          31.5
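The text does not spell out the greedy selection rule; a common choice is to take objects in decreasing order of profit-to-weight ratio Pi/Wi, as in the following sketch (that ordering rule, and the hand-sorted example data, are assumptions of this sketch):

#include <stdio.h>

/* Greedy sketch for the fractional knapsack: objects considered in
   decreasing order of Pi/Wi. The example data above are already
   reordered by hand: object 2 (24/15), object 3 (15/10), object 1 (25/18). */
int main() {
    double P[] = {24, 15, 25};
    double W[] = {15, 10, 18};
    double M = 20, gain = 0, x;
    int i, n = 3;

    for (i = 0; i < n && M > 0; i++) {
        x = (W[i] <= M) ? 1.0 : M / W[i];  /* fraction taken        */
        gain += x * P[i];
        M    -= x * W[i];
    }
    printf("maximum gain = %.1f\n", gain); /* prints 31.5           */
    return 0;
}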

procedure TREE(L, n)
    for i ← 1 to n – 1 do
        call GETNODE(T)
        LCHILD(T) ← LEAST(L)
        RCHILD(T) ← LEAST(L)
        WEIGHT(T) ← WEIGHT(LCHILD(T)) + WEIGHT(RCHILD(T))
        call INSERT(L, T)
    repeat
    return (LEAST(L))
end TREE
Table 8.3: The Knapsack Problem

1 2 3 4 5 6 NEAR
1 ∞ 10 ∞ 30 45 ∞ 0
2 10 ∞ 50 ∞ 40 25 0
3 ∞ 50 ∞ ∞ 35 15 2, 6
4 30 ∞ ∞ ∞ ∞ 20 1, 6, 0
5 45 40 35 ∞ ∞ 50 2, 3, 0
6 ∞ 25 15 20 55 ∞ 2, 0
8.14. SINGLE SOURCE SHORTEST PATHS (SHORTEST ROUTE FROM AN ORIGIN)
Graphs can be used to represent the roads of a state or country, with vertices
representing cities and edges joining the vertices representing the roads.
Edges can be assigned weights representing the distance between two
connected cities.
The length of a path is defined as the sum of the weights of its edges. The
starting vertex is called the origin, and the last vertex the destination. The
problem considered here is based on a directed graph G = (V, E), a weight
function c (e) for the edges of G, and a source vertex v0. The problem is to
determine the shortest path from v0 to all the other vertices of G. It is
assumed that all weights are positive.
Example: consider the directed graph in the figure below, where the edge
labels are the weights. If v0 is the origin vertex, then the shortest path from
v0 to v1 is v0 v2 v3 v1. Its length is 10 + 15 + 20 = 45. Even though it uses
three edges, it is cheaper than going directly from v0 to v1, which has a cost
of 50 (Figure 8.5).

Figure 8.5: An example for the shortest path algorithm.


To formulate a greedy algorithm that generates the shortest paths, we must
devise a multi-stage solution. One possibility is to construct the shortest
paths one by one. As a measure of optimization we can use the sum of the
lengths of all paths generated so far. For this measure to be minimal, each
individual path must be of minimum length. If the i shortest paths have
already been built, then the next path to be built should be the next shortest
one. The greedy way to generate the shortest paths from V0 to the remaining
vertices is to generate the paths in increasing order of length. First, the
shortest path to the nearest vertex is generated, then the shortest path to the
second nearest vertex, and so on. For the example graph, the nearest vertex
to V0 is V2 (c (V0, V2) = 10), so the path V0 V2 will be the first path
generated. The second nearest vertex is V3, with a distance of 25; the path
V0 V2 V3 is generated next. To generate the remaining paths we must
determine (i) the next vertex for which a shortest path must be generated,
and (ii) a shortest path to this vertex. Let S be the set of vertices (including
V0) to which the shortest paths have already been generated. For w not in S,
let DIST (w) be the length of the shortest path from V0 going only through
vertices in S and ending at w. It is observed that:
I. If the next shortest path is to vertex u, then the path begins at V0,
ends at u, and goes only through vertices in S.
II. The destination of the next path to be generated must be the vertex
u with minimum distance DIST (u) over all vertices not in S.
III. Having selected a vertex u as in II and generated the shortest path
from V0 to u, the vertex u becomes a member of S. At this point,
the length of the shortest path starting at V0, going only through
vertices in S, and ending at a vertex w not in S may decrease; that
is, the value of DIST (w) may change. If it changes, it is because
there is a shorter path starting at V0, going through u, and then
reaching w. The intermediate vertices of the path from V0 to u and
of the path from u to w must all be in S. In addition, the path from
V0 to u must be the shortest one; otherwise DIST (w) would not be
properly defined. Also, the path from u to w cannot contain
intermediate vertices outside S.
The observations above lead to a simple algorithm (developed by Dijkstra).
In fact, it only determines the lengths of the shortest paths from vertex V0 to
all the vertices of G.
It is assumed that the n vertices of G are numbered from 1 to n. The set S is
kept in an array with S (i) = 0 if vertex i is not in S and S (i) = 1 if it is. It is
assumed that the graph is represented by a cost matrix.
The pseudocode of the algorithm is as follows:
procedure SHORTEST-PATHS(v, COST, DIST, n)
    // DIST(j) is the length of the shortest path from vertex v to
    // vertex j in the graph G with n vertices.
    // G is represented by the cost matrix COST(1: n, 1: n)
    boolean S(1: n); real COST(1: n, 1: n), DIST(1: n)
    integer u, v, n, num, i, w
    for i ← 1 to n do                    // initialization
        S(i) ← 0; DIST(i) ← COST(v, i)
    repeat
    S(v) ← 1; DIST(v) ← 0                // place vertex v in S
    for num ← 2 to n – 1 do              // determine n – 1 paths from vertex v
        choose u such that DIST(u) = min {DIST(w)} over all w with S(w) = 0
        S(u) ← 1                          // place vertex u in S
        for all w with S(w) = 0 do
            DIST(w) ← min (DIST(w), DIST(u) + COST(u, w))
        repeat
    repeat
end SHORTEST-PATHS
The time taken by the algorithm on a graph with n vertices is O (n^2). This is
easily seen, since the algorithm contains two nested for loops.
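A compact C sketch of the same procedure follows; the INF sentinel value, the fixed size N, and the zero-based indexing are assumptions of this sketch, not taken from the pseudocode above.

#include <stdio.h>

#define N   8
#define INF 1000000          /* stands for "no edge" in this sketch */

/* Single-source shortest paths over a cost matrix, mirroring the
   SHORTEST-PATHS pseudocode above. */
void shortest_paths(int v, int cost[N][N], int dist[N]) {
    int s[N], i, num, u, w;

    for (i = 0; i < N; i++) {         /* initialization             */
        s[i] = 0;
        dist[i] = cost[v][i];
    }
    s[v] = 1;
    dist[v] = 0;
    for (num = 2; num <= N; num++) {  /* pick the remaining vertices */
        u = -1;
        for (w = 0; w < N; w++)       /* vertex not in S with        */
            if (!s[w] && (u == -1 || dist[w] < dist[u]))
                u = w;                /* minimum DIST                */
        s[u] = 1;
        for (w = 0; w < N; w++)       /* relax edges leaving u       */
            if (!s[w] && dist[u] + cost[u][w] < dist[w])
                dist[w] = dist[u] + cost[u][w];
    }
}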
Example (Figure 8.6):

Figure 8.6: Another example for the shortest path algorithm.


The cost matrix is given in Table 8.4.

Table 8.4: The Cost Matrix

1 2 3 4 5 6 7 8
1 0
2 300 0
3 1000 800 0
4 1200 0
Recursive Programming 159

5 1500 0 250
6 1000 0 900 1400
7 0 1000
8 0

If v = 5, the algorithm searches for the minimum-cost paths from node 5 to
all the other nodes. Note that this algorithm has a complexity of O (n^2),
where n is the number of nodes; the nested structure is depicted below:

Figure 8.7: The nested complexity depiction.


Chapter 9

Dynamic Programming

CONTENTS
9.1. Optimality Principle ....................................................................... 162
9.2. Multistage Graphs (Multistage Graphs) ........................................... 163
9.3. Traveling Salesman Problem (TSP) ................................................... 166
9.4. Return on the Same Route (Backtracking) ......................................... 170
9.5. The Eight Queens Puzzle (8-Queens) .............................................. 171
9.6. Hamiltonian Cycles (Hamiltonian Path) .......................................... 177
9.1. OPTIMALITY PRINCIPLE


When we talk about optimizing, we mean finding one of the best solutions
among many possible alternatives. The optimization process can be viewed
as a sequence of decisions that leads us to the right solution. If, given a
subsequence of decisions, we always know which decision must be made
next to obtain the optimal sequence, the problem is elementary and is solved
trivially by taking one decision after another, which is known as the greedy
strategy.
In other cases, although it is not possible to apply the greedy strategy, the
principle of optimality stated by Bellman in 1957 is satisfied, which dictates
that “given an optimal sequence of decisions, every subsequence of it is, in
turn, optimal.” In this case it is still possible to proceed by taking basic
decisions, confident that their combination will remain optimal, but it will
then be necessary to explore many sequences of decisions in order to pick
the right one; this is where dynamic programming intervenes. Although this
principle seems obvious, it is not always applicable, and therefore it is
necessary to verify that it holds for the problem in question.
Viewing a problem as a sequence of decisions is equivalent to dividing it
into smaller and therefore easier problems, as we do in divide and conquer,
a technique similar to dynamic programming. Dynamic programming is
applied when the subdivision of a problem leads to:
• a huge number of subproblems;
• subproblems whose partial solutions overlap;
• groups of subproblems of very different complexity.
For a problem to be addressed with this technique, it must meet two
conditions:
• The solution to the problem must be achievable through a sequence
of decisions, one at each stage.
• That sequence of decisions must satisfy the principle of optimality.
Overall, the design of a dynamic programming algorithm consists of the
following steps:
1. Formulation of the solution as a succession of decisions and
verification that it satisfies the principle of optimality.
2. Recursive definition of the solution.
3. Calculation of the optimal solution using a table in which partial
solutions are stored to avoid recomputation.
4. Construction of the optimal solution using the information stored
in the table of step 3.

9.2. MULTISTAGE GRAPHS (MULTISTAGE GRAPHS)


A multistage graph is a directed graph in which the vertices are partitioned
into k ≥ 2 disjoint sets Vi, 1 ≤ i ≤ k. In addition, if <u, v> is an edge in E, then
u ∈ Vi and v ∈ Vi+1 for some i, 1 ≤ i < k. The sets V1 and Vk are such that
|V1| = |Vk| = 1. Let s and t be, respectively, the vertex in V1 and the vertex
in Vk; s is the source and t is the goal to reach. Let c (i, j) be the cost of edge
<i, j>. The cost of a path from s to t is the sum of the costs of the edges on
the path. A path starts in stage 1, goes to stage 2, then to stage 3, stage 4, and
so on, and eventually ends in stage k. Figure 9.1 shows a graph with five
stages; the minimum-cost path from s to t is shown in bold.

Figure 9.1: A graph with five stages.


Several problems can be formulated as multistage graph problems. For
example, consider a resource allocation problem in which n units of resource
are to be allocated among r projects. If j, 0 ≤ j ≤ n, units of resource are
assigned to project i, then the resulting net benefit is N (i, j). The problem is
to allocate the resource to the r projects so as to maximize the net benefit.
This problem can be formulated as a graph with r + 1 stages as follows.
Stage i, 1 ≤ i ≤ r, represents project i. There are n + 1 vertices V (i, j),
0 ≤ j ≤ n, associated with stage i, 2 ≤ i ≤ r. Stages 1 and r + 1 each have one
vertex, V (1, 0) = s and V (r + 1, n) = t, respectively. Vertex V (i, j), 2 ≤ i ≤ r,
represents the state in which a total of j units of resource have been assigned
to projects 1, 2, …, i – 1. The edges of G are of the form
<V (i, j), V (i + 1, L)> for all j ≤ L and 1 ≤ i < r. The edge
<V (i, j), V (i + 1, L)>, j ≤ L, is assigned a weight or cost of N (i, L – j) and
corresponds to the allocation of L – j units of resource to project i, 1 ≤ i < r.
In addition, G has edges of the type <V (r, j), V (r + 1, n)>; each of these
edges is assigned a weight of max (0 ≤ p ≤ n – j) {N (r, p)}. The resulting
graph for a three-project problem with n = 4 is shown in Figure 9.2. It should
be easy to see that an optimal allocation of resources is defined by a
maximum-cost path from s to t.

Figure 9.2: The corresponding graph for a three-project problem.


A dynamic programming formulation for the k-stage graph problem is
obtained by first realizing that every path from s to t is the result of a
sequence of k – 2 decisions. The ith decision determines which vertex of
Vi+1, 1 ≤ i ≤ k – 2, will be on the path. It is easy to see that the principle of
optimality holds. Let COST (i, j) be the cost of the minimum-cost path from
vertex j in stage i to t. Then, using the forward approach, we obtain:
COST (i, j) = min {c (j, L) + COST (i + 1, L)},
where the minimum is taken over all L ∈ Vi+1 with (j, L) ∈ E,
and E is the set of edges of the graph. COST (k – 1, j) = c (j, t) if (j, t) ∈ E,
and COST (k – 1, j) = ∞ if (j, t) ∉ E. Solving the five-stage graph shown in
Figure 9.1, the following values are obtained:
COST (3, 6) = min {6 + COST (4, 9), 5 + COST (4, 10)} = min {6 + 4, 5 + 2} = 7
COST (3, 7) = min {4 + COST (4, 9), 3 + COST (4, 10)} = min {4 + 4, 3 + 2} = 5
COST (3, 8) = min {5 + COST (4, 10), 6 + COST (4, 11)} = min {5 + 2, 6 + 5} = 7
COST (2, 2) = min {4 + COST (3, 6), 2 + COST (3, 7), 1 + COST (3, 8)} = 7
COST (2, 3) = 9
COST (2, 4) = 18
COST (2, 5) = 15
COST (1, 1) = min {9 + COST (2, 2), 7 + COST (2, 3), 3 + COST (2, 4),
2 + COST (2, 5)} = 16.
So the cost of the minimum-cost path from s to t is 16. This path can easily
be determined if we record the decision made at each stage. Let D (i, j) be
the value of L that minimizes c (j, L) + COST (i + 1, L). For the five-stage
graph we have:
D (3, 6) = 10; D (3, 7) = 10; D (3, 8) = 10;
D (2, 2) = 7; D (2, 3) = 6; D (2, 4) = 8; D (2, 5) = 8; D (1, 1) = 2.
So the minimum-cost path is s = 1, v2, v3, …, vk–1, t. It is easy to see that
v2 = D (1, 1) = 2; v3 = D (2, D (1, 1)) = 7; and v4 = D (3, D (2, D (1, 1)))
= D (3, 7) = 10.

Before writing an algorithm to solve a k-stage graph, an order is imposed on
the vertices of V; this order makes the algorithm easier to write. The n
vertices of V are indexed from 1 to n. Indices are assigned in order of stage:
s is first assigned index 1, then the vertices in V2 are indexed, and so on;
t has index n. Hence the indices of the vertices in Vi+1 are greater than those
assigned to the vertices in Vi. As a result of this indexing scheme, COST and
D can be computed in the order n – 1, n – 2, …, 1. The first subscript of
COST, P, and D only identifies the stage number and is omitted from the
algorithm. The algorithm is:

procedure FGRAPH(E, k, n, P)
    // k is the number of stages in the graph.
    // E is the set of edges.
    // c(i, j) is the cost of edge <i, j>.
    // P(1: k) is the minimum-cost path.
    real COST(n); integer D(n – 1), P(k), r, j, k, n
    COST(n) ← 0
    for j ← n – 1 to 1 by –1 do         // compute COST(j)
        let r be a vertex such that <j, r> ∈ E and c(j, r) + COST(r) is minimal
        COST(j) ← c(j, r) + COST(r)
        D(j) ← r
    repeat
    // find the minimum-cost path
    P(1) ← 1; P(k) ← n
    for j ← 2 to k – 1 do
        P(j) ← D(P(j – 1))
    repeat
end FGRAPH
Note that there are two loops that are not nested, so the time required is Θ (n).

9.3. TRAVELING SALESMAN PROBLEM (TSP)


This is a permutation problem. Permutation problems are usually much
harder to solve than problems in which subsets are chosen, because there are
n! permutations of n objects while there are only 2^n different subsets of n
objects (n! > O (2^n)) (Table 9.1).

Table 9.1: Traveling Salesman Problem Data

n    2^n    n!
1 2 1
2 4 2
4 16 24
8 256 40,320
Let G = (V, E) be a directed graph with edge costs cij; cij is defined such that
cij > 0 for all i and j, and cij = ∞ if <i, j> ∉ E. Let |V| = n and assume n > 1.
A tour of G is a directed cycle that includes every vertex in V. The cost of
the tour is the sum of the costs of the edges on the tour. The traveling
salesman problem (TSP) is to find a tour of minimum cost.
One application of the problem is the following. Suppose we have to plan
the route of a postal truck that collects mail from mailboxes located at n
different sites. A graph with n + 1 vertices can be used to represent this
situation. One vertex represents the post office where the postal truck begins
its journey and to which it must return. The edge <i, j> is assigned a cost
equal to the distance from site i to site j. The route taken by the postal truck
is a tour, and we wish to minimize the length of that route. In the following
discussion we consider a tour of minimum cost that starts at vertex 1 and
ends at the same vertex. Every tour consists of an edge <1, k>
for some k ∈ V – {1} and a path from vertex k to vertex 1. The path from
vertex k to vertex 1 goes through each vertex in V – {1, k} exactly once.
Hence the principle of optimality holds. Let g (i, S) be the length of the
shortest path starting at vertex i, going through all the vertices in S, and
ending at vertex 1. Then g (1, V – {1}) is the length of an optimal traveling
salesman tour. From the principle of optimality it follows that:
g (1, V – {1}) = min {c1k + g (k, V – {1, k})}, 2 ≤ k ≤ n (1)
Generalizing, one obtains (for i ∉ S):
g (i, S) = min {cij + g (j, S – {j})}, j ∈ S (2)
Equation (1) can be solved for g (1, V – {1}) if we know g (k, V – {1, k}) for
all choices of k; these values can be obtained using (2). Clearly,
g (i, φ) = ci1, 1 ≤ i ≤ n. Hence we can use (2) to obtain g (i, S) for all S of
size 1, then for all S with |S| = 2, and so on. When |S| < n – 1, the values of
i and S for which g (i, S) is needed are such that i ≠ 1, 1 ∉ S, and i ∉ S.

The algorithm is programmed in C:


#include <stdio.h>
#include <stdlib.h>

void generate(int **, int);
int tsp(int **, int *, int, int, int, int);

int main() {
    int **cost, *m, v, i, j, r;

    printf("TSP\n");
    printf("\nPlease enter the number of vertices: ");
    scanf("%d", &v);
    cost = (int **) malloc(sizeof(int *) * v);
    for (i = 0; i < v; i++)
        cost[i] = (int *) malloc(sizeof(int) * v);
    m = (int *) malloc(sizeof(int) * v);
    for (i = 0; i < v; i++)
        m[i] = 0;
    for (i = 0; i < v; i++)
        for (j = 0; j < v; j++) {
            if (i == j)
                cost[i][j] = 0;
            else
                cost[i][j] = 9999;
        }
    generate(cost, v);
    printf("\nPlease enter the origin: ");
    scanf("%d", &r);
    printf("The generated matrix is:\n");
    for (i = 0; i < v; i++) {
        for (j = 0; j < v; j++)
            printf("%6d", cost[i][j]);
        printf("\n");
    }
    printf("\n\nMinimum cost from %d: %d\n", r, tsp(cost, m, v, r - 1, v, r - 1));
    system("PAUSE");
    return 0;
}

void generate(int **cost, int tam) {
    int nv = 0, p, d;

    printf("To finish entering edges for a vertex, enter 99\n");
    while (nv < tam) {
        printf("Vertex %d...\n", nv + 1);
        scanf("%d", &d);
        if (d == 99) {
            printf("Finished vertex %d\n", nv + 1);
            nv++;
        }
        else if (d > tam)
            printf("That vertex does not exist\n");
        else {
            printf("Enter the weight to vertex %d: ", d);
            scanf("%d", &p);
            cost[nv][d - 1] = p;
        }
    }
}

int tsp(int **cost, int *m, int d, int o, int v, int r) {
    int i, dist, dmin = 999999;   /* large value acting as "infinity" */

    if (d == 1)
        return cost[o][r];
    m[o] = 1;                     /* mark the current vertex visited  */
    for (i = 0; i < v; i++)
        if (m[i] == 0) {
            dist = cost[o][i] + tsp(cost, m, d - 1, i, v, r);
            m[i] = 0;             /* unmark after returning           */
            if (dist < dmin)
                dmin = dist;
        }
    return dmin;
}

Example: consider the graph whose edge lengths are given in the matrix c
(the directed graph and its cost matrix C are shown in the accompanying
figure).
g (2, φ) = c21 = 5; g (3, φ) = c31 = 6; g (4, φ) = c41 = 8.
Using (2) we obtain:
g (2, {3}) = c23 + g (3, φ) = 15; g (2, {4}) = c24 + g (4, φ) = 18;
g (3, {2}) = 18; g (3, {4}) = 20;
g (4, {2}) = 13; g (4, {3}) = 15.
Next, we calculate g (i, S) with |S| = 2, i ≠ 1, 1 ∉ S, i ∉ S:
g (2, {3, 4}) = min {c23 + g (3, {4}), c24 + g (4, {3})} = 25
g (3, {2, 4}) = min {c32 + g (2, {4}), c34 + g (4, {2})} = 25
g (4, {2, 3}) = min {c42 + g (2, {3}), c43 + g (3, {2})} = 23
Finally, from (1) we obtain:
g (1, {2, 3, 4}) = min {c12 + g (2, {3, 4}), c13 + g (3, {2, 4}), c14 + g (4, {2, 3})}
= min {35, 40, 43}
= 35.
The recursion tree is shown in Figure 9.3.

Figure 9.3: Recursive tree traveling salesman.


The optimal tour of the graph in the figure has length 35. A tour of this length
can be constructed if we retain, for each g (i, S), the value of j that minimizes
the right-hand side of (2). Let J (i, S) be this value. Then J (1, {2, 3, 4}) = 2,
so the tour begins by going from 1 to 2. The next vertex to visit is obtained
from g (2, {3, 4}): J (2, {3, 4}) = 4, so the next edge is <2, 4>. The remainder
of the tour is given by g (4, {3}), and J (4, {3}) = 3. The optimal tour is
therefore 1, 2, 4, 3, 1.
Let N be the number of g (i, S) values that must be computed before
g (1, V – {1}) can be calculated. For each value of |S| there are n – 1 choices
for i, and the number of distinct sets S of size k not including 1 and i is
C(n – 2, k).
An algorithm that finds an optimal tour using (1) and (2) requires Θ (n^2 2^n)
time, since computing each g (i, S) with |S| = k requires k – 1 comparisons
when solving (2). This is better than enumerating all n! different tours to
find the best one. The most serious drawback of this dynamic programming
solution is the space required, which is O (n 2^n); this is too large even for
modest values of n.

9.4. RETURN ON THE SAME ROUTE (BACKTRACKING)
Among the fundamental principles of algorithm design, backtracking is one
of the most general techniques. Many problems that search for a set of
solutions, or for an optimal solution satisfying some constraints, can be
solved using backtracking.
To use the backtracking method, the solution must be expressible as an
n-tuple (x1, x2, x3, …, xn), where each xi is chosen from a finite set Si.
Sometimes the problem is to maximize or minimize a function
P (x1, x2, x3, …, xn); sometimes it is to find all vectors that satisfy P. For
example, sorting the integers stored in A (1: n) is a problem whose solution
is expressed by an n-tuple, where xi is the index of the ith smallest element.
The function P is the inequality A (xi) ≤ A (xi+1) for 1 ≤ i < n. Sorting
numbers is not itself an example of backtracking; it is just an example of a
problem whose solution can be formulated as an n-tuple. In this section,
some algorithms whose solutions are best found by backtracking are studied.
Suppose the size of set Si is mi. Then there are m = m1 m2 … mn candidate
tuples that could potentially satisfy P. The “brute force” approach would
form all these n-tuples and evaluate each one with P, saving those that
produce the optimum. The virtue of backtracking is its ability to produce the
same answer in far fewer steps. Its basic idea is to build the vector one
component at a time and to evaluate, with the function P (x1, x2, x3, …, xn),
whether the vector formed so far still has a chance of being optimal. The
great advantage of this method is that if it becomes clear that the partial
vector (x1, x2, x3, …, xi) cannot lead to an optimal solution, then the
mi+1 … mn remaining candidate vectors can be ignored entirely.
Many problems solved with backtracking require all solutions to satisfy a
complex set of constraints. These constraints can be divided into two
categories: explicit and implicit. Explicit constraints are rules that restrict
each xi to take values from a given set. Examples of explicit constraints are:
xi ≥ 0, or Si = {all nonnegative real numbers}
xi = 0 or 1, or Si = {0, 1}
li ≤ xi ≤ ui, or Si = {a : li ≤ a ≤ ui}
Explicit constraints may or may not depend on the particular instance I of the
problem being solved. All tuples that satisfy the explicit constraints define a
possible solution space for I. The implicit constraints determine which of the
tuples in the solution space of I actually satisfy the criterion; thus, implicit
constraints describe the way in which the xi must relate to one another.
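As a hedged illustration of these ideas, the following sketch enumerates 0/1 vectors (explicit constraint xi ∈ {0, 1}) whose components sum to exactly 2 (an implicit constraint chosen only for this toy example); partial vectors whose sum already exceeds 2 are pruned:

#include <stdio.h>

#define N 4
int x[N];                         /* partial solution vector          */

/* Backtracking skeleton: build the vector one position at a time and
   abandon (prune) any partial vector that violates the constraints. */
void backtrack(int k, int sum) {
    int c, i;
    if (k == N) {                 /* a complete tuple was formed      */
        if (sum == 2) {
            for (i = 0; i < N; i++)
                printf("%d ", x[i]);
            printf("\n");
        }
        return;
    }
    for (c = 0; c <= 1; c++) {    /* explicit constraint: xi in {0,1} */
        if (sum + c > 2)          /* implicit constraint violated:    */
            continue;             /* prune this whole subtree         */
        x[k] = c;
        backtrack(k + 1, sum + c);
    }
}

int main() {
    backtrack(0, 0);
    return 0;
}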

9.5. THE EIGHT QUEENS PUZZLE (8-QUEENS)


A classic combinatorial problem is to place eight queens on a chessboard so
that no two of them attack each other, that is, so that no two queens are in
the same row, column, or diagonal. Number the rows and columns of the
board from 1 to 8, and the queens from 1 to 8 as well. Since each queen must
be on a different row, we may assume that queen i is placed on row i. Every
solution can then be represented as an 8-tuple (x1, x2, x3, …, x8), where xi
is the column in which queen i is placed. The explicit constraints of this
formulation are Si = {1, 2, 3, …, 8}, 1 ≤ i ≤ 8, so the solution space consists
of 8^8 8-tuples. One implicit constraint of the problem is that no two xi can
be the same (i.e., every queen must be in a different column), and another
that no two queens can be on the same diagonal. The first constraint implies
that all solutions are permutations of the 8-tuple (1, 2, …, 8), which reduces
the solution space from 8^8 tuples to 8! tuples (Figure 9.4).

Figure 9.4: Positions that can attack the queen.

9.5.1. Problem Statement


Since each queen threatens every queen in its own row, each one must be placed in a different row. We can represent the 8 queens with a vector v[1..8], in which each index represents a row and the stored value a column. Thus queen i is at position (i, v[i]) for i = 1, …, 8 (Figure 9.5).

Figure 9.5: Example of two queens threatening each other on a 4-by-4 board.



The vector (3, 1, 6, 2, 8, 6, 4, 7) means that queen 1 is in column 3, row 1; queen 2 in column 1, row 2; queen 3 in column 6, row 3; queen 4 in column 2, row 4; and so on. As shown, this vector is not a valid solution, since queens 3 and 6 would be in the same column. A valid vector must therefore correspond to a permutation of the first eight integers.
The constraints on rows and columns are now covered, but what about the diagonals? Positions on the same descending diagonal share the same value of row − column, while positions on the same ascending diagonal share the same value of row + column. Thus, two queens placed at positions (i, j) and (k, l) are on the same diagonal if and only if
i − j = k − l or i + j = k + l,
or, equivalently,
j − l = i − k or j − l = k − i.
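A small sketch of this test in C follows (the function name and sample positions are assumptions); it returns 1 exactly when the two board positions share a diagonal according to the relations above.

#include <stdio.h>

static int same_diagonal(int i, int j, int k, int l) {
    /* same descending diagonal: i - j = k - l; same ascending: i + j = k + l */
    return (i - j == k - l) || (i + j == k + l);
}

int main(void) {
    printf("%d\n", same_diagonal(1, 3, 3, 5));  /* 1: same descending diagonal */
    printf("%d\n", same_diagonal(1, 3, 2, 2));  /* 1: same ascending diagonal  */
    printf("%d\n", same_diagonal(1, 3, 2, 5));  /* 0: no shared diagonal       */
    return 0;
}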
Taking all these considerations into account, we can apply the backtracking scheme to implement the eight queens problem in a really efficient manner. To do this, we reformulate it as a tree search problem. We say that a vector V[1..k] of integers between 1 and 8 is k-promising, for 0 ≤ k ≤ 8, if none of the k queens placed at positions (1, V1), (2, V2), …, (k, Vk) threatens any of the others. The solutions to our problem correspond to the vectors that are 8-promising.
Construction of the tree: let N be the set of k-promising vectors, 0 ≤ k ≤ 8, and let G = (N, A) be the directed graph such that (U, V) ∈ A if and only if there exists an integer k, with 0 ≤ k < 8, such that:
• U is k-promising;
• V is (k + 1)-promising; and
• Ui = Vi for every i ∈ {1, …, k}.
This graph is a tree. Its root is the empty vector, corresponding to k = 0. Its leaves are either solutions (k = 8) or dead ends (k < 8). The solutions of the eight queens puzzle can be obtained by exploring this tree. However, we do not generate the tree explicitly in order to explore it afterwards: the nodes are generated and discarded during the search, using a depth-first traversal (Figure 9.6).

Figure 9.6: Reduced scheme of the solution tree.


To decide whether a vector is k-promising, knowing that it is an extension of a (k − 1)-promising vector, we only need to check the last queen added. This check can be accelerated if we associate with each promising node the sets of columns, of positive diagonals (at 45 degrees), and of negative diagonals (at 135 degrees) controlled by the queens already placed.

9.5.2. Algorithm Description


The algorithm that yields the solution to our problem is shown below, in which sol[1..8] is a global vector. To print all the solutions, the initial call is queens(0, Ø, Ø, Ø), i.e., with k = 0 and the three sets empty.

Procedure queens (k, col, diag45, diag135)
// sol[1..k] is k-promising
// col = {sol[i] | 1 ≤ i ≤ k}
// diag45 = {sol[i] – i + 1 | 1 ≤ i ≤ k}
// diag135 = {sol[i] + i – 1 | 1 ≤ i ≤ k}
if k = 8 then
    // an 8-promising vector is a solution: write the vector sol
    write sol
else
    // explore the (k + 1)-promising extensions
    for j ← 1 to 8 do
        if j ∉ col and j – k ∉ diag45 and j + k ∉ diag135 then
            sol[k + 1] ← j    // sol[1..k + 1] is (k + 1)-promising
            queens (k + 1, col ∪ {j}, diag45 ∪ {j – k}, diag135 ∪ {j + k})

The algorithm first checks whether k = 8; if so, we have an 8-promising vector that satisfies all the constraints and is therefore a solution. If k is not 8, the algorithm explores the (k + 1)-promising extensions: it runs a loop from 1 to 8 (the number of queens) and checks whether the candidate queen would be attacked by the queens already placed on the board. If it is not attacked, a recursive call is made in which k is increased (we now look for (k + 1)-promising vectors) and the new column and the two new diagonals are added to the constraint sets. Thus each recursive call adds to the sol vector a new queen that does not attack any of the previous ones, while the constraint sets grow accordingly.
An example of the depth-first tree can be seen in Figure 9.7 for the case of 4 queens:

Figure 9.7: Decision tree for the 4-queens program.


The implementation of the N queens program in C is shown below:

#include <stdio.h>
#include <stdlib.h>

void mark(int **matrix, int n, int row, int col);
void empty(int **matrix, int n);
void solution(int **board, int n);
void dam(int **matrix, int **board, int row, int col, int placed, int n);
void backtrack(int **matrix, int **board, int row, int col, int n);

int cont = 0;                     /* number of solutions found */

int main(void) {
    int **matrix, **board, queens;
    printf("Enter the number of queens: ");
    scanf("%d", &queens);
    matrix = (int **)malloc(sizeof(int *) * queens);   /* attacked squares */
    board  = (int **)malloc(sizeof(int *) * queens);   /* placed queens    */
    for (int i = 0; i < queens; i++) {
        matrix[i] = (int *)malloc(sizeof(int) * queens);
        board[i]  = (int *)malloc(sizeof(int) * queens);
    }
    empty(matrix, queens);
    empty(board, queens);
    for (int i = 0; i < queens; i++)          /* first queen in every row of column 0 */
        dam(matrix, board, i, 0, 1, queens);
    if (cont == 0)
        printf("No solutions to the problem with %d queens\n", queens);
    return 0;
}

/* Place a queen at (row, col); if all queens are placed print the board,
   otherwise try every non-attacked row of the next column, then undo. */
void dam(int **matrix, int **board, int row, int col, int placed, int n) {
    matrix[row][col] = 1;
    board[row][col] = 1;
    mark(matrix, n, row, col);
    if (placed == n)
        solution(board, n);
    else
        for (int j = 0; j < n; j++)
            if (matrix[j][col + 1] == 0)
                dam(matrix, board, j, col + 1, placed + 1, n);
    backtrack(matrix, board, row, col, n);
}

/* Remove the queen at (row, col) and rebuild the attack matrix from the
   queens still on the board (the original name "return" is a C keyword). */
void backtrack(int **matrix, int **board, int row, int col, int n) {
    board[row][col] = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            matrix[i][j] = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (board[i][j] == 1)
                mark(matrix, n, i, j);
}

void solution(int **board, int n) {
    printf("Solution %d\n", ++cont);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++)
            printf("%d ", board[i][j]);
        printf("\n");
    }
    printf("\n\n");
}

void empty(int **matrix, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            matrix[i][j] = 0;
}

/* Mark every square attacked by a queen placed at (qrow, qcol). */
void mark(int **matrix, int n, int qrow, int qcol) {
    for (int row = 0; row < n; row++)
        for (int col = 0; col < n; col++)
            if (row + col == qrow + qcol || row - col == qrow - qcol)
                matrix[row][col] = 1;
    for (int i = 0; i < n; i++) {
        matrix[qrow][i] = 1;
        matrix[i][qcol] = 1;
    }
}
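Compiled with, for example, gcc nqueens.c -o nqueens (the file name is an assumption), the program asks for the number of queens and prints every solution as a 0/1 board while counting them in cont; with 4 queens it reports the 2 known solutions and with 8 queens the 92 known solutions.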

9.6. HAMILTONIAN CYCLES (HAMILTONIAN PATH)


In the mathematical field of graph theory, a Hamiltonian path in a graph is a path (a succession of adjacent edges) that visits every vertex of the graph exactly once. If, in addition, the last vertex visited is adjacent to the first, the path is a Hamiltonian cycle.
Hamiltonian paths and cycles are named after William Rowan Hamilton, inventor of the icosian game, a puzzle that involves finding a Hamiltonian cycle along the edges of the graph of a dodecahedron. Hamilton solved this problem using quaternions, but that solution does not generalize to arbitrary graphs.

9.6.1. Definition
A Hamiltonian path is a path that passes through each vertex exactly once. A cycle that passes through each vertex exactly once (except for the vertex at which it starts and ends) is called a Hamiltonian circuit or Hamiltonian cycle. A graph that contains a Hamiltonian cycle is said to be a Hamiltonian graph (Figure 9.8).

Figure 9.8: Example of a Hamiltonian cycle.


The backtracking solution vector (x1, x2, x3, …, xn) is defined so that xi represents the i-th vertex visited in the proposed cycle. All that remains is to determine how to compute the set of possible vertices for xk once x1, …, xk−1 have been chosen. If k = 1, then X(1) can be any of the n vertices; to avoid printing the same cycle n times we require X(1) = 1. If 1 < k < n, then X(k) can be any vertex v distinct from X(1), X(2), …, X(k−1) that is connected by an edge to X(k−1). X(n) can only be the one remaining vertex, and it must be connected to both X(n−1) and X(1).

Procedure NextValue (k)
// X(1), …, X(k–1) is a path of k–1 distinct vertices. If X(k) = 0 then no vertex has yet
// been assigned to X(k). After execution, X(k) is assigned the next highest-numbered vertex
// that (i) does not already appear in X(1), …, X(k–1) and (ii) is connected by an edge to
// X(k–1). Otherwise X(k) = 0. If k = n then, in addition, X(k) must be connected to X(1).
Global integer n, X(1:n); Boolean GRAPH(1:n, 1:n)
Integer k, j
Loop
    X(k) ← (X(k) + 1) mod (n + 1)   // the next vertex
    if X(k) = 0 then return end if
    if GRAPH(X(k–1), X(k)) then      // is there an edge?
        For j ← 1 to k–1 do          // check distinctness
            If X(j) = X(k) then Exit end if
        Repeat
        If j = k then                // if true then the vertex is distinct
            If k < n or (k = n and GRAPH(X(n), 1)) then return End if
        end if
    end if
Repeat
End NextValue

Using the procedure NextValue, we can particularize the recursive backtracking scheme to find all Hamiltonian cycles.

Procedure Hamiltonian (k)
// This procedure uses a recursive backtracking formulation to find all Hamiltonian
// cycles of a graph. The graph is stored as an adjacency matrix GRAPH(1:n, 1:n).
// Every cycle starts at vertex 1.
Global integer X(1:n)
Local integer k, n
Loop
    // generate values for X(k)
    call NextValue(k)
    if X(k) = 0 then return end if
    if k = n then
        print (X, ‘1’)
    else
        call Hamiltonian (k + 1)
    end if
Repeat
End Hamiltonian

This procedure is used after first initializing the adjacency matrix GRAPH(1:n, 1:n), setting X(2:n) ← 0 and X(1) ← 1, and then executing the call Hamiltonian(2).
The traveling salesman problem (TSP) is the search for a Hamiltonian cycle with the difference that each edge has a cost and the cheapest cycle is sought.
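For comparison, a compact C sketch of the same backtracking idea is given below, using 0-based vertex numbers and fixing the cycle to start at vertex 0; the 4-vertex adjacency matrix is only an illustrative assumption.

#include <stdio.h>

#define N 4
int graph[N][N] = {        /* adjacency matrix of an example graph */
    {0, 1, 1, 1},
    {1, 0, 1, 0},
    {1, 1, 0, 1},
    {1, 0, 1, 0}
};
int x[N];                  /* x[i] = i-th vertex of the partial cycle */

static int used(int v, int k) {          /* v already in x[0..k-1]?   */
    for (int i = 0; i < k; i++)
        if (x[i] == v) return 1;
    return 0;
}

static void hamiltonian(int k) {
    if (k == N) {                        /* all vertices placed: close the cycle */
        if (graph[x[N - 1]][x[0]]) {
            for (int i = 0; i < N; i++) printf("%d ", x[i]);
            printf("%d\n", x[0]);
        }
        return;
    }
    for (int v = 1; v < N; v++)          /* vertex 0 is fixed as the start */
        if (graph[x[k - 1]][v] && !used(v, k)) {
            x[k] = v;
            hamiltonian(k + 1);
        }
}

int main(void) {
    x[0] = 0;
    hamiltonian(1);
    return 0;
}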
Chapter 10

Branch and Bound

CONTENTS
10.1. General Description ..................................................................... 182
10.2. Pruning Strategies .......................................................................... 184
10.3. Branching Strategies...................................................................... 184
10.4. The Traveling Salesman Problem (TSP) .......................................... 187

The branch and bound algorithm design method is a substantially improved variant of backtracking. The term applies mainly to the solution of optimization problems.
The branch and bound technique is usually visualized as a solution tree in which each branch leads from the current partial solution to a possible complete solution. What distinguishes this technique from the previous ones (and gives it its name) is that the algorithm detects the branches in which the solutions being generated can no longer be optimal, so that it can "prune" that branch of the tree and stop wasting resources and processing on cases far from the optimal solution.

10.1. GENERAL DESCRIPTION


Our goal is to find the minimum value of a function f(x) (one example is the manufacturing cost of a particular product), where x ranges over a given set S of possible solutions. A branch and bound method requires two tools.
The first is an expansion process that, given a set S of candidates, returns two or more smaller sets S1, S2, …, Sn whose union covers S. Note that the minimum of f(x) over S is min{v1, v2, …}, where each vi is the minimum of f(x) within Si. This step is called branching; its application is recursive and defines a tree structure whose nodes are subsets of S.
The key idea of the branch and bound algorithm is this: if the lower bound of a tree node (set of candidates) A is greater than the upper bound of another node B, then A can be safely discarded from the search. This step is called pruning, and it is usually implemented by maintaining a global variable m that records the minimum upper bound seen among all the subregions examined so far; any node whose lower bound is greater than m can be discarded. The recursion stops when the candidate set S is reduced to a single element, or when the lower bound for the set S matches its upper bound; in either case, any element of S is a minimum of the function on S. Pseudocode for the branch and bound algorithm is as follows:

function RyP (x, k) {
    P = children (x, k)
    while (not empty (P)) {
        x(k) = extract (P)
        if esFactible (x, k) and G (x, k) < optimal
            if esSolution (x)
                store (x)
            else
                RyP (x, k + 1)
    }
}

where:
• G(x) is the bounding (estimation) function of the algorithm;
• P is the stack of possible solutions;
• esFactible is the function that decides whether a partial proposal is valid;
• esSolution is the function that checks whether the goal has been reached;
• optimal is the value of the objective function evaluated on the best solution found so far.
Note: we use less than (<) for minimization problems and greater than (>) for maximization problems.

10.1.1. Effective Subdivision


The efficiency of this method depends primarily on the node expansion process and on the quality of the lower and upper bound estimates for the nodes. It is best to choose an expansion method that produces non-overlapping subsets, to avoid duplicated branches.
Ideally, the procedure stops when all nodes of the search tree have been either pruned or solved. At that point, all unpruned subregions have lower and upper bounds equal to the global minimum of the function. In practice, the procedure often ends when a given time limit is reached; at that point, the minimum lower bound and the maximum upper bound among all the unpruned sections define a range of values that contains the global minimum. Alternatively, within an overall time limit, the algorithm can be stopped when some error criterion, such as (max − min)/(max + min), falls below a specified value.
The efficiency of the method depends especially on the effectiveness of the branching and pruning algorithms used. A bad choice can lead to repeated branching without any pruning until the subregions become very small; in that case the method is reduced to an exhaustive enumeration of the domain, which is often impractically large. There is no universal pruning algorithm that works for all problems, and there is little hope that one will ever be found. Until then, the paradigm must be implemented separately for each application, with branching and pruning algorithms specially designed for it.

Branch and bound methods can be classified according to the pruning methods used and according to the ways in which the search tree is created and ordered.
The branch and bound design strategy is very similar to backtracking in that a state-space tree is used to solve the problem. The differences are that branch and bound does not restrict us to any particular way of traversing the tree, and that it is used only for optimization problems.

10.2. PRUNING STRATEGIES


Our main objective is to eliminate the nodes that cannot lead to good solutions. We can use two basic strategies. Suppose a maximization problem in which several nodes i = 1, …, n have been visited, and for each one an upper bound CS(xi) and a lower bound CI(xi) have been estimated. We will work on a problem in which the value is to be maximized (for a minimization problem the equivalent strategy applies).
Strategy 1: If a valid solution can be obtained from a node xi, the node can nevertheless be pruned whenever its upper bound CS(xi) is less than or equal to the lower bound CI(xj) of some node j already generated in the tree.
For example, consider the knapsack problem explored with a binary tree. If from xi the maximum attainable benefit is CS(xi) = 4, while from xj a minimum benefit of CI(xj) = 5 is already guaranteed, we conclude that node xi can be pruned without losing any possible optimal solution.
Strategy 2: If a valid solution to the problem with profit Bj has been obtained, then we prune every node xi whose upper bound CS(xi) is less than or equal to Bj (the profit Bj plays the same role as a lower bound). A minimal sketch of this pruning test is given below.
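The following is a hedged, minimal C instance of this test applied to the 0/1 knapsack (a maximization problem, so the comparison is "greater than"): the bound G is the classic fractional-knapsack estimate, and the item data are illustrative assumptions.

#include <stdio.h>

#define N 4
int w[N] = {2, 3, 4, 5};       /* weights  (items sorted by value/weight) */
int v[N] = {3, 4, 5, 6};       /* values   */
int W = 8;                      /* capacity */
int best = 0;                   /* "optimal": best value found so far     */

/* Upper bound: value collected so far plus the best that could still be
   added if the remaining items could be taken fractionally. */
static double bound(int k, int weight, int value) {
    double b = value;
    int room = W - weight;
    for (int i = k; i < N && room > 0; i++) {
        int take = w[i] < room ? w[i] : room;
        b += (double)v[i] * take / w[i];
        room -= take;
    }
    return b;
}

static void knapsack(int k, int weight, int value) {
    if (weight > W) return;                 /* infeasible partial solution  */
    if (value > best) best = value;         /* record the best profit Bj    */
    if (k == N) return;
    if (bound(k, weight, value) <= best)    /* prune: CS(xi) <= Bj          */
        return;
    knapsack(k + 1, weight + w[k], value + v[k]);  /* branch: take item k   */
    knapsack(k + 1, weight, value);                /* branch: skip item k   */
}

int main(void) {
    knapsack(0, 0, 0);
    printf("best value = %d\n", best);
    return 0;
}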

10.3. BRANCHING STRATEGIES


As discussed in the introduction to this section, the way the tree is expanded under the different strategies is driven by the search for the optimal solution. Because of this, all the nodes of a level must be expanded before moving on to a new level, which is logical: in order to choose which branch of the tree to explore, all the possible branches must be known.
All the nodes that have been generated but not yet examined are stored in what is called the list of live nodes (from now on, LNV), the nodes pending expansion by the algorithm. The LNV therefore contains every node that has been generated but not yet explored. Depending on how the nodes are stored in this list, the tree traversal will differ, leading to the three strategies listed below.
FIFO strategy: With the FIFO (first in, first out) strategy the LNV is a queue, resulting in a breadth-first traversal of the tree (Figure 10.1).

Figure 10.1: FIFO branching strategy.


The figure shows that we begin by inserting node A into the LNV. We remove the node at the front of the queue and expand it, creating nodes B and C, which are inserted into the LNV. Then the first node, which is B, is removed and expanded in turn, generating nodes D and E, which are inserted into the LNV. This process is repeated while any item remains in the queue.
LIFO strategy: With the LIFO (last in, first out) strategy the LNV is a stack, producing a depth-first traversal of the tree (Figure 10.2).

Figure 10.2: LIFO branching strategy.


The figure shows the order in which nodes are generated with the LIFO strategy. The process followed with the LNV is similar to that of the FIFO strategy, but a stack is used instead of a queue.

Least-Cost or LC strategy: The LIFO and FIFO strategies perform what is called a "blind" search, since nodes are expanded regardless of the benefit that can be obtained from each one. If the expansion is instead guided by the benefit that each node promises (a "far-sighted" search), a substantial improvement can be achieved in most cases.
This is how the Least-Cost (LC) strategy was born: among all the nodes in the LNV, the one with the greatest benefit (or lowest cost) is selected for expansion, so we are no longer talking about a blind search. It can happen that several nodes qualify for expansion at the same time; in that case a mechanism is needed to resolve the conflict:
• LC-FIFO strategy: choose from the LNV the node with the greatest benefit and, in case of a tie, the one that was inserted first (see the sketch after this list).
• LC-LIFO strategy: choose from the LNV the node with the greatest benefit and, in case of a tie, the one that was inserted last.
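A minimal sketch of the LNV managed with the LC-FIFO rule is shown below; the Node fields and the MAXN limit are illustrative assumptions. Extraction scans the array for the node with the greatest benefit and breaks ties in favour of the earliest insertion.

#include <stdio.h>

typedef struct {
    int benefit;     /* estimated benefit of expanding this node */
    int order;       /* insertion order, used to break ties      */
    int data;        /* problem-specific state would go here     */
} Node;

#define MAXN 100
static Node lnv[MAXN];
static int nlive = 0, nextorder = 0;

static void insert(Node n) {
    n.order = nextorder++;
    lnv[nlive++] = n;
}

static Node extract_lc_fifo(void) {
    int best = 0;
    for (int i = 1; i < nlive; i++)
        if (lnv[i].benefit > lnv[best].benefit ||
            (lnv[i].benefit == lnv[best].benefit && lnv[i].order < lnv[best].order))
            best = i;
    Node n = lnv[best];
    lnv[best] = lnv[--nlive];     /* remove by swapping with the last node */
    return n;
}

int main(void) {
    insert((Node){5, 0, 1});
    insert((Node){9, 0, 2});
    insert((Node){9, 0, 3});
    while (nlive > 0) {
        Node n = extract_lc_fifo();
        printf("expand node %d (benefit %d)\n", n.data, n.benefit);
    }
    return 0;
}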

10.3.1. Branch and Bound “Relaxed”


A more efficient variant of the branch and bound method can be obtained by "relaxing" the problem, i.e., removing some of its constraints to make it more permissive.
Any valid solution to the original problem is a valid solution to the "relaxed" problem, but the converse does not have to hold. If we can solve the relaxed version optimally and the solution obtained happens to be valid for the original problem, then it is also optimal for the original problem.
The real usefulness of this idea lies in having an efficient method for solving the relaxed problem. One of the best-known such methods is branch and cut.

10.3.2. Branch and Cut


Branch and cut is a combinatorial optimization method for solving integer linear programs, that is, linear programming problems in which some or all of the unknowns are restricted to integer values. It is a hybrid of branch and bound and cutting-plane methods.
The method first solves the linear program without the integrality constraints using the regular simplex algorithm. When an optimal solution is obtained that has a non-integer value for a variable that must be integer, a cutting-plane algorithm is used to find a further linear constraint that is satisfied by all feasible integer points but violated by the current fractional solution. If such an inequality is found, it is added to the linear program, so that resolving it will lead to a different solution that is hopefully "less fractional." This process is repeated until either an integer solution is found (which can then be shown to be optimal) or no more cutting planes are found.
At this point, the branch and bound part of the algorithm begins. The problem is split into two versions: one with the added constraint that the fractional variable is greater than or equal to the next integer above the intermediate result, and one where it is less than or equal to the next integer below. New subproblems are introduced in this way for the basic variables that are not integer in the intermediate solution but must be integer according to the original restrictions. The new linear programs are then solved in the same way, and the process is repeated until a solution satisfying all the integrality constraints is found.
During branch and bound, further cutting planes can be generated; they may be either global cuts, valid for all feasible integer solutions, or local cuts, satisfied by all solutions that fulfill the side constraints of the branch and bound subtree currently under consideration.

10.4. THE TRAVELING SALESMAN PROBLEM (TSP)
The dynamic programming algorithm for the traveling salesman problem has a complexity of O(n^2 2^n). We now look at the problem from the branch and bound perspective. With a good bounding function, the branch and bound algorithm can in some cases solve the problem in much less time than dynamic programming requires.
Let G = (V, E) be a graph defining an instance of the traveling salesman problem (TSP), let cij be the cost of edge <i, j>, with cij = ∞ if <i, j> ∉ E, and let |V| = n. Without loss of generality we can assume that every tour starts and ends at vertex 1. Thus the solution space S is given by S = {1, Π, 1 | Π is a permutation of (2, 3, …, n)}, so |S| = (n − 1)!. The size of S can be reduced by restricting S so that (1, i1, i2, …, in−1, 1) ∈ S if and only if <ij, ij+1> ∈ E, 0 ≤ j ≤ n − 1, with i0 = in = 1. S can be organized into a state space tree. The following figure shows the organization of the tree for a complete graph with |V| = 4. Each leaf node L is a solution node and represents the tour defined by the path from the root to L. Node 14 represents the tour i0 = 1, i1 = 3, i2 = 4, i3 = 2, and i4 = 1 (Figure 10.3).

Figure 10.3: State space tree for a traveling salesman problem with n = 4 and i0 = i4 = 1.
To use LC branch and bound to search the traveling salesman state space tree, we need to define a cost function c(.) and two other functions ĉ(.) and u(.) such that ĉ(R) ≤ c(R) ≤ u(R) for every node R; c(.) is chosen so that the solution node with least c(.) corresponds to a shortest tour of G. One way to choose ĉ is the following.
A simple ĉ(.) such that ĉ(A) ≤ c(A) for all A is obtained by defining ĉ(A) to be the length of the path already defined at node A. For example, the path defined at one node of the preceding tree is i0, i1, i2 = 1, 2, 4; it consists of the edges <1, 2> and <2, 4>. A better ĉ(.) can be obtained by using the reduced cost matrix corresponding to G. A row (column) is reduced if and only if it contains at least one zero and all other entries are non-negative; a matrix is reduced if and only if every row and column is reduced. As an example of reducing the cost matrix of a given graph G, consider the matrix shown in the figure, which corresponds to a graph with five vertices. Every tour includes exactly one edge <i, j> with i = k, 1 ≤ k ≤ 5, and exactly one edge <i, j> with j = k, 1 ≤ k ≤ 5, so subtracting a constant t from all the elements of one row or one column of the cost matrix reduces the length of every tour by exactly t units. A minimum-cost tour remains a minimum-cost tour after this subtraction operation. If t is chosen to be the minimum entry in row i (column j), subtracting it from all entries in row i (column j) introduces a zero into row i (column j). By repeating this procedure as often as necessary, the cost matrix can be reduced. The total amount subtracted from all the columns and rows is a lower bound and can be used as the ĉ value of the root of the state space tree. Subtracting 10, 2, 2, 3, 4, 1, and 3 from rows 1, 2, 3, 4, and 5 and from columns 1 and 3, respectively, the matrix of part (a) of the figure above yields the reduced matrix of part (b) of the same figure. The total amount subtracted is 25. Therefore, every tour of the original graph has a length of at least 25 units.
A reduced cost matrix can be associated with every node of the traveling salesman state space tree. Let A be the reduced cost matrix of node R, and let S be a child of R such that the tree edge (R, S) corresponds to including edge <i, j> in the tour. If S is not a leaf, then the cost matrix for S can be obtained as follows:
• Having chosen the edge <i, j>, change every entry in row i and column j of A to ∞. This prevents the use of any further edges leaving vertex i or entering vertex j.
• Set A(j, 1) to ∞. This prevents the use of the edge <j, 1> before the tour is complete.
• Reduce all rows and columns of the resulting matrix, except those consisting only of ∞, accumulating the total amount subtracted in the variable r. The resulting matrix is B.
• Then ĉ(S) = ĉ(R) + A(i, j) + r.
Here S is the current node and R its parent. The first two steps are valid because no tour in the subtree rooted at S can contain an edge of the type <i, k> or <k, j> or <j, 1> (except the edge <i, j> itself). In step 3, r is the total amount subtracted, which gives ĉ(S) = ĉ(R) + A(i, j) + r. For leaf nodes ĉ(.) = c(.) is easy to compute, because each branch down to a leaf defines a unique tour. For the upper bound function u it suffices to use u(R) = ∞ for every node R.
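A hedged C sketch of these steps follows: reduce() subtracts the row and column minima (skipping all-∞ lines) and returns the total r, and child_cost() applies steps 1 to 3 and returns ĉ(S) = ĉ(R) + A(i, j) + r. The 5-vertex matrix in main() is the standard example consistent with the reductions quoted above (10, 2, 2, 3, 4 for the rows and 1, 3 for columns 1 and 3), so reduce() returns 25 for the root; the function and variable names are assumptions.

#include <stdio.h>
#define N 5
#define INF 999

/* Reduce every row and column that is not all-INF; return the total subtracted. */
static int reduce(int m[N][N]) {
    int r = 0;
    for (int i = 0; i < N; i++) {
        int min = INF;
        for (int j = 0; j < N; j++) if (m[i][j] < min) min = m[i][j];
        if (min != INF && min > 0) {
            r += min;
            for (int j = 0; j < N; j++) if (m[i][j] != INF) m[i][j] -= min;
        }
    }
    for (int j = 0; j < N; j++) {
        int min = INF;
        for (int i = 0; i < N; i++) if (m[i][j] < min) min = m[i][j];
        if (min != INF && min > 0) {
            r += min;
            for (int i = 0; i < N; i++) if (m[i][j] != INF) m[i][j] -= min;
        }
    }
    return r;
}

/* Build the child matrix b from the parent matrix a when edge <i, j> is taken
   (0-based vertices, vertex 0 playing the role of vertex 1 in the text),
   and return c(S) = c(R) + a[i][j] + r. */
static int child_cost(int a[N][N], int b[N][N], int i, int j, int cost_R) {
    int aij = a[i][j];
    for (int x = 0; x < N; x++)
        for (int y = 0; y < N; y++) b[x][y] = a[x][y];
    for (int k = 0; k < N; k++) { b[i][k] = INF; b[k][j] = INF; }  /* step 1 */
    b[j][0] = INF;                                                 /* step 2 */
    int r = reduce(b);                                             /* step 3 */
    return cost_R + aij + r;
}

int main(void) {
    int a[N][N] = {
        {INF, 20, 30, 10, 11},
        { 15,INF, 16,  4,  2},
        {  3,  5,INF,  2,  4},
        { 19,  6, 18,INF,  3},
        { 16,  4,  7, 16,INF}
    };
    int b[N][N];
    int root = reduce(a);                       /* lower bound of the root: 25 */
    printf("c(root) = %d\n", root);
    printf("c(S) for edge <1,2> = %d\n", child_cost(a, b, 0, 1, root));
    return 0;
}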
For the transition tree, the following example will be taken. We use the transition notation I_s = y, where I indicates the transition, s indicates the level (the child node), and y indicates the column.
Thus, the tree path for the example is given in Figure 10.4.

Figure 10.4: Possible paths.


To obtain the lower bound, the next step is performed: the reduction of the matrix. The sum of all the amounts subtracted will be the lower bound ĉ(.). Returning to the first example of the traveling salesman we have:

Subtracting by rows:
From row 1 we subtract 10, so r = 10.
From row 2 we subtract 2, so r = 12.
From row 3 we subtract 2, so r = 14.
From row 4 we subtract 3, so r = 17.
From row 5 we subtract 4, so r = 21.
The resulting matrix is:

Subtracting by columns:
From column 1 we subtract 1, so r = 22.
From column 3 we subtract 3, so r = 25.
Thus ĉ(.) = 25 and the upper bound is U = ∞. The resulting matrix is:

For S = 2 (the edge <1, 2>): knowing the lowest cost, the first part of the path is chosen and the entry A(2, 1) is set to ∞.

In this case every row and column already contains a zero, so r = 0. In the next case every row has at least one zero, but the first column does not, so r = 11.

The lowest-cost node is taken, so the new parent node is S = 4.

With S = 6 we have the minimum route:



Note that the leaf node with the lowest cost is S = 10. Therefore C(11) = 28 + 0 + 0 = 28.

Adding up the edges of the path gives:
Node 1 to node 4: 10 units
Node 4 to node 2: 6 units
Node 2 to node 5: 2 units
Node 5 to node 3: 7 units
Node 3 to node 1: 3 units
Total: 28 units.

A fragment of the corresponding C implementation (node structure, blocking of a chosen edge, matrix reduction, and cost-matrix input) is shown below:

#include <stdio.h>
#include <stdlib.h>

typedef struct List {
    int **matrix;        /* reduced cost matrix of this node        */
    int *marks;          /* cities already visited                  */
    int cost;            /* accumulated reduction = lower bound     */
    int counter, city, mc;
    struct List *next;   /* next node in the list of live nodes     */
} List;

/* Block row x and column y (edge <x, y> chosen), and also the entry <y, x>. */
void block(List *p, int size, int x, int y) {
    for (int i = 0; i < size; i++) {
        p->matrix[x][i] = 999;
        p->matrix[i][y] = 999;
    }
    p->matrix[y][x] = 999;
}

/* Subtract the minimum from every finite, nonzero entry of row i. */
void restar_fila(List *q, int size, int min, int i) {
    for (int k = 0; k < size; k++)
        if (q->matrix[i][k] != 999 && q->matrix[i][k] != 0)
            q->matrix[i][k] -= min;
}

/* Subtract the minimum from every finite, nonzero entry of column i. */
void restar_columna(List *q, int size, int min, int i) {
    for (int k = 0; k < size; k++)
        if (q->matrix[k][i] != 999 && q->matrix[k][i] != 0)
            q->matrix[k][i] -= min;
}

/* Reduce the matrix, accumulating the total subtracted in q->cost. */
void cost(List *q, int size) {
    int min;
    for (int i = 0; i < size; i++) {
        min = q->matrix[i][0];
        for (int j = 1; j < size; j++)
            if (min > q->matrix[i][j]) min = q->matrix[i][j];
        if (min != 999) q->cost += min;
        if (min != 0 && min != 999) restar_fila(q, size, min, i);
    }
    for (int i = 0; i < size; i++) {
        min = q->matrix[0][i];
        for (int j = 1; j < size; j++)
            if (min > q->matrix[j][i]) min = q->matrix[j][i];
        if (min != 999) q->cost += min;
        if (min != 0 && min != 999) restar_columna(q, size, min, i);
    }
}

/* Read the cost matrix from the user; 999 represents infinity. */
void generate(List *q, int size) {
    int cont = 0, destination;
    for (int i = 0; i < size; i++)
        for (int j = 0; j < size; j++)
            q->matrix[i][j] = 999;
    printf("To stop entering a city, enter 99\n");
    while (cont < size) {
        printf("\nCity %d\n", cont + 1);
        do {
            printf("Where does city %d go to: ", cont + 1);
            scanf("%d", &destination);
            if (destination == 99)
                printf("Finished city %d\n", cont + 1);
            else if (destination > size || destination <= 0)
                printf("This city does not exist\n");
            else if (destination - 1 == cont)
                printf("You are in this city\n");
            else {
                printf("Enter the cost from %d to %d: ", cont + 1, destination);
                scanf("%d", &q->matrix[cont][destination - 1]);
            }
        } while (destination != 99);
        cont++;
    }
}
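Note that this listing is only a fragment: it defines the node structure, the blocking of the row and column of a chosen edge, the row and column reductions that produce the lower bound, and the interactive construction of the cost matrix. The driver that would manage the list of live nodes, pick the node with the smallest bound, and expand it is not shown here.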

The idea of having an algorithm or recipe for performing some task has existed for thousands of years. For many years people also believed that if a problem could be stated precisely, then with enough effort it would eventually be possible to find a solution (or perhaps a proof that no solution can be given). In other words, it was believed that no problem was so inherently difficult that, in principle, it could never be resolved.
It was thought that mathematics is a consistent system, i.e., that no contradictions can be derived from the initial axioms. Later the problem was expanded (becoming the decision problem) to include proving that arithmetic is complete (there is a proof for every true mathematical proposition) and decidable (there is an effective method that decides, for each possible proposition, whether it is true or false).
One of the main promoters of this belief was the famous mathematician David Hilbert (1862–1943). Hilbert believed that everything in mathematics could and should be proved from the basic axioms. The result would be to demonstrate conclusively the two basic elements of the mathematical system. First, mathematics should be able, at least in theory, to answer any particular question. Second, mathematics should be free of inconsistencies; that is, once the truth of a statement had been demonstrated by one method, it should not be possible to conclude by another method that the same statement is false. Hilbert was convinced that, starting from just a few axioms, it would be possible to answer every conceivable mathematical question without fear of contradiction.
On August 8, 1900, Hilbert delivered a historic lecture at the International Congress of Mathematicians in Paris. Hilbert proposed twenty-three unsolved mathematical problems that he considered matters of the most urgent importance. Hilbert sought to galvanize the community into helping him realize his dream of creating a mathematical system free of doubt and inconsistency. At the same time, the English logician Bertrand Russell, who was also contributing to Hilbert's great project, had stumbled upon an inconsistency. Russell recalled his own reaction to the dreaded possibility that mathematics might be inherently contradictory.

There was no escaping the contradiction. Russell's work caused considerable damage to the dream of creating a mathematical system free of doubt, inconsistency, and paradox.
Russell's paradox is often explained with the story of the meticulous librarian. One day, wandering among the shelves, the librarian discovers a collection of catalogs: there are separate catalogs for novels, reference books, poetry, and so on. He notices that some of the catalogs list themselves while others do not.
In order to simplify the system, the librarian produces two new catalogs: one listing all the catalogs that list themselves and another, even more interesting, listing all those that do not list themselves. Should this second catalog list itself? If it is included, then by definition it should not be included; but if it is not included, then by definition it should be included. The librarian is in an impossible situation.
The inconsistency that torments the librarian causes problems in the logical structure of mathematics, which cannot tolerate inconsistencies, paradoxes, or contradictions. The powerful tool of proof by contradiction, for example, relies on mathematics being free of paradox. Proof by contradiction states that if an assumption leads to absurdity, then it must be false; but according to Russell even the axioms could lead to absurdity. Proof by contradiction could then show that an axiom is false, even though the axioms are the foundation of mathematics and are accepted as true.
Russell's work shook the foundations of mathematics and dragged the study of mathematical logic into a state of chaos. One way to address the problem was to add an axiom forbidding any set from being a member of itself. That would defuse Russell's paradox and make superfluous the question of whether the catalog of catalogs that do not list themselves should include itself.
In 1910, Russell published the first of the three volumes of Principia Mathematica, an attempt to address the problem arising from his own paradox. When Hilbert retired in 1930, he was certain that mathematics was on the road to recovery; apparently, his dream of a logic coherent and solid enough to answer any question was on the way to becoming reality.
But in 1931 an unknown twenty-five-year-old mathematician published an article that would destroy Hilbert's hopes forever. Kurt Gödel would force mathematicians to accept that mathematics could never be logically perfect, and his work carried the implication that problems like Fermat's last theorem might be unsolvable.


Gödel had shown that the attempt to create a complete and consistent mathematical system was an impossible task. His ideas can be summarized in two propositions.
First theorem of undecidability: If the set of axioms of a theory is consistent, then there are statements that can neither be proved nor disproved.
Second theorem of undecidability: There is no constructive procedure capable of demonstrating that an axiomatic theory is consistent.
Gödel's first statement basically says that, regardless of the set of axioms used, there will be questions that mathematics cannot answer: completeness can never be achieved. Worse still, the second statement says that mathematicians can never be sure that the axioms they have chosen do not lead to a contradiction: consistency can never be proved. Gödel proved that Hilbert's program was an impossible task.
Although Gödel's second statement said that it was impossible to prove that the axioms are consistent, this did not mean that they were inconsistent. Many years later the great number theorist André Weil said: "God exists since mathematics is consistent, and the Devil exists since we cannot prove it."
The proof of Gödel's theorems is tremendously complicated, but it can be illustrated with a logical analogy that we owe to Epimenides and that is known as the Cretan paradox. Epimenides was a Cretan who declared: "I am a liar."
The paradox arises when we try to determine whether this proposition is true or false. If the proposition is true, then Epimenides is a liar; but a liar would not have made a true statement, so we reach a contradiction. If the proposition is false, then Epimenides is not a liar; yet we have accepted that he made a false statement, which makes him a liar. Either way we find an inconsistency, and therefore the statement is neither true nor false.
Gödel gave a new interpretation to the liar paradox by incorporating the concept of proof. The result was a sentence like the following: "This statement has no proof."
Since Gödel managed to translate this proposition into mathematical notation, he was able to show that there are mathematical statements that can never be proved; they are called undecidable statements. This was the mortal blow to Hilbert's program.
It also showed that Hilbert's decision problem is not computable.

That is, there is no algorithm of the kind Hilbert sought. A cynic might say that mathematicians breathed a sigh of relief, for if such an algorithm existed they would all be out of work as soon as it was found. In reality, mathematicians were genuinely surprised by this remarkable discovery.
The decision problem challenged symbolic logicians to find a general algorithm deciding whether a given first-order statement is a theorem. In 1936, Alonzo Church and Alan Turing independently showed that it is impossible to write such an algorithm. As a consequence, it is also impossible to decide algorithmically whether certain specific arithmetic sentences are true or false.
The question goes back to Gottfried Leibniz who, in the seventeenth century, after successfully building a mechanical calculating machine, dreamed of building a machine that could manipulate symbols to determine whether a mathematical sentence is a theorem. The first thing needed would be a clear and precise formal language. In 1928, David Hilbert and Wilhelm Ackermann posed the problem in the form mentioned above.
A first-order logical formula is called universally valid if it follows logically from the axioms of the first-order calculus. Gödel's completeness theorem establishes that a logical formula is universally valid in this sense if and only if it is true in every interpretation of the formula in a model.
Before this question could be answered, it was necessary to define formally the notion of algorithm. This was done by Alonzo Church in 1936 with the concept of "effective calculability," based on his lambda calculus, and by Alan Turing, based on the Turing machine. The two approaches are equivalent, in the sense that exactly the same problems can be solved with both.
The negative answer to the decision problem was given by Alonzo Church in 1936 and, independently and shortly afterwards, by Alan Turing, also in 1936. Church demonstrated that no algorithm (defined via the recursive functions) can decide whether two lambda calculus expressions are equivalent; for this he relied on earlier work by Stephen Kleene. Turing, for his part, reduced the question to the halting problem for Turing machines, and it is generally considered that Turing's proof has been more influential than Church's. Both works were influenced by Kurt Gödel's earlier work on the incompleteness theorem, especially by the method of assigning numbers to logical formulas in order to reduce logic to arithmetic.
Turing's argument is as follows: suppose we had a general decision algorithm for first-order logic. The question of whether a Turing machine halts can be translated into a first-order formula, which could then be submitted to the decision algorithm. But Turing had already shown that there is no general algorithm that can decide whether a Turing machine halts.
It is important to note that if the problem is restricted to a specific first-order theory with fixed constants, predicates, and axioms, there may be a decision algorithm for that theory. Some examples of decidable theories are Presburger arithmetic and the static type systems of programming languages.
However, the general first-order theory of the natural numbers known as Peano arithmetic cannot be decided with that kind of algorithm; this follows from Turing's argument summarized above.
In addition, Gödel's theorem showed that there is no algorithm whose input can be an arbitrary statement about the integers and whose output says whether it is true or not. Following closely on Gödel, other mathematicians such as Alonzo Church, Stephen Kleene, Emil Post, Alan Turing, and many others found more problems that lack any algorithmic solution. Perhaps the most striking feature of these early results about problems that computers cannot solve is that they were obtained in the 1930s, before the first computer was ever built!
Chapter 11

Turing’s Hypothesis

CONTENTS
11.1. The Church–Turing Hypothesis ...................................................... 202
11.2. Complexity ................................................................................... 202
11.3. The Sequential Computability Thesis ............................................. 203
11.4. NP Problems................................................................................. 204

11.1. THE CHURCH–TURING HYPOTHESIS


There is a significant obstacle to proving that no algorithm exists for a specific task: first, one needs to know exactly what "algorithm" means. Each of the mathematicians mentioned in the previous section overcame this obstacle by defining "algorithm" in a different way.
Gödel defined an algorithm as a sequence of rules for forming complicated mathematical functions out of simpler mathematical functions. Church used a formalism called the lambda calculus. Turing used a hypothetical machine, now known as the Turing machine, and defined an algorithm as any set of instructions for his simple machine.
These apparently different, independently created definitions turn out to be equivalent. As researchers became increasingly aware of this equivalence in the early 1930s, the following two propositions became widely believed:
1. All reasonable definitions of "algorithm" known so far are equivalent.
2. Any reasonable definition of "algorithm" that is ever proposed will be equivalent to the known definitions.
These beliefs have come to be known as the Church–Turing thesis, in honor of two of the first workers to realize the fundamental nature of the concept they had defined. So far no evidence against the Church–Turing thesis has appeared, and it is widely accepted.
In a modern approach, one can define "algorithm" as anything that can run on a computer. Given two modern computers, it is possible to write a program for one of them that can interpret and run the programs of the other. The equivalence between any modern computer, the Turing machine, and the numerous other ways of defining "algorithm" is further evidence for the Church–Turing thesis. This property of algorithms is known as universality. In informal terms, universality means that any computer is equivalent to all the others, in the sense that they can all perform the same tasks.

11.2. COMPLEXITY
The study of computability leads to an understanding of which problems admit an algorithmic solution and which do not. For the problems for which algorithms do exist, it is also interesting to know how many computing resources are needed to run them; only algorithms that use a feasible amount of resources are useful in practice. The field of computer science called complexity theory raises, and tries to resolve, questions about the use of computing resources.
The following figure gives a pictorial representation of the universe of problems. Those that can be solved algorithmically form an infinitesimally small subset. Those that are feasibly computable, taking their resource needs into account, comprise a tiny portion of that already infinitesimally small subset. Nevertheless, the class of feasibly computable problems is so large that computer science has become an interesting, practical, and flourishing science (Figure 11.1).

Figure 11.1: Computable and non-computable problems.

11.3. THE SEQUENTIAL COMPUTABILITY THESIS


The preceding sections described the difference between polynomial and exponential algorithms. Polynomial algorithms tend to be feasible for reasonable input sizes, whereas exponential algorithms tend to exceed the available resources even for small amounts of input data. One of the objectives of complexity theory is to refine this classification of algorithms and, in doing so, to understand the difference between feasible and infeasible problems.
If an algorithm is constructed by taking two feasible algorithms and placing them sequentially one after the other, the resulting algorithm should be feasible. Similarly, if a step of a feasible algorithm is replaced by a call to a module that represents a second feasible algorithm, the new combined algorithm should also be feasible. This closure property does in fact hold for polynomial-time algorithms.
Any algorithm that runs in polynomial time on one computer runs in polynomial time on any other. Hence it makes sense to speak of polynomial-time algorithms independently of any specific computer: a workable theory of algorithms based on polynomial time is machine-independent.
The belief that all reasonable sequential computers have polynomially related execution times is called the sequential computation thesis. This thesis is comparable to the Church–Turing thesis; indeed it is a stronger version of it, claiming not only that the computable problems are the same for all computers, but also that the feasibly computable problems are the same for all computers.

11.4. NP PROBLEMS
This section presents what is perhaps the most important development in algorithms research of the early 1970s, with impact not only in computer science but also in electrical engineering, operations research, and related areas.
An important idea is the distinction between a group of problems whose
solution is obtained in polynomial time and a second group of problems
whose solution is not obtained in polynomial time.
The theory of NP-completeness does not provide algorithms for solving the problems of the second group in polynomial time, nor does it prove that no such algorithms exist. Instead, it shows that all the problems for which no polynomial-time algorithm is currently known are computationally related. In fact, two classes of problems can be defined: the NP-hard problems and the NP-complete problems. An NP-complete problem has the property that it can be solved in polynomial time if and only if every other NP-complete problem can also be solved in polynomial time. If an NP-hard problem can be solved in polynomial time, then all NP-complete problems can be solved in polynomial time.
Although many problems are classified as NP-hard or NP-complete (problems for which no sequential polynomial-time algorithm is known), these same problems can be solved in polynomial time on a non-deterministic machine.

11.4.1. Nondeterministic Algorithms


Until now, the algorithms discussed have had the property that the result of every operation is unique and well defined. Algorithms with this property are known as deterministic algorithms, and they can be run without difficulty on a sequential computer. On a theoretical computer we can remove this restriction and allow operations whose outcome is not unique but restricted to a given set of possibilities; the machine executing such operations is allowed to choose any one of these outcomes, subject to a termination condition to be defined. This leads to the concept of a non-deterministic algorithm. To specify such algorithms, one new function and two new statements are introduced:
1. choice(S): arbitrarily choose one element of the set S;
2. failure: termination without success;
3. success: termination with success.
X ← choice(1:n) may result in X being assigned any integer in the range [1, n]; there is no rule specifying how the integer is chosen. The signals success and failure are used to define the final state of the algorithm; these statements are equivalent to a stop and are not used to return values. A non-deterministic algorithm terminates unsuccessfully if and only if there is no set of choices leading to a success signal. The computation times for choice, success, and failure are taken to be O(1). A machine capable of executing a non-deterministic algorithm is called a non-deterministic machine. Although no non-deterministic machine (as defined here) exists in practice, there are sufficiently intuitive reasons to conclude that certain problems cannot be solved by fast deterministic algorithms.
Example: Consider the problem of searching for an element x in a given set of elements A(1:n), n ≥ 1. We are required to determine an index j such that A(j) = x, or j = 0 if x is not in A. A non-deterministic algorithm for this is:
j ← choice(1:n)
if A(j) = x then print(j); success end if
print(‘0’); failure
In this example, a non-deterministic computer prints ‘0’ if and only if there is no j such that A(j) = x. The algorithm has non-deterministic complexity O(1). Note that, since A is not ordered, every deterministic search algorithm has complexity Ω(n).
One interpretation of a non-deterministic algorithm is in terms of an unbounded parallel computation. Each time a choice(S) is to be made, the algorithm makes multiple copies of itself, one copy for each possible choice, and all copies are executed at the same time. The first copy that reaches a successful completion terminates all the remaining copies; if a copy reaches a failure completion, only that copy stops. It is important to note that a non-deterministic machine does not actually make copies of the algorithm every time a choice(S) is executed. Since the machine is fictitious, there is no need to explain how it determines whether a computation succeeds or fails.

Definition: P is the set of all decision problems solvable by a deterministic algorithm in polynomial time. NP is the set of all decision problems solvable by a non-deterministic algorithm in polynomial time.
Since deterministic algorithms are a special case of non-deterministic ones, we can conclude that P ⊆ NP. What is not known, and is perhaps the most famous unsolved problem in computer science, is whether P = NP or P ≠ NP.
Is it possible that for every problem in NP there is a deterministic polynomial-time algorithm that is simply not known at this time? This seems unlikely, given the effort that many experts have devoted to answering the question; however, a proof that P ≠ NP remains elusive and seems to require techniques that have not yet been discovered.
Considering this problem, Cook asked the following question: is there a problem in NP such that, if it were shown to be in P, this would imply that P = NP? Cook answered his own question with the following theorem.
Cook's theorem: Satisfiability is in P if and only if P = NP.
As an aside, in computational complexity theory the Boolean satisfiability problem (also called SAT) was the first problem identified as belonging to the class of NP-complete problems. Stephen Cook proved this in 1971 using a non-deterministic Turing machine (NDTM), with the following argument: if a problem is in NP, then there is an NDTM that solves it in a known polynomial time p(n). If the computation of that machine is transformed into a satisfiability problem (in time polynomial in n) and that problem is solved, the solution to the original NP problem is also obtained. In this way it is shown that every NP problem can be transformed into a satisfiability problem, and therefore SAT is NP-complete.
The Boolean satisfiability problem:
• A formula is satisfiable if there is at least one assignment of values to its variables that makes it true.
• A formula is unsatisfiable if every possible assignment of values makes it false.
Let us look at this with an example:
• We start from the following proposition in disjunctive normal form:
• The following assignment is made:

x1 = ⊥, x2 = ⊥ and x3 = T

• The values are substituted into the expression.
• The expression is evaluated.

• Since no valid solution has been found, a new assignment is made:
x1 = T, x2 = T and x3 = T

• The expression is evaluated:


Since an assignment of values (a model) that makes the expression true has been found, this particular problem has been shown to be satisfiable. These are only two of the eight (2^n, with n = 3) possible assignments. The number of assignments grows rapidly as new variables are added; hence the computational complexity of the problem is high.
An algorithm for solving SAT problems is the following:
• DPLL algorithm: it uses a systematic backtracking search to explore the possible assignments of values to the variables in search of one that makes the problem satisfiable (a brute-force sketch of this search is given below).
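As a toy sketch of the brute-force view of this search, the C program below tries all 2^n assignments of a small formula; the formula itself is an assumption chosen for illustration (the original expression was lost from the source).

#include <stdio.h>

/* Assumed example formula: (x1 OR NOT x2) AND (NOT x1 OR x3) AND (x2 OR x3). */
static int formula(int x1, int x2, int x3) {
    return (x1 || !x2) && (!x1 || x3) && (x2 || x3);
}

int main(void) {
    int found = 0;
    for (int m = 0; m < (1 << 3); m++) {       /* 2^3 = 8 assignments */
        int x1 = m & 1, x2 = (m >> 1) & 1, x3 = (m >> 2) & 1;
        if (formula(x1, x2, x3)) {
            printf("satisfiable: x1=%d x2=%d x3=%d\n", x1, x2, x3);
            found = 1;
            break;
        }
    }
    if (!found) printf("unsatisfiable\n");
    return 0;
}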
We can now define the classes of NP-hard and NP-complete problems more precisely. First, the notion of reducibility is defined.
Definition: Let L1 and L2 be two problems. L1 reduces to L2 (written L1 ∝ L2) if and only if there is a way to solve L1 with a deterministic polynomial-time algorithm that uses, as a subroutine, a deterministic polynomial-time algorithm that solves L2.
This definition implies that if there is a deterministic polynomial-time algorithm for L2, then L1 can be solved in polynomial time. The operator ∝ is transitive: if L1 ∝ L2 and L2 ∝ L3, then L1 ∝ L3.
Definition: A problem L is NP-hard if and only if satisfiability reduces to L (satisfiability ∝ L). A problem L is NP-complete if and only if L is NP-hard and L ∈ NP.
Not every NP-hard problem is NP-complete. Only a decision problem can be NP-complete; an optimization problem, however, can be NP-hard. Moreover, if L1 is a decision problem and L2 an optimization problem, it is quite possible that L1 ∝ L2. One can see, for example, that the knapsack decision problem reduces to the knapsack optimization problem; the knapsack optimization problem can also be reduced to the corresponding decision problem. Nevertheless, optimization problems cannot be NP-complete, whereas some decision problems are NP-hard and yet not NP-complete.

11.4.2. Example of an NP-Hard Decision Problem
Consider the halting problem for deterministic algorithms. The halting problem is to determine, for an arbitrary deterministic algorithm A and an input I, whether algorithm A with input I terminates (or enters an infinite loop). This problem is undecidable, so there is no algorithm (of any complexity) that solves it; hence it is not in NP. To show that satisfiability ∝ halting problem, simply construct an algorithm A whose input is a propositional formula X. If X has n variables, then A tries the 2^n possible assignments and checks whether X is satisfiable; if it is, then A halts, and if X is not satisfiable, then A enters an infinite loop. If we had a polynomial-time algorithm for the halting problem, then we could solve the satisfiability problem in polynomial time by using A and X as input to that halting-problem algorithm. Hence the halting problem is an NP-hard problem that is not in NP.
Definition: Two problems L1 and L2 are polynomially equivalent if and only if L1 ∝ L2 and L2 ∝ L1.
To show that a problem L2 is NP-hard, it suffices to show that L1 ∝ L2, where L1 is some problem already known to be NP-hard; since ∝ is transitive, if satisfiability ∝ L1 and L1 ∝ L2, then satisfiability ∝ L2. To show that an NP-hard decision problem is NP-complete, one only needs to exhibit a polynomial-time non-deterministic algorithm for it.
Problems of the NP-hard class (and of its NP-complete subset) are found in a wide variety of disciplines; some of them are:
a. NP-hard graph problems:
1. Clique Decision Problem (CDP);
2. Node Cover Decision Problem;
3. Chromatic Number Decision Problem (CN);
4. Directed Hamiltonian Cycle (DHC);
5. Traveling Salesperson Decision Problem (TSP);
6. AND/OR Graph Decision Problem (AOG).
b. NP-hard scheduling problems:
1. Scheduling Identical Processors;
2. Flow Shop Scheduling.
c. NP-hard code generation problems:
1. Code Generation with Common Subexpressions;
2. Implementing Parallel Assignment Instructions.
The literature contains a great many NP-hard problems.

11.4.3. Simplifying NP Problems


Once a problem L has been shown to be NP-hard, we are inclined to discard
the possibility that L can be solved in deterministic polynomial time. At this
point, however, one can ask the following question: could the problem be
restricted to a subclass of L that can be solved in deterministic polynomial
time? One can see that by placing sufficient restrictions on an NP-hard problem
(or by defining a sufficiently representative subclass) one can arrive at a
problem that is solvable in polynomial time; this, however, is a relaxation
and is no longer the original problem as such.
Since it is very unlikely that NP-hard problems can be solved in
polynomial time, it is important to determine which restrictions or relaxations
allow the problem to be solved in polynomial time.
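A classic illustration of such a restriction (an example added here, not taken from the text) is the 0/1 knapsack problem: although NP-hard in general, the subclass in which the capacity is a small integer can be solved by dynamic programming in O(n·capacity) time, which is polynomial in n for bounded capacities. A minimal sketch:

```python
def knapsack_max_profit(profits, weights, capacity):
    """Dynamic programming for 0/1 knapsack with integer weights.
    Runs in O(n * capacity) time -- pseudo-polynomial, so it is efficient
    only for the restricted subclass in which `capacity` is small."""
    best = [0] * (capacity + 1)               # best[c] = max profit using capacity c
    for p, w in zip(profits, weights):
        for c in range(capacity, w - 1, -1):  # iterate downwards: each item used at most once
            best[c] = max(best[c], best[c - w] + p)
    return best[capacity]
```

For unrestricted (arbitrarily large) capacities, the running time is no longer polynomial in the input size, which is consistent with the NP-hardness of the general problem.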
•	A possible non-deterministic machine.
Quantum computing is a paradigm of computing different from classical
computing. It is based on the use of qubits instead of bits and gives rise
to new logic gates that make possible new algorithms. The same task can
have different complexity in classical computing and quantum computing,
which has led to great expectations, as some intractable problems become
tractable. While a classical computer is equivalent to a Turing machine, a
quantum computer is equivalent to a quantum Turing machine.
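To make the idea of qubits and quantum logic gates more concrete, the following minimal sketch (added here for illustration and not taken from the text) represents a qubit as a unit vector in C² and a gate as a unitary matrix, using the Hadamard gate as an example:

```python
import numpy as np

# A qubit is a unit vector a|0> + b|1> in C^2, with |a|^2 + |b|^2 = 1.
ket0 = np.array([1.0, 0.0], dtype=complex)            # the basis state |0>

# A quantum logic gate is a unitary matrix; the Hadamard gate H puts
# |0> into an equal superposition of |0> and |1>.
H = np.array([[1, 1],
              [1, -1]], dtype=complex) / np.sqrt(2)

superposition = H @ ket0                              # (|0> + |1>) / sqrt(2)
probabilities = np.abs(superposition) ** 2            # measurement probabilities: [0.5, 0.5]
print(probabilities)
```

The new gates act on such superpositions as a whole, which is what gives some quantum algorithms a different complexity from their classical counterparts.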
In 1985, Deutsch presented the design of the first quantum machine based
on a Turing machine. To this end, he formulated a new variant of the Church–Turing
thesis, resulting in the so-called Church–Turing–Deutsch principle.
The structure of a quantum Turing machine is very similar to that of a
classical Turing machine. It consists of the three classic elements:
• Infinite memory tape where each element is a qubit;
• A finite processor; and
• A head.
The processor contains the instruction set that is applied to the element
identified by the tape head. The result depends on the qubit on the tape and
on the processor state. The processor executes one instruction per unit of time.
The tape is similar to that of a traditional Turing machine. The only difference
is that each element of the tape is a qubit. The alphabet of this new machine
is therefore the space of qubit values. The head position is represented by an
integer variable (Figure 11.2).

Figure 11.2: A quantum Turing machine.


CONCLUSIONS
One of the goals is to learn to create parallel algorithms methodically and
to be able to recognize easily the design flaws that compromise efficiency or
scalability.
The design of a parallel climate model has proven to be a simple
process in which most design choices are clear: a two-dimensional domain
decomposition of the model grid results in the need both for local
communication between the tasks handling neighboring grid points
and for a parallel addition (reduction) operation.
Because of the ease of administration it provides, users can bring up large
computational clusters in a short time without needing to be specialists in
managing them.
Thanks to the services offered by EC2 and to the tool presented in this
project, users who need to solve large FDTD problems have easy access
to computational clusters at minimal cost.
Because the tool supports allocating processing resources according to the
problem at hand, users can solve several problems themselves, with different
parameter values, without having to wait for another running problem to finish.
With Resource Monitoring we can check the status of each cluster node,
so the user can see processor and memory usage, as well as whether
there is any kind of overload that could result in an error.
The system greatly improves execution time when using High-CPU instances,
but performance may be even better with Amazon's announcement of a new
High-Performance Computing (HPC) instance type, which according to its
specifications is designed for processing large amounts of calculation. It
also features a high-performance, low-latency network, which is a
great benefit for applications that use the MPI message passing protocol.
Currently the results are written with the HDF5 library; we showed that this
introduces a delay compared to the processing time, which is more
noticeable when the simulation writes continuously. A possible solution
to this problem is to compile the parallel HDF5 library from source together
with its additional dependencies, so if that version is improved it could
reduce the writing delay. The Meep package offers the hdf5utils utilities for
post-processing the output files, but they have their limitations, and downloading
the results may be difficult due to the large size of some simulations. A more
complete tool such as Octave, which is free software, would therefore be
preferable, although commercial options such as Matlab can also be chosen.
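As a hedged illustration of such post-processing outside hdf5utils, the fragment below reads a field component from a Meep HDF5 output file with the h5py library; the file name and the dataset name "ez" are assumptions for this example and depend on the actual simulation:

```python
import h5py
import numpy as np

# Open a Meep output file (name assumed) and inspect its datasets.
with h5py.File("simulation-ez.h5", "r") as f:
    print(list(f.keys()))              # e.g. ['ez'] -- depends on the simulation
    ez = np.array(f["ez"])             # load the field component into memory
    print(ez.shape, np.abs(ez).max())  # basic sanity check of the data
```

Loading the data this way makes it straightforward to continue the analysis in NumPy, Octave, or Matlab.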
BIBLIOGRAPHY

1. Wikipedia Contributors, Finite-Difference Time-Domain Method. Wikipedia, the free encyclopedia. http://en.wikipedia.org/w/index.php?title=Finite-difference_time-domain_method&oldid=370208130 [14 July 2010].
2. Taflove, A., & Hagness, S. C. (2005). Computational
electrodynamics: the finite-difference time-domain method.
Artech house.
3. Mario, C., & Omar, S. (2009) FDTD Electromagnetic Calculation
of Magnitudes. http://fittelecomunicaciones.blogspot.
com/2009/09/fdtd-al-calculo-de-magnitudes.html
4. Wikipedia Contributors, (2010). Cluster [online]. Wikipedia,
the free encyclopedia. http://es.wikipedia.org/w/index.
php?title=Cluster_(inform%C3%A1tica)&oldid=38185002.
5. Armbrust, M., Fox, A., Griffith, R., Joseph, A., Katz, R.,
Konwinski, A., et al., (2009). Above the Clouds: A Berkeley View
of Cloud Computing. Technical Report No. UCB / EECS 2009–
28, University of California at Berkeley, USA.
6. Oscar, G. R., (2008). Extension of the Finite Difference Method
in the Time Domain for the Study of Hybrid Structures Microwave
Circuits Concentrates Including Assets and liabilities. Doctoral
Thesis, University of Cantabria, Spain.
7. Blaise, B., & Lawrence, L., (2010). Introduction to Parallel Computing. From: https://computing.llnl.gov/tutorials/parallel_comp/, July 15.
8. Dowd, K., (1993). High-Performance Computing. O'Reilly & Associates, Inc., Sebastopol, CA.
10. Star cluster, (2010). Massachusetts Institute of Technology. http://
web.mit.edu/stardev/cluster/.
11. The Open MPI Project, Open Source High-Performance
Computing (2010). http://www.open-mpi.org/2004.
12. Parallel Meep. http://ab-initio.mit.edu/wiki/index.php/Parallel_
Meep.
13. Google Web Toolkit, (2010). http://code.google.com/webtoolkit/.
14. Brent, C., & Matt, M., (2010). Ganglia Cluster Toolkit. http://
ganglia.sourceforge.net/.
15. Abellanas, M., & Lodares, D., (1990). Analysis of Algorithms
and Graph Theory. Mexico: Macrobit-Ra-Ma.
16. Alfonseca, C. E., Alfonseca, M. M., & Moriyon, R., (2007). Automata Theory and Formal Languages. Mexico: McGraw Hill.
17. Booch, G., (1991). Object Oriented Design with Applications. California: The Benjamin/Cummings Publishing Company, Inc.
18. Cairó, O., & Guardati, S., (2000). Data Structures. Mexico: McGraw Hill.
19. Ghezzi, C., Jazayeri, M., & Mandrioli, D., (1991). Fundamentals of Software Engineering. New Jersey: Prentice Hall International.
20. Dasgupta, S., Papadimitriou, C. H., & Vazirani, U. V., (2008). Algorithms. McGraw-Hill Higher Education.
21. Deheza, M. E., (2005). Importance of Cohesion and Coupling in Software Engineering. Texcoco, Mexico: Universidad Francisco Ferreira and Arriola (Thesis in Computer Science).
22. Deutsch, D., (n.d.). Lectures on Quantum Computation. Retrieved on September 20, 2011, from: http://www.quiprocone.org/Protected/DD_lectures.htm.
23. Dewdney, A. K., (1989). The Turing Omnibus: 61 Excursions in
Computer Science. New York: Computer Science Press.
24. Dictionary of the Royal Spanish Academy, (n.d.). Retrieved on September 30, 2015, from: http://lema.rae.es/drae/?val=algoritmo.
25. Horowitz, E., & Sahni, S., (1978). Fundamentals of Computer Algorithms. United States of America: Computer Science Press.
26. Garey, M. R., & Johnson, D. S., (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness. USA: A Series of Books in the Mathematical Sciences.
27. Goldschlager, L., & Lister, A., (1986). A Modern Introduction to Computer Science with an Algorithmic Approach. Mexico: Prentice Hall.
28. Lewis, H. R., & Papadimitriou, C. H., (1989). Elements of the Theory of Computation. USA: Prentice Hall.
29. Levin, G., (2004). Computation and Modern Programming: A Comprehensive Computer Perspective. Mexico: Addison Wesley.
30. Loomis, M. E., (2013). Data Structures and File Organization. Mexico: Prentice Hall.
31. Penrose, R., (1989). The Emperor's New Mind: Concerning Computers, Minds, and the Laws of Physics. Mexico: Fondo de Cultura Economica.
32. Singh, S., & Ribet, K. A., (1997). Fermat's Last Stand. Scientific American, 277(5), 68–73.
INDEX

A
Acceleration of parallelization 55
Acceleration program 55
Additional agglomeration 47
Algorithm grows 134
Algorithmic solution 202
Algorithm implemented 76
Algorithm remains proportional 141
Amazon Web Services (AWS) 10
Application management 15
Applications investigated 131
Application-specific integrated circuit approaches (ASIC) 66
Appropriate partition 34
Atmosphere 42, 43, 44, 45, 46, 47, 48
Atmospheric processes 42
Automatic parallelization 67

B
Berkeley Open Infrastructure for Network Computing (BOINC) 64
Bound design method algorithm 182
Bound technique 182
Businesses providing customer 100

C
Calculation of communicating systems 59
Calculation of Resonance Frequency 29
Calculation of resources 5
Code and style sheets (CSS) 72
Combinatorial optimization 186
Combinatorial problem 171
Communication 6, 7, 34, 35, 36, 37, 39, 40, 41, 43, 44, 45, 46, 49
Communication and synchronization 52, 53
Comparing strings 135
Comprehensive visualization 76
Computational electromagnetic field 3
Computational environment 4
Computer program 60
Computer services 4
Consuming valuable resources 92
Countryside mathematical 177
Critical parameter 129
Critical processes 92
Customer satisfaction 81, 82

D
Data parallelism 61
Demonstrate improvement 22
Design algorithm 170
Design a parallel algorithm 34
Deterministic algorithm 206, 208
Deterministic polynomial 207, 209
Developing mapping algorithms 41
Development environment 73
Distributed memory system 6
Document management 88
Document Object Model (DOM) 72
Domain decomposition approach 34, 35
Domain decomposition atmosphere 45
Domain decomposition problems 36
Dynamic programming 162, 164, 170

E
Efficient manner 173
Elastic Block Storage (EBS) 10
Electric field 5
Electromagnetic field 5, 22, 23, 29
Electromagnetic signals 29
Equations of electrodynamics 2
Euclidean algorithm 126
Exponential algorithm 203

F
Famous mathematician 195
Field-programmable gate (FPGA) 65
Final recursive function 126
Finite difference time domain (FDTD) 2, 11
Functional decomposition 34, 35, 41
Fundamental nature 202

G
General purpose computing units on graphics processing (GPGPU) 65

H
Half-Reference (TRM) 152
Hamiltonian path 177
High-performance computing (HPC) 6, 52, 53
Horizontal dimension 44, 46

I
Identifying indicators 86
Identify stakeholders 82
Implementing algorithms 73
Incongruity tormenting 196
Inform management 89
Initial arrangement 140
Instruction-level parallelism 60
Internal organization 142
Interrelated activities 83

K
Knapsack problem 184

L
Load distribution 47, 48

M
Management stage processes 84
Massively parallel processor (MPP) 64
Mathematical logic 196
Mathematical paradox 196
Mathematical system 195, 196, 197
Message Passing Interface (MPI) 67
Methodological framework fieldwork 99
Modern enterprises written material 93
Modern equipment 2
Modernization process 90
Monitors local resources 14
Multidimensional data structure 40
Multiple-instruction-multiple-data (MIMD) 59
Multiple-instruction single-data (MISD) 59

N
Network File System (NFS) 10
Non-deterministic algorithm 205, 206, 208

O
Optimality Bellman statement 162
Optimal sequence 162
Optimal solution satisfying 170
Optimization problem 184, 207
Optimization process 162
Organizational activities 91
Original formulation 5

P
Parallel computer 205
Parallel computing 5
Parallel program 38, 46
Parallel programming languages 58, 67
Parallel program performance 54
Parallel systems 52
Partial differential equations 2, 5
Partitioning algorithm 41
Perform multiple operations 52
Perform process optimization 85
Permutation problem 166
Phase partitioning 36
Pictorial representation 203
Positional system 132
Potential speedup algorithm 55
Problems linear programming 186
Process management facilitates 88
Processor-processor communication 62
Programming language 126
Programming parallel program 67
Programming technique 52
Proposed technique 72, 73, 76
Pruning algorithm 182, 183

Q
Qualitative reasoning 131
Quantum computer 209
Quantum computing 209

R
Radiation measurements 47
Real environment 76
Recursive function 126, 128, 129, 134
Recursive stack 129
Remarkable discovery 198
Round-robin database (RRD) 14

S
Scheme backtracking 179
Serial computation 54
Several problems 163, 170, 171
Significantly change processes 94
Simple strategy 46
Single-instruction-multiple-data (SIMD) 59
Single-instruction-single-data (SISD) 59
Software engineering 34, 39, 40, 46
Software transactional memory 58
Specialized hardware 53, 54
Specific implementations 41
Spectrum for energy extraction 28
Subsequent monitoring 87
Subtraction operation 188
Supporting processes 84
Symmetric multiprocessors (SMPs) 63
System creation and management 3, 4

T
Termination condition 127
Total clear sky (TCS) 44
Traveling salesman 167, 170, 179
Traveling salesman problem (TSP) 166, 179, 187
Turing machine quantum 209
Typography 130

U
Uniform memory access Systems (UMA) 62
User interface testing (UI) 72

V
Value Customer 91
Very-large-scale integration (VLSI) 59

W
Written communication 93
