This project is submitted in partial fulfilment of the requirements for the Bachelor of Science Honours
Degree in Information Technology.
Dedication
We dedicate this research to cloud computing service providers and to any scholars who wish to
undertake a project on cloud computing resource scheduling algorithms or a similar topic.
Abstract
Cloud computing is growing rapidly and changing shape as it grows. While its adoption may
have considerable impact on many dimensions of business and technology, these changes are
difficult to predict, not least because we cannot foresee how cloud computing itself will
evolve. Nonetheless, there are trends that have already begun, or appear imminent, which may
be closely related to the future evolution of cloud computing. It is therefore worth taking a
closer look at these trends and monitoring them, and their impact, over the coming years.
Acknowledgements
We are greatly indebted to the scholars whose journals, found through Google Scholar,
supplied much of our information on cloud computing resource scheduling algorithms. The
support of our supervisor, Mr Muchabaiwa, is gratefully acknowledged. It is not possible to
list everyone who contributed to the successful conduct of our study; we thank our parents,
our friends, and all those who helped us carry out this work. We are also indebted to the
library resource centre and internet services that enabled us to ponder the vast subject of
resource scheduling algorithms.
Topic
Table of Contents
Chapter 1
    Introduction
    Background of Study
    Statement of the Problem
    Objectives
    Significance of the Study
    Research Questions
    Limitations
    Delimitations
    Operational Definition of Terms
Chapter 2
    Literature Review
        Cloud Architecture
        Resource Scheduling Problems
        Resource Scheduling Algorithms
Chapter 3
    Methodology
        Introduction
        Research Design
        Research Instruments
        Population
        Sampling Procedure and Sample Size
        Data Collection Procedures
        Data Analysis
        Summary
Chapter 4
    Data Presentation
Chapter 5
    Proposed Scheduling Model
    Hybrid PAS Algorithm (Prioritisation Sharing Allocation)
    Summary and Conclusion
    Recommendations
Reference
Chapter 1
Introduction
Cloud computing is, in essence, a collection of integrated and networked hardware, software
and Internet infrastructure (called a platform). This platform provides on-demand services
that are always on, anywhere and anytime. Cloud computing technology virtualizes resources
and offers many services across the network, aiming mainly at scalability, availability,
throughput, and resource utilization. What is resource scheduling? Resource scheduling is a
way of determining a schedule on which activities should be performed in order to meet the
demand for resources. The resource scheduling strategy is a key technology in cloud
computing: as cloud computing grows rapidly, there is a growing need to manage its resources.
Background of Study
Cloud Computing is the use of computing resources (Hardware and Software) that are delivered
as a service over a network (typically the internet). It supplies a high performance computing
based on protocols which allow shared computation and storage over long distances.
Shailesh Sawant states that the cloud computing platform guarantees subscribers that it
adheres to the service level agreement (SLA) by providing resources as a service and on
demand. However, subscribers' needs for computing resources are increasing day by day, and
those needs are dynamically heterogeneous and platform independent. In the cloud computing
environment, resources are shared but not properly distributed, resulting in resource
wastage. Therefore, the main problems to be solved are how to meet the needs of the
subscribers and how to manage the resources dynamically and efficiently.
In addition, in cloud computing there are many tasks that must be executed with the available
resources so as to achieve the best performance: minimal total completion time, shortest
response time, and full utilization of resources. Because of these differing objectives, we
need to design, develop and propose a scheduling algorithm that produces an appropriate
allocation of tasks to resources.
Statement of the Problem
What is the most efficient and effective resource scheduling algorithm that can be used on
cloud platforms?
10
Objectives
a) To assess the problems faced by cloud service providers in relation to resource scheduling.
b) To evaluate the resource scheduling algorithms currently available on cloud platforms.
c) To develop an improved resource scheduling algorithm for cloud data services.
d) To simulate the algorithms.
Significance of the Study
Our study will help cloud service providers to predict the dynamic nature of users, user
demands and application demands. Service providers can also use this study to manage
resources for individual clients. It also seeks to integrate cloud provider activities for
utilising and allocating scarce resources within the limits of the cloud environment so as to
meet the needs of cloud applications and clients.
In addition, the study is important because it addresses the problems faced by clients when
accessing cloud platforms. The study will propose solutions to resource contention, that is,
two applications trying to access the same resource at the same time. Because users sometimes
experience denial of service on cloud platforms even when they have the right of access, the
study will also devise ways in which resources can be scheduled effectively.
Research Questions
a) What resource scheduling problems are cloud service providers facing?
b) What different scheduling algorithms can be used to enhance prioritisation, resource
sharing and resource allocation?
c) What is the most effective and efficient algorithm that can be used on cloud platforms to
enhance prioritisation, resource allocation and sharing?
Limitations
In conducting our research we faced difficulties that limited our study:
a) The number of companies offering cloud services in Zimbabwe is small.
b) Most existing algorithms focus mainly on load balancing across cloud VMs.
c) Most companies are not willing to disclose the resource scheduling algorithms they use.
d) Respondents take a long time to reply, while our research must be completed within a fixed
timeline.
Delimitations
a) The study will examine resource scheduling algorithms for cloud data services.
b) The study will be confined to clients that use cloud data services.
c) The study will focus on the resources that are used in cloud data services.
Operational Definition of Terms
Algorithm
According to Wikipedia, the free encyclopaedia, an algorithm is a step-by-step procedure for
calculations.
Back End
According to Vijayalakshmi A. Lepakshi and Dr. Prashanth C S R, the back end of the cloud
computing architecture is the cloud itself, comprising various computers, servers and data
storage devices.
Cloud
1. According to Judith Hurwitz, Robin Bloor, Marcia Kaufman and Fern Halper, the “cloud”
in cloud computing can be defined as the set of hardware, networks, storage, services, and
interfaces that combine to deliver aspects of computing as a service.
2. According to R. Buyya and S.Venugopal, a cloud is a type of parallel and distributed system
consisting of a collection of inter-connected and virtualized computers that are dynamically
provisioned and presented as one or more unified computing resources based on service-
level agreements established through negotiation between the service provider and
consumers.
Cloud Computing
a) It is the storing and accessing of applications and computer data often through a Web
browser rather than running installed software on your personal computer or office server.
b) It is an internet-based computing whereby information, IT resources, and software
applications are provided to computers and mobile devices on-demand.
c) Using the internet to access web-based applications, web services and IT infrastructure as
a service.
Cloud Platform
From Adrian Otto’s Blog a cloud platform is a system where software applications may be run
in an environment composed of utility cloud services in a logically abstract environment.
Cloud Provider
It is a company that provides components of cloud computing, typically IaaS, NaaS, PaaS and
SaaS, to subscribers or customers.
Community Cloud
Dr Mark I Williams states that Community clouds are used by distinct groups (or
‘communities’) of organizations that have shared concerns such as compliance or security
considerations, and the computing infrastructures may be provided by internal or third-party
suppliers.
CPU time is the amount of time for which a central processing unit was used for processing
instructions of a computer program or the operating system.
Data Centre
According to Ratan Mishra and Anant Jaiswal, a datacentre is simply a collection of servers
hosting different applications. An end user connects to the datacentre to subscribe to
different applications.
Front End
Vijayalakshmi A. Lepakshi and Dr. Prashanth C S R state that the front end is the part seen by
the client, i.e. the computer user. This includes the client's network (or computer) and the
applications used to access the cloud via a user interface such as a web browser.
Hard disk space is the amount of permanent storage of data measured in bytes and this storage
is maintained whether the computer is on or off.
Hybrid Cloud
Cloud computing - Wikipedia, the free encyclopaedia, states that a hybrid cloud is a
composition of two or more clouds (private, community or public) that remain unique entities
but are bound together, offering the benefits of multiple deployment models.
Network throughput is the average rate of successful message delivery over a communication
channel.
PaaS is a set of services aimed at developers that helps them develop and test apps without
having to worry about the underlying infrastructure. Developers don't want to have to worry
about provisioning the servers, storage and backup associated with developing and launching an
app.
According to Cloud computing - Wikipedia, the free encyclopaedia, this is where cloud
providers deliver a computing platform, typically including operating system, programming
language execution environment, database, and web server. Application developers can develop
and run their software solutions on a cloud platform without the cost and complexity of buying
and managing the underlying hardware and software layers. With some PaaS offerings, the
underlying computer and storage resources scale automatically to match application demand, so
that the cloud user does not have to allocate resources manually.
Public Cloud
According to Cloud computing - Wikipedia, the free encyclopaedia, a cloud is called a 'Public
cloud' when the services are rendered over a network that is open for public use.
Private Cloud
Cloud computing - Wikipedia, the free encyclopaedia states that Private cloud is cloud
infrastructure operated solely for a single organization, whether managed internally or by a
third-party and hosted internally or externally.
Server
According to Bradley Mitchell, a server is a computer designed to process requests and deliver
data to other (client) computers over a local network or the internet.
Random Access Memory is the most common computer memory, which can be used by programs to
perform necessary tasks while the computer is on.
Virtual Memory is memory that appears to exist as main storage although most of it is
supported by data held in secondary storage, transfer between the two being made
automatically as required.
Virtual Machine is a software-based or fictive computer which may be based on the
specifications of a hypothetical computer or emulate the computer architecture and functions
of a real-world computer.
Chapter 2
Literature Review
Cloud computing is internet-based computing whereby shared resources, software and
information are provided to computers and other devices on demand, like a public utility. It
is a technology that uses the internet and central remote servers to maintain data and
applications, allowing consumers and businesses to use applications without installation and
to access their personal files from any computer with internet access. An essential
requirement in cloud data services is scheduling the current jobs to be executed under the
given constraints. The scheduler should order the jobs in a way that balances improving the
quality of service against maintaining efficiency and fairness among the jobs. Thus,
evaluating the performance of scheduling algorithms is crucial to realizing large-scale
distributed systems. In spite of the various scheduling algorithms proposed for cloud data
services, no comprehensive performance study has been undertaken that provides a unified
platform for comparing such algorithms.
Cloud Architecture
The two most significant components of cloud computing architecture are known as the front
end and the back end.
Types of clouds
There are different types of clouds that you can subscribe to depending on your needs. As a
home user or small business owner, you will most likely use public cloud services.
a) Public Cloud
b) Private Cloud
c) Community Cloud
d) Hybrid Cloud
Need for a Resource Scheduling Algorithm:
a) Minimize the variation in resource demand.
b) Improve efficiency.
c) Reflect reality:
Modify activities within time; in other words, modify the resource loading for each unit of
time.
d) Technical aspects:
Not every technology involved is absolutely new, but each is enhanced to realize a specific
feature, directly or as a pre-condition. Virtualization is an essential characteristic of
cloud computing. Virtualization in clouds covers multi-layer hardware platforms, operating
systems, storage devices, network resources, etc. The first prominent feature of
virtualization is the ability to hide technical complexity from users, which improves the
independence of cloud services. Secondly, physical resources can be efficiently configured
and utilized, considering that multiple applications run on the same machine. Thirdly, quick
recovery and fault tolerance are permitted: a virtual environment can be easily backed up and
migrated with no interruption in service.
e) Resource management:
From the cloud platform service provider's point of view, a large number of virtual machines
must be allocated to thousands of distributed users dynamically, fairly and, most
importantly, profitably. From the consumer's point of view, users are economy-driven entities
when they make the decision to use a cloud service.
Bee Algorithm (Bee Life Algorithm)
Bees in Nature
According to Bitam S., Batouche M., Talbi E.G., as a social and domestic insect, the bee is
native to Europe and Africa. The bees feed on nectar as a source of energy in their lives and use
pollen as a source of protein in the rearing larvae. The bee colony contains generally, a single
breeding female called Queen, a few thousands of males known as the Drones, a several
thousands of sterile females called Workers, and many young bee larvae called Broods. The
bees share a communication language of extreme precision, based on two kinds of dances: the
round dance when food is very close. They are carried out when bees search food. The bees’
reproduction is guaranteed by the queen. It will mate with several males in full flight, until her
spermatheca is full. The unfertilized egg will give rise to a drone, while, the fertilized egg gives
rise to worker or queen depending on food quality.
According to Pradeep R. and Kavinya R., this is a nature-inspired algorithm that mimics the
activities of bees collecting food. First a scout bee is selected to search a wide domain of
areas; if the scout bee finds a potential food source, it returns to its hive and does a
waggle dance that tells the other bees the direction and distance of the potential food
source. A set of selected bees goes to the food source and starts bringing in the honey,
while other scout bees do the same work and sets of bees are sent to different locations to
bring in food. After every identification of a food source, the scout bee informs the others
and sets course for new sites near the potential food source. From these activities we define
the terms: number of scout bees (n), number of sites selected out of the n visited sites (m),
number of best sites out of the whole set (e), number of bees recruited for the best e sites
(nep), and number of bees recruited for the other sites (m − e).
The Bees Algorithm is as stated below:
1. Initialize population with random solutions.
2. Evaluate fitness of the population.
3. While (stopping criterion not met)
4. Select sites for neighbourhood search.
5. Recruit bees for selected sites (more bees for best e sites) and evaluate fitness.
6. Select the fittest bee from each patch.
7. Assign remaining bees to search randomly and evaluate their fitness.
8. End While.
In step 1, the Bees Algorithm starts with the n scout bees being placed randomly in the
search space. In step 2, the fitness of the sites visited by the scout bees is evaluated. In
step 4, the bees with the highest fitness are chosen as “selected bees” and the sites they
visited are chosen for neighbourhood search. Then, in steps 5 and 6, the algorithm conducts
searches in the neighbourhood of the selected sites, assigning more bees to search near the
best e sites. The bees can be chosen directly according to the fitness associated with the
sites they are visiting. Alternatively, the fitness values are used to determine the
probability of the bees being selected. Searches in the neighbourhood of the best e sites,
which represent the more promising solutions, are made more detailed by recruiting more bees
to follow them than the other selected bees. Together with scouting, this differential
recruitment is a key operation of the Bees Algorithm. In step 6, for each patch only the bee
with the highest fitness is selected to form the next bee population. In nature there is no
such restriction; it is introduced here to reduce the number of points to be explored. In
step 7, the remaining bees in the population are assigned randomly around the search space,
scouting for new potential solutions. These steps are repeated until a stopping criterion is
met. At the end of each iteration, the colony has two parts to its new population: the
fittest representatives from each patch, and those sent out to search randomly.
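The steps above can be sketched in Python as follows. This is a minimal illustration for a one-dimensional continuous search space, where n, m, e and nep follow the definitions given earlier; the patch size, iteration count and toy objective are arbitrary choices for demonstration, not part of the original algorithm description.

```python
import random

def bees_algorithm(fitness, bounds, n=20, m=5, e=2, nep=7, nsp=3,
                   patch=0.1, iterations=50, seed=0):
    """Minimal Bees Algorithm sketch for maximising `fitness` on 1-D bounds.

    n: scout bees, m: selected sites, e: elite sites,
    nep: bees recruited per elite site, nsp: bees per other selected site.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    # Step 1: place n scout bees randomly in the search space.
    sites = [rng.uniform(lo, hi) for _ in range(n)]
    for _ in range(iterations):
        # Step 2/4: evaluate fitness and keep the best m sites for
        # neighbourhood search.
        sites.sort(key=fitness, reverse=True)
        selected = sites[:m]
        new_sites = []
        for rank, site in enumerate(selected):
            # Steps 5-6: recruit more bees near the best e sites and keep
            # only the fittest bee from each patch.
            recruits = nep if rank < e else nsp
            neighbours = [min(hi, max(lo, site + rng.uniform(-patch, patch)))
                          for _ in range(recruits)]
            new_sites.append(max(neighbours + [site], key=fitness))
        # Step 7: the remaining bees scout randomly for new solutions.
        new_sites += [rng.uniform(lo, hi) for _ in range(n - m)]
        sites = new_sites
    return max(sites, key=fitness)

# Toy objective with its peak at x = 3; the result converges close to 3.
best = bees_algorithm(lambda x: -(x - 3) ** 2, bounds=(-10, 10))
print(round(best, 2))
```

Note the monotone character of the loop: each patch's survivor is at least as fit as the site it came from, so the best solution found is never lost between iterations.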
Scheduling Based on Bee Algorithm
The application of the bee algorithm explained above to resource allocation is described
below. A meta-scheduler is used, which sends independent jobs to the various clusters
present. In the beginning, jobs are submitted to the meta-scheduler.
Select
The meta-scheduler uses a select function to find the job with the lowest memory requirement,
lowest I/O requirement and lowest processor requirement; this job acts as the scout bee that
searches for a site.
f(n) = min{ ⋃_{i=1..n} j(i) }
where j(i) denotes the i-th job and the function f(n) determines the job with the minimum
resource requirement.
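As a sketch, such a select function might look like the following. The job field names and the unweighted sum are hypothetical simplifications; a real implementation would normalise memory, I/O and processor requirements before combining them.

```python
def select(jobs):
    """Select-function sketch: the job with the lowest combined memory,
    I/O and processor requirement acts as the scout job."""
    return min(jobs, key=lambda j: j["mem"] + j["io"] + j["cpu"])

# Hypothetical job list; units are mixed here purely for illustration.
jobs = [
    {"name": "j1", "mem": 512, "io": 40, "cpu": 2},
    {"name": "j2", "mem": 128, "io": 10, "cpu": 1},
    {"name": "j3", "mem": 256, "io": 20, "cpu": 4},
]
print(select(jobs)["name"])  # j2
```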
Fitness
The job with the minimum resource requirement, acting as a scout bee, is identified and sent
to the cluster, where it identifies the instances present. A scout job identifies a site by
using a fitness function: it runs the job on a particular instance and, if progress is made,
determines that instance's specification, i.e. whether it is memory-oriented or
processor-oriented. Conceptually, fitness refers to how much progress each job is making with
its assigned resources compared to the same job running on the entire cluster; it therefore
lies between 0 (no progress at all) and 1 (full progress). The computing rate (CR) is used to
calculate the fitness of an instance for a job. Specifically, given that CR(s, j) is the
computing rate of slot s for job j, the progress share of job j is
F(j) = Σ_{s ∈ S_j} CR(s, j) / Σ_{s′ ∈ S} CR(s′, j)
where S_j is the set of slots running tasks of job j and S is the set of all slots.
Waggle
By identifying the site's resources, the scout returns to the meta-scheduler and performs the
waggle function. The waggle function segregates the jobs present in the meta-scheduler based
on the scout's information, such as whether the instance is memory- or processor-oriented,
together with its distance and the cost to travel. The grouping takes place in such a way
that a memory scout job passes its information to the memory-oriented jobs present, and the
selected set goes to the cluster to be executed. Care should be taken to select only jobs
within the instance's capacity; if a job exceeds the instance's capacity, it has to wait
until the scout jobs find another available resource adjacent to the site just found.
W(n) = ⋃_{i=1..m} s(j_i)
where s(j) denotes a job whose resource requirement is an integral multiple of the scout
job's.
After the waggle function, the subsets of jobs are dispatched to the desired sites by the
meta-scheduler, and the scout jobs set course for further site exploration. Hence the fitness
function has to run only for the scout jobs in the algorithm, and the time is reduced
drastically. Thus the resources are allocated efficiently and time is saved.
Modified Bees Life algorithm (BLA)
Tasquia Mizan, Shah Murtaza Rashid Al Masud and Rohaya Latip further modified the Bee
Algorithm and called it the Bees Life Algorithm (BLA). They chose BLA as an optimization
algorithm for its simplicity of operation and power of effect. Each cycle of a bee
population's life consists of two behaviours: reproduction and food foraging. In the
reproduction behaviour, the queen mates in flight with the drones, modelled using mutation
and crossover operators. Their idea is to adapt the values of the BLA operators (selection,
mutation, crossover) during the run of the BLA; in this phase the tasks are scheduled. In the
food-foraging part of BLA, they propose a greedy method that finds the nearest cloud storage
centre (CSC) using a shortest-path algorithm, so that the scheduler assigns each task to its
nearest cloud storage centre.
In this algorithm, N denotes the size of the bee colony population, D the drone population
and W the worker population. Pseudocode for the proposed job scheduling algorithm using BLA
and the greedy method is as follows:
i. Get new jobs to be scheduled. The jobs to be scheduled include uncompleted tasks;
new jobs enter the global queue, whose size is treated as open.
ii. Generating tasks property by GIS.
iii. Get the current state of the system.
iv. Go through the BLA to get optimised task schedule.
a. Initialise population N bees.
b. Evaluate fitness of population.
c. While stopping criteria is not satisfied forming new population. /*reproduction
behaviour*/
d. Generate N broods by mutation and crossover.
e. Evaluate fitness of broods.
f. If the fittest brood is fitter than the queen then replace the queen for the next
generation.
g. Choose D best bees among D fittest following broods and drones of current
population to form next generation drones.
h. Choose W best bees among W fittest remaining broods and workers of current
population to ensure food foraging. /*food foraging behaviour*/
v. Use greedy method to find a neighbourhood.
In BLA, the algorithm first chooses a set of tasks randomly. In the next step, fitness, the
makespan of that set of tasks on a particular CSC is calculated. The algorithm then checks
the stopping criterion; if not all jobs have been scheduled, a new set of tasks is generated.
In the reproduction stage, the algorithm determines by mutation and crossover which set of
tasks will be forwarded to which CSC. In step 'e', fitness compares the priority of the set
of tasks chosen in step 'd' for a particular CSC. If the set of tasks chosen in step 'd' has
higher priority, it is scheduled first by replacing the queen for the next generation in step
'f'. Steps 'g' and 'h' find the next set of tasks on a priority basis and choose the tasks
that will go to the CSC concurrently. In the food-foraging behaviour of step 'v', the greedy
method starts by reaching the first CSC using the shortest-path algorithm, then finds its
successors and repeats the process until the next CSC is found (steps va, vb and vc). When
the set of tasks with the smallest makespan has been allocated to the nearest CSC in the
hybrid cloud, the next set of tasks is selected on a priority basis for scheduling in step
'i'. At the end, the optimal solution is obtained.
The Flow Chart of the Bee Algorithm:
[Figure: multiple user requests enter the global queue with their task properties. If jobs
need to be scheduled, the system status is checked and BLA runs: initialise the population
and check fitness; until the stopping criterion is met, perform reproduction (select drones,
mutation, crossover) followed by food foraging (greedy method, find successor); the result is
the optimal schedule.]
Efficiency and Performance Analysis
A number of different set of tasks with same number of instructions and assuming the same
execution time has been used for examining the efficiency of scheduling methods. The common
and significant evaluation methods are makespan and flowtime. AdilYousif states that
makespan is the time where system completes the latest task; and flowtime is the total of
execution times of all tasks submitted to the cloud. In order to evaluate the performance and the
effectiveness of BLA scheduler, we assume that 3 jobs to be executed in 3 CSC and resources.
Each job can be divided in tasks relevant to the task property and scheduled to the CSC. Each
task is defined by the execution time. A simple simulation has been conducted to measure the
performance of proposed scheduling algorithm. The results is shown in Figure 4 illustrates that
the proposed method has less makespan than the other nature inspired algorithm such as firefly
algorithm (FA) or even the genetic algorithm (GA).
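The two evaluation measures can be computed directly from task completion times; the values below are hypothetical and serve only to show the definitions in code.

```python
def makespan(finish_times):
    """Makespan: the time at which the system completes the latest task."""
    return max(finish_times)

def flowtime(finish_times):
    """Flowtime: the total of the execution times of all submitted tasks."""
    return sum(finish_times)

# Hypothetical completion times of five tasks submitted to the cloud.
times = [12, 7, 15, 9, 11]
print(makespan(times), flowtime(times))  # 15 54
```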
Disadvantages
a) Random initialisation.
b) The algorithm has several parameters.
c) The parameters need to be tuned.
2. Max-Min Algorithm
Max-Min is almost the same as the Min-Min algorithm, except for the following: after the
completion times are found, the minimum completion time is identified for each and every
task. Then, among these minima, the maximum value is selected, i.e. the task whose best
completion time is the largest across all resources. That task is scheduled on the resource
on which it takes the minimum time, and the available time of that resource is updated for
all the other tasks. The updating is done in the same manner as for Min-Min, and all the
tasks are assigned resources by this procedure.
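A compact sketch of both heuristics over a hypothetical execution-time matrix (the values are not those of TABLE I) illustrates how Max-Min can achieve a smaller makespan than Min-Min by scheduling the longest task first.

```python
def schedule(etc, policy):
    """Min-Min / Max-Min sketch. etc[t][m] is the execution time of task t
    on machine m; returns (task -> machine assignment, makespan)."""
    ready = [0] * len(etc[0])            # time at which each machine is free
    unassigned = set(range(len(etc)))
    assignment = {}
    while unassigned:
        # Best (earliest) completion time and machine for every pending task.
        best = {t: min((ready[m] + etc[t][m], m) for m in range(len(ready)))
                for t in sorted(unassigned)}
        # Min-Min schedules the smallest of these minima first; Max-Min the largest.
        pick = (min if policy == "min-min" else max)(best, key=lambda t: best[t][0])
        ct, m = best[pick]
        ready[m] = ct
        assignment[pick] = m
        unassigned.remove(pick)
    return assignment, max(ready)

# Hypothetical matrix: one long task and two short ones on two machines.
etc = [[50, 60], [10, 12], [12, 10]]
print(schedule(etc, "min-min")[1], schedule(etc, "max-min")[1])  # 60 50
```

Here Min-Min delays the long task, finishing at 60, while Max-Min places it first and overlaps the short tasks on the other machine, finishing at 50 — the same qualitative effect as the 630 vs 590 makespans reported below.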
Pardeep Kumar and Amandeep Verma have implemented the logic of Min-Min and Max-Min
algorithms on the execution time values as given in the TABLE I below. They have assumed
four machines and six tasks.
Figure 1 Task assignment by Min-Min algorithm
There is a term “makespan” in Min-Min and Max-Min scheduling techniques, which is the
maximum execution time on any machine among the machines on which the tasks are
scheduled. For example, in Figure 1, “630” is the makespan because it is the maximum
execution time among the four machines. In Figure 1 and Figure 2, the x-axis represents the
different machines and y-axis represents the execution times. According to Pardeep Kumar and
Amandeep Verma, they got the following different values of makespans by the two techniques:
Method used Makespan
Min-Min 630
Max-Min 590
Based on the different execution times of tasks on resources, one technique can outperform the
other, and the assignment of resources to tasks can change: a task assigned to one machine under
one technique may be assigned to a different machine under the other technique.
The Genetic Algorithm is a method of scheduling in which tasks are assigned resources
according to individual solutions (called schedules in the context of scheduling), which
specify which resource is to be assigned to which task. The Genetic Algorithm is based on the
biological concept of population generation. The main terms used in genetic algorithms are:
A. Initial Population
The initial population is the set of all the individuals that are used in the genetic algorithm to
find the optimal solution. Every solution in the population is called an individual, and every
individual is represented as a chromosome to make it suitable for the genetic operations.
From the initial population, individuals are selected and operations are applied to them to
form the next generation. The mating chromosomes are selected based on some specific
criteria.
B. Fitness Function
A fitness function is used to measure the quality of the individuals in the population
according to the given optimization objective. The fitness function can differ from case to
case: in some cases it can be based on a deadline, while in others it can be based on budget
constraints.
C. Selection
We use the proportional selection operator to determine the probability that the genes of
various individuals pass to the next generation of the population. Under proportional
selection, the probability that an individual is selected and carried into the next generation
is proportional to the individual's fitness.
D. Crossover
We use the single-point crossover operator. Single-point crossover means that a single
crossover point is chosen in the individual's code, and the parts of the pair of individual
chromosomes beyond that point are exchanged.
E. Mutation
Mutation means that the values at some gene loci in the chromosome coding series are
replaced by other gene values in order to generate a new individual. For binary-coded
individuals, mutation negates the value at the mutation points.
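The operators A–E above can be sketched for the scheduling problem as follows (a minimal illustration, not the authors' implementation; the chromosome is a task-to-machine list and the fitness is based on makespan, where lower is better, so a roulette wheel would use e.g. the reciprocal of makespan as the selection weight):

```python
import random

def makespan(schedule, exec_times):
    """schedule[t] = machine assigned to task t; exec_times[t][m] = run time."""
    loads = [0.0] * len(exec_times[0])
    for t, m in enumerate(schedule):
        loads[m] += exec_times[t][m]
    return max(loads)

def proportional_select(population, fitnesses):
    """Roulette-wheel (proportional) selection: probability ~ fitness.
    For makespan minimization, pass e.g. 1/makespan as the fitness."""
    return random.choices(population, weights=fitnesses, k=1)[0]

def single_point_crossover(a, b):
    """Exchange the tails of two chromosomes after one crossover point."""
    point = random.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(schedule, n_machines, rate=0.1):
    """Replace the gene at some loci with another machine index."""
    return [random.randrange(n_machines) if random.random() < rate else g
            for g in schedule]
```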
In the Genetic Algorithm the initial population is generated randomly, so the initial schedules
are not very fit, and when these schedules are crossed over with each other there is little chance
that they will produce children better than themselves. Pardeep Kumar and Amandeep Verma
provided an idea for generating the initial population using the Min-Min and Max-Min
techniques for Genetic Algorithms. As discussed above, fit solutions give better subsequent
generations when genetic operators are applied to them; hence, if Min-Min and Max-Min are
used to generate individuals, we will get a better initial population and, in turn, better solutions
than in the case of the standard Genetic Algorithm, in which the initial population is chosen
randomly.
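The seeding idea can be sketched as follows (a minimal illustration; the two heuristic schedules shown are hypothetical placeholders for the Min-Min and Max-Min assignments):

```python
import random

def seeded_population(heuristic_schedules, pop_size, n_tasks, n_resources):
    """Initial GA population seeded with heuristic schedules (e.g. the
    Min-Min and Max-Min assignments), topped up with random individuals."""
    population = [list(s) for s in heuristic_schedules]
    while len(population) < pop_size:
        population.append([random.randrange(n_resources)
                           for _ in range(n_tasks)])
    return population

# Hypothetical Min-Min / Max-Min schedules for 6 tasks on 4 machines
pop = seeded_population([[0, 1, 2, 3, 0, 1], [3, 2, 1, 0, 3, 2]], 10, 6, 4)
```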
Table 2 Makespans For Fixed VMs And Varying Cloudlets
VMs fixed: 10; number of cloudlets varying from 10 to 40

Method used          10     20     30     40
Improved Genetic     8      26.1   60.9   113.5
Standard Genetic     12.4   44.7   86.5   146.8
The performance of the first case according to the noted values is shown in the graph of Figure 3,
in which the x-axis shows the number of cloudlets and the y-axis shows the makespans; the
number of virtual machines is fixed at 10:
Figure 3 Graph For Makespans For Fixed VMs And Varying Cloudlets
In the second case, the number of cloudlets was fixed at 40 and the number of virtual machines
varied from 10 to 40 in steps of 10. Each algorithm was run 10 times and the average of these
10 runs is noted in Table 3 below:
Table 3 Makespans for Fixed Cloudlets and Varying VMs
Cloudlets fixed: 40; number of VMs varying from 10 to 40

Method used          10     20     30     40
Improved Genetic     8      26.1   60.9   113.5
Standard Genetic     12.4   44.7   86.5   146.8
The performance of the second case according to the noted values is shown in the graph of
Figure 4, in which the x-axis shows the number of virtual machines and the y-axis shows the
makespans; the number of cloudlets is fixed at 40:
Figure 4 Graph for makespans for fixed Cloudlets and varying VMs
From both graphs, it can be observed that the makespan of the Improved Genetic Algorithm
is less than that of the Standard Genetic Algorithm. The improved Genetic Algorithm thus
helped in reducing the overall execution time of the tasks and in the proper utilization of
resources.
Advantages
Improved Genetic Algorithms are easily transferred to existing simulations and models
Easy to understand, and practically no knowledge of mathematics is demanded
Disadvantages
The genetic algorithm performed well on some problems that were very difficult for
branch-and-bound techniques (i.e. problems where the branch-and-bound method took a
long time to reach the optimal solution). However, the genetic algorithm did not perform
well on problems in which the resources were tightly constrained. This comes as little
surprise, since the representation forces the genetic algorithm to search for resource
feasibility, and tightly constrained resources mean fewer resource-feasible solutions. The
genetic algorithm also did not perform well on the job shop problem.
Ant Colony Algorithm
Basic principles of Ant trail laying:
According to Ratan Mishra and Anant Jaiswal, depending on the species, ants may lay
pheromone trails when travelling from the nest to food, or from food to the nest, or when
travelling in either direction. They also follow these trails with a fidelity which is a function of
the trail strength, among other variables. Ants drop pheromones as they walk by stopping
briefly and touching their gaster, which carries the pheromone-secreting gland, to the ground.
The strength of the trail they lay is a function of the rate at which they make deposits, and the
amount per deposit. Since pheromones evaporate and diffuse away, the strength of the trail
when it is encountered by another ant is a function of the original strength, and the time since
the trail was laid. Most trails consist of several superimposed trails from many different ants,
which may have been laid at different times; it is the composite trail strength which is sensed by
the ants.
Shilpa Damor states that the Ant Colony algorithm aims to search for an optimal path in a
graph, based on the behaviour of ants seeking a path between their colony and a source of food.
The Task Resource Scheduling (TRS) problem is the problem of assigning T tasks to R
resources so that the assignment cost is minimized, where the cost is defined by a cost
function. The TRS problem is considered one of the hardest combinatorial optimization (CO)
problems, and can be solved to optimality only for small instances.
For example, if there is a dependency between tasks i and j, then the edge weight between them
is defined as CCij. If there is no edge between task i and task j, then CCij is defined as zero.
The edge weight between a task and itself is also zero, i.e. CCii = 0.
Figure 6 Task Dependence-Communication Cost Graph (TDG)
The Tij matrix consists of M×M entries indicating the communication cost between tasks,
where M is the number of tasks. The Rij matrix consists of N×N entries indicating the
communication cost between resources, where N is the number of resources available in the
cloud environment.
Implementation of ACO:
The Ant Colony Optimization algorithm is implemented to solve the task scheduling problem.
Tasks arrive into the system randomly and are then grouped into batches. Each generated batch
of tasks is given as input to the algorithm, and the ACO schedules these tasks onto the
available resources. ACO generates a new solution and tests it by evaluating the cost
function. Each solution is scored and either accepted or rejected before it is considered for the
next iteration.
The makespan of the cloud environment can be calculated using the total cost F(x) = TM
+ TD + TO + TE + TC. Makespan is used to measure the throughput of the cloud system.
The minimum F(x) value will be taken as the makespan of the cloud system. The proposed
algorithm should minimize the makespan value of the cloud system.
The task dependencies are represented as a Task Dependence Graph (TDG), as depicted in
Figure 7.
Table 6 Task Dependency Cost Matrix representation
Task # 1 2 3 4 5 6
1 0 0 0 1 0 0
2 0 0 1 3 4 0
3 0 1 0 0 3 0
4 1 3 0 0 0 5
5 0 4 3 0 0 2
6 0 0 0 5 2 0
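The dependency matrix of Table 6 can be represented directly in code, and its stated properties (symmetry and a zero diagonal, i.e. CCij = CCji and CCii = 0) can be checked:

```python
# Task dependency (communication cost) matrix CC from Table 6;
# CC[i][j] is the cost between tasks i+1 and j+1.
CC = [
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 3, 4, 0],
    [0, 1, 0, 0, 3, 0],
    [1, 3, 0, 0, 0, 5],
    [0, 4, 3, 0, 0, 2],
    [0, 0, 0, 5, 2, 0],
]

# The properties the text requires hold for this data:
assert all(CC[i][i] == 0 for i in range(6))                      # CCii = 0
assert all(CC[i][j] == CC[j][i] for i in range(6) for j in range(6))
```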
The cloud is a collection of data centres represented as a weighted graph. Each node in the
graph represents a data centre or resource. The cost of communication between the data centres
is represented using the variable ‘dc’; for example, dcij is the cost of communication between
data centres i and j.
Figure 8 Resource Dependency with Communication Cost Graph
[Graph not reproduced: four data centres DC#1–DC#4 connected by edges weighted with the
communication costs dc12, dc13, dc14, dc24, dc32 (= 3) and dc34.]
Table 7 Resource Dependency Communication Cost Matrix Representation
Data Centre # 1 2 3 4
1 0 1 2 4
2 1 0 3 1
3 2 3 0 2
4 4 1 2 0
Table 11 Probability Matrix for Resources
Pij R1 R2 R3 R4
R1 0.0 0.142857 0.285714 0.571428
R2 0.2 0.0 0.6 0.2
R3 0.285714 0.428571 0.0 0.285714
R4 0.571428 0.142857 0.285714 0.0
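The entries of Table 11 match each row of the resource cost matrix (Table 7) normalised by its row sum; this derivation is an inference from the numbers, not stated explicitly in the source:

```python
# Resource communication-cost matrix from Table 7
DC = [
    [0, 1, 2, 4],
    [1, 0, 3, 1],
    [2, 3, 0, 2],
    [4, 1, 2, 0],
]

# Row-normalised probabilities; e.g. P[0][3] = 4/7 = 0.571428...,
# matching the R1/R4 entry of Table 11
P = [[d / sum(row) for d in row] for row in DC]
```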
T2(15) 1(5) 0 0 0
T3(20) 0 1(5) 0 0
T4(25) 0 0 1(15) 0
T5(10) 0 0 1(5) 0
T6(30) 0 0 0 1(20)
Path 2: R2 R1 R4 R3
Availability of resources after allocating each task in path 2
DC#2(25) DC#1(30) DC#4(50) DC#3(40)
T1(10) 1(15) 0 0 0
T2(15) 1(0) 0 0 0
T3(20) 0 1(10) 0 0
T4(25) 0 0 1(25) 0
T5(10) 0 1(0) 0 0
T6(30) 0 0 0 1(10)
Path 3: R2 R4 R1 R3
Task allocation of resources by using path 3
Availability 20 0 10 5
Path 4: R3 R4 R1 R2
Task allocation of resources by using path 4
Total Communication Cost (Tc) for completion of all tasks in path 4
Data Centres    Communication cost (Tc)
DC1 cc64*dc14+cc65*dc13 24
DC2 ----------------- 0
DC3 cc14*dc34+cc24*dc34+cc25*dc33+cc23*dc34+cc52*dc33+cc53*dc34+cc56*dc31 20
DC4 cc32*dc43+cc35*dc43+cc41*dc43+cc42*dc43+cc46*dc41 36
Total Communication Cost (Tc) 80
Path 5: R3 R1 R4 R2
Task allocation of resources by using path 5
Path 6: R4 R2 R3 R1
Task allocation of resources by using path 6
As a measure of performance, the total cost function was used. The total cost was computed
using two heuristics: Ant Colony Optimisation (ACO) based cost optimization, and a Genetic
Algorithm (GA) selecting a resource based on minimum cost.
The total number of tasks was varied from 10 to 100; the processing times of the tasks are
uniformly distributed in [5, 10] and the memory requirements are uniformly distributed in
[50, 100]. The interactive data between tasks vary from 1 to 10, and the communication costs
between data centres are uniformly distributed from 1 to 10. Simulation results demonstrate
that more iterations or a larger number of particles obtain a better solution, since more
solutions are generated.
Figure 9 Experimental Observations of GA and ACO
Advantages
For a small number of nodes, problems can be solved by exhaustive search
The algorithm has strength in both local and global searches
It has been implemented in several optimization problems
Disadvantages
Given a large number of nodes, it is very difficult to carry out the computations.
Theoretical analysis is difficult, due to sequences of random decisions
Random initialization
Probabilistic approach in the local search
Particle Swarm Optimization (PSO)
Particle Swarm Optimization (PSO) is a self-adaptive global search based optimization technique
introduced by Kennedy and Eberhart. According to Suraj Pandey, Linlin Wu, Siddeswara Mayura
Guru and Rajkumar Buyya, the algorithm is similar to other population-based algorithms like
genetic algorithms, but there is no direct recombination of individuals of the population. Instead,
it relies on the social behaviour of the particles. In every generation, each particle adjusts its
trajectory based on its best position (local best) and the position of the best particle (global best)
of the entire population. This concept increases the stochastic nature of the particle and converges
quickly to a global minimum with a reasonably good solution. PSO has become popular due to its
simplicity and its effectiveness in a wide range of applications with low computational cost. Some
of the applications that have used PSO are: the reactive voltage control problem, data mining,
chemical engineering, pattern recognition and environmental engineering. PSO has also been
applied to solve NP-hard problems like scheduling and task allocation.
Figure 10 depicts a workflow structure with five tasks, which are represented as nodes. The
dependencies between tasks are represented as arrows. This workflow is similar in structure
to our version of the Evolutionary Multi-objective Optimization (EMO) application [20]. The
root task may have an input file (e.g. f.in) and the last task produces the output file (e.g.
f.out). Each task generates output data after it has completed (f12, f13, ..., f45). These data are
used by the task’s children, if any. The numeric value for each of these data items is the edge
weight (ek1,k2) between two tasks k1 ∈ T and k2 ∈ T. The figure also depicts three compute
resources (PC1, PC2, PC3) interconnected with varying bandwidth, each having its own storage
unit (S1, S2, S3). The goal is to assign the workflow tasks to the compute resources such that
the total cost of computation is minimized.
The problem can be stated as: “Find a task-resource mapping instance M, such that when
estimating the total cost incurred using each compute resource PCj , the highest cost among
all the compute resources is minimized.”
Let Cexe(M)j be the total cost of all the tasks assigned to a compute resource PCj (Eq. 1). This
value is computed by adding all the node weights (the cost of execution of a task k on
compute resource j) of all tasks assigned to each resource in the mapping M. Let Ctx(M)j be
the total access cost (including transfer cost) between tasks assigned to a compute resource
PCj and those that are not assigned to that resource in the mapping M (Eq. 2). This value is
the product of the output file size (given by the edge weight ek1,k2) from a task k1 ∈ k to task
k2 ∈ k and the cost of communication from the resource where k1 is mapped (M(k1)) to
another resource where k2 is mapped (M(k2)).
The average cost of communication of unit data between two resources is given by
dM(k1),M(k2). The cost of communication is applicable only when two tasks have file
dependency between them, that is when ek1,k2 > 0. For two or more tasks executing on the
same resource, the communication cost is zero.
Equation 4 ensures that all the tasks are not mapped to a single computer resource. Initial
cost maximization will distribute tasks to all resources. Subsequent minimization of the
overall cost (Equation 5) ensures that the total cost is minimal even after initial distribution.
For a given assignment M, the total cost Ctotal(M)j for a computer resource PCj is the sum of
execution cost and access cost (Eq. 3).
When estimating the total cost for all the resources, the largest cost for all the resources is
minimized (Eq. 5). This indirectly ensures that the tasks are not mapped to single resources
and there will be a distribution of cost among the resources.
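The cost model described in this passage can be written compactly as follows. This is a reconstruction from the prose (the original equations were not reproduced in this extract), using the symbols defined in the text:

```latex
% Reconstruction of Eqs. 1-5 from the surrounding prose (hedged).
\begin{align}
C_{exe}(M)_j &= \sum_{k \in T:\; M(k)=PC_j} w_{k,j} && \text{(Eq. 1)}\\
C_{tx}(M)_j &= \sum_{\substack{k_1:\; M(k_1)=PC_j\\ k_2:\; M(k_2)\neq PC_j}}
               e_{k_1,k_2}\, d_{M(k_1),M(k_2)} && \text{(Eq. 2)}\\
C_{total}(M)_j &= C_{exe}(M)_j + C_{tx}(M)_j && \text{(Eq. 3)}\\
\mathrm{Cost}(M) &= \max_j\, C_{total}(M)_j && \text{(Eq. 4)}\\
&\text{Minimize } \mathrm{Cost}(M) \text{ over all mappings } M && \text{(Eq. 5)}
\end{align}
```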
In this section, we present a scheduling heuristic for dynamically scheduling workflow
applications. The heuristic optimizes the cost of task-resource mapping based on the solution
given by particle swarm optimization technique.
The optimization process uses two components:
a) the scheduling heuristic, as listed in Algorithm 1, and
b) the PSO steps for task-resource mapping optimization, as listed in Algorithm 2.
First, we give a brief description of the PSO algorithm.
where:
v_i^k      velocity of particle i at iteration k
v_i^{k+1}  velocity of particle i at iteration k + 1
ω          inertia weight
c_j        acceleration coefficients; j = 1, 2
rand_i     random number between 0 and 1; i = 1, 2
x_i^k      current position of particle i at iteration k
pbest_i    best position of particle i
gbest      position of the best particle in the population
x_i^{k+1}  position of particle i at iteration k + 1
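Using these symbols, the standard Kennedy-Eberhart update rules (presumably the Eqs. 6 and 7 referenced later in the text; the exact coefficient values here are illustrative assumptions) can be sketched as:

```python
import random

def pso_update(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One velocity/position update for a single particle.

    x, v, pbest are lists of equal length (one entry per dimension/task);
    gbest is the best position found by any particle so far.
    v' = w*v + c1*rand1*(pbest - x) + c2*rand2*(gbest - x);  x' = x + v'
    """
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2 = random.random(), random.random()
        vd = w * v[d] + c1 * r1 * (pbest[d] - x[d]) + c2 * r2 * (gbest[d] - x[d])
        new_v.append(vd)
        new_x.append(x[d] + vd)
    return new_x, new_v
```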
7. for all “ready” tasks {ti} ∈ T do
8. Assign tasks {ti} to resources {pj} according to the solution provided by PSO
9. end for
10. Dispatch all the mapped tasks
11. Wait for polling time
12. Update the ready task list
13. Update the average cost of communication between resources according to the current
network load
14. Compute PSO({ti})
15. until there are unscheduled tasks
Scheduling Heuristic: Suraj Pandey, Linlin Wu, Siddeswara Mayura Guru and Rajkumar
Buyya state that the average computation cost (assigned as node weight in Figure 10) of all
tasks on all the compute resources is calculated first. This cost can be calculated for any
application by executing each task of the application on a series of known resources. It is
represented as the TP matrix in Table 1. As the computation cost is inversely proportional to
the computation time, the cost is higher for those resources that complete the task quicker.
Similarly, the average value of the communication cost between resources per unit data is
stored, represented by the PP matrix in Table 1 and described later in the paper. The cost of
communication is inversely proportional to the time taken. The size of the input and output
data of each task is also assumed to be known (assigned as edge weight ek1,k2 in Figure 10).
In addition, this cost is considered to be for the transfer per second (unlike Amazon
CloudFront, which does not specify a time for transferring). The initial step is to compute the
mapping of all tasks in the workflow, irrespective of their dependencies (ComputePSO({ti})).
This mapping optimizes the overall cost of computing the workflow application. To respect
the dependencies between the tasks, the algorithm assigns the “ready” tasks to resources
according to the mapping given by PSO. By “ready” tasks, they mean those tasks whose
parents have completed execution and have provided the files necessary for the tasks’
execution. After dispatching the tasks to resources for execution, the scheduler waits for the
polling time. This time is for acquiring the status of tasks, and is middleware dependent.
Depending on the number of tasks completed, the ready list is updated; it will now contain the
tasks whose parents have completed execution. The average values for communication
between resources are then updated according to the current network load. As the
communication costs will have changed, the PSO mappings are recomputed. Also, when
remote resource management systems are not able to assign tasks to resources according to
the mappings due to resource unavailability, the recomputation of PSO makes the heuristic
dynamically rebalance the other tasks’ mappings (online scheduling). Based on the
recomputed PSO mappings, the ready tasks are assigned to the compute resources. These
steps are repeated until all the tasks in the workflow are scheduled.
4. If the fitness value is better than the previous best pbest, set the current fitness value as
the new pbest.
5. After Steps 3 and 4 for all particles, select the best particle as ɡbest.
6. For all particles, calculate velocity using Equation 6 and update their positions using
Equation 7.
7. If the stopping criteria or maximum iteration is not satisfied, repeat from Step 3.
NB: The algorithm is dynamic (online) as it updates the communication costs (based on
average communication time between resources) in every scheduling loop. It also recomputes
the task-resource mapping so that it optimizes the cost of computation, based on the current
network and resource conditions.
PSO: The steps in the PSO algorithm are listed in Algorithm 2. The algorithm starts with
random initialization of the particles’ positions and velocities. In this problem, the particles
are the tasks to be assigned, and the dimension of a particle is the number of tasks in a
workflow. The value assigned to each dimension of a particle is a compute resource index;
thus a particle represents a mapping of resources to tasks. In our workflow (depicted in Figure
10) each particle is 5-dimensional because there are 5 tasks, and the content of each dimension
of a particle is the compute resource assigned to that task. For example, a sample particle
could be represented as depicted in Figure 11.
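Figure 11 is not reproduced here, but the encoding it depicts can be illustrated directly (the assigned resource indices below are hypothetical):

```python
# A particle encodes a task-to-resource mapping: dimension d holds the index
# of the compute resource assigned to task d. With 5 tasks and resources
# PC1..PC3 (indices 0..2), one sample particle could be:
particle = [0, 2, 1, 1, 0]   # task 0 -> PC1, task 1 -> PC3, task 2 -> PC2, ...

assert len(particle) == 5    # dimension = number of tasks in the workflow
assert all(0 <= r <= 2 for r in particle)   # each value is a resource index
```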
Each particle is evaluated using the fitness function given in Eq. 5. The particles calculate
their velocity using Eq. 6 and update their position according to Eq. 7. The evaluation is
carried out until the specified number of iterations (the user-specified stopping criterion) is
reached.
Performance Metric:
As a measure of performance, the cost for the complete execution of the application was used
as a metric. Suraj Pandey, Linlin Wu, Siddeswara Mayura Guru and Rajkumar Buyya
computed the total cost of execution of a workflow using two heuristics: PSO based cost
optimization (Algorithm 1), and best resource selection (based on minimum completion time,
by selecting a resource with maximum cost).
They evaluated the scheduling heuristic using the workflow depicted in Figure 10. Each task
in the workflow has input and output files of varying sizes, and the execution cost of each
task varies among the compute resources used (in this case PC1 − PC3). The authors analysed
the performance of the heuristic by varying each of these in turn, plotting the graphs by
averaging the results obtained from 30 independent executions. In every execution, the x-axis
parameters such as the total data size (e.g. 1024 MB) and the range of computation cost (e.g.
1.1-1.3 $/hour) remain unchanged, while the particles’ velocities and positions change.
The graphs also depict the value of the plotted points together with the CI (represented as “+/-
” value).
Suraj Pandey, Linlin Wu, Siddeswara Mayura Guru and Rajkumar Buyya varied the size of
the total data processed by the workflow in the range 64-1024 MB. By varying the data size,
they compared the variance in the total cost of execution and the distribution of workload on
resources for the two algorithms, as depicted in Figure 12 and Figure 13, respectively. The
compute resource cost was fixed in the range 1.1−1.3 $/hr.
Figure 12 Comparison Of Total Cost Between PSO Based Resource Selection And Best
Resource Selection Algorithms When Varying Total Data Size Of A Workflow.
Total Cost of Execution: Figure 12 plots the total cost of computation of the workflow (in the
log scale) with the increase in the total data processed by the workflow. The graph also plots
95% Confidence Interval (CI) for each data point. The cost obtained by PSO based task-
resource mapping increases much slower than the BRS algorithm. PSO achieves at least three
times lower cost for 1024MB of total data processed than the BRS algorithm. Also, the value
of CI in cost given by PSO algorithm is +/- 8.24, which is much lower as compared to the
BRS algorithm (+/- 253.04), for 1024 MB of data processed by the workflow. The main
reason for PSO to perform better than the ‘best resource’ selection is the way it takes into
account communication costs of all the tasks, including dependencies between them. When
calculating the cost of execution of a child task on a resource, it adds the data transfer cost for
transferring the output from its parent tasks’ execution node to that node. This calculation is
done for all the tasks in the workflow to find the near optimal scheduling of task to resources.
However, the BRS algorithm calculates the cost for a single task at a time, which does not
take into account the mapping of the other tasks in the workflow. This results in the PSO-based
algorithm giving a lower cost of execution than the BRS-based algorithm.
Figure 13 Distribution Of Workflow Tasks On Available Processors.
Distribution of Load: Suraj Pandey, Linlin Wu, Siddeswara Mayura Guru and Rajkumar
Buyya calculated the distribution of workflow tasks onto the available resources for various
sizes of total data processed, as depicted in Figure 13. This evaluation is necessary because
algorithms may choose to submit all the tasks to a few resources to avoid communication
between resources as the size of data increases, thus reducing the communication cost to zero.
In their formulation, Equation 4 restricts all tasks from being mapped to the same resource,
so that tasks can execute in parallel for increased time-efficiency. In Figure 13, the X-axis
represents the total size of data processed by the workflow and the Y-axis the average number
of tasks (expressed as a percentage) executed by a compute resource for each data size. The
figure shows that PSO distributes tasks to resources according to the size of data. When the
total size of data is small (64-128 MB), PSO distributed tasks proportionally to all the
resources (PC1 − PC3). However, when the size of data increased to (and over) 256 MB, more
tasks were allocated to PC1 and PC3. As the cost of compute resources was fixed for this part
of the experiment, the BRS algorithm does not vary the task-resource mapping; it is also
indifferent to the size of data.
Hence, BRS’s load distribution is a straight line, as depicted in Figure 13, with PC1, PC2 and
PC3 receiving 20%, 40% and 40% of the total tasks, respectively. The distribution of tasks to
all the available resources in proportion to their usage costs ensured that hotspots (resource
overloading) were avoided. The heuristic could thus minimize the total cost of execution and
balance the load on the available resources.
Figure 14 Comparison Of Total Cost Between PSO Based Resource Selection And Best
Resource Selection Algorithms When Varying Computation Cost Of All The Resources
(For 128MB Of Data).
The reason for PSO’s improvement over BRS is PSO’s ability to find near-optimal solutions
for mapping all tasks in the workflow to the given set of compute resources. The linear
increase in PSO’s cost also suggests that it takes both computation and communication cost
into account. BRS, however, simply maps a task to the resource that has the minimum
completion time (a resource with higher frequency and lower load, and thus higher cost). As
the resource costs increase, the use of BRS leads to higher costs due to its affinity towards the
better resource, irrespective of the size of data, whereas PSO minimizes the maximum total
cost of assigning all tasks to resources.
Round Robin Algorithm
Shilpa Damor states that load balancing is used to achieve optimal resource utilization,
maximize throughput, minimize response time, and avoid overload. Using multiple
components with load balancing, instead of a single component, may increase reliability
through redundancy. Hence we chose the Round Robin Algorithm as our existing load
balancing algorithm for cloud computing. According to Dr. Hemant S. Mahalle, Prof. Parag
R. Kaveri and Dr. Vinay Chavan (January 2013), the Round Robin algorithm uses a time
slicing mechanism. The name of the algorithm suggests that it works in a round manner,
where each node is allotted a time slice and has to wait for its turn. The time is divided and an
interval is allotted to each node, within which it has to perform its task. The complexity of
this algorithm is less than that of the other two algorithms. The algorithm was simulated with
open source simulation software known as Cloud Analyst, in which it is the default
algorithm. It simply allots the jobs in round robin fashion and does not consider the load on
different machines.
Tejinder Sharma and Vijay Kumar Banga state that the Round Robin Algorithm is the
simplest algorithm, using the concept of a time quantum or time slice. Here the time is
divided into multiple slices and each node is given a particular time quantum or time interval,
in which the node performs its operations. The resources of the service provider are provided
to the client on the basis of this time quantum. In Round Robin scheduling the time quantum
plays a very important role. If the time quantum is extremely small, Round Robin scheduling
becomes the Processor Sharing Algorithm and the number of context switches is very high. It
selects the load on a random basis, which leads to situations where some nodes are heavily
loaded and some are lightly loaded. According to Saroj Hiranwal and Dr. K.C. Roy, although
the Round Robin Algorithm is very simple, it places an additional load on the scheduler to
decide the size of the quantum, and it has a longer average waiting time, more context
switches, higher turnaround time and low throughput.
The Round Robin (RR) job scheduling algorithm considered in this study distributes the
selected jobs over the available VMs in a round order where each job is equally handled. The
idea of the RR algorithm is that it attempts to send the selected jobs to the available VMs in
a round form.
According to Isam Azawi Mohialdeen, Figure 15 depicts the mechanism of the Round
Robin (RR) job scheduling algorithm. The algorithm does not require any pre-processing,
overhead or scanning of the VMs to nominate a job’s executor.
Input: cloudletlist: the list of cloudlets (i.e. jobs); VML: the list of available VMs
Steps:
1. Nocl ← cloudletlist.size();
2. NoVM ← VML.size();
3. index ← 0;
4. for j ← 0 to Nocl do
5.   cl ← cloudletlist.get(j);
6.   index ← (index + 1) mod NoVM;
7.   v ← VML.get(index);
8.   stagein ← TransferTime(cl, v, in);
9.   stageout ← TransferTime(cl, v, out);
10.  exec ← ExecuteTime(cl, v);
11.  if (cl.AT + stagein + exec + stageout + v.RT ≤ cl.DL) then
12.    sendjob(cl, v);
13.    update(v)
14.  else
15.    drop(cl);
16.    FailedJobs++;
17. end
The index of the selected VM for the current job is computed in a round robin fashion
using the equation below:
index ← (index + 1) mod NoVM, where index = the index of the selected VM
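The core round-robin assignment can be sketched as a minimal runnable example (the deadline check of the pseudocode is omitted here, since it depends on the simulator's transfer and execution time estimates):

```python
def round_robin_assign(jobs, vms):
    """Assign each job to a VM in round-robin order.

    jobs: list of job identifiers; vms: list of VM identifiers.
    Returns a dict mapping each job to the VM it is sent to.
    """
    assignment = {}
    index = 0
    for job in jobs:
        index = (index + 1) % len(vms)   # the index update from the text
        assignment[job] = vms[index]
    return assignment

mapping = round_robin_assign(["j1", "j2", "j3", "j4"], ["vm0", "vm1", "vm2"])
```

As in the pseudocode, the index is incremented before use, so the first job goes to the second VM in the list and the rotation wraps around once every VM has received a job.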
Figure 16 The Total Cost
In this experiment the aim was to study the impact of the number of jobs on the total cost
when VMs execute their assigned jobs. Figure 16 illustrates the experimental results
obtained for the total cost consumed by each set of jobs fed to the four scheduling
algorithms. It is clear that the total cost is highly influenced by the number of assigned
jobs for every scheduling algorithm. Notice that minimum completion time produces
the highest cost in all cases compared to the other scheduling algorithms. This is mainly
because minimum completion time accomplishes the largest number of received jobs;
thus its total cost is higher than that of the other algorithms. The opportunistic
load balancing scheduling algorithm incurred a higher cost than the Random and
Round Robin algorithms. This is because opportunistic load balancing has the
capability to run more jobs at the same time, since the algorithm dispatches the jobs
over the available VMs while taking the VM load into account; thus many jobs can
run, which leads to an increase in cost. The Round Robin algorithm produced less cost
than the minimum completion time and opportunistic load balancing algorithms. The
Random algorithm is superior in all cases in terms of total cost compared with the
other algorithms. Nevertheless, the Random algorithm has the same cost as the Round
Robin algorithm when the number of jobs reaches 500, 600 and 700.
Advantages
Starvation is never a problem
The algorithm ensures that all processes in the job (process) queue share a time slice
on the processor.
Excellent for parallel computing as it is great for load balancing if the tasks are
around the same length
Disadvantages
The algorithm does not consider priority
If the time slice value is too small the context switching time will be large in
relation to actual work done on the CPU
Chapter 3
Methodology
Introduction
This chapter gives a description of the research design adopted, the target population and the
sampling procedures. In addition, it explains the research instruments used.
Research Design
Our research is qualitative research. According to Family Health International, qualitative
research is a type of scientific research. This is justified, since our scientific research consists
of an investigation that:
a) seeks answers to a question
b) systematically uses a predefined set of procedures to answer the question
c) collects evidence
d) produces findings that were not determined in advance
e) produces findings that are applicable beyond the immediate boundaries of the study
More so, Creswell (2007) describes qualitative research as a process of research involving
emerging questions and procedures, with data typically collected in the participant's setting, data
analysis inductively building from particulars to general themes, and the researcher making
interpretations of the meaning of the data. The final written report has a flexible structure,
meaning that our final report can be further modified by those who seek to improve our
algorithm.
Qualitative research is essential, since our research involves the collection of algorithms,
which are predefined sets of procedures that we will use to produce an optimal solution. The data
will be collected on cloud platforms, which are also our participants' setting.
Research Instruments
In this section our main research instrument was documentary analysis; that is, we used
journal publications, articles and textbooks obtained from the internet. However,
there were also attempts to use other research instruments such as focus group discussions,
questionnaires and interviews.
The first company we visited was Twenty Third Century Systems. They offer a
product called Cumulus, which provides clients with a cloud environment where they can
place business applications such as accounting packages. We asked their representative about
the resource scheduling algorithm they use, and he said that most of the systems on the
market, such as those based on Intel quad-core processors, can allocate resources automatically; moreover, they
allocate adequate resources for the client at no extra cost, meaning there will be no situation
where the resources are not enough. On the simulation side he suggested that we use a
testing environment, since in industry, he said, they do not use simulations but rather testing
environments. When it came to resources, he spoke of storage space and processing power as
the major resources. In addition, the Twenty Third Century Systems representative mentioned
security as being part of the cloud environment, so as to protect the data of the user.
Secondly, we went to Zarnet, who offer cloud computing services, though only to the
government; Zarnet are still yet to offer the services more widely. The Zarnet representative advised us to
look into the amount of space a user would want, as well as the priority of users.
Documentary Analysis
This was our main research instrument. We used journal publications that we found on
Google Scholar. In addition, we took most of the algorithms from these journals and other
supporting documents from the internet. For algorithms, we selected those that had been scientifically
tested. Some of the data on cloud computing algorithms came from textbooks
downloaded from various websites. In our research we used current documents on
resource scheduling in cloud computing.
Interviews
For interviews, we conducted a telephone interview with Marco Fortino, Business Development
Manager, Google Enterprise EMEA. However, our interview was not successful, since he was
not able to disclose information regarding the resource scheduling algorithm used on the
Google Cloud Platform.
On the other hand, Google offers cloud platform services, so we had a chance to explore the
cloud environment and see how resources are shared. Google shares resources over the
network, which helped us in simulating our algorithm over the network.
Population
Our target population was service providers offering cloud computing services, mainly those
that offer cloud platforms. We had to reach our target population using means of
communication such as telephone and email, since distance was a major factor. Moreover,
only a few companies offer cloud computing services.
Our sample size was small, since the number of companies offering cloud platform services
is quite small, although it is on the rise.
During data collection, we as researchers explained to the participants that the research was for
academic study purposes and that all the information was confidential unless the
subjects waived such confidentiality. Participants were informed that the information would
be useful as a starting point for resolving issues or problems faced by cloud platform
service providers in sharing resources. Participants were asked to answer all questions frankly
and truthfully.
Data Analysis
Data collected through the use of questionnaires was coded and analysed to examine the
patterns. Tables, pie charts and graphs were used to present the data. This made
the data easy to interpret and gave a clear picture of the resource scheduling algorithms
currently available. Extensive descriptions of the patterns observed were used to answer some of
the research questions.
Summary
From the research we did, we concluded by coming up with an algorithm that focuses on
priority, allocation and sharing of resources on cloud platforms. We also decided to
simulate the algorithm by creating a testing environment as well as developing a web
simulation application.
Chapter 4
Data Presentation
From the resource scheduling algorithms currently available on cloud platforms, we
managed to simulate the Bee Algorithm and the Round Robin Algorithm. We simulated
these algorithms using the following tools:
Test Results

Number of Iterations              1      2      3      4      5      Average
Bee Algorithm (seconds)           0.022  0.025  0.026  0.026  0.029  0.0256
Round Robin Algorithm (seconds)   0.027  0.028  0.033  0.027  0.031  0.0292
[Figure: Time Taken To Request A Small Object — time in seconds against the number of iterations (1 to 5)]
From the results above we can see that when requesting a small object the Bee Algorithm is
much faster than Round Robin. Using the average time, the Bee Algorithm was faster by
0.0036 seconds.
Test Results

Number of Iterations              1      2      3      4      5      Average
Bee Algorithm (seconds)           0.033  0.027  0.033  0.028  0.031  0.0304
Round Robin Algorithm (seconds)   0.026  0.038  0.033  0.029  0.034  0.0320
[Figure: Time Taken To Request A Large Object — time in seconds against the number of iterations (1 to 5), comparing the Bee Algorithm and the Round Robin Algorithm]
From the results obtained after requesting an image of 17.5 MB, the two algorithms
performed almost equally. However, using the average response time, the Bee Algorithm was faster
by 0.0016 seconds.
Test Results

Number of Iterations              1      2      3      4      5      Average
Bee Algorithm (seconds)           0.304  0.32   0.279  0.307  0.327  0.3074
Round Robin Algorithm (seconds)   0.356  0.35   0.323  0.347  0.385  0.3522
[Figure: Time Taken To Perform CPU Intensive Tasks — time in seconds against the number of iterations (1 to 5), comparing the Bee Algorithm and the Round Robin Algorithm]
Judging by the time taken to perform CPU-intensive tasks, the Bee Algorithm executed
much faster than the Round Robin Algorithm. Using the average response time, the Bee
Algorithm was faster by 0.0448 seconds.
From the results above we chose the Bee Algorithm, since in all three tests it performed
much faster than the Round Robin Algorithm.
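The averages and differences quoted in the three comparisons above can be verified directly from the raw timings in the tables; the following short sketch recomputes them.

```python
# Raw timings copied from the three result tables above (seconds).
results = {
    "small object":  {"Bee": [0.022, 0.025, 0.026, 0.026, 0.029],
                      "Round Robin": [0.027, 0.028, 0.033, 0.027, 0.031]},
    "large object":  {"Bee": [0.033, 0.027, 0.033, 0.028, 0.031],
                      "Round Robin": [0.026, 0.038, 0.033, 0.029, 0.034]},
    "CPU intensive": {"Bee": [0.304, 0.32, 0.279, 0.307, 0.327],
                      "Round Robin": [0.356, 0.35, 0.323, 0.347, 0.385]},
}

def average(xs):
    """Mean of the five iterations, rounded as in the tables."""
    return round(sum(xs) / len(xs), 4)

for test, timings in results.items():
    bee = average(timings["Bee"])
    rr = average(timings["Round Robin"])
    print(f"{test}: Bee {bee}s, Round Robin {rr}s, "
          f"Bee faster by {round(rr - bee, 4)}s")
```

Running this reproduces the averages in the tables (0.0256 vs 0.0292, 0.0304 vs 0.0320, 0.3074 vs 0.3522) and the stated margins of 0.0036, 0.0016 and 0.0448 seconds.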
Chapter 5
From the results obtained in the tests carried out, we came up with the Hybrid PAS
Algorithm (Prioritisation, Allocation and Sharing). Below is the proposed model for our algorithm.
[Diagram: Proposed model for the Hybrid PAS Algorithm — jobs are placed into Quadrant Queues 1 to 4 and a Miscellaneous Queue (MQ), then passed via the GIS and the Modified Bee Algorithm (MBA) to the Allocation And Sharing Server, which dispatches them to virtual machines VM1, VM2, … VMn on the hybrid cloud]
Hybrid PAS Algorithm (Prioritisation, Allocation and Sharing)
This algorithm seeks to address the weaknesses of the algorithms above. It is a hybrid,
since it takes into account steps from those algorithms. However, our proposed
algorithm is focused on resource scheduling for efficiency and effectiveness on
cloud platforms. Its core objectives are to achieve prioritisation, allocation and
sharing of resources.
To prioritise jobs on a cloud platform, the algorithm uses the Priority Matrix. The priority of jobs is
set by the cloud platform administrator according to the client's prioritisation of jobs, and
the prioritisation of jobs can be changed at any time.
Priority Matrix
1. User Login
2. If Important and Urgent Then
Add job(s) to Quadrant Queue 1
3. Else If Important and Not Urgent Then
Add job(s) to Quadrant Queue 2
4. Else If Not Important and Urgent Then
Add job(s) to Quadrant Queue 3
5. Else If Not Important and Not Urgent Then
Add job(s) to Quadrant Queue 4
6. Else
Add job(s) to Miscellaneous Queue 5
7. End If
8. While (Quadrant Queue 1 has tasks)
Allocate tasks using Modified Bee Algorithm (See Modified Bee Algorithm Below)
9. End While
10. While (Quadrant Queue 2 has tasks)
Allocate tasks using Modified Bee Algorithm (See Modified Bee Algorithm Below)
11. End While
12. While (Quadrant Queue 3 has tasks)
Allocate tasks using Modified Bee Algorithm (See Modified Bee Algorithm Below)
13. End While
14. While (Quadrant Queue 4 has tasks)
Allocate tasks using Modified Bee Algorithm (See Modified Bee Algorithm Below)
15. End While
16. While (Miscellaneous Queue 5 has tasks)
Allocate tasks using Modified Bee Algorithm (See Modified Bee Algorithm Below)
17. End While
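The priority matrix above can be sketched in Python. The job representation (an `important`/`urgent` flag pair set by the administrator) and the queue structure are our own illustrative assumptions, and the hand-off to the Modified Bee Algorithm is left as a placeholder.

```python
from collections import deque

def classify(job):
    """Return the queue number given by the priority matrix.
    A job is assumed to carry `important` and `urgent` flags set
    by the cloud platform administrator (illustrative schema)."""
    important, urgent = job.get("important"), job.get("urgent")
    if important and urgent:
        return 1                      # Quadrant Queue 1
    if important and not urgent:
        return 2                      # Quadrant Queue 2
    if not important and urgent:
        return 3                      # Quadrant Queue 3
    if important is not None and urgent is not None:
        return 4                      # Quadrant Queue 4
    return 5                          # Miscellaneous Queue 5

queues = {q: deque() for q in range(1, 6)}
jobs = [{"id": "a", "important": True, "urgent": True},
        {"id": "b", "important": False, "urgent": True},
        {"id": "c"}]                  # unclassified -> miscellaneous queue
for job in jobs:
    queues[classify(job)].append(job)

# Drain queues 1..5 in priority order; each task would then be
# handed to the Modified Bee Algorithm for allocation.
for q in range(1, 6):
    while queues[q]:
        queues[q].popleft()           # placeholder for MBA allocation
```

Draining the queues strictly in order 1 through 5 is what gives urgent, important jobs first access to the VMs.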
improve the independence of cloud services. Furthermore, physical resources can be efficiently
configured and utilised, considering that multiple applications run on the same machine.
In addition, quick recovery and fault tolerance are possible: virtual environments can be
easily backed up and migrated with no interruption in service or resource management. From
the provider's point of view, large numbers of virtual machines need to be allocated to
thousands of distributed users dynamically, fairly and, most important, profitably. From the
consumer's point of view, users are economy-driven entities when they make the decision to
use a cloud service.
Based on the objective of assessing the problems faced by cloud service providers in
relation to resource scheduling, we found that most of the service providers' personnel
do not have enough knowledge of how the algorithms really work. We found that some
of the providers do not use algorithms to share, allocate and prioritise jobs, since they
claimed that the computer schedules the resources automatically.
Some of the providers indicated that they scheduled the resources needed by the client based
on the client's specifications.
Another issue of concern was security: the safety of the client's information from both the
service providers and external threats. The available algorithms do not consider
security, hence it is both a challenge and a drawback for people who would want to use cloud
services.
Comparing the algorithms mentioned in this document, we learnt of the Bee Algorithm's
simplicity, flexibility and robustness, its use of few control parameters and the
ease of implementing it with basic mathematical and logical operations. The Bees Algorithm
mimics the foraging strategy of honey bees to look for the best solution to an optimisation
problem. This makes it better than the other algorithms mentioned, despite its lack of
prioritisation.
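The foraging strategy described here can be illustrated with a minimal one-dimensional sketch of the Bees Algorithm. The parameter values (scout count, patch size, recruit count) and the test function are illustrative assumptions, not those of any cited implementation.

```python
import random

def bees_algorithm(f, lo, hi, n_scouts=20, n_best=5, n_recruits=10,
                   patch=0.1, iterations=50):
    """Minimal 1-D Bees Algorithm for minimising f on [lo, hi]:
    scout bees sample the space at random, then recruited bees
    search the neighbourhood (flower patch) of the best sites."""
    sites = [random.uniform(lo, hi) for _ in range(n_scouts)]
    for _ in range(iterations):
        sites.sort(key=f)
        new_sites = []
        for site in sites[:n_best]:          # local search around best sites
            recruits = [site + random.uniform(-patch, patch)
                        for _ in range(n_recruits)]
            new_sites.append(min(recruits + [site], key=f))
        # the remaining bees scout fresh random sites (global search)
        new_sites += [random.uniform(lo, hi)
                      for _ in range(n_scouts - n_best)]
        sites = new_sites
    return min(sites, key=f)

random.seed(0)
best = bees_algorithm(lambda x: (x - 2.0) ** 2, -10.0, 10.0)  # converges near 2
```

The mix of local patch search around the best sites and continued random scouting is what gives the algorithm its robustness: it refines good solutions while still exploring the rest of the space.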
The algorithm we developed encompasses all the factors to be considered (sharing,
allocation and prioritisation). It dynamically optimises the usage of resources on the
cloud, which will guarantee better results and service delivery.
Recommendations
a) To the service providers:
Adopt an algorithm that shares, allocates and prioritises cloud resources. However, they
should also focus on the security part of the algorithm, in relation to keeping the client's
information or data secure.
Conduct seminars with other companies that provide the same services, so that they can
have a clear view of the algorithms that are applicable and best to implement on different
platforms.
Train and educate personnel (staff), so that they have a good insight into cloud services
and how they work.
b) To the clients (users):
Clients are encouraged to use cloud services, since they offer a more affordable way of using
expensive resources at a much lower price, as the main goal is to obtain results regardless of the
amount of money spent.
References
Adnan Mehedi, Md. Habibur Rahman, A Survey of Cloud Simulation Tools, Course:
Simulation and Modeling Techniques.
Adnan Mehedi, Md. Habibur Rahman, GreenCloud: A Tutorial, Course: Simulation and
Modeling Techniques.
Bitam S., Batouche M., Talbi E.G., A survey on bee colony algorithms, 24th IEEE
International Parallel and Distributed Processing Symposium, NIDISC Workshop, Atlanta,
Georgia, USA, pp. 1-8, 2010.
Dr. Hemant S. Mahalle, Prof. Parag R. Kaveri and Dr. Vinay Chavan (January 2013), Load
Balancing on Cloud Data Centres, International Journal of Advanced Research
in Computer Science and Software Engineering (1).
Family Health International, Qualitative Research Methods: A Data Collector’s Field Guide,
Module 1: Qualitative Research Methods Overview
Judith Hurwitz, Robin Bloor, Marcia Kaufman and Fern Halper, What Is Cloud Computing?
[o], http://m.dummies.com/how-to/content/what-is-cloud-computing.html. Date Accessed
12/02/2014.
Mark C. Chu-Carroll (April 2011), Code in the Cloud: Programming Google App Engine,
The Pragmatic Bookshelf, Raleigh, North Carolina / Dallas, Texas.
Pradeep R., Kavinya R. (2012), Resource Scheduling In Cloud Using Bee Algorithm For
Heterogeneous Environment, Computer Science, Anna University, India.
Ratan Mishra, Anant Jaiswal (2012), Ant colony Optimization: A Solution of Load balancing
in Cloud, International Journal of Web & Semantic Technology (IJWesT) Vol.3, No.2
Saroj Hiranwal, Dr. K. C. Roy, Adaptive Round Robin Scheduling Using Shortest Burst
Approach Based on Smart Time Slice, International Journal of Computer Science and
Communication, July-December 2011, Vol. 2, No. 2, pp. 319-323.
Shailesh Sawant (2011), A Genetic Algorithm Scheduling Approach for Virtual Machine
Resources in a Cloud Computing Environment, Master's Projects, San Jose State University
SJSU ScholarWorks.
Seung-Hwan Lim, Bikash Sharma, Gunwoo Nam, Eun Kyoung Kim, and Chita R. Das, MDCSim: A
Multi-tier Data Center Simulation Platform, Department of Computer Science and
Engineering, The Pennsylvania State University, University Park, PA 16802, USA, Technical
Report CSE 09-007.
Suraj Pandey, Linlin Wu, Siddeswara Mayura Guru, Rajkumar Buyya, A Particle Swarm
Optimization-based Heuristic for Scheduling Workflow Applications in Cloud Computing
Environments, Cloud Computing and Distributed Systems Laboratory, Department of Computer
Science and Software Engineering, The University of Melbourne, Australia; CSIRO Tasmanian
ICT Centre, Hobart, Australia.
Tasquia Mizan, Shah Murtaza Rashid Al Masud, Rohaya Latip (June 2012), Modified Bees
Life Algorithm for Job Scheduling in Hybrid Cloud, International Journal of Engineering and
Technology, Volume 2, No. 6.
Tejinder Sharma, Vijay Kumar Banga (March 2013), Efficient and Enhanced Algorithm in
Cloud Computing, International Journal of Soft Computing and Engineering (IJSCE), ISSN:
2231-2307, Volume-3, Issue-1
Yoshida H., Kawata K., Fukuyama Y., Nakanishi Y, A particle swarm optimization for
reactive power and voltage control considering voltage stability, In the International
Conference on Intelligent System Application to Power System, pages 117–121, 1999.