Rohit Handa
Lecturer, CSE-IT Department
IBM-ICE Program, BUEST Baddi
Topic: Overview of scheduling problem, Different types of scheduling, Scheduling for
independent & dependent tasks, Static versus dynamic scheduling, Optimization
techniques for scheduling
Topic-1: Scheduling
Nowadays, many companies offer services to customers on a pay-per-use basis, where each customer pays for the services obtained from the provider. The cloud environment provides such a platform by creating virtual machines that help users accomplish their jobs within a reasonable time and cost, without sacrificing the quality of the services. The huge growth in virtualization and cloud computing technologies reflects the increasing number of jobs that require the services of virtual machines.
Scheduling is the process of allocating jobs onto available resources over time. Such a process has to respect the constraints imposed by the jobs and by the cloud.
Equivalently, scheduling is the process of finding an efficient mapping of tasks to suitable resources so that execution completes while satisfying objective functions, such as minimizing the execution time specified by customers.
The need for scheduling arises because the efficiency of the scheduling algorithm directly affects system performance with respect to delivered QoS and resource utilization.
Task: Represents a computational unit to run on a node. A task is considered an indivisible schedulable unit. Tasks can be independent (loosely coupled), or there can be dependencies between them (as in cloud workflows).
Job: A job is a computational activity made up of several tasks that may require different processing capabilities and may have different resource requirements (CPU, number of nodes, memory, software libraries, etc.) and constraints, usually expressed in the job description. Each job may have various parameters, such as required data, desired completion time (often called the deadline), expected execution time, and job priority.
Resource: A resource is something that is required to carry out an operation, for
example: a processor for data processing, a data storage device, or a network link for
data transporting.
Cloud computing gives the illusion of infinite (virtual) resources, but in reality there is a finite amount of (physical) resources. To share those resources efficiently, we need to:
1. distinguish high-priority (serve-the-customer-now) requests from low-priority (batch) requests;
2. schedule accordingly.
Therefore, we should be able to plan computations ahead.
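The priority distinction above can be sketched with a priority queue. This is a minimal illustration (the request names and the two-level priority scheme are assumptions, not part of the notes): high-priority interactive requests are always served before low-priority batch requests.

```python
import heapq

HIGH, LOW = 0, 1  # lower number = served first

def serve_order(requests):
    """requests: list of (priority, name); returns the order of service."""
    heap = list(requests)
    heapq.heapify(heap)  # min-heap: smallest priority value on top
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order
```

With this sketch, `serve_order([(LOW, "batch1"), (HIGH, "web1")])` serves `web1` first even though it was submitted second.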
Job scheduling is one of the tasks performed to make cloud computing environments work more efficiently and to gain maximum profit.
The goal of scheduling algorithms in distributed systems is spreading the load on
processors and maximizing their utilization while minimizing the total task execution
time.
Job scheduling, one of the most famous optimization problems, plays a key role in building flexible and reliable systems.
The main purpose is to schedule jobs onto suitable resources at suitable times, which involves finding a proper sequence in which the jobs can be executed under transaction-logic constraints.
Benefits of scheduling:
To improve the quality of services in executing the jobs and provide the expected
output on time.
To maintain efficiency and fairness for all jobs.
Cloud Providers
Contribute (idle) resources for executing consumer jobs
Benefit by maximizing resource utilization
Trade off local requirements and market opportunity
Strategy: maximize return on investment
Scheduling Process
The scheduling process in the cloud can be generalized into three stages:
1. Resource discovering and filtering: the Datacenter Broker discovers the resources present in the network system and collects status information about them. Resource discovery can basically be described as the task in which the provider finds appropriate resources to comply with incoming consumers' requests.
Considering that one of the key features of Cloud Computing is the capability of
acquiring and releasing resources on-demand, resource monitoring should be
continuous.
2. Resource selection: the target resource is selected based on certain parameters of the task and the resource. This is the decision stage.
After acquiring information about the available resources in the Cloud (during the discovery phase), a set of appropriate candidates is highlighted. The resource selection mechanism elects the candidate solution that fulfils all requirements and optimizes the usage of the infrastructure. The resource selection may be done using an optimization algorithm.
Many optimization strategies may be used, from simple, well-known techniques to metaheuristic algorithms such as Genetic Algorithms, Ant Colony Optimization, and Particle Swarm Optimization (PSO) for Clouds.
3. Task submission: the task is submitted to the selected resource.
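The three stages can be sketched in code. This is a minimal illustration only; the function names, dictionary fields, and the "most free CPU" selection rule are assumptions for the example, not the notes' prescribed method.

```python
def discover(resources, task):
    """Stage 1: filter to resources that can run the task at all."""
    return [r for r in resources if r["free_cpu"] >= task["cpu"]]

def select(candidates):
    """Stage 2: elect the candidate that optimizes a criterion
    (here: most free CPU, a simple load-balancing choice)."""
    return max(candidates, key=lambda r: r["free_cpu"])

def submit(task, resource):
    """Stage 3: bind the task to the chosen resource."""
    resource["free_cpu"] -= task["cpu"]
    return (task["id"], resource["id"])
```

A broker would run `submit(task, select(discover(resources, task)))` for each incoming task, with monitoring keeping the resource information continuously up to date.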
Optimization Criterion
The optimization criterion is used when making scheduling decisions and represents the goals of the scheduling process. Typically, we want to increase resource usage or the number of successfully completed jobs, or to minimize the response time.
This criterion is expressed by the value of an objective function, which allows us to measure the quality of a computed solution and compare it with other solutions.
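As an illustration (the makespan criterion used here is one common choice, not the only one), an objective function lets two candidate schedules be compared numerically:

```python
def makespan(schedule):
    """Objective function: maximum machine finish time.
    schedule: dict machine -> list of task run times on that machine."""
    return max(sum(times) for times in schedule.values())
```

For example, `{"m1": [3, 2], "m2": [4]}` has makespan 5 while `{"m1": [3], "m2": [4, 2]}` has makespan 6, so under this criterion the first schedule is better.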
SLA Monitor
When a customer first submits a service request, the SLA Monitor interprets the request for QoS requirements before deciding whether to accept or reject it.
It is also responsible for monitoring the progress of the submitted job.
If any SLA violation is observed, it has to act immediately and take corrective action.
Task Scheduling
The input of task scheduling algorithms is normally an abstract model which defines
tasks without specifying the physical location of resources on which the tasks are
executed.
Reschedule: when a task cannot be completed due to a processor failure, a disk failure, or other problems, the uncompleted task can be rescheduled in the next computation.
Dynamic Scheduling:
It is more flexible than static scheduling: task allocation is done on the fly while the application executes.
Centralized Scheduling:
In the case of centralized scheduling, there is more control on resources: the
scheduler has knowledge of the system by monitoring of the resource state.
Advantages: ease of implementation, efficiency and more control and monitoring on
resources.
Disadvantages: Lacks scalability, fault tolerance and efficient performance
Hierarchical Scheduling
Allows one to coordinate different schedulers at a certain level.
Schedulers at the lowest level in the hierarchy have knowledge of the resources.
Disadvantages: still limited scalability and fault tolerance, although it scales better and is more fault tolerant than centralized scheduling.
Workflow scheduling
Tasks are dependent on each other.
Dependency means there is a precedence order among the tasks; that is, a task cannot start until all its parents are done.
Category-5: Batch-mode heuristic scheduling algorithms (BMHA) and online-mode heuristic algorithms
Metrics:
1. Makespan: the makespan represents the maximum finishing time among all received jobs. This parameter shows the quality of the job-to-resource assignment from the execution-time perspective.
2. Throughput: each job is assumed to have a hard deadline on its finishing time. The throughput is therefore the number of executed jobs that meet their deadlines, and it measures how efficiently the system satisfies them.
3. Total Cost: since the basic concept of Cloud computing is renting resources to customers, the total cost of executing the list of jobs is essential for evaluating system performance. The total cost is calculated from the processing time and the amount of data transferred.
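The three metrics can be sketched as follows. The job fields and the per-second and per-gigabyte prices are illustrative assumptions; real pricing models vary by provider.

```python
def metrics(jobs, price_per_sec=0.01, price_per_gb=0.05):
    """jobs: list of dicts with finish, deadline, proc_time, data_gb fields.
    Returns (makespan, throughput, total_cost)."""
    makespan = max(j["finish"] for j in jobs)               # latest finish time
    throughput = sum(1 for j in jobs if j["finish"] <= j["deadline"])
    cost = sum(j["proc_time"] * price_per_sec + j["data_gb"] * price_per_gb
               for j in jobs)                               # time + transfer cost
    return makespan, throughput, cost
```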
Min-Min Heuristic
Static Task Scheduling Algorithm
Tasks are scheduled based on minimum completion time
For each task determine its minimum completion time over all machines
Over all tasks find the minimum completion time
Assign the task to the machine that gives this completion time
Iterate till all the tasks are scheduled
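The steps above can be sketched directly. `etc[t][m]` is the expected time to compute task t on machine m (assumed given, as in standard heuristic-scheduling formulations); machine-ready times account for tasks already assigned.

```python
def min_min(etc, n_machines):
    ready = [0.0] * n_machines          # current ready time of each machine
    unscheduled = set(range(len(etc)))
    assignment = {}
    while unscheduled:
        # For each task, the machine giving its minimum completion time...
        best = {t: min(range(n_machines), key=lambda m: ready[m] + etc[t][m])
                for t in unscheduled}
        # ...then pick the task whose minimum completion time is smallest.
        t = min(unscheduled, key=lambda t: ready[best[t]] + etc[t][best[t]])
        m = best[t]
        ready[m] += etc[t][m]
        assignment[t] = m
        unscheduled.remove(t)
    return assignment, max(ready)       # mapping and resulting makespan
```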
Max-Min Heuristic
Static Task Scheduling Algorithm
Two Phases
o First phase: the set of minimum expected completion times is found for every task over all resources
o Second phase: the task with the maximum value within the above set is selected for execution
For each task determine its minimum completion time over all machines
Over all tasks find the maximum completion time
Assign the task to the machine that gives this completion time
Iterate till all the tasks are scheduled
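Max-Min differs from Min-Min in a single step: among the per-task minimum completion times, it selects the task with the maximum value, so large tasks are placed early. A sketch with the same `etc[t][m]` convention as above:

```python
def max_min(etc, n_machines):
    ready = [0.0] * n_machines          # current ready time of each machine
    unscheduled = set(range(len(etc)))
    assignment = {}
    while unscheduled:
        # Phase 1: for each task, its best (minimum-completion-time) machine.
        best = {t: min(range(n_machines), key=lambda m: ready[m] + etc[t][m])
                for t in unscheduled}
        # Phase 2: pick the task with the LARGEST minimum completion time.
        t = max(unscheduled, key=lambda t: ready[best[t]] + etc[t][best[t]])
        m = best[t]
        ready[m] += etc[t][m]
        assignment[t] = m
        unscheduled.remove(t)
    return assignment, max(ready)
```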
Sufferage Heuristic
For each task determine the difference between its minimum and second minimum
completion time over all machines (sufferage)
Over all tasks find the maximum sufferage
Assign that task to the machine that gives its minimum completion
time
Iterate till all the tasks are scheduled
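The sufferage idea, that the task which would "suffer" most from not getting its best machine goes first, can be sketched as follows (requires at least two machines, since the second-minimum must exist):

```python
def sufferage(etc, n_machines):
    ready = [0.0] * n_machines
    unscheduled = set(range(len(etc)))
    assignment = {}
    while unscheduled:
        best_t, best_suf, best_m = None, -1.0, None
        for t in unscheduled:
            cts = sorted(ready[m] + etc[t][m] for m in range(n_machines))
            suf = cts[1] - cts[0]   # sufferage: 2nd-best minus best completion
            if suf > best_suf:
                m = min(range(n_machines), key=lambda m: ready[m] + etc[t][m])
                best_t, best_suf, best_m = t, suf, m
        ready[best_m] += etc[best_t][best_m]   # give the max-sufferage task
        assignment[best_t] = best_m            # its best machine
        unscheduled.remove(best_t)
    return assignment, max(ready)
```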
Clustering Heuristics
Clustering is a mapping of the nodes of a DAG onto labeled clusters. A cluster
consists of a set of tasks; a task is an indivisible unit of computation. All tasks in a
cluster must execute in the same processor.
The clustering problem has been shown to be NP-complete for a general task graph
and for several cost functions.
The tasks are convex, i.e., once a task starts its execution it can run to completion without interruption.
Clustering is a mapping of the tasks of a DAG onto clusters. A cluster is a set of tasks
which will execute on the same processor. Clustering is also known as processor
assignment in the case of an unbounded number of processors.
A clustering is called nonlinear if two independent tasks are mapped to the same cluster; otherwise it is called linear.
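The linearity definition can be checked mechanically. In this sketch (representation is an assumption: the DAG is a dict mapping each task to its list of predecessors), two tasks are independent when neither is an ancestor of the other, and a clustering is linear iff no cluster contains an independent pair.

```python
def ancestors(dag, t):
    """All tasks with a precedence path to t. dag: task -> predecessors."""
    seen, stack = set(), list(dag.get(t, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(dag.get(p, []))
    return seen

def is_linear(dag, clusters):
    for cluster in clusters:
        for a in cluster:
            for b in cluster:
                if a != b and b not in ancestors(dag, a) \
                          and a not in ancestors(dag, b):
                    return False   # independent tasks share a cluster
    return True
```

For the DAG a -> b, a -> c, the clustering {a, b} is linear, while {b, c} is nonlinear because b and c are independent.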
Genetic Algorithms
Genetic algorithms generate solutions to optimization problems using techniques
inspired by natural evolution, such as mutation, selection, and crossover.
The fundamental unit of a GA is the chromosome.
Each chromosome represents a solution to the problem and is composed of genes.
The fitness value is the function or objective against which a chromosome is tested for its suitability to the problem at hand.
A typical genetic algorithm requires:
o a genetic representation of the solution domain,
o a fitness function to evaluate the solution domain
Steps:
Initialize a population of solutions,
Improve it through repetitive application of the mutation, crossover, selection
operators.
Algorithm
1. Generate an initial population.
2. Select pair of individuals based on the fitness function.
3. Produce next generation from the selected pairs by performing random changes on the
selected parents (by applying pre-selected genetic operators).
4. Test for the stopping criterion:
a. return the solutions or individuals if it is satisfied, or
b. go to step 2 if not.
Genetic algorithms are based on the principles that crossing two individuals can result in offspring better than both parents, and that a slight mutation of an individual can also generate a better individual.
The crossover takes two individuals of a population as input and generates two new individuals by crossing the parents' characteristics.
The offspring keep some of the characteristics of the parents.
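Applied to scheduling, the GA steps above might look like the following compact sketch. All parameters (population size, elitism of two, one-point crossover, 20% mutation rate) are illustrative assumptions; a chromosome is a task-to-machine assignment and fitness is the makespan to be minimized, with `etc[t][m]` the run time of task t on machine m.

```python
import random

def eval_makespan(chrom, etc, n_machines):
    """Fitness: makespan of the assignment encoded by the chromosome."""
    load = [0.0] * n_machines
    for t, m in enumerate(chrom):
        load[m] += etc[t][m]
    return max(load)

def ga_schedule(etc, n_machines, pop_size=30, generations=100, seed=1):
    rng = random.Random(seed)
    n = len(etc)
    # Step 1: random initial population of assignments.
    pop = [[rng.randrange(n_machines) for _ in range(n)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: eval_makespan(c, etc, n_machines))
        nxt = pop[:2]                        # elitism: keep the two best
        while len(nxt) < pop_size:
            p1, p2 = rng.sample(pop[:10], 2)  # step 2: select fit parents
            cut = rng.randrange(1, n)         # step 3: one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.2:            # ...and occasional mutation
                child[rng.randrange(n)] = rng.randrange(n_machines)
            nxt.append(child)
        pop = nxt                            # step 4: next generation
    best = min(pop, key=lambda c: eval_makespan(c, etc, n_machines))
    return best, eval_makespan(best, etc, n_machines)
```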
Server Consolidation
Effective approach to maximize resource utilization while minimizing energy
consumption.
Live VM migration is often used to consolidate VMs residing on multiple under-utilized
servers onto a single server, so that the remaining servers can be sent to an energy
saving state.
The problem here is the optimal consolidation of servers.
Dependencies among VMs (e.g., communication requirements) can also be considered.
The key challenge is the trade-off between energy savings and application performance.
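Consolidation is often modeled as bin packing. As one illustrative approach (first-fit decreasing, a standard bin-packing heuristic, not a method prescribed by these notes), VMs are sorted by load and each is placed on the first active server with enough spare capacity; a new server is powered on only when none fits.

```python
def consolidate(vm_loads, server_capacity):
    """vm_loads: dict vm -> CPU demand. Returns (placement, servers_used)."""
    servers = []      # remaining capacity of each active server
    placement = {}
    for vm, load in sorted(vm_loads.items(), key=lambda kv: -kv[1]):
        for i, free in enumerate(servers):
            if free >= load:              # first fit among active servers
                servers[i] -= load
                placement[vm] = i
                break
        else:                             # no fit: power on a new server
            servers.append(server_capacity - load)
            placement[vm] = len(servers) - 1
    return placement, len(servers)
```

Servers that end up hosting no VMs can then be switched to an energy-saving state; a real consolidator would also weigh migration costs and VM dependencies.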
Another direction is using voluntary resources (resources donated by end users), or a mixture of voluntary and dedicated resources, for hosting non-profit cloud applications such as scientific computing. The challenge here is managing heterogeneous resources and frequent churn events.
Yet another is building small data centers instead of big ones, as the former are cheaper to build and better geographically distributed, which is desirable for response-time-critical services.