
Thermal-Aware Scheduling in Green Data Centers

MUHAMMAD TAYYAB CHAUDHRY and TECK CHAW LING, University of Malaya


ATIF MANZOOR and SYED ASAD HUSSAIN, COMSATS Institute of Information Technology
JONGWON KIM, Gwangju Institute of Science and Technology

Data centers can go green by saving electricity in two major areas: computing and cooling. Servers in data
centers require a constant supply of cold air from on-site cooling mechanisms for reliability. An increased
computational load makes servers dissipate more power as heat and eventually amplifies the cooling load. In
thermal-aware scheduling, computations are scheduled with the objective of reducing the data-center-wide
thermal gradient, hotspots, and cooling magnitude. Complemented by heat modeling and thermal-aware
monitoring and profiling, this scheduling is energy efficient and economical. A survey is presented henceforth
of thermal-aware scheduling and associated techniques for green data centers.
Categories and Subject Descriptors: A.1 [General Literature]: Introductory and Survey; C.5.5 [Computer
System Implementation]: Servers; C.5.3 [Computer System Implementation]: Microcomputers-Microprocessors
General Terms: Management, Economics, Performance, Reliability, Algorithms
Additional Key Words and Phrases: Thermal-aware scheduling, thermal monitoring, thermal profiling, green
data centers, thermal-aware scheduling for data center servers and microprocessors
ACM Reference Format:
Muhammad Tayyab Chaudhry, Teck Chaw Ling, Atif Manzoor, Syed Asad Hussain, and Jongwon Kim. 2015.
Thermal-aware scheduling in green data centers. ACM Comput. Surv. 47, 3, Article 39 (February 2015), 48
pages.
DOI: http://dx.doi.org/10.1145/2678278

1. INTRODUCTION

A large amount of electricity is generated worldwide through the burning of fossil
fuels. This leads to an increase in carbon emissions and other greenhouse gases in the
environment and contributes to global warming. Data centers worldwide were projected
to have consumed between 203 and 271 billion kilowatt hours of electricity in the year
2010 [Koomey 2011]. According to Greenpeace [2011], unless steps are taken to save
energy and go green, global data centers' share of carbon emissions is estimated to rise
from 307 million tons in 2007 to 358 million tons in 2020. Koomey [2011] showed that
This research is supported by research grant no. RU018-2013 from the University of Malaya, Malaysia.
Authors' addresses: M. T. Chaudhry (corresponding author) and T. C. Ling, Department of Computer
System and Technology, Faculty of Computer Science and Information Technology, University of Malaya,
Kuala Lumpur 50603, Malaysia; emails: mtayyabch@yahoo.com, tchaw@um.edu.my; A. Manzoor and S. A.
Hussain, Department of Computer Science, COMSATS Institute of Information Technology, Lahore 54000,
Pakistan; emails: {atif.manzoor, asadhussain}@ciitlahore.edu.pk; J. Kim, School of Information and Communication, Gwangju Institute of Science and Technology (GIST), Gwangju 500-712, South Korea; email:
jongwon@nm.gist.ac.kr. The corresponding author (Muhammad Tayyab Chaudhry) is now attached to the
Department of Computer Science, COMSATS Institute of IT, Lahore 54000, Pakistan.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted
without fee provided that copies are not made or distributed for profit or commercial advantage and that
copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for
components of this work owned by others than ACM must be honored. Abstracting with credit is permitted.
To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this
work in other works requires prior specific permission and/or a fee. Permissions may be requested from
Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212)
869-0481, or permissions@acm.org.
© 2015 ACM 0360-0300/2015/02-ART39 $15.00

DOI: http://dx.doi.org/10.1145/2678278

the electricity is spent mainly on cooling and computing. Hence, the mechanisms in
these two areas have turned into important focal points for energy savings.
The computing mechanisms consist of all servers arranged in racks and chassis
inside the data center. The computing load includes processing jobs and IT tasks. The
cooling mechanism consists of all equipment installed and arrangements in a data
center to remove heat. Temperature mismanagement is the major cause of equipment
failure [ARS-Techniqa 2008; EE-Times 2008]. Servers consume electricity, which is
eventually converted to heat, in proportion to the allocated computing loads. The cooling
mechanism then works to maintain the computing servers at the vendor-specified
temperature for normal and reliable performance.
The main component within a server that consumes the most electricity and generates the most heat is the microprocessor [Kong et al. 2012]. The high heat generated by
the high-performance-oriented microprocessor architecture is beyond the limit of air
cooling [SIA 2009]. In addition, the servers in a data center are arranged in a compact
manner [Artman et al. 2002]. These factors necessitate the use of cooling mechanisms
and arrangements to dissipate the heat properly and efficiently to avoid overheating.
Data centers have been spending an equal amount of electricity on cooling and computing for a long time [Koomey 2008, 2011]. Any approach to reduce electricity usage
will be more successful by minimizing the load on the cooling mechanism used to cool
the computing mechanism. Thermal-aware scheduling is a computational workload
scheduling based on heat. Thermal-aware schedulers make use of different thermal-aware techniques. One technique is heat modeling, which is a conceptual modeling of the link between the power consumed by servers and the resulting heat. The second
technique is thermal-aware monitoring and profiling. Thermal-aware monitoring acts
as the thermal eye for the scheduling process and consists of techniques to record and
evaluate the heat distribution in data centers. Thermal profiling is keeping a record of
the characteristics of heat emission from servers, microprocessors, and computational
workload. It is used to predict the heat distribution in a data center.
Our survey covers all aspects of thermal-aware scheduling related to computational workload scheduling. This article assumes that the cooling mechanism provides cold
air within vendor specifications regardless of any mechanical optimizations. Also, the application of thermal-aware scheduling in data centers using renewable energy and/or
free cooling will lower the load on the cooling mechanism and make the data centers
greener than green. Our article covers heat modeling, thermal monitoring and profiling, and thermal-aware scheduling in data centers and at the microprocessor level.
This is the first survey of its kind that combines the data center and microprocessor
thermal-aware scheduling and related techniques. In short, this survey provides the
following:
- Identifies various techniques of thermal awareness, specifically for thermal-aware scheduling, heat modeling, and thermal monitoring and profiling.
- Subclassifies each of the thermal-aware techniques in the scope of data centers and microprocessors.
- Shows that the thermal-aware techniques can be applied at the global (data-center-wide) and local (microprocessor) levels within the boundary of a data center. The goal at both levels is to lower heat generation and minimize hotspots and thermal gradients.
- Provides a comprehensive discussion and comparison of different thermal-aware techniques for scheduling, heat modeling, and monitoring and profiling at the data center and microprocessor levels.
The rest of the survey is arranged as follows. Section 2 describes the basic concepts and related work. Section 3 is about heat modeling. Section 4 is related to thermal-aware
monitoring and profiling, and Section 5 deals with thermal-aware scheduling. The
article ends with the conclusion in Section 6.
2. BACKGROUND AND RELATED WORK

Power Usage Effectiveness (PUE) is an IT-industry-recognized standard for grading a data center's energy efficiency. The PUE value is calculated by dividing the total energy usage of the data center by the energy consumed by the computing mechanism [ENERGY-STAR 2010]. The total energy consumed in a data center is the total electricity used for computing and cooling. A data center with a PUE value close to 1 spends far more energy on computing than on cooling. A PUE value equal to or close to 1 is theoretically possible only by spending (near) zero energy on cooling. The near-zero cooling
energy is possible by using free environmental cold-air-, water-, and evaporation-based
cooling economizers such as in the Facebook data center [Waugh 2011]. However, this
is not possible for all data centers due to reasons such as lack of favorable environmental conditions and reliability issues [Kaiser et al. 2011]. Large data centers such
as Google [2012] use mechanical chillers and use a lot of energy for cooling. In practice, the PUE can be lowered by using thermal-aware scheduling with or without the
economizers. Also, the free cooling solutions are not costless; for example, the air has
to be mechanically sucked from outside and filtered to remove the dust and pollutants
before use [Rongliang et al. 2012]. Air economizers are highly dependent on a favorable environmental temperature, which should be low enough so as not to violate the
vendor's specifications on the servers' air inlet temperature [Frachtenberg et al. 2012].
In addition, water-based economizers using lake or sea water require the data center to be located near such water bodies. These constraints limit the wide application of economizers. Hence, data centers continue to rely on a traditional cooling setup that consumes power and is expensive.
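As a simple illustration of the PUE calculation described above, the following sketch computes a PUE from hypothetical annual energy figures; the numbers are invented for illustration only.

```python
# Hypothetical annual energy figures (kWh); not measurements from any real facility.
computing_energy = 1_000_000   # electricity consumed by servers (computing mechanism)
cooling_energy = 600_000       # electricity consumed by the cooling mechanism

total_energy = computing_energy + cooling_energy
pue = total_energy / computing_energy   # 1.6 here; 1.0 would mean zero cooling overhead
print(f"PUE = {pue:.2f}")
```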
Every data center owner aims to achieve a low Total Cost of Ownership (TCO)
[Koomey 2008], which includes the lowest infrastructure investments and cost of running the data center. The lower the PUE and TCO values, the less cost and less
electricity consumption there are. In order to be sustainable and green, data centers
should reduce the loadings on the cooling mechanism, even for free cooling. To lower
the PUE and hence the TCO, the EPA [2007] proposed some best practices that include improving data center infrastructure by installing energy-efficient hardware and
virtualization.
Virtualization is a software-based technology that allows multiple operating system
instances called virtual machines (VMs) to share the same hardware [VMware 2006].
The use of VMs reduces the number of active physical machines and thus saves energy.
Cloud computing, the new paradigm of renting computing resources as a service, uses
virtualization as an essential component [Mell 2011]. The lowering of worldwide data
center PUE values in 2010 is due to virtualization, cloud computing, and reduction
in the expansion of data centers around the globe [Koomey 2011]. To address the
thermal-related equipment reliability problems in data centers, the American Society
of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) has provided
recommendations on the environmental conditions for data centers. A separate class
has been dedicated to data centers, whose recommended range of inlet temperature
has been raised to 45°C [ASHRAE-TC-9.9 2011]. However, current equipment does not
meet this new temperature class [TechTarget 2011]. The currently available equipment
is only compatible at a much lower temperature range.
Racks inside data centers such as in the Google data center [Google 2012] are arranged in a cold aisle/hot aisle arrangement. The Computer Room Air Conditioning
(CRAC) unit is responsible for supplying cold air through raised floors inside data centers. Cold air is blown through hollow floors and then through perforated tiles in front
of each rack. The cold air enters the servers from the front air inlets. The air temperature at the inlets determines the amount of heat that can be removed from the servers.
The servers blow out hot air from the air outlets located at the rear of the servers. The
hot air from the servers of each rack is exhausted out into the data center. The hot air
is removed or cooled down by the cooling mechanism consisting of the CRAC unit, air
ducts, and chillers. It is possible for the hot air to get mixed with cold air near the top
of the racks. This will increase the cold air temperature.
The efficiency of a data centers cooling system is quantified by the Coefficient of
Performance (COP) [Moore et al. 2005]. This is the ratio of the amount of heat removed
to the amount of work done to remove it. In simple words, the COP is the measure
of power used to provide cold air at a certain temperature and time [Banerjee et al.
2011]. It is used to calculate the cooling energy consumption with reference to energy
used by the computing equipment. By assuming the energy consumed for computing
is transformed into heat, dividing the computing power by the COP value yields
the amount of energy needed for cooling with reference to computing energy. The COP
curve of a water-based CRAC chiller by Hewlett Packard Labs is often cited as a
reference [Moore et al. 2005]. The COP value varies with the desired air temperature
at the servers air inlets. The higher the allowable cool air temperature is, the higher
is the COP and the less energy is required for cooling.
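To make the role of the COP concrete, the sketch below estimates the cooling power required to remove the heat produced by a given computing load. The quadratic COP curve is the form commonly quoted for the water-chilled CRAC unit characterized by HP Labs [Moore et al. 2005]; treat the coefficients as illustrative rather than authoritative.

```python
def cop(supply_temp_c: float) -> float:
    """Coefficient of Performance as a function of CRAC supply air temperature (deg C).
    Quadratic fit commonly attributed to the HP Labs water-chilled CRAC unit."""
    return 0.0068 * supply_temp_c ** 2 + 0.0008 * supply_temp_c + 0.458

def cooling_power_w(computing_power_w: float, supply_temp_c: float) -> float:
    """Assuming all computing power turns into heat, the power spent on cooling
    is the computing power divided by the COP at the chosen supply temperature."""
    return computing_power_w / cop(supply_temp_c)

# A 100 kW computing load needs roughly 50 kW of cooling power at a 15 C supply
# temperature, but only about 21 kW at 25 C: a higher allowable supply temperature
# raises the COP and lowers the cooling energy.
print(cooling_power_w(100_000, 15.0))
print(cooling_power_w(100_000, 25.0))
```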
Thermal-aware scheduling has attracted the interest of researchers, and a few surveys are available in the literature. The surveys most closely related to ours cover thermal-aware scheduling for microprocessors [Kong et al. 2012; Zhuravlev et al. 2012]; at the data center level, Parolini et al. [2012] gave a short overview of thermal-aware scheduling for data centers.
In Zhuravlev et al. [2012], the thermal-aware scheduling types were identified
as reactive, proactive, and mixed. However, there was no mention of heat modeling or thermal monitoring and profiling. Kong et al. [2012] covered the concepts of
thermal-aware profiling, thermal-aware monitoring, and thermal-aware scheduling.
The thermal-aware techniques were linked with the minimization of heat production,
heat convection to adjacent cores, task migrations, the thermal gradient across the microprocessor chip, and power consumption in microprocessors. The survey covered microprocessor dynamic thermal management (DTM) techniques using Dynamic Voltage
and Frequency Scaling (DVFS), clock gating and task migration, and Operating System (OS)-based DTM and scheduling, but thermal-aware workload scheduling across
multiple microprocessors located on different computers or devices was not covered.
Our thermal-aware scheduling survey encompasses both the microprocessor and data
center environments, thus providing the thermal-aware scheduling link between the
microprocessor level and data center level. In other words, it provides a common platform whereby thermal-aware scheduling algorithms at the microprocessor level can be
consulted when designing data-center-level thermal-aware scheduling algorithms.
Parolini et al. [2012] proposed a heat model and gave a brief overview of power and thermal
efficiency that spans from microprocessors to data centers. This approach emphasized
heat modeling for nonvirtualized data centers. Our survey covers the thermal-aware
scheduling for microprocessors, servers, and data centers for virtualized and nonvirtualized environments. In addition, it presents the commonalities between local and
global thermal-aware scheduling and conceptualizes their coexistence. The local or micro level refers to microprocessors and the global or macro level covers data centers.
This survey provides the fundamentals to merge the thermal-aware scheduling techniques at the local and global levels into a single hierarchical thermal-aware scheduling
approach. The results from this survey are helpful in saving energy when designing
green data centers. Thermal-aware scheduling for the cloud computing environment is
also discussed.
Thermal-aware scheduling in data centers may also cover CRAC unit integration with heat recirculation considerations to maximize power savings; this is beyond the scope of this article. Other aspects that are beyond the scope of this survey are:
- The cooling optimization of the CRAC mechanical setting by altering the compressor cycles and fan blow rate to match the cooling requirements of servers [Banerjee et al. 2011; Lee 2010]
- Economizers and various free cooling techniques aimed at optimizing cooling cost [Rongliang et al. 2012]
- Research proposals aimed at the use of renewable energy in data centers for energy efficiency [Baikie and Hosman 2011] and those which are using economizers with renewable energy sources [Arlitt et al. 2012; Liu et al. 2012]
Our survey incorporates thermal-aware scheduling, heat modeling, and thermal-aware
monitoring and profiling. It combines the thermal-aware scheduling for microprocessors and data centers in multiple classes. It is identified that there can be three types
of thermal-aware scheduling, namely, basic, recirculation aware, and optimized. Each
of these scheduling types contains reactive and proactive approaches. Thermal-aware
data center scheduling can be architected by picking the right classes from microprocessor scheduling and data center scheduling according to their pros and cons provided
in our survey. Therefore, the thermal-aware scheduling at the data center level can be
fine-tuned to increase energy savings.
3. HEAT MODELING

A heat model links the power consumption by the computing devices and the eventual
heat dissipation. A heat model may be based on the laws of thermodynamics, an RC circuit analogy, and so forth. The type of heat model defines the scope and type of environmental variables to be monitored, for example, power, computational workload,
air, and ambient temperature. The environmental variables embodied within the heat
model determine the comprehensiveness and performance of the model. The heat model
informs computational workload scheduling decisions through the monitoring of chosen
parameters. Monitoring is further discussed in Section 4. Depending on the choice of
variables used by the heat model, the thermal-aware scheduler may be biased toward
some servers and microprocessors. Hence, the heat model affects the overall thermal
efficiency and therefore the power efficiency.
3.1. Thermodynamics Model

The law of thermodynamics may be applied to a heat model that measures the amount
of heat exchange based on air flow. Thermodynamics is a branch of natural sciences
that concerns heat and its transformation into other forms of energy. Computer scientists have used the laws of thermodynamics to correlate the electricity consumption
of computing equipment and the resultant heat produced [Banerjee et al. 2010, 2011;
Qinghui et al. 2008; Tang et al. 2006, 2007]. The principles of thermodynamics are
well suited for understanding the heat exchanges within the data center. The power
consumption, heat emission, and heat removal through the passing of cold air over the
servers can be visualized as a thermodynamic process of heat generation and removal.
Researchers have used data-center-wide thermal-aware scheduling to manipulate the
thermodynamic heat flow in order to lower the cooling load. Tang et al. [2006] used the
law of energy conservation to quantify the heat. This means that the power consumed
by each server i is transformed into heat as shown in the following equation:
P_i = Q_i = ρ f_i C_p (T_out^i − T_in^i),    (1)
Table I. Thermodynamic Symbols and Definitions
Thermodynamic Symbol    Definition
P_i                     Power consumed by server i
Q_i                     Heat flow rate at server i (in watts)
ρ                       Density of air (typically 1.19 kg/m^3)
f_i                     Air flow rate inside server i (at 520 CFM or 0.2454 m^3/s)
C_p                     Specific heat of air (normally 1,005 J kg^-1 K^-1)
T_out^i                 Outlet air temperature of server i
T_in^i                  Inlet air temperature of server i
Q_out^i                 Total heat flowing out from server i
Q_in^i                  Heat flowing inside server i through the air inlet
T_sup^i                 Supplied air temperature set at the CRAC and reaching server i
h_ij^e                  Heat generated in enclosure e located at column i and row j of the server arrangement inside the data center
P_ij^{e,x}              Power used by component x in enclosure e located at column i and row j
α_x                     Phenomenal power dissipation from component x; the components considered are the CPU, memory, NIC, and I/O devices
q_ij^e                  Heat extracted from enclosure e located at column i and row j
m_ij,in^e               Mass of cold air entering enclosure e located at column i and row j
T_ij,out^e              Outlet air temperature of enclosure e located at column i and row j
T_ij,in^e               Inlet air temperature of enclosure e located at column i and row j

where P_i and Q_i are the current power usage and the resultant heat flow from server i,
respectively. Table I explains the symbols used in this section.
The difference in temperatures between the air inlet and outlet quantifies the amount
of heat extracted from the server. In order to have a uniform temperature at the air
outlets of all servers, just the right amount of power budget was allocated to each server. Equation (1) was used to calculate the power budget by keeping T_out^i at a constant value and taking T_in^i at its current value. Therefore, for a given air inlet temperature, the
maximum power budget that can be allocated to each server can be obtained. However,
the heat recirculation, which could affect the cold air temperature, was not accounted
for in deriving the power budget value.
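A minimal sketch of this power budgeting idea follows, using Equation (1) to compute the largest power budget each server can receive while keeping a common target outlet temperature. The air parameters are the typical values from Table I, the inlet temperatures are placeholders, and heat recirculation is ignored, as in the original formulation.

```python
RHO = 1.19        # density of air (kg/m^3), typical value from Table I
F_I = 0.2454      # air flow rate through a server (m^3/s), typical value from Table I
C_P = 1005.0      # specific heat of air (J/(kg*K))

def power_budget_w(t_in_c: float, t_out_target_c: float) -> float:
    """Maximum power (W) a server may dissipate so that its outlet air does not
    exceed the target temperature, per Equation (1): P = rho * f * Cp * (Tout - Tin)."""
    return max(0.0, RHO * F_I * C_P * (t_out_target_c - t_in_c))

# Servers with cooler inlets receive a larger budget; hypothetical inlet readings:
inlet_temps = {"server-1": 18.0, "server-2": 22.0, "server-3": 25.0}
budgets = {s: power_budget_w(t, t_out_target_c=35.0) for s, t in inlet_temps.items()}
print(budgets)
```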
Qinghui et al. [2006] used the concept that the heat Q_out^i given out by each node is equal to the sum of the heat Q_in^i entering the server through the incoming cold air and the power P_i used by node i, as shown in the following equations:

Q_out^i = Q_in^i + P_i = ρ f_i C_p T_out^i,    (2)

Q_in^i = ρ f_i C_p T_in^i.    (3)

Similarly,

Q_sup^i = ρ f_i C_p T_sup^i,    (4)

where Q_sup^i is the heat contained in the cold air supplied at temperature T_sup^i. The value of Q_in^i is not less than Q_sup^i; rather, it may be higher if heat recirculation occurs. The
air temperature was measured, ρ f_i C_p was kept constant, and the corresponding values
of heat were calculated using Equations (2) through (4).
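For illustration, the following sketch applies Equations (2) and (3) in the forward direction, predicting a node's outlet air temperature from its inlet temperature and power draw under the constant-air-flow assumption criticized in the next paragraph; all numbers are placeholders.

```python
RHO, F_I, C_P = 1.19, 0.2454, 1005.0   # typical air parameters from Table I

def outlet_temp_c(inlet_temp_c: float, power_w: float) -> float:
    """Predicted outlet air temperature: Q_out = Q_in + P with Q = rho * f * Cp * T,
    which rearranges to T_out = T_in + P / (rho * f * Cp)."""
    return inlet_temp_c + power_w / (RHO * F_I * C_P)

print(outlet_temp_c(inlet_temp_c=20.0, power_w=300.0))   # ~21 C for a lightly loaded node
print(outlet_temp_c(inlet_temp_c=20.0, power_w=3000.0))  # ~30 C for a heavily loaded node
```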
Using the law of energy conservation, the heat dissipated into the environment can
be calculated by Equation (2). The difference between air inlet and outlet temperatures
could be used to calculate the total amount of heat generated at any instant and not the
heat dissipated. This is because the law of energy conservation notes that the power
used by the servers is converted entirely into heat, whereas Equation (2) considers that
all the heat generated is dissipated into the environment. This is impractical if f_i is kept constant. Considering that f_i depends on the server air outlet fan blow rate, the rate of fan blowing can vary with the rise in the internal temperature of the server. The current temperature of the node, which reflects the difference between the heat generated and the heat dissipated, was not considered or calculated at all. The server casing
and the server enclosure may fall victim to accumulated heat due to the constant blow
rate of the server outlet fan. On the other hand, if the fan blow rate is set to adaptive
mode, Equations (2) through (4) cannot be applied as used by Qinghui et al. [2006].
Also, the rise in inlet temperature lowers the cooling capacity of the air [Rodero et al.
2012] and thus, Equation (2) cannot hold because of the residual heat left in the server
enclosure.
Lee et al. [2010] proposed to maintain a balance between heat generation and heat
extraction processes in the server enclosures. The heat generated h_ij^e in an enclosure
e located at column i and row j of the server arrangement was represented by the
following equation:
h_ij^e = P_ij^{e,cpu} · α_cpu + P_ij^{e,IO} · α_IO + P_ij^{e,mem,stg} · α_mem,stg + P_ij^{e,NIC} · α_NIC,    (5)

where h_ij^e is the sum of the power leakage from the CPU, IO, memory, and NIC, which use P amount of power. The extracted heat q_ij^e from enclosure e in column i and row j of the server arrangement was represented by using the thermodynamic model as in the following equation:

q_ij^e = m_ij,in^e · C_p · (T_ij,out^e − T_ij,in^e).    (6)
The difference between h_ij^e and q_ij^e is represented by δ_ij^e. A positive value of δ_ij^e indicates that the data center is getting hot. This is, however, confusing when the value of δ_ij^e for some enclosures is positive and for some overcooled enclosures is negative. If the positive and negative values of δ_ij^e across the data center sum up to zero, then it does not necessarily mean that all the enclosures are properly cooled. Particularly when overcooling results in coldspots in some regions of the data center [Junmei et al. 2013], the value of δ_ij^e can be negative enough to nullify the positive value of δ_ij^e at a hotspot in some other region. Thus, the thermodynamics model may miss some or all
of the hotspots across the data center. But the motivation to use the thermodynamics
model is that it is based on the data center environmental temperatures such as the
inlet and outlet temperatures of the servers. This is a strong ground on which to base
the thermal-aware scheduling unless the actual temperature of the nodes is considered.
In this case, an RC thermal-modeling-based approach is more suitable, which is based
on the temperature of a computing node and not the environmental temperatures as
covered in the next subsection.
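The following sketch illustrates the balance check behind Equations (5) and (6) and why a data-center-wide sum of the per-enclosure differences can hide hotspots. The component power values, α coefficients, and temperatures are invented for illustration and are not taken from Lee et al. [2010].

```python
C_P = 1005.0   # specific heat of air (J/(kg*K))

def heat_generated_w(power_w: dict, alpha: dict) -> float:
    """h = sum over components of P_component * alpha_component, as in Equation (5)."""
    return sum(power_w[c] * alpha[c] for c in power_w)

def heat_extracted_w(mass_flow_kg_s: float, t_out_c: float, t_in_c: float) -> float:
    """q = m_in * Cp * (T_out - T_in), as in Equation (6)."""
    return mass_flow_kg_s * C_P * (t_out_c - t_in_c)

alpha = {"cpu": 1.0, "io": 1.0, "mem": 1.0, "nic": 1.0}   # placeholder dissipation fractions

# Two hypothetical enclosures: one overheating (hotspot), one overcooled.
hot_delta = (heat_generated_w({"cpu": 900, "io": 60, "mem": 120, "nic": 20}, alpha)
             - heat_extracted_w(0.3, 32.0, 30.0))    # positive: enclosure is heating up
cold_delta = (heat_generated_w({"cpu": 150, "io": 20, "mem": 40, "nic": 10}, alpha)
              - heat_extracted_w(0.35, 20.0, 18.0))  # negative: enclosure is overcooled

print(hot_delta, cold_delta, hot_delta + cold_delta)  # the sum is near zero despite a hotspot
```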
The thermodynamics model relies on the variables related to air as shown in
Equations (1) to (6). Some of these values, such as density of air and specific heat
vary with the air temperature [Mukherjee et al. 2007; Greitzer et al. 2008]. Similarly, the adaptive fan control results in different air flow rates (and hence, variable
volume and mass of air) at different temperatures. If only the typical values of these
thermodynamic variables are considered, then this can affect the server grading and
server preference for a thermal-aware workload scheduler. Two heterogeneous servers
may have the same rank at the same utilization levels. But in reality, one of the servers
may dissipate more heat after the workload is placed on that server. An alternative heat model, such as an RC-based model, may be free of these errors.
3.2. RC Model

This modeling technique considers the relationship between heat transfer to the ambient environment and electrical phenomena of the RC circuit. Zhang and Chatha [2007]
showed this association as per Equation (7):
RC (dT/dt) + T − RP = T_amb,    (7)

where P is the power consumed by the processor at time t, T is the die temperature, R is the thermal resistance, and C is the thermal capacitance of the system. T_amb represents the ambient temperature, and dT/dt is the rate of change of the die temperature T over time t. Over time, the temperature approaches a steady state of RP + T_amb.
The R and C are analogous to electrical resistance and capacitance, respectively
[Gonzales and Wang 2008; AMD 1995; Freescale 2008]. The values of R and C for conduction and convection depend on the power consumed by the CMOS processor and
the temperature difference between two surfaces [AMD 1995; Gonzales and Wang 2008;
Freescale 2008]. But C is also dependent on the area of the die [Freescale 2008]. The
product RC is a time constant and does not change once the processor package is manufactured [Zhang and Chatha 2007; Sofia 1995; Viswanath et al. 2000]. Researchers
have often used simulators such as HotSpot [Wei 2008] for thermal modeling of electric circuits and evaluation of various CMOS thermal parameters [Sankaranarayanan
2009].
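A minimal sketch of the RC model in Equation (7) follows, integrating the die temperature over time with a simple forward-Euler step; the R, C, and power values are placeholders, not vendor data.

```python
def simulate_die_temp(power_w, r_k_per_w=0.5, c_j_per_k=40.0, t_amb_c=25.0, dt_s=1.0):
    """Integrate RC * dT/dt + T - R*P = T_amb, i.e. dT/dt = (T_amb + R*P - T) / (R*C).
    `power_w` is a sequence of per-step power samples; returns the temperature trace."""
    temps = [t_amb_c]                       # start at ambient temperature
    for p in power_w:
        t = temps[-1]
        dT = (t_amb_c + r_k_per_w * p - t) / (r_k_per_w * c_j_per_k) * dt_s
        temps.append(t + dT)
    return temps

# A constant 80 W load drives the die toward the steady state R*P + T_amb = 65 C.
trace = simulate_die_temp([80.0] * 120)
print(round(trace[-1], 1))
```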
Lizhe et al. [2009], Wang et al. [2009], and their extended work [Wang et al. 2012]
used the RC model to calculate the current temperature of a computing node. This
calculation requires both the thermal profile and the power consumption profile of the
job that is being executed or going to be executed on that node. Their model requires
the thermal signature of each job at hand. Thermal signature is the amount of heat
generated by executing the job. When combined with the current node temperature,
it will give the gross heat to be produced by that node during execution of that job.
In that duration, the node will consume power P and dissipate heat to the ambient
environment. This implies that with the knowledge of estimated gross heat at the time
of job scheduling, the heat dissipated by any of the scheduled jobs can be subtracted
from the estimated gross heat to find the current temperature of the node at any
time. This knowledge is then used to arrange the servers in ascending order of their current temperatures and to determine the thermal signatures of the running jobs so that the waiting jobs can be allocated without violating thermal thresholds. The workload
placement algorithm uses the calculation of risk of violating the temperature threshold
before allocating workload to each node. Lizhe et al. [2009] have used this approach
for backfilling scheduling of jobs. A shortcoming of this approach is the need for a
power-thermal profile of the job at hand. Treating the thermal capacitance and thermal resistance of the node as a whole, rather than of its individual hardware components, continues the researchers' view of a computing node as a single thermal unit rather than a device that merely consumes power and generates heat.
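The placement logic described above can be sketched roughly as follows. This is not the authors' exact algorithm, only an illustration of ranking nodes by current temperature and rejecting placements whose estimated gross heat would violate a thermal threshold; the temperatures, thermal signatures, and threshold are assumed values.

```python
def pick_node(node_temps_c: dict, job_signature_c: float, threshold_c: float):
    """Return the coolest node that can run the job without its estimated
    temperature (current temperature + job thermal signature) crossing the threshold."""
    for node, temp in sorted(node_temps_c.items(), key=lambda kv: kv[1]):
        if temp + job_signature_c <= threshold_c:
            return node
    return None   # no node can take the job safely; keep it waiting (e.g., for backfilling)

nodes = {"n1": 58.0, "n2": 49.0, "n3": 63.0}   # hypothetical current node temperatures
print(pick_node(nodes, job_signature_c=12.0, threshold_c=70.0))   # -> "n2"
```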
The previously mentioned approaches [Lizhe et al. 2009; Wang et al. 2009, 2012] consider the environmental temperature collected through thermal sensors. These temperature readings were used to calculate the temperature of the computing node. However,
as noted by Rodero et al. [2012], the heat is transferred to servers through convection
by air and conduction by neighboring servers, and these are hard to model. Also, these
approaches require homogenous servers with the same values of R and C. However,
considering the 15-year life span of a data center and the 3-year life span of servers by
provided by Koomey et al. [2007], a data center can have several types of servers over
its operational life. Even if all the servers are considered to be homogenous, the calculation method for the values of R and C for a computing node is not defined in the related
papers. In their extended work, Wang et al. [2012] have used the microprocessor-based
thermal-aware scheduling principle of scheduling hot jobs before cold jobs [Jun et al.
2008], so it can be assumed that the values of R and C of the microprocessors were
used in their data center simulations.
Although the equipment in a data center can be relocated [Mukherjee et al. 2009;
Parolini et al. 2012], it is not possible to change the layout of a microprocessor once it
is created. Inside a server, the microprocessor consumes the most power and therefore
generates the most heat. The physical build and thermal considerations at design time
of the microprocessor affect the power consumption and the resultant heat dissipation during the utilization of microprocessors [Gonzales and Wang 2008]. The power
consumption and the heat dissipation that are attributable to the microprocessor's design are not covered in this survey. Microprocessors consume dynamic power P_Active in accordance with the activity factor A, processor capacitance c, supplied voltage v, and clock frequency f, as shown in Equation (8):

P_Active = A c v^2 f,    (8)

where the power dissipated per cycle is c v^2 f. The activity factor A represents the fraction of the circuit that is switching [Mathew 2004]. The value of A lies in the range from 0 to 1 and can be controlled through clock gating [Tiwari et al. 1998]. The processor also consumes a static power P_S even when idle. Therefore, the total power consumption can be given as P_Total = P_Active + P_S.
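A small sketch of Equation (8) follows, showing how dynamic voltage and frequency scaling reduces the dynamic power; the activity factor, capacitance, voltage, frequency, and static power values are placeholders, not figures for any real processor.

```python
def total_power_w(a: float, c_farads: float, v_volts: float, f_hz: float,
                  p_static_w: float) -> float:
    """P_total = P_active + P_static, with P_active = A * c * v^2 * f (Equation (8))."""
    return a * c_farads * v_volts ** 2 * f_hz + p_static_w

# Hypothetical processor: A = 0.25, effective switched capacitance 100 nF, 10 W static power.
full_speed = total_power_w(0.25, 100e-9, 1.2, 3.0e9, 10.0)   # 1.2 V at 3.0 GHz
scaled     = total_power_w(0.25, 100e-9, 1.0, 2.0e9, 10.0)   # DVFS: 1.0 V at 2.0 GHz

print(round(full_speed, 1), round(scaled, 1))   # ~118 W versus ~60 W
```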
The capacitance of a processor can be calculated through simple calculation
[StackExchange 2013] based on the fact sheets and information provided by the processor vendors [Intel 2014]. Similarly, the current frequency of a processor affects the
power consumption of the processor and heat generation [StackExchange 2013]. With
the increasing complexity of microprocessor design and increase in power usage and
heat dissipation in multicore processors, the DTM techniques are often applied to lower
the temperature of microprocessors. These DTM techniques include but are not limited
to DVFS, clock gating (halting the microprocessor clock), controlling Instruction Level
Parallelism (ILP), and task migration. This survey considers that power consumed and
the resultant heat generated in microprocessors can be modeled by using Equation (8).
One specification used by Intel Corporation to represent power dissipation from
Intel-based components is Thermal Design Power (TDP) [Gonzales and Wang 2008].
TDP is the realistic maximum thermal dissipation from a component such as a microprocessor [Huck 2011]. The TDP specification is used to design cooling solutions for
microprocessors. However, Advanced Micro Devices (AMD) introduced a new standard
named Average CPU Power (ACP) [AMD 2009; Huck 2011] for representing a real-world power dissipation metric that is lower than TDP. AMD recommends ACP over
TDP for planning data center infrastructure needs [AMD 2009]. This is because TDP
may overestimate the thermal phenomena, which might not occur in the real world,
and is therefore more suitable for designing the cooling solutions for microprocessors
[AMD 2009]. Intel insists that the overall server power usage exceeds the processor's
TDP and therefore ACP is of no use [Huck 2011].
With nonefficient cooling or non-thermal-aware scheduling, even the use of TDP for
data center infrastructure planning may lower microprocessor performance. In such instances, microprocessor temperature may cross the TDP threshold and thereby invoke
the embedded DTM to slow down the microprocessor performance to lower the temperature. Our survey is concerned with thermal-aware workload scheduling and the use of
DTM techniques as complementary techniques to avoid reaching the DTM threshold.
Cameron [2010] emphasized the need for energy-proportional computing to ensure
that no power is wasted when the computer is idle. It was suggested that the computing world should follow an energy-oriented path and still ensure performance-oriented
throughput. Modern-day microprocessors exhibit energy-proportional computing. Microprocessors such as Intel Core i7 processors [Intel 2011] are equipped with intelligent
thermal management techniques composed of frequency scaling, clock gating, and intelligent fan control. These techniques are used in a coordinated pattern to maintain
temperature within the operating limits of the processor. Making scheduling decisions
on such intelligent processors through the RC model is more complex since it is not
clear how the intelligent processor will behave to reduce temperature when on maximum load below the DTM threshold. At a high abstraction level, the components of
a server can be considered as part of a single object called a node. All the components of a node contribute to the overall temperature and power consumption. In the
next subsection, the thermal modeling approach considers the nodes as the units of
thermal-aware scheduling. However, in real-life calculations, a single node has multiple and heterogeneous components, and each of these hardware components has a
different value of R and C. Thus, the RC model has some limitations on the basis of
temperature evaluation of a node.
3.3. Thermal Network

A thermal network is a heat modeling approach that falls between the thermodynamics
and RC modeling. This model considers that each node in the data center belongs to
either one of two networks. The first network is the information technology (IT) network
and the second is the cooling technology (CT) network. Each server is a member of both
networks as the server performs computing, consumes electricity, and generates heat.
The CRAC unit is part of the CT network only. This section considers the servers
and CRAC unit behavior regarding heat generation and heat removal respectively as
being members of the CT network. An important consideration of heat modeling is the
heterogeneity of equipment, with different equipment consuming different amounts
of power and heat. A generic model for data centers should include everything that
performs a data center task and produces heat.
The thermal network model, which separates thermal management from task
scheduling, was used in Parolini et al. [2008] and Parolini et al. [2012]. It can be
utilized as a common mechanism for energy savings when making scheduling decisions. The thermal and IT networks provide the respective statistics to the workload
scheduler, such as air temperature, workload arrival, and execution rate. This model is
flexible enough to deal with heterogeneity. Each thermal network node exchanges heat
with, affects, and is affected by neighboring thermal nodes. Power consumption
by the servers is the common factor between IT and CT networks. Power consumption
is dependent on IT network variables such as the desired workload execution rate. The
power consumed by servers in the CT network is converted into heat, but the power
consumed by the CRAC unit is used to supply cool air at a certain temperature. The
heat model does not consider the air flow and air flow rate. However, it does consider
the heat flow. The devices inside the thermal network exchange heat. The heat leaving
the servers contains recirculated heat and the heat inside cold air. Thus, the thermal
network model bears close similarity to the thermodynamics model with the exception
of air flow properties.
There can be a coordination between the performance-oriented IT network and the inlet-temperature-oriented thermal network of devices when making workload placement
decisions. However, with the separation of IT and CT networks, it is difficult to integrate
the cooling-aware or thermal-aware scheduling into an uncoordinated scheme. In fact,
the uncoordinated scheme strictly follows the QoS motive and does not cover the energy
efficiency or thermal-aware notion. This model applies thermal-aware techniques to
control the thermostat temperature of the CRAC unit for each round of scheduling.
This can be covered under a separate title, but a quick note can be given: the CRAC
thermostat setting affects a wider area and probably the whole data center. This is why
it is not practical to change the CRAC thermostat frequently, as doing so may cause hardware
damage to the CRAC and/or to servers.
Our survey considers the basic scenario of coordination, which is a control strategy where the computational scheduler of the IT network is set to keep a maximum
workload execution rate with the minimum number of active servers without the need
to migrate any workload. The thermal control is set to maintain the lower thermal
threshold and thus keeps the inside data center temperature at the minimum. The
only energy savings are in the IT network through minimizing the number of active
servers. This approach can ensure the thermal safety of devices. The thermal network
model considers the application of workload scheduling in consecutive time intervals.
The workload scheduling is periodically updated according to jobs at hand. This model
has limitations when applied to a static scenario of a virtualized data center having
multiple VMs but no new VM requests. Managing the infrastructure in a thermal-aware manner by using the thermal network model is not possible in this case. Another
scenario is HPC job scheduling by using the thermal network model. In this article,
the workload optimization part of the thermal network does not consider the heat generation of the servers and/or the job deadline. Therefore, if a long job is scheduled over
a server located in a hotspot region, it may result in prolonging and/or intensifying the
hotspot.
3.4. Heat Recirculation

The temperature of cold air coming out from the vents of floor tiles changes by the time it
reaches the server air inlets. This is due to mixing of hot air from the server outlets with
the CRAC cold air that is coming out from vented floor tiles. This is an example of heat
recirculation and is a typical challenge for supplying a uniform cold air temperature
across the entire data center. The mixing of the cold and hot air most likely occurs
at the top of the racks. Heat recirculation is also possible inside the racks, where the
hot air from the servers exhausts gets mixed with the cold air entering the rack from
the front. The mixing of hot air occurs when the rack has poor ventilation [Khankari
2009]. This article covers thermal-aware scheduling and the related techniques, and
therefore, it is assumed that the heat recirculation occurs just as mentioned in the
papers reviewed. Second, the servers that contribute toward heat recirculation will be
utilized less, and thus the overall utilization of the data center and the QoS are affected.
Researchers have focused on reducing heat recirculation when considering thermal
awareness in data centers [Banerjee et al. 2010, 2011; Jonas et al. 2007, 2010; Qinghui
et al. 2006, 2008; Tang et al. 2006, 2007]. Heat recirculation raises the cold air temperature at server inlets. This degrades the heat removal efficiency of the cold air and
increases the overall temperature of data centers. This leads to a higher hot air temperature entering the CRAC unit, and the resultant CRAC cold air temperature is also
raised [Banerjee et al. 2010, 2011].
The heat models discussed previously lack the consideration of the inlet temperature
hike and the thermal anomalies due to heat recirculation. Heat recirculation is one of
the reasons for unexpected results from thermal-aware schedulers. As reported by
Rodero et al. [2012], a hike in inlet temperature can result in accumulated heat inside
servers at low utilization and even at idle states. The thermodynamics model and
thermal network model are refined to include the concept of heat recirculation, as
explained later in this section. This also adds to the complexity of implementation of
thermal-aware schedulers aimed at minimizing heat recirculation. An obvious solution
is to identify the servers that contribute toward the heat recirculation and to utilize
them as little as possible. The reduced amount of recirculated heat can result in an
overall reduction in maximum outlet temperatures across the data center and a reduced
cooling load [Tang et al. 2006, 2007]. This ensures the maintenance of the PUE, if not
a lowering of it, by reducing the load on the cooling mechanism.
An efficient thermal-aware task scheduling can be obtained when heat recirculation
is taken into consideration [Jonas et al. 2007; Qinghui et al. 2006, 2008; Tang et al. 2006,
2007]. The cold air coming inside contains recirculated hot air from other nodes. Tang
et al. [2007], Qinghui et al. [2006], and Pakbaznia [2010] quantified heat recirculation
in the form of a matrix as shown in Equation (9):
T_in = T_s + D P,  where  D = [(K − A^T K)^{-1} − K^{-1}],    (9)

where T_in and T_s are the inlet air temperature and supplied air temperature vectors, respectively; A is the matrix of heat recirculation (cross-interference) coefficients among the chassis; and K is a diagonal matrix whose entries are the thermodynamic constants (ρ f_i C_p, as explained in Table I) of the different chassis.
The contribution of each server to the heat recirculation of the data center can be represented in a matrix as shown in Equation (9). The change in the outlet temperatures of all
servers by increasing the power consumption of a single server is used to create power
profiles of each server in Computational Fluid Dynamics (CFD) simulations and can
be used to calculate the coefficient of recirculation and coefficient of exit air (the characteristic amount of hot air from each server actually reaching the cooling mechanism)
for each server. This knowledge can be utilized to predict the thermal map of the data
center given the power distribution of servers.
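A sketch of this prediction step follows, applying Equation (9) with a small, made-up cross-interference matrix and power vector; in practice D is derived from CFD-based profiling as described above.

```python
import numpy as np

def predict_inlet_temps(t_sup_c, D, power_w):
    """Equation (9): T_in = T_sup + D @ P, where D captures how each server's
    power raises every server's inlet temperature through recirculated hot air."""
    return t_sup_c + D @ power_w

# Made-up recirculation matrix (deg C per watt) for three servers: the entries of
# row i say how much server j's power heats server i's inlet.
D = np.array([[2e-3, 1e-3, 5e-4],
              [1e-3, 2e-3, 1e-3],
              [5e-4, 1e-3, 3e-3]])
P = np.array([400.0, 250.0, 600.0])   # current per-server power draw (W)

print(predict_inlet_temps(18.0, D, P))   # predicted inlet temperatures in deg C
```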
Qinghui et al. [2006] identified and stored these coefficients in a matrix as cross-interference profiles with respect to different discrete power levels of servers. The
algorithms of Tang et al. [2006], which do not consider heat recirculation, proved to be more expensive in terms of cooling cost as compared to those of Tang et al. [2007]. The servers
closer to the floor are the main source of heat recirculation, and the servers at the top of
the racks are the victims of heat recirculation because the hot air rises [Qinghui et al.
2006]. In our survey, the thermal-aware algorithms have considered heat recirculation
for scheduling decision making, ranking the servers for workload placement [Bash and
Forman 2007a; Yuan et al. 2010], and finding the best external thermal sensor for
determining [Bash et al. 2006] and controlling the CRAC unit's thermal zone [Marwah
et al. 2010].
Researchers have considered the heat recirculation factor (HRF) for ranking the
servers for workload placement preference [Bash and Forman 2007a; Yuan et al. 2010].
Heat recirculation was considered as the difference of supply temperature from floor
tiles and the temperature at the cold air inlets of the racks. But heat recirculation
prevention was not considered in workload placement.
The rack-top sensor, if chosen for temperature monitoring, might be the sensor most
affected by heat recirculation because the hot air is at its maximum volume and temperature at the top of the rack rather than near the perforated floor tiles. The rack-top can
be a good spot for early identification of a thermal anomaly for inlet temperature
due to heat recirculation more so than near a floor sensor location. But the rack-top
sensor can also be a source of frequent false-positive alarms even if there is no load
on servers near the rack-tops, whereas the sensors placed lower than rack-top may not
show frequent thermal alarms. This latter location for external thermal sensors can be
more appropriate to increase the overall air-blow power of the CRAC fan to blow off the
recirculated hot air around the sensor location. Blower optimization is out of the scope
of this article. However, sensor placement is a research issue and may require complex
analysis of possible locations to place the thermal sensors to maintain the quality of
sensed data [Krause et al. 2006].
Previously, Moore et al. [2005], in their pioneering work, based the computing nodes' power budget distribution on heat recirculation in the data center. The temperature difference between the cold air supplied and the cold air at the server inlets was used to calculate the total recirculated heat δQ, as shown in Equation (10):

δQ = Σ_{i=1}^{n} C_p · m_i · (T_in^i − T_sup),    (10)

where δQ is the linear combination of the portions of hot air from other servers getting mixed with the inlet of each server i, as also noted and represented in matrix form by Qinghui et al. [2006]; C_p is the specific heat of the air inside the data center; m_i is the mass of air entering node i at temperature T_in^i; and T_sup is the CRAC supplied air temperature setting. No formula was given for the quantification of heat produced, except that the HRF of each pod was used as the basis for the distribution of power to all the sets of computing nodes (pods) in the following equation:

HRF_i = (δQ_i − δQ_ref) / (Q_i − Q_ref),    (11)

where Q_i and δQ_i are the heat produced and recirculated by pod i, respectively, and Q_ref and δQ_ref are the corresponding values for an idle (reference) data center. A pod can also be taken to be a chassis or even a rack for demonstration. HRF can be regarded as the thermal profile of each pod of computing nodes, which is independent of the data center workload because the reference values are taken for an idle data center. This, however, is dependent on the presence of heat recirculation. If T_in^i = T_sup for every server, then δQ = 0, and therefore, no power distribution can be calculated from this equation.
Further, the heat recirculation is different for each server depending on the location
inside the chassis. The top servers are the worst victims of heat recirculation. At the
same time, the rack-top-mounted servers contribute the least to heat recirculation. So
if a pod has a load only on its rack-top-mounted servers, then its HRF is zero. These profiles were used to allocate workload to minimize heat recirculation and reduce cooling
cost. But with the reduction of heat recirculation, these profiles no longer remain valid
because the power budget is allocated on the basis of the heat recirculation contribution. Further, the heat recirculation depends on per-server load and load distribution
around the data center; therefore, there can be innumerable values of HRF for each
node based on execution of workload and this would require the use of coefficients of
heat recirculation instead.
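To make the HRF idea concrete, the sketch below computes Equation (11) for a few pods and then splits a total power budget among them. The heat values are invented, and the budget rule (allocating in inverse proportion to HRF) is an assumption for illustration rather than the exact policy of Moore et al. [2005].

```python
def hrf(q_produced_w, dq_recirculated_w, q_ref_w, dq_ref_w):
    """Equation (11): HRF = (dQ_i - dQ_ref) / (Q_i - Q_ref)."""
    return (dq_recirculated_w - dq_ref_w) / (q_produced_w - q_ref_w)

# Hypothetical pods: (heat produced, heat recirculated) with an idle reference of
# Q_ref = 500 W and dQ_ref = 50 W per pod.
pods = {"pod-A": (2500.0, 450.0), "pod-B": (2500.0, 150.0), "pod-C": (2500.0, 80.0)}
hrfs = {p: hrf(q, dq, 500.0, 50.0) for p, (q, dq) in pods.items()}

# Assumed allocation rule: give each pod a share of the budget inversely
# proportional to its HRF, so low-recirculation pods receive more work.
weights = {p: 1.0 / h for p, h in hrfs.items()}
total_budget_w = 6000.0
budget = {p: total_budget_w * w / sum(weights.values()) for p, w in weights.items()}
print(hrfs)
print(budget)
```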
Lee et al. [2010] consider heat recirculation to be represented by the Supply Heat
Index (SHI), which is based on the fact that the temperature of the cold air leaving the
CRAC unit and the cold air reaching the server enclosure is different. The same concept
is used in Bash and Forman [2007a] and Yuan et al. [2010]. Lee et al. [2010] further
tuned the SHI value by considering the difference between the temperature of the hot air leaving the servers and that of the cold air supplied by the CRAC, (T^r_{out,ij} − T^{crac}_{out}). Banerjee et al.
[2010] and Banerjee et al. [2011] consider heat recirculation for server ranking but
require a power distribution vector in addition to heat recirculation to decide the new
thermostat setting. The Energy Inefficiency Ratio (EIR) of different algorithms was
evaluated. The EIR is the total energy usage by an algorithm with heat recirculation
divided by the total energy usage without heat recirculation (optimum). This provides
the motivation to reduce heat recirculation. Lizhe et al. [2009] did not consider heat
recirculation or cooling effects for calculating the node temperature for scheduling.
Instead, the actual temperature from the RC model at any time was used minus the
heat dissipated to thermal sensors at that same time. If heat recirculation and cooling
effects are considered, then the calculations of Lizhe et al. [2009] can become more
interesting and cost saving.
Heat recirculation in microprocessors is possible through conduction, convection, and
radiation. These are the three ways heat dissipation occurs in nature [Gonzales and
Wang 2008]. Conduction is the basic mode, in which a core getting hot will transfer
some heat through conduction to the neighboring cores in a 2D architecture. The top
cores in a 3D architecture are hit worst by conduction and radiation when the processor
chips are stacked one over another. Changyun et al. [2008] proposed that the processor
cores in 3D stacks gain heat through conduction from the neighboring cores. A matrix
was created consisting of a coefficient of heat transfer from each core to the other cores.
The amount of heat conduction contribution by any core depends on the power used by
that core at any instance. A power vector was used to store the values of power used by
the cores. This is very close to Tang et al. [2007] and Qinghui et al. [2006] for the use of
coefficients of heat recirculation and power vector for data center servers. Scheduling
decisions are affected by heat recirculation, which is discussed in the thermal-aware
scheduling section.
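The same matrix formulation can be sketched at the core level: a made-up coefficient matrix maps per-core power to the temperature rise caused by heat transfer among stacked or adjacent cores. The coefficients, baseline temperature, and power values are illustrative only and are not taken from Changyun et al. [2008].

```python
import numpy as np

# H[i][j]: temperature rise (deg C) of core i per watt dissipated by core j,
# capturing heat transfer between neighboring/stacked cores (illustrative values).
H = np.array([[0.30, 0.10, 0.05],
              [0.10, 0.30, 0.10],
              [0.05, 0.10, 0.35]])
core_power_w = np.array([18.0, 6.0, 12.0])

core_temps = 45.0 + H @ core_power_w      # 45 C is an assumed baseline die temperature
coolest = int(np.argmin(core_temps))      # candidate core for the next hot task
print(core_temps, coolest)
```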
The complexity of including heat recirculation in thermal modeling involves the manual identification process for the contributor servers of heat recirculation [Moore et al.
2005]. Additionally, for CFD simulations based on heat recirculation, many hours are
required to complete the profiling [Qinghui et al. 2006], and a CFD simulator cannot
generate all possible distributions of workload across the data center. Heat recirculation is often linked to the increase in inlet temperature of the servers. The factors
contributing to the increase in inlet temperature in addition to heat recirculation are
the nonefficient placement of servers and server racks [Chaudhry et al. 2014], air flow
across servers [Ahuja et al. 2013], thermal sensors placement [Xiaodong et al. 2013],
or a fault in the cooling mechanism [Bo et al. 2011; Junmei et al. 2013]. The presence of
heat-recirculation-based server grading may result in permanent underutilization or
nonutilization of some servers. Instead of a persistent use of heat-recirculation-aware
scheduling, this scenario may require a manual tuneup of the cooling mechanism
and/or equipment relocation. Also, the absence of heat recirculation makes this model
of limited use.
3.5. Heat Modeling Discussion and Comparison

If the servers are utilized in a pattern to minimize electricity usage, then there may be
few servers active at one time and all working at full utilization. To save more power,
the servers adjacent to each other in a chassis or rack are activated to minimize the
idle energy of the chassis and racks. Doing this might save energy spent in servers but
also may increase the power density in a small area of the data center called a hotspot.
This power density will result in heat concentration in the air covering that area. The
cooling mechanism may detect this phenomenon and gear up the cooling process. The
traditional cooling mechanism consists of CRAC units and mechanical chillers [Google
2012]. So a scheduler that tries to save power in the computational mechanism only
may end up creating hotspots and thus spend even more power in cooling. The workload
scheduling should be performed in order to lower the heat dissipation and minimize
the hotspots. This will reduce the thermal gradient across the data center. In this way,
there will be less load on the cooling mechanism, therefore saving more electricity.
The thermal-aware scheduling ensures the reliability of computing hardware. The
optimization of the CRAC cooling mechanism and the use of economizers are out of the
scope of this article.
A choice has to be made among the various options available for architecting thermal-aware scheduling. Regarding heat modeling, the three models discussed have their
unique features and scope of parameters. But this is not so simple when the actual
runtime of the data center is considered. The varying temperatures of servers across
the data center increase the complexity in scheduling and monitoring. There can be
n number of temperature patterns across the data center. This causes ambiguity in
thermal profiling. The accurate static thermal profiles can be made for only a few
discrete values of temperatures. Dynamically updated thermal profiles can be used in
order to have more accurate and up-to-date values of heat across the data center. Such
profiles can be combined in the form of a thermal map. The only problem is the variance
in environmental variable values and the enormous data from the thermal monitoring
module.
The complexity increases when heat recirculation is considered. The possible heat recirculation values for each node are vast. A matrix of recirculation coefficients would therefore be needed for each node rather than a single data-center-wide matrix, but this is not practical. Heat recirculation may also be absent altogether if the hot air is exhausted out of the data center, and new data center architectures do aim at removing it. Where heat recirculation is present, however, a nonmechanical, software-based thermal-aware scheduler may base its decisions on dynamic monitoring of nodes and minimize the use of nodes that are major contributors to recirculation. The nodes can be dynamically ranked on the basis of the current status of the thermal map. The ranking may be based on air inlet or outlet temperatures regardless of heat recirculation. Heat recirculation itself needs to be considered only for immediate neighbors or within the same rack; inside a rack, recirculation is inevitable due to heat exchange by convection and the hot air rising from the servers near the floor. Thermodynamics-based heat modeling that accounts for temperature variation while treating the other environmental variables as constants, or reading them from reference tables, can be implemented entirely within the boundaries of computer science.
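To make the ranking step concrete, the following minimal Python sketch orders nodes from a live thermal map by inlet temperature, with outlet temperature as a tiebreaker; the data structure, field names, and example readings are illustrative assumptions rather than part of any cited scheduler.

# Minimal sketch: dynamic node ranking from a thermal map (illustrative only).
from dataclasses import dataclass
from typing import List

@dataclass
class NodeReading:
    node_id: str
    inlet_c: float   # cold-aisle (inlet) air temperature in deg C
    outlet_c: float  # hot-aisle (outlet) air temperature in deg C

def rank_nodes(thermal_map: List[NodeReading]) -> List[str]:
    """Return node IDs ordered from most to least preferred for new workload:
    coolest inlet first, coolest outlet as a tiebreaker."""
    ordered = sorted(thermal_map, key=lambda r: (r.inlet_c, r.outlet_c))
    return [r.node_id for r in ordered]

readings = [NodeReading("n1", 24.5, 38.0),
            NodeReading("n2", 22.0, 35.5),
            NodeReading("n3", 22.0, 41.0)]
print(rank_nodes(readings))  # ['n2', 'n3', 'n1']

A scheduler can rebuild such a ranking on every monitoring interval, so the preference order follows the live thermal map instead of a static profile.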
On the other hand, if the temperature reaches a lower threshold, the node is effectively idle and consuming only its base power. Such a node should be turned off or put into a low-power state, and its jobs, such as VMs, should be migrated elsewhere. Such workload migration considerations have not been used in the thermodynamics or RC models. The data center heat models are compared in Table II in terms of the base concept, the implementation and runtime complexity involved, and other parameters related to equipment heterogeneity, derivation of temperature from power consumption, and workload migration considerations. The thermal network heat model reviewed in Section 3.3 does not consider heat recirculation as part of the workload scheduling optimization problem, although the chances of heat recirculation could be considered during workload scheduling.
Within the scope of this article, thermal-aware scheduling should take into account the heterogeneity of computing nodes in terms of heat generation and heat recirculation. Heterogeneity can arise from heat recirculation itself: a heat recirculation victim takes in hotter air, so less heat is removed from the node, and the accumulation of heat inside the node eventually increases the temperature of its exhausted hot air. The thermodynamics model deals with heat recirculation through hot air. The thermal network model considers heat recirculation based on the difference between the temperature of the cold air supplied by the CRAC and that actually reaching the servers. The thermodynamics model can also generate an equipment relocation recommendation on the basis of heat recirculation.
Another type of heterogeneity is the make and model of devices. Modern-day servers come with various DTM techniques such as DVFS and energy-proportional architectures, and the thermal-aware architecture of microprocessors and servers has also improved over time.
Table II. Comparison of Data Center Heat Models

Thermodynamics model — Based upon: air flow. Complexity: thermal profiling; can be too complex for the heat recirculation coefficients matrix. Heat recirculation considered for workload scheduling: yes. Equipment heterogeneity considered: no. Power consumption link to temperature: direct. Workload migration possibility considered: no.
RC model — Based upon: electrical resistance and capacitance. Complexity: temperature of cold air should be taken as ambient temperature. Heat recirculation considered for workload scheduling: no. Equipment heterogeneity considered: yes. Power consumption link to temperature: direct. Workload migration possibility considered: no.
Thermal network model — Based upon: logical network of devices. Complexity: server location can affect the grading in the thermal network. Heat recirculation considered for workload scheduling: no. Equipment heterogeneity considered: no. Power consumption link to temperature: indirect. Workload migration possibility considered: yes.

The RC model can give an accurate temperature assessment for this latter type of heterogeneity. Thermal-aware scheduling using the thermodynamics or RC model does not consider that the workload might have to be migrated from one server to another. Workload migration is used reactively when a thermal violation occurs, such as when the air inlet or outlet temperature reaches a threshold. If the temperature reaches a maximum threshold, then migrating the workload lowers the power consumption of the host and, in turn, its temperature. The temperature threshold can be defined at the node, ambient, or microprocessor level.
4. THERMAL MONITORING AND PROFILING

Thermal-aware scheduling decisions are made with the support of thermal monitoring and profiling. The power consumption and resultant heat dissipation of servers and microprocessors can be represented by a coefficient or as a thermal profile [Banerjee et al. 2010, 2011; Jonas et al. 2007, 2010; Mukherjee et al. 2007; Qinghui et al. 2006, 2008; Tang et al. 2006, 2007]. Such a characterization can also take the form of a thermal rank of servers, used as a basis of equipment preference when making scheduling decisions [Banerjee et al. 2011; Bash and Forman 2007a; Bash et al. 2006; Mukherjee et al. 2009]. The parameters contained in a thermal profile depend on the heat model. A thermal map was created manually from external thermal sensor readings in Bash and Forman [2007a]; area-based interpolation was used to determine the temperature of the servers around each thermal sensor. This manual effort, however, is eliminated in Yuan et al. [2010] by automating the procedure through a software module.
A thermal profile is static if the thermal monitoring is performed to indicate temperature threshold violations or thermal anomalies only. To create a static thermal
profile, a one-time activity is performed to record thermal statistics such as outlet
hot air temperature for a range of server power consumption levels. The static profile is then used for making temperature predictions at the time of workload scheduling. Computational workload scheduling performed using a static thermal profile is called offline scheduling. Offline scheduling remains unaware of the current thermal scenario unless there is a thermal violation. Further, the scope of static thermal profiling is limited to discrete levels of power consumption. Thermal profiling may also be dynamic. Dynamic thermal profiling uses continuous monitoring and is more suitable for online and proactive scheduling; temperature predictions made from dynamic thermal profiling are more accurate because they reflect the real-time thermal scenario. Both static and dynamic monitoring yield a thermal map of the servers in the data center.
Thermal profiling is used in microprocessors, such as in Murali et al. [2008], where
a lookup table is embedded in the microprocessor at the time of manufacturing. This
table consists of clock frequencies according to die temperature and performance. At
runtime, the processor selects the optimum frequency according to the performance
requirement. This technique reduces the waiting time of tasks and avoids thermal
anomalies as well. Another DVFS-based thermal profiling approach for microprocessors was employed in Chantem et al. [2009]. Out of the numerous available frequency levels, two were used: a larger and a smaller level, not necessarily the maximum and minimum. The processor was allowed to run at the higher speed until it reached the maximum temperature threshold, after which the frequency was alternated between the two levels until execution was complete. This was found to maintain performance better than alternating between the minimum and maximum frequencies.
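The following Python sketch illustrates the two-level alternation idea under stated assumptions: a simple leaky thermal model stands in for the real die thermals, and the frequency pair, threshold, and constants are hypothetical rather than taken from Chantem et al. [2009].

# Illustrative sketch of two-level DVFS alternation (not the published algorithm).
T_MAX = 85.0               # maximum temperature threshold (deg C), hypothetical
F_HIGH, F_LOW = 2.4, 1.6   # chosen frequency pair in GHz, hypothetical

def step_temperature(temp, freq, ambient=40.0, heat_gain=1.0, cooling=0.1):
    """Very rough thermal step: heating grows with f^2, heat leaks toward ambient."""
    return temp + heat_gain * freq ** 2 - cooling * (temp - ambient)

def run_job(cycles_needed, temp=45.0):
    freq, done, threshold_hit = F_HIGH, 0.0, False
    while done < cycles_needed:
        done += freq                       # work completed this step
        temp = step_temperature(temp, freq)
        if temp >= T_MAX:
            threshold_hit = True
        if threshold_hit:                  # alternate between the two chosen levels
            freq = F_LOW if freq == F_HIGH else F_HIGH
    return temp

print(f"final temperature: {run_job(500.0):.1f} C")

With these assumed constants, the higher level alone would settle above the threshold while the alternation keeps the temperature hovering near it, which is the qualitative behavior the technique relies on.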
A thermal manager software module integrated with the microprocessor hardware was proposed in Khan and Kundu [2008]. With hardware support, instruction parallelism is throttled according to the thermal anomaly level judged by the thermal manager; this judgment is based on a linear approximation using a recent temperature reading and the current activity of the active threads. A temperature prediction technique was proposed by Yeo et al. [2008] that predicts future temperature from the steady-state and current temperatures of a core. A model for application-based and core-based temperature prediction was developed. The least mean square method was used to recursively calculate the application-based predicted temperature, so that the future temperature of any thread could be predicted from its recent activity. The core-based predicted temperature, in contrast, is calculated through a simple differential equation. The two predictive models were combined to predict whether running a thread on a core would violate the thermal threshold, so that the thread could be migrated to a (predicted) cooler core before crossing that threshold.
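As a rough illustration of this style of prediction, the sketch below combines an exponential approach toward a steady-state temperature with a recent-activity term and migrates a thread when the predicted value crosses a threshold; the constants, weighting, and function names are assumptions for illustration, not the model of Yeo et al. [2008].

# Illustrative predictor: exponential approach to steady state plus activity term.
import math

def predict_core_temp(t_current, t_steady, dt, tau=10.0):
    """Core-based prediction: first-order approach toward the steady-state value."""
    return t_steady + (t_current - t_steady) * math.exp(-dt / tau)

def predict_thread_temp(core_pred, recent_activity, weight=5.0):
    """Application-based term: hotter prediction for more active threads (0..1)."""
    return core_pred + weight * recent_activity

def pick_core(thread_activity, cores, dt=2.0, threshold=80.0):
    """Return the id of a core whose predicted temperature stays below threshold,
    preferring the coolest prediction; None if no core qualifies."""
    best = None
    for core_id, (t_now, t_steady) in cores.items():
        pred = predict_thread_temp(predict_core_temp(t_now, t_steady, dt),
                                   thread_activity)
        if pred < threshold and (best is None or pred < best[1]):
            best = (core_id, pred)
    return best[0] if best else None

cores = {"core0": (75.0, 82.0), "core1": (60.0, 70.0)}  # (current, steady-state) deg C
print(pick_core(thread_activity=0.6, cores=cores))       # likely 'core1'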
There are several methods of thermal monitoring and profiling, depending on the implementation cost and scope. The simplest and most cost-effective way is to perform the monitoring and profiling manually. Although simple, it is not reliable due to human error and the use of manual devices such as handheld thermometers. The manual approach can be improved with temperature gadgets such as thermal sensors and thermal cameras, which increase the implementation cost and complexity. Implementation cost and effort are greatly reduced by the use of simulators such as CFD simulators. Simulators do not need a physical data center, but they might not give results that reflect real-time conditions. Using a mix of methods is more reliable for obtaining an accurate thermal map [Banerjee et al. 2011; Qinghui et al. 2008; Viswanathan et al. 2011].

4.1. Manual Profiling and Monitoring

This survey considers manual profiling and monitoring as a part of offline scheduling. Manual thermal profiles are created by manually noting the power consumption and the corresponding heat generation and recirculation at the granularity of racks, chassis, and individual servers. Offline thermal maps can then be generated from these thermal profiles. Manual thermal profiling may also involve the use of simulation tools when no real data center is available. The work of Moore et al. [2005] is based on thermal monitoring and workload placement using thermal profiles of rack-mounted servers at full utilization. The calibration phase is done by manually turning on one rack or server group, called a pod, at a time. For each pod, a thermal profile is created by manually calculating the heat from hot air and recirculated heat while that pod is turned on and run at full utilization. Although also done manually, the thermal map in Bash and Forman [2007a] is based on external sensors at the front and back of the racks.
Kursun and Chen-Yong [2008] created microprocessor thermal profiles on the basis of a benchmark run. Jaffari and Anis [2008] used the power leakage pattern of a microprocessor chip, modeled as a statistical function, to estimate the chip temperature from the power consumed at runtime. In recent 3D processors designed for high performance, multiple 2D processors are stacked over one another, just like blade servers stacked in racks. Heat is passed by conduction from lower to upper processor cores, so a thermal-aware scheduling algorithm should place the workload across the cores in a way that avoids hotspots and conducted heat. Interestingly, Xiuyi et al. [2008] noted that the most heated core in such a column-like arrangement may be the victim of heat conducted from the cores below. This is similar to heat recirculation among servers, where servers receive unwanted heat from their neighborhood. Changyun et al. [2008] allocated a thermal grade to each core of a 3D processor according to the core location and its current load. DVFS was applied on a per-core basis rather than globally to avoid thermal hotspots; this technique improves throughput and avoids more hotspots than a simple distributed DVFS technique. Coskun et al. [2009a] allocated a preference value to each core according to a thermal index, a measure of how efficiently a core is cooled that depends on the physical location of the core with respect to the heat sink. Thermal ranks are also used in data center thermal-aware scheduling, where the servers are ranked on the basis of cold air inlet or hot air outlet temperatures.
4.2. Thermal Gadgets

Accurate and timely thermal information can be generated with the help of thermal
gadgets like sensors and thermal cameras [Liu et al. 2013]. There can be multiple
sensors per unit area. The thermal sensors can be external sensor nodes and onboard
thermal sensors on servers and microprocessors. The use of thermal sensors embedded
in microprocessors is helpful since these sensors eliminate manual monitoring and their
use is the only option to have real-time temperature data in the case of microprocessors
[Ware et al. 2010]. At the data center level, however, the thermal sensor data increases the burden on the network. Ahuja et al. [2011] proposed using the onboard sensors of microprocessors or servers to monitor the thermal status of the data center. Modern-day servers are equipped with thermal sensors at the air inlet and outlet and on the microprocessor chip. These sensors provide better and more accurate thermal readings of hot and cold air.
Thermal monitoring in Yuan et al. [2010] was based on thermal sensor readings analyzed by a centralized software component called daffy. This module holds all the data related to the servers, such as the vendor specifications for temperature thresholds and the
location of servers in the data center. daffy collects the temperature readings from all the sensors mounted at the server air inlets and grades the servers according to how efficiently they are cooled. Pod- or rack-level workload schedulers were used along with the centralized daffy. The daffy module provides the server ranking information to the pod-level workload placement scheduler, which places VMs according to the thermal grade of each server. This concept of thermal-based server ranking had been explored earlier by Bash and Forman [2007a] and Yuan et al. [2010], but without virtualization and without centralized thermal monitoring and control.
4.3. Thermal Simulators

Thermal profiles can be based on thermal simulator results. Such simulators are reliable in giving a near-reality picture, and to make their results more accurate, they are fed with parameters from real-life measurements and hardware characteristics. Simulators such as FloVent [Mentor-Graphics 2014] and HotSpot [HotSpot 2014] are highly accurate in creating thermal views of data centers and microprocessors, respectively. For proactive scheduling of microprocessors, Coskun et al. [2008b], Coskun et al. [2008c], and Coskun et al. [2009b] proposed modeling the thermal behavior of the processor as a time series using the HotSpot [HotSpot 2014] simulator. An Auto-Regressive Moving Average (ARMA) model was then fitted to the pattern of temperature change so that the temperature at a future time could be predicted with reasonable precision. ARMA was applied to a portion of the temperature time series and the prediction error was noted. These errors can be due to random thermal cycles that are not auto-correlated; the thermal cycles arise from Dynamic Power Management (DPM) in microprocessors, which varies the power across processor cores according to load. The prediction errors were checked in the ARMA model by using the Auto-Correlation Function (ACF), which verifies that the residual prediction errors have close to zero correlation over time, meaning they stem from random events. Once the model is fitted on a subset of the time series, it is tested on time-series data that was not used for training. The model is finalized according to the prediction accuracy obtained during testing and then applied online for temperature prediction. The sequential probability ratio test (SPRT) was used to re-tune the ARMA model when random changes in workload caused prediction errors. This model was applied to predict the temperature for various proactive thermal management techniques such as thread migration, DVFS, and shuffling threads across processor queues, both with and without DPM. ARMA-based proactive scheduling shows a reduction in hotspots, thermal cycles, and spatial thermal gradients compared to reactive scheduling.
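A minimal sketch of this kind of time-series forecasting is shown below, using the statsmodels ARIMA implementation on synthetic core-temperature data; the model order, the data, and the threshold are illustrative assumptions, not the configuration used by Coskun et al.

# Illustrative ARMA-style temperature forecasting on synthetic data.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
# Synthetic core temperature trace: slow oscillation plus noise (deg C).
t = np.arange(300)
temps = 60 + 5 * np.sin(t / 15.0) + rng.normal(0, 0.5, size=t.size)

train, test = temps[:250], temps[250:]
model = ARIMA(train, order=(2, 0, 1)).fit()   # ARMA(2,1): differencing order d = 0
forecast = model.forecast(steps=len(test))

threshold = 66.0
if np.any(forecast >= threshold):
    print("predicted threshold violation: trigger proactive DTM (e.g., migration)")
else:
    print("no violation predicted in the forecast horizon")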
The work of Qinghui et al. [2006] used a CFD simulator to perform the profiling process in a manner similar to the manual profiling of Moore et al. [2005]. The simulator raised the power consumption of one server at a time and recorded the resulting exhaust and recirculated hot air temperatures in a vector for a set of discrete power usage values. The simulations were repeated for all servers at n power usage levels, resulting in n thermal vectors, or profiles, for each server. Thermal profiles can be combined to form the thermal map of a data center. With the thermal profiles and the power distribution vector in hand, the thermal map could be computed directly; this is faster than running a CFD simulation and almost as accurate. One shortfall is that if the servers are at power usage levels different from those used for creating the thermal profiles, the prediction will contain errors. Additionally, the profiling process itself takes hours on the CFD tool. Thermal profiles created this way are static in nature, meaning that they are not likely to change unless there is a major change in the data center layout. A notable point is that Qinghui et al. [2006] aimed to find the coefficients for the
amount of exhaust air of each server that reaches the CRAC unit and the part that
contributes to heat recirculation.
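In this abstract-heat-recirculation style of model, the predicted inlet temperature of each server is the supplied cold air temperature plus a weighted sum of the heat dissipated by all servers. The sketch below illustrates this with a small, made-up coefficient matrix and power vector rather than values from any cited profiling run.

# Illustrative inlet-temperature prediction from heat-recirculation coefficients.
import numpy as np

t_supplied = 18.0                      # CRAC supply temperature (deg C), hypothetical
# d[i, j]: temperature rise at server i's inlet per watt dissipated by server j.
d = np.array([[0.002, 0.001, 0.000],
              [0.001, 0.003, 0.001],
              [0.000, 0.001, 0.002]])  # made-up recirculation coefficients
power = np.array([250.0, 180.0, 300.0])  # current per-server power draw in watts

t_inlet = t_supplied + d @ power       # predicted inlet temperature per server
print(np.round(t_inlet, 2))            # -> [18.68 19.09 18.78]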
If the heat recirculation is significant, then it might be better to relocate the servers and/or check the cooling infrastructure for faults and possible adjustments of the air flow. Using the heat recirculation coefficients is yet another way to rank the servers for scheduling preference. Banerjee et al. [2010] and Banerjee et al. [2011] use a static ranking of servers for job placement based on their power usage at maximum utilization and their heat recirculation. This requires thermal sensors for real-time monitoring; however, such a thermal-monitoring-based approach was not verified. Scheduling in Lizhe et al. [2009] requires a thermal profile of each job, which is used by the simulator to generate a thermal map of servers sorted in ascending order of their temperatures. Unlike other thermal maps, the thermal map in Lizhe et al. [2009] is based on actual node temperatures and not on ambient sensor temperatures. This is also a drawback, because without a thermal profile a job cannot be scheduled in Lizhe et al. [2009]. A question arises regarding the availability of the thermal profile of each job; a related point is the validity of such a profile across heterogeneous servers.
4.4. Thermal Data Filtering and Thermal Predictions

Proactive thermal-aware scheduling relies on predicting the resulting heat and the rise in temperature for a given workload. This allows scheduling decisions to avoid or minimize the peak outlet temperature and the thermal gradient. A further enhancement is possible through the prediction of future workload. Together, these two predictions allow just enough chassis and servers to be kept active to maintain a certain level of QoS. A reactive scheduler can be enhanced with a workload prediction module to bring it closer to proactive schedulers in benefits and energy savings. It is therefore very important to verify the accuracy of the prediction techniques. With a hierarchical prediction model for the thermal-aware scheduler, there can be multiple levels of prediction, such as for admission control, the workload placement engine, the thermal-aware scheduler, and so forth. The accuracy of these prediction modules is of vital importance to support the objectives of thermal-aware scheduling.
Thermal data needs to be carefully managed to filter out redundant information and pinpoint hotspots. For this, digital sensors are preferred over analog sensors in microprocessors [Naveh et al. 2006] because of the lower overhead of deriving the temperature value from digital data than from analog data. Digital thermal sensors were used to accurately measure temperature in data centers by Marwah et al. [2010]. A more advanced use of thermal sensors is to predict future temperatures so that thermal anomalies can be avoided; such predictions are used by proactive schedulers to arrange the workload so that the heat dissipation from servers is minimized. Viswanathan et al. [2011] used an unsupervised neural network learning method to predict the thermal behavior of VMs and to avoid anomalies. VM allocation was treated as a multidimensional problem, with dimensions including the heat dissipation of VMs and their specifications. However, no information was given on predicting the behavior of VMs that are created dynamically to avoid a hotspot after VM allocation.
Marwah et al. [2010] tested four techniques for thermal anomaly detection in data centers. Techniques based on a subthreshold, a moving average, and a time-weighted average of the thermal readings were tested on one sensor node out of a set of 120 sensors. Three of these methods indicate a chance of a violation of the main threshold, called a thermal anomaly, whenever the subthreshold is violated. The number of predicted anomalies (or events) goes up as the subthreshold is set closer to the main threshold. However, at each subthreshold, if different continuous-violation time values are tested, the number of predicted events goes down with the increase
in the continuous-violation time value. However, all three of these techniques show poor results in predicting thermal anomalies. The fourth method is based on a machine-learning classifier, the naïve Bayesian classifier, which classifies and categorizes data points. A naïve Bayesian classifier is used to learn the conditional probability that a given data point belongs to a certain class. The output of the classifier is a set of probabilities, which is then plotted on an ROC curve to increase true positives and reduce false positives; the probabilities closest to the top left of the curve were selected. This approach was then applied to test data and, to improve the results, the partitions of training and test data were shuffled and the results averaged over 10 runs. The results are better than those of the other three approaches but still not very impressive. The sensor time-series data was discretized using frequency binning, with entropy calculation applied to decide the number of bins (up to 10). Data from 132 sensors across 12 racks and six CRACs was used: each rack had five sensors on the cold aisle side and five on the hot aisle side, and each CRAC had two sensors. After discretizing the values, there were a total of 132 variables; correlation analysis was used to remove redundant variables in two steps, first discarding variables at a correlation coefficient of 0.6 and later at 0.8. This left only 16 variables.
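As a rough illustration of the classifier-based approach, the sketch below trains a Gaussian naïve Bayes model on labeled sensor readings and flags readings predicted to precede an anomaly; the synthetic data, features, and the scikit-learn classifier choice are assumptions for illustration, not the exact pipeline of Marwah et al. [2010].

# Illustrative naive Bayes anomaly prediction on synthetic sensor features.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
# Features per sample: [current temp, moving average, rate of change] (all synthetic).
normal = rng.normal([24.0, 24.0, 0.0], [1.0, 0.8, 0.2], size=(200, 3))
pre_anomaly = rng.normal([29.0, 27.5, 0.8], [1.2, 1.0, 0.3], size=(40, 3))

x = np.vstack([normal, pre_anomaly])
y = np.array([0] * len(normal) + [1] * len(pre_anomaly))  # 1 = precedes an anomaly

clf = GaussianNB().fit(x, y)
new_readings = np.array([[24.3, 24.1, 0.1],
                         [28.7, 27.9, 0.9]])
print(clf.predict(new_readings))        # expected roughly [0 1]
print(clf.predict_proba(new_readings))  # class probabilities for ROC-style analysis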
A hierarchical proactive thermal monitoring technique was proposed by Viswanathan et al. [2011] involving sensor nodes and thermal cameras for monitoring and identifying hotspots in a data center. Thermal cameras were used to obtain a 3D view and the spatial identification of hotspots. By transporting isotherms describing the dimensions of hotspots instead of whole images, the thermal image was re-created at the receiver end with more accuracy and less burden on the network. Sensor nodes were used as a secondary level of monitoring: on identification of a hotspot by the thermal cameras, the sensing rate of the thermal sensors in the hotspot area is increased. Using a hierarchical approach can avoid the bulk of the dataflow due to thermal sensors, since the sensing interval of the sensor nodes can be lengthened until a hotspot is detected somewhere in the data center.
Similarly, a 2D representation of hotspots using multiple thermal cameras was proposed by Liu et al. [2013]. According to them, there is a possibility of recording phantom hotspots, and complex calculations must be performed to correctly localize a hotspot. Furthermore, given the possibility of placing thermal sensors accurately [Krause et al. 2006; Xiaodong et al. 2013] to identify and detect hot servers, there is little motivation to install expensive thermal cameras. Another major concern is the large number of thermal cameras that would have to be placed to capture server temperatures from the top to the bottom of all the racks across the data center, as compared with cheaper alternatives.
Jonas et al. [2007] used thermal sensor data to predict the thermal map of a data center using neural networks, a fast and noninvasive approach to creating such a map. For neural network training, the built-in thermal sensors of each server were used to gather air inlet and outlet temperature data. The training was done on the basis of the utilization and corresponding outlet temperature of each chassis, so that at the chassis level the neural network has a single input (utilization) and a single output (predicted outlet temperature). Jonas et al. [2010] used this neural-network-based thermal map to train for and predict the chassis-level outlet temperature. The predicted temperature was then used to control the cooling mode of a multi-mode CRAC unit according to chassis utilization; a CRAC unit can supply cold air in multiple modes. The neural network is trained to predict the thermal map correctly for the various CRAC modes on the basis of rack power usage when a new rack replaces an old one. Jonas et al. [2007] also proposed an equipment relocation algorithm to arrange the racks in the data center
so as to minimize heat recirculation and the peak inlet temperature of the servers; a heat recirculation matrix is used and the racks are shuffled on the basis of the characteristic heat recirculation contributed by each rack. Mukherjee et al. [2007] used the same technique to set up node preferences on the basis of server inlet temperature.
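The chassis-level mapping described above (utilization in, predicted outlet temperature out) can be illustrated with a small regression model; the sketch below uses scikit-learn's MLPRegressor on synthetic data and is only an assumption-laden stand-in for the neural networks used by Jonas et al.

# Illustrative neural-network regression: chassis utilization -> outlet temperature.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
util = rng.uniform(0.0, 1.0, size=(300, 1))          # chassis utilization fraction
# Synthetic "ground truth": outlet temp grows with utilization plus noise (deg C).
outlet = 30.0 + 12.0 * util[:, 0] + rng.normal(0, 0.4, size=300)

model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
model.fit(util, outlet)

query = np.array([[0.2], [0.8]])
print(np.round(model.predict(query), 1))  # higher utilization -> higher predicted outlet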
Highly accurate thermal measurements are obtained when Kalman filters are applied to microprocessors [Sharifi et al. 2008]. Performance-factor-based linear regression was used by Woo and Skadron [2006] to calculate localized temperatures based on the current access rates of the functional units inside a microprocessor and the power model. This was tested with DVFS in Chung and Skadron [2006] to build software-based thermal estimators for microprocessors. Hwisung and Pedram [2006] considered that the temperature follows a random pattern; the thermal states of the microprocessor were modeled on the basis of voltage and frequency, and for each state, the DTM manager used a Markov decision process to choose the next state with the least cost, switching to it through DVFS.
The thermal state prediction method of Qinghui et al. [2006] does not need external sensors, since its predictions are quite close to CFD simulation data; sensor nodes can nevertheless be used to tune the thermal profiles when the prediction model is applied to data centers in real time. The thermal-aware scheduling and management techniques of Qinghui et al. [2008], Qinghui et al. [2006], Tang et al. [2007], and Tang et al. [2006] are based on thermal sensor readings of the air inlet and outlet temperatures of servers, but no details of the thermal sensors or data-gathering procedures were given; a CFD simulator was used for generating the thermal profiles and maps. The thermal map resulting from a scheduling algorithm run can also be predicted using a CFD simulator [Banerjee et al. 2011; Mukherjee et al. 2009], which saves manual effort but is time consuming.
A rack-level thermal predictor was proposed by Chen et al. [2011] to choose a chassis within a rack for workload placement. A neural network was used to predict the chance of hotspots. The predictor gives six options, from which one is chosen on the basis of the lowest air outlet temperature; however, instead of implementing a thermal-aware scheduler, the servers are allocated on the basis of power usage efficiency. Viswanathan et al. [2011] used sensor nodes to create thermal maps as one tier of hierarchical thermal monitoring. Adaptive sampling was applied to lower the overhead of sensor data processing, with the sampling rate increased in hotspot areas, and clustering was applied among sensor nodes to reduce dataflow. In both approaches, the monitoring system does not consider heat recirculation and thus may not provide reliable thermal information.
4.5. Comparison of Thermal Monitoring and Profiling Techniques

A thermal-aware scheduler relies on monitoring and profiling to obtain accurate and timely information for the data-center-wide thermal map. These techniques are compared in Table III. As explained before, scheduling is offline if monitoring is done only to detect thermal anomalies while scheduling itself is performed using static thermal profiles; scheduling is online when a live thermal map is used. Table III compares the various types of monitoring.
The use of thermal sensors is the most practical option: sensors are more accurate than manual methods, more affordable than thermal cameras, and faster than simulators. The DTM techniques used are not tied to a particular type of scheduling; there are many types of DTM, and the choice between reactive and proactive scheduling decides how they are used. Either the DTM techniques are applied after a thermal anomaly occurs (reactive scheduling) or they are used during scheduling itself (proactive scheduling). A proactive scheduler has to be designed to minimize the occurrence of thermal anomalies; otherwise, it will effectively act as a reactive one.
Table III. Comparison of Thermal Monitoring and Profiling Techniques

Accuracy of reading — Manual: may contain human error; Thermal sensors: depends on sensor location; Thermal cameras: highly accurate; Thermal simulators: depends on software.
Thermal profiles — Manual: offline; Thermal sensors: offline, online; Thermal cameras: online; Thermal simulators: offline.
Financial cost — Manual: lowest; Thermal sensors: proportional to density of installations; Thermal cameras: high; Thermal simulators: licence fee.
Network traffic — Manual: none; Thermal sensors: high; Thermal cameras: high; Thermal simulators: none.
Temperature prediction — Manual: possible; Thermal sensors: possible; Thermal cameras: possible; Thermal simulators: possible.
Prediction accuracy — Manual: may contain human error; Thermal sensors, cameras, and simulators: depends on prediction methodology used.
Continuous monitoring — Manual: impossible; Thermal sensors: possible; Thermal cameras: possible; Thermal simulators: impossible.
Latency of anomaly detection — Manual: high; Thermal sensors: low; Thermal cameras: low; Thermal simulators: high.
Temperature prediction methodology — Manual: static thermal/power profiles; Thermal sensors, cameras, and simulators: neural network, moving average, correlation, machine learning, Kalman filter, Markov decision process, linear approximation, linear regression.
Data-center-wide application — Manual: practically impossible; Thermal sensors: possible; Thermal cameras: practically impossible; Thermal simulators: practically impossible.

Modern-day servers come with processors whose DTM techniques are triggered as soon as the temperature threshold is violated. The importance of the heat model becomes evident when environmental variables such as air temperature are considered: the hotter the inlet air of a server, the greater the chances of DTM invocation.
5. THERMAL-AWARE SCHEDULING

As discussed in the heat modeling section, the power consumed by the computing mechanism is eventually dissipated as heat. If the computations are scheduled in a thermal-aware manner, there is less heat for the cooling mechanism to remove. The thermal-aware scheduler uses thermal profiles and predictions to place the workload across the data center so as to lower the overall heat.
Thermal-aware scheduling is different from power-aware scheduling: the latter aims to minimize the number of active servers, whereas the former aims to minimize the heat produced by the active servers. A power-aware scheduler can end up concentrating the workload onto a few active servers and turning off the idle ones. The active servers then remain at maximum power consumption because they are utilized to their maximum levels. As a result, a large amount of heat is dissipated in a small area, which may create a hotspot if the cooling and ventilation in that area are insufficient. Such hotspots can cause the cooling mechanism to speed up cooling, and the energy spent on cooling during the life of a hotspot may exceed the electricity saved in computing [Parolini et al. 2012]. This means that a power-aware scheduler may end up increasing the PUE of the data center due to excessive cooling. Further, continuously using the servers at high utilization may lead to hardware failures,
particularly at high inlet temperatures [Rodero et al. 2012]. Hardware failure leads to loss of workload and degraded performance. In cloud data centers, the performance loss results in monetary penalties for violating the service-level agreement (SLA) with the cloud users [Wu et al. 2012]. The increased PUE and the monetary losses increase the TCO over time. The outlet temperature of a server also depends on the inlet air temperature, as per Equation (2) and as demonstrated by Rodero et al. [2012] and Chaudhry et al. [2014]. Thus, hotspots can arise in a region of high inlet temperatures, and a task scheduler needs to be thermal aware to lower the chances of hotspots.
A thermal-aware scheduler may not use fewer active servers than a power-aware scheduler for a similar workload. Depending on the technique, a thermal-aware scheduler minimizes the total heat generated by the servers, thus lowering the chances of hotspots and the cooling load. Idle servers can still be turned off to save more power. A thermal-aware scheduler can therefore lower not only the PUE but also the TCO of the data center by ensuring infrastructure reliability and energy savings. The total energy consumption under thermal-aware scheduling is calculated by noting the energy consumed in computing, with the cooling energy calculated with reference to the COP and the computing energy [Moore et al. 2005; Tang et al. 2007; Mukherjee et al. 2009]. The basic relation is shown in Equation (12):
P_total = P_IT + P_cooling = P_IT (1 + 1/COP(T_supplied)),        (12)

where P_total is the total energy consumption of the data center, P_IT is the energy consumed by the IT infrastructure, and T_supplied is the supply air temperature set on the CRAC unit.
Equation (12) can be applied to both power-aware and thermal-aware schedulers. Thermal-aware schedulers, however, try to calculate the cooling cost on the basis of the actual inlet temperature T_inlet received by each server [Moore et al. 2005]. Thus, it is possible that a power-aware scheduler would incur a higher cooling cost than a thermal-aware scheduler. Thermal-aware workload scheduling can result in an ambient temperature up to 5°C colder [Moore et al. 2005], which translates into millions of dollars of savings annually for a large-sized data center [Moore et al. 2005].
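As a small worked example of Equation (12), the sketch below computes the total power for a hypothetical IT load at two CRAC supply temperatures, using the quadratic COP curve that Moore et al. [2005] report for a water-chilled CRAC unit; the load figures and the reuse of that COP curve here are illustrative assumptions.

# Worked example of Equation (12): P_total = P_IT * (1 + 1/COP(T_supplied)).
def cop(t_supplied_c):
    # Quadratic COP curve reported by Moore et al. [2005] for a water-chilled CRAC.
    return 0.0068 * t_supplied_c ** 2 + 0.0008 * t_supplied_c + 0.458

def total_power(p_it_kw, t_supplied_c):
    return p_it_kw * (1.0 + 1.0 / cop(t_supplied_c))

p_it = 500.0  # hypothetical IT load in kW
for t in (15.0, 25.0):
    print(f"T_supplied = {t} C -> COP = {cop(t):.2f}, "
          f"P_total = {total_power(p_it, t):.0f} kW")
# A higher supply temperature gives a higher COP and hence a lower total power.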
5.1. Reactive and Proactive Scheduling

Data center scheduling can be regarded as proactive if the computational workload is scheduled in a way that avoids thermal anomalies. A reactive approach takes steps after an anomaly has occurred. A proactive approach requires planning that includes prediction and estimation of temperature and thermal profiling, as discussed in Section 4. Several techniques can help in predicting the thermal scenario in the near future, such as neural networks [Jonas et al. 2007], machine learning [Marwah et al. 2010], or methods as simple as a moving average [Marwah et al. 2010]. The efficiency of the proactive approach depends on the prediction methods, which have their limitations [Marwah et al. 2010], such as the chance of false positives in predicting thermal threshold violations, increased complexity, and a long prediction time.
The reactive approach requires neither thermal prediction nor the maintenance of online thermal profiles. Workload is scheduled on the basis of the current thermal statistics at or around the execution location, and no planning or estimation of the postscheduling temperature is necessary. Remedial DTM measures are invoked only after a thermal anomaly occurs, for example, when the inlet temperature of a server exceeds the maximum threshold after the workload has been placed on it. A reactive approach
can thus lead to disasters such as equipment failure due to overheating. On the other hand, a reactive scheduling approach is faster than a proactive one.
A proactive approach requires a training phase or a learning phase and is therefore
slow to implement. A reactive approach can perform better than proactive scheduling where the jobs show unstable behavior with respect to resource utilization. Both
types of scheduling may perform the same when the jobs seldom deviate from a static
behavior. In this latter case, the reactive approach will manage to stabilize the data
center after a few steps depending on the static utilization of resources by the jobs
being executed [Lee et al. 2010].
Reactive scheduling is simple and requires less time to make scheduling decisions but does not offer as much cost and energy savings as proactive scheduling, while a proactive approach is time consuming and complex to implement; there is therefore a tradeoff in deciding the type of scheduling. Thermal-aware scheduling algorithms can be classified into levels. A basic thermal-aware algorithm only considers the current thermal state of a server [Tang et al. 2006], such as the inlet cold air temperature. Scheduling decisions made at this basic level can be very fast but are the least energy efficient, as explained later in this section. The next stage is the consideration of heat recirculation [Qinghui et al. 2008; Tang et al. 2007], which saves more energy. The last level of thermal-aware scheduling considered in this article consists of approaches that are more proactive and combine prediction methods with the basic thermal-aware and recirculation-aware scheduling.
The thermal-aware scheduling involving CRAC thermostat settings in Banerjee et al. [2010], Bash and Forman [2007a], Bash et al. [2006], Mukherjee et al. [2007], Parolini et al. [2012], and Yuan et al. [2010] is not covered in this article, nor is the scheduling decision problem that considers an optimized CRAC cooling control setting [Banerjee et al. 2011; Mukherjee et al. 2009]. These schedulers require extra knowledge of mechanical and electrical engineering to implement CRAC control. Thermal-aware scheduling can also be classified on the basis of the thermal parameters considered. This classification starts with the basic level, where the gross heat generated is considered the result of power consumption. The next level is where the schedulers consider heat recirculation as a contributor to gross heat. The techniques of these two classes are more reactive and make little use of extensive prediction of server temperature. The last level of classification is the optimized level, in which the schedulers are more proactive and use thermal prediction methods such as neural networks.
5.2. QoS Assurance Considerations for Thermal-Aware Scheduling

When it comes to energy savings through thermal-aware scheduling, performance decline is one of the concerns, because thermal-aware scheduling relies on DTM techniques to control the temperature and to lower the burden on the cooling mechanism. This introduces QoS-related challenges, such as maintaining the SLA [Chong et al. 2014] and the tradeoff between the penalties paid for SLA violations and the cost savings achieved through thermal-aware scheduling.
The data center computing infrastructure is by default utilized to ensure maximum throughput [Parolini et al. 2012], and the servers are used without any consideration of thermal efficiency. The resulting peak temperatures, heat recirculation, and hotspots are compensated for by keeping the cooling set-point temperatures at a minimum [Parolini et al. 2012]. This unnecessary overcooling cannot be avoided, despite server and chassis consolidation [Pakbaznia et al. 2010], unless a thermal-aware approach is followed.
Considering that data center servers are mostly idle or utilized at no more than 50% [Fan et al. 2007; Barroso and Holzle 2007], there is plenty of room to apply thermal-aware scheduling and increase the energy savings. The energy savings
through thermal-aware scheduling are not earned at the cost of SLA violations in an underutilized data center. It may even be possible to pass the energy savings and reduced operating costs on to the service users in the form of reduced rental rates.
Cloud service providers [Wu et al. 2012; Lavanya and Ramachandran 2013], while ensuring performance and QoS, also want to maximize profits and to fulfill every user request that is profitable [Wu et al. 2012]. Assuming a data center provides on-demand computing on the basis of a thermal-aware scheduler, the admission control for incoming VM requests can be based on the thermal parameters of the data center. Consider the stochastic model for admission control [Bruneo 2014], which defines a set of tokens, a token bucket for each predefined state, and a token generation rate; a predefined set of alternative transitions toward other states is taken when the token bucket of a state is depleted. The tokens can be based on thermal parameters, such as the overall thermal gradient, for admission control. Similarly, for each rack, the outlet temperature, heat recirculation, and inlet temperature can be used to allocate tokens and to decide the token regeneration rate so as to ensure the QoS level.
At the microprocessor level, a model similar to this stochastic model, proposed by Tolosana-Calasanz et al. [2012], can be considered. Their token-bucket-based workload engine allows parallel execution of different workloads according to the QoS assurance of each workload. The token bucket filling rate may be lower than, equal to, or greater than the token consumption rate of the process, and the SLA decides the volume and token generation rate of each resource bucket. For thermal-aware scheduling at the microprocessor level, the scenario of a cloud data center providing IaaS can be considered: for each VM, a token bucket for its temperature contribution can be added to the workload engine, so that each VM is allowed to generate only a certain amount of heat over a time interval. DVFS can then be applied to keep each VM within its thermal bucket limits.
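A minimal sketch of such a per-VM thermal token bucket is shown below; the bucket sizes, refill rates, and the heat-estimate inputs are hypothetical illustrations of the idea rather than the engine of Tolosana-Calasanz et al. [2012].

# Illustrative per-VM thermal token bucket: tokens represent allowed heat (joules).
class ThermalBucket:
    def __init__(self, capacity_j, refill_j_per_s):
        self.capacity = capacity_j
        self.refill = refill_j_per_s
        self.tokens = capacity_j

    def tick(self, dt_s):
        """Refill the bucket for an elapsed interval, up to its capacity."""
        self.tokens = min(self.capacity, self.tokens + self.refill * dt_s)

    def spend(self, heat_j):
        """Try to account for dissipated heat; return False if the budget is empty."""
        if heat_j > self.tokens:
            return False
        self.tokens -= heat_j
        return True

def schedule_step(vms, dt_s=1.0):
    """vms: dict vm_id -> (bucket, estimated heat for this interval in joules)."""
    for vm_id, (bucket, heat_estimate) in vms.items():
        bucket.tick(dt_s)
        if not bucket.spend(heat_estimate):
            # Budget exhausted: throttle this VM, e.g., via DVFS or vCPU capping.
            print(f"{vm_id}: thermal budget exhausted, throttling")

vms = {"vm1": (ThermalBucket(200.0, 40.0), 30.0),
       "vm2": (ThermalBucket(100.0, 10.0), 60.0)}
for _ in range(3):
    schedule_step(vms)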
There may be challenges in applying thermal-aware scheduling together with QoS, including the coordination between server-level and microprocessor-level workload schedulers. The admission control implementations at the server and microprocessor levels may contradict each other if they are based on different thermal parameters, such as heat contribution versus inlet temperature. Therefore, the implementation of QoS for thermal-aware scheduling in data centers is regarded as future work.
QoS and thermal-aware scheduling may turn out to be conflicting goals for HPC workloads [Rodero et al. 2012], owing to task deadlines, task priorities, and the availability of server resources. The QoS assurance of a thermal-aware scheduler also depends on its reactive or proactive nature. A reactive scheduler may suffer poor QoS due to frequent thermal anomalies. Even if a DTM technique such as workload migration is applied to resolve a thermal anomaly, the destination server has to be chosen carefully to avoid a further fall in QoS; thus, a reactive scheduler has to use a proactive approach when DTM is invoked. Other DTM techniques such as DVFS and pinning likewise require an analysis of the post-DTM application scenario. A proactive scheduler, on the other hand, has to consider QoS before choosing a server for workload placement. Treating QoS as roughly equivalent to the resource availability on a server, a proactive thermal-aware scheduler will automatically ensure QoS. There is an exception, however, for the deployment of multi-VM applications such as Software as a Service (SaaS): if a batch of VMs is to be deployed on a single server and that server does not have enough resources to ensure the QoS (as determined by the SLA), the batch will have to be scheduled on a second server. This scenario leads to underutilization of the servers [Rodero et al. 2010] and thus increases the PUE.
5.3. Basic Thermal-Aware Scheduling

The basic level of thermal-aware scheduling consists of algorithms that treat the generated heat as the result of the power consumed when the computational workload is executed. The heat is measured from temperature sensors located on the chip, on the board, or at the air inlet/outlet, or through thermal monitoring and profiling mechanisms that provide the thermal statistics of the air. If the power consumption of a computational task is known, then the outlet temperature of a server can be predicted by analyzing the active computational load; this analysis can also be performed at the time of scheduling.
5.3.1. Server-Based Basic Thermal-Aware Scheduling. Moore et al. [2005] proposed the proactive One-Pass-Analog algorithm, in which the power versus outlet temperature relation is first calculated for each server using a test-run workload. This static thermal profile is used only as a reference for power budget allocation and is not updated over time. The algorithm then allocates a power budget to each server according to its current outlet temperature and reduces that budget to keep the outlet temperature of each server within a uniform range around a reference temperature; the thermal profile is consulted to find the optimal power. This algorithm saves cooling power by keeping a uniform temperature across the data center. It is an early work based on the consideration that power is directly proportional to the outlet temperature. However, a server may end up with only enough power budget to run idle, so that no computational activity can be performed on it.
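The following Python sketch illustrates the spirit of such outlet-temperature-driven power budgeting: servers whose outlets run hotter than the reference get their budgets trimmed toward it using a profile-derived slope. All constants, the linear profile, and the function names are illustrative assumptions, not the published algorithm.

# Illustrative outlet-temperature-driven power budget allocation (not the published algorithm).
def allocate_budgets(servers, t_ref_c, k_c_per_watt=0.02, p_idle_w=100.0, p_max_w=300.0):
    """servers: dict server_id -> current outlet temperature (deg C).
    k_c_per_watt: profile-derived slope (outlet temperature rise per watt), assumed linear.
    Returns dict server_id -> power budget in watts."""
    budgets = {}
    for sid, t_out in servers.items():
        # Trim (or raise) the budget in proportion to the deviation from the reference.
        adjustment = (t_ref_c - t_out) / k_c_per_watt
        budget = p_max_w + adjustment
        budgets[sid] = max(p_idle_w, min(p_max_w, budget))
    return budgets

servers = {"s1": 36.0, "s2": 40.0, "s3": 44.0}   # current outlet temperatures
print(allocate_budgets(servers, t_ref_c=40.0))
# Hotter-than-reference servers (s3) get a reduced budget; cooler ones (s1) keep the maximum.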
Moore et al. [2005] also proposed the Coolest Inlet algorithm, which prefers to allocate the power budget, and thus the workload, to the nodes with the coolest inlet temperatures, without any objective of minimizing the nonuniformity of outlet temperatures across the data center. When heat recirculation is certain to occur, the coolest-inlet preference makes this a purely reactive algorithm that may choose a new server in every round. The coolest-inlet preference is desirable according to Equation (2), as it results in comparatively lower outlet temperatures among a set of homogeneous servers. Compared to proactive algorithms, reactive algorithms do not require the profiling of servers and are therefore simple to implement, but they may result in an unstable distribution of heat across the data center.
The coolest-inlet preference, however, will always choose the few specific servers with the lowest inlet temperatures, whereas the Uniform-Workload algorithm of Moore et al. [2005] places an equal power budget on all servers in the data center so that they all have a uniform outlet air temperature. Servers execute computational tasks according to their power budgets. A comparison of cooling costs evaluated through a simulator showed that the proactive One-Pass-Analog algorithm performed well at both low and high utilization, since it keeps adjusting the power budget according to the outlet temperatures. Comparing the uniform task allocation algorithm with the basic thermal-aware algorithms of Moore et al. [2005] shows no considerable lowering of cooling costs up to 50% of total data center utilization, but at medium to high utilization (60%-80%), uniform task allocation is the most expensive in terms of cooling costs. Therefore, the thermal-aware scheduling algorithms achieve a lower PUE than non-thermal-aware ones, but only at high data center utilization.
Often, researchers try to keep the outlet temperatures of the servers at a homogeneous level. Tang et al. [2006] proposed basic thermal-aware algorithms such as the uniform outlet profile (UOP), a proactive algorithm that follows the thermodynamics model. Its objective of keeping a uniform outlet temperature is almost the same as that of One-Pass-Analog by Moore et al. [2005]. The workload is divided into smaller tasks. With the power consumption required to execute each subtask
known, the total power consumption is calculated for all subtasks. Then, following the thermodynamics model, each server is allocated as many subtasks, with reference to its inlet air temperature, as keep the outlet temperatures of all servers within a certain value. This is similar to One-Pass-Analog [Moore et al. 2005], with the difference that One-Pass-Analog is a power budget allocation algorithm rather than a task allocation algorithm, whereas UOP assigns the workload to servers so that their outlet hot air temperatures remain uniform. Unlike One-Pass-Analog, UOP does consider the idle power consumed by chassis and blade servers. However, implementing UOP requires knowledge of the discrete levels of power consumed by the servers when a task is executed, and it may not always be possible to divide a task into equal subtasks.
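A greedy approximation of this idea is sketched below: subtasks are assigned one at a time to the server whose predicted outlet temperature is currently lowest, which tends to even out the outlet temperatures. The linear outlet-temperature model and all constants are illustrative assumptions rather than the formulation of Tang et al. [2006].

# Illustrative greedy allocation that evens out predicted outlet temperatures.
import heapq

def allocate_subtasks(inlet_c, n_subtasks, watts_per_subtask=25.0, k_c_per_watt=0.03,
                      p_idle_w=80.0):
    """inlet_c: dict server_id -> inlet air temperature (deg C).
    Outlet model (assumed linear): t_out = t_in + k * (p_idle + assigned_watts)."""
    heap = []
    for sid, t_in in inlet_c.items():
        t_out = t_in + k_c_per_watt * p_idle_w
        heapq.heappush(heap, (t_out, sid, 0))          # (predicted outlet, id, count)
    for _ in range(n_subtasks):
        t_out, sid, count = heapq.heappop(heap)        # coolest predicted outlet first
        count += 1
        t_out += k_c_per_watt * watts_per_subtask      # outlet rises with each subtask
        heapq.heappush(heap, (t_out, sid, count))
    return {sid: (count, round(t_out, 2)) for t_out, sid, count in heap}

print(allocate_subtasks({"s1": 20.0, "s2": 22.0, "s3": 25.0}, n_subtasks=12))
# Cooler-inlet servers receive more subtasks; predicted outlets end up nearly uniform.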
The second algorithm by Tang et al. [2006] is Minimal Computing Energy (MCE), which turns on only as many chassis as are required to complete a task. This algorithm allocates more load to the nodes with lower inlet air temperatures (similar to the Coolest Inlet algorithm [Moore et al. 2005]). Using a few servers at full utilization may lead to hotspots, which in turn lead to excessive cooling and an increased PUE. The third algorithm is Uniform Task (UT), which allocates an equal load to each node.
The idle power consumed by servers and chassis is taken into account when testing algorithm performance. Tang et al. [2006] call an algorithm optimal if it turns off idle servers and idle chassis, and nonoptimal otherwise. Blade servers with analog and discrete power consumption are both considered: either the power consumption is taken in real time according to the current microprocessor utilization rate, or it is taken to be the nearest discrete level according to the hardware specifications. The algorithms were tested in four groupings: discrete nonoptimal, discrete optimal, analog nonoptimal, and analog optimal. Simulation results show that the
MCE algorithm results in a minimal total computing energy cost because it allocates
tasks to a minimum number of servers. UOP performs better than UT at low data
center utilization rates, whereas UT outperforms UOP at high utilization rates.
The algorithms obviously save more power in their optimal variants by turning off idle chassis and servers. However, the criteria for shutting down a chassis are not given by Tang et al. [2006], whereas Moore et al. [2005] found it more energy saving to shut down the servers that cause the most heat recirculation, terming this the digital variant. Tang et al. [2006] report that the optimal algorithms result in delayed startup and delayed scheduling of new tasks. It is also notable that MCE prefers to use few chassis and allocates tasks on the basis of the lowest inlet temperatures. Continuously using a few chassis may lead to hardware failure [Moore et al. 2005; Tang et al. 2006], and thus the total TCO may increase instead of the expected decline in PUE from having only a few active chassis and servers. Additionally, the PUE of all the algorithms remained above 2.0 [Tang et al. 2006].
A proactive approach based on task-based thermal profiling was introduced in Lizhe et al. [2009] and Wang et al. [2012]. The jobs were scheduled simply by placing the hottest job on the coolest server and by hot-job-before-cold-job ordering, following microprocessor-based task scheduling principles [Jun et al. 2008]. The RC model was used to find the current ambient temperature of a server, and the thermal profile of each job was used to predict the postjob temperature of the node. This scheduling can proactively avoid the maximum temperature threshold and reduce cooling costs. Jobs are held until a node that is suitable with respect to the postjob temperature becomes available, which has the disadvantage of longer response times due to job holding. The task-temperature-profile-based approach requires homogeneous servers, because a task may generate different amounts of heat on heterogeneous servers. The authors did not provide an outlet temperature analysis for COP and cooling costs. However, considering that the
task-temperature-profile-based approach aims to minimize the increase in outlet temperature, much like the uniform outlet temperature approaches of Moore et al. [2005] and Tang et al. [2006], but without shutting down the idle servers, it offers fewer benefits in cooling costs and PUE than One-Pass-Analog and MCE.
Instead of keeping the workload statically assigned to a node, there should be an option to migrate workloads from one node to another. VM migration is used as a DTM technique to shift the workload, for example, in case of thermal anomalies. Vikas [2012] proposed a thermal-aware green cloud scheduler with a centralized scheduler that contains a global queue for new VM requests and a node information queue holding the thermal information about each node, such as its optimal working temperature and the thermal threshold for processor temperature. Requests for
new VMs are queued in a waiting list. The servers are sorted in increasing order of
the processor temperatures in the preference queue for mounting VMs. The nodes with
core temperature reaching the vendor-specified critical limit are put in a critical queue
in decreasing order of temperatures. If the critical queue is not empty, then some VMs
from the nodes in the critical queue are migrated, starting from the top node. One VM
is migrated to a suitable node at a time till the core temperature of the top node in the
critical queue is lowered and reaches the normal range. If the critical queue is empty,
then new VMs are created according to requested specifications in the waiting queue. A
suitable server is chosen by sorting the server list in increasing order of temperatures.
The top server is chosen from this list.
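The queue-based logic of this scheduler can be sketched as follows; the node fields, temperature limits, and the migrate/create_vm helpers are hypothetical stand-ins for the components described by Vikas [2012], not an implementation of that work.

```python
# Sketch of the centralized thermal-aware green cloud scheduler described above.
# Node fields, temperature limits, and the migrate/create_vm helpers are assumed.

def schedule_round(nodes, waiting_vms, migrate, create_vm):
    """One round: relieve nodes in the critical queue first, then place new VMs."""
    critical = sorted((n for n in nodes if n["core_temp"] >= n["critical_temp"]),
                      key=lambda n: n["core_temp"], reverse=True)   # hottest first
    preference = sorted((n for n in nodes if n["core_temp"] < n["critical_temp"]),
                        key=lambda n: n["core_temp"])               # coolest first
    if critical:
        hot = critical[0]               # start from the top of the critical queue
        # Migrate one VM at a time until the node returns to its normal range;
        # migrate() is assumed to update the source node's core temperature.
        while hot["vms"] and preference and hot["core_temp"] > hot["normal_temp"]:
            migrate(hot["vms"].pop(), hot, preference[0])
    else:
        # No critical nodes: create requested VMs on the coolest servers first.
        while waiting_vms and preference:
            create_vm(waiting_vms.pop(0), preference[0])
            preference.sort(key=lambda n: n["core_temp"])
```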
Processor pinning is the basic level of proactive scheduling by which the VMs are
pinned to physical processors permanently. Rodero et al. [2010] compared DVFS with
pinning and VM migration. Among all of these reactive techniques, VM migration is
the best for lowering the temperature of a host machine. However, the new host should
be chosen such that its maximum temperature threshold is not violated after it
receives the VM. Processor pinning can save more energy when more than one virtual
processor is pinned to one physical processor [Rodero et al. 2010]. This technique will
lower the throughput of each VM. The pinning technique is closer to microprocessor-level scheduling, but it is not clear whether a microprocessor with cores of, for example,
1.8GHz can accommodate two vCPUs that require the computing power of 1.8GHz
each. Perhaps the VM migration is a more appropriate and practical approach to ensure
QoS.
5.3.2. Basic Thermal-Aware Scheduling for Microprocessors. Researchers have proposed
thermal management through cooperation between hardware and software [Kumar
et al. 2006; Naveh et al. 2006]. Tasks running on a microprocessor are indexed as to
how hot the tasks are. This is calculated through regression analysis of the recent
past and prediction of the possibility of a hotspot, and clock gating is used reactively
to prevent the hot tasks from exceeding a maximum thermal threshold. Merkel et al.
[2005] proposed a scheduling technique that creates a power-consumption-based profile for each task in the run queue of microprocessor cores and tries preemptively to
balance the load by shifting tasks in run queues. A running task that is likely to exceed
the thermal threshold is preempted and migrated to another core. This technique increases the throughput of a processor by minimizing the chances of thermal anomalies.
Choi et al. [2007] proposed a proactive technique to execute jobs in a balanced manner
among all cores. Deferring a hot job and executing a cool job or halting the processor
temporarily and then continuing executing the hot job can keep the temperature lower
with less overhead. If these techniques are applied to a scenario of tasks linked to a
VM, and there are few and a fixed number of cores on each server, the throughput of
the VM can decrease or the cores will be underutilized.
Arani [2007] proposed reactively assigning more tasks to cooler cores than to hotter
cores and lowering the clock frequency of the hot cores, which are left with shorter task
queues. Lowering the clock frequency of the hot cores ensures that all the cores finish their tasks in
a time-balanced manner. Jun et al. [2008] showed that, when choosing between consecutive
hot and cool jobs, running the hot job before the cool job results in an overall lower
temperature of the microprocessor than running the cool job first,
provided that the hot job will not
violate the thermal threshold when executed. However, in the case when no cool job is
available, the microprocessor will have to run one hot job after another. This may lead
to hotspot formation unless clock gating is used to halt the processor to cool down for
short intervals.
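The hot-before-cold rule, combined with clock gating as a fallback, can be captured in a few lines; the predicted_temp helper and the threshold value below are assumptions for this sketch.

```python
# Decision rule sketched from the description above: run the hot job first when
# it is predicted to stay under the thermal threshold, otherwise run a cool job,
# and halt (clock gate) briefly when only unsafe hot jobs remain.

def pick_next(core_temp, hot_job, cool_job, predicted_temp, threshold=85.0):
    if hot_job is not None and predicted_temp(core_temp, hot_job) <= threshold:
        return hot_job      # hot-before-cold keeps the overall temperature lower
    if cool_job is not None:
        return cool_job     # no safe hot job: run a cool job instead
    return "gate"           # only unsafe hot jobs left: cool down for a short interval
```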
The global stop-and-go may not be the best choice when performance is considered.
Various classes of DTM such as processor halting (stop-and-go), DVFS, and thread migration in multicore chips were tested in Donald and Martonosi [2006]. The application
of these techniques was performed on the basis of per core (distributed) and uniform
to all cores (global). The results showed that global stop-and-go is the worst policy that
lowers the performance of the processor. The processor performance was expressed in
terms of billions of instructions per second (BIPS), duty cycle (the ratio of actual work
done to the possible work that could be done), and throughput. The results showed
that the best performance is given by the distributed DVFS policy. Two types of thread
migration policies were tested; the first is on the basis of performance counters of various hardware components of each core, and the second is on the basis of temperature
readings provided by on-chip thermal sensors. In both types of migration policies, the
workload is shifted from a hot core to a cold core. The results confirmed that the best
performance is given by a distributed DVFS DTM policy that uses on-chip thermal
sensor data for thread migration.
Mulas et al. [2008] proposed to consider the thermal and performance state of the
destination core for thread migration. According to Mulas et al. [2008], tasks are migrated between two cores by considering that the destination core is colder and is
operating at a lower frequency than the source core. Also, a task is migrated only if the total power required to migrate it is less than that of avoiding migration, which results in an
overall more balanced temperature across the chip. The migration should be invoked
when the temperature of a core reaches a threshold above-average chip temperature.
The migration policy shows better performance than the halt-and-go policy. However,
the temperature of the source core may increase when migrating task data. Thus, it is
thermally expensive to migrate large-sized tasks.
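The migration criterion above amounts to three checks, sketched below; the field names, the temperature margin, and the power-estimate helpers are assumptions for illustration, not the exact conditions of Mulas et al. [2008].

```python
# Migrate a task only when (1) the source core is above the chip-average
# temperature by some margin, (2) the destination core is colder and runs at a
# lower frequency, and (3) migrating (including moving the task data) costs
# less power than not migrating. Helpers and the margin value are assumed.

def should_migrate(src, dst, chip_avg_temp, task,
                   migration_power, stay_power, margin=3.0):
    hot_enough = src["temp"] >= chip_avg_temp + margin
    colder_and_slower = dst["temp"] < src["temp"] and dst["freq"] < src["freq"]
    worthwhile = migration_power(task, src, dst) < stay_power(task, src)
    return hot_enough and colder_and_slower and worthwhile
```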
Sun et al. [2007] tried to balance the use of power across all cores. The cores in the
3D stack that are closer to the heat-sink can use higher power with a smaller
thermal impact than the cores that are farther from the heat-sink. The thermal impact
of task assignment to each core was calculated by dividing average power consumption
of the core by the average thermal resistance. Three algorithms were proposed. The
basic algorithm tries to allocate tasks to cores such that each allocation causes minimal
thermal impact. Later, the voltage of the core is scaled to an optimum level such that
the task is completed before the deadline. The second algorithm iteratively reduces
hotspots by increasing the slack time of the hot task through core voltage scaling. This
algorithm also adjusts the slack times of sequential tasks queued in the neighborhood
of the hot task so that overall, the tasks finish within their deadlines. In the third
algorithm, the previous two algorithms of spatial and temporal management are combined to create the most energy-saving algorithm. Considering the QoS assurance, if
all the tasks are characterized to be hot, then there might be no other solution but to
use DVFS and/or clock gating, which will reduce the performance. The migration of
a hot task to another core might not do any good because the destination core might
get as hot as the source core is. The random use of both clock gating and DVFS might
minimize the chances of hotspots when there are multiple hot tasks in a sequence.
5.4. Heat-Recirculation-Aware Scheduling

Servers placed close to the floor are major contributors to heat recirculation because
the hot air from their outlets passes through various servers when rising. This is more
likely to happen inside racks when hot air is not ventilated efficiently [Khankari 2009].
The servers at the top should therefore be utilized more because they are closer to the
ceiling, from which the CRAC unit can effectively remove the hot air. This is achieved
with the help of thermal awareness based on the coefficient of heat recirculation for
each server location. The algorithms that do not consider heat recirculation and place
the workload on the basis of how efficiently a server is cooled place more workload on
servers closer to the floor. This will increase the chances of heat recirculation.
Whether the workload is preferentially assigned to rack-top servers or to lower servers, there
is always a chance of underutilization of the remaining computing resources. Therefore, a tradeoff has to be made when deciding on the type of algorithm to be used. If the data center
is being partially utilized, then there is a possibility that any of the thermal-aware
algorithms will perform well. But in case of a fully utilized data center, there will be
no alternative but to equally distribute workload. In this case a backfilling algorithm
(that opportunistically schedules jobs during the free time slots of the servers) can be
used with thermal consideration [Lizhe et al. 2009]. A microprocessor can take advantage of slack time without any impact on performance [Venkatachalam and Franz
2005]. Parolini et al. [2012] proposed an equipment relocation algorithm to minimize
heat recirculation. Heat recirculation is also considered in workload scheduling, and a
coordinated system of the thermal network and the computing network has been proposed for energy optimization.
5.4.1. Heat-Recirculation-Aware Scheduling for Servers. For the case in which a workload is to be placed on
a server whose limited or reduced power budget cannot accommodate that
workload, Moore et al. [2005] proposed the Zone-Based Discretization (ZBD) algorithm.
According to ZBD, a server is allowed to borrow power budget from the horizontal and
vertical neighbors. Doing so will decrease the power budget of the neighbors by the
amount lent to the borrowing server. This will result in the borrowing server dissipating
more heat and the neighbors dissipating less heat according to the power budget. ZBD
allows a server to borrow more power from vertical than horizontal neighbors because
the heat rises vertically. The overall power consumption in a zone remains the same
as that of One-Pass-Analog except that the servers in ZBD can take the load exceeding
their power budgets if their neighbors have some spare power. This scheme is more
flexible than One-Pass-Analog because a server can borrow power that a neighboring server
leaves unused while running idle. The power is borrowed to fully utilize a server. However, it is not
defined whether any alternative servers are considered to put the workload on or not.
We assume that a server from the group of alternate servers is chosen, which will lower
the overall heat recirculation. In the example given in the article, only one server was
chosen from a group of 15 servers. This means that only one server might be active
from a rack. This is not desirable for a commercial data center because it might result
in a lower PUE but overall revenue is decreased due to low performance.
Moore et al. [2005] calculated the heat recirculation by turning on the servers in
groups called pods one after another and noting the heat generated and recirculated
after each pod was turned on when all the servers inside the rack reached maximum
utilization. A test-run load was used to calculate the heat recirculation factor (HRF)
for each pod, which is the ratio of the heat generated by the pod (minus the heat generated in the idle
data center) to the heat recirculated by the pod (minus the heat recirculated in the idle data
center. The HRFs of all the racks were added to have a global HRF sum. The power
budget allocated to each rack is according to the percentile share in the global HRF
sum. As a consequence, the workload allocation for each rack is according to its HRF. This is
called the MinHR (minimum heat recirculation) algorithm. There are three variations
of MinHR [Moore et al. 2005]. The basic algorithm that uses One-Pass-Analog is the
Analog-Min-HR algorithm, in which the servers are sorted according to their power budgets before allocating workload. When the servers are arranged in ascending order of
HRF, it is called the Digital-Max-HR algorithm, where the workload is placed starting from the bottom of the list. Thus, Digital-Max-HR prefers to allocate workload on
servers that are the biggest contributors to heat recirculation. When the workload is
assigned to servers by arranging in descending order of HRF, then it is the Digital-Min-HR algorithm. This algorithm prefers to allocate workload on servers that contribute
the least in heat recirculation. These algorithms are based on the phenomenon of
power budget distribution among servers on the basis of minimizing resulting heat
recirculation at that power usage. Digital-Min-HR has the best energy savings and
the least heat recirculation because it always avoids the servers that are the major
contributors to overall heat recirculation in the data center. However, the calculation
of HRF is more like a permanent grading of the servers, whereas in the real world,
the load on each server may rise and fall randomly. This means that the value of
HRF will change over time, and this can lead to unexpected results. As compared to
non-heat-recirculation-aware algorithms by Moore et al. [2005] presented in a previous section, Digital-Min-HR provides the lowest cooling cost and hence the lowest
PUE.
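The HRF-based budgeting just described can be sketched as follows; the variable names and the example numbers are ours, not those of Moore et al. [2005].

```python
# Sketch of HRF-based power budgeting as described above. q_gen/q_rec are the
# heat generated and heat recirculated by a pod under the test-run load, and
# the *_idle values are the corresponding measurements for the idle data
# center. Variable names and example values are assumed.

def hrf(q_gen, q_gen_idle, q_rec, q_rec_idle):
    """Heat recirculation factor of one pod."""
    return (q_gen - q_gen_idle) / (q_rec - q_rec_idle)

def power_budgets(pods, total_power):
    """Allocate the total power budget in proportion to each pod's HRF share."""
    factors = {name: hrf(*vals) for name, vals in pods.items()}
    total = sum(factors.values())
    return {name: total_power * f / total for name, f in factors.items()}

# Example: the second pod recirculates relatively more heat and therefore
# receives a smaller share of the power budget.
pods = {"pod_top": (12.0, 2.0, 1.0, 0.5), "pod_bottom": (12.0, 2.0, 3.0, 0.5)}
print(power_budgets(pods, total_power=100.0))
```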
Mukherjee et al. [2007] extended Moab [Adaptive Computing
2015a] by integrating thermal awareness. The scheduler takes input from a thermal-sensor-based monitoring module to prioritize nodes on the basis of low inlet air temperature. TORQUE [Adaptive Computing 2015b] was used as the cluster resource manager.
Experiments were performed with thermal-aware reactive and proactive scheduling.
The reactive algorithms do not consider resulting heat recirculation and schedule the
jobs on the basis of the current thermal map of the data center, whereas the proactive
algorithm considers the resulting heat recirculation after the scheduling step. Experimental results showed that the reactive algorithm that uses the minimum number of
powered-ON servers and the proactive algorithm that minimizes the maximum inlet
temperature perform almost equally in reducing the cooling cost. The cooling cost becomes significant when the data center is utilized above 60%. Below this, the best reactive and proactive algorithms perform equally. The PUE of the algorithms presented by
Tang et al. [2006] is higher than the heat-recirculation-aware algorithms of Mukherjee
et al. [2007] despite the implementation of similar algorithms. This may be because
Tang et al. [2006] relied on a CFD simulator and Mukherjee et al. [2007] used the
real-world integration of the thermal monitoring system. But overall, the real-world
implementation results agree with the simulated results, and hence it can be concluded that the heat-recirculation-aware scheduling can make the data center cooler
than the basic thermal-aware algorithms, and hence are more efficient for reduced
TCO.
It is notable that all the thermal-aware scheduling techniques based on minimizing
heat recirculation cannot be applied to a scenario without heat recirculation. Similarly,
considering that a chassis will remain powered on and be consuming the base power
for as long as there is a single active server in that chassis, this is not power efficient
and may have the least effect on improving the PUE since the related papers calculate
the total electric energy consumption with reference to computing energy. The base
energy consumed by the chassis is included in computing energy, and a single chassis
consumes as much power as a set of fully utilized blade servers [Tang et al. 2006].
5.4.2. Heat-Recirculation-Aware Scheduling for Microprocessors. As mentioned in Section
3.4, the top cores in a 3D architecture are worst hit by conduction and radiation when
the processor chips are stacked one over another. Changyun et al. [2008] proposed that
the processor cores in 3D stacks gain heat through conduction from the neighboring
cores. The amount of heat conduction contribution by any core depends on the power
used by that core at any instant. Changyun et al. [2008] allocated a thermal grade
to each core of a 3D processor according to the core's location and its current load. DVFS
was applied on a per-core basis and not globally to avoid thermal hotspots. We included
this in Section 4.1 as well. This technique improves the throughput and avoids more
hotspots than a simple distributed DVFS technique.
5.5. Optimized Thermal-Aware Scheduling

Thermal-aware scheduling algorithms become more complex as they are optimized for saving
electricity. The optimized techniques also use prediction and estimation to schedule
workload proactively. These algorithms use a blend of the basic thermal-aware and
heat-recirculation-aware techniques.
5.5.1. Server-Based Optimized Thermal-Aware Scheduling. Tang et al. [2007] suggested a
genetic algorithm (XInt) for proactive scheduling with the fitness function to minimize
the peak inlet temperature of the servers due to heat recirculation. According to the
proposed approach, the heat recirculation can be minimized if tasks are allocated to
servers that take the least part in heat recirculation. XInt keeps the heat recirculation
5°C lower than the MCE of Tang et al. [2006] because MCE allocates more tasks to servers
closer to perforated floor tiles because the inlet air temperature of servers is lowest
near the floor. The nodes near the top of the racks are the least contributors to heat
recirculation because these are closer to the ceiling vent of the data center hall. By allocating more tasks on upper servers in racks, the heat recirculation can be minimized.
It was not specified whether XInt considers the thermal profiles as in Qinghui et al. [2006]
or whether XInt requires CFD simulation to evaluate each of the candidate solutions. Of
all the algorithms compared by Tang et al. [2007], XInt provides the least consumption
of cooling energy. Therefore, it can provide the least PUE value.
As explained in the heat recirculation section, Qinghui et al. [2006] proposed to store
the coefficients of heat recirculation and heat removal for each node in the form of
a matrix and to use this matrix as a thermal profile. Further, as noted by Qinghui
et al. [2006], the servers placed at the top of the racks are the worst victims of heat
recirculation from the servers below.
If less workload is placed on servers placed near the data center floor, then, although
heat recirculation can be minimized, these servers will be underutilized. Suppose the
servers on top of the racks are full to their resource limits and cannot handle any
new jobs due to current job execution. In the case when the lower servers are assigned
computational tasks, there will be an increase in the heat recirculation, and as a result,
the inlet temperatures of servers near the top will rise. By the law of thermodynamics,
this will increase their hot air outlet temperatures and may cause a thermal threshold
violation.
Qinghui et al. [2008] proposed a proactive thermal-aware scheduling technique with
the consideration of minimizing the maximum inlet temperature of servers. The thermodynamics model and the heat recirculation coefficient matrix of Qinghui et al. [2006]
were used to find the optimum workload distribution that results in a minimum increase in cold air inlets. The temperature variations after workload placement through
CFD simulations were noted. The algorithm was implemented as a genetic algorithm.
The performance was compared with MCE and MinHR in terms of cooling costs calculated as in Equation (12). XInt and MinHR allocate more workload to servers near
the top of the racks, as these are the least contributors to data-center-wide heat recirculation. MCE allocates more tasks to the servers near the floor, which have lower
inlet temperatures. MCE, as a result, shows a higher heat recirculation than XInt and
MinHR for obvious reasons. However, XInt also suggests how much to raise the CRAC
unit air temperature. Raising the CRAC thermostat set temperature may save cooling
power, but this is out of the scope of this article. XInt performs better than MinHR
since it uses the genetic algorithm optimization to select the best placement of jobs by
considering more than one solution. As compared to power-budgeting MinHR, which
does not use the heat recirculation matrix, XInt variants use the heat recirculation
matrix and provide the least PUE when cooling costs are compared. The complexity
of implementation of genetic algorithms and the time required to generate a solution
might be large. This limits their applicability in dynamic scenarios such as IaaS,
where the resource utilization of a VM may change at any time.
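A candidate placement can be scored without a full CFD run once the heat-recirculation coefficients are known; the sketch below uses an illustrative linear inlet-temperature model and invented matrix values, not the actual coefficients or fitness function of Tang et al. [2007] or Qinghui et al. [2008].

```python
# Sketch of the fitness evaluation behind recirculation-aware optimizers such
# as XInt: given a candidate power (workload) distribution, estimate the server
# inlet temperatures from the heat-recirculation coefficient matrix and score
# the candidate by its peak inlet temperature (lower is fitter). Illustrative only.

import numpy as np

def peak_inlet_temp(power, D, t_supply):
    """Peak inlet temperature for a power vector under a linear recirculation model."""
    inlet = t_supply + D @ power     # recirculated heat raises each inlet
    return float(inlet.max())

# Three servers: the last row recirculates the least heat (e.g., a rack-top node).
D = np.array([[0.020, 0.010, 0.005],
              [0.010, 0.020, 0.005],
              [0.005, 0.005, 0.002]])
bottom_heavy = np.array([300.0, 300.0, 0.0])   # load placed near the floor
top_heavy    = np.array([0.0, 300.0, 300.0])   # load pushed toward rack tops
print(peak_inlet_temp(bottom_heavy, D, t_supply=18.0))   # higher peak inlet temperature
print(peak_inlet_temp(top_heavy, D, t_supply=18.0))      # lower peak inlet temperature
```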
Mukherjee et al. [2009] optimized the thermal-aware scheduling algorithms related
to time of execution and location of execution. First Come First Serve (FCFS) and Earliest Deadline First (EDF) of each job were used as the scheduling preference for time
of execution. To select the location, the backfilling and First Fit (FF) backfilling with
FCFS were used. Two types of server rankings were used on the basis of the contribution of each server in recirculation (or the victor level of each server) and the effect
of heat recirculation on each server (or the victimization level of each server). Server
location selection for each job execution was done on the basis of lowest victor ranking
to bring Least Recirculation of Heat (LRH). The victor ranking is the same as the HRF
ranking of Moore et al. [2005], with the exception that the victor ranking is not static
and is recomputed for every job-batch deployment. The optimization consisted of the
application of FCFS with backfilling-LRH and EDF with LRH. The FCFS-backfilling
was further optimized with a genetic algorithm that chooses server location to place
the jobs at hand in order to minimize the increase in heat recirculation.
Mukherjee et al. [2009] considered the minimization of heat recirculation as a
minimization of peak inlet air temperatures of servers. Further optimization of the
spatiotemporal algorithms was performed by turning off the idle servers. The best
energy-saving algorithms in their simulation runs were EDF-LRH and FCFS with the genetic
algorithm. The backfilling algorithms consume more power. The cooling power is calculated by
dividing the computing power by the COP. The backfilling algorithms require more cooling
because servers running at full utilization dissipate more heat. All the algorithms perform better with the idle servers turned off. EDF-LRH with idle servers
turned off is the most energy-saving and lightweight algorithm. However, a comparison
of algorithms that do not use the genetic algorithm (GA) for task placement and the
algorithms that use the GA shows that the latter is the most energy efficient. However,
this is subject to the time complexity involved in GA-based algorithms, which limits
their widespread adoption. For example, a 3D matrix-based GA scheduler may take
up to three or more times as long to compute the job placement as a 2D matrix-based
GA scheduler. The GA-based workload schedulers give the lowest PUE compared to
heat-recirculation-based or basic thermal-aware schedulers.
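The cooling-energy accounting used in these comparisons amounts to dividing the computing power by the CRAC coefficient of performance; in our own notation (a restatement, not Equation (12) itself):

$$P_{\mathrm{cooling}} = \frac{P_{\mathrm{computing}}}{\mathrm{CoP}(T_{\mathrm{sup}})}, \qquad
P_{\mathrm{total}} = P_{\mathrm{computing}}\left(1 + \frac{1}{\mathrm{CoP}(T_{\mathrm{sup}})}\right).$$

Because the CoP grows with the supplied air temperature, a placement that allows a higher CRAC set point reduces the cooling term even when the computing power is unchanged.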
HP Laboratories [Bash and Forman 2007a; Yuan et al. 2010] has done implementation work on improving data center cooling management and thermal-aware workload
placement. Workload prediction is done periodically for multitiered web applications
based on the knowledge of resource utilization at each tier of application in Yuan et al.
[2010]. However, in Bash and Forman [2007a], the execution time of all jobs is already
known. Yuan et al. [2010] extended the application-level utilization target prediction
model of Zhikui et al. [2009] by adding a live migration feature when the predicted resource requirements of all applications running on all VMs on a node exceed the node
capacity. Dynamic Smart Cooling (DSC) was also used for a cooling-control integrated
proactive workload scheduler. Depending solely on the web request arrival prediction
may introduce multiple levels of error. With spikes in web requests, the predictor module may fail and thus cause a decline in performance because the computing resources
are provisioned to VMs on the basis of predictions.
Unlike Moore et al. [2005], who used heat-recirculation-based server ranking, both
Yuan et al. [2010] and Bash and Forman [2007a] used an optimized criterion for server
ranking. The Local Workload Placement Index (LWPI) [Bash and Forman 2007b] was
used to rank the servers for workload placement. LWPI is the sum of three values.
The first is the thermal management margin, which is the difference between the air
temperature of the server inlet and the vendor-specified thermal threshold. The second
is the air conditioning margin for each node calculated on the basis of the cooling
effect of each CRAC unit reaching that node. The third term is the heat recirculation
at each node, which is the difference between temperatures of cold air supplied from
the floor vents and the cold air actually reaching each server. The LWPI value is
used to place workload in an integrated scheduler for both workload and cooling. The
cooling can be increased for selected servers through DSC [Bash et al. 2006] when the
scheduler puts maximum workload on minimum servers through a genetic algorithm,
whereas in Bash and Forman [2007a], the longest jobs are scheduled on the coolest
servers. Idle servers are turned off and cooling is shut down [Bash and Forman 2007a;
Yuan et al. 2010] from the zonal CRAC unit. Both Yuan et al. [2010] and Bash and
Forman [2007a] require prediction-based or test-run-based knowledge of job execution
statistics. Without prior knowledge of the arriving workload, these two approaches will
just be scheduling workload on the basis of the thermal ranking of servers in the data center.
Pakbaznia et al. [2010] extended the heat recirculation model of Qinghui et al. [2008]
by considering that the heat recirculation calculation should represent each server by
the chassis index. The workload distribution was considered as an optimization problem to minimize the number of active servers and the inlet temperatures. Another
optimization performed was to put the idle servers in a hibernation state instead of
turning them off. Unlike Tang et al. [2006], a chassis was powered off only when all
the servers were in a hibernation state. This approach can save idle electricity and
lower the PUE. But the actual cost savings are perhaps through delaying the powering off of chassis. So there will always be some servers available to dynamically
shift the workload in case of hotspots, and the cooling load can easily be minimized.
A workload arrival prediction module based on the past 1-week request arrival pattern was used to calculate the number of active servers required in each round of
scheduling. When scheduling, those servers that have a minimal effect on increasing
the heat-recirculation-based inlet temperature hike are preferred. The servers with
high inlet temperature in any round are least ranked for the next round of scheduling.
The workload placement, server selection, and minimization of the increase in inlet
temperature were considered in a single optimization problem. This is different from
Parolini et al. [2012], where the inlet temperature consideration is included in the
CRAC outlet temperature setting optimization problem.
5.5.2. Optimized Thermal-Aware Scheduling for Microprocessors. An approach that maintains a constant execution rate at the microprocessor is more efficient, in terms of
total workload executed and maintenance of a steady core temperature, than a zig-zag
scheme [Rajan and Yu 2007, 2008]. The constant-rate scheme was called system throttling.
The zig-zag scheme executes the workload at maximum speed until the maximum
temperature threshold is reached; after that, the speed is minimized to allow the processor to cool down. The proposed technique is based on the fact that there exists an
optimized zig-zag scheme that executes more workload than a constant-speed scheme.
Their optimized scheme executes different jobs with different priorities at different
discrete processor speeds. The single-core processor was considered in their proposed
scheme on which jobs are preempted in such a way that high-priority jobs are executed
at high speed, which heats up the processor till the maximum temperature threshold,
followed by executing a low-priority job at a lower optimized speed to cool down the
processor while still doing the maximum possible execution. If the priority of a job is
known, then it means the job is at a higher abstraction level than the microprocessor.
In a data center environment with many servers available, the job execution policy can
consider choosing a server and then perform the job execution on a microprocessor.
Jian-Jia et al. [2007] used the n-approximation algorithm of Vazirani [2001] to extend
the Earliest Deadline First (EDF) algorithm of Aydin et al. [2001]. Jian-Jia et al. [2007]
experimented with the approximation level of minimization of maximum temperature.
The approximation level was applied to multiprocessors by considering Largest Task
First (LTF) scheduling. The results showed that the EDF algorithms are more efficient
in terms of energy consumption and temperature for single-processor systems. For
multiprocessor systems, it was proved that the LTF algorithm is more energy and
temperature efficient than the backfilling algorithms. This is due to the high and
continuous utilization of cores by the backfilling scheduling algorithms. DVS was used
to control temperature. It was also proved that with the DVS-based n-approximation
scheduling algorithm, the approximation bound is 2.719 for a single processor with
discrete voltage levels. The approximation bound is 3.072 for multiprocessors residing
on the same chip and 6.444 for multiprocessors residing on different chips
under the LTF algorithm. Lin and Gang [2007] applied EDF scheduling by executing
a task at the minimal voltage possible to finish the execution before the deadline.
This helps in reducing the overall energy consumption, lowering the temperature, and
improving the task throughput due to less invocation of DVFS. Implementing an LTF
requires knowing the length of the task and possibly the priority of the task. Suppose
that an email service has multiple task types and that the highest-priority task is the user login.
Suppose one user tries to log in while another user who is already logged in wants to
read an email, and that both requests arrive at the email server in parallel.
Considering that the user login requires authentication, the login task is larger than
rendering an email, and therefore, login should be completed first and prioritized.
However, if there are several login requests, the microprocessor may get
busy performing authentications, and the users who want to read emails might have to
wait indefinitely, whereas the microprocessor may experience hotspots as it might be
undergoing the backfilling of login requests.
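The "slowest speed that still meets the deadline" rule attributed to Lin and Gang [2007] reduces to a short search over the discrete DVFS levels; the cycle counts and frequency levels below are invented for illustration.

```python
# Pick the lowest discrete frequency level that still finishes the task before
# its deadline; running slower lowers the temperature and reduces later DVFS
# invocations. All values are illustrative.

def min_speed_for_deadline(cycles, time_to_deadline, freq_levels_hz):
    """Return the lowest frequency that completes `cycles` before the deadline."""
    for f in sorted(freq_levels_hz):
        if cycles / f <= time_to_deadline:
            return f
    return max(freq_levels_hz)   # even the top speed misses: run flat out

levels = [0.8e9, 1.2e9, 1.8e9]             # discrete DVFS levels (Hz)
print(min_speed_for_deadline(cycles=2.0e9, time_to_deadline=2.0,
                             freq_levels_hz=levels))    # selects 1.2 GHz
```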
A microprocessor scheduling approach considering multiple dimensions was proposed in Coskun et al. [2008a], where the reliability factor is also considered in
addition to hotspots and performance at the time of scheduling. Reliability refers to
temporal temperature variations (thermal cycles) and spatial temperature among
different cores across multiprocessors (thermal gradient). On the basis of these factors,
the algorithm finds a policy with a minimum loss of performance and reliability of
the processor. This technique reduces the hotspots and is further developed by Zanini
et al. [2009] to apply the thermal policies to avoid hotspots at runtime when the temperature reaches a maximum threshold. Control theory was applied to find optimum
frequencies for each core by considering the tradeoff between an increase in hotspots
due to power density and the power savings for MPSoC multiprocessors and to reduce
the task waiting time. Their technique performs better than global and distributed
DVFS. With multiple thermal policies available, the SLA implementation can be
ensured. These policies can be applied on the basis of user priority and QoS assurance.
For this matter, there should be a cooperation between the application layer such
as a cloud service and the microprocessor workload engine. It is important that the
microprocessor-based thermal policies should consider the inlet temperature and/or
the application-level request arrival prediction. Otherwise, the thermal policies have
limited utilization. Also, if there are heterogeneous servers, then the thermal policies
of one type of microprocessor cannot be applied to another type of microprocessor.
Coskun et al. [2007] proposed an optimized use of DTM techniques. An adaptive
thermal-aware grading policy was proposed for the scheduler. Through adaptive grading,
the cores with the coolest neighborhoods are prioritized for workload allocation. The
analysis showed that using the reactive DTM techniques such as DVS and thread
migration along with an adaptive scheduler reduces the thermal cycles and thermal gradient across the cores more than the scheduling techniques that prefer the
coolest cores and the task queue length-based load balancing techniques. This work
was extended in Coskun et al. [2008d] to propose an ILP-based spatial-thermal-aware
scheduling approach. This approach minimizes the time spent above the maximum
thermal threshold level by using DVS for tasks with known thermal profiles at various
clock frequencies. To reduce the thermal gradient, their technique allocates workload
to cores with no load in neighboring cores. The ILP objective function used by Coskun et
al. [2008d] has two parts: the first is to minimize and balance the hotspots by minimizing the maximum time spent over the thermal threshold; the second is to minimize the
thermal-spatial gradient across the chip through minimization of workload scheduling
on neighboring cores when possible. Coskun et al. [2008e] combined this ILP technique
with Coskun et al. [2007], which allocates the tasks to the coolest cores. There is an
updated technique in Coskun et al. [2008e], which uses adaptive random sampling
of processor cores. Through adaptive sampling, the cores are evaluated to increase or
decrease the probability for workload assignment. This ranking is similar to increasing
or decreasing the priority of any core on the basis of recent thermal history. This resulted in overall better performance than allocating tasks to the coolest nodes without
considering spatial-thermal efficiency. There were fewer hotspots and reduced thermal
variation across the chip in terms of time and space. The adaptive sampling can be
used to grade the servers as well. This can create a hierarchical grading mechanism
for servers and microprocessors. However, the adaptive sampling at the server level
may not be able to detect or avoid the hotspots during a long adaptive period.
Chantem et al. [2008] proposed a scheduling technique that considers spatial, thermal, and temporal characteristics to schedule hard real-time jobs with chip temperature reduction. This scheduling technique makes sure that the real-time jobs are
scheduled when all the prerequisite jobs are finished. If possible, the job is scheduled
on the fastest processor to complete execution before the task deadline. Task execution is performed in such a way that the rise in temperature is minimized. For
this, accurate temperature profiling was required for each job, which depends on
the duration of task execution time. For longer tasks, their technique failed to calculate the thermal profile. This shortfall was overcome by using phased steady-state
analysis for longer tasks and transient analysis for shorter tasks. Their scheduling
technique improves the thermal efficiency more than energy optimal techniques, and
the improved temperature estimation technique has higher thermal efficiency than
their Mixed Integer Linear Programming (MILP)-based technique.
5.6. Discussion and Comparison of Thermal-Aware Scheduling Techniques

The thermal-aware scheduler relies on monitoring and profiling to get accurate and
timely information of the data-center-wide thermal map. These are compared in
Table III. As explained before, the scheduling is offline when it is performed using static thermal
profiles and monitoring is used only to detect thermal anomalies; otherwise, the scheduling is online and uses a live thermal map.
Table III shows the comparison of various types of monitoring. The most practical
is the use of thermal sensors; these are more accurate, affordable, and faster than
manual measurement, thermal cameras, and simulators, respectively.
Table IV. Comparison of Thermal-Aware Reactive and Proactive Scheduling

Consideration       | Reactive Scheduling   | Proactive Scheduling
Thermal monitoring  | Current temperature   | Current temperature, predicted temperature (offline, online)
Energy saving       | Less                  | High
Time consuming      | Less                  | High
Thermal anomalies   | Higher than proactive | Lower than reactive
Complexity          | Simple                | Complex
Plan                | Anomaly detection     | Anomaly avoidance
Workload migration  | Possible              | Not likely
DTM invocation      | More than proactive   | Less than reactive

The DTM techniques used are
not restricted to any one type of scheduling. In Table II, the techniques used by different
researchers are shown.
There are many types of DTM. The choice between reactive and proactive scheduling
decides the use of DTM. Either the DTM techniques are applied after the occurrence
of thermal anomalies (reactive scheduling) or these techniques are used during the
scheduling (proactive scheduling). The proactive scheduler has to be designed to minimize thermal anomalies before they occur. Otherwise, the scheduler
will be acting like a reactive scheduler. Modern servers come with processors whose built-in DTM techniques are triggered as soon as the temperature threshold
is violated. The importance of the heat model becomes evident if one considers the
environment variables such as air temperature. The hotter the inlet air of the server,
the greater are the chances of DTM invocation.
The DTM invoked at the processor level may not be known by the data-center-level
scheduler. These embedded DTM routines are local to each server and can slow down
the performance in order to ensure equipment safety. Similarly, the operating-system-controlled thermal management techniques are local to each server.
With the thermal-aware workload scheduling in data centers discussed in this article,
the choices are vast after selecting the heat modeling and monitoring. A fundamental
choice has to be made between reactive and proactive scheduling. This choice either
simplifies the monitoring or makes it complex. Table IV shows the comparison between
thermal-aware reactive and proactive scheduling. The scheduling techniques are generalized in Table IV, and a more detailed elaboration of the scheduling techniques is
provided in Table V.
The data-center-wide thermal-aware scheduler has to make decisions for scheduling the workload across all servers. The thermal-aware scheduling can be performed
hierarchically from data center wide to server to microprocessor to per core by using
similar concepts such as heat transfer, heat recirculation, heat model, thermal-aware
monitoring and profiling, thermal-aware predictions, thermal-aware scheduling, and
DTM techniques. Although the implementation of these concepts is different at the
local and global levels, a data-center-wide thermal-aware scheduler should implement
these modules to get the full benefit of energy savings through complete thermal awareness. As discussed in the related section, more energy is saved by proactive scheduling.
Table V shows the comparison of different thermal-aware scheduling techniques reviewed in this article. The more complex the thermal-aware scheduling logic is, the
more energy saving it is.
Regarding the energy savings, the basic thermal-aware algorithms are the least
ranked and the optimized algorithms are the top ranked. But interestingly, if the results
are compared across different research works that implement the same algorithms, there are some mismatches.
Table V. Comparison of Thermal-Aware Scheduling Techniques

Consideration | Basic Thermal-Aware Scheduling | Heat-Recirculation-Aware Scheduling | Optimized Thermal-Aware Scheduling
Reactive scheduling | Uniform outlet, power budget allocation | HRF-based power allocation | LWPI-based scheduling, thermal policies
Proactive scheduling | Minimum computer energy, uniform task, hot job before cool job, hottest job on coldest core/server, job backfilling | Heat-recirculation-coefficient-based minimization of peak inlet temperature | Multidimensional bin packing, resource requirement prediction, processor throttling, hierarchical scheduling, genetic algorithms
Thermal awareness | Task/job-based thermal/power profiles | Location-based heat recirculation/coefficient, simulator | Sensor network, thermal cameras, adaptive sampling
DTM techniques | VM/task migration, VM pinning, clock gating, DVFS | Voltage scaling | Thread/task/VM migration, DVFS
DTM invocation frequency | More | Less | Least
Consideration of environmental variables | No | Yes | Yes
Minimization of hotspots | Least | Low | High
Consideration of nonthermal parameters | None | None | High
Heat recirculation control | None | High | High
Resource utilization | Less | Medium | High
Energy efficiency | Medium | Medium | High

For example, the MCE in Tang et al. [2006] performs better
than uniform task (UT) and uniform outlet profile (UOP). But MCE in Tang et al. [2007]
and Qinghui et al. [2008] perform worse than UT and UOP. However, the performance
of MinHR in Moore et al. [2005] and Qinghui et al. [2008] remained better than UT
and UOP. This is perhaps due to the selection of different racks/pods/chassis/servers for
each implementation. Also, the CFD simulation-based results of Tang et al. [2007], Tang
et al. [2006], and Qinghui et al. [2008] are different from the real-world implementation
of Mukherjee et al. [2007]. This shows that the actual implementation results can be
more unexpected than the simulation-based test runs. Nevertheless, the basic thermal-aware algorithms are fast in terms of deployment time, but they have a higher PUE
than the heat-recirculation-based and optimized thermal-aware schedulers. Therefore,
when estimating the TCO, the data center management should plan for a long-term
implementation of a certain type of thermal-aware scheduler reviewed in this article.
For virtualized data centers providing cloud services, the unit of scheduling is the VM that
provisions the various cloud services. Thermal-aware scheduling can be implemented
just as done by Yuan et al. [2010] through DSC. But cooling is not optimized with respect
to dynamic provisioning and elasticity of virtualized resources as per the definition of
Mell and Grance [2011]. Even ignoring the elasticity and scalability of the public
cloud as per the definition of Garg and Buyya [2011], the full virtualization limits the
benefits of operating-system-invoked DTM and scheduling techniques, except those set
by the hypervisor [VMware 2010], as it can be a security threat to allow a virtual machine
to manipulate the logical thread attached to each of its virtual processors.
Nevertheless, the VM can be scheduled by using pinning and frequency scaling
as DTM techniques before invoking VM migration. Older hypervisors such as VMware
ESXi 5.0 [VMware 2011] allowed VM migration only between compatible hosts with the
help of shared storage. So in case of heterogeneous hosts, the thermal-aware scheduling
was to be used proactively with frequency scaling as a DTM technique. But in the new
version, the VMs are not required to be placed on shared storage and can be migrated
across different servers [VMware 2012]. This is nearly like task migration across different cores of microprocessors. Thus, the hierarchical thermal-aware scheduling can
be applied to virtualized environments just like nonvirtualized data centers.
For the purpose of demonstration, a scenario is presented for a SaaS provider. Consider that the data center is providing the service and there is no intermediate IaaS
provider. The supposed size of the data center is medium with around 200 racks. Certain decisions have to be made for the implementation of the thermal-aware scheduler.
The first step is considering the level of scheduling, either server level or microprocessor level. Suppose the hierarchical thermal-aware scheduler is the choice. The second
step is the choice of heat model. Table VI can be consulted at this level to choose a
heat model. Suppose the RC model is selected for server-level thermal-aware scheduling. The next step should be to consult Table IV for selecting the reactive or proactive
scheduling. Reactive scheduling provides fast processing and minimum response time,
and therefore can be considered for the implementation.
The reactive scheduling faces a diverse scenario of workload and the unpredictable
thermal gradient across the data center. Therefore, the thermal sensor nodes such as
wireless external thermal sensors and onboard thermal sensors are required. The next
step should be to identify the microprocessor-level scheduling, but before proceeding, it
is assumed that the SaaS will be implemented through multiple VMs. The microprocessor can use the thermodynamic model or RC model for task scheduling. The final step
is to identify the DTM techniques to be employed. DVFS and workload migration are
the DTM techniques that can be applied to microprocessors and VMs. At this step, the
QoS and PUE issues are considered. Since the thermal-aware scheduling provides cost
savings in cooling and computing, the QoS assurance for thermal-aware scheduling is
covered in a previous section.
Going into detail of thermal-aware scheduling at the server level, Table V provides
various options for reactive scheduling. If heat recirculation is not considered, then
uniform task allocation might be the best choice for thermal-aware VM placement. The
VM can be deployed with the coolest inlet preference. This means that for every SaaS
deployment, either each set of VMs is placed over a separate server or the individual
VM can be distributed among servers in a round-robin fashion. The round-robin distribution of individual VMs will require as many servers as there are VMs in a SaaS
deployment. Therefore, using MCE along with uniform task allocation will limit the maximum number of active servers. Although it is assumed that the total number of active servers will be (n + 1),
such a reactive thermal-aware scheduler can avoid hotspots through DTM techniques.
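A minimal sketch of this hypothetical reactive placement rule is given below; the server fields and the MCE-style wake-up step are assumptions made for the scenario, not an implementation from the surveyed work.

```python
# Place each new VM on the active server with the coolest inlet that still has
# capacity; wake an additional standby server only when no active server fits
# (an MCE-style cap on the number of active servers). Fields are assumed.

def place_vm(vm, active, standby):
    candidates = [s for s in active if s["free_cores"] >= vm["cores"]]
    if not candidates and standby:
        woken = standby.pop(0)          # power on one more server only if needed
        active.append(woken)
        candidates = [woken]
    if not candidates:
        return None                     # defer the request: no capacity anywhere
    host = min(candidates, key=lambda s: s["inlet_temp"])   # coolest inlet first
    host["free_cores"] -= vm["cores"]
    return host
```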
There is a lot of room for future research work that can improve this hypothetical
scheduler. There can be a proactive scheduler at the server level with a prediction module to predict the future load of the SaaS deployment. There can be a microprocessor-level workload predictor. Alternatively, a single prediction module can predict the
workload for servers and microprocessors. There can be different heat models for
servers and microprocessors. There can even be a thermal predictor that predicts the
thermal behavior of the servers by following the thermal network model. At different
instances, there can be different SLA assurance policies for thermal-aware schedulers.
Every change in the thermal-aware scheduler architecture will affect the SaaS cost
model.
5.7. Lowering of PUE and TCO Through Thermal-Aware Scheduling

As demonstrated in Table V, the various thermal-aware scheduling techniques use
a diverse set of thermal preferences for decision making. The objective of all these
techniques is to increase the energy efficiency of the data center. In other words, every
type of thermal-aware scheduling tries to minimize the PUE. The lowering of the PUE
means a lower usage of energy and savings in running costs of the data center. In case
of a commercial data center that provides cloud services, the reduction in running
costs results in lower service costs and higher profits.
But a reduced PUE may not necessarily bring down the TCO in the long run. Considering the TCO calculator proposed by Koomey et al. [2007], the power consumption
accounts for 10% of the cost of the data center and a lower PUE can reduce the TCO,
but does it ensure hardware reliability? Thermal-aware scheduling tries to minimize
the overall cooling load by carefully managing the workload, and if this involves powering off idle chassis and servers, then it may result in continuous use of a few servers
and lead to hardware failures. The IT infrastructure costs cover up to 35% of the total
data center costs [Koomey et al. 2007], and therefore, the loss of a server is not desirable. As pointed out previously, the optimized thermal-aware scheduling provides
the least PUE and the basic thermal-aware scheduling provides the highest PUE. If
the powering down of idle servers is not considered, then the smallest TCO is possible
through optimized thermal-aware scheduling. In order to calculate the PUE through
Equation (12), the COP value should be adjusted according to the inlet temperature at
each server, which can be higher than the set temperature of the CRAC unit [Moore
et al. 2005].
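Under the simplifying assumption that cooling is the only significant non-IT load, the link between the adjusted COP and the PUE can be written as (our formulation, not an equation from the surveyed papers):

$$\mathrm{PUE} = \frac{P_{\mathrm{IT}} + P_{\mathrm{cooling}}}{P_{\mathrm{IT}}} \approx 1 + \frac{1}{\mathrm{CoP}(T_{\mathrm{inlet}})},$$

where the CoP is evaluated at the inlet temperature actually reached at the servers rather than at the CRAC set point.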
In short, the potential cost benefits associated with thermal-aware scheduling are
hardware safety, cooling cost savings, computing cost savings, decreased PUE, and
overall decreased TCO. The basic thermal-aware scheduling is the least cost saving
and the optimized thermal-aware scheduling is the most beneficial. However, considering throughput as the metric of revenue earning, the basic thermal-aware
scheduling earns more revenue than the optimized thermal-aware scheduling
(in terms of time taken during scheduling). Similarly, the thermal monitoring system
implementation is the most expensive in case of thermal cameras, but thermal sensors
may be comparatively cheaper. However, a large deployment of wireless sensor nodes may approach
the cost of thermal cameras. Overall, though, ease of use and integration makes the thermal
sensors a good choice to lower the TCO. Of course, all these benefits are subject to
the tradeoff between the QoS assurance versus the PUE and the economic analysis of
return on investment in a data center.
6. CONCLUSION

The dominant part of electricity usage in data centers goes to the computing servers and the cooling mechanism. Servers inside a data center require a constant supply of
cold air from an on-site cooling mechanism for reliability. The servers dissipate more
heat with the increase in computational workload. This heat invokes the cooling mechanism to remove the heat and to maintain the cold air temperature for the data center
servers. Thermal-aware scheduling is the scheduling of computational workload with
the objective of lowering the heat emitted by servers and thus reducing the cooling
load.

Table VI. Metrics to Evaluate Thermal Awareness in Green Data Centers

Thermal-aware scheduling is different from computing power-saving scheduling,
which emphasizes minimizing the number of active servers and thus increasing the
power density in a small area inside the data center. As a result, the cooling mechanism
boosts up working to cool down the hotspot. This may also shorten the life of servers,
resulting in hardware failure. In thermal-aware scheduling, the computing power is
saved within the scope of equipment reliability, minimizing hotspots and the thermal
gradient. It saves not only computing energy but also cooling energy and at the same
time ensures equipment life, reliability, and safety.
The metrics presented in Table VI provide an overview of the thermal-aware scheduling techniques reviewed in this article. Table VI can be used to identify the thermal-aware scheduling techniques. A thermal-aware scheduler can be designed and/or
evaluated on the basis of these metrics. Thermal-aware scheduling and the related
techniques of heat modeling and thermal-aware monitoring and profiling span from
microprocessor to data center wide. It can be applied to virtualized and nonvirtualized
environments.
This article is the first survey of its kind that combines the thermal-aware techniques
of microprocessors and data centers to provide a solid base for establishing green data
centers. This survey is useful for readers who want to gain knowledge and develop and
evaluate energy-efficient thermal-aware scheduling techniques. The evaluations and reviews
in this survey highlight the advantages and tradeoffs of each
technique. Taking guidance from the comparison tables, a thermal-aware scheduler
can be implemented and evaluated. In addition to that, the QoS assurance and TCO
were also discussed with respect to thermal-aware scheduling for green data centers.
This survey provides the foundations for future work such as creating a hierarchical
thermal-aware scheduler that spans from racks to microprocessors, a thermal model
that quantifies heat for servers and microprocessors jointly rather than
individually, QoS assurance for thermal-aware scheduling, cost modeling for thermal-aware cloud computing, PUE versus QoS analysis to lower the TCO, and capacity
planning for new data centers.
REFERENCES
Adaptive Computing. 2015a. Moab HPC Suite. Retrieved from http://www.adaptivecomputing.com/products/
hpc-products/moab-hpc-suite-grid-option/.
Adaptive Computing. 2015b. TORQUE Resource Manager. Retrieved from http://www.adaptivecomputing.
com/products/open-source/torque/.

N. Ahuja, C. Rego, S. Ahuja, M. Warner, and A. Docca. 2011. Data center efficiency with higher ambient
temperatures and optimized cooling control. In Proceedings of the 2011 27th Annual IEEE Semiconductor Thermal Measurement and Management Symposium (SEMI-THERM'11). 105-109.
N. Ahuja, C. W. Rego, S. Ahuja, S. Zhou, and S. Shrivastava. 2013. Real time monitoring and availability of server airflow for efficient data center cooling. In Proceedings of the 2013 29th Annual IEEE
Semiconductor Thermal Measurement and Management Symposium (SEMI-THERM'13). 243-247.
AMD. 1995. CPU Thermal Management. Retrieved from http://datasheets.chipdb.org/AMD/486_5x86/
18448D.pdf.
AMD. 2009. ACP: The Truth about Power Consumption Starts Here. Retrieved from http://sites.amd.com/
us/Documents/43761C_ACP_WP_EE.pdf.
A. S. Arani. 2007. Online thermal-aware scheduling for multiple clock domain CMPs. In Proceedings of the
2007 IEEE International SOC Conference. 137-140.
M. Arlitt, C. Bash, S. Blagodurov, Y. Chen, T. Christian, D. Gmach, et al. 2012. Towards the design and
operation of net-zero energy data centers. In Proceedings of the 2012 13th IEEE Intersociety Conference
on Thermal and Thermomechanical Phenomena in Electronic Systems (ITherm'12). 552-561.
ARS-Techniqa. 2008. NVIDIA Denies Rumors of Faulty Chips, Mass GPU Failures. Retrieved from http://
arstechnica.com/hardware/news/2008/07/nvidia-denies-rumors-of-mass-gpu-failures.ars.
P. Artman, D. Moss, and G. Bennett. 2002. Dell PowerEdge 1650: Rack Impacts on Cooling
for High Density Servers. Retrieved from http://www.dell.com/downloads/global/products/pedge/en/
rack_coolingdense.doc.
ASHRAE-TC-9.9. 2011. 2011 Thermal Guidelines for Data Processing Environments: Expanded Data Center Classes and Usage Guidance. Retrieved from http://www.eni.com/green-data-center/it_IT/static/pdf/
ASHRAE_1.pdf.
H. Aydin, R. Melhem, D. Mossé, and P. Mejía-Alvarez. 2001. Dynamic and aggressive scheduling techniques
for power-aware real-time systems. In Proceedings of the 22nd IEEE Real-Time Systems Symposium.
95.
B. Baikie and L. Hosman. 2011. Green cloud computing in developing regions: Moving data and processing
closer to the end user. In Proceedings of the Telecom World (ITU WT), 2011 Technical Symposium at ITU.
2428.
A. Banerjee, T. Mukherjee, G. Varsamopoulos, and S. K. S. Gupta. 2010. Cooling-aware and thermal-aware
workload placement for green HPC data centers. In Proceedings of the International Green Computing
Conference. 245256.
A. Banerjee, T. Mukherjee, G. Varsamopoulos, and S. K. S. Gupta. 2011. Integrating cooling awareness with
thermal aware workload placement for HPC data centers. Sustainable Computing: Informatics and
Systems 1, 2 (2011), 134150.
L. A. Barroso and U. Holzle. 2007. The case for energy-proportional computing. Computer 40, 12 (2007),
3337.
C. Bash and G. Forman. 2007a. Cool job allocation: Measuring the power savings of placing jobs at cooling-efficient locations in the data center. HP Laboratories Technical Reports (HPL-2007-62). Retrieved from
http://www.hpl.hp.com/techreports/2007/HPL-2007-62.pdf.
C. Bash and G. Forman. 2007b. Data center workload placement for energy efficiency. ASME Conference
Proceedings 2007, 42770 (2007), 733741.
C. E. Bash, C. D. Patel, and R. K. Sharma. 2006. Dynamic thermal management of air cooled data centers. In
Proceedings of the Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic
Systems (ITHERM06).
Y. Bo, J. Kephart, H. Hamann, and S. Barabasi. 2011. Hotspot diagnosis on logical level. In Proceedings of
the 2011 7th International Conference on Network and Service Management (CNSM11). 15.
D. Bruneo. 2014. A stochastic model to investigate data center performance and QoS in IaaS cloud computing
systems. IEEE Transactions on Parallel and Distributed Systems 25, 3 (2014), 560569.
K. W. Cameron. 2010. The challenges of energy-proportional computing. Computer 43, 5 (2010), 8283.
Z. Changyun, G. Zhenyu, S. Li, R. P. Dick, and R. Joseph. 2008. Three-dimensional chip-multiprocessor
run-time thermal management. IEEE Transactions on Computer-Aided Design of Integrated Circuits
and Systems 27, 8 (2008), 14791492.
T. Chantem, R. P. Dick, and X. S. Hu. 2008. Temperature-aware scheduling and assignment for hard real-time applications on MPSoCs. In Proceedings of the Design, Automation and Test in Europe (DATE08). 288-293.

T. Chantem, X. S. Hu, and R. P. Dick. 2009. Online work maximization under a peak temperature constraint.
In Proceedings of the 14th ACM/IEEE International Symposium on Low Power Electronics and Design.
105110.
M. T. Chaudhry, T. C. Ling, S. A. Hussain, and A. Manzoor. 2014. Minimizing thermal-stress for data center
servers through thermal-aware relocation. Scientific World Journal 2014, (2014).
H. Chen, M. Song, J. Song, A. Gavrilovska, and K. Schwan. 2011. HEaRS: A hierarchical energy-aware
resource scheduler for virtualized data centers. In Proceedings of the 2011 IEEE International Conference
on Cluster Computing. 508512.
J. Choi, C.-Y. Cher, H. Franke, H. Hamann, A. Weger, and P. Bose. 2007. Thermal-aware task scheduling at
the system software level. In Proceedings of the International Symposium on Low Power Electronics and
Design. 213218.
C. Y. Chong, S. P. Lee, and T. C. Ling. 2014. Prioritizing and fulfilling quality attributes for virtual lab development through application of fuzzy analytic hierarchy process and software development guidelines.
Malaysian Journal of Computer Science 27, 1 (2014).
S. W. Chung and K. Skadron. 2006. A novel software solution for localized thermal problems. In Proceedings
of the 4th International Conference on Parallel and Distributed Processing and Applications. 6374.
A. K. Coskun, J. L. Ayala, D. Atienza, T. S. Rosing, and Y. Leblebici. 2009a. Dynamic thermal management
in 3D multicore architectures. In Proceedings of the Design, Automation & Test in Europe Conference &
Exhibition (DATE09). 14101415.
A. K. Coskun, T. S. Rosing, and K. Whisnant. 2007. Temperature aware task scheduling in MPSoCs. In
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE07). 16.
A. K. Coskun, T. S. Rosing, and K. C. Gross. 2008a. Temperature management in multiprocessor SoCs using
online learning. In Proceedings of the 45th Annual Design Automation Conference. 890893.
A. K. Coskun, T. S. Rosing, and K. C. Gross. 2008b. Proactive temperature balancing for low cost thermal
management in MPSoCs. In Proceedings of the IEEE/ACM International Conference on Computer-Aided
Design. 250257.
A. K. Coskun, T. S. Rosing, and K. C. Gross. 2008c. Proactive temperature management in MPSoCs. In
Proceedings of the 13th International Symposium on Low Power Electronics and Design. Bangalore,
India. 165170.
A. K. Coskun, T. S. Rosing, K. A. Whisnant, and K. C. Gross. 2008d. Temperature-aware MPSoC scheduling
for reducing hot spots and gradients. In Proceedings of the Asia and South Pacific Design Automation
Conference. 4954.
A. K. Coskun, T. S. Rosing, K. A. Whisnant, and K. C. Gross. 2008e. Static and dynamic temperature-aware
scheduling for multiprocessor SoCs. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 16, 9 (2008e), 1127-1140.
A. K. Coskun, T. S. Rosing, and K. C. Gross. 2009b. Utilizing predictors for efficient thermal management
in multiprocessor SoCs. Transactions on Computer-Aided Design of Integrated Circuit Systems 28, 10
(2009b), 15031516.
J. Donald and M. Martonosi. 2006. Techniques for multicore thermal management: Classification and new
exploration. SIGARCH Computer Architecture News 34, 2 (2006), 7888.
EE-Times. 2008. The Truth about Last Year's Xbox 360 Recall. Retrieved from http://www.eetimes.com/
electronicsnews/4077187/The-truth-about-last-year-s-Xbox-360-recall.
E. Pakbaznia, M. Ghasemazar, and M. Pedram. 2010. Temperature-aware dynamic resource provisioning in
a power-optimized datacenter. In Proceedings of the Design, Automation & Test in Europe Conference &
Exhibition (DATE10).
ENERGY-STAR. 2010. Data Center Industry Leaders Reach Agreement on Guiding Principles for Energy
Efficiency Metrics. Retrieved from http://www.energystar.gov/ia/partners/prod_development/downloads/
DataCenters_AgreementGuidingPrinciples.pdf?262a-86ba.
Energy Star EPA. 2007. Report to Congress on Server and Data Center Energy Efficiency Public
Law 109-431. Retrieved from http://www.energystar.gov/ia/partners/prod_development/downloads/EPA_
Datacenter_Report_Congress_Final1.pdf?6133-414f.
X. Fan, W.-D. Weber, and L. A. Barroso. 2007. Power provisioning for a warehouse-sized computer. SIGARCH
Computer Architecture News 35, 2 (2007), 1323.
E. Frachtenberg, D. Lee, M. Magarelli, V. Mulay, and J. Park. 2012. Thermal design in the open compute datacenter. In Proceedings of the 2012 13th IEEE Intersociety Conference on Thermal and Thermomechanical
Phenomena in Electronic Systems (ITherm12). 530538.
Freescale. 2008. Thermal Analysis of Semiconductor Systems. Retrieved from http://cache.freescale.
com/files/analog/doc/white_paper/BasicThermalWP.pdf.

S. K. Garg and R. Buyya. 2011. Green Cloud Computing and Environmental Sustainability. Retrieved from
http://www.cloudbus.org/papers/Cloud-EnvSustainability2011.pdf.
C. Gonzales and H. M. Wang. 2008. Thermal Design Considerations for Embedded Applications. Retrieved
from http://download.intel.com/design/intarch/papers/321055.pdf.
Google. 2012. Take a Walk Through a Google Data Center. Retrieved from http://www.google.com/about/
datacenters/inside/streetview/.
Greenpeace. 2011. How Dirty Is Your Data? A Look at the Energy Choices That Power Cloud Computing. Retrieved from http://www.greenpeace.org/international/Global/international/publications/climate/2011/
Cool%20IT/dirty-data-report-greenpeace.pdf.
E. M. Greitzer, Z. S. Spakovszky, and I. A. Waitz. 2008. Specific Heats: The Relation between Temperature Change and Heat. Retrieved from http://web.mit.edu/16.unified/www/FALL/thermodynamics/
notes/node18.html.
HotSpot. 2014. HotSpot 5.0. Retrieved from http://lava.cs.virginia.edu/HotSpot/.
S. Huck. 2011. Measuring Processor Power TDP vs. ACP. Retrieved from http://www.intel.com/content/dam/
doc/white-paper/resources-xeon-measuring-processor-power-paper.pdf.
J. Hwisung and M. Pedram. 2006. Stochastic dynamic thermal management: A markovian decision-based
approach. In Proceedings of the International Conference on Computer Design (ICCD06). 452457.
Intel. 2011. Intel® Core™ i7-900 Desktop Processor Extreme Edition Series and Intel® Core™ i7-900 Desktop Processor Series on 32-nm Process. Retrieved from http://download.intel.com/design/processor/designex/320837.pdf.
Intel. 2014. Intel® Processor Feature Filter. Retrieved from http://ark.intel.com/search/advanced?FamilyText=Intel%C2%AE%20Xeon%C2%AE%20Processor%20E5%20v2%20Family.
J. Jaffari and M. Anis. 2008. Statistical thermal profile considering process variations: Analysis and applications. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 27, 6 (2008),
10271040.
C. Jian-Jia, H. Chia-Mei, and K. Tei-Wei. 2007. On the minimization of the instantaneous temperature for
periodic real-time tasks. In Proceedings of the 13th IEEE Real Time and Embedded Technology and
Applications Symposium (RTAS07). 236-248.
M. Jonas, G. Varsamopoulos, and S. Gupta. 2007. On developing a fast, cost-effective and non-invasive
method to derive data center thermal maps. In Proceedings of the 2007 IEEE International Conference
on Cluster Computing. 474475.
M. Jonas, G. Varsamopoulos, and S. K. S. Gupta. 2010. Non-invasive thermal modeling techniques using
ambient sensors for greening data centers. In Proceedings of the 2010 39th International Conference on
Parallel Processing Workshops (ICPPW10). 453-460.
Y. Jun, Z. Xiuyi, M. Chrobak, Z. Youtao, and J. Lingling. 2008. Dynamic thermal management through task
scheduling. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems
and Software (ISPASS08). 191201.
Q. Junmei, L. Li, L. Liang, T. Yuelong, and C. Jiming. 2013. Smart temperature monitoring for data center
energy efficiency. In Proceedings of the 2013 IEEE International Conference on Service Operations and
Logistics, and Informatics (SOLI13). 360365.
J. Kaiser, J. Bean, T. Harvey, M. Patterson, and J. Winiecki. 2011. Survey Results: Data Center Economizer Use. Retrieved from http://www.thegreengrid.org/Global/Content/white-papers/WP41SurveyResultsDataCenterEconomizerUse.
O. Khan and S. Kundu. 2008. A framework for predictive dynamic temperature management of microprocessor systems. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design.
258263.
K. Khankari. 2009. Rack enclosures a crucial link in airflow management in data centers. ASHRAE Journal
(Aug 2009), 48.
J. Kong, S. W. Chung, and K. Skadron. 2012. Recent thermal management techniques for microprocessors.
ACM Computing Surveys 44, 3 (2012), 1-42.
J. Koomey. 2008. Worldwide electricity used in data centers. Environmental Research Letters 3, 034008
(2008).
J. Koomey. 2011. Growth in Data Center Electricity Use 2005 to 2010. Analytics Press.
J. Koomey, K. Brill, P. Turner, J. Stanley, and B. Taylor. 2007. A Simple Model for Determining True Total
Cost of Ownership for Data Centers. Uptime Institute White Paper, Version 2, (2007).
J. Koomey, K. Brill, P. Turner, J. Stanley, and B. Taylor. 2008. A Simple Model for Determining True Total
Cost of Ownership for Data Centers. Uptime Institute 2.1 (2008).

A. Krause, C. Guestrin, A. Gupta, and J. Kleinberg. 2006. Near-optimal sensor placements: Maximizing
information while minimizing communication cost. In Proceedings of the 5th International Conference
on Information Processing in Sensor Networks (IPSN06). 210.
A. Kumar, L. Shang, L.-S. Peh, and N. K. Jha. 2006. HybDTM: A coordinated hardware-software approach
for dynamic thermal management. In Proceedings of the 2006 43rd ACM/IEEE Design Automation
Conference. 548553.
E. Kursun and C. Chen-Yong. 2008. Variation-aware thermal characterization and management of multicore architectures. In Proceedings of the IEEE International Conference on Computer Design (ICCD08).
280285.
R. Lavanya and V. Ramachandran. 2013. Cloud based video on demand model with performance enhancement. Malaysian Journal of Computer Science 24, 2 (2013).
E. K. Lee, I. Kulkarni, D. Pompili, and M. Parashar. 2010. Proactive thermal management in green datacenters. Journal of Supercomputing 51, (2010), 131.
Y. Lin and Q. Gang. 2007. ALT-DVS: Dynamic voltage scaling with awareness of leakage and temperature
for real-time systems. In Proceedings of the 2nd NASA/ESA Conference on Adaptive Hardware and
Systems (AHS07). 660-670.
H. Liu, E. K. Lee, D. Pompili, and X. Kong. 2013. Thermal camera networks for large datacenters using
real-time thermal monitoring mechanism. Journal of Supercomputing 64, 2 (2013), 383408.
Z. Liu, Y. Chen, C. Bash, A. Wierman, D. Gmach, Z. Wang, et al. 2012. Renewable and cooling aware workload
management for sustainable data centers. SIGMETRICS Performance Evaluation Review 40, 1 (2012),
175186.
W. Lizhe, G. von Laszewski, J. Dayal, and T. R. Furlani. 2009. Thermal aware workload scheduling with
backfilling for green data centers. In Proceedings of the 2009 IEEE 28th International Performance
Computing and Communications Conference (IPCCC09). 289296.
M. Marwah, R. Sharma, and C. Bash. 2010. Thermal anomaly prediction in data centers. In Proceedings of the 2010 12th IEEE Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems (ITherm), Las Vegas, NV, USA. 1-7.
Binu K. Mathew. 2004. The Perception Processor. Ph.D. Dissertation, The University of Utah, USA.
P. Mell and T. Grance. 2011. The NIST Definition of Cloud Computing. Retrieved from http://csrc.nist.gov/
publications/nistpubs/800-145/SP800-145.pdf.
Mentor-Graphics. 2014. FloVENT. Retrieved from http://www.mentor.com/products/mechanical/products/
flovent.
A. Merkel, F. Bellosa, and A. Weissel. 2005. Event-driven thermal management in SMP systems. In Proceedings of the Second Workshop on Temperature-Aware Computer Systems (TACS05).
J. Moore, J. Chase, P. Ranganathan, and R. Sharma. 2005. Making scheduling cool: Temperature-aware
workload placement in data centers. In Proceedings of the Annual Conference on USENIX Annual
Technical Conference. 5.
T. Mukherjee, T. Qinghui, C. Ziesman, S. K. S. Gupta, and P. Cayton. 2007. Software architecture for dynamic thermal management in datacenters. In Proceedings of the 2nd International Conference on Communication Systems Software and Middleware (COMSWARE07). 1-11.
T. Mukherjee, A. Banerjee, G. Varsamopoulos, S. K. S. Gupta, and S. Rungta. 2009. Spatio-temporal thermal-aware job scheduling to minimize energy consumption in virtualized heterogeneous data centers. Computer Networks 53, 17 (2009), 2888-2904.
F. Mulas, M. Pittau, M. Buttu, S. Carta, A. Acquaviva, L. Benini, et al. 2008. Thermal balancing policy for
streaming computing on multiprocessor architectures. In Proceedings of the Design, Automation and
Test in Europe (DATE08). 734739.
S. Murali, A. Mutapcic, D. Atienza, R. Gupta, S. Boyd, L. Benini, et al. 2008. Temperature control of high-performance multi-core platforms using convex optimization. In Proceedings of the Conference on Design,
Automation and Test in Europe, Munich, Germany. 110115.
A. Naveh, E. Rotem, A. Mendelson, S. Gochman, R. Chabukswar, K. Krishnan, et al. 2006. Power and thermal
management in the Intel core duo processor. Intel Technology Journal, 10, 2 (2006).
L. Parolini, B. Sinopoli, and B. H. Krogh. 2008. Reducing data center energy consumption via coordinated
cooling and load management. In Proceedings of the Conference on Power Aware Computing and Systems.
14.
L. Parolini, B. Sinopoli, B. H. Krogh, and Z. Wang. 2012. A cyber-physical systems approach to data center
modeling and control for energy efficiency. Proceedings of the IEEE 100, 1 (2012), 254268.

T. Qinghui, S. K. S. Gupta, and G. Varsamopoulos. 2008. Energy-efficient, thermal-aware task scheduling for
homogeneous, high performance computing data centers: A cyber-physical approach. IEEE Transactions
on Parallel and Distributed Systems 19, 11 (2008), 14581472.
T. Qinghui, T. Mukherjee, S. K. S. Gupta, and P. Cayton. 2006. Sensor-based fast thermal evaluation model
for energy efficient high-performance datacenters. In Proceedings of the 4th International Conference on
Intelligent Sensing and Information Processing (ICISIP06). 203208.
D. Rajan and P. S. Yu. 2007. On temperature-aware scheduling for single-processor systems. In Proceedings
of the 14th International Conference on High Performance Computing. 342355.
D. Rajan and P. S. Yu. 2008. Temperature-aware scheduling: When is system-throttling good enough? In
Proceedings of the 9th International Conference on Web-Age Information Management. 397404.
I. Rodero, J. Jaramillo, A. Quiroz, M. Parashar, F. Guim, and S. Poole. 2010. Energy-efficient application-aware online provisioning for virtualized clouds and data centers. In Proceedings of the 2010 International Green Computing Conference. 31-45.
I. Rodero, E. K. Lee, D. Pompili, M. Parashar, M. Gamell, and R. J. Figueiredo. 2010. Towards energy-efficient
reactive thermal management in instrumented datacenters. In Proceedings of the 11th IEEE/ACM
International Conference on Grid Computing.
I. Rodero, H. Viswanathan, E. K. Lee, M. Gamell, D. Pompili, and M. Parashar. 2012. Energy-efficient
thermal-aware autonomic management of virtualized HPC cloud infrastructure. Journal of Grid Computing 10, 3 (2012), 447473.
Z. Rongliang, W. Zhikui, A. McReynolds, C. E. Bash, T. W. Christian, and R. Shih. 2012. Optimization
and control of cooling microgrids for data centers. In Proceedings of the 2012 13th IEEE Intersociety
Conference on Thermal and Thermomechanical Phenomena in Electronic Systems (ITherm12). 338343.
K. Sankaranarayanan. 2009. Thermal Modeling and Management of Microprocessors. Ph.D. Dissertation, Computer Science, University of Virginia. Retrieved from http://www.cs.virginia.edu/skadron/
Papers/sankaranarayanan_dissertation.pdf.
S. Sharifi, L. ChunChen, and T. S. Rosing. 2008. Accurate temperature estimation for efficient thermal
management. In Proceedings of the 9th International Symposium on Quality Electronic Design, 2008.
(ISQED08). 137142.
SIA. 2009. International Technology Roadmap for Semiconductors (ITRS). Retrieved from http://www.
itrs.net/reports.html.
J. W. Sofia. 1995. Fundamentals of thermal resistance measurement. Analysis Tech, 11, (1995).
StackExchange. 2013. Finding a CPU's Capacitive Load. Retrieved from http://electronics.stackexchange.
com/questions/82908/finding-a-cpus-capacitive-load.
C. Sun, L. Shang, and R. P. Dick. 2007. Three-dimensional multiprocessor system-on-chip thermal optimization. In Proceedings of the 5th IEEE/ACM International Conference on Hardware/Software Codesign
and System Synthesis. 117122.
C. S. Woo and K. Skadron. 2006. Using on-chip event counters for high-resolution, real-time temperature
measurement. In Proceedings of the 10th Intersociety Conference on Thermal and Thermomechanical
Phenomena in Electronics Systems (ITHERM06). 114120.
Q. Tang, S. Gupta, and G. Varsamopoulos. 2007. Thermal-aware task scheduling for data centers through
minimizing heat recirculation. In Proceedings of the 2007 IEEE International Conference on Cluster
Computing. 129138.
Q. Tang, S. K. S. Gupta, D. Stanzione, and P. Cayton. 2006. Thermal-aware task scheduling to minimize
energy usage of blade server based datacenters. In Proceedings of the 2nd IEEE International Symposium
on Dependable, Autonomic and Secure Computing. Indianapolis, IN. 195202.
TechTarget. 2011. Data Center Design Tips: What You Should Know About ASHRAE TC 9.9. Retrieved
from http://searchdatacenter.techtarget.com/tip/Data-center-design-tips-What-you-should-know-aboutASHRAE-TC-99.
V. Tiwari, D. Singh, S. Rajgopal, G. Mehta, R. Patel, and F. Baez. 1998. Reducing power in high-performance
microprocessors. In Proceedings of the 35th annual Design Automation Conference. 732737.
R. Tolosana-Calasanz, J. A. Bañares, C. Pham, and O. F. Rana. 2012. Enforcing QoS in scientific workflow systems enacted over Cloud infrastructures. Journal of Computer and System Sciences 78, 5 (2012), 1300-1315.
V. V. Vazirani. 2001. Approximation Algorithms. Springer-Verlag, New York, NY.
V. Venkatachalam and M. Franz. 2005. Power reduction techniques for microprocessor systems. ACM Computing Surveys 37, 3 (2005), 195-237.

K. Vikas. 2012. Temperature-aware virtual machine scheduling in green clouds. Master's Thesis,
Thapar University, Patiala, India. Retrieved from http://dspace.thapar.edu:8080/dspace/bitstream/
10266/1851/1/vikas+thesis-pdf.pdf.
R. Viswanath, V. Wakharkar, A. Watwe, and V. Lebonheur. 2000. Thermal performance challenges from
silicon to systems. Intel Technology Journal (2000).
H. Viswanathan, E. K. Lee, and D. Pompili. 2011. Self-organizing sensing infrastructure for autonomic
management of green datacenters. IEEE Network 25, 4 (2011), 3440.
VMware. 2006. Virtualization Overview. Retrieved from http://www.vmware.com/pdf/virtualization.pdf.
VMware. 2010. Host Power Management in VMware vSphere® 5. Retrieved from http://www.vmware.com/files/pdf/hpm-perf-vsphere5.pdf.
VMware. 2011. What's New in VMware vSphere™ 5.0 Platform. Retrieved from http://www.vmware.com/files/pdf/techpaper/Whats-New-VMware-vSphere-50-Platform-Technical-Whitepaper.pdf.
VMware. 2012. What's New in VMware vSphere™ 5.1 Platform. Retrieved from http://www.vmware.com/files/pdf/techpaper/Whats-New-VMware-vSphere-51-Platform-Technical-Whitepaper.pdf.
L. Wang, S. Khan, and J. Dayal. 2012. Thermal aware workload placement with task-temperature profiles
in a data center. Journal of Supercomputing 61, 3 (2012), 780803.
L. Wang, G. von Laszewski, J. Dayal, X. He, A. J. Younge, and T. R. Furlani. 2009. Towards thermal aware
workload scheduling in a datacenter. In Proceedings of the 10th International Symposium on Pervasive
Systems, Algorithms, and Networks. 116122.
M. Ware, K. Rajamani, M. Floyd, B. Brock, J. C. Rubio, F. Rawson, et al. 2010. Architecting for power management: The IBM® POWER7™ approach. In Proceedings of the 2010 IEEE 16th International Symposium on High Performance Computer Architecture (HPCA10). 1-11.
R. Waugh. 2011. That's Really Cool: Facebook Puts Your Photos Into the Deep Freeze as It Unveils
Massive New Five Acre Data Center Near Arctic Circle. Retrieved from http://www.dailymail.co.uk/
sciencetech/article-2054168/Facebook-unveils-massive-data-center-Lulea-Sweden.html.
H. Wei. 2008. Accurate, pre-RTL temperature-aware design using a parameterized, geometric thermal model.
IEEE Transactions on Computers 57, 9 (2008), 12771288.
L. Wu, S. K. Garg, and R. Buyya. 2012. SLA-based admission control for a Software-as-a-Service provider in
Cloud computing environments. Journal of Computer and System Sciences 78, 5 (2012), 12801299.
W. Xiaodong, W. Xiaorui, X. Guoliang, C. Jinzhu, L. Cheng-Xian, and C. Yixin. 2013. Intelligent sensor
placement for hot server detection in data centers. IEEE Transactions on Parallel and Distributed
Systems 24, 8 (2013), 15771588.
Z. Xiuyi, X. Yi, D. Yu, Z. Youtao, and Y. Jun. 2008. Thermal management for 3D processors via task scheduling.
In Proceedings of the 37th International Conference on Parallel Processing (ICPP08). 115122.
I. Yeo, C. C. Liu, and E. J. Kim. 2008. Predictive dynamic thermal management for multicore systems. In
Proceedings of the 45th Annual Design Automation Conference. 734739.
C. Yuan, D. Gmach, C. Hyser, Z. Wang, C. Bash, C. Hoover, et al. 2010. Integrated management of application performance, power and cooling in data centers. In Proceedings of the Network Operations and
Management Symposium (NOMS10). 615622.
F. Zanini, D. Atienza, and D. G. Micheli. 2009. A control theory approach for thermal balancing of MPSoC.
In Proceedings of the Asia and South Pacific Design Automation Conference (ASP-DAC09). 3742.
S. Zhang and K. S. Chatha. 2007. Approximation algorithm for the temperature-aware scheduling problem.
In Proceedings of the 2007 IEEE/ACM International Conference on Computer-Aided Design. 281288.
W. Zhikui, C. Yuan, D. Gmach, S. Singhal, B. J. Watson, W. Rivera, et al. 2009. AppRAISE: Application-level performance management in virtualized server environments. IEEE Transactions on Network and
Service Management 6, 4 (2009), 240254.
S. Zhuravlev, J. Saez, S. Blagodurov, A. Fedorova, and M. Prieto. 2012. Survey of energy-cognizant scheduling
techniques. IEEE Transactions on Parallel and Distributed Systems 24, 7 (2012), 14471464.
Received January 2013; revised April 2014; accepted October 2014
