Title: Cost Implications and Size Estimation of Cloud Migration Projects with Cloud Migration Point
Cloud computing has been a buzzword over the last decade. It offers great potential benefits for enterprises that migrate their
computing systems from local data centers to a Cloud environment. One major obstacle to enterprise adoption of Cloud
technologies has been the lack of visibility into migration effort and cost, and there is currently very little existing work on this
topic in the literature. This thesis improves our understanding of this matter by identifying critical indicators of Cloud migration effort.
A taxonomy of migration tasks to the Cloud has been proposed, outlining possible migration tasks that any migration project to the
Cloud may encounter. It enables Cloud practitioners to gain an understanding of the specific tasks involved and their implications for
the amount of effort required. A methodology called Cloud Migration Point (CMP) is presented for estimating the size of Cloud
migration projects, by recasting a well-known software size estimation model, Function Point, into the context of Cloud migration.
The CMP value indicates how large the migration project is, and it can be used as an indicator for Cloud migration effort estimation.
The process of calculating CMP also assists one in itemizing the migration tasks and identifying the complexity of each task, which
is useful for project planning and management. Empirical validation on the set of data points collected from our survey shows
that, with some calibration, the CMP metric is practically useful as a predictor for effort estimation under a defined set of
assumptions. Besides size measurement, other factors also influence the migration effort. We propose a list of external cost
factors, which do not affect how migration tasks are designed but may affect how fast migration tasks can be done, such as the
development team's experience in software engineering or with the Cloud.
Our overall contribution is to shed light on Cloud migration and the tasks involved, which enables Cloud practitioners to estimate
the amount of effort required for the migration of legacy systems into the Cloud. This contributes towards the cost-benefit analysis
of whether the benefits of the Cloud exceed the migration effort and other Cloud costs.
I hereby grant to the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or in
part in the University libraries in all forms of media, now or hereafter known, subject to the provisions of the Copyright Act 1968. I retain all
property rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
I also authorise University Microfilms to use the 350-word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral
theses only).
The University recognises that there may be exceptional circumstances requiring restrictions on copying or conditions on use. Requests for
restriction for a period of up to 2 years must be made in writing. Requests for a longer period of restriction may be considered in exceptional
circumstances and require the approval of the Dean of Graduate Research.
Dedication
I am most indebted to my two supervisors Dr. Anna Liu and Dr. Ray-
mond Wong for their guidance and close supervision over the years.
Dr. Raymond Wong was very encouraging and patient in walking me
through the very first steps in my research journey. Dr. Anna Liu
has inspired me in so many ways. Her tremendous support, care and
understanding made it possible for me to continue this research. I am
Alan Fekete and Kevin Lee, whose constructive and insightful feed-
back have benefited this research and myself in many ways. I would
like to give my sincere thanks to Professor Barbara Kitchenham for
her reviews and expert advice, which greatly improved this
research. This thesis would not have been possible without their encouragement
and support.
ups and downs during our Ph.D. journey. I thank my best friends
for their wonderful friendship, especially Jensyn for spending a lot of
minton group for all the entertainment and sport activities that got
me through the difficult times.
little sister, my brother and his family, who have always believed in
me and supported me unconditionally; and my husband, Daniel, for
his endless love and for always being there for me during both happy
• Van Tran, Jacky Keung, Anna Liu, and Alan Fekete: “Application Migration to Cloud: A Taxonomy of Critical Factors”, in
Proceedings of the 2nd International Workshop on Software En-
• Van Tran, Kevin Lee, Alan Fekete, Anna Liu, Jacky Keung: “Size Estimation of Cloud Migration Projects with Cloud Migration Point (CMP)”, in Proceedings of the 5th International
Contents
List of Figures
Glossary
1 Introduction
1.2 Definitions
2 Literature Review
2.4 Summary
3 Research Methodology
3.2.1 Participants
3.3.1 Objectives
3.4 Summary
4.5 Validation
4.7 Summary
6 Validation
Bibliography
List of Figures
6.1 The boxplots for the six training datasets of variable CMP
List of Tables
6.8 Data points for calibrating network connection component weights
6.14 New dataset - calculated from the new set of calibrated weights
Glossary
FP Function Point
Chapter 1
Introduction
Cloud computing has recently been the focus of much excitement in the
IT community, seen by some as the next platform shift (Erdogmus, 2009), with
impact on enterprise computing that could compare to the change from main-
discussing national agendas for the coming shift, and start-ups are growing to fill
While some software is written from scratch specially for the Cloud, many
qualities. An indication of how much effort is anticipated for the migration process
is important for project management, particularly project scheduling and budget
Some common terms used throughout this thesis will be clarified in Section
1.2. A broad overview of our work will be presented in Sections 1.3 and 1.4.
Section 1.5 will introduce our research methodology, and Section 1.6 will
outline how this thesis is structured.
1.1 Background and Motivation
Since its emergence over the last decade, Cloud computing has been well recognized for its abilities to provide virtualized resources and services, such as
One of the attractions for an organization using Cloud resources, rather than
those in an enterprise-scale data center, is that it can enjoy cost savings through
larger economies of scale, since the costs of hardware, power, buildings and admin-
istrative support are typically about 5 times lower for internet-scale systems than
for enterprise-scale ones (SalesForce, 2012; Aggarwal & McCabe, 2009). Even
more significant to a rapidly-growing business is the elasticity of costs; instead of
the up-front purchase of an overprovisioned system, one can pay a Cloud provider
ongoing fees that are low at first, and that smoothly increase as and when the
system needs more capacity. Therefore, Cloud users are neither required to plan
for provisioning nor tied to huge up-front commitments on hardware resources and
infrastructure. This enables companies to start small and acquire more resources
only when needed on a short-term basis (e.g. hourly processors and daily storage),
and reward conservation by releasing computing machines and storage when they
For established businesses, there is the potential to use the Cloud as an ad-
ditional resource (alongside existing data centers) to deal with bursts of load,
perhaps seasonal, or due to intermittent activities such as stress testing. Here
the Cloud allows the client to delay the large commitment of funds needed to
scale up the hardware. Many applications are not heavily used all the time;
more often than not, they are under-utilized. In other words, the resource
usage pattern is not stable over time. There are times when resources sit
idle, and other times (peak times) when they are heavily used. In order to
accommodate peak-time usage, enterprises have no better choice than to invest
a huge amount of resources to be ready for peak periods, which at other times
sit idle and are wasted. Cloud providers address this issue with their on-demand
resource offers. Cloud consumers pay only for the resources they use during an
average period, while at peak times they can obtain additional resources on
demand. For example, online shopping systems see normal use over the year,
which may accumulate to around 2-3 months' worth of resource usage in total.
With the Cloud, they pay only for those 2-3 months of actual usage, rather than
overpaying for a whole year as they would if the applications were managed in
house. During the Christmas period, resource demand may increase to more than
10 times the normal level, and this can be accommodated promptly by Cloud
providers. After peak times, resources are released back to the providers, and
charges drop back to normal because of the pay-per-use pricing model of the
Cloud (Armbrust et al., 2009).
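The arithmetic behind the online shopping example can be sketched as below. This is a minimal illustration under several assumptions. The shop is hypothetical: its normal load needs one unit of capacity per month and its December peak needs ten. A unit is assumed to cost the same in house and in the Cloud, for simplicity. The function names and the price are illustrative and are not taken from any provider's price list.

```python
# Illustrative comparison of provisioning for peak load in house versus
# paying per use in the Cloud. All quantities and prices are hypothetical.


def in_house_cost(peak_units: int, unit_cost_per_month: float, months: int = 12) -> float:
    """In house, capacity is sized for the peak and paid for every month."""
    return peak_units * unit_cost_per_month * months


def pay_per_use_cost(monthly_demand: list[int], unit_cost_per_month: float) -> float:
    """In the Cloud, only the capacity actually used each month is billed."""
    return sum(units * unit_cost_per_month for units in monthly_demand)


# Hypothetical demand: 1 unit for 11 months, 10 units in December.
demand = [1] * 11 + [10]
rate = 100.0  # hypothetical price of one capacity unit for one month

print(in_house_cost(max(demand), rate))  # 12000.0
print(pay_per_use_cost(demand, rate))    # 2100.0
```

Cloud and in-house unit prices generally differ, so the real gap depends on the rates involved. The sketch only shows that pay-per-use charges follow the demand curve rather than its peak.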
Cloud Services:
Cloud computing has been seen to offer a wide variety of services, such as
application services, storage services, compute services, and database services
(Amazon, 2009; Google, 2009; Microsoft, 2009; Agrawal et al., 2009; Armbrust
et al., 2009; Buyya et al., 2008; Chang et al., 2006; Chappell, 2008; Ghemawat
et al., 2003; Palankar et al., 2008). These services are accommodated by different
Cloud technologies. Understanding the Cloud technology stacks and their inter-
relations enables the Cloud community to provide better solutions, portals and
gateways for the Cloud, which facilitate the adoption of this emerging computing
paradigm. Hence, there exist several attempts to create a reference model of
Cloud computing (Ji et al., 2009; Mikkilineni & Sarathy, 2009; Youseff et al., 2008;
Lenk et al., 2009) to classify Cloud technologies and services into different layers.
Different proposals tackle different aspects of the Cloud ontology; however, they
all use the same basic model with three main common layers: Infrastructure-as-
• IaaS: The infrastructure layer provides basic physical resources and data
storage with virtualization services. The physical units are hardware re-
software are required for this layer to provide Cloud users with a highly
scalable and manageable basic environment. An example of this layer is
• PaaS: The platform layer works independently with physical resources from
the infrastructure layer. This increases the scalability of the Cloud. This
across multiple physical nodes (e.g. Google File System, Hadoop Dis-
tributed File System);
Examples of this layer are Google App Engine and Microsoft Azure (Google,
Cloud Vendors:
There are a few major Cloud vendors in the market, such as Amazon, Mi-
tional to NoSQL databases. In this section, we will describe three main providers:
Amazon, Microsoft, and Google (Figure 1.2) (Amazon, 2009; Microsoft, 2009;
Google, 2009).
• Amazon: Amazon offers an IaaS solution for Cloud computing called Ama-
zon Web Services (AWS) (Amazon, 2009). AWS provides a range of Cloud-
ronment, where users can launch instances with various operating systems
of their choice. The users are given complete freedom to manage their ap-
Database services from AWS are also well-known to Cloud users. Ama-
to enable high availability and data durability. Without any database ad-
can be delivered as a SaaS for its flexibility and scalability features. Developing
applications for the Windows Azure environment is much like developing
programs for standard Windows applications on a local environment. New
databases.
Python. This SDK is available as a plugin for Eclipse, the most commonly
used Integrated Development Environment (IDE) for Java. Any of the Java
putational machine (per hour) and data transfer into and out of the cloud. In
addition, charges may occur for additional administrative tools and services such
Tables 1.1 and 1.2 compare the pricing models for some of the main computational
and storage charges of the three major Cloud providers: Amazon, Microsoft and
Google (as of January 2012).
Table 1.1: Pricing model comparison: Service charge (Amazon, 2009; Microsoft,
2009; Google, 2009)
to occur when using local servers, such as operational costs or upgrade and main-
tenance costs. Operational costs include power and electricity cost, premises
rental cost, administration staff cost, networking infrastructure cost, and so on.
Upgrade and maintenance costs include new hardware and middleware costs, new
software costs, new license costs and additional labor costs for installation and
configuration.
When a Cloud is made accessible to the public via its pay-per-use pricing mod-
that are not available to the public are referred to as Private Cloud (Armbrust
et al., 2009).
Table 1.2: Pricing model comparison: Storage Cost (Amazon, 2009; Microsoft,
2009; Google, 2009)
Summary:
for both providers and users. Cloud computing offers scalable computing re-
sources available on demand without up-front commitment for its users, freeing
costing less than a medium-sized datacenter, while still generating a good profit
(Google, 2011) reflects this, with a graph showing increasing interest in Cloud
computing over the last several years, from late 2007 to the present. The
top graph is the search volume index in Google search engine for “Cloud Comput-
ing”, which represents how many searches have been done for this term, relative
to the total number of searches done on Google over time. The bottom graph
shows its news reference volume over years, which represents the number of times
it appeared in Google News stories.
As has been discussed in many articles, papers, case studies, and blogs, there
are many ways one can use Cloud services (Kundra, 2010; Khajeh-Hosseini et al.,
2010a; Hajjat et al., 2010; Ward et al., 2010). For example, many of the most
famous stories of Cloud computing have been about startups with explosive
the Cloud (Microsoft, 2012; Amazon, 2011). One can also take advantage of
(CRM) applications. However, there are cases where an organization has exist-
ing application software and wants to run this on a Cloud platform. Instead of
a complete rewrite, one could say that they are “migrating” the software from a
The migration case is quite practical and popular, since it is likely that cur-
rently operating businesses already have their own IT systems developed and in
use, whereas Cloud computing is relatively new. A migration project to the Cloud
can be carried out in various forms, as described in the illustrative case studies
at the Federal, state and local government levels of the United States (Kundra,
2010). For example, since 2009, the Department of Energy has been exploring
cost and energy efficiencies from leveraging Cloud computing, such as deploying
mailboxes on Google Federal Premier Apps, Google Docs and Google Sites, as
well as evaluating the use of Amazon EC2 to handle peak usage periods. This
migration spans a wide range of migration activities, from SaaS to IaaS
Cloud. Other case studies, such as the City of Miami, Florida, described their
decision to use Windows Azure platform for on-demand hosting in Microsoft data
centers. This type of migration is different from the case of the Department of
Many papers have also illustrated case studies where enterprises are keen on
a case study of a UK-based organization that provides IT solutions for the Oil and
Gas industry. This organization was considering deploying one of their primary
These are some representative examples to illustrate how enterprises are en-
couraged to move to the Cloud. More detailed discussion will be presented in
Chapter 2.
In general, porting IT systems to the Cloud is an active and increasingly
popular practice.
Although the migration process is a one-off task, it is not automatic, as can be seen
from the above migration examples. Because some installations in the IaaS Cloud
must be done or modifications to the existing systems are unavoidable, and the
platform (Verma et al., 2011). There are often differences in the version of various
infrastructure components, the programming models, the libraries available, and
even the semantics of data access; for example, Cloud platforms typically provide
eventual consistency rather than transactional guarantees. All these extra tasks
of the migration process to the Cloud may not be as easy and straightforward as
Migration costs also contribute towards the Overhead Cost component (Figure
1.4) of the cost-benefit analysis (Carriere et al., 2010; de Assuncao et al., 2009)
and decision making process on whether it is worthwhile to migrate a system to
the Cloud.
Figure 1.4 illustrates the analysis of cost and benefits in two options: (1)
migrating an existing application to the Cloud, and (2) keeping the application
on premises. If one decides to go with option (1), one has to pay a total cost made
up of the application development cost, the migration cost (or overhead cost), and
the on-going cost paid to the Cloud providers. Otherwise, keeping the application in-house
incurs costs of application development (which is similar to option (1)), and
2009).
Figure 1.4: Cost and benefit of migrating existing applications into the Cloud
then migrating the application to the Cloud is a wise move. Otherwise, keep-
ing the application in house is more beneficial.
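The comparison in Figure 1.4 can be sketched as follows. This is a minimal illustration resting on three assumptions. First, both options are costed over the same period. Second, the in-house on-going cost is taken as the sum of the operational and upgrade/maintenance costs listed earlier. Third, the decision rule is simply to migrate when option (1) is cheaper overall. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CloudOption:
    """Option (1): migrate the existing application to the Cloud."""
    development: float
    migration: float   # the Overhead Cost, driven by migration effort
    cloud_fees: float  # on-going charges paid to the Cloud providers

    def total(self) -> float:
        return self.development + self.migration + self.cloud_fees


@dataclass
class OnPremiseOption:
    """Option (2): keep the application in house."""
    development: float
    operational: float          # power, premises, admin staff, networking, ...
    upgrade_maintenance: float  # hardware, middleware, software, licences, labour

    def total(self) -> float:
        return self.development + self.operational + self.upgrade_maintenance


def should_migrate(cloud: CloudOption, on_premise: OnPremiseOption) -> bool:
    """Assumed decision rule: migrate only if option (1) costs less overall."""
    return cloud.total() < on_premise.total()


cloud = CloudOption(development=500.0, migration=120.0, cloud_fees=200.0)
on_premise = OnPremiseOption(development=500.0, operational=250.0, upgrade_maintenance=150.0)
print(cloud.total(), on_premise.total(), should_migrate(cloud, on_premise))  # 820.0 900.0 True
```

The development cost appears in both options, so it cancels out. The decision therefore hinges on whether the migration (overhead) cost plus the Cloud fees stays below the in-house running costs. In this example, raising the migration cost from 120 to 250 reverses the outcome, which is why an early estimate of migration effort matters.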
The Overhead Cost component plays an important role in this analysis, and
it essentially consists of the cost of the migration effort. Hence, early effort
1.2 Definitions
There exists related work on migrating a system to the Cloud; however, the notion
of Cloud migration can still vary. For example, in Suen et al. (2011) it refers to the
live migration of virtual machine images between different Cloud providers, as well
as between private and public Cloud offerings. Hence, it is worth clarifying the
meaning of Cloud migration concepts, as well as some other common terms used
throughout this thesis. These definitions are based on this research's
activities.
cation from local data centers to the Cloud, without sacrificing any performance
attributes. The system can be migrated to the Cloud partially (i.e., only a part
of the system is moved to the Cloud, the rest is still hosted in-house, and the two
parts must work seamlessly together), or as a whole (i.e., the whole system is
ported to the Cloud). The former is called a partial migration, and the latter is
called a full migration.
Cloud.
to the Cloud.
For example, when migrating a Microsoft SQL Server database to SQL Azure,
moving the data is a migration task, and any change to the database schema is
also a migration task.
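The definitions of partial and full migration, and of migration tasks, can be captured in a small data model. The sketch below is only an illustration. It assumes that a system can be described as a set of named components. The class names, component names and task descriptions are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationTask:
    """A single activity needed to make a component work in the Cloud."""
    description: str


@dataclass
class MigrationProject:
    components: set[str]      # all components of the legacy system
    moved_to_cloud: set[str]  # components ported to the Cloud
    tasks: list[MigrationTask] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.moved_to_cloud:
            raise ValueError("a migration project must move at least one component")
        if not self.moved_to_cloud <= self.components:
            raise ValueError("moved components must belong to the system")

    @property
    def kind(self) -> str:
        """'full' if every component moves to the Cloud, otherwise 'partial'."""
        return "full" if self.moved_to_cloud == self.components else "partial"


# Hypothetical partial migration: only the database moves to SQL Azure.
project = MigrationProject(
    components={"web front end", "database"},
    moved_to_cloud={"database"},
    tasks=[
        MigrationTask("Move the data from SQL Server to SQL Azure"),
        MigrationTask("Change the database schema where SQL Azure requires it"),
    ],
)
print(project.kind, len(project.tasks))  # partial 2
```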
Migration cost and migration effort are used interchangeably in this thesis.
They both refer to the amount of effort spent on migration activities.
Overhead cost is used in our analysis of cost and benefit of migrating a sys-
tem to the Cloud. The overhead cost refers to the cost of the actual migration
Those are some common concepts that will be used regularly in this thesis.
• Cloud computing is relatively new and different from the traditional soft-
ware engineering paradigm in many aspects, such as characteristics, pricing
To the best of our knowledge, at the time of writing, no effort estimation ap-
proaches have been specifically designed for Cloud migration projects. Existing
traditional effort estimation approaches for software development are not appli-
place. This strongly motivates us to, firstly, understand and evaluate the critical
cost factors of the migration process, in order to estimate how much effort would
how large a migration project to the Cloud is, and which will serve as a basic
indicator for effort estimation approaches.
• RQ1: What activities are needed to migrate a software system to the Cloud?
• RQ3: What are the cost implications (in terms of staff effort) of those tasks?
the Cloud. The project ends with the same application or system, either
completely or partially migrated to the Cloud.
the migration. Having said that, some migration activities may involve code
modification to adapt the system to the new environment without adding
more functionality.
• Our study focuses on the migration effort to the Cloud from the consumer’s
point of view; hence, only migration activities carried out by Cloud users
are taken into consideration. In a migration project to an SaaS Cloud,
as an obvious trade-off, restricts their flexibility and control over the systems
in the Cloud. Hence, SaaS is deliberately removed from the scope of our
work. On the other hand, migration projects to PaaS and IaaS clouds
work in this thesis is limited to migration projects to PaaS and IaaS Cloud
platforms, but not SaaS Clouds.
• The migration is between two data centers only (typically, one in-house
and one in-Cloud). We assume that migration projects are directional (i.e.
components are moved from local to remote data centers in the Cloud). In
the case where two or more data centers are involved, each pair of data
centers will be assumed to form a separate migration project.
• We assume that the Cloud target has already been selected. We only focus
on the migration process itself; hence, the decision on which Cloud platform
to choose is out of the scope of this thesis. Having said that, applying our
study to each Cloud platform could assist this decision.
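The two-data-center assumption can be made concrete as follows. A migration involving several data centers is decomposed into separate directional projects, one per (in-house, Cloud) pair. This is a sketch under that assumption only. The function name and the data-center labels are hypothetical.

```python
from collections import defaultdict


def split_into_projects(
    moves: dict[str, tuple[str, str]],
) -> dict[tuple[str, str], list[str]]:
    """Group component moves into one directional project per data-center pair.

    `moves` maps a component name to (source data center, target Cloud data center).
    """
    projects: dict[tuple[str, str], list[str]] = defaultdict(list)
    for component, (source, target) in moves.items():
        if source == target:
            raise ValueError(f"{component}: source and target data centers must differ")
        projects[(source, target)].append(component)
    return dict(projects)


moves = {
    "web": ("dc-local-1", "cloud-dc-a"),
    "db": ("dc-local-1", "cloud-dc-a"),
    "reports": ("dc-local-2", "cloud-dc-a"),
}
print(split_into_projects(moves))
# {('dc-local-1', 'cloud-dc-a'): ['web', 'db'], ('dc-local-2', 'cloud-dc-a'): ['reports']}
```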
The items presented above form the scope and assumptions of this research.
migration case studies in the literature and practitioners’ blogs, as well as con-
ducting a series of migration exercises of different types which will be discussed
further in Chapter 4.
From this exploration, a taxonomy of migration tasks is extracted in Step (2).
A record of the required cost (in terms of effort) is carefully tracked, together
with a note about which tasks require more effort than others.
There are many cost factors that influence Cloud migration effort, among
which size measurement is seen as one of the most significant for effort
estimation. Traditional size measurements, such as Source Lines of Code (SLOC),
Function Points (FP) and its extensions, are not applicable in the context of
estimation.
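For reference, the Function Point model that CMP later recasts sizes a system as a weighted count of its function types, optionally scaled by a value adjustment factor (VAF). The sketch below uses the commonly published IFPUG complexity weights and the formula VAF = 0.65 + 0.01 × (sum of 14 general system characteristic ratings, each from 0 to 5). The counts in the example are hypothetical. The code illustrates classic FP, not CMP.

```python
# Classic (IFPUG-style) Function Point counting, shown for comparison with CMP.
WEIGHTS = {
    "EI": {"low": 3, "average": 4, "high": 6},     # external inputs
    "EO": {"low": 4, "average": 5, "high": 7},     # external outputs
    "EQ": {"low": 3, "average": 4, "high": 6},     # external inquiries
    "ILF": {"low": 7, "average": 10, "high": 15},  # internal logical files
    "EIF": {"low": 5, "average": 7, "high": 10},   # external interface files
}


def unadjusted_fp(counts: dict[tuple[str, str], int]) -> int:
    """Sum of (number of items) x (weight) over each function type and complexity."""
    return sum(n * WEIGHTS[ftype][level] for (ftype, level), n in counts.items())


def adjusted_fp(ufp: int, gsc_ratings: list[int]) -> float:
    """Scale the unadjusted count by the value adjustment factor."""
    if len(gsc_ratings) != 14 or not all(0 <= r <= 5 for r in gsc_ratings):
        raise ValueError("expected 14 ratings, each between 0 and 5")
    vaf = 0.65 + 0.01 * sum(gsc_ratings)
    return ufp * vaf


# Hypothetical system: 5 average inputs, 3 low-complexity outputs, 2 average files.
ufp = unadjusted_fp({("EI", "average"): 5, ("EO", "low"): 3, ("ILF", "average"): 2})
print(ufp)                                   # 52
print(round(adjusted_fp(ufp, [3] * 14), 2))  # 55.64
```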
The validation in Step (5) is to ensure that CMP can be a reliable indicator
of effort estimation for Cloud migration projects. Data for this validation process
is not publicly available; hence, we conduct a survey in Step (4) to facilitate the
validation process.
step of the research process will be elaborated and mapped to each component
of this thesis.
• Chapter 5 tightens our focus on the most dominant indicator of effort estimation:
size measurement. This chapter describes our CMP model, built
from recasting a well-known software size estimation model called Func-
tion Point (FP) into the context of cloud migration. We adopt the three-
for the new environment in the Cloud. For each component, we perform an
estimation by identifying relevant activities that contribute to the overall
effort required for that component. Finally, we aggregate all individual es-
timations into a single CMP value by calculating their weighted sum. The
weighted sum CMP provides a measure of how large the migration project
is, and it can be used as an indicator for Cloud migration effort estimation.
shows that our metric is practically useful as a basis for effort estimation
under a defined set of assumptions. We conducted a survey with Cloud
survey has allowed us to calibrate the CMP model to increase its external
validity. In this chapter, we also state a list of assumptions made for developing
the model, and test their plausibility using the available data. This
list of assumptions implies the high complexity and difficulty of validating
the metric.
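The aggregation step described for Chapter 5 can be sketched as a weighted sum. This is only an illustration. It assumes that each component's estimate is the sum of scores assigned to its activities. The component names, activity scores and weights below are hypothetical placeholders, not the calibrated CMP values from Chapter 6.

```python
def component_score(activity_scores: list[float]) -> float:
    """Estimate for one migration component: the sum of its activity scores."""
    return sum(activity_scores)


def cloud_migration_point(activities: dict[str, list[float]],
                          weights: dict[str, float]) -> float:
    """Weighted sum of component estimates (a sketch, not the calibrated model)."""
    missing = activities.keys() - weights.keys()
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * component_score(scores)
               for name, scores in activities.items())


# Hypothetical components, activity scores and weights.
activities = {
    "network connection": [3, 4],
    "code modification": [5, 6, 4],
    "database migration": [7],
}
weights = {"network connection": 1.0, "code modification": 1.5, "database migration": 2.0}
print(cloud_migration_point(activities, weights))  # 43.5
```

In this sketch, calibration would amount to adjusting the entries of `weights` so that the resulting values track the effort observed in real projects.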
Chapter 2
Literature Review
“Not everything that counts can be counted, and not everything that
can be counted counts.”
Effort estimation and size measurement of software projects have been interesting
and challenging areas in traditional software engineering. There has been a
lot of related work in the traditional context; however, none of it has addressed
the new setting of Cloud computing. The aim of this literature review is to
examine existing research related to the Cloud migration topic, as well as effort
estimation and size measurement metrics, with consideration of their applicability
to Cloud computing.
The following sections cover a number of issues important for this thesis: Section
2.1 reviews other research related to the Cloud migration topic, with regard to
their migration concerns (i.e., risk management, cost saving, and performance).
ing and Section 2.3 explores existing size measurement metrics, including Source
Lines of Code, Function Points and its various extensions. This section will also
state the requirements for a sizing metric for Cloud migration, and explain why
none of the existing approaches meet these requirements; hence, a new metric is
needed. Lastly, Section 2.4 summarizes and concludes this chapter.
2.1 Cloud Migration Solutions
There have been many publications and much research dealing with various aspects of
Cloud computing, such as Cloud computing architectures, Security and Privacy in
from others. The following sub-sections present different streams of the related work in the
literature.
Although there are many benefits associated with the Cloud, whether it is worth
moving an existing working system to the Cloud is still an open question for
enterprises. As cost and benefit analysis is an important tool for IT managers
prise IT system in the oil and gas industry from a local data center to Amazon
EC2. Their findings indicate that there are significant risks associated with the
ments because Cloud providers will be responsible for their daily tasks, and so
on.
assist decision makers by producing cost estimates of using public IaaS Clouds,
as well as outlining benefits and risks of using IaaS Clouds from an enterprise
perspective. They also explicitly stated that a limitation of their work is that it
focuses only on infrastructure cost and ignores the actual migration work, which
could be significant.
Mastroeni & Naldi (2011) also assessed the risks involved in the decision to
migrate to Cloud storage rather than the alternative of buying the storage devices
and facilities, based on different decision variables, while Yam et al. (2011) and
Hajjat et al. (2010) addressed this from the uncertainty angle, including security
Another important criterion that affects the decision to migrate to the Cloud
is cost savings. It is essential to understand how cost effective it can be to migrate
to the Cloud, as opposed to staying in house. The work of Hajjat et al. (2010)
part of the system is migrated to the Cloud, while the other part stays in house.
This model takes into consideration the cost savings that may result from the
migration. This cost is essentially the Internet communication cost. They briefly
mentioned that the one-time cost of the actual migration process can also be
easily incorporated in the model; however, there was no further discussion of how
the Cloud, and a cost model (i.e., communication cost) and the decision algorithm
were designed to evaluate the tradeoffs on service selection and migration. Apart
from communication cost, reconfiguration cost also caught the attention of some
researchers. Verma et al. (2011) designed a model, called CosMig, to model the
cost of frequently reconfiguring a Cloud infrastructure and evaluate its impact on
application performance. These factors are considered to be the cost of using the
Cloud.
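The idea of treating Internet communication as the recurring cost of a partial migration can be illustrated with a toy model. In this model, traffic between two components is charged only when one of them is in house and the other is in the Cloud. This is a generic sketch, not the model of Hajjat et al. (2010). The component names, traffic volumes and per-GB price are hypothetical.

```python
def communication_cost(placement: dict[str, str],
                       traffic_gb: dict[tuple[str, str], float],
                       price_per_gb: float) -> float:
    """Cost of the traffic that must cross the in-house/Cloud boundary.

    `placement` maps each component to "local" or "cloud"; `traffic_gb` maps a
    pair of components to the data volume (GB per month) exchanged between them.
    """
    crossing_gb = sum(volume for (a, b), volume in traffic_gb.items()
                      if placement[a] != placement[b])
    return crossing_gb * price_per_gb


traffic = {("web", "app"): 500.0, ("app", "db"): 200.0}
# Move the web and application tiers, keep the database in house.
print(communication_cost({"web": "cloud", "app": "cloud", "db": "local"}, traffic, 0.5))  # 100.0
# Move only the web tier.
print(communication_cost({"web": "cloud", "app": "local", "db": "local"}, traffic, 0.5))  # 250.0
```

Comparing placements in this way shows why the choice of which components to move affects the running cost of a partial migration. It does not capture the one-off migration effort that this thesis is concerned with.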
Li et al. (2011a) and some other researchers (Ye et al., 2011; Ho et al., 2011;
Mastroeni & Naldi, 2011) identified cost savings from the perspective of Cloud
price and server bandwidth. They compare the price of different Cloud providers,
as well as the cost difference between using the Cloud and staying in house. This
cost as the combination of a number of direct costs (e.g. facility, energy, cables and
servers) and indirect costs (e.g. cost from failing to meet business objectives).
However, the list of cost components in this framework is incomplete for both
direct and indirect costs. Furthermore, it does not indicate how these costs can
only considers a portion of Cloud cost and focuses specifically on response time
benefits. This is sufficient to analyse costs and benefits amongst the proposed
scheduling strategies of using Clouds, but cannot be applied to a wider scope of
Conclusion:
looks at the cost of migrating a system to the Cloud. However, the research is
not related to the cost associated with the migration process; it refers to the cost
of using the Cloud assuming that the migration has been done. This differentiates
the focus of our work from others, since our work is concerned with the cost of
Apart from decision making support, a few researchers have reported on their
experiences of migrating a system to the Cloud. Babar & Chauhan (2011) and
Chauhan & Babar (2011) reported their experiences and observations of migrating
Hackystat, an open source software product, to the Cloud. The focus of this
On the other hand, the experience presented by Thakar & Szalay (2010) dis-
cussed migrating the Sloan Digital Sky Survey science archive, a scientific astro-
nomical database to the Cloud. Their exercise resulted in a strong finding that
to the Cloud (such as Amazon EC2 or Microsoft SQL Azure) without changing
either its schema or its settings. Our finding, which will be discussed later in
Chapters 4 and 5, strongly agrees with this observation.
Conclusion:
this topic. However, researchers have only reported preliminary results of their
experiences. It is still necessary to have a guideline for the migration process, in
order to enable practitioners to better plan their own migration process.
This section reviews and categorizes several issues concerning the migration pro-
cess that have been raised in some related research.
• Data Migration
Data transfer between local data centers and the Cloud can affect the overall
address this issue during the migration, for example, Piao & Yan (2010)
Zhang et al. (2010) took a closer look into application specific workload
• Performance
the migration of legacy systems to the Cloud (Frey & Hasselbring, 2011;
(2011), on the other hand, found that configurations for some environments
simply do not work in other Cloud environments. Hence, during
Other performance issues have also been raised and discussed, such as:
cerns include target system database population (Bisbal et al., 1997, 1999).
Smith (2007) shared the same view, listing migration concerns such as:
identification of specific components to migrate, recommendations on the
Conclusion:
Some issues related to the migration process may incur extra cost and
require extra effort, such as data and database migration, networking or Internet
communication, Cloud infrastructure configuration, or re-engineering the application
for the Cloud. It is also essential for any migration project (not just to the Cloud)
to have a roadmap to follow.
2.2 Effort Estimation in Traditional Software Engineering
They can be categorized into three general types: analogy, expert judgement, and
algorithmic models (Jorgensen & Shepperd, 2007; Boehm et al., 2000; Shepperd
& Schofield, 1997; Keung et al., 2008; Helmer, 1966; Baird, 1989; Banker et al.,
1991).
Effort estimation using analogy is the approach where a problem is solved using
knowledge derived from similar problems (Shepperd & Schofield, 1997; Keung
understood domains because solutions are based upon what has actually happened.
Even so, this approach is not yet applicable to the Cloud context because
the range of completed migration projects is still limited, and it is not obvious
where and how similar projects can be identified.
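The following sketch illustrates how analogy-based estimation works in principle. It finds the k completed projects most similar to a new one and averages their effort. This is a minimal nearest-neighbour sketch in the spirit of Shepperd & Schofield (1997), not their tool. The project features (the hypothetical `tables` and `pages` counts), the min-max normalisation, the Euclidean distance, and the choice of k are all illustrative assumptions.

```python
import math


def estimate_by_analogy(target, history, k=2):
    """Estimate effort as the mean effort of the k past projects most similar to target.

    target:  dict mapping feature name -> value for the new project
    history: list of (features, effort_hours) pairs for completed projects
    """
    if not history:
        raise ValueError("analogy needs at least one completed project")
    names = list(target)
    # Min-max range of each feature, so features on large scales do not dominate.
    lo = {n: min([f[n] for f, _ in history] + [target[n]]) for n in names}
    hi = {n: max([f[n] for f, _ in history] + [target[n]]) for n in names}

    def scaled(n, value):
        span = hi[n] - lo[n]
        return 0.0 if span == 0 else (value - lo[n]) / span

    def distance(features):
        # Euclidean distance between normalised feature vectors.
        return math.sqrt(sum((scaled(n, features[n]) - scaled(n, target[n])) ** 2
                             for n in names))

    nearest = sorted(history, key=lambda item: distance(item[0]))[:k]
    return sum(effort for _, effort in nearest) / len(nearest)


if __name__ == "__main__":
    # Hypothetical completed projects: (features, effort in hours).
    past = [({"tables": 10, "pages": 20}, 40.0),
            ({"tables": 12, "pages": 22}, 50.0),
            ({"tables": 50, "pages": 80}, 300.0)]
    print(estimate_by_analogy({"tables": 11, "pages": 21}, past, k=2))
```

The hard part is not the distance calculation. It is having enough comparable completed projects and knowing which features drive effort, and that is what is currently missing for Cloud migration.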
main of interest, and derives estimates based on historical data that they are well
aware of, or on past projects that they participated in. As with the analogy-based
approach, because the Cloud has emerged only recently, there is a lack of practitioners
who have experienced a broad range of migration types to the Cloud.
Nevertheless, this approach shows great potential as the Cloud matures.
first round, a group of experts are asked for their assessment on some matters
individually, without knowledge of how the other participants have answered. In the
second round, each participant is asked for their assessment again, but this time
with knowledge of how the others answered in the first round. This technique aims to
narrow the range of answers from the participants, pointing to a more reasonable middle
2.2 Effort Estimation in Traditional Software Engineering
perd, 2007; Boehm et al., 1995; Banker et al., 1991). This approach estimates
effort using mathematical formulas that establish the relationship between the
dependent and independent variables of the model, namely the estimated effort and
the influential cost factors, respectively. This approach also requires historical data
to develop the algorithmic model. However, the model itself is more generic than
those of the other two approaches, which makes the model-based technique more suitable
for a broader range of migration projects to the Cloud at this stage.
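Many algorithmic models relate effort to size through a power law. COCOMO II, for instance, has the general form Effort = A × Size^E × ∏EM. The sketch below is a minimal illustration of how such a model is calibrated from historical data. It fits only the two parameters of effort = a · size^b by least squares on the logarithms. Effort multipliers are omitted, and the sample data are synthetic rather than taken from any study.

```python
import math


def fit_power_law(sizes, efforts):
    """Fit effort = a * size**b by least squares on log(effort) versus log(size)."""
    if len(sizes) != len(efforts) or len(sizes) < 2:
        raise ValueError("need at least two (size, effort) pairs")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # scale exponent
    a = math.exp(mean_y - b * mean_x)   # productivity coefficient
    return a, b


def predict_effort(a, b, size):
    """Apply the calibrated model to a new project size."""
    return a * size ** b


if __name__ == "__main__":
    # Synthetic historical projects (size in FP, effort in hours), not real data.
    sizes = [50, 120, 200, 400]
    efforts = [410, 1050, 1900, 3900]
    a, b = fit_power_law(sizes, efforts)
    print(f"a={a:.2f}, b={b:.2f}, effort(300 FP)={predict_effort(a, b, 300):.0f} h")
```

An exponent b above 1 indicates diseconomies of scale, and one below 1 indicates economies of scale. With only a handful of data points, as is currently the case for Cloud migration, such a calibration is fragile.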
and Post-Architecture, which can be combined in various ways to deal with the
current and likely future software practices. These sub-models use FPs and/or
LOCs for their sizing parameters. Size of a project is one of the key factors in
Conclusion:
This section has reviewed three popular approaches of traditional software en-
data on similar Cloud migration projects. This can be achieved when the field
of Cloud migration becomes more mature and data on migration projects can be
collected and stored in a repository for future use. The expert judgement approach
relies on practitioners’ expertise in Cloud migration. This can be achieved
when there are many experts in the field. The algorithmic approach requires a math-
2.3 Software Size Estimation in Traditional Software Engineering
The literature has shown that the effort spent on a development project depends
significantly on the project’s complexity. A more complicated project typically
requires more effort for both development and maintenance. Software
SLOC is a traditional size measure that counts the number of lines in a software
product’s source code. SLOC is one of the prime measures used as input to effort
estimation equations (Verner & Tate, 1992; Dolado, 2000; Rosenberg, 1997). SLOC was
popular for its simplicity and straightforwardness. However, SLOC can only be counted
after the implementation phase, when source code is available, which makes SLOC
unsuitable for estimation in the early phases of the development cycle (Albrecht &
Gaffney, 1983; Lai & Huang, 2003). There are further concerns about SLOC’s validity
because of its strong dependence on the programming language and on the programmer’s
skills and coding style (Ruhe et al., 2003b).
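The following minimal SLOC counter makes the dependence on counting rules concrete. It assumes one particular convention: physical lines that are neither blank nor whole-line comments, with a single-line comment prefix. Block comments, docstrings, and multi-statement lines are not handled, and other conventions would give different totals for the same code.

```python
def count_sloc(source: str, comment_prefix: str = "#") -> int:
    """Count physical source lines that are neither blank nor whole-line comments."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


if __name__ == "__main__":
    # The same small loop written in Python and in C.
    python_snippet = "total = 0\n\n# accumulate\nfor x in range(3):\n    total += x\n"
    c_snippet = "int total = 0;\n// accumulate\nfor (int x = 0; x < 3; x++) total += x;\n"
    print(count_sloc(python_snippet), count_sloc(c_snippet, comment_prefix="//"))
```

Under this convention the same logic measures 3 lines in Python but 2 in C. This illustrates why SLOC figures are hard to compare across languages and coding styles.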
incorporates both size and complexity factors in its counting process. There are
many software development effort estimation approaches using function points,
such as regression models or artificial intelligence models (e.g. artificial neural
early stage of software development, when LOC is not yet available. The FP of
a system can be obtained relatively easily from discussions with customers early
in the development process.
system and identifiable system features (such as external inputs, interface files,
outputs, inquiries, and logical internal tables). Counts of system features are
adjusted using weighted values and complexity factors to derive the final size of
the system.
The FPA methodology has three steps, given that there exists a list of all functions
that the software should provide. Firstly, each function is classified into one of
five types: External Input (EI), External Output (EO), External Inquiry (EQ),
Internal Logical File (ILF), and External Interface File (EIF). A function is
classified as an EI when it involves user inputs that add or change data in an ILF.
A function is an EO when it generates a report or message to the user or other
applications outside the boundary of the application being measured. A function
weight value based on its type from the first step and its complexity level from
the second step. The sum of these weight values forms the Unadjusted Function
Point (UFP) count of the software. The weighted sum of all five types of functions is
interested system.
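The counting steps described above can be sketched in code. This minimal sketch assumes the standard IFPUG weights for low, average, and high complexity. It also applies the classic value adjustment factor, VAF = 0.65 + 0.01 × (sum of the ratings of 14 general system characteristics), which produces Adjusted Function Points. The complexity level of each function is taken as an input rather than derived from data element and record counts, and the function inventory in the example is made up.

```python
# Standard IFPUG weights per function type: (low, average, high).
WEIGHTS = {
    "EI": (3, 4, 6),     # External Input
    "EO": (4, 5, 7),     # External Output
    "EQ": (3, 4, 6),     # External Inquiry
    "ILF": (7, 10, 15),  # Internal Logical File
    "EIF": (5, 7, 10),   # External Interface File
}
LEVELS = ("low", "average", "high")


def unadjusted_fp(functions):
    """Each (type, complexity) pair contributes its weight; the sum is the UFP."""
    return sum(WEIGHTS[ftype][LEVELS.index(level)] for ftype, level in functions)


def adjusted_fp(ufp, gsc_ratings):
    """Scale the UFP by the value adjustment factor derived from
    14 general system characteristics, each rated from 0 to 5."""
    if len(gsc_ratings) != 14 or any(not 0 <= r <= 5 for r in gsc_ratings):
        raise ValueError("expected 14 ratings between 0 and 5")
    vaf = 0.65 + 0.01 * sum(gsc_ratings)
    return ufp * vaf


if __name__ == "__main__":
    # A made-up function inventory, not PetShop's actual function list.
    functions = [("EI", "low"), ("EI", "average"), ("EO", "high"),
                 ("EQ", "low"), ("ILF", "average"), ("EIF", "low")]
    ufp = unadjusted_fp(functions)
    print(ufp, adjusted_fp(ufp, [3] * 14))
```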
ceptance in sizing software products, mainly due to its applicability in the early
phases of software development. However, FP has also been subject to some
criticism. Abran & Robillard (1994) pointed out a scale type mismatch and
questioned the mathematics behind the FP approach. Thus, from a theoretical point of
view, FP may not be considered a measure that is in conformance with
1994). As a result, despite the criticisms, the FP measure has subsequently been
improved and extended.
At the same time, Antoniol et al. (1999) developed OOFP for sizing OO
systems. OOFP relies on object models to map FP’s function types into
weights based on their complexity, where the number of links and words in a
web page determines its complexity. The WP measure focuses on static web
sites and therefore does not consider the behavioural and navigational
properties of web applications.
2.3 Software Size Estimation in Traditional Software Engineering
Cost Xpert Group, Inc. (Group, 2002). The IP method replaces five types
of constituents of the FP model with seven new types, namely external
systems. The IP counting process has been automated in a tool called Cost
Xpert that can estimate the equivalent size of a web-based system in LOC
as well as the effort and schedule of its development.
• Class Point
Costagliola et al. (2005) proposed Class Point (CP1 and CP2
for initial size estimation at the beginning of the development process and
further detailed estimation when more data are available later in the de-
the three-step approach from Function Point: (1) classify classes into four
types (Human Interaction, Problem Domain, Data Management, and Task
Management); (2) evaluate the complexity level of each individual class
(complexity levels: Low, Average, or High); and finally (3) assign a complexity
weight to each class based on the previous two steps. The weighted sum
Despite its name, the Object Point (OP) (Banker et al., 1991) is another
(COSMIC FFP) (Abran, 1999). The COSMIC FFP measure has been
formulated as a refinement of FFP, MKIIFP and the FP models in order
systems. However, the method does not explicitly claim to measure the
size of functionality that includes complex mathematical algorithms. In
contrast to FP, the COSMIC FFP measure does not take the effect of technical
and quality requirements of the system into consideration by claiming
Conclusion:
SLOC, FP and its extensions have been widely used to measure the size of different
types of systems and development paradigms. However, their applicability is
limited to software functionality development. The main purpose of migrating
a system to the Cloud is not to develop new functionality, but to reuse the
existing functionality while benefiting from the performance of Cloud offerings.
In light of this, none of the existing metrics are suitable
We therefore wish to apply the FP approach to develop a similar size metric for
the Cloud migration context. Although FP is commonly known as a software size
measure, it is not purely a size metric: the way FP is counted incorporates
both size and complexity. The size metric for Cloud migration projects
that is based on the FP approach will be similar to FP in the sense that both
are size-complexity hybrid metrics. However, throughout this thesis, this metric
will still be referred to as a size metric to keep the terminology consistent.
2.4 Summary
Cost and benefit analysis is an important tool for IT managers to evaluate whether
Software costs include tangible costs (hardware and software costs), administrative
costs, and development costs. Most of the time, the dominant cost is that of
development staff and managers (Sommerville, 2006). The context of Cloud migration
requires a different perspective on effort and cost, given that little experience
has been reported in published papers. Amongst
in traditional software engineering. Many of these metrics cannot adequately
capture the unique characteristics of a Cloud migration project. Effort estimation
and size measurement for migration to the Cloud differ from those of traditional
software development in the sense that the latter
to cover.
• Develop a new size measure for Cloud migration, which can serve as a
predictor for migration effort estimation. We aim to capture the size of the
migration process, rather than the size of the migrated system; hence, none of
the existing metrics are applicable.
The taxonomy will be presented in Chapter 4, and the new sizing metric will
be introduced in Chapter 5.
Chapter 3
Research Methodology
“If you can’t describe what you are doing as a process, you don’t know
what you’re doing.”
∼ W. Edwards Deming.
The literature review in Chapter 2 has shown that there is no related work
on the topic of Cloud migration effort. We therefore seek to gain insight into the
Cloud migration tasks and understand their cost implications by carrying out
the concurrent procedure strategy, we collect both qualitative and quantitative
data at the same time during the study, and then integrate and analyze
them to achieve the overall results. In particular, the process of this research
can be described in three steps, which are mapped to the steps in this thesis, as shown
in Figure 3.1.
The sub-sections in this chapter elaborate on the steps of this research process,
as follows: Section 3.1 describes Step 1 of the research process, the experiments
set up to explore possible migration tasks in a Cloud migration project.
Section 3.2 describes Step 2, the discussion protocol with Cloud
engineers from our group to confirm our findings on Cloud migration tasks, and
to develop the CMP metric. Section 3.3 discusses Step 3, the survey protocol
This is the first step in the research process. I carried out different types of
migration experiments to understand the actual migration activities. The purpose
3.1 Cloud Migration Experiments
• The migration experiments are set up for PaaS and IaaS Clouds only (SaaS
Clouds are excluded, as discussed in Section 1.4). PaaS Cloud candidates
can be Windows Azure and SQL Azure, and IaaS Cloud candidates can be
• The applications in the Cloud after the migration process should work prop-
tion strategies.
for building an enterprise, N-tier .Net 2.0 application. It serves to highlight the
key technologies and architecture to build scalable enterprise Web applications.
Its Java version, called Java PetStore, is also well known for its use as an illus-
in various research studies (Li et al., 2004; Singh et al., 2002; Yuan et al., 2003)
and we believe the PetShop application represents a broad class of application
types that are typically found in enterprise organisations, and that is also a
Our experiment was to migrate the PetShop application from a local server
to the Cloud. Windows Azure and SQL Azure were selected as the PaaS Cloud
platform for migration since they provide the environment most similar to the
one PetShop .Net runs on in the local server. Therefore, it was expected that
minimal effort would be required for migration activities.
The migration of Java PetStore to Amazon EC2 and SimpleDB was also
investigated to add more richness to our findings. Amazon EC2 is an IaaS Cloud,
and SimpleDB is a NoSQL database with limited support for the full SQL statements
required by the PetStore application. Therefore, different migration strategies and
more re-engineering effort were expected.
All migration tasks should be recorded, together with the time required to complete
each task. Each migration task can be divided into multiple tasks with finer
granularity, or grouped with other tasks to form a more general task. This
ensures that all tasks are at a uniform level of granularity.
The migration tasks should be categorized into different groups, such as
installation tasks or code modification tasks, depending on the nature of each task.
The overhead cost of the migration tasks can be determined by comparing the time
spent on each migration task category with the development time of the application.
The application was not developed by us; hence, the development time can
be estimated using an effort estimation approach in the literature (either analogy,
(2011); Chappell (2011), were also consulted to confirm our list of migration tasks.
Although they did not discuss any specific migration project, there are blog en-
tasks, collected from all migration experiments, together with the associated time
spent on each task. This contributes to the taxonomy of migration tasks, and
forms the basic elements of the CMP model to measure the size of a migration
process.
gorized migration tasks and the structure of the CMP model. In Step 2, we
conducted interviews with our group members at NICTA¹ to confirm that the migration
tasks and migration categories in the taxonomy are reasonable, and to seek

¹ NICTA (National ICT Australia Ltd) is Australia’s Information and Communications
Technology Research Centre of Excellence. Since NICTA was founded in 2002, it has
created five new companies, developed a substantial technology and intellectual property
portfolio, and continues to supply new talent to the ICT industry through a
NICTA-supported PhD program. NICTA has five laboratories around the country. With over
700 people, NICTA is the largest organisation in Australia dedicated to ICT research.
3.2.1 Participants
The discussion was carried out individually with 6 participants from our group.
• Two Ph.D. research students, who are in the middle and final stages of their
Ph.D. studies. Their Ph.D. topics are related to Cloud Computing
performance.
All participants have good knowledge of Cloud computing. They have good
zon EC2, Amazon RDS, S3, SimpleDB, Windows Azure, SQL Azure, Google App
Engine, MongoDB, Rackspace), although these were small and medium-sized projects.
In addition to general migration activities, they have also explored other vital
3.2 Discussion with Cloud Engineers
their exposure to the Cloud computing environment, they are reliable and valu-
• Firstly, each participant was asked for their opinions on the taxonomy of
migration tasks. They could suggest adding tasks, removing tasks,
or re-categorizing a task.
• Secondly, the structure of the CMP model was presented to the participants,
and they were asked to nominate the numeric value that they thought would be
most suitable for each parameter of the CMP model.
knowledge of the other participants’ answers in the first round. A second round of
discussion was then conducted with each participant, but this time with knowledge
The value for each parameter of the CMP model was determined by averaging
all expert opinion values for that parameter. This set of values forms the initial
set of parameters for the CMP model, as presented in Chapter 5.
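The averaging step above can be sketched as follows. This is an illustrative sketch, not the study's actual tooling. The parameter name `example_weight` and all values are hypothetical. Reporting each round's spread (population standard deviation) is our addition. It is included to show the narrowing of opinions that a second round with shared answers is meant to achieve.

```python
from statistics import mean, pstdev


def summarise_rounds(rounds):
    """rounds: one dict per discussion round, mapping a parameter name to the
    list of values nominated by the experts in that round.

    Returns the estimate of each parameter (the mean of the last round) and,
    for every round, the spread of opinions (population standard deviation)."""
    final_round = rounds[-1]
    estimates = {name: mean(values) for name, values in final_round.items()}
    spreads = [{name: pstdev(values) for name, values in r.items()} for r in rounds]
    return estimates, spreads


if __name__ == "__main__":
    # Hypothetical nominations from six experts for one hypothetical parameter.
    round_1 = {"example_weight": [1, 2, 3, 4, 5, 9]}
    round_2 = {"example_weight": [3, 3, 4, 4, 4, 6]}
    estimates, spreads = summarise_rounds([round_1, round_2])
    print(estimates, spreads)
```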
Data on migration effort and migration tasks of past Cloud migration projects
are vital elements of a validation process. Unlike data on development effort of
traditional software development projects, the data of interest do not exist on
immature, and there is no related work on the effort of migration to the Cloud. This
yields both advantages and challenges for our work at this stage. While we enjoy
the flexibility to explore different aspects of the Cloud migration topic, we face
the challenge of collecting real data ourselves for validation.
3.3 Survey Protocol
to validate our taxonomy of migration tasks and the CMP model with external
data points.
3.3.1 Objectives
The objective of the survey was to collect data on past migration projects to the
Cloud in order to determine migration cost factors, including size, and to examine their
relationships with the effort required for migration. Many organisations have
∗ RQ1.6.4: How many packages were installed from source code and
binary files?
protocol?
task?
collected mainly via web surveys, and some additional interviews. We could not
conduct in-person interviews with many practitioners because of geographical
constraints. Hence, web surveys were our main source of data collection.
The studied population included a project team from NICTA and a list of
individual practitioners who had migrated their systems to the Cloud. The NICTA team
is different from our group; it had migrated its system to the Cloud to take
advantage of Cloud elasticity for its project. The practitioners were identified
from the Cloud community and online discussions, such as authors of scientific
papers on the Cloud and participants in Cloud events (e.g., CloudCamp). Interviews
were conducted with NICTA’s project team to gain more insights and more detailed
data, and surveys were sent to a list of identified
the questionnaire to cover all CMP aspects that require information for validation,
and also to gain further insights into how the respondents had conducted their
migration to the Cloud. The questionnaire was piloted with 6 Cloud engineers
from our group (as introduced in Section 3.2). In the discussion described
in Section 3.2, prior to this survey, each participant was asked to describe a Cloud
migration project and the time they spent on each of its migration tasks.
The questionnaire essentially asked for the same information. Answers from the
discussion and the questionnaire were then analysed and compared. I found
that the participants could correctly interpret the questions, and their answers were
almost the same in the discussion and in the questionnaire. The biggest issue
with the questionnaire was that participants were confused by questions that
were not relevant to their migration tasks. For example, participants who had only
migrated their database to the Cloud were lost among the questions about code
modification because they had not modified any of their code. To address this issue,
we created different branches of the survey, so that respondents would
only be asked questions that are relevant to their migration tasks.
We evaluated different survey software and found that LimeSurvey¹ best suited
our needs because of its features and pricing scheme. Surveys can be created with
different layers and branches. Incomplete survey responses can be saved for
later review and update. The question types available in LimeSurvey are
sufficient for our needs. Also, we were charged based on the number of responses
rather than per time period, as with other online survey software. This pricing
scheme suited our needs because we did not expect to receive thousands of responses
weekly or monthly.
¹ http://limesurvey.org
Our survey was created with LimeSurvey. A link to the web survey was sent
via email to the list of practitioners, and responses were recorded by the web
survey once respondents had finished. To ensure that the response rate was adequate, a follow-up
The data collection process took over three months. First, we sent out
350 invitation emails to different target audiences, including academic researchers,
industrial groups and companies, and individual practitioners. We did not receive
replies from all recipients, but some replies were very positive and expressed
strong interest in participating in our survey. We then sent the survey to 308
recipients, excluding the 42 recipients who had replied to our first invitation
that they were not willing to participate, or from whom we had received
out-of-office auto-replies or delivery failure notices. In this second round, we received
33 responses (a response rate of around 10%), but some of them were incomplete. For
example, some responses did not provide enough information to calculate CMP,
and some did not report the total hours spent. The main reason for this
low number of complete responses is that most of the projects were done for exploration
and tutorial purposes; hence, no detailed information was recorded, especially
the information required for calculating CMP. Most respondents could easily answer
general questions about why they migrated to the Cloud, or how they generally
did it, but most of them failed to provide sufficient information at the design
total of 19 data points. These data points come from responses that provided
sufficient information for CMP calculation. We discarded all responses that the
project teams had described as a “wide guess”. 17 out of 19 data points are
small projects with a total of around 100 hours or less. Again, this is because
we targeted some individual practitioners, and their survey responses all described
example migration projects. We tried to target large groups with larger-scope
The final dataset and the validation process are reported in Chapter 6.
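The screening rules described above can be expressed as a simple filter. The `SurveyResponse` record and its field names are hypothetical illustrations. Only the criteria themselves come from the text: enough information to calculate CMP, a reported total effort, and no “wide guess” flag. The invitation counts also come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SurveyResponse:
    # Hypothetical fields mirroring the screening criteria in the text.
    has_cmp_inputs: bool          # enough task detail to calculate CMP
    total_hours: Optional[float]  # None if the total effort was not reported
    wide_guess: bool              # respondent described the hours as a "wide guess"


def usable_data_points(responses: List[SurveyResponse]) -> List[SurveyResponse]:
    """Keep only responses that can be used to validate CMP against effort."""
    return [r for r in responses
            if r.has_cmp_inputs and r.total_hours is not None and not r.wide_guess]


def response_rate(received: int, sent: int) -> float:
    """Fraction of surveys sent that came back."""
    return received / sent


if __name__ == "__main__":
    sent = 350 - 42  # invitations minus declines, auto-replies and delivery failures
    print(sent, round(response_rate(33, sent), 3))
```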
3.4 Summary
In this chapter, we have described the process of undertaking this research. This
for the purpose of exploring Cloud migration tasks, building the taxonomy of
migration tasks, developing the CMP model for sizing migration projects, and
validating them.
Chapter 4
“Our experience shows that not everything that is observable and measurable
is predictable, no matter how complete our past observations
Cloud migration tasks, hence, are defined as primitive units of our study.
4. TAXONOMY OF MIGRATION TASKS TO THE CLOUD
of how migration projects are carried out, in the form of a list of potential
migration tasks that might be involved in a Cloud migration project. We call this
a taxonomy of Cloud migration tasks. A taxonomy, as stated by Mens & Gorp
In this chapter, we present how migration tasks are extracted and
categorized into different groups to form the taxonomy. Identifying the taxonomy
is both necessary and challenging. It is necessary because the taxonomy
will enable us to capture various critical aspects of the cost implications of a Cloud
The remainder of this chapter is organized as follows. Section 4.1
describes how taxonomies are usually derived in other contexts. Section 4.2 shows
our approach to deriving the taxonomy of Cloud migration tasks. We report on
our migration experiences with the breakdown of costs (in terms of effort) among
influential factors that impact the cost of various migration tasks in Section
4.3. The taxonomy of Cloud migration tasks is then described in Section 4.4.
4.1 Taxonomy in other contexts
Section 4.5 validates the proposed taxonomy on one industrial migration project
conducted by our group, and also shows how the taxonomy can be applied in
real Cloud migration projects. Section 4.6 reflects on our approach, and on other
needs. The taxonomy was derived based on the discussions of a working group
on Language Engineering for Model-Driven Software Development, on the im-
(2009) was also obtained from a pool of existing sources. This is a taxonomy
randomly collected from three open source operating systems: Linux, FreeBSD,
and OpenSolaris. The comments were categorized along different aspects, based
on the four basic questions: “what is in comments? whom the comments are
written for or written by? where the comments are? and when the comments
were written?”.
The taxonomy of software connectors by Mehta et al. (2000) was formed from a
classification of three atomic elements of software interactions. The taxonomy
was proposed to increase understanding of the fundamental building blocks of
software interactions, and of how they interact to create more complex blocks.
This work is the only one of the three that showed
are then systematically classified according to some concrete criteria. The validation
of the taxonomy can then be achieved by showing its usefulness on another
system.
For our Cloud migration context, there is no existing pool of migration tasks
ready for the classification stage. As a result, we had to create a list of Cloud
experiment presented in a case study for the purpose of understanding the actual
migration activities for PaaS and IaaS Clouds (SaaS Clouds are ignored as
4.2 Experiment Setup
migration.
The applications used in our experiments are .Net PetShop (Leake, 2006) and
its Java version, Java PetStore, as discussed in Chapter 3. The PetShop application
was migrated from the local server to Windows Azure and SQL Azure, and
Java PetStore was migrated to Amazon EC2 and SimpleDB. Different migration
In order to calculate the migration effort as an overhead cost over the original
development effort, we needed a figure for the initial development effort.
This development effort can be estimated in a conventional manner with Function
Point, given that all the required information about the PetShop .Net application is
available.
Function Point Analysis (Albrecht & Gaffney, 1983) was applied to the fully
functional PetShop application to estimate its size and complexity, which can then be
used to estimate its development cost. We used this estimated development
cost and the recorded migration cost in our experiment on PetShop to calculate
the overhead cost of migration over development.
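The overhead calculation described above can be sketched as follows, under stated assumptions. The productivity rate `hours_per_fp`, used to convert Adjusted Function Points into development hours, is a hypothetical placeholder. It is not the rate used in this study. The category label in the example is illustrative.

```python
def development_hours_from_fp(afp: float, hours_per_fp: float) -> float:
    """Convert a function point size into development effort.

    hours_per_fp is an assumed productivity rate, not a value from this study."""
    return afp * hours_per_fp


def migration_overhead(migration_hours: dict, development_hours: float) -> dict:
    """Express the migration effort of each task category as a
    percentage of the estimated development effort."""
    if development_hours <= 0:
        raise ValueError("development effort must be positive")
    return {category: 100.0 * hours / development_hours
            for category, hours in migration_hours.items()}


if __name__ == "__main__":
    # 118 AFPs is PetShop's size; 10 hours per FP is purely hypothetical.
    dev = development_hours_from_fp(118, hours_per_fp=10.0)
    # 36.5 hours is the total of the Cloud migration tasks in Table 4.2.
    print(migration_overhead({"all migration tasks": 36.5}, dev))
```

Because the overhead is a ratio, it is only as reliable as the development estimate in its denominator. A different productivity rate scales every category's percentage by the same factor.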
Based on the Function Point reference cards provided by IFPUG (2010), Pet-
and a total of 118 Adjusted Function Points (AFPs). Using similar settings
This section reports our observations from the experiments described in the
previous section. These observations and experiences provide a basis
for the taxonomy of Cloud migration tasks in Section 4.4.
When migrating PetShop to Windows Azure and SQL Azure database, some
• We used the existing PetShop application, which we had not developed
ourselves; hence, effort was required to learn, understand, and get
• PetShop was developed on an older platform than the current version sup-
existing applications, since Cloud computing has emerged only recently and
is equipped with the latest technologies and tools, which may cause
incompatibility issues. In particular, to deploy applications to Windows Azure,
• The same issue applies to the PetShop database. There are existing tools for
transferring databases and data from local servers to SQL Azure; however, they
require SQL Server 2008 to be installed, while PetShop was designed to
work with SQL Server 2005 and cannot be installed directly on SQL Server
2008. We had to manually retrieve and run the database script on SQL
Server 2008.
method to achieve this; however, this method only works with a “Web application
project”, while PetShop was created as a WebSite project, which has no project
file and relies on ASP.NET dynamic compilation to compile pages and classes in
the application. Effort was also spent on converting
tool cspack provided by Azure can also be used to create the package file.
Tasks | Effort (hours)
Install SQL Server 2005 and set up the local environment in order to run the PetShop installation file | 5.5
Get PetShop up and running properly | 3.5
Install SQL Server 2008 to get PetShop running with later technology | 2
Migrate databases from SQL Server 2005 to SQL Server 2008 and modify PetShop to work properly with SQL Server 2008 | 5
Install .Net 4 and modify PetShop to work on Windows 7 and .Net 4 | 1.5
Test PetShop | 5
Total | 22.5
2, and SQL Server 2005. To enable PetShop to run properly for the first time, these
prerequisites need to be installed. Data in Table 4.1 shows that most of the time for this
activity was spent on setting up the environment to allow PetShop to run. Data
in Table 4.2 shows that the largest share of the time spent on migration to the Cloud went
into overcoming the learning curve. No new features were introduced, and Windows
Azure provides a similar platform to the one on which PetShop was developed;
In our experiment, learning about the application and the Cloud environ-
cost. Experience required to deal with unforeseen issues also accounted for a major
additional cost. When the learning phase is finished, migrating similar types of
applications will require less effort. Figure 4.1 shows the overhead cost for each
category of the migration tasks for PetShop, which has a complexity of 118
Tasks | Effort (hours)
Windows Azure tutorials | 6
Create Azure account and set up firewall rules | 1.5
Install and explore MS Azure Training Kit | 5
Tutorials: migrating databases to SQL Azure | 4
Migrate PetShop database to SQL Azure | 2
Modify PetShop to work with SQL Azure | 4
Test PetShop on local servers against SQL Azure | 2
Modify and package PetShop for Windows Azure | 5.5
Deploy PetShop to Windows Azure | 1.5
Test PetShop in Windows Azure with SQL Azure | 5
Total | 36.5
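As a quick arithmetic check of the two tables, the sketch below transcribes their rows (task names shortened) and recomputes the totals. Treating the two tutorial rows and the Training Kit row as "learning" effort is our own grouping, not a classification given in the tables; under that assumption, learning accounts for roughly 41% of the 36.5 migration hours.

```python
# Effort in hours, transcribed from the two tables above (task names shortened).
setup_tasks = {
    "Install SQL Server 2005 and set up local environment": 5.5,
    "Get PetShop up and running": 3.5,
    "Install SQL Server 2008": 2.0,
    "Migrate databases to SQL Server 2008 and modify PetShop": 5.0,
    "Install .Net 4 and modify PetShop": 1.5,
    "Test PetShop": 5.0,
}
migration_tasks = {
    "Windows Azure tutorials": 6.0,
    "Create Azure account and set up firewall rules": 1.5,
    "Install and explore MS Azure Training Kit": 5.0,
    "Tutorials: migrating databases to SQL Azure": 4.0,
    "Migrate PetShop database to SQL Azure": 2.0,
    "Modify PetShop to work with SQL Azure": 4.0,
    "Test PetShop locally against SQL Azure": 2.0,
    "Modify and package PetShop for Windows Azure": 5.5,
    "Deploy PetShop to Windows Azure": 1.5,
    "Test PetShop in Windows Azure with SQL Azure": 5.0,
}
# Our own grouping (assumption): these rows count as "learning" effort.
LEARNING = {
    "Windows Azure tutorials",
    "Install and explore MS Azure Training Kit",
    "Tutorials: migrating databases to SQL Azure",
}

def total(tasks):
    """Sum of task efforts in hours."""
    return sum(tasks.values())

def learning_share(tasks, learning):
    """Fraction of the total effort spent on tasks whose names are in `learning`."""
    spent = sum(hours for name, hours in tasks.items() if name in learning)
    return spent / total(tasks)

print(total(setup_tasks))                                   # 22.5
print(total(migration_tasks))                               # 36.5
print(round(learning_share(migration_tasks, LEARNING), 2))  # 0.41
```

A different grouping of rows would change the share, so the figure is only as meaningful as the classification behind it.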
The following additional issues were also observed while considering Java
• Java PetStore was developed to work with JavaDB database, connected via
SimpleDB instead, since, at the time we carried out our experiment, there
was no JDBC driver written for SimpleDB. Writing a JDBC driver for
These issues require effort beyond that of our experiment with Windows Azure. The additional effort mainly fell into the categories of installation and code modification.
The measured data and observations presented above create the opportunity
for further classification and future work in identifying migration issues and effort
4.3 Migration Influential Cost Factors
The report on our migration experiences in Section 4.2.1 helped us identify some influential cost factors that affect the effort of the migration process. We differentiate two types of cost factors, internal and external, defined as follows:
Internal cost factors concern the migrating system itself. These factors essentially refer to what migration tasks are required, how they can be achieved, and how complex they are, without knowledge of who is carrying out those tasks and under which conditions those tasks are done. An example of an internal cost factor is “database migration”, which consists of modifying schemas and transferring data from a local database to a Cloud database.
External cost factors concern environmental factors that are specific to each organization, such as the development team’s skills and expertise, or knowledge of Cloud platforms and offerings. External cost factors determine how fast a migration task can be completed. For example, a Cloud-experienced practitioner
These two types align well with the fundamental elements of the Function Point approach. The internal cost factors are identified first, to determine what needs to be done and to measure the complexity of a project. The result reflects only the characteristics of the project, without
factors are then localized for each organization and applied on top of the previous result to derive an estimate of the total effort required for this project.
Based on our observations from Section 4.2.1, the influential cost factors (both
internal and external) are identified as follows. Some factors are similar to tra-
ditional software development cost factors (Ruhe et al., 2003a; Madachy, 1997),
to case.
otherwise, more effort would be required to rewrite that library. For example, Java PetStore uses a JDBC driver to connect to its JavaDB database, and it also uses JPA, which depends heavily on advanced features of JDBC
or Azure SQL requires less effort than migrating to a NoSQL database such as SimpleDB, because NoSQL databases do not support full relational features, such as the Join operation. In the latter case, effort is required to implement Join operations or to rewrite custom code for the application so that it would not
• Connection issues: In some Cloud migration cases, only some components of the system are migrated to the Cloud while the rest is kept in-house for various reasons (e.g., enterprises may wish to keep their sensitive data in-house). The connection between the two parts of the system, one in-house and the other in the Cloud, may then face issues such as security and latency.
effort is required.
the project team possesses some level of prior knowledge and experience of Cloud services and available tools, the learning curve can be shortened significantly, and hence less effort is required. As discussed in the previous section, the learning curve is a one-time task, but it requires significant effort.
• Selecting the correct Cloud platforms and services (IaaS or PaaS): this choice greatly affects the effort and cost required for the rest of the migration activities; however, the selection itself is not a trivial task. If the selected Cloud platform
Some of these influential cost factors are specific to migration to the Cloud, because they do not apply to a conventional migration project from one platform to another. For example, a migration project from Java to .Net is a
migration process.
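When the target is a NoSQL store without Join support, such as SimpleDB in the Java PetStore case, one option mentioned above is to perform joins in application code. The sketch below is a minimal in-memory hash join over items represented as plain Python dicts; it does not use any SimpleDB API, and the item names and attributes are hypothetical.

```python
from collections import defaultdict

def hash_join(left, right, left_key, right_key):
    """Inner join of two lists of dict items on left[left_key] == right[right_key].

    Indexes `right` in a hash table, then probes it once per item of `left`,
    so the work is linear in the input sizes plus the number of matches.
    """
    index = defaultdict(list)
    for item in right:
        index[item[right_key]].append(item)
    joined = []
    for item in left:
        # .get() avoids inserting empty lists for keys with no match.
        for match in index.get(item[left_key], []):
            # Merge both items; on an attribute-name clash, the right value wins.
            joined.append({**item, **match})
    return joined

# Hypothetical items, as they might come back from two separate NoSQL queries.
products = [
    {"productId": "P1", "name": "Angelfish", "categoryId": "FISH"},
    {"productId": "P2", "name": "Bulldog", "categoryId": "DOGS"},
    {"productId": "P3", "name": "Unlisted", "categoryId": "NONE"},
]
categories = [
    {"categoryId": "FISH", "categoryName": "Fish"},
    {"categoryId": "DOGS", "categoryName": "Dogs"},
]

rows = hash_join(products, categories, "categoryId", "categoryId")
for row in rows:
    print(row["name"], "->", row["categoryName"])
```

Because the join now runs in the application, its memory use and correctness become the migrating team's responsibility, which is part of the extra effort described above.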
from getting familiar with the application and the selected Cloud platform, to setting up the environment and getting the application ready for migration, as well as modifying and testing to ensure the application functions properly in the Cloud.
Our distinction between internal and external factors suggests that the internal cost factors (i.e., migration tasks) will form the foundation of the taxonomy.
4.4 Taxonomy of Migration Tasks
The list of internal cost factors introduced in Section 4.3, together with related work from the literature review and practitioners’ blogs, enables us to generalise and propose a taxonomy of migration tasks that any migration project may encounter; the migration tasks are grouped under different categories
The diagram in Figure 4.2 shows the sequence in which Cloud migration tasks
from the taxonomy could be executed, and the possible iterations that may occur.
and Figure 4.2. The last three columns in Table 4.4 indicate whether a specific migration task is supported by examples from discussions with Cloud engineers in our group, from the literature, or from practitioners’ blogs.
nents and how they are coupled together, identifying which modules are
be trivial, for reasons such as: code written by other developers may be difficult to study, and confidentiality issues may mean that applications are not
be investigated thoroughly.
nection are more likely to be modified. The effort spent on this part is directly proportional to the complexity of the application: the more complicated the application is, the more time and skill are required to understand it. In our experiment, PetShop was measured as 118 Function Points and was estimated to require 177 hours of development effort. Its requirements and
Several major Cloud providers in the market offer different services, including PaaS and IaaS. Once Cloud services are evaluated and selected, training on these services is necessary. Some Cloud services may not fully support features provided by similar on-premise technologies. For example, among Cloud databases SQL Azure is the most similar to SQL Server, yet it does not support distributed transactions as SQL Server does. In our experiment, effort was spent on training with Windows Azure using the Microsoft Azure Training Kit.
The Cloud community has contributed a great deal to supporting Cloud services that integrate seamlessly with existing technologies and applications, and many open-source third-party libraries and tools have been developed. Training on these libraries and tools is also a one-time task, although it is not easy to select the appropriate libraries and tools without
knowledge about them beforehand. These tools can be categorized as: ad-
for data migration (e.g., Codeplex for converting and uploading databases to SQL Azure), and other utilities (e.g., Windows Azure provides the cspack utility to package a Web Site project for migration to Azure). In our experience, before we became aware of the cspack utility, much effort was spent on transforming a Web Site into a Web Application, which are different in
server similar to its local requirements. If the target is a PaaS Cloud,
Third-party tools: Effort is required to install third-party tools for training purposes and for the migration tasks mentioned above.
changed to connect to the new database server; in our experiment, the connection was modified to use SQL Azure. However, more changes are required
and operations of the application, and this can also be categorized as Code Modification. Even when two databases are of the same type but different versions, changes to syntax or schema may still be required. For example, PetShop .Net version 4 was developed on SQL Server 2005, while SQL Azure is only compatible with SQL Server 2008. There is no direct way to convert the PetShop database from SQL Server 2005 to SQL Azure without converting
priately to align with third-party tools’ requirements for database migration.
Migrate the database: If the previous tasks have been properly completed, the effort required for this task is trivial, as the migration is handled by third-party tools. Otherwise, the plans and actions for the previous tasks must be revised. Nevertheless, the size of the database also affects how fast this task can be achieved: the larger the database, the longer it takes to migrate. Although most of this time is waiting time and may not require any extra effort, some effort may be necessary for dividing big databases
Code changes: if the selected Cloud platform provides services and technologies similar to the application’s in-house environment, little code modification is required. This is the case for the combination of PetShop .Net
technologies are generally the latest ones, while the existing applications may have been developed a few years previously. During that gap, technologies may have gone through many changes and updates. There may be no direct upgrade path from the old technologies to the latest ones, so intermediate steps will be necessary. Also, Cloud technologies may not fully support the services and features offered by local servers. Although SQL Azure is similar to SQL Server 2008, it does not support distributed transactions, while SQL Server 2005 does, and PetShop .Net utilised this feature for its transactions. Code change is required
projects, where only a part of the system is migrated to the Cloud, while
security may also require extra attention. For full migration projects, where
the migrating system is ported as a whole, this category can be safely
skipped.
• Testing - This step is one of the most important activities. Testing takes place during migration to ensure that each of the previous steps is completed correctly, and a full testing process needs to be carried out after migration. Test cases already created for local servers can be reused in the Cloud to ensure the application works properly, and additional Cloud-specific test cases may need to be considered. Testing needs to be done for each of the actions taken; however, the major testing milestones can be grouped as follows:
The application can then be migrated to the selected Cloud platform, which
If using IaaS Clouds, developers can choose to ignore testing the application
as security levels and performance quality. Effort required for this task is
relatively large.
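The “Migrate the database” task notes that large databases may need to be divided before transfer. A minimal sketch of that idea, assuming the rows are already in memory and that the caller supplies an `upload` function for the target (hypothetical here; no real migration tool's API is used):

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")

def batches(rows: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `rows`, each with at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])

def migrate_in_batches(rows: Sequence[T], batch_size: int,
                       upload: Callable[[List[T]], None]) -> int:
    """Send `rows` to the target one batch at a time; return the number of batches.

    Keeping each transfer small means a failure affects only one batch;
    any retry policy is left to the caller's `upload` function.
    """
    sent = 0
    for batch in batches(rows, batch_size):
        upload(batch)
        sent += 1
    return sent

# Stand-in target: a local list instead of a Cloud database.
received: List[int] = []
n = migrate_in_batches(list(range(10_500)), 1_000, received.extend)
print(n, len(received))  # 11 10500
```

The batch size is a tuning knob: smaller batches make individual failures cheaper, at the cost of more round trips.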
These categories are mutually exclusive, since they cover different aspects of a Cloud migration project; at the same time, they complement each other and together provide a complete picture of migration to the Cloud. These categorized migration tasks need to be carefully planned at an early stage of any migration project. Some tasks may be broken down to more detailed levels, whereas others may be skipped, depending on the specific characteristics of each project.
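To make the planning step concrete, the sketch below records a migration plan against the six taxonomy categories (Training and Learning, Installation and Configuration, Database Migration, Code Modifications, Network Connection and Testing) and reports which categories a given project skips. The example plan is hypothetical: a full, non-hybrid migration to a PaaS Cloud, for which Network Connection can be skipped and, in this illustration, no installation work is listed.

```python
from enum import Enum

class Category(Enum):
    TRAINING_AND_LEARNING = "Training and Learning"
    INSTALLATION_AND_CONFIGURATION = "Installation and Configuration"
    DATABASE_MIGRATION = "Database Migration"
    CODE_MODIFICATIONS = "Code Modifications"
    NETWORK_CONNECTION = "Network Connection"
    TESTING = "Testing"

def summarize_plan(plan):
    """Split a migration plan into planned and skipped taxonomy categories.

    `plan` maps Category -> list of task descriptions; a missing or empty
    entry means the category is skipped for this project.
    """
    planned = {cat: tasks for cat, tasks in plan.items() if tasks}
    skipped = [cat for cat in Category if cat not in planned]
    return planned, skipped

# Hypothetical plan for a full migration to a PaaS Cloud.
plan = {
    Category.TRAINING_AND_LEARNING: ["Work through platform tutorials"],
    Category.DATABASE_MIGRATION: ["Convert schema", "Upload data"],
    Category.CODE_MODIFICATIONS: ["Point data access code at the Cloud database"],
    Category.TESTING: ["Re-run local test cases in the Cloud"],
}
planned, skipped = summarize_plan(plan)
print([cat.value for cat in skipped])
# ['Installation and Configuration', 'Network Connection']
```

Listing skipped categories explicitly makes it visible during planning that a category was considered and deliberately left out, rather than forgotten.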
4.5 Validation
and the input from the literature and practitioners’ blogs have confirmed the
validity of the taxonomy to some extent. This section attempts to validate our proposed taxonomy using one industrial Cloud migration project, conducted by two researchers in our group. This was a consulting project with a large Australian Financial Service Organisation (FSO) that wished to migrate part of its system to the Cloud without any changes to the existing application code. Although, for the time being, the FSO has no plan to migrate its production system to a Cloud computing platform, the main purpose of this migration
since the environment is re-activated often but only for short periods of time, such as a week, the cost and time of re-activating the development environment must be small. Moreover, the FSO expects to reduce the software licensing fees it currently pays by moving to a pay-per-use model.
The steps taken in this FSO project are summarised below:
integration between the migrated component in cloud and the existing en-
• Step 3 - Understand EC2 and its offerings in order to identify if there are
any compatibility issues: The tasks involved were to mirror the system en-
since EC2 provides infrastructure services and all installation and configu-
ration should be possible. However, the existing FSO system is currently
account, signing up for Amazon EC2, setting up the Amazon EC2 command line tools, setting up an Amazon Virtual Private Cloud (VPC) for security purposes, obtaining EC2 instances, and finally adding disks to Windows instances
systems and middleware are pre-installed on the machine images, only a few additional components, such as IIS Server and SQL Server, were installed at this step for the migrated system to function properly.
was performed to ensure that the various components of the system were functioning properly and to discover potential problems caused by the migration to AWS. Performance issues were discovered, because the network connection between the migrated components and the other components was the bottleneck. Extra effort was spent on tuning
Table 4.4: Mapping of the FSO migration tasks and the taxonomy
The mapping described in Table 4.4 shows that the proposed taxonomy is
general enough to cover different types of migration tasks to the Cloud. However, it can also be broken down further to fit specific migration tasks in more detail; for example, the handling of network connection and security in Step 5, where Amazon VPC is set up, could be separated into a more detailed category than just
4.6 Reflection and Discussion
interest, and then classifying existing elements according to those criteria. In our
context of Cloud migration, the fundamental elements are migration tasks, which had never been formally identified or organised into a collection. Therefore,
the taxonomy of Cloud migration tasks was derived mainly from our experience
of migrating PetShop .Net to Windows Azure, a PaaS type of Cloud. We also
considered the case of migrating Java PetStore to Amazon EC2, an IaaS type of
Cloud, in an attempt to add more richness to the taxonomy.
process, rather than just NICTA projects and participants. However, it was not easy to locate an external migration project that covers all aspects of the taxonomy to be validated. It was also not feasible to locate multiple external projects for this stage’s validation, given that we also had to find data points for the next phase. As a result, the taxonomy proposed in this chapter is subject to a threat to external validity. Although the validation in Section 4.5 demonstrates that the taxonomy fits a common case of Cloud migration well, there is no guarantee that it can be applied to every other migration project. Because Cloud migration projects vary so widely, it is not possible to anticipate all the migration tasks that could occur in reality. The taxonomy can only cover the general migration tasks that are likely to occur in a common migration case.
However, the structure of the taxonomy is general and flexible enough so that
In this study, we assumed that the Cloud target had already been selected and that its selection is outside the scope of a migration project. However, our experience shows that major effort is required to select Cloud providers and services. Also, for security reasons, large enterprises tend to keep sensitive data
and applications in their local data centers, and migrate only some components
The taxonomy is applicable to both PaaS and IaaS Clouds. Because PaaS and IaaS Clouds differ, the effort required for each migration task also differs. Table 4.5 below shows a side-by-side comparison of how
Table 4.5: Effort comparison for migrating to PaaS and IaaS Clouds
significant learning effort for several reasons: the Cloud offers the latest technologies, with which one may not be familiar; new offerings and services are created rapidly; and the Cloud has a broad community that contributes numerous third-party tools. Tasks in this category can take a large amount of time at the beginning, for both IaaS and PaaS Clouds.
the local server in IaaS Clouds requires significant effort compared to PaaS
Clouds. In PaaS Clouds, the environment is handled by Cloud providers.
and PaaS Clouds can be very hard if the local databases and the Cloud
database are different. This could require major effort.
PaaS Clouds.
• Network Connection - PaaS Clouds free their users from the burden of
Clouds. Therefore, users of PaaS Clouds need not be concerned about application network connections, whereas users of IaaS Clouds are responsible
IaaS and PaaS, to make sure the application functions properly. This effort
depends on the complexity of the application.
Table 4.5 shows a side-by-side comparison of whether none, minor, or major effort is required for each migration category in PaaS versus IaaS Clouds. IaaS
environment require significant effort compared to PaaS Clouds. Also, both PaaS
and IaaS Clouds require significant learning effort and testing.
The effort required for a migration project to a Cloud platform, whether PaaS or IaaS, depends on various factors, as illustrated above. The study in this chapter helps us understand these influential aspects and forms the background for our next step, quantifying Cloud migration tasks, in the next chapter.
4.7 Summary
local servers to Cloud platforms is a one-time task and may seem straightforward
at first. However, our experience showed that this process is not automatic and
a system to the Cloud, which provides the basis for understanding the cost im-
plications of a Cloud migration project. A taxonomy of migration tasks has been
developed and tailored specifically for our Cloud migration context, and applied
to one validation project using different strategies. It will be used as input into
our size measurement model for migration projects to the Cloud in the following
chapter.
The taxonomy consists of six main categories, namely: Training and Learn-
ing, Installation and Configuration, Database Migration, Code Modifications,
Network Connection and Testing. These categories resulted from the internal
cost factors that were identified from our experiment. We have also identified
external cost factors, which are environmental aspects of organizations intending
to conduct migration projects. While the taxonomy and the internal cost factors indicate what migration tasks are required and how those tasks are completed, the external cost factors determine how fast those tasks can be achieved.
Chapter 5
∼ Bill Gates.
The taxonomy of Cloud migration tasks outlined in the previous chapter helps
Cloud consumers to form their migration plans. A Cloud migration project consists of a list of migration tasks from the taxonomy. As a result, the amount of effort required for a migration project to the Cloud is the sum of the effort spent on each migration activity or task.
In this chapter, we introduce our Cloud Migration Point (CMP) model and show how it can further assist Cloud consumers in estimating the size of the migration tasks in their plans, which facilitates the prediction of the amount
of effort required. We also describe the counting method of the CMP model,
illustrated with examples to help practitioners apply it easily.
5. CLOUD MIGRATION POINT
consider not only the system to be migrated, but also the migration process, in which various aspects of the system besides lines of code are involved. For these reasons, the Function Point (FP) approach, which has served as a successful foundation for many extensions, is more suitable than SLOC as a basis for the CMP model. We decided to develop the CMP model by using the well-
the Class Point (Costagliola et al., 2005) context, and a migration task in
CMP context) into different pre-defined categories
2. Then for each unit, evaluate its complexity level (Low, Average, or High)
Apart from the FP methodology, the CMP model is also built on the taxonomy presented in Chapter 4. Each category from the taxonomy is carefully analyzed and, where appropriate, selected as a CMP component. This will be
The remaining sections of this chapter cover different aspects of the CMP model and are arranged as follows: Section 5.1 states the underlying assumptions of the CMP model. Section 5.2 analyzes the cost factors from the taxonomy
characteristics. The purpose of this section is to show, later on, that the CMP model can be applied to different migration project types. Section 5.4 describes our CMP metric and its counting process. Section 5.5 demonstrates how CMP
5.1 CMP Assumptions
This section explains the CMP model’s alignment with the broader scope of our work (presented in Section 1.4). Some specific assumptions for the CMP model
• We consider migration cases between two data centers only (typically, one in-house and one in the Cloud). Where more than two data centers are involved, CMP can be applied repeatedly to each pair of data centers.
• Our work focuses only on PaaS and IaaS Clouds. Hence, CMP considers
only IaaS and PaaS, although some parts of our cost model might still be
• We assume that the decision on the Cloud target is not a part of the mi-
gration process. CMP estimates the complexity of migrating to a specific
Cloud technologies/providers, and the need to get familiar with the specific
Cloud technology and offering.
• We assume that the design decision for the migration has been made, such as
nents stay in the local data centre, which pieces of code require modification
for the Cloud environment, which network connections are to be modified, and what requirements must be satisfied. CMP requires inputs from the design
phase and is most appropriate to apply before the implementation phase of
a migration.
• CMP assumes that all migration tasks have already been outlined. Because CMP measures the size and complexity of migration tasks, these tasks must be outlined in advance (i.e., the migration plan must be sufficiently complete).
The above presented items form the scope and assumptions of the CMP model
in this chapter.
5.2 Cloud Migration Cost Factors
projects, namely internal and external cost factors. Internal cost factors refer to what migration tasks are required, how they can be achieved, and how complex they are, without knowledge of who is carrying out those tasks and under which conditions those tasks are done. External cost factors are concerned with environmental factors that are specific to each organization, such as the development team’s skills and expertise, or knowledge of Cloud platforms and offerings.
External cost factors determine how fast a migration task can be completed.
The CMP model aims at sizing Cloud migration projects. In other words, the
CMP model will measure the size of all migration tasks involved in a migration
project. As a result, our CMP model focuses only on the internal cost factors and
The internal cost factors essentially equate to the taxonomy of Cloud migration
tasks.
The CMP model measures the accumulated size of all migration tasks making
the Cloud experience of developers and their learning abilities, which are
external cost factors. Although this category contributes significantly to the
total effort required, we exclude this category from the scope of the CMP
well. Effort is required to integrate the new libraries with the application
after migration. Hence, the tasks in this category should be included in the
CMP model.
Cloud). Network conditions in the Cloud (even for LAN) may differ from those of the original environment. In all cases, the connection changes and effort is required to ensure that security and performance are optimal. The CMP model also takes these tasks into account.
requirements, methodology, and test cases in the Cloud migration context do not differ from those of traditional software development. Size metrics for traditional software development do not include testing tasks in their measurement; similarly, this category is excluded from the CMP model.
Based on the above analysis, the CMP model includes four main
5.3 Cloud Migration Project Classification
The cost factors identified in Section 5.2 do not apply to all components of the system, but only to those components affected by the migration. We classify the components involved in a migration into four categories:
existing component: either migrated to the Cloud or kept in-house. In the former case, if the component is migrated to the Cloud without any changes, it belongs to the Migrated category. If it is migrated to the Cloud and then modified, it can be considered as a Removed component plus a newly Added component. In the latter case, where the component is kept in-house, if nothing changes,
project. These two concepts were defined in Chapter 1, Section 1.2, and their definitions are repeated here for convenience: a migrating system is the system to be migrated to the Cloud, and is defined as a set of components required for the system to function properly, such as third-party libraries or mid-
We classify a migration project by first denoting the states of its migrating system in the local data center and in the cloud, before and after the migration, as summarized in Table 5.1.
State | Local | Remote
Before Migration | $L$ | $R$
After Migration | $L'$ | $R'$
Table 5.1 depicts the components present in each state, with the rows dividing the components temporally and the columns dividing them spatially. The sets of components in the four states are denoted by $L \neq \emptyset$, $R$, $L'$ and $R'$. Note that the same component may appear in different rows, but it cannot appear twice in the same row (i.e., a component cannot be both in-house and in-cloud at the same time). Hence, $L$ and $R$ are disjoint sets, and similarly, $L'$ and $R'$ are disjoint. The allocation of components to each state can be determined from the design documents.
the cloud. These components are reused with or without modifications. For
example, third-party libraries, database servers, or system software that are
moved to the cloud (i.e., effort involved in installation, configuration, and integration with the rest of the system); application code (i.e., effort needed for moving and changing code); and databases (i.e., effort required for data
quired.
In addition to the above, there is also the category of Added components, $(L' \cup R') \setminus (L \cup R)$, which are components added to the system as part of the migration, such as new libraries in the cloud, newly added code for extra functionality, or newly integrated middleware. For example, when an in-house library is not suitable as-is and is modified to interact with a component that has been migrated to the Cloud, it can be categorized as removing the old component and adding a new component.
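The category definitions can be read directly as set operations. The sketch below uses the expressions for the categories as given in this section and in the proof that follows ($M = L \cap R'$, the Removed set $L \setminus (L' \cup R')$, $U = L \cap L'$, and the Added set $(L' \cup R') \setminus (L \cup R)$). Calling $U$ "Unchanged" is our own label, and the component names in the example are hypothetical.

```python
def classify_components(L, R, L_after, R_after):
    """Split components into migration categories using set algebra.

    L, R: local and remote component sets before migration.
    L_after, R_after: local and remote sets after migration (L' and R').
    """
    return {
        "Migrated": L & R_after,                 # L ∩ R'
        "Removed": L - (L_after | R_after),      # L \ (L' ∪ R')
        "Unchanged": L & L_after,                # L ∩ L' (label is ours)
        "Added": (L_after | R_after) - (L | R),  # (L' ∪ R') \ (L ∪ R)
    }

# Hypothetical system: web tier and database move to the Cloud, authentication
# stays in-house, a legacy library is dropped and a Cloud cache is introduced.
before_local = {"web", "db", "auth", "legacy_lib"}
before_remote = set()
after_local = {"auth"}
after_remote = {"web", "db", "cloud_cache"}

categories = classify_components(before_local, before_remote, after_local, after_remote)
for name, comps in categories.items():
    print(name, sorted(comps))
# Migrated ['db', 'web']
# Removed ['legacy_lib']
# Unchanged ['auth']
# Added ['cloud_cache']
```

Because the categories are computed from the four state sets alone, they can be derived mechanically once the design documents fix $L$, $R$, $L'$ and $R'$.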
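Proof 3. It suffices to show that (1) $M \cup R \cup U = L$, and that (2) the collection $\{M, R, U\}$ is pairwise disjoint.
For (1),
$$
\begin{aligned}
M \cup R \cup U &\equiv (L \cap R') \cup \bigl(L \setminus (L' \cup R')\bigr) \cup (L \cap L') \\
&\equiv \bigl(L \cap (L' \cup R')\bigr) \cup \bigl(L \setminus (L' \cup R')\bigr) \equiv L.
\end{aligned}
$$
For (2),
(i)
$$M \cap R \equiv (L \cap R') \cap \bigl(L \setminus (L' \cup R')\bigr) \equiv (L \cap R') \cap \bigl(L \cap (\neg L' \cap \neg R')\bigr) \equiv \emptyset;$$
(ii)
$$M \cap U \equiv (L \cap R') \cap (L \cap L') \equiv L \cap (L' \cap R') \equiv L \cap \emptyset \equiv \emptyset;$$
(iii)
$$R \cap U \equiv \bigl(L \setminus (L' \cup R')\bigr) \cap (L \cap L') \equiv \bigl(L \cap (\neg L' \cap \neg R')\bigr) \cap (L \cap L') \equiv \emptyset.$$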
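Proof 3 can also be checked mechanically on small cases. The sketch below enumerates every placement of up to four components (in or out of $L$ before migration; local, remote, or absent after migration, which keeps $L'$ and $R'$ disjoint as Table 5.1 requires) and confirms that $M$, $R$ and $U$ always partition $L$. This is a sanity check under those assumptions, not a substitute for the proof; the last line shows that the partition breaks if $L'$ and $R'$ are allowed to overlap, which is exactly the fact step (ii) relies on.

```python
from itertools import product

def partition_holds(L, L_after, R_after):
    """True if M, R (removed) and U are pairwise disjoint and their union is L."""
    M = L & R_after
    removed = L - (L_after | R_after)
    U = L & L_after
    covers = (M | removed | U) == L
    disjoint = not (M & removed) and not (M & U) and not (removed & U)
    return covers and disjoint

def all_states(n):
    """Yield every (L, L', R') over n components with L' and R' disjoint."""
    comps = range(n)
    for in_L in product([False, True], repeat=n):
        for where in product(["local", "remote", "absent"], repeat=n):
            L = {c for c in comps if in_L[c]}
            L_after = {c for c in comps if where[c] == "local"}
            R_after = {c for c in comps if where[c] == "remote"}
            yield L, L_after, R_after

print(all(partition_holds(*s) for n in range(5) for s in all_states(n)))  # True
# Dropping the disjointness of L' and R' breaks step (ii): M and U overlap.
print(partition_holds({1}, {1}, {1}))  # False
```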
The effort associated with each of the categories defined above is carefully
with no changes. Each component here can be a piece of application code, a database, or third-party software that enables the whole system to function properly. Extra effort may also be required to ensure these components work together seamlessly.
Regardless of the type of migration project, the effort required for the whole project still aligns with the CMP components defined in Section 5.2.
The CMP metric consists of four main components, each of which is a set of related migration tasks:
system.
5.4 Cloud Migration Point
migration tasks.
approach. Particularly:
• Firstly, each migration task in each CMP component is identified and classi-
fied into a pre-defined sub-category. These sub-categories will be discussed
• In the last step, each migration task, having been classified into a specific type and assigned a complexity level in the first two steps, is given a weighted value accordingly. Finally, the total value of the CMP component is the sum of the weighted values of all migration tasks in that component.
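Once the first two steps have fixed each task's type and complexity, the remaining work is plain arithmetic: each component value is a sum of task weights, and the final CMP (described in the next paragraph) is a weighted sum of the four component values. The sketch below assumes equal coefficients of 1.0 purely as placeholders, since the actual values are calibrated later; apart from the weight of 9 for a LAN-to-WAN connection of High complexity quoted later in this section, the task weights in the example are hypothetical.

```python
from typing import Dict, List, Mapping, Optional

# Placeholder coefficients for the four components (hypothetical, not calibrated).
DEFAULT_COEFFICIENTS: Dict[str, float] = {"conn": 1.0, "code": 1.0, "ic": 1.0, "db": 1.0}

def component_value(task_weights: List[float]) -> float:
    """Step 3 result for one CMP component: the sum of its tasks' weighted values."""
    return sum(task_weights)

def cmp_total(weights_by_component: Mapping[str, List[float]],
              coefficients: Optional[Mapping[str, float]] = None) -> float:
    """Final CMP: weighted sum of CMPconn, CMPcode, CMPic and CMPdb."""
    coefficients = coefficients or DEFAULT_COEFFICIENTS
    missing = set(coefficients) - set(weights_by_component)
    if missing:
        raise KeyError(f"no task list given for components: {sorted(missing)}")
    return sum(coefficients[name] * component_value(weights_by_component[name])
               for name in coefficients)

example = {
    "conn": [9],     # one LAN-to-WAN connection of High complexity
    "code": [4, 2],  # hypothetical weights for two modified classes
    "ic": [3],       # hypothetical weight for one installation task
    "db": [],        # no database changes in this example
}
print(cmp_total(example))  # 18.0
```

Keeping the coefficients as a parameter makes it easy to swap in calibrated values without touching the counting logic.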
Then, the final CMP value is calculated as a weighted sum of its four components, CMPconn, CMPcode, CMPic, and CMPdb, which measure the size of migration tasks related to connection changes, code changes, installation and configuration, and database changes, respectively. In this section, we delve further into the
The weighted values assigned to each migration task in the third step are initially derived from our discussion with a group of Cloud engineers who have carried out different types of Cloud migration projects themselves. These values will be calibrated further in Chapter 6 with more empirical data. In this chapter
CMPconn assesses all migration tasks related to network connections and evaluates
and require effort to optimize performance are identified and classified into three
types:
and minor effort is expected to ensure that security and performance are preserved.
of the connection is migrated to the Cloud while the other end B stays in-
of the system is already in the Cloud, i.e., $R \neq \emptyset$. Before the migration, this is a WAN connection with one end $A$ in the local data center (i.e., $A \in L$) and the other end $B$ in the Cloud (i.e., $B \in R$). After the migration, both ends $A$ and $B$ are in the Cloud (i.e., $A \in L \cap R'$ and $B \in R \cap R'$). The connection becomes a LAN connection in the Cloud environment. Migration tasks related to this type involve undoing all security and performance tasks applied
Second, the complexity level (Low, Average, or High) of all migration tasks
involved in each connection is evaluated based on its requirements for security
and protocol optimization using Table 5.2. We identify these two dimensions:
from the previous chapter, and a close study of many Cloud practitioners’ blogs and discussions.
Protocol optimization | Security required | Security not required
Required | High | Average
Not required | Average | Low
Lastly, a weighted value is assigned to each connection, based on the type identified in the first step and the complexity level evaluated in the second step, using Table 5.3. For example, if a connection is of LAN-to-WAN type and of High complexity (i.e., it requires effort for both security and protocol optimization), its associated weight would be 9. The values in Tables 5.2 and 5.3 were defined through our discussion with a group of Cloud engineers involved in
The value of CMPconn is defined as the weighted sum of all identified connections:

$$\mathrm{CMP}_{conn} = \sum_{i=0}^{2} \sum_{j=0}^{2} x_{ij} \times w_{ij}$$

where $x_{ij}$ is the number of connections of type i with complexity level j, and $w_{ij}$ is the weighted value for connection type i and complexity level j.
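The double sum can be computed directly from a count matrix and a weight matrix. In the sketch below, `weighted_count` is a hypothetical helper, and `PLACEHOLDER_WEIGHTS` is an invented matrix used only to exercise it; it does not reproduce Table 5.3, whose values are not listed in this section apart from the LAN-to-WAN/High weight of 9. The same shape of computation applies to CMPcode, CMPic and CMPdb with their own table sizes.

```python
from typing import Sequence


def weighted_count(counts: Sequence[Sequence[int]],
                   weights: Sequence[Sequence[float]]) -> float:
    """Return sum_i sum_j counts[i][j] * weights[i][j].

    counts[i][j]  : number of items of type i at complexity level j (x_ij)
    weights[i][j] : weight for type i at complexity level j (w_ij)
    """
    if len(counts) != len(weights):
        raise ValueError("counts and weights must cover the same types")
    total = 0.0
    for row_x, row_w in zip(counts, weights):
        if len(row_x) != len(row_w):
            raise ValueError("each type needs one weight per complexity level")
        total += sum(x * w for x, w in zip(row_x, row_w))
    return total


# Hypothetical placeholder weights (3 connection types x Low/Average/High);
# these are NOT the values of Table 5.3.
PLACEHOLDER_WEIGHTS = [[2, 4, 6],
                       [1, 2, 3],
                       [1, 1, 2]]

if __name__ == "__main__":
    counts = [[1, 0, 2],   # type 0: one Low and two High connections
              [0, 3, 0],   # type 1: three Average connections
              [0, 0, 0]]   # type 2: none
    print(weighted_count(counts, PLACEHOLDER_WEIGHTS))  # 20.0
```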
CMPcode assesses any migration tasks relating to code changes. These tasks range from adding new functionality and removing unnecessary code to modifying code to use new databases or integrate with new libraries. CMPcode is inherited from Class Point (Costagliola et al., 2005), but modified to capture code changes rather than only newly developed functionality. Similar to CMPconn, CMPcode also follows FP’s three-step approach.
First, all classes in application code that require modification effort are identified and classified into four types as defined in Class Point (Costagliola et al., 2005):
and retrieval.
• Task Management Type (TMT): classes that are responsible for definition
Identify:
Before changing code                      After changing code
A - a set of attributes                   A′ - a set of attributes
M - a set of public methods               M′ - a set of public methods
S - a set of services requested           S′ - a set of services requested
    from other classes                        from other classes
Derive:
|A \ A′| : number of attributes removed
|A′ \ A| : number of attributes added
|M \ M′| : number of methods removed
|M′ \ M| : number of methods added
|S \ S′| : number of requested services removed
|S′ \ S| : number of requested services added
Define the changes:
CA = |A \ A′| × 0.2 + |A′ \ A| : changes in attributes
CM = |M \ M′| × 0.2 + |M′ \ M| : changes in methods
CS = |S \ S′| × 0.2 + |S′ \ S| : changes in services requested
methods (CM), and services requested from other classes (CS), are evaluated. These changes consist of the number of elements to be removed and added by
system are identified both before and after the code change (e.g., A and A′ are the sets of attributes before and after the migration, respectively). This information is already available after the design phase of the development cycle, where all design
the differences between its sets before and after the migration (e.g., |A \ A′| and |A′ \ A| are the numbers of attributes to be removed and added, respectively). The final values CA, CM, and CS are determined by applying factors of 0.2 and 1 to removing and adding tasks, respectively (e.g., CA = |A \ A′| × 0.2 + |A′ \ A|). These factors were suggested by Niessink and Vliet (Niessink & Vliet, 1997), since a removing task also requires effort, although not as much as an adding task.
One element which no longer contributes towards a system’s functionality is better
circumstances happen when a class is newly added, i.e., there are no existing sets
CS = |S′|, which are the numbers of methods and services requested in the new class. These three values are similar to Class Point for sizing a new class for development effort. In other words, CA, CM and CS are also valid for capturing newly added code.
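The change counts CA, CM and CS follow directly from set differences. The sketch below assumes each class is described by Python sets of names for its attributes ("A"), public methods ("M") and requested services ("S"); the helper names and the sample class are hypothetical.

```python
REMOVE_FACTOR = 0.2  # weight of a removed element (after Niessink & Vliet, 1997)
ADD_FACTOR = 1.0     # weight of an added element


def change_score(before: set, after: set) -> float:
    """|before - after| * 0.2 + |after - before| * 1 (set differences)."""
    removed = len(before - after)
    added = len(after - before)
    return removed * REMOVE_FACTOR + added * ADD_FACTOR


def class_changes(before: dict, after: dict) -> tuple:
    """Return (CA, CM, CS) for one class.

    Each dict maps "A" (attributes), "M" (public methods) and "S" (services
    requested from other classes) to a set of names. For a newly added class,
    pass an empty 'before' dict, so that CA = |A'|, CM = |M'| and CS = |S'|.
    """
    return tuple(change_score(before.get(k, set()), after.get(k, set()))
                 for k in ("A", "M", "S"))


if __name__ == "__main__":
    # Hypothetical data-access class adapted to a Cloud database client
    before = {"A": {"conn", "timeout"}, "M": {"open", "close", "query"},
              "S": {"SqlClient"}}
    after = {"A": {"conn", "retryPolicy"},
             "M": {"open", "close", "query", "retry"},
             "S": {"SqlAzureClient"}}
    print(class_changes(before, after))
```

Passing an empty `before` dict reproduces the newly-added-class case, where the removal terms vanish and each value reduces to the size of the new set.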
These three dimensions form the basis to evaluate each changed class’s com-
plexity level as in Table 5.5. The complexity level indicators are inherited from
Class Point.
                              Changes in Attributes (CA)
Changes in methods (CM)       0−5        6−9        ≥10
0−4                           Low        Low        Average
5−8                           Low        Average    High
≥9                            Average    High       High
(a) Changes in services requested (CS): 0−2

                              Changes in Attributes (CA)
Changes in methods (CM)       0−4        5−8        ≥9
0−3                           Low        Low        Average
4−7                           Low        Average    High
≥8                            Average    High       High
(b) Changes in services requested (CS): 3−4

                              Changes in Attributes (CA)
Changes in methods (CM)       0−3        4−7        ≥8
0−2                           Low        Low        Average
3−6                           Low        Average    High
≥7                            Average    High       High
(c) Changes in services requested (CS): ≥5
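Table 5.5 can be encoded as a lookup in which CS selects the sub-table and CA and CM select the cell. Because CA, CM and CS may be fractional (removals count 0.2), the sketch assumes each interval runs up to, but not including, the next band's lower bound (for example, CA = 5.4 falls in the 0−5 band of sub-table (a)). That boundary rule and the function name `class_complexity` are assumptions, not part of the thesis.

```python
import bisect

LEVELS = ("Low", "Average", "High")
# Same pattern in all three sub-tables: MATRIX[CM band][CA band] -> LEVELS index
MATRIX = ((0, 0, 1),
          (0, 1, 2),
          (1, 2, 2))
# (exclusive upper limit on CS, CA band lower bounds, CM band lower bounds)
SUBTABLES = (
    (3, (6, 10), (5, 9)),            # (a) CS 0-2
    (5, (5, 9), (4, 8)),             # (b) CS 3-4
    (float("inf"), (4, 8), (3, 7)),  # (c) CS >= 5
)


def class_complexity(ca: float, cm: float, cs: float) -> str:
    """Complexity level of one changed class according to Table 5.5.

    Assumed boundary rule: a value stays in a band until it reaches the next
    band's lower bound; bisect_right returns the index of that band.
    """
    for cs_limit, ca_bounds, cm_bounds in SUBTABLES:
        if cs < cs_limit:
            ca_band = bisect.bisect_right(ca_bounds, ca)
            cm_band = bisect.bisect_right(cm_bounds, cm)
            return LEVELS[MATRIX[cm_band][ca_band]]
    raise AssertionError("unreachable: the last sub-table has no upper limit")


if __name__ == "__main__":
    # CS = 0 selects sub-table (a); CA = 7 and CM = 6 fall in the middle bands
    print(class_complexity(7, 6, 0))  # Average
```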
Lastly, a weighted value is assigned for each changed class based on its type
identified from the first step and its complexity level evaluated from the second
step. These weights are also adopted from Class Point (shown in Table 5.6).
$$\mathrm{CMP}_{code} = \sum_{i=0}^{3} \sum_{j=0}^{2} x_{ij} \times w_{ij}$$

where $x_{ij}$ is the number of classes of type i with complexity level j, and $w_{ij}$ is the weighted value for class type i and complexity level j.
CMPcode is analogous to Class Point in the sense that it also assesses a class’s attributes, public methods, and services requested from other classes. However,
into account both adding and removing tasks. Nevertheless, its validity still holds
when it comes to adding an entirely new class, in which case its counting approach
is exactly the same as Class Point, as shown above. As a result, all complexity
levels and weighted values can be sufficiently inherited from Class Point.
CMPic assesses all migration tasks related to Installation and Configuration (IC),
CMP.
First, all required installation and configuration tasks are identified and clas-
belong to this type, for example, setting up EC2 instance or image, installing
the operating system and middleware, or installing database server.
• Application level: this type consists of any third-party libraries that the
application requires, for example, JDBC drivers for databases. When an
– (1) Rewrite the library from scratch for the Cloud environment - This is seen by CMP as adding new code into the system and is sufficiently captured by CMPcode. Hence, the migration tasks related to this option are excluded from CMPic.
– (2) Reuse a similar library (if one exists) in the Cloud environment,
with the new library seamlessly - The migration tasks involved in this option are integrating the new library into the system, which will be assessed by CMPic, and changing code, which is assessed by CMPcode and excluded from CMPic. If the libraries are available in the Cloud
of configuration steps required and the installation methods (from binary files or source code) as in Table 5.7. Installation and Configuration usually go together for each software package; for example, when Java is installed, the JAVA_HOME variable needs to be set accordingly, or when MySQL is installed in an Ubuntu EC2 instance, it is not accessible from outside the instance by default, so reconfiguration for accessibility is required. Therefore, Installation and Configuration tasks are evaluated together based on the following criteria:
• Configuration: for each installation, how many configuration steps are re-
quired?
                 Installation
Configuration    No installation    Package    Source Code
<2               Low                Low        Average
2−5              Low                Average    High
≥6               Average            High       High
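Table 5.7 follows the same Low/Average/High pattern, indexed by the number of configuration steps and the installation method. A minimal sketch, in which the column labels `"none"`, `"package"` and `"source"` and the function name `ic_complexity` are hypothetical:

```python
LEVELS = ("Low", "Average", "High")
# Column order of Table 5.7; the string labels are hypothetical.
INSTALL_METHODS = ("none", "package", "source")
# IC_MATRIX[configuration band][installation method] -> LEVELS index
IC_MATRIX = ((0, 0, 1),   # fewer than 2 configuration steps
             (0, 1, 2),   # 2-5 steps
             (1, 2, 2))   # 6 or more steps


def ic_complexity(config_steps: int, install_method: str) -> str:
    """Complexity level of one installation-and-configuration task (Table 5.7)."""
    if config_steps < 0:
        raise ValueError("config_steps must be non-negative")
    if install_method not in INSTALL_METHODS:
        raise ValueError(f"unknown installation method: {install_method!r}")
    band = 0 if config_steps < 2 else (1 if config_steps <= 5 else 2)
    return LEVELS[IC_MATRIX[band][INSTALL_METHODS.index(install_method)]]


if __name__ == "__main__":
    # Hypothetical task: install a package, then apply 3 configuration steps
    print(ic_complexity(3, "package"))  # Average
```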
For example, the IC task of installing MySQL from an installation file and
Finally, each IC task is assigned a weighted value as in Table 5.8, based on its type from the first step and its complexity level from the second. This last step is necessary because an IC task at the Application level requires a different amount of effort from an IC task of the same complexity at the Infrastructure level.
$$\mathrm{CMP}_{ic} = \sum_{i=0}^{1} \sum_{j=0}^{2} x_{ij} \times w_{ij}$$

where $x_{ij}$ is the number of IC tasks of type i with complexity level j, and $w_{ij}$ is the weighted value for IC task type i and complexity level j.
CMPdb assesses all migration tasks related to modifying queries and populating data into new databases, excluding database server installation tasks and any required code changes, which are covered by CMPic and CMPcode, respectively. Since the effort required for each query modification task or data population task
First, all database related tasks are identified and classified into two types:
• Data population task: Data in each table must be packaged and loaded into
between the database of the local data center and the database in the Cloud: same type of relational database, same type of relational database but different versions,
Finally, CMPdb is determined by the number of database tasks and the weight associated with each task, as in Table 5.10.
                      Complexity Level
Type                  Low              Average          High              Total
Query Modification    ... × 1 = ...    ... × 3 = ...    ... × 8 = ...     ...
Data Population       ... × 3 = ...    ... × 4 = ...    ... × 10 = ...    ...
CMPdb                                                                     ...
$$\mathrm{CMP}_{db} = \sum_{i=0}^{1} \sum_{j=0}^{2} x_{ij} \times w_{ij}$$

where $x_{ij}$ is the number of database tasks of type i (i.e., the number of queries
and $w_{ij}$ is the weighted value for database task type i and complexity level j.
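Unlike Tables 5.3, 5.6 and 5.8, the weights of Table 5.10 appear in this section, so CMPdb can be computed concretely. In the sketch below, the task counts in the example are hypothetical and `cmp_db` is an illustrative name.

```python
# Table 5.10 weights for the two database task types
DB_WEIGHTS = {
    "query_modification": {"Low": 1, "Average": 3, "High": 8},
    "data_population": {"Low": 3, "Average": 4, "High": 10},
}


def cmp_db(counts):
    """Sum of (number of tasks) x (Table 5.10 weight) over types and levels.

    counts maps a task type to {complexity level: number of tasks}.
    """
    total = 0
    for task_type, per_level in counts.items():
        weights = DB_WEIGHTS[task_type]
        for level, n_tasks in per_level.items():
            total += n_tasks * weights[level]
    return total


if __name__ == "__main__":
    # Hypothetical project: 4 Low query modifications, 2 High data populations
    example = {"query_modification": {"Low": 4},
               "data_population": {"High": 2}}
    print(cmp_db(example))  # 24 (= 4 x 1 + 2 x 10)
```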
5.5 CMP Application
5.4.5 CMP
The final value of CMP is determined as a weighted sum of its four components:

$$\mathrm{CMP} = \sum_{i=0}^{3} \mathrm{CMP}_i \times w_i$$

where $\mathrm{CMP}_i$ is the value of CMP component i, and $w_i$ is the weighted value for CMP component i (as shown in Table 5.11).
Conclusion:
In this section, we have presented the CMP model and its counting method for sizing a Cloud migration project. The greater the CMP value, the more complicated the project and the more effort it requires.
This section demonstrates how CMP can be applied in practice to size a Cloud migration project, using the example of PetShop .Net described in the previous chapter (Section 4.2). For ease of reference, we summarize here again our experimental process of migrating
• We have used the existing application PetShop which was not developed by
• PetShop was developed on an older platform than the current version sup-
• The same issue applies to the PetShop database. There are existing tools offering database and data transfer from local servers to SQL Azure; however, they require SQL Server 2008 to be installed, while PetShop was designed to work with SQL Server 2005 and cannot be installed directly on SQL Server 2008. We had to manually retrieve and run the database script on SQL Server 2008.
to create a package file and a configuration file from the existing source code. The Azure plugin for Visual Studio provides a fairly straightforward method to achieve this; however, this method works with “Web application project”
and classes in the application. Effort was also spent on converting the WebSite project to a Web Application project. Alternatively, the utility tool cspack
Our experiment with PetShop .Net includes tasks to enable the application to work on a local machine prior to migration. These tasks are outside the scope of our Cloud migration project as outlined in Sections 1.4 and 5.1. The starting point of the migration project is defined as the point when PetShop .Net is already running on the local machine and is ready to be migrated, and the ending point is when PetShop has been fully moved to Windows Azure together with its database. The CMP model for sizing a migration project to the Cloud considers only migration tasks within the scope of the defined migration project. Hence, we exclude all tasks to understand the application’s source code and operations, or to install packages to enable the application to work on local machines.
As a result, migration tasks for PetShop can be selected and categorized into
CMPconn = 0
Azure. Also, SQL Azure does not support distributed transactions, which PetShop uses for its transactions. Hence, we needed to modify the code to address this incompatibility. The changes in code are reported in Table 5.12. The weight values in Table 5.12 are taken from Table 5.6.
CMPcode = (1 × 5) + (2 × 8) + (1 × 9) = 30
Azure, and Windows Azure Tools for Visual Studio to create a package file and a configuration file from the PetShop source code, so that it can be deployed
All installation tasks are reported in Table 5.13. The weight values in Table
CMPic = (1 × 1) + (4 × 3) = 13
• CMPdb: the PetShop database is SQL Server 2005, while SQL Azure requires a SQL Server 2008 database. This migration is considered as the same relational database type with a different version. Based on Table 5.9, the com-
Azure, including dumping the old database and restoring it to the new
database.
All database-related tasks are reported in Table 5.14. The weight values in
CMPdb = (5 × 3) + (2 × 4) = 23
The values of these four components are summarized in Table 5.15, together with
Conclusion:
This section has demonstrated how the CMP counting process can be applied to size a Cloud migration project within its scope.
In this section, we reflect on our process of developing the CMP model. The discussion revolves around the structure and methodology of the CMP model; discussion of its validity is deferred to Chapter 6 (Validation).
The model has been developed through a few iterations. The model presented
in this chapter is the most basic version, which can be used as a foundation for
There are 37 tunable parameters in the model (reflected in Tables 5.3, 5.6, 5.8,
5.6 Reflection and Discussion
5.10, and 5.11). In this basic version, the initial values for these parameters were
derived from our discussion with a group of Cloud engineers, who have conducted
some migration projects to the Cloud. A separate discussion was held with each Cloud engineer to determine the value of each parameter, and we then averaged the values from all discussions for each parameter. We employed the expert judgement approach for the parameter values at this stage because of the lack of data on past Cloud migration projects. The only data points available at that stage were from the migration exercises and projects conducted by our group.
We took a further step to improve our model by, first, looking for more data points. A survey and interviews were conducted with academic and industrial practitioners, which will be described in more detail in the next chapter. These data points are more general and of larger scope than the initial ones. The data collection and tuning process will be discussed further in Chapter 6.
model, it only follows the three-step approach of FP. As a result, CMP is also affected by some limitations of FP that have already been criticized (Lokan, 1998; Low & Jeffery, 1990; Symons, 1988; Matson et al., 1994; Kitchenham, 1997), such as:
Classification of all system component types’ complexity as low, average, or high,
from three intervals (low, average, and high) into five intermediate subintervals.
ing FP (and similarly for CMP). For example, a system component containing
over 100 data elements is given at most twice the function points of a component
with one data element. Similarly, CMP suffers from the same problem as FP,
with one configuration step. However, compared to FP and other extensions, this
limitation is less problematic for CMP, since there are normally many migration
tasks with few steps in each task.
The weights were derived by the expert judgement method and, in the next chapter, are tuned using a set of projects from external sources, but it is also reasonable to ask whether they will be valid in all circumstances. Threats to validity are discussed in Chapter 6.
The current CMP model considers only internal factors, not external factors. Internal factors ensure that all necessary migration tasks are counted, while external factors adjust the assessed complexity of the migration tasks to each organization. Further work is planned to explore external factors as well. The challenge with external factors is that it is very difficult to identify a
whether they are the right factors, or whether the list is complete, and how to
The CMP model measures the size of a migration project from a local data
the CMP model was developed without any constraint on L. In other words, the
characteristic of CMP enables the measurement to extend beyond just two data centers. When more than two data centers are involved in the migration process (either from local to Cloud, or vice versa), the CMP model can be applied to one pair at a time, repeatedly, until all migration tasks are considered.
5.7 Summary
more suitable for Cloud migration projects than other existing size metrics in the literature, since it captures special aspects of the Cloud migration context, as discussed in Section 5.2. Moreover, CMP emphasises the features that distinguish Cloud migration from migration between two local data centres; for example, Cloud users (or developers) do not possess full control over the range of actions for each migration task. Therefore, the CMP model takes into consideration Cloud-specific dependencies for each migration task; for example, only security and protocol optimisation are assessed for each connection task, database tasks are concerned with migrating from relational to NoSQL databases, and so on.
In a project development cycle, the CMP model fits between the design phase and the implementation phase. One important assumption for CMP is that all design decisions have been made. These design decisions have a direct impact on how CMP is counted, since they define all anticipated migration tasks. The CMP counting process itself does not require much training or effort; however, its accuracy relies on the sufficiency and granularity of the migration task list. Therefore, it is important to analyse the list of expected migration tasks carefully, to ensure that it captures the Cloud migration aspects adequately and in as much detail as possible.
Chapter 6
Validation
country trip in a car without a fuel gauge. You can make calculated
guesses and assumptions based on experience and observations, but
without hard data, conclusions are based on insufficient evidence.”
∼ Mikel Harry.
et al., 2005). It is widely accepted that there are two types of validation required
for software metrics, namely theoretical validation and empirical validation. The
6. VALIDATION
The CMP metric, similar to FP, incorporates both size and complexity con-
and complexity metrics, which focus on products. No set of properties yet exists for metrics that size both products and processes; hence, the set of criteria for
migration context. Data on past Cloud migration projects must be available for
this purpose, including what tasks have been carried out and how much time has
been spent on those tasks. However, unlike for traditional software development projects, no public repositories exist for such data. As a result, a survey was conducted at this stage to collect relevant data. More details are presented later in the subsections of this chapter. Also, in this chapter, the two terms CMP weights and CMP parameters are used interchangeably; both mean the weighted values of each CMP component and their elements as presented in Chapter 5.
a set of criteria proposed for product sizing metrics. The empirical validation is
divided into three phases. Section 6.2 describes the first phase of the empirical
validation, where the CMP model is evaluated on the initial set of 6 migration
projects conducted by our group. This section also states the evaluation criteria
6.1 Theoretical Validation
and the approach we follow for the empirical validation. The result of this Phase 1 validation shows that CMP is potentially an indicator for Cloud migration effort estimation. However, more data from external organizations is necessary to demonstrate that CMP is also externally valid. Section 6.3 presents the final dataset we obtained from conducting a survey. A similar empirical validation, called Empirical Validation Phase 2, is performed again on CMP using the new dataset and is presented in Section 6.4. The result shows that the parameters (or weights) of CMP need further calibration. Hence, Section 6.5 demonstrates the process of calibrating the CMP weights. In that section, we also state a list of assumptions made in developing the model, and test their plausibility using the available data from the survey. This list of assumptions demonstrates the high complexity and difficulty of validating the metric. Section 6.6 illustrates Empirical Validation Phase 3, where CMP with the calibrated weights is validated on the new dataset. The result shows that the calibration significantly improves the performance of the CMP model, and the model can be used as a predictor for effort estimation of Cloud migration. Section 6.7 discusses threats to the validity of the model. Lastly, Section 6.8 summarizes and
some software measurement concepts, such as size and complexity. The frame-
work provides different sets of convenient and intuitive properties which are used
(Briand et al., 1996), since CMP is a sizing metric developed to measure the size
of migration projects.
Three properties for a size metric proposed by Briand et al. (1996) are: Non-
negativity, Null Value, and Module Additivity. These properties are formalized
as:
Size(S) ≥ 0
Proof 1 Size(S) is the CMP value of the migration project S. CMP is ob-
tained as a weighted sum of its four components, which in turn are weighted
if E is empty:
E = ∅ ⇒ Size(S) = 0
to it. The final value of CMP is the sum of all the weights of the migration
If E is divided into two disjoint subsets $E_{m1}$ and $E_{m2}$, with no loss of generality, $E_{m1}$ and $E_{m2}$ can be represented as: $E_{m1} = \{e_0, e_1, \ldots, e_{k-1}\}$ and
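Applying the same process of determining CMP, the values $\mathrm{CMP}_{m1}$ and $\mathrm{CMP}_{m2}$ of these two subsets of migration tasks $E_{m1}$ and $E_{m2}$ are:

$$\mathrm{CMP}_{m1} = \sum_{i=0}^{k-1} w_i \quad \text{and} \quad \mathrm{CMP}_{m2} = \sum_{i=k}^{n-1} w_i.$$

As a result,

$$\mathrm{CMP}_{m1} + \mathrm{CMP}_{m2} = \sum_{i=0}^{k-1} w_i + \sum_{i=k}^{n-1} w_i = \sum_{i=0}^{n-1} w_i = \mathrm{CMP}$$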
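The three properties can also be checked numerically under the same view used in the proofs, namely that CMP is the sum of non-negative per-task weights. The sketch below (hypothetical names, illustrative integer weights) tests one random disjoint split per seed; it illustrates the proof rather than replacing it.

```python
import random


def cmp_size(task_weights):
    """CMP of a set of migration tasks: the sum of the tasks' weights."""
    return sum(task_weights)


def check_size_properties(task_weights, seed=0):
    """Return (non_negativity, null_value, module_additivity) for one split."""
    if any(w < 0 for w in task_weights):
        raise ValueError("CMP weights are non-negative by construction")
    rng = random.Random(seed)
    tasks = list(task_weights)
    rng.shuffle(tasks)
    k = rng.randint(0, len(tasks))  # split point: E_m1 = tasks[:k], E_m2 = tasks[k:]
    em1, em2 = tasks[:k], tasks[k:]
    non_negativity = cmp_size(task_weights) >= 0
    null_value = cmp_size([]) == 0
    additivity = cmp_size(em1) + cmp_size(em2) == cmp_size(task_weights)
    return non_negativity, null_value, additivity


if __name__ == "__main__":
    # Illustrative integer task weights, not taken from any specific project
    print(check_size_properties([9, 5, 8, 3, 1, 10]))  # (True, True, True)
```

Integer weights keep the additivity comparison exact; with floating-point weights, the comparison would need a tolerance.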
We have shown that CMP satisfies all three necessary conditions of a size
measurement proposed by Briand et al. (1996). However, an empirical validation
is divided into three phases. This Phase 1 will evaluate the CMP model with
its initial set of weights as presented in Chapter 5 using our initial set of 6
Cloud migration projects. Because of the limited number of data points publicly
6.2 Empirical Validation - Phase 1
available, the data we use in this first phase of the empirical validation is extracted
Although the validity of these data points has not been verified externally with
other research projects, they are suitable for this empirical validation because:
happen in reality.
3. The uniformity of these projects is ensured, because they were carried out by the same team. Therefore, the external cost factors discussed in Section 4.3 have minimal impact on these data points. This is suitable for validating the CMP model, since we focus on internal cost factors only.
In this section, we also state the evaluation criteria and the approach we follow
for the purpose of empirical validation. These are also applied for the other two
phases.
The details of each migration task are used to calculate the size of the migration
project to the Cloud, using the CMP model. Regression analysis will be used to
determine the relationship between the size of a Cloud migration project and the
effort required.
$$\mathrm{MRE} = \frac{|AE - PE|}{AE}$$

$$\mathrm{MMRE} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{MRE}_i$$

$$\mathrm{PRED}(l) = \frac{k}{n}$$
1983) to examine the relationship between CMP values and the actual effort for migrating a system to the Cloud. This approach is the same as a k-fold cross-validation in which k is equal to the number of data points. K-fold cross-validation has been successfully used to validate cost estimation models in the literature, and is especially recommended for small data sets (Briand et al., 1999; Costagliola et al., 2005). In leave-one-out cross-validation, each single data point is used as the validation data, while the remaining data are used as the training set. This is repeated until each data point has been used once as the validation data.
Table 6.1 shows the data points extracted from our six projects. For project 1, the majority of the effort was spent on securing and optimizing the WAN connection.
No   Effort (hours)   CMP
1    45               504
2    4                60
3    6                95
4    9                149
5    32               337
6    51               645
phase, we performed six rounds of validation. Each round uses five projects as the training set, and the remaining project is left out as the validation set. Descriptive statistics were computed for each training set, and the boxplot and outliers of each set were analysed on that basis. Figure 6.1 shows that there are no outliers in the training sets of the six validation rounds that might unduly influence the
Figure 6.1: The boxplots for the six training datasets of variable CMP
The scatter plots in Figure 6.2 show a positive linear relationship between CMP and Effort (in hours) in each training set. An Ordinary Least-Squares (OLS) regression analysis is therefore applied to each training set to derive the equation of the trend line, which can be used as a prediction model for the effort required in hours.
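The leave-one-out procedure with OLS can be sketched in a few lines of plain Python using the six data points of Table 6.1. The helper names are hypothetical. Table 6.3's predictions seem to use rounded coefficients, so the predicted values here need not match it exactly.

```python
# Leave-one-out cross-validation with ordinary least squares (OLS) on the
# six (CMP, effort in hours) data points of Table 6.1.
DATA = [(504, 45), (60, 4), (95, 6), (149, 9), (337, 32), (645, 51)]


def ols(points):
    """Fit effort = slope * CMP + intercept; also return R^2."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared


def leave_one_out(points):
    """Fit on all points but one, then predict the left-out point."""
    rounds = []
    for i, (cmp_value, actual) in enumerate(points):
        training = points[:i] + points[i + 1:]
        slope, intercept, r_squared = ols(training)
        rounds.append({
            "left_out": i + 1,  # project number in Table 6.1
            "slope": slope,
            "intercept": intercept,
            "r_squared": r_squared,
            "predicted": slope * cmp_value + intercept,
            "actual": actual,
        })
    return rounds


if __name__ == "__main__":
    for r in leave_one_out(DATA):
        print(f"project {r['left_out']}: effort = {r['slope']:.4f} x CMP "
              f"{r['intercept']:+.4f}, R^2 = {r['r_squared']:.4f}, "
              f"predicted {r['predicted']:.3f} vs actual {r['actual']}")
```

Leaving out project 2, for instance, gives a slope of about 0.0869, an intercept of about −1.4664 and R² of about 0.9736, matching the first row of Table 6.2.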
the reliability of the predictor. A t-value > 1.5 indicates that CMP is a potential predictor of effort. The results of R², t-value, and p-value of the coefficients and the intercepts of all six validation rounds are summarized in Table 6.2. (Note that the p = 0.05 critical value of the t-test with 3 degrees of freedom is 3.18 for a two-sided test and 2.35 for a one-sided test, and coefficients are expected to be
The result suggests that the coefficients of the models are statistically signifi-
of R² and all the coefficients pass the significance test. In other words, the OLS regression results still show a strong linear relationship between CMP
ID   Coefficient                 Intercept                    R²       Model
     Value    t-value  p-value   Value     t-value  p-value
1    0.0869   10.504   0.002     −1.4664   −0.4399  0.6898    0.9736   Effort = 0.0869 × CMP − 1.4664
2    0.0858   10.958   0.002     −0.8785   −0.2789  0.7984    0.9756   Effort = 0.0858 × CMP − 0.8785
3    0.0849   12.507   0.001     −0.2669   −0.0985  0.9277    0.9812   Effort = 0.0849 × CMP − 0.2669
4    0.086    16.339   0.000     −1.9929   −1.0084  0.3876    0.9889   Effort = 0.086 × CMP − 1.9929
5    0.0839   12.069   0.001     −1.1697   −0.501   0.6508    0.9798   Effort = 0.0839 × CMP − 1.1697
6    0.0972   17.112   0.000     −3.0637   −1.9008  0.1535    0.9899   Effort = 0.0972 × CMP − 3.0637
and effort (in hours). For example, in the first training set, the derived model is Effort = 0.0869 × CMP − 1.4664, with a high R² = 0.9736, and the coefficient is significant at the 0.05 level.
pute the predicted effort of the left-out project in each validation round (reported in Table 6.3). The results are then evaluated using the metrics described in Section 6.2.1.
Table 6.3 shows that the MMRE value is 0.199 and the prediction at level
0.25 is 0.833. This result suggests that the CMP model shows a good predictor
6.3 Data Collection
No           CMP   AE   PE       MRE
1            504   45   41.116   0.086
2            60    4    3.221    0.195
3            95    6    7.272    0.212
4            149   9    12.383   0.376
5            337   32   26.989   0.157
6            645   51   59.63    0.169
MMRE                             0.199
PRED(0.25)                       0.833
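MRE, MMRE and PRED(l) can be recomputed from the AE and PE columns of Table 6.3. The sketch assumes the usual convention that k counts the projects whose MRE is at most l; the function names are hypothetical.

```python
# (actual effort AE, predicted effort PE) in hours, as reported in Table 6.3
TABLE_6_3 = [(45, 41.116), (4, 3.221), (6, 7.272),
             (9, 12.383), (32, 26.989), (51, 59.63)]


def mre(actual, predicted):
    """Magnitude of relative error |AE - PE| / AE."""
    return abs(actual - predicted) / actual


def mmre(pairs):
    """Mean MRE over the n projects."""
    return sum(mre(a, p) for a, p in pairs) / len(pairs)


def pred(pairs, level=0.25):
    """PRED(l) = k / n, with k = number of projects whose MRE <= l (assumed)."""
    k = sum(1 for a, p in pairs if mre(a, p) <= level)
    return k / len(pairs)


if __name__ == "__main__":
    print(f"MMRE = {mmre(TABLE_6_3):.3f}, PRED(0.25) = {pred(TABLE_6_3):.3f}")
    # MMRE = 0.199, PRED(0.25) = 0.833
```

Five of the six projects have an MRE of at most 0.25 (only project 4 exceeds it), which gives PRED(0.25) = 5/6.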
6.2.4 Conclusion
In this section, we have shown that phase 1 of the empirical validation yields good results for the CMP model as a predictor for effort estimation in some Cloud migration cases. However, to have more confidence in the CMP model, more data
data on past migration projects to the Cloud for determining migration cost factors, including size, and examining their relationships with the effort required for migration.
Table 6.4 shows the data points we obtained from our own projects, the survey, and interviews, together with the corresponding CMP values calculated with the initial
These data points are calculated for each CMP component separately, then
      Database        Install.& Config.   Connection        Code              Total
ID    CMPdb   Hours   CMPic   Hours       CMPconn  Hours    CMPcode  Hours    CMP    Hours
1     6       2       45      80          0        0        440      250      3232   332
2     0       0       9       3           0        0        0        0        18     3
3     29      25      0       0           0        0        65       40       493    65
4     40      8       0       0           0        0        0        0        56     8
5     0       0       33      50          9        5        44       20       387    75
6     0       0       0       0           9        10       0        0        54     10
7     8       5       0       0           18       20       0        0        118    25
8     0       0       18      24          0        0        0        0        44     24
9     0       0       7       6           0        0        0        0        21     6
10    0       0       0       0           110      100      0        0        480    100
11    0       0       27      50          18       20       0        0        124    70
12    0       0       135     300         2        2        90       80       1158   382
13    6       1       9       7           3        2        0        0        32     10
14    6       2       21      20          22       20       0        0        167    42
15    23      7       13      14          0        0        30       10       207    31
16    84      15      8       10          1        2        89       40       511    67
17    6       2       9       4           2        2        0        0        38     8
18    6       2       8       8           2        2        0        0        38     12
19    0       0       36      48          0        0        0        0        72     48
Table 6.4: Data points from surveys and interviews
6.4 Empirical Validation - Phase 2
the CMP value can be accumulated with associated weights from the model for
each component. Some data points consist of all 4 CMP components, but some
only have one or two components of CMP.
CMP considers only internal factors, not external factors, as discussed in Chapter 5, whereas these data points come from different organizations. As a result, in the survey and interview questions, as well as in the data analysis process, we
project is left out as the validation set. The results of R², t-value, and p-value of the coefficients and the intercepts of all 19 validation rounds are summarized in Table 6.5.
ID Coefficient Intercept
Value t-value p-value Value t-value p-value
R2 = 0.697
Effort = 0.1177 × CMP + 25.6207
ported in Table 6.6). Table 6.6 shows that the MMRE value is 1.5155 and the
prediction at level 0.25 is 0.5789. This result suggests that the CMP weights (or
6.5 CMP Parameters Calibration
ID CMP AE PE MRE
1 3232 332 881.4686 1.6550
2 18 3 27.9283 8.3094
3 493 65 83.5724 0.2857
4 56 8 32.3278 3.0410
5 387 75 69.7342 0.0702
6 54 10 31.9428 2.1943
7 118 25 38.9682 0.5587
8 44 24 29.7354 0.2390
9 21 6 28.0924 3.6821
10 480 100 79.9540 0.2005
11 124 70 36.8852 0.4731
12 1158 382 133.3256 0.6510
13 32 10 29.1903 1.9190
14 167 42 44.0611 0.0491
15 207 31 49.7258 0.6041
16 511 67 85.7034 0.2792
17 38 8 30.0793 2.7599
18 38 12 29.8040 1.4837
19 72 48 31.6707 0.3402
MMRE 1.5155
PRED(0.25) 0.5789
This section presents our attempt to calibrate the CMP model in order to improve its external validity, so that CMP can be more widely useful.
There are 37 parameters (or weights) in total in the CMP model (reflected
in Tables 5.3, 5.6, 5.8, 5.10, and 5.11 of Chapter 5). The original values of these
weights were defined by discussion with a group of Cloud engineers who have
participated in Cloud migration projects. We asked each Cloud engineer for their
individual judgment on each weight value, then we averaged all values across all
Cloud engineers we had discussions with, to derive a final value for each parameter.
These expert opinion weights can be further refined using data points collected
from our survey on Cloud migration projects. Since the questions for each component of CMP were asked separately, the data collected for each CMP component are also separate from one another; hence, we can separate each survey response for
Table 6.7 clearly shows that there are 11 parameters without any data points
(Weight IDs: 7, 8, 9, 10, 11, 13, 14, 19, 20, 22, and 23). These weights cannot
be calibrated without data points; hence, we keep the expert opinion values for
these weights. These values are candidates for adjustment when more data points
become available.
There are 4 weights with only 1 data point each (Weight IDs: 24, 28, 29, and 30), and 7 weights with 2 data points each (Weight IDs: 3, 12, 15, 16, 17, 32, and 33). With so few data points, these values could easily be adjusted to improve the prediction level of the model, but doing so could lead to overfitting. Therefore, we decided that these weights do not have sufficient data points for calibration, and their expert opinion values are also kept for the time being. However, these values may also be subject to change if more data
The remaining 11 weights in Table 6.7, together with the 4 final weights associated
with the CMP components, have 3 or more data points each; hence, they are considered
for the calibration process, although the number of data points for each weight
6.5 CMP Parameters Calibration
Table 6.7: Number of data points available to calibrate each weight of the CMP
model
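The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration that assumes only what the text states: the weight IDs in each data-point group of Table 6.7, a threshold of 3 data points, and 4 final component weights that also meet the threshold. Because the text only says "3 or more" for the calibratable group, the sketch records those weights as 3; the names `w_conn`, `w_code`, `w_ic`, and `w_db` are hypothetical labels for the final weights.

```python
# Weight IDs from Table 6.7, grouped by the number of survey data points
# available for each, as listed in the text above.
NO_DATA = [7, 8, 9, 10, 11, 13, 14, 19, 20, 22, 23]
ONE_POINT = [24, 28, 29, 30]
TWO_POINTS = [3, 12, 15, 16, 17, 32, 33]
THREE_OR_MORE = [1, 2, 4, 5, 6, 18, 21, 25, 26, 27, 31]

# Hypothetical labels for the 4 final component weights (Table 5.11),
# which the text says also have 3 or more data points.
FINAL_WEIGHTS = ["w_conn", "w_code", "w_ic", "w_db"]

MIN_POINTS = 3  # fewer points than this risks overfitting, so keep the expert value


def build_counts():
    """Map each weight to its number of data points; 3 stands for '3 or more'."""
    counts = {}
    for n_points, ids in ((0, NO_DATA), (1, ONE_POINT), (2, TWO_POINTS), (3, THREE_OR_MORE)):
        for weight_id in ids:
            counts[weight_id] = n_points
    for name in FINAL_WEIGHTS:
        counts[name] = 3
    return counts


def partition(counts, min_points=MIN_POINTS):
    """Split weights into those to calibrate and those that keep expert values."""
    tunable = [w for w, n in counts.items() if n >= min_points]
    kept = [w for w, n in counts.items() if n < min_points]
    return tunable, kept


counts = build_counts()
tunable, kept = partition(counts)
print(len(counts), len(tunable), len(kept))  # 37 15 22
```

The threshold is a single parameter, so re-running the partition when more survey data arrive only requires updating the groups.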
Although with the current dataset, the calibration process can be performed
important to test the plausibility of those assumptions given the available data,
before performing any calibrations. The validation process in the following section
The assumptions on each CMP component and its elements are stated as
follows:
Relevant projects in the survey responses show connection changes in the first
changes, or any other types different from those proposed. Hence, this assumption
is considered valid.
this assumption is because any change that turns a WAN connection into a
LAN connection is essentially the reverse of a change that turns a LAN
connection into a WAN connection, given that the source and destination of these
connections remain unchanged. Effort required for carrying out those changes
to the first two types mentioned above, has a significantly different impact on the
size of a migration task. Essentially, the effort required to amend a LAN connection
in the local environment to adapt it to the new environment in the Cloud should be much
less than the effort required for LAN-to-WAN and WAN-to-LAN connection changes.
This assumption is reflected quite clearly in projects 6, 7, 13, and 16. Projects
and 20 hours vs. 2 and 2 hours). A similar observation holds for projects 7 and 14,
where both projects have 2 LAN-to-WAN connections and project 14 also consists
observed for these two projects (both projects required 20 hours each). Therefore,
this assumption is verified.
has a significantly different impact on the size and effort of the migration task.
The requirements for Protocol Optimization and/or Security are classified into
3 levels of complexity: Low, Average, and High (Table 5.2). Given the available
data for the component CMPconn in Appendix B, project 5 has 1 average
and 12, where 4 average-complexity connections require 10 times as much effort
as 2 low-complexity connections (20 hours vs. 2 hours). Data on several other
projects (such as 13 and 18) yield similar results. Therefore, this
assumption is plausible.
Assumption 5 The relative impact of the three connection types and of the performance and
security requirements can be represented by a set of significantly different weights (weight IDs
Each individual weight is a specific assumption. Table 6.7 shows that only 5
weight IDs (1, 2, 4, 5, and 6) can be considered for the calibration exercise as
discussed above. The validation and calibration of these weights will be presented
in more depth in Section 6.5.2. The other weights with very few data points,
whose values were determined by expert opinion, are kept at this stage. These
ola et al., 2005), in this section, we still state all assumptions and validate them
on the available data of our Cloud migration context.
Assumption 6 Four different types of class have a significant impact on the size
Of the 19 projects in our survey responses, 6 (projects 1, 3, 5, 12, 15, and
16) involve the code modification component, and they span 4 types of class:
None of the responses suggested any other type of class apart from these 4
types. In these 6 projects with code modification, the effort required to modify
the four types of class makes up a major part of the total effort required for the whole
project (e.g., in project 1: 250 hours for code modification out of 332 hours
in total (75% of total effort); or project 3: 40 hours for code modification out
Data from the 6 corresponding projects show that all these types of class
changes were actually carried out during their migration, although the supporting
data are not very clear or explicit. Also, these types
were inherited from Class Point. Hence, we consider this assumption valid to
a certain extent.
Assumption 8 Added and deleted elements, for each class type and each change
type, have significantly different impacts on task size and effort, in the ratio 5 to 1.
This assumption is based on the suggestion of Niessink & Vliet (1997) that
a deletion task requires 0.2 times as much effort as an addition task.
Unfortunately, our data are not sufficient to test this assumption. This assumption
should be tested when more data become available in the future.
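One straightforward way to encode the 5:1 ratio of Assumption 8 is to discount deleted elements by a factor of 0.2 before applying the per-element weight. The sketch below is hypothetical: `change_size` and its `weight` argument are illustrative only and do not reproduce the Code Modification formula of the CMP model in Chapter 5.

```python
DELETE_FACTOR = 0.2  # Assumption 8: deleting costs 0.2x adding (ratio 5:1)


def change_size(added, deleted, weight):
    """Hypothetical size of one class change.

    `weight` stands in for the per-element weight of the class type and
    complexity level; the actual CMP weights are defined in Chapter 5.
    """
    return weight * (added + DELETE_FACTOR * deleted)


print(change_size(added=10, deleted=5, weight=1.0))  # 11.0
```

Keeping the factor as a named constant makes it easy to re-estimate once enough data exist to test the assumption.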
based on ranges of the individual change counts, and counts greater than the upper
value all have the same impact.
This is an important assumption, inherited from Class Point, and it has been
validated by Costagliola et al. (2005). The second part of this claim may cause
problems when estimating development effort, and this is a known issue with Function
Point. However, in the Cloud migration context, this issue is less problematic
since data from survey responses have shown that there are very few tasks with
Assumption 11 The differences between class type and complexity level can be
represented as a set of 12 weights, where each individual weight represents a
specific assumption.
Table 6.7 shows that only 2 weight IDs (18 and 21) can be considered for
the calibration process as discussed above. The validation and calibration of the
weights will be presented in more depth in Section 6.5.2. The other weights, which
have very few data points and whose values were determined by expert opinion, are kept
at this stage. These values may be subject to change when more data becomes
available in the future.
Data from the survey responses show that it is quite common to have Infrastructure
packages installed and configured in the Cloud for a migration project
(10 out of 19 responses). Only 1 project required Application packages (project
16). The amount of effort spent on these installation and configuration tasks is
relatively significant compared to other CMP components, especially for Infras-
configured have a significantly different impact on the size and effort of the
migration task.
extra effort to compile the source code. These types require different amounts of
effort. Project 1 requires 80 hours to install 5 packages from source code with
projects 5 and 13. Project 5 has 2 packages installed from a binary installer and 3 packages
from source code, whereas project 13 also has 2 packages from a binary installer
but 3 packages that require no installation at all. The former requires 50 hours,
whereas the latter requires only 7 hours. Observations on other projects
give similar results. Therefore, this assumption is considered valid.
into complexity levels based on the installation methods and the number of param-
eters to be configured.
Although there are no data explicitly supporting this claim, and it is very
hard to verify this type of assumption, it intuitively makes sense because of the
different impacts of installation methods and the number of parameters to be
configured on the size of migration tasks, as in the previous assumption.
Assumption 15 The differences between package types and complexity levels can
Table 6.7 shows that only 3 weight IDs (25, 26 and 27) can be considered for
the calibration as discussed above. The validation and calibration of the weights
will be presented in more depth in Section 6.5.2. The other weights with very
few data points, whose values were determined by expert opinions, are kept at
this stage. These values may be subject to change when more data is available
in the future.
of the major part of their migration processes (about 30% of total effort). These
projects represent all four types of database change: same relational database
and same version (project 3), same relational database and different version,
migration tasks; for example, populating data into a NoSQL database requires more
work than just a “sqldump” command, and JOIN operations from the relational
database must be rewritten since NoSQL databases generally do not support JOINs. Hence,
more effort is required for NoSQL databases. This assumption is supported by
data from the survey responses. In particular, project 16 required 5 hours to
only 2 hours to populate the same amount of data from a relational to another
data population) have a significant impact on the size of the migration tasks.
Our survey responses only report activities related to either query modification
Table 6.7 shows that only 1 weight (weight ID 31) can be calibrated as dis-
cussed above. The validation and calibration of this weight will be presented in
more depth in Section 6.5.2. The other weights with very few data points, whose
values were determined by expert opinions, are kept at this stage. These values
Conclusion:
The above assumptions were made during the development of CMP. We have
only stated the main, high-level assumptions at this stage; more assumptions
can be extracted and tested when more information becomes available. These
assumptions are essential because of the high complexity of a size metric for
Cloud migration projects.
As can be seen, there are already many assumptions and too little information
from the survey responses to properly validate their plausibility.
This shows the complexity and difficulty of validating the CMP metric at
this stage in the Cloud migration context. We attempted to test many of the
In this section, the calibration will be performed on 15 weights, which have three
or more data points from the survey, as discussed in the previous section. The
15 weights are:
• Installation and Configuration Component CMPic: weight IDs 25, 26, and
27
• The 4 main weights, one for each CMP component, used to compute the final CMP value
The calibration is first performed on each CMP component individually, and
then on all components together. For each CMP component, we perform a multiple
regression on the tunable weights. For projects that also involve other, untunable
weights, we use their expert opinion values. The data used for the calibration are
given in Appendix B.
follows:
The multiple regression for this component uses 11 data points (projects 5,
6, 7, 10, 11, 12, 13, 14, 16, 17, and 18). The 5 tunable weights count as 5 input
variables. The multivariate model is:
$$f = a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4 + a_5 x_5$$
Project ID a1 a2 a3 a4 a5 f
5 0 1 0 1 0 5
6 0 0 0 0 1 10
7 0 0 0 0 2 20
10 0 5 0 5 5 100
11 0 2 0 2 0 20
12 1 0 1 0 0 2
13 3 0 0 0 0 2
14 0 0 0 0 2 20
16 1 0 0 0 0 2
17 1 0 1 0 0 2
18 1 0 1 0 0 2
Table 6.8: Data points for calibrating network connection component weights
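The calibration of the network connection weights is an ordinary least-squares problem without an intercept: each row of Table 6.8 gives the counts a1 to a5 for one project and the reported effort f, and the unknowns are the weights x1 to x5. The NumPy sketch below assumes, as a simplification, that the f column already contains only the tunable contributions; the text notes that expert-opinion values are used for any untunable weights in a project, which this sketch does not model, so its output is not meant to reproduce the calibrated values reported in the thesis. In the table as transcribed, the a2 and a4 columns are identical in every row, so the design matrix has rank 4 and the data alone cannot separate x2 from x4; `np.linalg.lstsq` then returns the minimum-norm solution, which splits their combined value equally.

```python
import numpy as np

# Table 6.8: project ID -> (counts a1..a5, reported effort f in hours).
TABLE_6_8 = {
    5: ([0, 1, 0, 1, 0], 5),
    6: ([0, 0, 0, 0, 1], 10),
    7: ([0, 0, 0, 0, 2], 20),
    10: ([0, 5, 0, 5, 5], 100),
    11: ([0, 2, 0, 2, 0], 20),
    12: ([1, 0, 1, 0, 0], 2),
    13: ([3, 0, 0, 0, 0], 2),
    14: ([0, 0, 0, 0, 2], 20),
    16: ([1, 0, 0, 0, 0], 2),
    17: ([1, 0, 1, 0, 0], 2),
    18: ([1, 0, 1, 0, 0], 2),
}

A = np.array([row for row, _ in TABLE_6_8.values()], dtype=float)
f = np.array([effort for _, effort in TABLE_6_8.values()], dtype=float)

# f = a1*x1 + ... + a5*x5 has no intercept, so solve for x directly.
# Singular values below 1e-10 * max are treated as zero (rank-deficient case).
x, _, rank, _ = np.linalg.lstsq(A, f, rcond=1e-10)

print("rank of design matrix:", rank)
print("fitted weights x1..x5:", np.round(x, 3))
```

If more projects with distinct a2 and a4 counts become available, the same code separates the two weights without modification.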
This multiple regression gives regression coefficients that are essentially new
The multiple regression for this component uses 6 data points (projects 1,
3, 5, 12, 15, and 16). The 2 tunable weights count as 2 input variables. This
multiple regression gives regression coefficients that are essentially new values for
used as an input variable. This regression gives a regression coefficient that is
There are 4 weights used to calculate the final CMP value, as given in Table 5.11 of
Chapter 5. The multiple regression uses all 19 data points, whose component values are
calculated with the new weights from the calibration processes above. The 4 tunable
final weights count as 4 input variables. This multiple regression gives regression
coefficients that are essentially new values for these 4 weights, as shown in Table 6.13.
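The final-weight calibration follows the same pattern: the final CMP is a weighted sum of the four component sizes, and the four weights are the coefficients of a no-intercept multiple regression of effort on those sizes. The sketch below uses made-up component sizes and a noise-free effort vector built from known weights, purely to show that the fit recovers them; the data, the weights, and the absence of an intercept are assumptions, not values from Appendix B or Table 6.13.

```python
import numpy as np

# Hypothetical component sizes per project (columns: CMPconn, CMPcode, CMPic, CMPdb).
# These are made-up numbers, not the survey data in Appendix B.
components = np.array([
    [10.0, 0.0, 5.0, 2.0],
    [0.0, 40.0, 8.0, 0.0],
    [4.0, 12.0, 0.0, 6.0],
    [2.0, 0.0, 20.0, 3.0],
    [6.0, 25.0, 4.0, 1.0],
])
true_weights = np.array([1.5, 1.0, 2.0, 0.5])  # hypothetical final weights
effort = components @ true_weights              # noise-free effort, for illustration only


def fit_final_weights(components, effort):
    """No-intercept multiple regression: effort ~ components @ weights."""
    weights, *_ = np.linalg.lstsq(components, effort, rcond=None)
    return weights


def cmp_value(component_sizes, weights):
    """Final CMP as the weighted sum of the four component sizes."""
    return float(np.dot(component_sizes, weights))


w = fit_final_weights(components, effort)
print(np.round(w, 6))               # recovers the hypothetical weights
print(cmp_value(components[0], w))
```

With real survey data the fit will not be exact, and the residuals indicate how well a single set of final weights explains the observed effort.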
Conclusion:
There are 15 tunable weights out of 37 weights in total. The rest of the weights
are kept unchanged because they have very few data points for the calibration.
The 15 calibrated weight values have changed quite significantly from the expert
opinion values. The model with the new set of weights needs to be validated again to
ensure that its performance has improved over the original one.
6.6 Empirical Validation - Phase 3
In this section, we perform an empirical validation similar to the first two phases
on the new dataset of 19 data points. This dataset essentially originates from the
survey as in Phase 2; however, the final CMP values in this dataset are calculated
based on the new set of calibrated weights.
Table 6.14: New dataset - calculated from the new set of calibrated weights
ID Coefficient Intercept
Value t-value p-value Value t-value p-value
$R^2 = 0.9571$
$\text{Effort} = 1.25749 \times \text{CMP} - 16.47287$
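The fitted line can be applied directly to a CMP value: for project 1 (CMP = 382.3), 1.25749 × 382.3 − 16.47287 ≈ 464.2656, which matches the predicted effort listed for that project in the Phase 3 results table. Applying the same line to project 5 (CMP = 82.8) gives about 87.6 rather than the listed 78.8244, which suggests that each project's prediction comes from its own fit. The text does not spell out that procedure, so the leave-one-out loop below (refit on all other projects, then predict the held-out one) is an assumption, offered as one common way to obtain out-of-sample predictions rather than as the thesis's method.

```python
import numpy as np


def predict_effort(cmp, slope=1.25749, intercept=-16.47287):
    """Apply Effort = slope * CMP + intercept (defaults: the line reported above)."""
    return slope * cmp + intercept


def fit_line(cmp, effort):
    """Ordinary least-squares fit of Effort = slope * CMP + intercept."""
    slope, intercept = np.polyfit(cmp, effort, deg=1)
    return slope, intercept


def leave_one_out_predictions(cmp, effort):
    """Predict each project from a line fitted on all the other projects (assumed scheme)."""
    cmp = np.asarray(cmp, dtype=float)
    effort = np.asarray(effort, dtype=float)
    predictions = np.empty_like(effort)
    for i in range(len(cmp)):
        others = np.arange(len(cmp)) != i
        slope, intercept = fit_line(cmp[others], effort[others])
        predictions[i] = slope * cmp[i] + intercept
    return predictions


print(round(predict_effort(382.3), 4))  # 464.2656, project 1's PE in the table
```

Holding each project out of its own fit keeps the predicted effort from being informed by that project's actual effort, which matters with only 19 data points.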
0.25 is 0.9474.
ID CMP AE PE MRE
1 382.3 332 464.2656 0.3984
2 9.9 3 3.7282 0.2427
3 92.2 65 89.6283 0.3789
4 22.4 8 17.1412 1.1426
5 82.8 75 78.8244 0.0510
6 12.6 10 6.1665 0.3834
7 29.2 25 23.4141 0.0634
8 24.2 24 17.9743 0.2511
9 11.55 6 5.3188 0.1135
10 112 100 109.2013 0.0920
11 51.1 70 44.6213 0.3626
12 293.7 382 253.8692 0.3354
13 14.45 10 8.2223 0.1778
14 52.6 42 47.8980 0.1404
15 47.85 31 43.3884 0.3996
16 93.95 67 91.4257 0.3646
17 14.9 8 8.8768 0.1096
18 14.9 12 8.5659 0.2862
19 39.6 48 33.3394 0.3054
MMRE 0.2947
PRED(0.25) 0.9474
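MMRE and PRED can be recomputed from the AE and PE columns above. The sketch below uses the conventional definitions, assumed here to match those used earlier in the thesis: MRE = |AE − PE| / AE, MMRE is the mean MRE, and PRED(l) is the fraction of projects whose MRE is at most l. It reproduces the reported MMRE of 0.2947. Under the same definition, however, only 8 of the 19 MRE values in the table are at or below 0.25 (8/19 ≈ 0.42); the reported PRED(0.25) of 0.9474 equals 18/19, the share of projects with MRE ≤ 1, so the threshold behind that figure may be worth re-checking.

```python
# Phase 3 results as printed above: project ID -> (actual effort AE, predicted effort PE).
RESULTS = {
    1: (332, 464.2656), 2: (3, 3.7282), 3: (65, 89.6283), 4: (8, 17.1412),
    5: (75, 78.8244), 6: (10, 6.1665), 7: (25, 23.4141), 8: (24, 17.9743),
    9: (6, 5.3188), 10: (100, 109.2013), 11: (70, 44.6213), 12: (382, 253.8692),
    13: (10, 8.2223), 14: (42, 47.8980), 15: (31, 43.3884), 16: (67, 91.4257),
    17: (8, 8.8768), 18: (12, 8.5659), 19: (48, 33.3394),
}


def mre(actual, predicted):
    """Magnitude of relative error of one estimate."""
    return abs(actual - predicted) / actual


def mmre(results):
    """Mean MRE over all projects."""
    return sum(mre(a, p) for a, p in results.values()) / len(results)


def pred(results, level=0.25):
    """Fraction of projects whose MRE is at most `level`."""
    hits = sum(1 for a, p in results.values() if mre(a, p) <= level)
    return hits / len(results)


print(round(mmre(RESULTS), 4))                                   # 0.2947
print(sum(1 for a, p in RESULTS.values() if mre(a, p) <= 0.25))  # 8
```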
Conclusion:
This new MMRE value shows a significant improvement over Phase 2 (MMRE
= 1.5155). Although the new MMRE value is still greater than the recommended
0.25 level, it is much closer to this value after the calibration. We strongly
believe that when more data on Cloud migration projects become available in the
future, further calibration can be performed on other weights as well. In addition,
the prediction at level 0.25 is 0.9474, which is higher than the standard value 0.75,
further supporting the claim that the CMP model can be a potential predictor for
The validation process described in this chapter has shown that the CMP model
can help enterprises to map out the migration tasks for their Cloud migration
projects, and it can be a potential predictor for migration effort estimation. How-
ever, in order to generalize this claim to the whole population, we need a much
larger dataset to calibrate the parameters, and a different large dataset to validate
the model. We divided the dataset into multiple subsets for the calibration, and
then used the whole dataset for the validation, which we argue could increase
the reliability of the validation results to some extent. Having said that, the
results of the validation are still biased, because the same data were used for both
calibration and validation. However, this threat to validity is unavoidable at this stage.
In the future, when more data points become available,
a full exercise of calibration and validation can be executed again using the same
The four components of the CMP model and their steps were only validated
with internal and self-developed projects. The survey also included questions
inviting suggestions and insights on additional tasks that may be
required in a Cloud migration project; however, no relevant comments were
received. The CMP model itself, besides the weights, would need to be further
In order to increase the validity of the model, the quality and quantity of data
points from the survey can be further improved by:
• Clarifying some questions, since some questions in the survey were not clear
enough, such as how much time respondents spent on migrating data. It could
any effort.
of them. For example, for questions on how many classes were modified in the
Code Modification section, some answers were pure guesses (as the respondents
themselves commented). These answers were discarded.
• Some answers indicate that the total number of hours includes learning
time, but it is not clear how much time was spent on learning and how much
on the actual tasks. This can be overcome by modifying the
questions, for example: How much time was spent on this task the first time?
How much time was spent the second time? However, this will make the question
list longer and possibly more tedious.
• Some answers did not indicate whether the effort included learning time or
6.8 Summary
In Phase 1, the CMP metric was first validated using our initial dataset of 6
small-scale migration projects conducted by our group. The result gives a good
indication that CMP can be a potential predictor for effort estimation in some
Cloud migration cases. We then conducted a survey to collect data about past Cloud
migration projects from external organizations. The motivation for this study was to
further validate the CMP model externally. A survey and some interviews were
the best approach for our data collection purpose, because no existing data on
this topic were available.
In Phase 2, we validated the CMP metric using the dataset from the survey.
The result indicates that the CMP metric needs further calibration to improve its
performance. In this phase, we also listed a set of assumptions on the structure of
the CMP metric and attempted to test their plausibility using the available data
from the survey. The tunable weights (15 out of 37 weights) were also calibrated
very close to the standard requirement. This indicates that the CMP model can
be a predictor for effort estimation in the Cloud migration context. It also suggests
that when more data become available in the future, the performance of the CMP
model can be further improved by calibrating the other weights as well.
Chapter 7
Conclusions and Future Directions
“The more you understand what is wrong with a figure, the more
valuable that figure becomes.”
∼ Lord Kelvin.
from a local server to the Cloud requires different migration tasks to be carefully
planned and performed. Different types of migration task may have significantly
migration tasks and quantify their impact on the migration effort early in a
Cloud migration project, so that enterprises can make well-informed decisions on
whether it is worth migrating to the Cloud. However, this is challenging
because Cloud computing is still relatively immature, and there is very little
related work on the topic of interest. Moreover, Cloud migration projects vary in
them.
In this thesis, we have achieved our research goal of understanding Cloud migration
projects. We have identified influential cost factors (internal cost factors
and external cost factors) of a Cloud migration project. We also proposed a
This chapter concludes the thesis and is structured as follows: Section
7.1 summarizes the main studies and findings of this research. Section 7.2 elaborates
on how this research has achieved our research goals and how it contributes
to the software engineering domain within the Cloud migration context. The limitations
of this research are presented in Section 7.3. Section 7.4 suggests directions
for future research.
7.1 Research Summary
projects are conducted, in the form of a list of potential migration tasks that
might be involved in a Cloud migration project. In our experiment, we migrated
the PetShop .Net application from a local server to Windows Azure and SQL
Azure. The migration of Java PetStore into Amazon EC2 and SimpleDB was
external cost factors. The internal cost factors indicate what migration tasks are
required, such as compatibility issues, library dependencies, database features, and
connection issues. The external cost factors determine how fast those tasks can be
achieved, such as the project team’s capabilities, existing knowledge of and experience
with Cloud providers and technologies, and the selection of the correct Cloud platforms
and services.
Some of these influential cost factors are specific to migration to the Cloud,
because they do not apply to a conventional migration project from one
platform to another. For example, a migration project from Java to .Net is a
complete rewrite, and it would not have compatibility, database, connection, or
These factors all affect, in one way or another, the effort spent on the Cloud
migration process.
The list of internal cost factors, together with related work from the literature
review and practitioners’ blogs, enabled us to generalise and propose a general
taxonomy of migration tasks that any migration project may encounter, in which
the migration tasks are grouped under 6 categories: Training and Learning, In-
different aspects of a Cloud migration project; but on the other hand, they com-
the early stage of any migration project. Some tasks may be broken down into
more detailed levels, whereas some tasks may be skipped, depending on specific
to the Cloud; therefore, we developed our CMP model for sizing Cloud migra-
tion projects by casting the well-known Function Point (FP) measurement into
our context of interest. The difference between these two contexts is that the
different. Size metrics for functionality development measure the product (i.e.,
the components, classes, or functions to be developed), whereas size metrics for
migration tasks measure both process (i.e., the migration tasks to be carried out)
and product (i.e., related parts of the system to be migrated).
CMP extends FP not by adding more elements into the existing FP method,
2. Then for each unit, evaluate its complexity level (Low, Average, or High)
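The FP-style procedure that each CMP component follows can be sketched as: take the counted units (step 1, assumed here to be identifying the units to be counted), classify each as Low, Average, or High from its count (step 2, as stated above), and sum the corresponding weights (step 3, assumed here to be the weighted sum). The thresholds and weights below are hypothetical placeholders; the actual ranges and weights for each component are defined in Chapter 5. Counts above the upper range all map to High, mirroring the assumption that counts greater than the upper value have the same impact.

```python
# Hypothetical thresholds and weights for one component, for illustration only;
# the real CMP ranges and weights are given in the Chapter 5 tables.
LOW_MAX, AVERAGE_MAX = 2, 5
WEIGHTS = {"Low": 1.0, "Average": 2.0, "High": 4.0}


def complexity(count):
    """Step 2: classify a unit as Low, Average, or High from its count."""
    if count <= LOW_MAX:
        return "Low"
    if count <= AVERAGE_MAX:
        return "Average"
    return "High"  # every count above the upper bound has the same impact


def component_size(unit_counts):
    """Steps 1 and 3 (assumed): take the identified units, then sum their weights."""
    return sum(WEIGHTS[complexity(c)] for c in unit_counts)


# Four hypothetical units with counts 1, 4, 9 and 30 -> Low, Average, High, High.
print(component_size([1, 4, 9, 30]))  # 11.0
```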
Apart from the FP methodology, CMP is also developed on the basis of the
proposed taxonomy of Cloud migration tasks. The CMP model measures the accumulated
size of all migration tasks making up the migration project; therefore,
the taxonomy can easily be used as the input to the CMP model. After carefully
analyzing all categories of the taxonomy, we determined that the CMP model should
include 4 main components: Installation and Configuration, Database Migration,
Code Modification, and Network Connection. These components capture
distinct aspects of a migration project to the Cloud; therefore, the CMP model
has been developed to cover each of these aspects separately. Each of these CMP
components was developed using the FP three-step approach. Then, the final
CMP value is calculated as a weighted sum of its four components CMPconn,
CMPcode, CMPic, and CMPdb, which measure the size of migration tasks related to
connection changes, code changes, installation and configuration, and database
with a group of Cloud engineers, who have carried out different types of Cloud
for Cloud migration projects. Our study shows that CMP is more suitable for Cloud
migration projects than other existing size metrics in the literature, since it
captures special aspects of the Cloud migration context. Moreover, CMP emphasises
the specific features of the Cloud migration process, such as the fact that some
required third-party libraries are not readily available in the Cloud as they are in the local data
centre. This is not so much an issue when migrating between two local data cen-
tres because third-party libraries can usually be reused without major changes.
Another Cloud feature reflected in the CMP model is that Cloud users (or devel-
opers) do not possess full control over the Cloud environment as they do in a local
data centre. This results in a limited range of actions for each migration task.
Therefore, the CMP model takes into consideration Cloud-specific dependencies
for each migration task; for example, only security and protocol optimisation are
assessed for connection tasks, and database tasks are concerned with migrating
from relational to NoSQL databases.
In a project development cycle, the CMP model fits well after the design phase
and before implementation. One important assumption for CMP is that
all design decisions have been made. These design decisions have direct impact
on how CMP is counted, since they define all anticipated migration tasks. The
CMP counting process itself should not require much training and effort; how-
ever, its accuracy relies on the completeness and granularity of the migration task
list. Therefore, it is important to carefully analyse the list of expected migration
tasks to ensure it captures all the Cloud migration aspects adequately and with
Briand et al. (1996) proposed a list of properties for product size metrics,
whereas CMP is related to both process and product. The CMP model has been
proved to meet all the requirements of Briand et al. (1996); however, additional
The empirical validation was to justify the usefulness of the CMP size measure-
with its initial set of weights as presented in Chapter 5 using our initial set of 6
Cloud migration projects. Because of the limited number of data points publicly
available, the data we use in this first phase of the empirical validation is ex-
This result suggests that CMP is a good predictor for effort estimation in some
of the Cloud migration projects considered.
However, to have more confidence in the CMP model, more data on external
projects are required to further validate it. Hence, at the beginning of phase 2, we
conducted a survey to collect data on past migration projects to the Cloud from
external organizations. We had to conduct this survey because, un-
via web surveys, and some additional interviews. The studied population includes
project teams from NICTA and individual practitioners who have migrated their
systems to the Cloud. The practitioners were identified from Cloud communities
and online discussions. Interviews were conducted with NICTA’s project teams
to gain more insights and more detailed data, and surveys were sent to a list of
identified practitioners. The study was conducted on the entire population due
We sent out more than 300 surveys to different target audiences, including aca-
The main reason for this low response rate is that most of the projects were
done for exploration and tutorial purposes; hence, no detailed information was
recorded, especially the information required for calculating CMP. Most
respondents could easily answer general questions on why they migrated to the
Cloud, or how they generally did so, but most of them failed to provide sufficient
information at the design level of migration tasks. After careful analysis,
we obtained a new dataset of 19 data points.
is 1.5155 and the prediction at level 0.25 is 0.5789. This result suggests that the
CMP weights (or parameters) need further calibration.
There are 37 parameters (or weights) in total in the CMP model. The original
values of these weights were defined through discussions with a group of Cloud
engineers who had participated in Cloud migration projects. We asked each Cloud
engineer for their individual judgment on each weight value, and then averaged
the values across all the Cloud engineers we consulted to derive
a final value for each parameter. These expert opinion weights can be further
refined using data points collected from our survey on Cloud migration projects.
The available data show that only 15 out of 37 weights were considered to have
sufficient information for the calibration. The remaining 22 weights are kept
unchanged with their expert opinion values. However, these values would also be
With the available data, a few assumptions still do not have sufficient information
the new dataset of 19 data points. This dataset originates from the survey as in
phase 2; however, the final CMP values in this dataset are calculated based on the
new set of calibrated weights. The new MMRE value (0.2947) shows a significant
improvement over phase 2 (MMRE = 1.5155). Although the new MMRE value
is still greater than the standard 0.25 level, it is much closer to this value after
the calibration. We strongly believe that when more data on Cloud migration
other weights as well. In addition, the prediction at level 0.25 is 0.9474, which is
higher than the standard value 0.75, further supporting the claim that the CMP
model can be used as a reliable predictor for Cloud migration effort estimation.
7.2 Research Contribution
This research has answered the research questions stated in Section 1.3. Through
tasks on the Cloud migration effort. Our view on this is illustrated by the
CMP model in Chapter 5. The CMP model can be useful for multiple purposes:
(1) helping enterprises map out their migration tasks, (2) identifying the
complexity of each task, so that staff with the right skills can be assigned tasks
accordingly, and (3) estimating the total effort required for the migration project.
To date, no other research has focused on the migration effort aspect of soft-
and size measurement concepts from traditional software engineering to the Cloud
computing domain.
One of our contributions, which can also be seen as one of the difficulties we
encountered, is that no related research with the same focus on Cloud migration
effort is available; hence, the list of migration tasks, influential cost factors, and
validation data could not be gathered from a literature review. We had to
explore and develop everything from scratch. For example, we carried out a series
migration tasks are required, and what impact they have on the migration effort.
for further validation. All these activities that we have undertaken can be useful
for other research with a similar focus, in its starting phase or for comparative studies.
Contribution 2 This research has identified critical cost factors of Cloud mi-
gration effort.
effort. These factors are categorized into internal and external factors. This is
aligned with traditional size measurement approaches. This research adds to the
existing body of knowledge relating to the size measurement cost drivers, offering
more critical factors in the Cloud migration context.
The taxonomy outlines possible migration tasks that any migration project
its implication on the amount of effort required. We derived these tasks from our
series of migration experiments of different application types to different Cloud
providers.
Point (CMP), for estimating the size of Cloud migration projects, by recasting
a well-known software size estimation model called Function Point (FP) into the
nection changes, database migration, code modification, and installation and con-
figuration for the new environment in the Cloud. For each component, we per-
weighted sum CMP provides an indication of how large the migration project is,
and it can be used as an indicator for Cloud migration effort estimation.
Contribution 5 This research has described the survey protocol to collect data
on past Cloud migration projects.
collect data on how they migrated their system to the Cloud and how much time
they spent on the migration tasks. The response rate was quite low because many
of the migration exercises were mainly for exploration purposes, and not many
practitioners kept track of the time spent on each individual task. This survey
questionnaire and its protocol can be reused and improved to collect
more data on a wider range of projects.
empirical validation shows that the metric is practically useful under a defined
set of assumptions. This research has outlined and justified each step in the
validation phase. The calibration process has been described, and it can be
reapplied to calibrate the model further when a larger dataset is available.
Conclusion:
Our overall contribution is to shed light on Cloud migration and the tasks
involved, which enables Cloud practitioners to estimate the amount of effort
required for the migration of legacy systems to the Cloud. This contributes
towards the cost-benefit analysis and the decision of whether it is worth moving
to the Cloud.
7.3 Research Limitation
1. This research involved many exploratory activities, and the result cannot
whole at this stage, because there is not enough data. However, the process
of undertaking all activities in this research has been carefully recorded and
justified. This process can certainly be re-applied on a larger set of data to
2. Data collection has been done mainly via web surveys, and the questions
and responses depend on the respondents’ personal interpretation and memory.
In-person interviews would give more reliable and accurate responses, and
would also yield more insights from the interviewee. However, we were not able
to conduct many interviews because of time and geographical constraints.
3. The low response rate of the survey (10%), together with the nature of a
entire population.
4. For applications that require code modification for the Cloud environment,
CMP only assesses application code changes at the “class” level, and employs
“Class Point” for the Code Modification Component. Hence, the CMP
model is only applicable to object-oriented applications. There are still
numerous legacy applications that are not object-oriented and that could
be migrated to the Cloud.
5. The calibration and validation of the CMP model were undertaken with a
small number of data points. The response rate from the survey was low (less
than 10%), and some responses were incomplete. The response rate was low
because not many respondents actually recorded how long they spent on each
migration task. Most of the reported projects are small or medium-sized. It was very hard to conduct sur-
6. This research has used the same data to calibrate the model parameters as
for the final validation. This may result in overfitting, where the accuracy
of the model does not carry over to other datasets. We argue that dividing
the dataset into multiple subsets for the calibration, and then using the
whole dataset for the validation, could increase the reliability of the
validation results to some extent. Having said that, the validation results
were still biased. However, this threat to validity is unavoidable at this
stage. In the future, when more data points become available, a full exercise
of calibration and validation can be executed again using the same methodology
presented. This can be a worthwhile future direction for this research.
7. The four components of the CMP model and their steps were only validated
with internal and self-development projects. The survey also included questions
inviting suggestions and insights on additional tasks that may be required in a
Cloud migration project. However, no relevant comments were received. The CMP
model itself, besides the weights,
7.4 Future Research Directions
Accurate effort estimation has always attracted a lot of attention from the
traditional software engineering community because of its difficulty and complexity.
Casting this concept into the Cloud migration context increases the difficulty
because there are even more angles to investigate. This research has investigated
several aspects of this problem, such as exploring the cost implications of Cloud
taxonomy of migration tasks, and developing a size metric as an indicator for the
migration effort estimation. However, other important aspects also require inves-
to the Cloud.
projects. The survey has collected data on how some external factors affect
the migration effort (Appendix B), but these data were not incorporated into our
results. Future research can use these data to further investigate a list of
external cost factors, to determine whether they really are cost factors and if the
variable into the effort estimation model. All that is required is a sufficient
set of data on effort spent on past migration projects. This future research
can be achieved with a wider-ranging survey, more interviews with larger
organizations, or proper case studies.
3. This research developed one type of size metric for Cloud migration projects.
We believe that the methodology employed is best suited for this pur-
performed to decide which size metric provides the most accurate effort
predictions. MMRE has been widely criticised as a criterion for selecting
the best model. Hence, the comparative study should
the Cloud. The inputs to this framework are the type of system currently
in use, the type of Cloud targeted, and the performance requirements. The
framework will then quantify both the benefits and the costs of having the
system in the Cloud. The output of the framework can assist enterprises in
deciding whether the benefits outweigh the costs and whether moving to the
Cloud is a wise decision.
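Limitation 6 notes that the same data were used for calibration and validation, and future direction 3 notes that MMRE has been criticised as a model-selection criterion. One standard way to keep calibration and validation apart on a small dataset is leave-one-out cross-validation (a jackknife-style resampling scheme): each project's effort is predicted by a model calibrated on all the other projects, and those out-of-sample predictions are then scored with MMRE and PRED(25). The sketch below assumes a simple linear effort model, effort = a × CMP + b, fitted by ordinary least squares; this model form, the function names and the toy data are hypothetical and are not the calibrated CMP model. Projects with zero recorded effort must be excluded, because MRE divides by the actual effort.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def fit_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a * x + b (hypothetical effort model)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("need at least two distinct size values to calibrate")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, y_bar - a * x_bar


def loocv_predictions(size: Sequence[float], effort: Sequence[float]) -> List[float]:
    """Predict each project from a model calibrated on all the *other* projects."""
    predictions = []
    for i in range(len(size)):
        train_x = [s for j, s in enumerate(size) if j != i]
        train_y = [e for j, e in enumerate(effort) if j != i]
        a, b = fit_linear(train_x, train_y)
        predictions.append(a * size[i] + b)
    return predictions


def mre(actual: float, predicted: float) -> float:
    """Magnitude of relative error; undefined when the actual effort is zero."""
    if actual == 0:
        raise ValueError("MRE is undefined for zero actual effort")
    return abs(actual - predicted) / actual


def mmre(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean MRE over all projects."""
    return mean(mre(a, p) for a, p in zip(actual, predicted))


def pred(actual: Sequence[float], predicted: Sequence[float], level: float = 0.25) -> float:
    """Fraction of projects whose MRE is at most `level` (PRED(25) by default)."""
    hits = sum(1 for a, p in zip(actual, predicted) if mre(a, p) <= level)
    return hits / len(actual)


if __name__ == "__main__":
    # Hypothetical (CMP size, person-hours) pairs, NOT survey data from Appendix B.
    sizes = [12.0, 20.0, 35.0, 50.0, 64.0]
    hours = [30.0, 44.0, 80.0, 105.0, 150.0]
    loo = loocv_predictions(sizes, hours)
    print(f"LOOCV MMRE = {mmre(hours, loo):.3f}, PRED(25) = {pred(hours, loo):.2f}")
```

Because every score comes from a project the model never saw during calibration, this avoids the optimistic bias of reusing the calibration data for validation; reporting PRED(25) alongside MMRE partly offsets MMRE's known bias towards models that underestimate.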
Bibliography
Abadi, D.J. (2009). Data management in the cloud: Limitations and opportu-
nities. IEEE Data Eng. Bull., 32, 3–12.
Abran, A. (1999). Functional size measurement for real time and embedded soft-
ware. In Proceedings of the 4th IEEE International Symposium and Forum on
Software Engineering Standards, 259–, IEEE Computer Society, Washington, DC, USA.
Aggarwal, S. & McCabe, L. (2009). The compelling tco case for cloud com-
101, 2–4.
Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R.H., Kon-
winski, A., Lee, G., Patterson, D.A., Rabkin, A., Stoica, I. & Za-
haria, M. (2009). Above the clouds: A berkeley view of cloud computing.
Babar, M.A. & Chauhan, M.A. (2011). A tale of migration to cloud comput-
ing for sharing experiences and observations. In Proceedings of the 2nd Inter-
national Workshop on Software Engineering for Cloud Computing, SECLOUD
Bisbal, J., Lawless, D., Wu, B., Grimson, J., Wade, V., Richard-
International Computer Science Conference 1997. APSEC ’97 and ICSC ’97.
Proceedings, 529–530.
Bisbal, J., Lawless, D., Wu, B. & Grimson, J. (1999). Legacy information
Boehm, B., Clark, B., Horowitz, E., Madachy, R., Shelby, R. &
Westland, C. (1995). Cost Models for Future Software Life Cycle Processes:
Boehm, B., Abts, C. & Chulani, S. (2000). Software development cost es-
Buyya, R., Yeo, C.S., Venugopal, S., Broberg, J. & Brandic, I. (2008).
Cloud computing and emerging it platforms: Vision, hype, and reality for
delivering computing as the 5th utility. Future Generation Computer Systems,
25, 599–616.
Cetin, S., Ilker Altintas, N., Oguztuzun, H., Dogru, A., Tufekci, O.
Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A. & Gruber, R.E. (2006). Bigtable: A
Chow, R., Golle, P., Jakobsson, M., Shi, E., Staddon, J., Masuoka,
tion without outsourcing control. In CCSW ’09: Proceedings of the 2009 ACM
workshop on Cloud computing security, 85–90, ACM, New York, NY, USA.
Conte, S.D., Dunsmore, H.E. & Shen, Y.E. (1986). Software engineering
Deelman, E., Singh, G., Livny, M., Berriman, B. & Good, J. (2008).
The cost of doing science on the cloud: the montage example. In SC ’08: Pro-
Efron, B. & Gong, G. (1983). A leisurely look at the bootstrap, the jackknife,
Elmore, A.J., Das, S., Agrawal, D. & El Abbadi, A. (2011). Zephyr: live
migration in shared nothing databases for elastic cloud platforms. In Proceed-
Erdogmus, H. (2009). Cloud computing: Does nirvana hide behind the nebula?
Software, IEEE, 26, 4–6.
lation study of the model evaluation criterion mmre. IEEE Trans. Softw. Eng.,
29, 985–995.
Gabner, R., Schwefel, H.P., Hummel, K.A. & Haring, G. (2011). Op-
195–202.
Ghemawat, S., Gobioff, H. & Leung, S.T. (2003). The google file system.
Hajjat, M., Sun, X., Sung, Y.W.E., Maltz, D., Rao, S., Sripanid-
Hao, W., Yen, I.L. & Thuraisingham, B. (2009). Dynamic service and
SIT ’08: Proceedings of the 2008 annual research conference of the South
Helmer, O. (1966). Social Technology. Helmer, O., Basic Books, New York,
Ho, Y., Liu, P. & Wu, J.J. (2011). Server consolidation algorithms with
bounded migration cost and performance guarantees in cloud computing. Util-
Jayasinghe, D., Malkowski, S., Wang, Q., Li, J., Xiong, P. & Pu, C.
Ji, W., Ma, J. & Ji, X. (2009). A reference model of cloud operating and
open source software implementation mapping. Enabling Technologies, IEEE
International Workshops on, 0, 63–65.
Neural network based effort estimation using class points for oo systems. In
tering and artificial neural networks. In Proceedings of the 1st India software
engineering conference, ISEC ’08, 141–142, ACM, New York, NY, USA.
Kazman, R., Asundi, J. & Klein, M. (2001). Quantifying the costs and ben-
Keung, J.W., Kitchenham, B.A. & Jeffery, D.R. (2008). Analogy-x: Pro-
challenges for enterprise cloud computing. Tech. rep., Cloud Computing Co-
laboratory, School of Computer Science, University of St Andrews, UK.
P. (2011). Decision support tools for cloud migration in the enterprise. Cloud
Computing, IEEE International Conference on, 0, 541–548.
Kitchenham, B., Pfleeger, S., Pickard, L., Jones, P., Hoaglin, D.,
El Emam, K. & Rosenberg, J. (2002). Preliminary guidelines for empirical
research in software engineering. Software Engineering, IEEE Transactions on,
Lai, R. & Huang, S.J. (2003). A model for estimating the size of a formal com-
Leake, G. (2006). Microsoft .net pet shop 4: Migrating an asp.net 1.1 applica-
Lederer, A. & Prasad, J. (1998). A causal model for software cost estimating
Lenk, A., Klems, M., Nimis, J., Tai, S. & Sandholm, T. (2009). What’s
inside the cloud? an architectural map of the cloud landscape. Software Engineering Challenges of Cloud Computing, ICSE Workshop on, 0, 23–31.
Li, H., Zhong, L., Liu, J., Li, B. & Xu, K. (2011a). Cost-effective partial
Li, W., Tordsson, J. & Elmroth, E. (2011b). Modeling for dynamic cloud
Li, W.S., Hsiung, W.P., Po, O., Hino, K., Candan, K.S. & Agrawal,
D. (2004). Challenges and practices in deploying web acceleration solutions for
Low, G.C. & Jeffery, D.R. (1990). Function points in the estimation and
evaluation of the software process. IEEE Trans. Softw. Eng., 16, 64–71.
Mark Basler, D.N., Sean Brydon & Singh, I. (2010). Introducing the java
Matson, J.E., Barrett, B.E. & Mellichamp, J.M. (1994). Software de-
velopment cost estimation using function points. IEEE Trans. Softw. Eng., 20,
275–287.
Meng, X., Shi, J., Liu, X., Liu, H. & Wang, L. (2011). Legacy application
migration to cloud. Cloud Computing, IEEE International Conference on, 0,
750–751.
Microsoft (2012). See how startups are using windows azure today.
migration to the service cloud paradigm: Ongoing work in the remics project.
Services, IEEE Congress on, 0, 507–514.
Niessink, F. & Vliet, H.v. (1997). Predicting maintenance effort with function points. In Proceedings of the International Conference on Software Maintenance, 32–39, IEEE Computer Society, Washington, DC, USA.
Piao, J.T. & Yan, J. (2010). A network-aware virtual machine placement and
Rochwerger, B., Breitgand, D., Levy, E., Galis, A., Nagin, K.,
Llorente, I.M., Montero, R., Wolfsthal, Y., Elmroth, E., Caceres, J., Ben-Yehuda, M., Emmerich, W. & Galan, F. (2009). The
reservoir model and architecture for open federated cloud computing. IBM
Journal of Research and Development, 53.
Ruhe, M., Jeffery, R. & Wieczorek, I. (2003a). Cost estimation for web
applications. In ICSE ’03: Proceedings of the 25th International Conference
Ruhe, M., Jeffery, R. & Wieczorek, I. (2003b). Using web objects for
Smith, J.W. (2009). A comparison of public cloud platforms. Tech. rep., StACC:
St Andrews Cloud Computing Collaboratory.
Suen, C.H., Kirchberg, M. & Lee, B.S. (2011). Efficient migration of virtual
machines between public and private cloud. Cloud Computing Technology and
Science, IEEE International Conference on, 0, 549–553.
Symons, F.C. & Symons, C. (2001). Come back function point analysis (mod-
Tran, V., Keung, J., Liu, A. & Fekete, A. (2011a). Application migration
to cloud: A taxonomy of critical factors. In Proceedings of the ICSE Software
Engineering For Cloud Computing Workshop, SECLOUD, ACM, New York,
NY, USA.
Tran, V., Lee, K., Fekete, A., Liu, A. & Keung, J. (2011b). Size estima-
tion of cloud migration projects with cloud migration point (cmp). In Proceed-
Truong, H.L. & Dustdar, S. (2010). Composable cost estimation and mon-
Tukey, J.W. (1958). Bias and confidence in not-quite large samples. The Annals
Verma, A., Kumar, G., Koller, R. & Sen, A. (2011). Cosmig: Modeling
the impact of reconfiguration in a cloud. Modeling, Analysis, and Simulation
of Computer Systems, International Symposium on, 0, 3–11.
Verner, J. & Tate, G. (1992). A software size model. IEEE Trans. Softw.
Ye, K., Jiang, X., Huang, D., Chen, J. & Wang, B. (2011). Live migration
Yin, R.K. (2003). Case study research : design and methods. Sage Publications,
3rd edn.
of the 12th international conference on World Wide Web, 461–471, ACM, New
York, NY, USA.
Zhang, G., Chiu, L. & Liu, L. (2010). Adaptive data migration in multi-
Appendix A
THE UNIVERSITY OF NEW SOUTH WALES AND NICTA
If you decide to participate, we will conduct one interview with you at a time mutually agreed to. In the unlikely case
that there is a need for a follow-up interview, it will also be conducted at a mutually agreed time. Every interview will
be recorded with a voice recorder and should take no more than one hour to complete.
Results of this study might help you better understand factors in migrating legacy software applications to cloud
computing systems. This in turn might improve your work performance or customer satisfaction with your future
software. However, we cannot and do not guarantee or promise that you will receive any benefits from this study.
Any information that is obtained in connection with this study and that can be identified with you will remain
confidential and will be disclosed only with your permission, except as required by law. If you give us your permission
by signing this document, we plan to publish the summary results in a very general form at scientific conferences. The
purpose of this publication would be to inform the broader scientific community about how migration of legacy
applications to cloud computing systems can be alleviated. In any publication, information will be provided in such a
way that you, your company, the software tools that you used/supported/sold/developed, and the vendors of these
software tools cannot be identified.
Complaints may be directed to the Ethics Secretariat, The University of New South Wales, SYDNEY 2052
AUSTRALIA (phone 9385 4234, fax 9385 6648, email ethics.sec@unsw.edu.au). Any complaint you make will be
investigated promptly and you will be informed about the outcome.
After the completion of the study (likely in the second half of 2011), we will present you (and every other participant)
with summary results of this study (via email as a PDF file) and will ask you for some feedback. Your participation in
the feedback is voluntary (i.e. participation in interviews does not automatically imply participation in the feedback
process). If you are participating in the feedback process, you will be required to spend additional time to familiarize
yourself with study results and to provide some comments. The estimated time needed for the feedback is up to one
hour. If you wish to sign up for the feedback process now, you can do so by ticking the box on the next page. Please
note that you can withdraw from the feedback process any time by contacting us.
Your decision whether or not to participate in this study will not prejudice your future relations with the University of
New South Wales and NICTA. If you decide to participate, you are free to withdraw your consent and to discontinue
participation at any time, without any prejudice. You can decline to answer any question, for whatever reason.
If you have any questions, please feel free to ask Thi Khanh Van Tran (phone: 02 9376 2259; e-mail: ThiKhanhVan.Tran@nicta.com.au) or Kevin Lee (phone: 02 9376 2207, e-mail: Kevin.Lee@nicta.com.au). If you have any
additional questions later, Thi Khanh Van Tran or Kevin Lee will be happy to answer them.
You are making a decision whether or not to participate in this research study. Your signature indicates that,
having read the information provided above, you have decided to participate.
…………………………………………………… .…………………………………………………….
Signature of Research Participant Signature of Witness
…………………………………………………… .…………………………………………………….
(Please PRINT name) (Please PRINT name)
…………………………………………………… .…………………………………………………….
Date Nature of Witness
REVOCATION OF CONSENT
…………………………………………………… .…………………………………………………….
Signature Date
……………………………………………………
Please PRINT Name
The section for Revocation of Consent should be forwarded to NICTA, Attn: Kevin Lee, Software Systems Research
Group, Locked Bag 9013, Alexandria NSW 1435.
A survey on cost factors for migration effort to Cloud
This survey is designed to collect data on migration projects to cloud for determining significant
cost factors that affect migration effort to cloud.
I. General questions
GQ1: What type of cloud did you migrate to? Please specify.
Check any that apply
IaaS __________________________________
PaaS __________________________________
SaaS __________________________________
Web Application
Desktop Software Application
Web Server
Database Server
Database
Operating Systems
Other: _____________________________
II. Cost factors
Questions in this section focus on any cost factors that influence migration effort to cloud.
CF1: Has the development team done any similar projects on the cloud before?
Yes
No
No answer
Database
Networking
Software Architecture
Other: ______________________________________
CF3: Please rate how much each of the following factors influenced your migration effort to cloud.
1 - None to minor influence, 5 - Significant influence
Factor                                  1  2  3  4  5  No answer
Developers' expertise
Experience in software development
Experience in cloud
Design quality of migration tasks
Choice of cloud services
CF4: Are there any other factors influencing the migration effort?
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
Yes
No
MySQL
MSSQL 2008 or later
MSSQL 2005 or older
PostgreSQL
MSAccess
Other: _________________________________
No answer
DB3: What database did you migrate to?
Choose one of the following answers. If you installed your own database server in cloud (e.g., in
an EC2 instance), please specify.
Amazon RDS
Amazon SimpleDB
Amazon S3
Microsoft SQL Azure
Google Bigtable
Other: ______________________________
No answer
DB4: How many SQL queries did you modify for your system to adapt to the new database
in cloud?
Choose one of the following answers
None
1 - 10
More than 10
No answer
DB7: How many person-hours did it take to migrate all data to cloud?
Only numbers may be entered in this field
DB8: Did you perform any of the following for your database to adapt to the new database
in cloud?
Check any that apply
DB10: Did you carry out any other activities for database migration, and how many person-
hours did it take?
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
Yes
No
IC2: How many software packages were installed to set up the environment in the cloud?
e.g., operating systems, database servers, web servers, etc...
Only numbers may be entered in these fields
IC4: How many person-hours did it take to complete all installation and configuration
tasks?
Only numbers may be entered in this field
IC5: Did you carry out any other activities for installation and configuration, and how many
person-hours did it take?
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
V. Network Connections
Questions in this section focus on migration tasks related to network connection changes because
of the migration.
NC1: Do any components in your system connect with each other via the Internet or a local
network?
Yes
No
NC2: Did you carry out any tasks related to these network connections?
e.g., adding security such as VPC, optimizing network performance by changing packet size, etc...
Yes
No
NC3: On how many connections within the cloud did you perform the following tasks?
Add security, i.e., secure a connection with VPC, or with secured protocol such as https
Optimize protocol for performance, such as changing TCP packet size, etc...
Only numbers may be entered in these fields.
Add security
Optimize protocol
NC4: On how many connections across the Internet did you perform the following tasks?
Only numbers may be entered in these fields
Add security
Optimize protocol
NC5: How many person-hours did it take to complete all tasks related to connection?
Only numbers may be entered in this field
NC6: Did you carry out any other activities for network connection, and how many person-
hours did it take?
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
Yes
No
CM3: How many Human Interaction classes were modified in
Only numbers may be entered in these fields
CM7: How many person-hours did it take to complete all code modification?
Only numbers may be entered in this field
CM8: Did you carry out any other activities for code modification, and how many person-
hours did it take?
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
End of survey.
Thank you for your time and effort in taking this survey.
Please return the completed survey to thikhanhvan.tran@nicta.com.au or
tyao1801@uni.sydney.edu.au
A. CLOUD MIGRATION PROJECTS - SURVEY QUESTIONNAIRE
Appendix B
B. SURVEY RESPONSES - RAW DATA
Network Connection
      LAN-to-LAN             LAN-to-WAN             WAN-to-LAN
ID    Low  Average  High     Low  Average  High     Low  Average  High     Hours
1     0    0        0        0    0        0        0    0        0        0
2     0    0        0        0    0        0        0    0        0        0
3     0    0        0        0    0        0        0    0        0        0
4     0    0        0        0    0        0        0    0        0        0
5     0    1        0        0    1        0        0    0        0        5
6     0    0        0        0    0        1        0    0        0        10
7     0    0        0        0    0        2        0    0        0        20
8     0    0        0        0    0        0        0    0        0        0
9     0    0        0        0    0        0        0    0        0        0
10    0    5        5        0    5        5        0    0        0        100
11    0    2        0        0    2        0        0    0        0        20
12    1    0        0        1    0        0        0    0        0        2
13    3    0        0        0    0        0        0    0        0        2
14    0    0        1        0    0        2        0    0        0        20
15    0    0        0        0    0        0        0    0        0        0
16    1    0        0        0    0        0        0    0        0        2
17    1    0        0        1    0        0        0    0        0        2
18    1    0        0        1    0        0        0    0        0        2
19    0    0        0        0    0        0        0    0        0        0
Table B.1: Survey responses for network connection component
Code Modification
      Problem Domain         Human Interaction      Data Management        Task Management
ID    Low  Average  High     Low  Average  High     Low  Average  High     Low  Average  High     Hours
1     0    0        20       0    0        5        0    0        0        0    0        20       250
2     0    0        0        0    0        0        0    0        0        0    0        0        0
3     0    0        0        0    0        0        0    0        5        0    0        0        40
4     0    0        0        0    0        0        0    0        0        0    0        0        0
5     0    0        1        0    0        1        0    0        1        0    0        1        20
6     0    0        0        0    0        0        0    0        0        0    0        0        0
7     0    0        0        0    0        0        0    0        0        0    0        0        0
8     0    0        0        0    0        0        0    0        0        0    0        0        0
9     0    0        0        0    0        0        0    0        0        0    0        0        0
10    0    0        0        0    0        0        0    0        0        0    0        0        0
11    0    0        0        0    0        0        0    0        0        0    0        0        0
12    0    0        0        0    0        0        0    0        0        0    0        10       80
13    0    0        0        0    0        0        0    0        0        0    0        0        0
14    0    0        0        0    0        0        0    0        0        0    0        0        0
15    0    0        0        0    0        0        1    2        0        0    0        1        10
16    0    0        0        0    0        0        1    4        4        0    0        0        40
17    0    0        0        0    0        0        0    0        0        0    0        0        0
18    0    0        0        0    0        0        0    0        0        0    0        0        0
19    0    0        0        0    0        0        0    0        0        0    0        0        0
8 0 0 0 0 3 1 24
9 0 0 0 7 0 0 6
10 0 0 0 0 0 0 0
11 0 0 0 0 3 2 50
12 0 0 0 0 0 15 300
13 0 0 0 3 2 0 7
14 0 0 0 0 7 0 20
15 0 0 0 1 4 0 14
16 0 0 1 1 0 0 10
17 0 0 0 0 3 0 4
18 0 0 0 2 2 0 8
19 0 0 0 0 12 0 48
Table B.3: Survey responses for installation and configuration component
Database Migration
      Query Modification     Data Population
ID    Low  Average  High     Low  Average  High     Hours
1     0    0        0        2    0        0        2
2     0    0        0        0    0        0        0
3     20   0        0        3    0        0        25
4     0    0        0        0    0        4        8
5     0    0        0        0    0        0        0
6     0    0        0        0    0        0        0
7     0    0        0        0    2        0        5
8     0    0        0        0    0        0        0
9     0    0        0        0    0        0        0
10    0    0        0        0    0        0        0
11    0    0        0        0    0        0        0
12    0    0        0        0    0        0        0
13    0    0        0        2    0        0        1
14    0    0        0        2    0        0        2
15    0    5        0        0    2        0        7
16    0    0        8        0    0        2        15
17    0    0        0        2    0        0        2
18    0    0        0        2    0        0        2
19    0    0        0        0    0        0        0
9 5 5 5 5 0
10 2 2 3 3 3
11 0 0 0 0 0
12 5 5 5 0 1
13 4 3 2 3 5
14 4 2 5 2 5
15 3 5 5 3 5
16 3 5 5 3 5
17 1 1 2 2 4
18 5 2 5 2 5
19 4 4 5 1 1
Table B.5: Survey responses for external cost factors
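The future-research item on external cost factors proposes reusing the ratings collected by the survey. As a small aid, the sketch below summarises rating columns such as those in Table B.5, reporting for each factor how many respondents answered, the median (appropriate for ordinal 1 to 5 ratings) and the mean. Because the table's header row is missing, it assumes, without confirmation, that each column holds one CF3 factor and that 0 encodes "No answer"; the function name and the toy data are hypothetical.

```python
from statistics import mean, median
from typing import Dict, List, Sequence

# ASSUMPTION: a 0 in Table B.5 encodes "No answer"; 1-5 are influence ratings.
NO_ANSWER = 0


def summarise_ratings(rows: Sequence[Sequence[int]]) -> List[Dict[str, float]]:
    """Summarise each rating column, ignoring "No answer" entries."""
    if not rows:
        return []
    summary = []
    for col in range(len(rows[0])):
        answered = [row[col] for row in rows if row[col] != NO_ANSWER]
        summary.append({
            "answered": len(answered),
            # Median suits ordinal (Likert-style) data; mean is shown for comparison.
            "median": median(answered) if answered else float("nan"),
            "mean": mean(answered) if answered else float("nan"),
        })
    return summary


if __name__ == "__main__":
    # Toy rows (one per respondent, one column per factor); NOT the Appendix B data.
    toy = [[5, 0, 3], [3, 4, 3], [0, 2, 5]]
    for i, s in enumerate(summarise_ratings(toy), start=1):
        print(f"factor {i}: n={s['answered']}, median={s['median']}, mean={s['mean']:.2f}")
```

Keeping "No answer" out of the statistics, rather than treating it as a rating of 0, avoids dragging every factor's average down for respondents who skipped the question.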