
Cost Implications and Size Estimation of Cloud Migration Projects
with Cloud Migration Point

TRAN, Thi Khanh Van

School of Computer Science and Engineering,
Faculty of Engineering
University of New South Wales

A thesis submitted for the degree of
Doctor of Philosophy

March 2012
THE UNIVERSITY OF NEW SOUTH WALES
Thesis/Dissertation Sheet

Surname or Family name: TRAN

First name: THI KHANH VAN Other name/s:

Abbreviation for degree as given in the University calendar: Ph.D. (N.S.W)

School: Computer Science and Engineering Faculty: Engineering

Title: Cost Implications and Size Estimation of Cloud Migration Projects with Cloud Migration Point

Abstract 350 words maximum:

Cloud computing has been a buzzword over the last decade: it offers great potential benefits for enterprises that migrate their computing systems from local data centers to a Cloud environment. One major obstacle to enterprise adoption of Cloud technologies has been the lack of visibility into migration effort and cost, and existing work on this problem in the literature is very limited. This thesis improves our understanding of the matter by identifying critical indicators of Cloud migration effort.

A taxonomy of migration tasks to the Cloud is proposed, outlining the possible tasks that any migration project to the Cloud may encounter. It enables Cloud practitioners to understand the specific tasks involved and their implications for the amount of effort required. A methodology, called Cloud Migration Point (CMP), is presented for estimating the size of Cloud migration projects by recasting a well-known software size estimation model, Function Point, into the context of Cloud migration. The CMP value indicates how large the migration project is, and it can be used as an indicator for Cloud migration effort estimation. The process of calculating CMP also assists in itemizing the migration tasks and identifying the complexity of each task, which is useful for project planning and management. Empirical validation on the set of data points collected from our survey shows that, with some calibration, the CMP metric is practically useful as a predictor for effort estimation under a defined set of assumptions. Besides size, other factors also influence the migration effort. We propose a list of external cost factors, such as the development team's experience in software engineering or with the Cloud, which do not affect how migration tasks are designed but may affect how fast they can be completed.

Our overall contribution is to shed light on Cloud migration and the tasks involved, enabling Cloud practitioners to estimate the amount of effort required to migrate legacy systems into the Cloud. This contributes towards a cost-benefit analysis of whether the benefits of the Cloud exceed the migration effort and other Cloud costs.

Declaration relating to disposition of project thesis/dissertation

I hereby grant to the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or in part in the University libraries in all forms of media, now or hereafter known, subject to the provisions of the Copyright Act 1968. I retain all property rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.

I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral
theses only).

Signature                    Witness                    Date

The University recognises that there may be exceptional circumstances requiring restrictions on copying or conditions on use. Requests for
restriction for a period of up to 2 years must be made in writing. Requests for a longer period of restriction may be considered in exceptional
circumstances and require the approval of the Dean of Graduate Research.

FOR OFFICE USE ONLY Date of completion of requirements for Award:

THIS SHEET IS TO BE GLUED TO THE INSIDE FRONT COVER OF THE THESIS


Abstract

Cloud computing has been a buzzword over the last decade: it offers great potential benefits for enterprises that migrate their computing systems from local data centers to a Cloud environment. One major obstacle to enterprise adoption of Cloud technologies has been the lack of visibility into migration effort and cost, and existing work on this problem in the literature is very limited. This thesis improves our understanding of the matter by identifying critical indicators of Cloud migration effort.

A taxonomy of migration tasks to the Cloud is proposed, outlining the possible tasks that any migration project to the Cloud may encounter. It enables Cloud practitioners to understand the specific tasks involved and their implications for the amount of effort required. A methodology, called Cloud Migration Point (CMP), is presented for estimating the size of Cloud migration projects by recasting a well-known software size estimation model, Function Point, into the context of Cloud migration. The CMP value indicates how large the migration project is, and it can be used as an indicator for Cloud migration effort estimation. The process of calculating CMP also assists in itemizing the migration tasks and identifying the complexity of each task, which is useful for project planning and management. Empirical validation on the set of data points collected from our survey shows that, with some calibration, the CMP metric is practically useful as a predictor for effort estimation under a defined set of assumptions. Besides size, other factors also influence the migration effort. We propose a list of external cost factors, such as the development team's experience in software engineering or with the Cloud, which do not affect how migration tasks are designed but may affect how fast they can be completed.

Our overall contribution is to shed light on Cloud migration and the tasks involved, enabling Cloud practitioners to estimate the amount of effort required to migrate legacy systems into the Cloud. This contributes towards a cost-benefit analysis of whether the benefits of the Cloud exceed the migration effort and other Cloud costs.
Dedication

To my parents and Dan


Acknowledgements

I am most indebted to my two supervisors, Dr. Anna Liu and Dr. Raymond Wong, for their guidance and close supervision over the years. Dr. Raymond Wong was very encouraging and patient in walking me through the very first steps of my research journey. Dr. Anna Liu has inspired me in so many ways; her tremendous support, care and understanding made it possible for me to continue this research. I am especially grateful to my co-supervisor, Dr. Jacky Keung. He has always guided me in the right direction, and his constant support made me feel confident to complete this thesis. Throughout my thesis-writing time, he spent many hours proofreading and providing me with critical feedback. I wish to express my gratitude and thanks to Professor Alan Fekete and Kevin Lee, whose constructive and insightful feedback has benefited this research and myself in many ways. I would like to give my sincere thanks to Professor Barbara Kitchenham for her reviews and expert advice that made great improvements to this research. This thesis would not have been possible without their encouragement and support.

I would like to thank my colleagues and friends from NICTA, especially Liang and Sadeka, for accompanying me and sharing all the ups and downs of our Ph.D. journey. I thank my best friends for their wonderful friendship: especially Jensyn, for spending a lot of time proofreading my thesis and correcting many grammar mistakes; Yolanda, for her enormous care and emotional support; and our badminton group, for all the entertainment and sporting activities that got me through the difficult times.

Last but not least, I am extremely grateful to my beloved parents, my little sister, and my brother and his family, who have always believed in me and supported me unconditionally; and to my husband, Daniel, for his endless love and for always being there for me during both happy and hard times. To them I dedicate this thesis.


Preface

Publications that have contributed to this thesis

• Van Tran, Jacky Keung, Anna Liu, and Alan Fekete: “Application Migration to Cloud: A Taxonomy of Critical Factors”, in Proceedings of the 2nd International Workshop on Software Engineering for Cloud Computing (SECLOUD ’11), Honolulu, Hawaii, USA, May 2011, pp. 22-28.

• Van Tran, Kevin Lee, Alan Fekete, Anna Liu, Jacky Keung: “Size Estimation of Cloud Migration Projects with Cloud Migration Point (CMP)”, in Proceedings of the 5th International Symposium on Empirical Software Engineering and Measurement (ESEM ’11), Banff, Alberta, Canada, Sep 2011, pp. 265-274.

Funding and Grants that have supported the work in this thesis

• The NICTA International Postgraduate Award (NIPA) Scholarship

• The NICTA Research Project Award (NRPA) Scholarship

• Amazon Research Grant


Contents

List of Figures xv

List of Tables xvii

Glossary xxi

1 Introduction 1

1.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . 2

1.1.1 Cloud Computing and Its Offerings . . . . . . . . . . . . . 2

1.1.2 The Urge to Migrate to the Cloud . . . . . . . . . . . . . . 12

1.1.3 The Essentials of Effort Estimation . . . . . . . . . . . . . 14

1.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

1.3 Research Problem and Aims . . . . . . . . . . . . . . . . . . . . . 18

1.4 Research Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

1.5 Research Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 21

1.6 Organisation of the Thesis . . . . . . . . . . . . . . . . . . . . . . 22

2 Literature Review 25

2.1 Cloud Migration Solutions . . . . . . . . . . . . . . . . . . . . . . 26


2.1.1 Decision Making Support . . . . . . . . . . . . . . . . . . . 26

2.1.2 Experience Reports . . . . . . . . . . . . . . . . . . . . . . 29

2.1.3 Cloud Migration Concerns . . . . . . . . . . . . . . . . . . 30

2.2 Effort Estimation in Traditional Software Engineering . . . . . . . 33

2.2.1 Analogy Approach . . . . . . . . . . . . . . . . . . . . . . 33

2.2.2 Expert Judgement Approach . . . . . . . . . . . . . . . . . 34

2.2.3 Algorithmic Model Approach . . . . . . . . . . . . . . . . 35

2.3 Software Size Estimation in Traditional Software Engineering . . . 36

2.3.1 Source Lines of Code (SLOC) . . . . . . . . . . . . . . . . 36

2.3.2 Function Point . . . . . . . . . . . . . . . . . . . . . . . . 37

2.3.3 Function Point Extensions . . . . . . . . . . . . . . . . . . 39

2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3 Research Methodology 47

3.1 Cloud Migration Experiments . . . . . . . . . . . . . . . . . . . . 48

3.1.1 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . 49

3.1.2 Data Collection Strategy . . . . . . . . . . . . . . . . . . . 50

3.2 Discussion with Cloud Engineers . . . . . . . . . . . . . . . . . . 51

3.2.1 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . 52

3.2.2 Discussion Protocols . . . . . . . . . . . . . . . . . . . . . 53

3.2.3 Data Collection and Analysis . . . . . . . . . . . . . . . . 54

3.3 Survey Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

3.3.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

3.3.2 Survey Design . . . . . . . . . . . . . . . . . . . . . . . . . 57

3.3.3 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . 59


3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

4 Taxonomy of Migration Tasks to the Cloud 63

4.1 Taxonomy in other contexts . . . . . . . . . . . . . . . . . . . . . 65

4.2 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

4.2.1 Measured Data and Observations . . . . . . . . . . . . . . 68

4.3 Migration Influential Cost Factors . . . . . . . . . . . . . . . . . . 73

4.4 Taxonomy of Migration Tasks . . . . . . . . . . . . . . . . . . . . 76

4.5 Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

4.6 Reflection and Discussion . . . . . . . . . . . . . . . . . . . . . . 89

4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

5 Cloud Migration Point 95

5.1 CMP Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . 97

5.2 Cloud Migration Cost Factors . . . . . . . . . . . . . . . . . . . . 98

5.3 Cloud Migration Project Classification . . . . . . . . . . . . . . . 102

5.4 Cloud Migration Point . . . . . . . . . . . . . . . . . . . . . . . . 106

5.4.1 Network Connection Component: CM Pconn . . . . . . . . 108

5.4.2 Code Modification Component: CM Pcode . . . . . . . . . . 110

5.4.3 Installation and Configuration Component: CM Pic . . . . 114

5.4.4 Database Migration Component: CM Pdb . . . . . . . . . . 117

5.4.5 CMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

5.5 CMP Application . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

5.6 Reflection and Discussion . . . . . . . . . . . . . . . . . . . . . . 124

5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127


6 Validation 129

6.1 Theoretical Validation . . . . . . . . . . . . . . . . . . . . . . . . 131


6.2 Empirical Validation - Phase 1 . . . . . . . . . . . . . . . . . . . . 134

6.2.1 Evaluation Criteria . . . . . . . . . . . . . . . . . . . . . . 135


6.2.2 Leave-One-Out Cross Validation . . . . . . . . . . . . . . . 136

6.2.3 Ordinary Least Square Regression Analysis . . . . . . . . . 137

6.2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 141


6.3 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.4 Empirical Validation - Phase 2 . . . . . . . . . . . . . . . . . . . . 143
6.5 CMP Parameters Calibration . . . . . . . . . . . . . . . . . . . . 147
6.5.1 CMP Components’ Assumptions . . . . . . . . . . . . . . 150

6.5.2 The Calibration Process . . . . . . . . . . . . . . . . . . . 159


6.6 Empirical Validation - Phase 3 . . . . . . . . . . . . . . . . . . . . 162
6.7 Threats to Validity and Discussion . . . . . . . . . . . . . . . . 168

6.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

7 Conclusions and Future Directions 171


7.1 Research Summary . . . . . . . . . . . . . . . . . . . . . . . . . . 172
7.2 Research Contribution . . . . . . . . . . . . . . . . . . . . . . . . 180

7.3 Research Limitation . . . . . . . . . . . . . . . . . . . . . . . . . 183

7.4 Future Research Directions . . . . . . . . . . . . . . . . . . . . . . 185

Bibliography 189

A Cloud Migration Projects - Survey Questionnaire 209

B Survey Responses - Raw Data 223

List of Figures

1.1 Three main Cloud services layers . . . . . . . . . . . . . . . . . . 5


1.2 Major Cloud service providers . . . . . . . . . . . . . . . . . . . . 7

1.3 Cloud Computing - Google Trends . . . . . . . . . . . . . . . . . 12


1.4 Cost and benefit of migrating existing applications into the Cloud 16

3.1 Steps of the Research Process and Thesis . . . . . . . . . . . . . . 48

4.1 Migration Overhead Cost . . . . . . . . . . . . . . . . . . . . . . . 71

4.2 Diagram of Cloud migration task taxonomy . . . . . . . . . . . . 79

6.1 The boxplots for the six training datasets of variable CMP . . . . 138

6.2 The scatter plots for OLS regression . . . . . . . . . . . . . . . . . 139

List of Tables

1.1 Pricing model comparison: Service charge (Amazon, 2009; Microsoft, 2009; Google, 2009) . . . . . . . . . . . . . . . . . . . . . 10

1.2 Pricing model comparison: Storage Cost (Amazon, 2009; Microsoft, 2009; Google, 2009) . . . . . . . . . . . . . . . . . . . . . 11

1.3 Outline of the research approach . . . . . . . . . . . . . . . . . . . 21

3.1 Mapping between research questions and questionnaire . . . . . . 59

4.1 Recorded overhead efforts of preparing PetShop for migration . . 70

4.2 Recorded overhead efforts of putting PetShop to Cloud platform . 71

4.3 Taxonomy of migration tasks . . . . . . . . . . . . . . . . . . . . 78

4.4 Mapping of the FSO migration tasks and the taxonomy . . . . . . 88

4.5 Efforts comparison for migrating to PaaS and IaaS Clouds . . . . 90

5.1 System’s states before and after migration . . . . . . . . . . . . . 103

5.2 Complexity evaluation for each connection . . . . . . . . . . . . . 109

5.3 Evaluating CMPconn . . . . . . . . . . . . . . . . . . . . . . . . . 110

5.4 Elements of each changed class . . . . . . . . . . . . . . . . . . . 111

5.5 Complexity evaluation for each class . . . . . . . . . . . . . . . . 113


5.6 Evaluating CMPcode . . . . . . . . . . . . . . . . . . . . . . . . . 114

5.7 Complexity evaluation for each IC task . . . . . . . . . . . . . . . 116

5.8 Evaluating CMPic . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

5.9 Complexity evaluation for each database task . . . . . . . . . . . 118

5.10 Evaluating CMPdb . . . . . . . . . . . . . . . . . . . . . . . . . . 118

5.11 Weighted values of CMP’s components . . . . . . . . . . . . . . . 119

5.12 Code changes for PetShop . . . . . . . . . . . . . . . . . . . . . . 122

5.13 Installations for PetShop . . . . . . . . . . . . . . . . . . . . . . . 123

5.14 Database Migration for PetShop . . . . . . . . . . . . . . . . . . . 123

5.15 CMP components for PetShop . . . . . . . . . . . . . . . . . . . . 124

6.1 Empirical validation data points . . . . . . . . . . . . . . . . . . . 137

6.2 Phase 1 - OLS Regression Analysis . . . . . . . . . . . . . . . . . 140

6.3 Phase 1 - Results Evaluation . . . . . . . . . . . . . . . . . . . . . 141

6.4 Data points from surveys and interviews . . . . . . . . . . . . . . 142

6.5 Phase 2 - OLS Regression Analysis . . . . . . . . . . . . . . . . . 143

6.6 Phase 2 - Results Evaluation . . . . . . . . . . . . . . . . . . . . . 147

6.7 Number of data points available to calibrate each weight of the CMP model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

6.8 Data points for calibrating network connection component weights 160

6.9 Multiple Regression Coefficient Result for CM Pconn . . . . . . . . 160

6.10 Multiple Regression Coefficient Result for CM Pcode . . . . . . . . 161

6.11 Multiple Regression Coefficient Result for CM Pic . . . . . . . . . 161

6.12 Regression Coefficient Result for CM Pconn . . . . . . . . . . . . . 162

6.13 Regression Coefficient Result for the Final CMP . . . . . . . . . . 162


6.14 New dataset - calculated from the new set of calibrated weights . 163

6.15 Phase 3 - OLS Regression Analysis . . . . . . . . . . . . . . . . . 163


6.16 Phase 3 - Results Evaluation . . . . . . . . . . . . . . . . . . . . . 167

B.1 Survey responses for network connection component . . . . . . . . 224

B.2 Survey responses for code modification component . . . . . . . . . 225


B.3 Survey responses for installation and configuration component . . 226

B.4 Survey responses for database migration component . . . . . . . . 227


B.5 Survey responses for external cost factors . . . . . . . . . . . . . . 228

Glossary

API     Application Programming Interface
AWS     Amazon Web Service
CMP     Cloud Migration Point
COSMIC FFP   COSMIC Full Function Point
CPU     Central Processing Unit
CRM     Customer Relationship Management
EBS     Elastic Block Store
EC2     Elastic Compute Cloud
EI      External Input
EIF     External Interface File
EO      External Output
EQ      External Inquiry
FFP     Full Function Point
FP      Function Point
FPA     Function Point Analysis
GAE     Google App Engine
GSC     General System Characteristics
IaaS    Infrastructure-as-a-Service
IDE     Integrated Development Environment
ILF     Internal Logical File
IO      Input Output
IP      Internet Protocol
IT      Information Technology
LOC     Line of Code
MKIIFP  Mark II Function Point
MS      Microsoft
NICTA   National ICT Australia
OO      Object-Oriented
OOFP    Object-Oriented Function Point
OP      Object Point
PaaS    Platform-as-a-Service
PC      Personal Computer
RDS     Relational Database Service
S3      Simple Storage Service
SaaS    Software-as-a-Service
SDK     Software Development Kit
SLOC    Source Line of Code
UCP     Use Case Point
UFP     Unadjusted Function Point
VAF     Value Adjustment Factor
WO      Web Object
WP      Web Point
Chapter 1

Introduction

“If you can’t measure it, you can’t manage it”

∼ Tom DeMarco paraphrasing Lord Kelvin.

Cloud computing has recently been the focus of much excitement in the IT1 community, seen by some as the next platform shift (Erdogmus, 2009), with an impact on enterprise computing that could compare to the change from mainframes to minicomputers, or from desktop PCs to networked systems. Major vendors are taking on Cloud computing as a crucial strategy, governments are discussing national agendas for the coming shift, and start-ups are growing to fill niches (Mudge, 2010).

While some software is written from scratch specifically for the Cloud, many organizations also wish to migrate existing applications and systems to a Cloud platform. Such a migration exercise is not easy: changes need to be made to deal with differences in the software environment, such as the programming model and data storage APIs, as well as varying performance qualities. An indication of how much effort the migration process will require is important for project management, particularly project scheduling and budget planning. This stimulating context strongly motivates us1 to investigate the critical factors behind the effort required for the migration process to the Cloud.

1 For all abbreviations, see the glossary on page xxii.

This introductory chapter is structured as follows. The background of Cloud computing and the motivation of this research are elaborated in Section 1.1. Common terms used throughout this thesis are clarified in Section 1.2. A broad overview of our work is presented in Sections 1.3 and 1.4. Section 1.5 introduces our research methodology, and Section 1.6 provides a general layout of how this thesis is structured.

1.1 Background and Motivation

Cloud computing is an attractive environment for enterprises because of its distinctive features and various offerings. As a result, many organizations have expressed interest in deploying their computing systems in the Cloud to take advantage of the potential benefits it offers.

1.1.1 Cloud Computing and Its Offerings

Since its emergence over the last decade, Cloud computing has been well recognized for its ability to provide virtualized resources and services, such as infrastructure, platforms, and software (Vaquero et al., 2009a; Armbrust et al., 2009). It is commonly known as a computing paradigm that delivers resources and services to computers over the Internet.

1 In this thesis, I use “we” to acknowledge the contributions of my colleagues. However, I am the main author of all publications that make up the content of this thesis.

One of the attractions of using Cloud resources, rather than those in an enterprise-scale data center, is that an organization can enjoy cost savings through larger economies of scale: the costs of hardware, power, buildings and administrative support are typically about five times lower for internet-scale systems than for enterprise-scale ones (SalesForce, 2012; Aggarwal & McCabe, 2009). Even more significant to a rapidly-growing business is the elasticity of costs; instead of the up-front purchase of an overprovisioned system, one can pay a Cloud provider ongoing fees that are low at first and that smoothly increase as and when the system needs more capacity. Cloud users are therefore neither required to plan for provisioning nor tied to a huge up-front commitment on hardware resources and infrastructure. This enables companies to start small, acquire more resources only when needed on a short-term basis (e.g. hourly processors and daily storage), and release computing machines and storage when they are no longer required (Armbrust et al., 2009).

For established businesses, there is the potential to use the Cloud as an additional resource (alongside existing data centers) to deal with bursts of load, perhaps seasonal, or due to intermittent activities such as stress testing. Here the Cloud allows the client to delay the large commitment of funds needed to scale up the hardware. Many applications are not extensively used all the time; more often than not, they are under-utilized. In other words, the resource usage pattern is not stable over time: there are times when resources stay idle, and other times (peak times) when they are heavily used. In order to accommodate those peak-time usages, enterprises have no better choice


but to invest a huge amount of resources to be ready for peak periods, resources which at other times stay idle and wasted. Cloud providers address this issue with their on-demand resource offerings: consumers pay only for the resources they use during an average period, and over peak times they can obtain additional resources on demand. For example, online shopping systems see normal use over most of the year, which may accumulate up to around 2-3 months' worth of resource usage in total. With the Cloud, they pay for that 2-3 months of actual usage only, rather than overpaying for a whole year as they would if the applications were managed in house. During the Christmas period, resource demand may increase to more than 10 times the normal level and can be accommodated by Cloud providers promptly. After peak times, resources are released back to the providers, and charges drop back to normal because of the pay-per-use pricing model of the Cloud (Armbrust et al., 2009).
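The pay-per-use arithmetic above can be sketched as a toy calculation. All figures below (a $0.10 hourly server rate, a two-month peak at ten times normal demand) are illustrative assumptions of ours, not real vendor prices:

```python
# Toy cost comparison: provisioning in house for the peak all year
# versus paying per use in the Cloud. All prices and demand figures
# are illustrative assumptions, not real vendor rates.

HOURS_PER_MONTH = 730
PRICE_PER_SERVER_HOUR = 0.10  # assumed Cloud rate, $/server-hour

# Assumed demand profile: 10 servers for ten months of the year,
# 100 servers during a two-month seasonal peak.
monthly_demand = [10] * 10 + [100] * 2

# In house: capacity must cover the peak for all twelve months.
in_house_hours = max(monthly_demand) * HOURS_PER_MONTH * 12

# Cloud: pay only for what each month actually uses.
cloud_hours = sum(d * HOURS_PER_MONTH for d in monthly_demand)

in_house_cost = in_house_hours * PRICE_PER_SERVER_HOUR
cloud_cost = cloud_hours * PRICE_PER_SERVER_HOUR

print(f"in-house (peak-provisioned): ${in_house_cost:,.0f}")
print(f"cloud (pay-per-use):         ${cloud_cost:,.0f}")
print(f"in-house utilisation: {cloud_hours / in_house_hours:.0%}")
```

Under these assumed numbers, peak provisioning leaves the in-house hardware idle three quarters of the time, which is exactly the waste the on-demand model avoids.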

Cloud Services:

Cloud computing has been seen to offer a wide variety of services, such as application services, storage services, compute services, and database services (Amazon, 2009; Google, 2009; Microsoft, 2009; Agrawal et al., 2009; Armbrust et al., 2009; Buyya et al., 2008; Chang et al., 2006; Chappell, 2008; Ghemawat et al., 2003; Palankar et al., 2008). These services are accommodated by different Cloud technologies. Understanding the Cloud technology stacks and their interrelations enables the Cloud community to provide better solutions, portals and gateways for the Cloud, which facilitate the adoption of this emerging computing paradigm. Hence, there exist several attempts to create a reference model of Cloud computing (Ji et al., 2009; Mikkilineni & Sarathy, 2009; Youseff et al., 2008; Lenk et al., 2009) to classify Cloud technologies and services into different layers. Different proposals tackle different aspects of the Cloud ontology; however, they all use the same basic model with three main common layers: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS) (Figure 1.1).

Figure 1.1: Three main Cloud services layers

The three layers in Figure 1.1 are described as follows:

• IaaS: The infrastructure layer provides basic physical resources and data storage with virtualization services. The physical units are hardware resources, such as CPU, memory, storage and network devices. Virtualization software is required at this layer to provide Cloud users with a highly scalable and manageable basic environment. Examples of this layer are Amazon EC2 and Amazon S3 (Amazon, 2009).

• PaaS: The platform layer works independently of the physical resources in the infrastructure layer, which increases the scalability of the Cloud. This layer includes components such as:

  – Kernel - managing the infrastructure resources;

  – Distributed file system - a network file system with data distributed across multiple physical nodes (e.g. Google File System, Hadoop Distributed File System);

  – Cloud IO - facilitating data exchange with various kinds of data protocols;

  – Computing driver and engine - providing domain-specific utilities;

  – Management and UI interface - providing a management console and interface to the Cloud.

  Examples of this layer are Google App Engine and Microsoft Azure (Google, 2009; Microsoft, 2009).

• SaaS: The application layer hosts business-domain-specific applications, which can be system applications that provide services to other applications, or user applications aimed at Cloud end-users. The application layer is the most visible interface to the Cloud for end-users. The applications are deployed on the Cloud provider's computing infrastructure, and users access them through web portals. Fees are usually charged for usage.
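As a compact summary of the three layers, the sketch below records each layer's role and the example services named above. The data structure itself is ours, purely for illustration:

```python
# The three common Cloud service layers described above, with the
# example services named in the text. Illustrative summary only.
CLOUD_LAYERS = {
    "IaaS": {
        "role": "virtualized physical resources: CPU, memory, "
                "storage and network devices",
        "examples": ["Amazon EC2", "Amazon S3"],
    },
    "PaaS": {
        "role": "platform components: kernel, distributed file "
                "system, Cloud IO, computing engines, management UI",
        "examples": ["Google App Engine", "Microsoft Azure"],
    },
    "SaaS": {
        "role": "business-domain applications accessed through "
                "web portals, usually charged per usage",
        "examples": ["system applications", "user applications"],
    },
}

for layer, info in CLOUD_LAYERS.items():
    print(f"{layer}: {info['role']}")
    print(f"  e.g. {', '.join(info['examples'])}")
```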

Cloud Vendors:
There are a few major Cloud vendors in the market, such as Amazon, Microsoft, Google, Salesforce, Rackspace, and GoGrid. Each vendor offers several services, ranging from IaaS and PaaS to SaaS, from compute to storage, and from relational to NoSQL databases. In this section, we describe the three main providers - Amazon, Microsoft, and Google (Figure 1.2) (Amazon, 2009; Microsoft, 2009; Google, 2009).

Figure 1.2: Major Cloud service providers

• Amazon: Amazon offers an IaaS solution for Cloud computing called Amazon Web Services (AWS) (Amazon, 2009). AWS provides a range of Cloud-based services, including compute services (e.g., Amazon Elastic Compute Cloud (EC2), Auto Scaling), content delivery services (e.g., Amazon CloudFront), database services (e.g., Amazon Relational Database Service (RDS), Amazon DynamoDB, Amazon SimpleDB), and storage services (e.g., Amazon Simple Storage Service (S3), Amazon Elastic Block Store (EBS)).

Amazon EC2 is a central service of AWS. It offers a virtual computing environment in which users can launch instances with the operating system of their choice. Users are given complete freedom to manage the application environment running on their EC2 instances. A highlight of EC2 is its elasticity: computing capacity (i.e., the number of instances) can be increased or decreased on demand within minutes.

Database services from AWS are also well known to Cloud users. Amazon RDS provides a full-featured relational database (MySQL or Oracle) running on an Amazon RDS database instance. Connections to the databases can be established in the traditional manner with any database tools or programming languages. Amazon SimpleDB, on the other hand, is a non-relational database optimized for high availability and flexibility. Data is automatically indexed and geographically distributed to enable high availability and data durability. Without any database administration burden, users can focus fully on value-added application development. Last, but not least, Amazon S3 is worth discussing for its offering as a highly scalable storage service over the Internet. This service enables users to store and retrieve data of any size, at any time, from anywhere on the web.

• Microsoft: Microsoft provides the Windows Azure Platform as a PaaS solution for Cloud computing, hosted in Microsoft data centers (Microsoft, 2009). The Windows Azure Platform enables applications to run in Microsoft data centers, and provides a Software Development Kit (SDK) with which to develop these applications. Applications running on the Windows Azure Platform can be delivered as SaaS, owing to the platform's flexibility and scalability. Developing applications for the Windows Azure environment is much like developing standard Windows applications in a local environment. Developers new to the system are supported through templates for Azure applications, provided as part of the Azure SDK for Visual Studio 2008.

Cloud services provided by Microsoft also include SQL Azure Database, a highly available and scalable cloud-based relational database service. SQL Azure is built on SQL Server technologies; hence, it provides a full-featured relational database and can be synchronised with on-premises SQL Server databases.

• Google: Similar to Microsoft, Google also provides a PaaS solution, called Google App Engine (GAE), hosted on Google's existing infrastructure (Google, 2009). Google provides the App Engine SDK for two languages: Java and Python. This SDK is available as a plugin for Eclipse, the most commonly used Integrated Development Environment (IDE) for Java. Common Java features are supported, as long as they do not interfere with the sandbox limitations. Most GAE services can be accessed using standard Java APIs. Python is accommodated in a similar manner to Java.

Cloud services provided by Google also include the Datastore, a schemaless distributed data storage service with a powerful query engine and support for transactions. The Datastore is a non-relational database optimized for read speed. Development and maintenance on the Datastore are done via the Java or Python APIs.

Cloud Pricing Models:

Cloud vendors charge users based on usage, including data storage, compute time (per machine hour), and data transfer into and out of the cloud. In addition, charges may apply for additional administrative tools and services, such as resource analysis and monitoring (e.g., CloudWatch from Amazon). Different charges also apply to different types of Cloud systems: SaaS charges its users subscription fees; IaaS may charge developers for software licenses, or a group license if the software is installed on multiple instances.


Tables 1.1 and 1.2 show a comparative pricing model for some of the main computational and storage charges of the three major Cloud providers, Amazon, Microsoft and Google (as of January 2012).

Cloud vendor              Service Charge
Amazon                    Standard Linux Instances:
                            Small: $0.085/hr, Large: $0.34/hr, Extra Large: $0.68/hr
                          Standard Windows Instances:
                            Small: $0.12/hr, Large: $0.48/hr, Extra Large: $0.96/hr
Microsoft Windows Azure   Compute Instances:
                            Small: $0.12/hr, Medium: $0.24/hr, Large: $0.48/hr,
                            Extra Large: $0.96/hr
Google App Engine         CPU Time: $0.10/CPU hr

Table 1.1: Pricing model comparison: Service charge (Amazon, 2009; Microsoft, 2009; Google, 2009)

This pricing model demonstrates the pay-as-you-go nature of Cloud computing, which is more attractive than conventional web hosting services, where users are charged a fixed monthly or yearly fee. Moreover, additional costs are very likely to occur when using local servers, such as operational costs and upgrade and maintenance costs. Operational costs include power and electricity, premises rental, administration staff, networking infrastructure, and so on. Upgrade and maintenance costs include new hardware and middleware, new software, new licenses, and additional labor for installation and configuration.
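To illustrate the pay-as-you-go arithmetic, the following sketch computes a hypothetical monthly bill from the Table 1.1 rates (the workload figures are invented for illustration only):

```python
# Hypothetical workload: one Small Linux instance running all month
# (~730 hours) plus a Large Linux instance for 100 peak hours,
# priced at the Amazon EC2 rates listed in Table 1.1.
SMALL_RATE = 0.085   # $/hr, Standard Small Linux instance
LARGE_RATE = 0.34    # $/hr, Standard Large Linux instance

monthly_cost = 730 * SMALL_RATE + 100 * LARGE_RATE
print(round(monthly_cost, 2))  # 62.05 + 34.00 = 96.05
```

Under a fixed-fee hosting plan, the 100 peak hours would require provisioning the Large instance for the whole month; per-hour billing is what makes bursty workloads cheaper in the Cloud.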

When a Cloud is made accessible to the public via a pay-per-use pricing model, it is known as a Public Cloud, whereas the internal datacenters of an organization, not available to the public, are referred to as a Private Cloud (Armbrust et al., 2009).


Cloud vendor    Storage Cost                     Data Transfer Cost                   I/O Cost
AWS S3          50TB: $0.125/GB/mth              In: $0.10/GB                         No charge
                Over 5000TB: $0.055/GB/mth       Out: free first 1GB/mth, then        for I/O
                                                 US&EU: $0.15/GB, AP: $0.19/GB
AWS EBS         $0.10/GB/mth                     N/A                                  $0.10/1M
                Snapshots: $0.15/GB/mth                                               requests
AWS RDS         $0.10/GB/mth (plus hourly        Same as AWS S3                       $0.10/1M
                CPU cost)                                                             requests
                Backup: $0.15/GB/mth
Azure Storage   $0.15/GB/mth                     In: US&EU: $0.10/GB, AP: $0.30/GB    $0.01/10K
                                                 Out: US&EU: $0.15/GB, AP: $0.45/GB   requests
Azure SQL       Up to 1GB: $10/mth               Same as Azure Storage                No charge
                Up to 50GB: $500/mth                                                  for I/O
GAE Blobstore   First 1GB: free,                 In: $0.10/GB                         No charge
                then $0.15/GB/mth                Out: $0.12/GB                         for I/O

Table 1.2: Pricing model comparison: Storage cost (Amazon, 2009; Microsoft, 2009; Google, 2009)
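As a quick illustration of how Table 1.2 translates into a monthly bill, the sketch below prices a hypothetical workload on Azure Storage at the US & EU rates (transaction charges are ignored, and all usage figures are invented):

```python
# Azure Storage (US & EU) rates from Table 1.2:
# $0.15/GB/mth stored, $0.10/GB inbound, $0.15/GB outbound.
def azure_storage_bill(stored_gb, in_gb, out_gb):
    """Monthly charge in dollars for a simple storage workload."""
    return round(stored_gb * 0.15 + in_gb * 0.10 + out_gb * 0.15, 2)

# 100 GB stored, 40 GB uploaded, 20 GB downloaded in a month.
print(azure_storage_bill(100, 40, 20))  # 15.00 + 4.00 + 3.00 = 22.0
```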

Summary:

The key to the success of Cloud computing is that it provides a win-win approach for both providers and users. Cloud computing offers its users scalable computing resources, available on demand without up-front commitment, freeing them from the burden of software and hardware installation and configuration at a cost below that of a medium-sized datacenter, while still generating a good profit for Cloud providers (Armbrust et al., 2009).


1.1.2 The Urge to Migrate to the Cloud

The attractive offerings of Cloud computing, as discussed in the previous section, have encouraged many organizations to seriously consider Cloud computing solutions for their IT needs. Figure 1.3 below, from Google Trends (Google, 2011), reflects this with graphs showing increasing interest in Cloud computing from late 2007 to the present. The top graph is the search volume index for “Cloud Computing” in the Google search engine, which represents how many searches have been made for this term relative to the total number of searches on Google over time. The bottom graph shows its news reference volume over the years, which represents the number of times the term appeared in Google News stories.

Figure 1.3: Cloud Computing - Google Trends

As has been discussed in many articles, papers, case studies, and blogs, there are many ways one can use Cloud services (Kundra, 2010; Khajeh-Hosseini et al., 2010a; Hajjat et al., 2010; Ward et al., 2010). For example, many of the most famous Cloud computing stories have been about startups with explosive growth, where the organization wrote or rewrote software specifically to run in the Cloud (Microsoft, 2012; Amazon, 2011). One can also take advantage of already-deployed Cloud applications, or Cloud-enabled systems in the form of SaaS, such as Google Docs or online Customer Relationship Management (CRM) applications. However, there are cases where an organization has existing application software and wants to run it on a Cloud platform. Instead of a complete rewrite, one could say that they are “migrating” the software from a traditional platform, such as .NET or J2EE, to a Cloud-based one, such as Amazon EC2 or Microsoft Windows Azure.

The migration case is quite practical and popular, since currently operating businesses are likely to have their own IT systems already developed and in use, whereas Cloud computing is relatively new. A migration project to the Cloud can be carried out in various forms, as described in the illustrative case studies at the Federal, state and local government levels of the United States (Kundra, 2010). For example, since 2009, the Department of Energy has been exploring cost and energy efficiencies from leveraging Cloud computing, such as deploying mailboxes on Google Federal Premier Apps, Google Docs and Google Sites, as well as evaluating the use of Amazon EC2 to handle peak usage periods. This migration spans a wide range of migration activities, from SaaS to IaaS Clouds. Other case studies, such as the City of Miami, Florida, describe the decision to use the Windows Azure platform for on-demand hosting in Microsoft data centers. This type of migration differs from the Department of Energy case mentioned previously: no installation or environment setup is required for the PaaS Cloud, but certain modifications must be made to align the migrated systems with the Cloud offerings.

Many papers have also illustrated case studies where enterprises are keen on migrating their IT systems to the Cloud. Khajeh-Hosseini et al. (2010a) present a case study of a UK-based organization that provides IT solutions for the Oil and Gas industry. This organization was considering deploying one of its primary service offerings to Amazon EC2, because it preferred no modifications to the application code. The migration was analyzed to be more cost effective for the organization, although only infrastructure costs were considered.

Hajjat et al. (2010) describe the migration process of an Enterprise Resource Planning application used in a large university with tens of thousands of students and several thousand faculty and staff. The application was planned to be migrated to the Windows Azure Cloud. The migration strategy considered various aspects of the system, such as databases and networking.

These are some representative examples illustrating how enterprises are encouraged to move to the Cloud. A more detailed discussion is presented in Chapter 2.

Besides these, many popular practitioners’ blogs (Hamilton, 2011; Linthicum, 2011; Chappell, 2011) also discuss different migration scenarios to the Cloud. In general, the porting of IT systems to the Cloud is quite active and increasingly popular.

1.1.3 The Essentials of Effort Estimation

Although the migration process is a one-off task, it is not automatic, as can be seen from the above migration examples. Some installations in an IaaS Cloud must be performed, or modifications to the existing systems are unavoidable, and the amount of effort required can be significant. This effort is due to discrepancies


between the environment provided by a Cloud platform and that of a traditional platform (Verma et al., 2011). There are often differences in the versions of various infrastructure components, the programming models, the libraries available, and even the semantics of data access; for example, Cloud platforms typically provide eventual consistency rather than transactional guarantees. All these extra tasks mean that the migration process to the Cloud may not be as easy and straightforward as one might think.


As effort is required for undertaking those tasks to migrate an IT system to
the Cloud, and the amount of effort required is diverse, early effort estimation for
a migration project to a Cloud platform is essential for its project management,
particularly project scheduling and budget planning.

Migration costs also contribute towards the Overhead Cost component (Figure
1.4) of the cost-benefit analysis (Carriere et al., 2010; de Assuncao et al., 2009)
and decision making process on whether it is worthwhile to migrate a system to

the Cloud.
Figure 1.4 illustrates the analysis of cost and benefits in two options: (1)
migrating an existing application to the Cloud, and (2) keeping the application

on premise. If one decides to go with option (1), one has to pay a total cost of:
application development cost, migration cost (or overhead cost), and on-going

cost paid to the Cloud providers. Otherwise, keeping the application in-house
incurs costs of application development (which is similar to option (1)), and

operational and maintenance costs (Carriere et al., 2010; de Assuncao et al.,

2009).

Figure 1.4: Cost and benefit of migrating existing applications into the Cloud

Weighing up the two options, if:

Overhead Cost + Pay-as-you-go Cost < Operational and Maintenance Cost

then migrating the application to the Cloud is a wise move. Otherwise, keeping the application in house is more beneficial.

The Overhead Cost component plays an important role in this analysis, and
it is essentially the cost made up from the migration effort. Hence, early effort

estimation is a vital part of this process.
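This break-even comparison can be sketched as a simple check (a minimal illustration; the cost figures and the fixed planning horizon are hypothetical assumptions, not values from this thesis):

```python
def should_migrate(overhead_cost, monthly_cloud_cost,
                   monthly_inhouse_cost, horizon_months):
    """Return True if migrating is cheaper over the planning horizon.

    Application development cost is omitted because it is common to
    both options; only the differing cost components are compared.
    """
    cloud_total = overhead_cost + monthly_cloud_cost * horizon_months
    inhouse_total = monthly_inhouse_cost * horizon_months  # operational + maintenance
    return cloud_total < inhouse_total

# Hypothetical figures: $20,000 migration overhead, $1,500/mth in the
# Cloud vs. $2,500/mth on premise, over a 36-month horizon.
print(should_migrate(20_000, 1_500, 2_500, 36))  # True: 74,000 < 90,000
```

The migration overhead acts as an up-front investment: the larger the migration effort, the longer the pay-as-you-go savings take to recoup it, which is why early estimation of that effort matters.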

1.2 Definitions

There exists related work on migrating a system to the Cloud; however, the notion of Cloud migration can still vary. In Suen et al. (2011), for example, it refers to the live migration of virtual machine images between different Cloud providers, as well as between private and public Cloud offerings. Hence, it is worth clarifying the meaning of the Cloud migration concepts, as well as some other common terms, used throughout this thesis. The following definitions are based on the activities of this research.

Definition 1 Cloud migration


Cloud migration refers to the activities of moving an IT system or an application from local data centers to the Cloud, without sacrificing any performance attributes. The system can be migrated to the Cloud partially (i.e., only a part of the system is moved to the Cloud, the rest is still hosted in-house, and the two parts must work seamlessly together), or as a whole (i.e., the whole system is ported to the Cloud). The former is called a partial migration, and the latter a full migration.

Definition 2 Cloud migration project


Cloud migration project refers to the process of migrating a system to the

Cloud.

Definition 3 Migrating system/application


Migrating system/application refers to the system/application to be migrated

to the Cloud.

Definition 4 Migration task

A migration task is a defined migration activity within a migration project.

For example, when migrating a Microsoft SQL Server database to SQL Azure,

moving the data is a migration task, and any changes to the database schema are
also called migration tasks.

Definition 5 Migration cost and migration effort

Migration cost and migration effort are used interchangeably in this thesis.
They both refer to the amount of effort spent on migration activities.


Definition 6 Overhead cost

Overhead cost is used in our analysis of cost and benefit of migrating a sys-
tem to the Cloud. The overhead cost refers to the cost of the actual migration

activities. It is equivalent to migration cost or migration effort.

These are some common concepts that will be used regularly in this thesis; more concepts are clarified in later chapters where relevant.

1.3 Research Problem and Aims

The decision to migrate applications to Cloud platforms depends on various factors, one of which is an understanding of the cost implications (in terms of the amount of effort required). This is challenging because:

• Applications vary in many dimensions, such as size, complexity, function-


ality, and requirements.

• Migration projects to the Cloud vary in the type of Cloud (IaaS or PaaS), migration requirements (migrating the application to the Cloud as a whole or only partially), and so on.

• Cloud computing is relatively new and different from the traditional soft-
ware engineering paradigm in many aspects, such as characteristics, pricing

models, and security aspects. Porting an application from a traditional

platform to the Cloud may require changes to the application itself or to


the Cloud environment.

To the best of our knowledge, at the time of writing, no effort estimation ap-

proaches have been specifically designed for Cloud migration projects. Existing


traditional effort estimation approaches for software development are not appli-

cable in this context, because the measures employed as predictors in traditional


approaches do not cover all typical features of a migration project to the Cloud.

These features will be discussed further in Chapters 4 and 5.

The overall objective of this thesis, which is to identify cost implications of

migration to Cloud, requires a clear understanding of how migration projects take

place. This strongly motivates us to, firstly, understand and evaluate the critical
cost factors of the migration process, in order to estimate how much effort would

be needed. Amongst those factors, size measurement of the migration project is


considered one of the most important indicators of effort estimation. Hence, the
second aim of our research is to build a size estimation model, which estimates

how large a migration project to the Cloud is, and which will serve as a basic
indicator for effort estimation approaches.

The specific research questions can be identified as:

• RQ1: What activities are needed to migrate a software system to the Cloud?

• RQ2: How can these activities be classified?

• RQ3: What are the cost implications (in terms of staff effort) of those tasks?

1.4 Research Scope

The focus of this thesis is constrained by the following issues:

• It is important to identify the boundaries of a migration project to the Cloud. Such a project starts with an existing application or system, either completely in-house, or partially in-house and partially in the Cloud. The project ends with the same application or system, either completely or partially migrated to the Cloud.

• In a migration process, no new functionality is added, and performance


must be preserved (or improved without much tuning). The focus of this

thesis is on actual migration activities to bring an in-house system to the


Cloud. Therefore, our study does not consider any functional development

tasks to add more functionality to the system or maintenance tasks after

the migration. Having said that, some migration activities may involve code
modification to adapt the system to the new environment without adding
more functionality.

• Our study focuses on the migration effort to the Cloud from the consumer’s point of view; hence, only migration activities carried out by Cloud users are taken into consideration. In a migration project to a SaaS Cloud, consumers only need to upload their databases in a certain format to the SaaS server, and the migration process is handled by the SaaS provider; migrating mailboxes and email accounts to SaaS email providers is one example. SaaS consumers are free from software management responsibilities, which, as an obvious trade-off, restricts their flexibility and control over the systems in the Cloud. Hence, SaaS is deliberately excluded from the scope of our work. On the other hand, migration projects to PaaS and IaaS Clouds are the sole responsibility of Cloud consumers. Therefore, the scope of this thesis is limited to migration projects to PaaS and IaaS Cloud platforms, but not SaaS Clouds.


• The migration is between two data centers only (typically, one in-house

and one in-Cloud). We assume that migration projects are directional (i.e.
components are moved from local to remote data centers in the Cloud). In

the case where two or more data centers are involved, each pair of data
centers will be assumed to form a separate migration project.

• We assume that the Cloud target has already been selected. We only focus
on the migration process itself; hence, the decision on which Cloud platform

to choose is out of the scope of this thesis. Having said that, applying our
study to each Cloud platform could assist this decision.

The items presented above form the scope and assumptions of this research.

1.5 Research Approach

A thorough understanding of different aspects of a Cloud migration process en-


ables us to identify its cost implications. The following table (Table 1.3) indicates

the steps we take to tackle this issue.

Steps Research Tasks

1 Identify influential cost factors


2 Derive a taxonomy of migration tasks
3 Develop Cloud Migration Point (CMP) model
4 Conduct a survey to collect data on Cloud migration effort
5 Empirically validate the CMP measurement

Table 1.3: Outline of the research approach

Actual cost factors in migration effort to the Cloud need to be identified in Step (1), since this type of project involves tasks that differ from those of a traditional software development project. We address this by reviewing various migration case studies in the literature and practitioners’ blogs, as well as conducting a series of migration exercises of different types, which will be discussed further in Chapter 4.
From this exploration, a taxonomy of migration tasks is extracted in Step (2).

A record of the required cost (in terms of effort) is carefully tracked, together

with a note about which tasks require more effort than others.
There are many influential cost factors in Cloud migration effort, amongst which size measurement is seen as one of the most significant for effort estimation. Traditional size measurements, such as Source Lines of Code (SLOC) and Function Points (FP) and its extensions, are not applicable in the context of migration to the Cloud. In Step (3), a Function-Point-like, Cloud-specific metric, called Cloud Migration Point (CMP), is developed to measure the size of a Cloud migration project, which can serve as a basis for Cloud migration effort estimation.
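The shape of such a metric can be sketched as a weighted sum over rated components, in the style of Function Point counting (the component categories follow this thesis, but the complexity levels, counts and weights below are hypothetical placeholders, not the calibrated CMP values):

```python
# Hypothetical complexity weights, in the style of Function Point counting.
WEIGHTS = {"low": 1, "average": 2, "high": 3}

def cmp_size(components):
    """Aggregate per-component complexity ratings into a single size value.

    components maps a migration component to a {complexity: count} dict.
    """
    return sum(WEIGHTS[level] * count
               for ratings in components.values()
               for level, count in ratings.items())

# Invented example project, using the component categories from the thesis.
project = {
    "connection_changes":         {"low": 4, "average": 2},
    "database_migration":         {"average": 3, "high": 1},
    "code_modification":          {"high": 2},
    "installation_configuration": {"low": 5},
}
print(cmp_size(project))  # 4*1 + 2*2 + 3*2 + 1*3 + 2*3 + 5*1 = 28
```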
The validation in Step (5) is to ensure that CMP can be a reliable indicator
of effort estimation for Cloud migration projects. Data for this validation process

is not publicly available; hence, we conduct a survey in Step (4) to facilitate the
validation process.

1.6 Organisation of the Thesis

This thesis is structured as follows:

• Chapter 2 provides an overview of the related work in the literature, includ-


ing other research on application migration to the Cloud, and a review of


different estimation approaches and size measurement methods (i.e., Source

Line of Code, Function Point and its extensions).

• Chapter 3 describes the methodology that we apply in this research. Each

step of the research process will be elaborated and mapped with each com-
ponent of this thesis.

• Chapter 4 outlines a taxonomy of migration tasks to the Cloud. This taxonomy covers the possible migration tasks that any migration project to the Cloud might encounter. Its purpose is to enhance our understanding of the migration process to the Cloud, as well as to enable us to identify the relevant cost factors.

• Chapter 5 tightens our focus on the most dominant indicator of effort estimation: size measurement. This chapter describes our CMP model, built by recasting a well-known software size estimation model, Function Point (FP), into the context of Cloud migration. We adopt the three-phased approach of the FP model to estimate the size of the individual components involved in a migration project. In particular, we focus on the Cloud-relevant components of the migrated systems, including connection changes, database migration, code modification, and installation and configuration for the new environment in the Cloud. For each component, we perform an estimation by identifying the relevant activities that contribute to the overall effort required for that component. Finally, we aggregate all individual estimations into a single CMP value by calculating their weighted sum. The CMP value provides a measure of how large the migration project is, and it can be used as an indicator for Cloud migration effort estimation.


• Chapter 6 validates the CMP model empirically. The empirical validation shows that our metric is practically useful as a basis for effort estimation under a defined set of assumptions. We conducted a survey of Cloud migration projects of various scales, from small to large, and cross-validated these projects to estimate the performance of our model. Data from the survey has allowed us to calibrate the CMP model to increase its external validity. In this chapter, we also state the list of assumptions made in developing the model, and test their plausibility using the available data. This list of assumptions implies the high complexity and difficulty of validating the metric.

• Chapter 7 concludes the thesis by providing a research summary, research

contributions, and limitations. Possible future research directions based on


this thesis are also outlined in this chapter.

Chapter 2

Literature Review

“Not everything that counts can be counted, and not everything that
can be counted counts.”

∼ William Bruce Cameron.

Effort estimation and size measurement of software projects have been interesting and challenging areas in traditional software engineering. There is a great deal of related work in the traditional context; however, none of it has considered the new setting of Cloud computing. The aim of this literature review is to examine existing research related to the Cloud migration topic, as well as effort estimation and size measurement metrics, with consideration of their applicability to Cloud computing.

The following sections cover a number of issues important for this thesis. Section 2.1 reviews other research related to the Cloud migration topic, with regard to its concerns about migration (i.e., risk management, cost savings, and performance). Section 2.2 reviews effort estimation approaches in traditional software engineering, and Section 2.3 explores existing size measurement metrics, including Source Lines of Code, Function Points and its extensions. Section 2.3 also states the requirements for a sizing metric for Cloud migration, and explains why none of the existing approaches meet these requirements, hence why a new metric is needed. Lastly, Section 2.4 summarizes and concludes this chapter.

2.1 Cloud Migration Solutions

There have been many publications and much research dealing with various aspects of Cloud computing, such as Cloud computing architectures, Security and Privacy in Clouds, Monitoring, Management and Maintenance of Clouds, and Performance Modelling for Clouds; but not until 2011 did we see many papers concerning migration to the Cloud. This topic has been of interest both to Cloud practitioners and to researchers, although their concerns about migration are quite diverse. This section reviews existing work on this topic and distinguishes our concern from others. The sub-sections present the different streams of related work in the literature.

2.1.1 Decision Making Support

Although there are many benefits associated with the Cloud, whether it is worth moving an existing working system to the Cloud is still an open question for enterprises. As cost-benefit analysis is an important tool for IT managers to evaluate whether the benefits outweigh the costs of an IT investment, many researchers have attempted to help decision-makers by identifying and weighing the benefits and issues of Cloud migration.


Khajeh-Hosseini et al. (2010a) reported a case study of migrating an enterprise IT system in the oil and gas industry from a local data center to Amazon EC2. Their findings indicate that there are significant risks associated with the organisational dimension, such as decreased job satisfaction among staff, who come to depend on third-party Cloud providers, or the downsizing of IT support departments, because Cloud providers become responsible for their daily tasks, and so on.

They extended their work (Khajeh-Hosseini et al., 2011) to introduce two tools to support decision making during the migration process. These tools assist decision makers by producing cost estimates for using public IaaS Clouds, as well as outlining the benefits and risks of using IaaS Clouds from an enterprise perspective. They also explicitly stated that the limitation of their work is its focus on infrastructure cost alone, ignoring the actual migration work, which could be significant.

Mastroeni & Naldi (2011) also assessed the risks involved in the decision to migrate to Cloud storage against the alternative of buying the storage devices and facilities, based on different decision variables; Yam et al. (2011) and Hajjat et al. (2010) addressed this from the uncertainty angle, including security and business continuity concerns.

Another important criterion that affects the decision to migrate to the Cloud
is cost savings. It is essential to understand how cost effective it can be to migrate

to the Cloud, as opposed to staying in house. The work of Hajjat et al. (2010)

addressed this by proposing a model of a hybrid migration approach, in which a

part of the system is migrated to the Cloud, while the other part stays in house.
This model takes into consideration the cost savings that may result from the


migration. This cost is essentially the Internet communication cost. They briefly

mentioned that the one-time cost of the actual migration process can also be
easily incorporated in the model; however, there was no further discussion of how

this one-time cost can be estimated.

Communication cost is also the cost-related aspect of the framework presented by Hao et al. (2009). This framework was developed to facilitate service migration to the Cloud, with a cost model (i.e., communication cost) and a decision algorithm designed to evaluate the trade-offs in service selection and migration. Apart from communication cost, reconfiguration cost has also caught the attention of some researchers. Verma et al. (2011) designed a model, called CosMig, to model the cost of frequently reconfiguring a Cloud infrastructure and to evaluate its impact on application performance. These factors are considered to be costs of using the Cloud.

Li et al. (2011a) and some other researchers (Ye et al., 2011; Ho et al., 2011;

Mastroeni & Naldi, 2011) identified cost savings from the perspective of Cloud
price and server bandwidth. They compare the price of different Cloud providers,
as well as the cost difference between using the Cloud and staying in house. This

cost is also the cost of using the Cloud.

Klems et al. (2009) proposed a framework to compute the value of cloud by


estimating Cloud computing costs and comparing these costs to conventional IT
solutions, such as hosted service or Grid computing service. Their work defines

cost as the combination of a number of direct costs (e.g. facility, energy, cables and

servers) and indirect costs (e.g. cost from failing to meet business objectives).

However, the list of cost components in this framework is incomplete for both
direct and indirect costs. Furthermore, it does not indicate how these costs can


be computed, or how components in the framework link with one another to

determine the estimated cost of Cloud computing.

The system proposed by de Assuncao et al. (2009) provides various scheduling

strategies to augment the capacity of an organisation’s local cluster with Cloud

resources, and evaluates the trade-off between performance improvement and


monetary cost spent for using the Clouds for each proposed strategy. This work

only considers a portion of Cloud cost and focuses specifically on response time
benefits. This is sufficient to analyse costs and benefits amongst the proposed
scheduling strategies of using Clouds, but cannot be applied to a wider scope of

general application development for Clouds.

Conclusion:

The related work supporting the decision on whether to migrate to the Cloud has mainly focused on security and risks. In addition, some work also looks at the cost of migrating a system to the Cloud. However, that research does not concern the cost of the migration process itself; it refers to the cost of using the Cloud, assuming the migration has already been done. This differentiates the focus of our work from others, since we are concerned with the cost of the actual migration process.

2.1.2 Experience Reports

Apart from decision making support, a few researchers have reported on their

experiences of migrating a system to the Cloud. Babar & Chauhan (2011) and
Chauhan & Babar (2011) reported their experiences and observations of migrating

Hackystat, an Open Source Software Product to the Cloud. The focus of this

migration exercise is on the architecture and design decisions of Hackystat. Their

aim is to provide some guidance for adapting service-based system architecture


to the Cloud.

On the other hand, the experience presented by Thakar & Szalay (2010) dis-

cussed migrating the Sloan Digital Sky Survey science archive, a scientific astro-
nomical database to the Cloud. Their exercise resulted in a strong finding that

it is “very frustrating or impossible” to migrate a database, either large or small,

to the Cloud (such as Amazon EC2 or Microsoft SQL Azure) without changing
either its schema or its settings. Our finding, which will be discussed later in
Chapters 4 and 5, strongly agrees with this observation.

Conclusion:

Current research has attempted to contribute to the knowledge of a migra-


tion process to the Cloud, as there are currently no guidelines or standards on

this topic. However, researchers have only reported preliminary results of their
experiences. It is still necessary to have a guideline for the migration process, in
order to enable practitioners to better plan their own migration process.

2.1.3 Cloud Migration Concerns

This section reviews and categorizes several issues concerning the migration pro-
cess that have been raised in some related research.

• Data Migration

Data transfer between local data centers and the Cloud can affect the overall

application performance significantly. Many researchers have attempted to

address this issue during the migration, for example, Piao & Yan (2010)

proposed a virtual machine placement and a migration approach that can


minimize the total data transfer time consumption; hence, it can help to

optimize the overall application performance.

Zhang et al. (2010) took a closer look into application specific workload

characteristics, deadlines, and I/O profiles in order to build an adaptive


data migration model that can improve the overall system performance

and resource utilization while meeting workload deadlines. On another


aspect of data migration, Thakar & Szalay (2010) emphasized that for all
database sizes, extra work is likely required for changing database schemas

and settings to fit well into the Cloud environment.

Live database migration without service interruption has been proposed by


Elmore et al. (2011) with their technique Zephyr. This technique utilizes on-
demand pull and asynchronous push, and requires minimal synchronization

to achieve its stated goal.

• Performance

The Cloud environment has imposed many constraints and challenges to

the migration of legacy systems to the Cloud (Frey & Hasselbring, 2011;

Mohagheghi & Saether, 2011). Hence, there exists research on configuration


during the migration process intended to overcome these constraints without
sacrificing performance.

Venugopal et al. (2011) stated that enterprises are sometimes required to

re-engineer their applications to utilize the linear scalability of the Cloud.

They proposed a methodology to smoothly migrate and configure the sys-


tem to the Cloud without initial re-engineering effort. Jayasinghe et al.

(2011), on the other hand, found that a configuration that works for some
environments just does not work for other Cloud environments. Hence, during

migration, reconfiguration and possible re-engineering are necessary.

Other performance issues have also been raised and discussed, such as:

networking or Internet communication (Hao et al., 2009; Hajjat et al., 2010),

and Cloud infrastructure configuration (Verma et al., 2011).

• Other Potential Concerns

Migration projects have been undertaken throughout the history of com-


puting as technologies have changed. Although specific considerations for
Cloud migration can be very different from other contexts, the general issues
encountered in other contexts could be relevant and informative.

Legacy Information System migration could encompass different migration


issues. Some issues are common to all software engineering projects (not

just migration projects), including target system development, testing, and


database model selection. Other issues that are specific to migration con-

cerns include target system database population (Bisbal et al., 1997, 1999).

Cetin et al. (2007) mentioned other concerns in legacy migration to Service-

Oriented Computing, including the need to provide a migration roadmap.

Smith (2007) shared this same view in his migration concerns, such as:
identification of specific components to migrate, recommendations on the

ordering of migration efforts, and specific migration paths to follow.

Conclusion:

Some issues related to the migration process may result in extra cost and

require extra effort, such as: data and database migration, networking or Internet
communication, Cloud infrastructure configuration, or re-engineering the application

for the Cloud. It is also essential for any migration project (not just to the Cloud)
to have a roadmap to follow.

2.2 Effort Estimation in Traditional Software Engineering

Effort estimation is essential at the beginning of a new project. In this section,


effort estimation approaches in traditional software engineering are reviewed for
their applicability to the Cloud migration context. There is a diverse range of

effort estimation approaches in the literature of traditional software engineering.

They can be categorized into three general types: analogy, expert judgement, and
algorithmic models (Jorgensen & Shepperd, 2007; Boehm et al., 2000; Shepperd

& Schofield, 1997; Keung et al., 2008; Helmer, 1966; Baird, 1989; Banker et al.,
1991).

2.2.1 Analogy Approach

Effort estimation using analogy is the approach where a problem is solved using
knowledge derived from similar problems (Shepperd & Schofield, 1997; Keung

et al., 2008). It is argued that the analogy approach is capable of handling poorly

understood domains because solutions are based upon what has actually hap-
pened. Even so, this approach is still not applicable for the Cloud context at this

stage because the range of completed migration projects is still limited, and it is
not obvious as to where and how similar projects can be identified.

2.2.2 Expert Judgement Approach

Expert judgement is another well-known approach for estimation (Jorgensen,


2004; Helmer, 1966; Baird, 1989). This approach captures knowledge, experi-
ences, and expertise of practitioners who are recognized as experts within a do-

main of interest, and derives estimates based on historical data that they are well
aware of, or past projects in which they participated. Similar to the analogy-based
approach, because of the recent emergence of the Cloud, there is a lack of
practitioners who have experience with a broad range of migration types to the Cloud.
Nevertheless, this approach shows great potential as the Cloud becomes more
mature in the future.

One popular technique developed to capture expert judgement is the Delphi
technique (Helmer, 1966), which is executed in two rounds. In the first round, a
group of experts are asked individually for their assessment on some matter,
without knowledge of how the other participants respond. In the second round,
each participant is asked for their assessment again, but this time with knowledge
of how the others answered in the first round. This technique narrows the range
of answers from the participants, pointing to a more reasonable middle ground
regarding the issue of interest.

2.2.3 Algorithmic Model Approach

Another popular estimation approach is algorithmic models (Jorgensen & Shep-

perd, 2007; Boehm et al., 1995; Banker et al., 1991). This approach estimates
effort using mathematical formulas to establish the relationship between depen-

dent and independent variables of the models, which are the estimated effort and
influential cost factors, respectively. This approach also requires historical data

to develop the algorithmic model; however, the model itself is more generic than

the other two approaches, which makes model-based technique more suitable to
apply for a broader range of migration projects to the Cloud at this stage.

Amongst existing cost estimation models, the COCOMO (COnstructive COst


MOdel) II (Boehm et al., 1995) is one of the most popular models. COCOMO
II consists of three sub-models, namely Applications Composition, Early Design

and Post-Architecture, which can be combined in various ways to deal with the
current and likely future software practices. These sub-models use FPs and/or
LOCs for their sizing parameters. The size of a project is one of the key factors in

algorithmic models for the project’s effort estimation.
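To illustrate how size drives an algorithmic model, the COCOMO II Post-Architecture effort equation can be sketched as follows. This sketch is illustrative rather than part of the reviewed work: A = 2.94 and B = 0.91 are the COCOMO II.2000 calibrated constants, while the scale-factor and effort-multiplier values below are hypothetical project ratings.

```python
from math import prod

# COCOMO II effort equation: PM = A * Size^E * product(EM_i),
# with E = B + 0.01 * sum(SF_j).  A and B are the COCOMO II.2000
# calibrated constants; the project ratings below are illustrative.
A, B = 2.94, 0.91

def cocomo_ii_effort(ksloc, scale_factors, effort_multipliers):
    """Estimated effort in person-months for `ksloc` thousand lines of code."""
    exponent = B + 0.01 * sum(scale_factors)
    return A * (ksloc ** exponent) * prod(effort_multipliers)

# A hypothetical 20 KSLOC project: five illustrative scale-factor ratings,
# all 17 effort multipliers left at their nominal value of 1.0.
effort = cocomo_ii_effort(20,
                          scale_factors=[3.72, 3.04, 4.24, 3.29, 4.68],
                          effort_multipliers=[1.0] * 17)
print(f"{effort:.1f} person-months")
```

Because the exponent E exceeds 1 for most rating combinations, effort grows slightly faster than size, which is why a credible size measure is the key input to such models.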

Conclusion:

This section has reviewed three popular approaches of traditional software en-

gineering effort estimation. The analogy approach requires a repository of historical

data on similar Cloud migration projects. This can be achieved when the field

of Cloud migration becomes more mature and data on migration projects can be
collected and stored in a repository for future use. The expert judgement approach
relies on practitioners' expertise in Cloud migration. This can be achieved

when there are many experts in the field. The algorithmic approach requires a
mathematical formula to be developed with suitable parameters. This last approach

appears to be the most feasible direction to explore at this stage.

2.3 Software Size Estimation in Traditional Software Engineering

The literature has shown that the effort spent on a development project relies
significantly on the project's complexity. A more complicated project would
typically require more effort on both development and maintenance. Software

size measurement is a conventional way to indicate a project’s complexity. It is


commonly found in the form of metrics that measure either a software's Lines of Code
or its Function Points and their extended variants (Verner & Tate, 1992; Dolado, 2000;

Rosenberg, 1997; Finnie et al., 1997).

2.3.1 Source Lines of Code (SLOC)

SLOC is a traditional size measure that counts the number of lines in a software
product’s source code. SLOC is one of the prime measures which are used as

input into equations for effort estimation (Verner & Tate, 1992; Dolado, 2000;
Rosenberg, 1997). SLOC was popular for its simplicity and straightforwardness.

However, counting SLOC is only possible after the implementation phase when

source code is available, which makes SLOC inapplicable for estimation in the early
phases of the development cycle (Albrecht & Gaffney, 1983; Lai & Huang, 2003).

There are also further concerns about SLOC's validity because of its high dependency
on the programming language and the programmer's skills and coding style (Ruhe

et al., 2003b).
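To make this dependence concrete, the sketch below implements one common (but by no means universal) definition of physical SLOC, counting lines that are neither blank nor comment-only; the counted snippets are illustrative.

```python
# A minimal physical-SLOC counter: lines that are neither blank nor
# comment-only are counted.  Both the comment prefix and the very
# definition of a "line of code" are language- and convention-dependent,
# which is one reason SLOC counts are hard to compare across projects.
def count_sloc(source: str, comment_prefix: str = "#") -> int:
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count

# Two functionally identical snippets, written in different styles.
verbose = "# sum a list\ntotal = 0\nfor x in [1, 2, 3]:\n    total += x\n"
compact = "total = sum([1, 2, 3])\n"
print(count_sloc(verbose), count_sloc(compact))  # 3 1
```

The two snippets behave identically yet count as 3 and 1 SLOC, illustrating how coding style and language idioms distort SLOC-based comparisons.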

2.3.2 Function Point

To overcome these disadvantages of SLOC, FP was developed in 1983 by Albrecht

(Albrecht & Gaffney, 1983) to measure the size of transaction processing systems


in terms of system functionality, independent of implementation languages. FP

incorporates both size and complexity factors in its counting process. There are
many software development effort estimation approaches using function points,
such as regression model, or artificial intelligence model (e.g. artificial neural

networks and case-based reasoning) (Finnie et al., 1997).

FP is used to estimate the amount of functionality a software system provides, based on


how much data it uses and generates. FP is found to be more useful and suitable
in many software projects than the LOC method because of its applicability at an

early stage of software development, when LOC is not yet available. The FP of
a system can be obtained relatively easily from discussions with customers early
in the development process.

FP measures system functionality; it is, therefore, believed to also provide,

in association with staff effort, a general measure for development productivity


with less concern for influences of technologies, code reuse, and unexpected code

expansions. Development productivity can be measured in “function points per

work-month” or “work-hours per function point”.

The Function Point Analysis (FPA) method is considered an empirical estimation
approach: it is a sizing method, and to use it for effort prediction it is
necessary to identify a relationship between the effort required to build a

system and identifiable system features (such as external inputs, interface files,

outputs, inquiries, and logical internal tables). Counts of system features are
adjusted using weighted values and complexity factors to derive the final size of

the system.

The FPA methodology has three steps, given there exists a list of all functions
that the software should provide. Firstly, each function is classified into one of

five types: External Input (EI), External Output (EO), External Inquiry (EQ),
Internal Logical File (ILF), and External Interface File (EIF). A function is
classified as an EI when it involves user input that adds or changes data in an ILF.
A function is an EO when it generates a report or message to the user or other
applications outside the boundary of the application being measured. A function

where an input generates an immediate output with no updates of ILFs is called

an EQ. An ILF is a logical file (as distinct from physical files) or a logical group of
data in a database context, whereas an EIF is a file used to pass or share data between
applications. In the second step, each function is evaluated and assigned a

complexity level of Low, Average, or High. Finally, each function is assigned a

weight value based on its type from the first step and its complexity level from
the second step. The sum of these weight values forms the Unadjusted Function
Point (UFP) of a software. The weighted sum of all five types of functions is

adjusted with an optional Value Adjustment Factor (VAF) obtained by consid-

ering the degree of influence of 14 General System Characteristics (GSC) of the

interested system.
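The three-step counting process described above can be sketched as follows. The weight matrix is the standard IFPUG one; the function list and GSC ratings are illustrative only.

```python
# Sketch of the FPA three-step count.  Standard IFPUG weight matrix:
# each (function type, complexity) pair maps to a weight.
WEIGHTS = {
    "EI":  {"Low": 3, "Average": 4,  "High": 6},
    "EO":  {"Low": 4, "Average": 5,  "High": 7},
    "EQ":  {"Low": 3, "Average": 4,  "High": 6},
    "ILF": {"Low": 7, "Average": 10, "High": 15},
    "EIF": {"Low": 5, "Average": 7,  "High": 10},
}

def unadjusted_fp(functions):
    """Steps 1-3: sum the weight of every classified function (UFP)."""
    return sum(WEIGHTS[ftype][complexity] for ftype, complexity in functions)

def adjusted_fp(ufp, gsc_degrees):
    """Apply the optional Value Adjustment Factor derived from the 14 GSC,
    each rated 0-5: VAF = 0.65 + 0.01 * sum(degrees)."""
    assert len(gsc_degrees) == 14
    return ufp * (0.65 + 0.01 * sum(gsc_degrees))

# Illustrative function list for a hypothetical system.
functions = [("EI", "Low"), ("EO", "Average"), ("EQ", "Low"),
             ("ILF", "High"), ("EIF", "Average")]
ufp = unadjusted_fp(functions)       # 3 + 5 + 3 + 15 + 7 = 33
fp = adjusted_fp(ufp, [3] * 14)      # VAF = 0.65 + 0.42 = 1.07
print(ufp, round(fp, 2))             # 33 35.31
```

Given recorded staff effort, productivity then follows directly: delivering these 35.31 FP in three work-months would correspond to roughly 11.8 function points per work-month.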

Among traditional software size measurements, FP has achieved wide
acceptance in sizing software products, mainly due to its applicability in the early

phases of the software development. However, FP has also been subject to some

criticisms. Abran & Robillard (1994) pointed out a scale type mismatch and

questioned the math behind the FP approach. Thus, from a theoretical point of
view, the FP may not be considered as a measure that is in conformance with

measurement theory. However, from a pragmatic viewpoint, FP has been suc-


cessfully applied in a number of application domains and is considered to be a

significant improvement over traditional software size measures (Matson et al.,

1994). As a result, despite the criticisms, the FP measure has subsequently been
improved and extended.

2.3.3 Function Point Extensions

Although FP is applicable mainly to procedural business systems, it has


formed a firm foundation for a number of extensions suitable for other types

of systems and development paradigms (Costagliola et al., 2005; Abran, 1999;


Dekkers et al., 2003; Antoniol et al., 1999; Mohagheghi et al., 2005; Karner, 1993;
Reifer, 2000).
Over the years, as software technology has evolved with the development of the web
and the Internet, many people have extended FP to adapt to other emerging

systems. For example:

• Use Case Point (UCP)

Karner (1993) proposed the UCP model inspired by FP. It measures a


system's functionality based on use cases, actors, and transactions.

• Full Function Point (FFP)

Abran (1999) extended the applicability of FP to real-time software by

introducing FFP. FFP redefines FP’s function types to capture specific

real-time software characteristics that FP fails to measure, such as a large
number of single-occurrence groups of data, or a fluctuating number of
sub-processes.

• Object-Oriented Function Point (OOFP)

At the same time, Antoniol et al. (1999) developed OOFP for sizing OO
systems. OOFP relies on object models to map FP’s function types into

OO concepts. A remarkable aspect of the OOFP approach is its flexibility


which allows practitioners to experiment with several procedures of OOFP
measurement in order to find the best suited practice for their organization.

• Web Object (WO)

Reifer (2000) extended FP to WO for sizing Web projects, by adding four


new web-specific components: multimedia files, web building blocks, scripts,

and links. The WO measure has been successfully used in an adaptation of


the COCOMO II estimation model called WebMo for estimating the effort
and schedule of web-based development (Boehm et al., 2000).

• Web Point (WP)

Cleary (2000) proposed WP for sizing internet applications. In analogy to


the FP analysis, the WP approach classifies pages in a web site and assigns

weights based on their complexity, where the number of links and words in a

web page determine its complexity. The WP measure focuses on static web

sites and therefore does not consider behavioural and navigational proper-
ties of web applications.

• Internet Point (IP)

Another adaptation of FP for web-based systems is the IP developed at the

Cost Xpert Group, Inc. (Group, 2002). The IP method replaces five types
of constituents of the FP model with seven new types, namely external

interface files, logical internal tables, messages/external queries, reports,


static screens and dynamic screens for measuring the size of web-based

systems. The IP counting process has been automated in a tool called Cost

Xpert that can estimate the equivalent size of a web-based system in LOC
as well as the effort and schedule of its development.

• Class Point

Costagliola et al. (2005) proposed Class Point (CP1 and CP2
for initial size estimation at the beginning of the development process and
further detailed estimation when more data are available later in the de-

velopment process, respectively). Class Point does not apply one-to-one


mappings from FP’s function types to OO concepts like other extensions,
but rather focuses on classes as the basic units. However, Class Point inherits

the three-step approach from Function Point: (1) Classify classes into four
types (Human Interaction, Problem Domain, Data Management, and Task

Management); (2) Evaluate complexity level for each individual class (com-
plexity levels: Low, Average, or High); and finally (3) Assign a complexity

weight for each class based on the previous two steps. The weighted sum

of all four types of classes is adjusted with a VAF obtained by considering

the degree of influence of 18 GSC of the system under assessment. The


Class Point measure has been used successfully in a least-square regression
model (Costagliola et al., 2005), a neural network approach (Kanmani et al.,

2007) and a fuzzy subtractive clustering technique (Kanmani et al., 2008)

for estimating the effort of OO development.

• Object Point (OP)

Despite its name, the Object Point (OP) (Banker et al., 1991) is another

generalised extension to FP which is not tied to OO systems. The OP


counting is very similar to the FP analysis but objects are counted instead.

However, such objects are not directly related to objects in the OO paradigm

but rather refer to screens, reports and third-generation language modules


in software applications. The OP measure has been successfully used in the

COCOMO II cost model for estimating the effort of software development


(Boehm et al., 1995).

• Mark II Function Point (MKIIFP)

In addition to the above specialised extensions, there are other extensions


to FP. Symons (1991) proposed the Mark II Function Point (MKIIFP)
measure as an enhancement to Albrecht's original FP approach. The measure


replaces the five types of constituents of the original approach with logical
transactions and extends the standard set of GSC from 14 to 19 plus any

client defined characteristics (UKSMA, 1998).

• COSMIC Full Function Point (COSMIC FFP)

The Common Software Measurement International Consortium (COSMIC)

proposed another extension to FP called COSMIC Full Function Point

(COSMIC FFP) (Abran, 1999). The COSMIC FFP measure has been
formulated as a refinement of FFP, MKIIFP and the FP models in order

to work equally with data-rich business systems and control-rich real-time

systems. However, the method does not explicitly claim to measure the
size of functionality that includes complex mathematical algorithms. In

contrast to FP, the COSMIC FFP measure does not take the effect of tech-
nical and quality requirements of the system into consideration by claiming

adjustment factors are no longer meaningful (Symons & Symons, 2001).

Conclusion:

SLOC, FP and its extensions have been widely used to measure the size of different
types of systems and development paradigms. However, their applicability is
limited to software functionality development. The main purpose of migrating

a system to the Cloud is not to develop new functionalities, but to reuse the
existing ones while, at the same time, benefiting from the best performance of
Cloud offerings. In light of this stance, none of the existing metrics are suitable

for estimating size and effort of a migration project to the Cloud.

We therefore wish to apply the FP approach to develop a similar size metric for
the Cloud migration context. Although FP is commonly known as a software size

measurement, it is not purely a size metric. The way FP was counted incorporates

both size and complexity concepts. The size metric for Cloud migration projects

that is based on the FP approach will be similar to FP in the sense that they are
both size-complexity hybrid metrics. However, throughout this thesis, this metric
will still be referred to as a size metric to ensure the consistency of terminology.

2.4 Summary

Cost and benefit analysis is an important tool for IT managers to evaluate whether

the benefits outweigh the costs of an IT investment. The determination of cost is


usually the first step to achieve this goal and is often a challenging task for many

project managers, since both overestimating and underestimating would result

in unfavourable impacts to the business’s competitiveness and project resource


planning.

Software costs include tangible costs (hardware and software costs), admin-
istrative costs, and development costs. Most of the time, the dominant cost

is the cost of development staff and managers (Sommerville, 2006). The con-
text of Cloud migration requires a different perspective to understand its effort
costs, given that limited experience is available in the published papers. Amongst

various effort estimation approaches from traditional software engineering, algo-


rithmic approach appears to be the most feasible approach at this stage to adapt
to the context of Cloud migration.

Size measurement is the dominant factor in algorithmic effort estimation.


Different size measurement metrics have been developed and applied successfully

in traditional software engineering. Many of these metrics are not able to ad-
equately capture the unique and different characteristics of a Cloud migration

project. Effort estimation and size measurement of migration to the Cloud are dif-

ferent from those of traditional software development in the sense that the latter

focus on components to be developed, either functions or classes, whereas the for-


mer are more concerned about migration activities, such as code modification for
migrating to PaaS Clouds, or software installation for migrating to IaaS Clouds.

Traditional size metrics were developed for functional development or maintenance
tasks and hence mainly focus on code changes (added/removed/modified).


Cloud migration tasks, on the other hand, not only focus on code changes, but

also on other processes such as network configuration and database modification


tasks, which the measures employed as predictors by traditional size metrics fail

to cover.

As a result, we are strongly motivated to:

• Propose a taxonomy of migration tasks to the Cloud, since the literature


shows that there has not been any guidance or standard on this, while a
migration guideline is essential at this stage.

• Develop a new size measurement for Cloud migration, which can serve
as a predictor for migration effort estimation purposes. We aim to cap-
ture the size of the migration process, rather than the size of the migrated
system; hence, none of the existing metrics are applicable.

The taxonomy will be presented in Chapter 4, and the new sizing metric will

be introduced in Chapter 5.

Chapter 3

Research Methodology

“If you can’t describe what you are doing as a process, you don’t know
what you’re doing.”

∼ W. Edwards Deming.

The literature review in Chapter 2 has shown that there is no related work

on the topic of Cloud migration effort. We, therefore, seek to gain insight into the
Cloud migration tasks and understand their cost implications by carrying out

migration experiments from a Cloud consumer perspective, and consequently,

confirm our findings with projects from external organizations.

This research is a hybrid of qualitative and quantitative research, and it fol-


lows the concurrent procedure strategy as discussed by Creswell (2002). Following

the concurrent procedure strategy, we collect both forms of qualitative and quan-
titative data at the same time during the study and then integrate and analyze

them to achieve the overall results. In particular, the process of this research

can be described in three steps, which are mapped with steps in this thesis, as in

Figure 3.1.

Figure 3.1: Steps of the Research Process and Thesis

The sub-sections in this chapter elaborate the steps of this research process,
as follows: Section 3.1 describes Step 1 in the research process - the experiment
set up for the purpose of exploring possible migration tasks in a Cloud migra-

tion project. Section 3.2 illustrates Step 2 - the discussion protocol with Cloud
engineers from our group to confirm our findings on Cloud migration tasks, and

to develop the CMP metric. Section 3.3 discusses Step 3 - the survey protocol

to obtain more data on Cloud migration projects from external organizations in

order to test the generalizability of this research.

3.1 Cloud Migration Experiments

This is the first step in the research process. I carried out different types of
migration experiments to understand the actual migration activities. The purpose

of the migration experiments is to explore possible migration tasks involved in a

migration project, as well as to understand the cost implication of each task.

3.1.1 Experiment Setup

The experiments should satisfy the following criteria:

• The migration experiments are set up for PaaS and IaaS Clouds only (SaaS
Clouds are ignored as discussed in Section 1.4). PaaS Cloud candidates
can be Windows Azure and SQL Azure, and IaaS Cloud candidates can be

Amazon EC2, Amazon RDS and SimpleDB.

• The applications to be migrated should represent different application types

that are typically used by enterprises.

• The applications should be N-tier applications, with a proper database.

• The applications could be developed by different developers, but all
documentation should be available.

• The applications in the Cloud after the migration process should work prop-

erly, in terms of functionality and performance.

• The same application can be migrated to different Clouds using different
migration strategies.

.Net PetShop (Leake, 2006) is an application designed to show best practices

for building an enterprise, N-tier .Net 2.0 application. It serves to highlight the
key technologies and architecture to build scalable enterprise Web applications.

Its Java version, called Java PetStore, is also well-known for its use as an
illustration of how the Java EE 5 platform can be used to develop an AJAX-enabled


Web 2.0 application. For these reasons, both versions of PetStore have been used

in various research studies (Li et al., 2004; Singh et al., 2002; Yuan et al., 2003)
and we believe the PetShop application represents a broad class of application

types that are typically found in an enterprise organisation, and that is also a

prime candidate application type for running in the Cloud.

Our experiment was to migrate the PetShop application from a local server
to the Cloud. Windows Azure and SQL Azure were selected as the PaaS Cloud
platform for migration since they provide the most similar environment for Pet-

Shop .Net as in the local server. Therefore, it was expected that minimal effort
would be required for migration activities.

The migration of Java PetStore into Amazon EC2 and SimpleDB was also
investigated to add more richness to our findings. Amazon EC2 is an IaaS Cloud,

and SimpleDB is a NoSQL database with limited support for the full SQL statements
required in the PetStore application; therefore, different migration strategies and
more re-engineering efforts were expected.

3.1.2 Data Collection Strategy

All migration tasks should be recorded, together with the time required to com-
plete each task. Each migration task can be divided into multiple tasks with finer

granularity, or grouped with other tasks to form a more general task. This is to
ensure a uniform level of granularity across all tasks.

The migration tasks should be categorized into different groups, such as
installation tasks or code modification tasks, depending on the nature of each task.

The overhead cost of the migration tasks can be derived by comparing the time
spent on each migration task category with the development time of the applica-

tion. The application was not developed by us; hence, the development time can
be estimated using an effort estimation approach in the literature (either analogy,

expert judgement, or algorithmic models (Shepperd & Schofield, 1997; Jorgensen,

2004; Finnie et al., 1997)).
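The overhead comparison described above reduces to a simple ratio, as sketched below; the task categories echo those mentioned earlier, but all hour figures are purely illustrative placeholders rather than measurements from our experiments.

```python
# Migration overhead = time spent on migration tasks relative to the
# (estimated) development time of the application.  All figures here
# are illustrative placeholders, not measured values.
migration_hours = {
    "installation and configuration": 12.0,
    "code modification": 30.0,
    "database migration": 18.0,
}
estimated_development_hours = 400.0  # e.g. from an effort-estimation model

for category, hours in migration_hours.items():
    share = hours / estimated_development_hours
    print(f"{category}: {hours:.0f}h ({share:.1%} of development time)")

total_migration_hours = sum(migration_hours.values())
overhead = total_migration_hours / estimated_development_hours
print(f"total migration overhead: {overhead:.1%}")
```

Reporting per-category shares in this way also highlights which groups of migration tasks dominate the overhead.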

In addition, well-known practitioners’ blogs, such as Hamilton (2011); Linthicum

(2011); Chappell (2011), were also consulted to confirm our list of migration tasks.
Although they did not discuss any specific migration project, there are blog en-

tries on the steps of a migration project and database migration concerns.

The output from this step should be a collection of categorized migration

tasks, gathered from all migration experiments, together with the associated time
spent on each task. This contributes to the taxonomy of migration tasks, and
forms the basic elements of the CMP model to measure the size of a migration
process.

3.2 Discussion with Cloud Engineers

The migration experiments in Step 1 enabled us to form a taxonomy of cate-

gorized migration tasks and the structure of the CMP model. In this Step 2, we
conducted interviews with our group members at NICTA¹ to confirm that the
migration tasks and migration categories in the taxonomy are reasonable, and to
seek their expert opinion on the parameters of the CMP model.

¹ NICTA (National ICT Australia Ltd) is Australia's Information and Communications
Technology Research Centre of Excellence. Since NICTA was founded in 2002, it has
created five new companies, developed a substantial technology and intellectual property
portfolio, and continues to supply new talent to the ICT industry through a
NICTA-supported PhD program. NICTA has five laboratories around the country. With
over 700 people, NICTA is the largest organisation in Australia dedicated to ICT research.

3.2.1 Participants

The discussion was carried out individually with six participants from our group. The participants included:

• Two senior researchers, with 10 years’ experience in software development and 3 years’ experience with Cloud computing

• Two research engineers, with 5 years’ experience in software development and 2 years’ experience with Cloud computing

• Two Ph.D. research students, in the middle and final stages of their Ph.D. studies, whose Ph.D. topics relate to Cloud computing performance.

All participants have good knowledge of Cloud computing. They have a good understanding of state-of-the-art Cloud offerings and technologies, and much hands-on experience with Cloud offerings. As part of their research, the participants have already migrated different types of applications (e.g., benchmarking systems, different types of databases) to different types of Cloud (including Amazon EC2, Amazon RDS, S3, SimpleDB, Windows Azure, SQL Azure, Google App Engine, MongoDB, Rackspace), although these were small and medium projects. In addition to general migration activities, they have also explored other vital aspects of Cloud computing, such as elasticity and database consistency. With their exposure to the Cloud computing environment, they are reliable and valuable participants for our discussion.

3.2.2 Discussion Protocols

We asked each participant similar questions in three steps:

• Firstly, each participant was asked for their opinion on the taxonomy of migration tasks. They could suggest adding tasks, removing tasks, or re-categorizing a task.

• Secondly, the structure of the CMP model was presented to the participants, and they were asked to nominate the numeric value that they considered most suitable for each parameter of the CMP model.

• Thirdly, each participant was asked to describe a Cloud migration project in which they had participated, together with the time spent on each migration task in that project.

The discussion was completed with each participant individually, without knowledge of the other participants’ answers in the first round. A second round of discussion was then conducted with each participant, this time with knowledge of the other participants’ replies, to narrow the range of answers. This is the Delphi technique for combining expert opinions into a better judgement (Shepperd & Schofield, 1997).


3.2.3 Data Collection and Analysis

Participants’ answers were then carefully analyzed. Changes to the taxonomy suggested by most participants were made accordingly. The remaining suggestions, each made by a single participant, were also run past the other participants to seek a consensus.

The value for each parameter of the CMP model was determined by averaging

all expert opinion values for that parameter. This set of values forms the initial
set of parameters for the CMP model, as presented in Chapter 5.
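The two-round aggregation described above can be sketched as a short calculation. This is only an illustration: the parameter values below are hypothetical sample weights, not the actual CMP parameters adopted in Chapter 5.

```python
# Hedged sketch of Delphi-style aggregation of expert estimates for one
# CMP model parameter. The estimate values are hypothetical.
def delphi_round(estimates):
    """Average the experts' estimates for one parameter."""
    return sum(estimates) / len(estimates)

# Round 1: each of the 6 participants answers independently.
round1 = [3, 5, 4, 6, 4, 5]

# Round 2: participants revise their answers after seeing the others',
# which typically narrows the spread of the estimates.
round2 = [4, 5, 4, 5, 4, 5]

initial_value = delphi_round(round2)  # value adopted for the CMP model
print(initial_value)                  # 4.5
```

The same averaging is applied per parameter, and the resulting set of values seeds the CMP model.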

The six migration projects and the associated effort described by each participant were used to validate the CMP model (Chapter 6).

3.3 Survey Protocol

Data on the migration effort and migration tasks of past Cloud migration projects are vital elements of a validation process. Unlike data on the development effort of traditional software development projects, the data of interest do not exist in any public repository. This is anticipated, since Cloud computing is relatively immature and there is no related work on migration effort to the Cloud. This yields both advantages and challenges for our work at this stage. While we enjoy the flexibility to explore different aspects of the Cloud migration topic, we are challenged to collect real data ourselves for validation purposes.

In this section, we describe our process of conducting a survey1 to collect data on past migration projects to the Cloud. The purpose of this step is to collect data to validate our taxonomy of migration tasks and the CMP model with external data points.

1 This survey was conducted with the assistance of an IT Master’s student, Tingting Yao, from The University of Sydney. Tingting assisted me in identifying potential participants and distributing the questionnaire. The task also contributed to the final project of her Master’s degree.

3.3.1 Objectives

The objective of the survey was to collect data on past migration projects to the Cloud for determining migration cost factors, including size, and for examining their relationships with the effort required for migration. Many organisations have been migrating their systems to the Cloud; however, no detailed documentation on migration tasks and effort is publicly available. Therefore, the survey targets practitioners who have experience with migrating a legacy system to the Cloud, to gather information on their migrated systems, migration tasks, and the amount of effort spent (in person-hours), in order to obtain sufficient information for the empirical validation of the CMP model. We are also interested in how practitioners evaluate the effect of external cost factors on their migration projects.

This survey addresses the following research questions:

• RQ1: What migration tasks were carried out?

– RQ1.1: Was database migration carried out?

– RQ1.2: Were any installation and configuration tasks done?

– RQ1.3: Was any code modification required?

– RQ1.4: Were network connections changed?

– RQ1.5: Were any other tasks done?

– RQ1.6: How were the migration tasks carried out?


∗ RQ1.6.1: What type of database migration was done? e.g., relational to NoSQL, or relational to relational (same or different type of relational database? same or different version?)

∗ RQ1.6.2: How many queries required modification?

∗ RQ1.6.3: How much data was migrated?

∗ RQ1.6.4: How many packages were installed from source code and from binary files?

∗ RQ1.6.5: How many configurations were done for each package?

∗ RQ1.6.6: For each network connection, what type is it (LAN or WAN)? And what tasks were done: adding security or optimizing the protocol?

∗ RQ1.6.7: For each modified class, what type is it (Human Interaction Type, Problem Domain Type, Data Management Type, or Task Management Type)? How many attributes, methods, and service calls were changed?

• RQ2: Is the CMP size metric a significant indicator of migration effort to the Cloud?

– RQ2.1: How many person-hours were required for a migration project to the Cloud?

∗ RQ2.1.1: How many person-hours were spent on each migration task?

• RQ3: What external cost factors influence migration effort?

– RQ3.1: How does the development team’s expertise affect migration effort?

– RQ3.2: How does the development team’s experience in software engineering in general affect migration effort?

– RQ3.3: How does the development team’s experience with the Cloud affect migration effort?

– RQ3.4: How does the design quality of migration tasks affect migration effort?

– RQ3.5: How does the choice of Cloud provider affect migration effort?

– RQ3.6: Do any other factors affect migration effort?

3.3.2 Survey Design

This survey is a cross-sectional survey, in which information is gathered on the population at the current state of Cloud computing (Creswell, 2002). Data were collected mainly via a web survey, supplemented by some interviews. We could not conduct in-person interviews with many practitioners because of geographical constraints; hence, the web survey was our main source of data collection.

The studied population included a project team from NICTA and a list of individual practitioners who have migrated their systems to the Cloud. The team from NICTA is different from our group; it migrated its system to the Cloud to take advantage of Cloud elasticity for its project. The practitioners were identified from the Cloud community and online discussions, such as authors of scientific papers on the Cloud and participants in Cloud events (e.g., CloudCamp). Interviews were conducted with the NICTA project team to gain more insight and more detailed data, and surveys were sent to the list of identified practitioners. The study was conducted on this identified population.

A questionnaire was prepared to address the research questions. I prepared the questionnaire to cover all CMP aspects that require information for validation, and also to gain further insight into how the respondents conducted their migration to the Cloud. The questionnaire was piloted with the 6 Cloud engineers from our group (introduced in Section 3.2). In the discussion described in Section 3.2, prior to this survey, each participant was asked to describe a Cloud migration project and the time spent on each migration task of the project. The questionnaire essentially asked for the same information. Answers from the discussion and the questionnaire were then analysed and compared. I found that the participants could correctly interpret the questions, and the answers were almost the same for both the discussion and the questionnaire. The biggest issue with the questionnaire was that participants were confused by questions that were not relevant to their migration tasks. For example, participants who only migrated their database to the Cloud were lost among the questions about code modification, because they did not modify any of their code. To address this issue, we needed to create different branches of the survey, so that respondents would only be asked questions relevant to their migration tasks.

We evaluated different survey software and found that LimeSurvey1 best suited our needs because of its features and pricing scheme. Surveys can be created with different layers and branches. Incomplete survey responses can be saved for later viewing and updating. The question types available in LimeSurvey were sufficient for our needs. Also, we were charged based on the number of responses rather than for a timeframe, as with other online survey software. This pricing scheme suited our needs because we did not expect to receive thousands of responses weekly or monthly.

1 http://limesurvey.org


Our survey was created with LimeSurvey. A link to the web survey was sent via email to the list of practitioners, and responses were recorded by the web survey once respondents finished. To ensure an adequate response rate, a follow-up email was sent after two weeks.

Table 3.1 shows the mapping between the research questions and the questions from the questionnaire (Appendix A).

Research Questions Questions from questionnaire


RQ1 GQ1, GQ2, GQ3
RQ1.1 DB1
RQ1.2 IC1
RQ1.3 CM1
RQ1.4 NC1, NC2
RQ1.5 DB8, DB10, IC5, NC6, CM8
RQ2.1.1 DB5, DB7, DB9, IC4, NC5, DB10, IC5, NC6, CM7, CM8
RQ1.6.1 DB2, DB3
RQ1.6.2 DB4
RQ1.6.3 DB6
RQ1.6.4 IC2
RQ1.6.5 IC3
RQ1.6.6 NC3, NC4
RQ1.6.7 CM2, CM3, CM4, CM5, CM6
RQ3.1 CF2, CF3
RQ3.2 CF3
RQ3.3 CF1, CF3
RQ3.4 CF3
RQ3.5 CF3
RQ3.6 CF4

Table 3.1: Mapping between research questions and questionnaire
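For traceability, the mapping in Table 3.1 can also be kept in machine-readable form and checked mechanically. The sketch below encodes the table as a dictionary (the identifiers are exactly those of Table 3.1; only the idea of checking coverage this way is ours):

```python
# The RQ-to-questionnaire mapping of Table 3.1, encoded as a dictionary so
# that the coverage of each research question can be checked mechanically.
rq_to_questions = {
    "RQ1":     ["GQ1", "GQ2", "GQ3"],
    "RQ1.1":   ["DB1"],
    "RQ1.2":   ["IC1"],
    "RQ1.3":   ["CM1"],
    "RQ1.4":   ["NC1", "NC2"],
    "RQ1.5":   ["DB8", "DB10", "IC5", "NC6", "CM8"],
    "RQ2.1.1": ["DB5", "DB7", "DB9", "IC4", "NC5", "DB10", "IC5", "NC6", "CM7", "CM8"],
    "RQ1.6.1": ["DB2", "DB3"],
    "RQ1.6.2": ["DB4"],
    "RQ1.6.3": ["DB6"],
    "RQ1.6.4": ["IC2"],
    "RQ1.6.5": ["IC3"],
    "RQ1.6.6": ["NC3", "NC4"],
    "RQ1.6.7": ["CM2", "CM3", "CM4", "CM5", "CM6"],
    "RQ3.1":   ["CF2", "CF3"],
    "RQ3.2":   ["CF3"],
    "RQ3.3":   ["CF1", "CF3"],
    "RQ3.4":   ["CF3"],
    "RQ3.5":   ["CF3"],
    "RQ3.6":   ["CF4"],
}

# Every listed research question is covered by at least one questionnaire question.
assert all(qs for qs in rq_to_questions.values())
print(len(rq_to_questions))   # 20 mapped research questions
```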

3.3.3 Data Collection

The data collection process took place over three months. First, we sent out 350 invitation emails to different target audiences, including academic researchers, industrial groups and companies, and individual practitioners. We did not receive replies from all recipients, but we received some very positive replies indicating strong interest in participating in our survey. We then sent out 308 surveys, excluding 42 recipients who had replied to our first invitation that they were not willing to participate, or from whom we received out-of-office auto-replies or failed-delivery notices. In this second round, we received 33 responses (around a 10% response rate), but some of them were incomplete. For example, some responses did not provide enough information to calculate CMP, or lacked information on the total hours spent. The main reason for this low response rate is that most of the projects were done for exploration or tutorial purposes; hence, no detailed information was recorded, especially the information required for calculating CMP. Most respondents could easily answer general questions on why they migrated to the Cloud, or how they generally did so, but most failed to provide sufficient information at the design level of the migration tasks.

After careful analysis to eliminate unreliable and incomplete data, we obtained a total of 19 data points. These data points come from responses that provided sufficient information for CMP calculation. We discarded all responses that were described as a “wide guess” by the project teams. 17 of the 19 data points are small projects with around 100 hours or less in total. Again, this is because we targeted some individual practitioners, and their survey responses were all for example migration projects. We tried to target large groups with larger-scope migration projects, and we could get only 2 corresponding responses.
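The response funnel described above can be summarized numerically (a sketch using only the counts stated in this section):

```python
# Hedged sketch: survey response funnel from the figures reported above.
invited = 350                 # initial invitation emails sent
surveyed = invited - 42       # 308 surveys sent after excluding refusals,
                              # out-of-office replies, and failed deliveries
responses = 33                # responses received in the second round
usable = 19                   # data points left after discarding unreliable data

response_rate = responses / surveyed
usable_rate = usable / responses

print(surveyed)                       # 308
print(round(100 * response_rate, 1))  # 10.7 (the "around 10%" above)
print(round(100 * usable_rate, 1))    # 57.6
```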

The final dataset and the validation process are reported in Chapter 6.


3.4 Summary

In this chapter, we have described the process of undertaking this research. The research requires a mixed-methods approach, combining qualitative and quantitative methods. Data of both forms were collected concurrently for the purposes of exploring Cloud migration tasks, building the taxonomy of migration tasks, developing the CMP model for sizing migration projects, and validating them.

Chapter 4

Taxonomy of Migration Tasks to the Cloud

“Our experience shows that not everything that is observable and measurable is predictable, no matter how complete our past observations may have been.”

∼ Sir William McCrea.

The focus of a Cloud migration project, as discussed in the previous chapter, is both the migration activities involved (i.e., the process) and the system to be migrated (i.e., the product). Successfully migrating a system to the Cloud requires an appropriate and sufficient set of migration tasks to be carried out. Examining a migration task involves examining the related parts of the system to be migrated. Cloud migration tasks are hence defined as the primitive units of our study.

For a better understanding of a Cloud migration project, in this chapter we report on several Cloud migration experiences and capture our understanding of how migration projects are carried out, in the form of a list of potential migration tasks that might be involved in a Cloud migration project. We call this a taxonomy of Cloud migration tasks. A taxonomy, as stated by Mens & Gorp (2006), is defined as:

“A system for naming and organizing things [. . . ] into groups which share similar qualities.” (Cambridge Dictionary Online)

In this chapter, we present the process by which migration tasks are extracted and categorized into different groups to form the taxonomy. Identifying the taxonomy is both necessary and challenging. It is necessary because it enables us to capture various critical aspects of the cost implications of a Cloud migration project; a taxonomy of migration tasks to the Cloud might also help to get new migration projects started. On the other hand, it is challenging because migration projects vary along multiple dimensions, such as the specification of the migrated systems (e.g., programming language, system architecture), the Cloud offerings (e.g., IaaS or PaaS, relational databases or NoSQL), or the requirements of the migration projects (e.g., security, network throughput, parallelism).

This chapter is organized as follows. Section 4.1 describes how taxonomies are usually derived in other contexts. Section 4.2 presents our approach to deriving the taxonomy of Cloud migration tasks. In Section 4.2.1 we report on our migration experiences, with the breakdown of costs (in terms of effort) among categories of tasks, for a case study that migrated a .NET n-tier application to run on Windows Azure; this results in a list of important influential factors that impact the cost of various migration tasks in Section 4.3. The taxonomy of Cloud migration tasks is then described in Section 4.4. Section 4.5 validates the proposed taxonomy on one industrial migration project conducted by our group, and also shows how the taxonomy can be applied in real Cloud migration projects. Section 4.6 reflects on our approach and on other experiences. We conclude the chapter with a summary in Section 4.7.

4.1 Taxonomy in other contexts

A taxonomy is a way to precisely categorize things into pre-defined groups, increasing understanding of the topic of interest while avoiding confusion in terminology. In this section, we review how taxonomies are developed in other contexts, in order to apply the methodology to our Cloud migration context in the next section.

Mens & Gorp (2006) proposed a taxonomy of model transformation, which classifies existing model transformation approaches along multiple dimensions, based on selected concrete criteria. The purpose of the taxonomy is to assist developers in deciding which approaches, tools, and techniques best fit their needs. The taxonomy was derived from the discussions of a working group on Language Engineering for Model-Driven Software Development about the important characteristics of model transformations. Essentially, the taxonomy is a classification of model transformation approaches, and their tools and techniques, on the basis of a group discussion.

Similarly in terms of methodology, the taxonomy proposed by Padioleau et al. (2009) was also obtained from a pool of existing sources. It is a taxonomy of the comments in programmers’ code, intended to reveal their needs, such as new development tools or a language extension. The authors analyzed 1050 comments randomly collected from three open-source operating systems: Linux, FreeBSD, and OpenSolaris. The comments were categorized along different aspects, based on four basic questions: “what is in the comments? whom are the comments written for, and by whom are they written? where are the comments? and when were the comments written?”.

The taxonomy of software connectors by Mehta et al. (2000) was formed from a classification of three atomic elements of software interactions. It was proposed to increase the level of understanding of the fundamental building blocks of software interactions, and of how they combine to create more complex blocks. This work is the only one of the three that showed the “taxonomy in action”, i.e., how the taxonomy is applied to the architecture of an existing system. In other words, this is a form of validation of the taxonomy.

Generally, a taxonomy is obtained from existing unorganized resources, which are then systematically classified according to concrete criteria. The taxonomy can subsequently be validated by demonstrating its usefulness on another system.

4.2 Experiment Setup

For our Cloud migration context, there is no existing pool of migration tasks ready for the classification stage. As a result, we had to create a list of Cloud migration tasks ourselves by conducting migration projects. We carried out an experiment, presented as a case study, for the purpose of understanding the actual migration activities to PaaS and IaaS Clouds (SaaS Clouds are excluded, as discussed in Section 1.4). We report here on our experiences in carrying out this technical migration.

The applications used in our experiments are .Net PetShop (Leake, 2006) and its Java counterpart, Java PetStore, as discussed in Chapter 3. The PetShop application was migrated from the local server to Windows Azure and SQL Azure, and Java PetStore was migrated to Amazon EC2 and SimpleDB. Different migration strategies and effort were required (as reported in Section 4.2.1).

In order to calculate the migration effort as an overhead cost over the original development effort, we needed a figure for the initial development effort. This development effort can be obtained in a conventional manner with Function Point, given that all the required information about the PetShop .Net application is available.

Function Point Analysis (Albrecht & Gaffney, 1983) was applied to the fully functional PetShop application to estimate its size complexity, which can then be used to estimate its development cost. We used this estimated development cost and the migration cost recorded in our PetShop experiment to calculate the overhead cost of migration over development.

Based on the Function Point reference cards provided by IFPUG (2010), PetShop is calculated to have:

• 28 Internal Logical Files (ILF s),

• 28 External Inputs (EIs),

• 32 External Outputs (EOs),

• 36 External Inquiries (EQs),

• and no External Interface Files (EIF s),


and a total of 118 Adjusted Function Points (AFPs). Using settings and resources similar to those of the migration activities, we isolated one feature of PetShop, counted as 3 AFP, and re-developed it. It took us around 4 to 5 hours to completely develop this feature. Hence, we assumed 1.5 hours on average for developing 1 AFP of PetShop. Therefore, the effort for developing the PetShop application, with 118 AFPs, is estimated to be around 177 hours. If PetShop were developed from scratch, as in a development project for the Cloud (as distinct from a migration project to the Cloud), it would be expected to take a roughly similar amount of effort (177 hours) to deliver the same functionality (118 AFPs).

The effort in hours spent on each migration task in our experiment was recorded for later analysis. It is presented in the following section (Section 4.2.1), together with observations made during the study.
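The arithmetic behind the 177-hour estimate can be sketched as a small calculation. This is only an illustration of the derivation above; the IFPUG complexity weights and the value adjustment factor are already folded into the reported AFP total.

```python
# Hedged sketch of the development-effort estimate derived from the
# Function Point counts above. The AFP total (118) is taken from the text;
# the per-AFP rate comes from re-developing one isolated 3-AFP feature.
feature_afp = 3
feature_hours = 4.5                           # midpoint of the observed 4-5 hours
hours_per_afp = feature_hours / feature_afp   # 1.5 hours per AFP

petshop_afp = 118                             # Adjusted Function Points for PetShop
estimated_dev_hours = petshop_afp * hours_per_afp

print(hours_per_afp)         # 1.5
print(estimated_dev_hours)   # 177.0
```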

4.2.1 Measured Data and Observations

This section reports the observations made in the experiments described in the previous section. The observations and experiences from our study provide a basis for the taxonomy of Cloud migration tasks in Section 4.4.

When migrating PetShop to Windows Azure and the SQL Azure database, the following migration issues were observed:

• We used the existing application PetShop, which was not developed by us; hence, effort was required to learn and understand PetShop and to get it working on a local machine first.

• PetShop was developed on an older platform than the current version supported by Windows Azure. This is expected to happen with many other existing applications, since Cloud computing has only recently emerged and is equipped with the latest technologies and tools, which may yield incompatibility issues. In particular, deploying applications to Windows Azure requires Windows 7, while the PetShop installation files were packaged for Windows XP and could not run properly on Windows 7. We needed to deploy the PetShop source code onto Windows 7 manually.

• The same issue applied to the PetShop database. There are existing tools offering database and data transfer from local servers to SQL Azure; however, they require SQL Server 2008 to be installed, while PetShop was designed to work with SQL Server 2005 and cannot be installed directly on SQL Server 2008. We had to manually retrieve and run the database script on SQL Server 2008.

• In order to deploy applications to the Windows Azure Cloud, it was important to create a package file and a configuration file from the existing source code. The Azure plugin for Visual Studio provides a quite straightforward method of achieving this; however, the method works with “Web Application” projects only, while PetShop was created as a “WebSite” project, which has no project file and relies on ASP.NET dynamic compilation to compile the pages and classes in the application. Effort was therefore also spent on converting the WebSite project to a Web Application project. Alternatively, the cspack utility provided by Azure can be used to create the package file.

The effort spent on addressing these issues was recorded in terms of duration, and is summarized in Tables 4.1 and 4.2.

Tasks                                                          Effort (hours)
Install SQL Server 2005 and set up the local environment          5.5
in order to run the PetShop installation file
Get PetShop up and running properly                               3.5
Install SQL Server 2008 to get PetShop running with               2
later technology
Migrate the database from SQL Server 2005 to SQL Server           5
2008 and modify PetShop to work properly with SQL
Server 2008
Install .Net 4 and modify PetShop to work on Windows 7            1.5
and .Net 4
Test PetShop                                                      5
Total                                                            22.5

Table 4.1: Recorded overhead effort of preparing PetShop for migration

PetShop was originally designed to work with Windows XP, .Net Framework 2, and SQL Server 2005. To enable PetShop to run properly for the first time, these prerequisites needed to be installed. The data in Table 4.1 show that most of the time in this activity was spent on setting up the environment to allow PetShop to run. The data in Table 4.2 show that most of the time spent on migration to the Cloud went into overcoming the learning curve. No new features were introduced, and Windows Azure provides a platform similar to the one on which PetShop was developed; therefore, only minimal code modification was required.

In our experiment, learning about the application and the Cloud environment, as well as installation and configuration, contributed most to the overhead cost. The experience required to deal with unforeseen issues also accounted for major additional cost. Once the learning phase is finished, migrating similar types of applications will require less effort. Figure 4.1 shows the overhead cost for each category of migration task for PetShop, which has a complexity of 118 AFPs. The overhead cost is calculated as the percentage of additional effort over the estimated application development effort (177 hours in total).

Tasks                                                          Effort (hours)
Windows Azure tutorials                                           6
Create an Azure account and set up firewall rules                 1.5
Install and explore the MS Azure Training Kit                     5
Tutorials: migrating databases to SQL Azure                       4
Migrate the PetShop database to SQL Azure                         2
Modify PetShop to work with SQL Azure                             4
Test PetShop on local servers against SQL Azure                   2
Modify and package PetShop for Windows Azure                      5.5
Deploy PetShop to Windows Azure                                   1.5
Test PetShop in Windows Azure with SQL Azure                      5
Total                                                            36.5

Table 4.2: Recorded overhead effort of putting PetShop onto the Cloud platform

Figure 4.1: Migration Overhead Cost
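Using the totals from Tables 4.1 and 4.2 and the 177-hour development estimate, the overall overhead percentages behind Figure 4.1 can be reproduced with a short calculation (a sketch; the finer per-category split shown in the figure is not reproduced here):

```python
# Hedged sketch: migration overhead as a percentage of the estimated
# development effort. Totals come from Tables 4.1 and 4.2; the 177-hour
# baseline is 118 AFPs x 1.5 hours per AFP.
dev_effort_hours = 177.0

preparation_hours = 22.5      # Table 4.1: preparing PetShop for migration
cloud_migration_hours = 36.5  # Table 4.2: putting PetShop onto the Cloud platform

def overhead_pct(extra_hours, baseline=dev_effort_hours):
    """Additional effort expressed as a percentage of the development baseline."""
    return 100.0 * extra_hours / baseline

print(round(overhead_pct(preparation_hours), 1))       # 12.7
print(round(overhead_pct(cloud_migration_hours), 1))   # 20.6
print(round(overhead_pct(preparation_hours + cloud_migration_hours), 1))  # 33.3
```

So the full migration (59 hours of recorded effort) amounts to roughly a third of the estimated original development effort.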


The following additional issues were observed when migrating Java PetStore to Amazon EC2 and SimpleDB:

• Java PetStore was developed to work with the JavaDB database, connected via a JDBC driver. It is not straightforward to connect Java PetStore to SimpleDB instead since, at the time we carried out our experiment, there was no JDBC driver written for SimpleDB, and writing a full-featured JDBC driver for SimpleDB from scratch is not feasible.

• Java PetStore uses JPA, which depends heavily on advanced features of JDBC drivers; therefore, SimpleDB could not be connected directly to Java PetStore.

• There exists SimpleJPA, an open-source JPA implementation for SimpleDB. Effort was needed to understand this third-party library.

• SimpleDB is a NoSQL Cloud database and does not support full-featured SQL statements, such as JOIN operations, which were required by Java PetStore. Additional effort was needed to re-write these operations.

• Amazon EC2 is a type of IaaS Cloud, so additional installations were required compared with the experiment on Windows Azure.

These issues required additional effort on top of our experiment with Windows Azure, falling mainly into the categories of installation and code modification.

The measured data and observations presented above create the opportunity for further classification, and for future work on identifying migration issues and effort unique to the Cloud.


4.3 Migration Influential Cost Factors

The report on our migration experiences in Section 4.2.1 helped us identify some influential cost factors that impact the effort of the migration process. We differentiate two types of cost factors, internal and external, defined as follows:

Definition 1 Internal cost factors

Internal cost factors relate to the migrating system itself. These factors essentially refer to what migration tasks are required and how they can be achieved, and they determine the tasks’ complexity, without knowledge of who is carrying out those tasks or under which conditions. An example of an internal cost factor is “database migration”, which consists of modifying schemas and transferring data from a local database to a Cloud database.

Definition 2 External cost factors

External cost factors concern environmental factors that are specific to each organization, such as the development team’s skills and expertise, or its knowledge of Cloud platforms and offerings. External cost factors determine how fast a migration task can be completed. For example, a Cloud-experienced practitioner will usually complete a migration task faster than an inexperienced one.

These two types align well with the fundamental elements of the Function Point approach. The internal cost factors are commonly identified first, to establish what needs to be done and to measure the complexity of a project. The result reflects the characteristics of the project only, without consideration of which organization is responsible for it. The external cost factors are then localized for each organization and applied on top of the previous result to derive an estimate of the total effort required for the project.

Based on our observations from Section 4.2.1, the influential cost factors (both internal and external) are identified as follows. Some factors are similar to traditional software development cost factors (Ruhe et al., 2003a; Madachy, 1997); some are specific to migration to the Cloud.

Internal Cost Factors:

Different migration strategies involve different migration tasks. Hence, the internal cost factors, which reflect what migration tasks are needed, result from the choice of migration strategy.

• Compatibility issues: This factor is affected by the similarity between the Cloud platform and the local servers. If the similarity is high, compatibility issues can be eliminated. The effort spent on resolving these issues varies from case to case.

• Library dependency: When an application relies on a library to function on the local server, it requires a similar library on the Cloud platform. If such a library exists for the Cloud, it can be reused with some minor effort; otherwise, more effort is required to rewrite that library. For example, Java PetStore uses a JDBC driver to connect to its JavaDB database, and it also uses JPA, which depends heavily on advanced features of JDBC drivers. If we migrate PetStore’s database to SimpleDB in the Cloud, we have to implement a full-featured JDBC driver for SimpleDB; otherwise, PetStore’s data access layer must be rewritten.


• Database features: Migrating from a relational database to Amazon RDS
or Azure SQL requires less effort than migrating to a NoSQL database like SimpleDB,
because NoSQL databases do not support full relational features, such as
the Join operation. In the latter case, effort is required to implement Join
operations or to rewrite custom code so that the application does not
require Join features.

• Connection issues: In some Cloud migration cases, when only some components
of the system are migrated to the Cloud while the rest is kept in
house for various reasons (e.g. enterprises may wish to keep their sensitive
data in house), the connection between the two parts of the system - one in
house and the other in the Cloud - may face different issues, such as
security and latency.

External Cost Factors:

• Project team's capabilities: If the project team's development knowledge
and skills are sufficient, training can be picked up quickly and less
effort is required.

• Existing knowledge and experience of Cloud providers and technologies: If
the project team possesses some level of prior knowledge and experience
of Cloud services and available tools, the learning curve can be shortened
significantly, and hence less effort is required. As discussed in the previous
section, the learning curve is a one-time task, but requires significant effort.

• Selecting the correct Cloud platform and services (IaaS or PaaS): This choice
greatly affects the effort and cost required for the rest of the migration
activities; however, the selection itself is not a trivial task. If the selected
Cloud platform is highly similar to the application's environment in the local
server, less effort is required for modification.

• Application's complexity: If the application's complexity is high, more effort
is required to study and, if necessary, modify the application.

Some of these influential cost factors are specific to migration to the Cloud,
because they are not applicable to a conventional migration project from one
platform to another. For example, a migration project from Java to .Net is a
complete rewrite and would not have compatibility, database, connection, or
possibly library dependency issues. Likewise, a migration project from an old
version to a newer version of a platform or environment would not have the
networking, library dependency, or database feature issues discussed above.
These factors, one way or another, all affect the effort spent on the Cloud
migration process.

4.4 Taxonomy of Migration Tasks

The purpose of a Cloud migration project is to port an application from a local
data center to a selected Cloud platform with no changes in functionality or
compromises in performance. In our experiments, migration projects started
from getting familiar with the application and the selected Cloud platform, to
setting up the environment and preparing the application for migration, as well as
modifying and testing to ensure the application functions properly in the Cloud.

Our distinction between internal and external factors suggests that the internal
cost factors (i.e., migration tasks) will form the foundation of the taxonomy.

The list of internal cost factors introduced in Section 4.3, together with related
work from the literature and practitioners' blogs, enables us to generalise
and propose a taxonomy of migration tasks that any migration project
may encounter; the migration tasks are grouped under different categories,
as summarized in Table 4.3. If T is the taxonomy of migration tasks and t is a
migration task, then T is the set of such tasks t, and a migration project P ⊆ T
consists of a subset of migration tasks. Some tasks in the taxonomy can be skipped,
while some tasks can be further broken down to accommodate the different
requirements of each project.
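The set formulation above can be sketched directly. A minimal sketch in Python; the task names are shortened paraphrases of Table 4.3 entries, and the particular project P is purely illustrative:

```python
# The taxonomy T as a set of migration tasks; a project P is a subset of T.
T = {
    "training on existing application",
    "training on cloud platform",
    "install third-party tools",
    "modify database connection",
    "migrate local database to cloud",
    "code modification",
    "network connection tuning",
    "functional testing",
}

# An illustrative full-migration project that skips network-related tasks.
P = {
    "training on cloud platform",
    "modify database connection",
    "migrate local database to cloud",
    "functional testing",
}

assert P <= T          # P ⊆ T: every project task comes from the taxonomy
skipped = T - P        # taxonomy tasks this particular project does not need
print(len(skipped))    # 4
```

The subset check mirrors the definition: a project is valid exactly when every one of its tasks appears in the taxonomy.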

The diagram in Figure 4.2 shows the sequence in which Cloud migration tasks

from the taxonomy could be executed, and the possible iterations that may occur.

The following provides a summary of the taxonomy proposed in Table 4.3
and Figure 4.2. The last three columns in Table 4.3 indicate whether a specific
migration task is supported by examples from the discussions with Cloud engineers
in our group, from the literature, or from practitioners' blogs.

• Training or Learning Curve - In order to ensure compliance with the
Cloud, a basic understanding of the application and the selected Cloud
platform is required.

Effort is needed for analysing the application, understanding its components
and how they are coupled together, and identifying which modules remain
unchanged and which need to be modified. It is important to understand
the initial system environment, specifications and configurations
before planning any changes. Effort and costs spent on this task may not


Categories Tasks Ex. Lit. Blogs

Training on the existing application: Y


Training or Understand system environment, speci-
Learning fications and configurations
Curve Measure system’s size and Estimate sys- Y
tem development effort
Training on the selected cloud platform: Y
Understand its offerings and technolo-
gies used
Identify any compatibility issues Y
Training on third party tools: Iden- Y
tify and understand additional libraries,
tools for data migration, and any re-
quired middlewares
Installation Set up development tools and environ- Y
and ment
Configuration Install and set up environment in IaaS Y Y
Cloud
Install third-party tools Y
Modify database connection Y
Database
Modify database operation query (if us- Y Y Y
Migration
ing NoSQL Cloud database)
Prepare database for migration Y Y Y
Migrate the local database to Cloud Y Y Y
database
Code Modifi- Any required modification for compati- Y Y
cation bility issues
Examine all changes in network connec- Y
Network tions
Connection Tune appropriate parameters for perfor- Y
mance purpose
Ensure connection security Y
Test if local system works with database Y Y
Testing in Cloud
Test if system in Cloud works with Y Y
database in Cloud
Write test cases and test the function- Y Y
ality of the application in Cloud

Table 4.3: Taxonomy of migration tasks


Figure 4.2: Diagram of Cloud migration task taxonomy

be trivial for reasons such as: coding style by other developers may be
difficult to study; confidentiality issues may mean that applications are not
totally transparent; and applications with many modules interacting with
each other are difficult to isolate for migration purposes (if required). Many
applications contain requirements on security or performance that need to
be investigated thoroughly.

When porting applications to Cloud platforms, no new features are introduced
in this study. Therefore, the complexity of the application in terms
of Function Points is unchanged, whereas configuration and database connection
are more likely to be modified. Effort spent on this part is directly
proportional to the complexity of the application. The more complicated
the application is, the more time and skills are required to understand it.
In our experiment, PetShop was measured as 118 Function Points and was

estimated to cost 177 hours for development effort. Its requirements and

configurations were studied to identify which classes were more likely to


expose additional changes when migrating.
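The PetShop figures above imply a productivity rate of roughly 1.5 hours per Function Point (177 / 118). A back-of-the-envelope sketch of that conversion; the rate is derived from this single example, not a general constant:

```python
# PetShop figures from the experiment above.
function_points = 118   # measured size
estimated_hours = 177   # estimated development effort

# Implied productivity rate, in hours per Function Point.
rate = estimated_hours / function_points
print(round(rate, 2))   # 1.5
```

Multiplying a measured size by such a rate is the basic shape of the effort prediction the CMP model in the next chapter builds on.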

There are quite a few major Cloud providers in the market, offering
different services including PaaS and IaaS. Once Cloud services are evaluated
and selected, training on these services is necessary. Some Cloud services
may not fully support features provided by similar on-premise
technologies; for example, SQL Azure is the Cloud database most similar
to SQL Server, yet SQL Azure does not support distributed transactions
as SQL Server does. In our experiment, effort was spent on training with
Windows Azure using the provided Microsoft Azure Training Kit.

There have been great contributions from the Cloud community to sup-

port Cloud services that integrate seamlessly with existing technologies and
applications. Many open-source third-party libraries and tools have been
developed. Training on these libraries and tools is also a one-time task,

although it is not easy to select the appropriate libraries and tools without

knowledge about them beforehand. These tools can be categorized as: additional
libraries (e.g. simpleJPA for SimpleDB, as discussed above), tools
for data migration (e.g. tools from Codeplex for converting and uploading databases
to SQL Azure), and other utilities (e.g. Windows Azure provides the cspack
utility to package a web site project for migration to Azure). In our
experience, before becoming aware of the cspack utility, much effort was spent
on transforming a Web Site into a Web Application, which differ in
structure, so that a Web Role could be formed for migration.

If migrating applications to a specific Cloud platform happens for the first
time, this learning curve is required; otherwise, this step can be skipped. Effort
spent on this learning task depends on the existing skills, knowledge and
experience of the developers, as well as the available documentation from Cloud
providers. Although these training activities are one-time tasks, the effort
required is not negligible.

• Installation and Configuration - Different effort is required for these
tasks, depending on the type of Cloud service selected, either PaaS
or IaaS.

Development tools and environment: The application’s development tools

need to be installed to examine the application’s components and to make

any necessary code modifications.

Environment in Cloud: If the target is an IaaS Cloud, effort is required
for setting up and configuring the application's environment in the Cloud
server to match its local requirements. If the target is a PaaS Cloud,
this step requires less effort, as it is largely handled by the Cloud
provider. This activity is specific to Cloud migration, as distinct from
migrating an application from one platform to another, where there is no
such requirement to replicate the environment.

Third-party tools: Effort is required for installing third-party tools for training
purposes and for the migration tasks mentioned above.

• Database Migration - This category depends on how different the two


databases in house and in the Cloud are.

Database connection and query: The database connection string needs to be
changed to connect to the new database server; in our experiment, the connection
was modified to use SQL Azure. However, more changes are required
if using SimpleDB, a non-relational database, as discussed in Section 4.2.
SimpleDB is a NoSQL database without full support for JOIN operations,
so additional coding is required to provide the same functionality
and operations for the application; this can also be categorized as Code
Modification. Even when the two databases are of the same type but different
versions, changes may be required to syntax or schema. For example,
PetShop .Net version 4 was developed on SQL Server 2005, while SQL Azure
is only compatible with SQL Server 2008. There is no direct way to convert
the PetShop database from SQL Server 2005 to SQL Azure without converting
to SQL Server 2008 first.
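As an illustration of a connection-string change of this kind, the sketch below swaps a local SQL Server target for a SQL Azure one. The server name, credentials, and config shape are hypothetical; the exact string format is dictated by the provider and driver:

```python
# Hypothetical before/after connection strings for the migration.
local_conn = "Server=localhost;Database=PetShop;Integrated Security=True;"
azure_conn = ("Server=tcp:myserver.database.windows.net,1433;"
              "Database=PetShop;User ID=admin@myserver;"
              "Password=<secret>;Encrypt=True;")

def use_azure(config):
    """Return a copy of an app config dict pointing at the Cloud database."""
    config = dict(config)
    config["connection_string"] = azure_conn
    return config

cfg = use_azure({"connection_string": local_conn})
print("database.windows.net" in cfg["connection_string"])  # True
```

The point of the sketch is that a relational-to-relational move can be as small as this one setting, whereas a move to a NoSQL store ripples into the data access code itself.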

Prepare database for migration: SQL scripts need to be transformed appropriately
to align with the third-party tools' requirements for database migration.

Migrate the database: If the previous tasks have been properly completed, the
effort required for this task is trivial, as it is handled by the third-party
tools; otherwise, plans and actions for the previous tasks must be revised.
Nevertheless, the size of the database also affects how fast this task can
be completed: the bigger the database, the longer it takes to
migrate. Although most of this time is waiting time and may not require
extra effort, some effort may be necessary for dividing large databases
into smaller chunks for data transfer purposes.
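The chunking mentioned above can be sketched as a simple batching generator; the chunk size and row stand-ins are arbitrary illustrative values:

```python
def chunk_rows(rows, chunk_size=1000):
    """Yield successive batches of rows for piecewise data transfer."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]

rows = list(range(2500))            # stand-in for database rows
batches = list(chunk_rows(rows))
print(len(batches))                 # 3 batches: 1000 + 1000 + 500
```

Transferring in bounded batches like this also makes a failed upload restartable from the last completed chunk rather than from the beginning.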

• Code Modification - This category depends on how different the two

environments in house and in the Cloud are.

Code changes: If the selected Cloud platform provides services and technologies
similar to the application's in-house environment, not much code modification
is required. This was the case for the combination of PetShop .Net
and Windows Azure in our experiment.

Configuration changes: This involves configuration changes in both the application
and the Cloud platform. Similar to code changes, configuration changes
in our experiment were minimal, although it was necessary to package our
application together with its configuration file for Azure. In the case of migrating
to an IaaS Cloud, additional configuration effort is required, including
installation activities to create a similar environment on the Cloud platform.

Compatibility issues also require major modification and reconfiguration
effort, depending on how compatible the two environments are. Cloud
technologies are generally the latest ones, while existing applications
may have been developed a few years previously. During that gap, technologies
may have gone through many changes and updates. There may
not be a direct method to update from the old technologies to the latest
ones, meaning that more intermediate steps will be necessary. Also, Cloud
technologies may not provide full support for services and features offered
by local servers. Although SQL Azure is similar to SQL Server 2008, it does
not support distributed transactions, while SQL Server 2005 does, and
PetShop .Net utilised this feature for its transactions. Code changes are required
to accommodate this compatibility issue.

• Network Connection - This category applies only to partial migration
projects, where only a part of the system is migrated to the Cloud, while

the rest is still hosted in house. Connections amongst system components


are certainly affected, which may lead to performance issues. Connection

security may also require extra attention. For full migration projects, where
the migrating system is ported as a whole, this category can be safely

skipped.

• Testing - This step is one of the most important and essential activities. It
happens during migration to ensure each of the previous steps is completed
correctly, and a full testing process needs to be carried out after migration.

If test cases have already been created for local servers, they can be reused
on Clouds to ensure the application works properly. More test cases specific

for Clouds may need to be considered. Testing needs to be done for each
of the actions taken; however, major milestones for testing can be grouped
as follows:

If using PaaS Clouds, migrating the database to Cloud database is required


first, then testing the application in local servers with the Cloud database.

The application can then be migrated to the selected Cloud platform, which

allows testing in the Cloud environment.

If using IaaS Clouds, developers can choose to skip testing the application
in the local server against the Cloud database, depending on how the environment
is set up and configured.

If migrating only some components of the system to Cloud platforms, either
PaaS or IaaS, intensive testing needs to be performed to ensure the entire
system is integrated seamlessly and meets important requirements, such
as security levels and performance quality. The effort required for this task is
relatively large.

These categories are mutually exclusive, since they cover different aspects of
a Cloud migration project; on the other hand, they complement each other
and together provide a complete picture of migration to the Cloud. These
categorized migration tasks need to be carefully planned at the early stage of any
migration project. Some tasks may be broken down into more detailed levels,
whereas some tasks may be skipped, depending on the specific characteristics
of each project.

4.5 Validation

As discussed in Chapter 3, the discussion with Cloud engineers in our group,

and the input from the literature and practitioners’ blogs have confirmed the
validity of the taxonomy to some extent. This section attempts to validate our

proposed taxonomy using one industrial migration project to the Cloud that was
conducted by two researchers in our group. This is a consulting project with a
large Australian Financial Service Organisation (FSO) that wished to migrate a part
of its system into the Cloud without any changes to its existing application
code. Although the FSO has, for the time being, no plan to migrate its production
system to a Cloud computing platform, the main purpose of this migration
exercise is to reduce the operational cost of the development environment. The
development environment is used for only about two months annually.
Therefore, the cost of owning and maintaining the development environment is
expected to be reduced by migrating to a pay-per-use payment model. However,
since the environment is re-activated often, but only for short periods of time such as
a week, the cost and time of re-activating the development environment must be
small. Moreover, the licensing fee that the FSO currently pays for its software
is expected to be reduced by migrating to a pay-per-use payment model as well.
The steps taken in this FSO project are summarised below:

• Step 1 - Analyse the in-house system to understand its components, op-


erations, and functionalities: The system consists of four main components,
one of which is to be migrated to the Cloud, whereas the other three com-

ponents are kept in house because of security concerns.

• Step 2 - Understand the migration requirements of the FSO in order to
define the best strategies for migration: This migration requires seamless
integration between the migrated component in the Cloud and the existing
environment in house, and no changes to the application code. Therefore, it is
best to migrate the system to an IaaS Cloud. After careful consideration of
all possible alternatives, based on the system specification and the migration
requirements, it was decided that EC2 was the most suitable Cloud platform
for this migration.

• Step 3 - Understand EC2 and its offerings in order to identify any
compatibility issues: The tasks involved mirroring the system environment
in the EC2 environment. This may seem straightforward at first,
since EC2 provides infrastructure services and all installation and configuration
should be possible. However, the existing FSO system is currently
operated on Windows Server 2003 x64 Enterprise Edition, whereas Amazon
Web Services (AWS) at the time of this project only supported the Datacenter
Edition of Windows. Their differences are subtle, and the Datacenter Edition
is considered to be a superset of the Enterprise Edition; therefore, the difference
in editions does not affect the operation of the system. Similarly,
the current system works with SQL Server 2005 x64 Enterprise Edition,
while AWS at the time of this project supported only the Standard Edition. The
main difference between the two editions is the support for clustering. The
development environment of the FSO system does not require database
clustering; therefore, the difference between editions is not a factor.

• Step 4 - Design strategies: These include network design, system design,
security design, and monitoring and management controls for migration.

• Step 5 - Setting up the Amazon Cloud: This includes signing up for an AWS
account, signing up for Amazon EC2, setting up the Amazon EC2 command line
tools, setting up an Amazon Virtual Private Cloud (VPC) for security purposes,
obtaining EC2 instances, and finally adding disks to Windows instances
with pre-installed operating systems and required middleware.

• Step 6 - Setting up the migrated system: Since all required operating
systems and middleware are pre-installed on the machine images, only some
additional components, such as IIS Server and SQL Server, are installed at
this step for the migrated system to function properly.


• Step 7 - Functional test: A series of functional tests provided by the FSO
was performed to ensure that the various components of the system were
functioning properly and to discover potential problems that might be due to
the migration to AWS. Performance issues were discovered; the network
connection between the migrated components and the others was the bottleneck.
Extra effort was spent on tuning performance parameters and securing
the connections.

These steps can be mapped to the proposed taxonomy as in Table 4.4.

Taxonomy Categories              FSO Migration Tasks
Training or Learning Curve       Steps 1, 2, 3, 4
Installation and Configuration   Steps 5, 6
Database Migration               None
Code Modification                None
Network Connection               Step 7
Testing                          Step 7

Table 4.4: Mapping of the FSO migration tasks and the taxonomy
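The mapping in Table 4.4 can be restated as a simple lookup; the sketch below encodes the table (category labels follow the taxonomy) and derives which taxonomy categories this particular project skipped:

```python
# Table 4.4 restated as a dictionary from taxonomy category to FSO steps.
fso_mapping = {
    "Training or Learning Curve":     ["Step 1", "Step 2", "Step 3", "Step 4"],
    "Installation and Configuration": ["Step 5", "Step 6"],
    "Database Migration":             [],
    "Code Modification":              [],
    "Network Connection":             ["Step 7"],
    "Testing":                        ["Step 7"],
}

# Categories with no mapped steps were not needed in this project.
skipped = [cat for cat, steps in fso_mapping.items() if not steps]
print(skipped)  # ['Database Migration', 'Code Modification']
```

The two skipped categories reflect the FSO requirements: no application code changes, and no database move.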

The mapping described in Table 4.4 shows that the proposed taxonomy is
general enough to cover different types of migration tasks to the Cloud. However,
it can also be further broken down to better fit specific migration tasks in more
detail; for example, the handling of network connection and security in Step 5,
where the Amazon VPC is set up, could be separated into a more detailed category
rather than falling under the general installation and configuration category.


4.6 Reflection and Discussion

In this section, we reflect on our methodology for building the taxonomy of Cloud
migration tasks, and discuss its threats to validity.

A taxonomy is normally obtained by identifying a list of criteria for a topic of

interest, and then classifying existing elements according to those criteria. In our
context of Cloud migration, the fundamental elements are migration tasks and

they have never been officially identified or organised into a collection. Therefore,

the taxonomy of Cloud migration tasks was derived mainly from our experience
of migrating PetShop .Net to Windows Azure, a PaaS type of Cloud. We also

considered the case of migrating Java PetStore to Amazon EC2, an IaaS type of
Cloud, in an attempt to add more richness to the taxonomy.

In addition, it would be ideal to have external participants for the validation
process, rather than just NICTA projects and participants. However, it was not
easy to locate an external migration project that covers all aspects to be validated
in the taxonomy. It was also not feasible to locate multiple external projects for
this stage's validation, given that we also had to find data points for the next phase.
As a result, the taxonomy proposed in this chapter is exposed to a threat to
external validity. Although the validation in Section 4.5 demonstrates that the
taxonomy fits well with a common case of Cloud migration, there is no
guarantee that the taxonomy can be sufficiently applied to every other migration
project. This is because of the wide variety of Cloud migration project types;
it is not possible to anticipate all migration tasks that could occur in
reality. The taxonomy can only cover general migration tasks that are likely to
occur in a common migration case.


However, the structure of the taxonomy is general and flexible enough that
categories can be broken down further or extended to include new migration
tasks. This characteristic enables the taxonomy to be applicable and adaptable
to any new type of Cloud migration.

In this study, the assumption was that the Cloud target had already been
selected and that its selection is outside the scope of a migration project. However, our
experience shows that major effort is required for selecting Cloud providers and
services. Also, for security reasons, large enterprises tend to keep sensitive data
and applications in their local data centers, and migrate only some components
to Cloud platforms. Therefore, enterprises may also encounter challenging
post-migration tasks to ensure the entire system functions seamlessly.

The taxonomy is applicable to both PaaS and IaaS Clouds. Due to the
differences between the PaaS and IaaS types of Cloud, the effort required for each
migration task also differs. Table 4.5 below shows a side-by-side comparison of
the effort required for migrating to PaaS and IaaS Clouds.

Tasks                            PaaS     IaaS
Training or Learning Curve       major    major
Installation and Configuration   minor    major
Database Migration               major    major
Code Modification                major    none
Network Connection               none     minor
Testing                          major    major

Table 4.5: Efforts comparison for migrating to PaaS and IaaS Clouds


• Training or Learning Curve - Both PaaS and IaaS Clouds require
significant learning effort for several reasons: the Cloud offers the latest
technologies, with which one may not be familiar; new offerings and services are
rapidly created; and the Cloud has a broad community that contributes
numerous third-party tools. The tasks in this category can take up a
huge amount of time for both IaaS and PaaS Clouds at the beginning.

• Installation and Configuration - Creating a similar environment to

the local server in IaaS Clouds requires significant effort compared to PaaS
Clouds. In PaaS Clouds, the environment is handled by Cloud providers.

• Database Migration - Changing and migrating a database to either an IaaS
or a PaaS Cloud can be very hard if the local database and the Cloud
database are different. This can require major effort.

• Code Modification - IaaS Clouds provide a more flexible environment to


deploy and manage applications; therefore, no major code modification is
required if the environment in IaaS Clouds has been installed and configured

similarly to that in the local servers. There is no such flexibility in PaaS

Clouds; hence, code modification is needed for the application to run in

PaaS Clouds.

• Network Connection - PaaS Clouds free their users from the burden of
infrastructure management tasks; hence, their flexibility is lower than that of
IaaS Clouds. Therefore, PaaS Cloud users need not be concerned about
application network connections, whereas IaaS Cloud users are responsible
for those of their systems, although only minor effort is anticipated.

91
4. TAXONOMY OF MIGRATION TASKS TO THE CLOUD

• Testing - Testing is unavoidable when changes are made to the application,
whether code changes or configuration changes. The migration team
may need to undertake full testing of the application in the Cloud, both
IaaS and PaaS, to make sure the application functions properly. This effort
depends on the complexity of the application.

Table 4.5 shows a side-by-side comparison of whether no, minor, or major
effort is required for each migration category in PaaS versus IaaS Clouds. IaaS
Clouds provide a more flexible environment to deploy and manage applications;
therefore, no code modification is required. However, installation and configuration
tasks in IaaS Clouds, to create settings similar to the applications' local
environment, require significant effort compared to PaaS Clouds. Also, both PaaS
and IaaS Clouds require significant learning effort and testing.
The effort required for a migration project to a Cloud platform, whether a PaaS
or an IaaS type of Cloud, depends on various factors as illustrated above. The study
in this chapter enables us to understand these influential aspects and forms the
background for quantifying Cloud migration tasks in the next chapter.

4.7 Summary

Migrating applications to Cloud platforms requires extra effort to perform migration
tasks, as demonstrated in the previous sections. Application migration from
local servers to Cloud platforms is a one-time task and may seem straightforward
at first. However, our experience showed that this process is not automatic and
the effort spent on migration may not be trivial.


In this chapter, we experimented with the re-engineering and migration of a

software application, and successfully deployed it into a Cloud platform. The


experience allowed us to identify important influential cost factors for migrating

a system to the Cloud, which provides the basis for understanding the cost im-
plications of a Cloud migration project. A taxonomy of migration tasks has been

developed and tailored specifically for our Cloud migration context, and applied

to one validation project using different strategies. It will be used as input into
our size measurement model for migration projects to the Cloud in the following
chapter.
The taxonomy consists of six main categories, namely: Training or Learning
Curve, Installation and Configuration, Database Migration, Code Modification,
Network Connection, and Testing. These categories resulted from the internal
cost factors identified in our experiment. We have also identified
external cost factors, which are environmental aspects of organizations intending
to conduct migration projects. While the taxonomy and the internal cost factors
indicate what migration tasks are required, and how those tasks are completed,
the external cost factors determine how fast those tasks can be achieved.

Chapter 5

Cloud Migration Point

“Measuring programming progress by lines of code is like measuring


aircraft building progress by weight.”

∼ Bill Gates.

The taxonomy of Cloud migration tasks outlined in the previous chapter helps
Cloud consumers to form their migration plans. A Cloud migration project con-
sists of a list of migration tasks from the taxonomy. As a result, the amount of

effort required for a migration project to Cloud is accumulated from the effort
spent on each migration activity or migration task.

In this chapter, we introduce our Cloud Migration Point (CMP) model and

how it can further assist the Cloud consumers in estimating the size of those

migration tasks in their plans, which will facilitate the prediction of the amount

of effort required. We also describe the counting method of the CMP model,
illustrated with examples to help practitioners apply it easily.

CMP is a size metric for Cloud migration projects, which is expected to be


applicable early in the migration process. Additionally, Cloud migration projects
consider not only the system to be migrated, but also the migration process,
where various aspects of the system, besides lines of code, are involved. For
these reasons, the Function Point (FP) approach, which has proven a successful
foundation for many extensions, is more suitable as a basis for the CMP model
than SLOC. We decided to develop the CMP model by taking the well-known
FP approach and applying it in our Cloud migration context. It is worth
noting that CMP extends FP not by adding more elements to the existing FP
method, but by adopting the three-step approach of FP:

1. Classify the basic estimating units (a function in the FP context, a class in
the Class Point (Costagliola et al., 2005) context, and a migration task in
the CMP context) into different pre-defined categories

2. Then for each unit, evaluate its complexity level (Low, Average, or High)

3. Finally, compute the final sizing value

Apart from the FP methodology, the CMP model is also developed on the
basis of the taxonomy presented in Chapter 4. Each category from the taxonomy
is carefully analyzed before being selected as a CMP component, as discussed in
further detail later in this chapter.

The sub-sections of this chapter explain and cover different aspects of the CMP
model, and are arranged as follows: Section 5.1 states the underlying assumptions
of the CMP model. Section 5.2 analyzes the cost factors from the taxonomy
in Chapter 4 to consolidate the fundamental components of the CMP model.
Section 5.3 classifies Cloud migration projects into different types based on their


characteristics. The purpose of this classification is to show, later on, that the
CMP model can be applied to different migration project types. Section 5.4 describes
our CMP metric and its counting process. Section 5.5 demonstrates how CMP

can be applied to size an example Cloud migration project. A reflection on


our process of building the CMP model is presented in Section 5.6. Section 5.7

summarizes and concludes the chapter.

5.1 CMP Assumptions

This section explains the CMP model's alignment with the broader scope of our
work (presented in Section 1.4). Some specific assumptions for the CMP model
itself are also stated.

• We consider migration cases between two data centers only (typically, one

in-house and one in-Cloud). In the case where two or more data centers are
involved, CMP can be applied repeatedly for each pair of data centers.

• Our work only focuses on PaaS and IaaS Clouds. Hence, CMP considers
only IaaS and PaaS, although some parts of our cost model might still be

applicable to other Cloud offerings.

• If it is required to modify the application code for the Cloud environment,

CMP is only applicable for object-oriented applications. It assesses appli-


cation code changes at “class” level.

• We assume that the decision on the Cloud target is not a part of the migration
process. CMP estimates the complexity of migrating to a specific
Cloud platform, excluding the process of determining the most suitable
Cloud technologies/providers, and the need to get familiar with the specific
Cloud technology and offering.

• We assume that the design decisions for the migration have been made, such as
which components of the system are to be migrated to the Cloud, which components
stay in the local data centre, which pieces of code require modification
for the Cloud environment, which network connections are to be modified, and
what requirements must be satisfied. CMP requires inputs from the design
phase and is most appropriately applied before the implementation phase of
a migration.

• CMP takes it for granted that all migration tasks have already been outlined.
Since CMP measures the size and complexity of migration tasks, these tasks
must be outlined in advance (i.e., the migration plan is sufficiently complete).

The above presented items form the scope and assumptions of the CMP model
in this chapter.

5.2 Cloud Migration Cost Factors

In Chapter 4, we defined two types of cost factors of a Cloud migration
project, namely internal and external cost factors. Internal cost factors refer
to what migration tasks are required and how they can be achieved, and they
determine the tasks' complexity, regardless of who is carrying out those tasks
and under which conditions those tasks are done. External cost factors are
concerned with environmental factors that are specific to each organization,
such as the development team's skills and expertise, or its knowledge of Cloud
platforms and offerings. External cost factors determine how fast a migration
task can be completed.

The CMP model aims at sizing Cloud migration projects. In other words, the
CMP model measures the size of all migration tasks involved in a migration
project. As a result, our CMP model focuses only on the internal cost factors and
identifies them as the sole indicators of the migration tasks' complexity,
regardless of who conducts those tasks and under what conditions the tasks are
carried out. The internal cost factors essentially equate to the taxonomy of
Cloud migration tasks.

The CMP model measures the accumulated size of all migration tasks making
up the migration project. Therefore, the taxonomy can comfortably be fed as
the input into the CMP model. This section repeats all categories of the
taxonomy for convenience of reading, analyzes each category, and determines
which categories are suitable for the CMP model, taking into consideration the
assumptions stated in Section 5.1.

• Training or Learning Curve - The tasks in this category rely heavily on

the Cloud experience of developers and their learning abilities, which are
external cost factors. Although this category contributes significantly to the

total effort required, we exclude this category from the scope of the CMP

model. This category itself should be treated in a separate study since it is

also concerned with the learning ability of different individuals.

• Installation and Configuration - When migrating to an IaaS Cloud such as

Amazon EC2, effort is required to install the necessary system software,


database servers, or middleware; environment variables and settings also

need to be configured. When migrating to a PaaS Cloud such as Microsoft


Azure, installation and configuration effort lies in the application layer,

such as libraries or plugins. If the application before migration relies on


some third-party libraries, similar libraries are required in the Cloud as

well. Effort is required to integrate the new libraries with the application

after migration. Hence, the tasks in this category should be included in the
CMP model.

• Database Migration - Migrating a database to the Cloud can result in


database schema changes and query changes because of differences in ver-
sions, variants (MySQL vs. MSSQL), or database types (Relational vs.

NoSQL). Effort is needed to change schemas, modify queries, transfer and


populate databases. The tasks in this group should be covered by the CMP
model.

• Code Modification - In some migration cases, code modification is required
to adapt to the new programming model in the Cloud, or the database access
layer needs to be changed to work seamlessly with a different database in the
Cloud. If a relational database is migrated to a NoSQL Cloud database, JOIN
operations may need to be added to the application's code to preserve the
system's functionality. If required libraries are not available in the Cloud,
a rewrite of the libraries is necessary; or, if similar libraries exist, code
needs to be changed so that the application-library integration does not
interfere with the system's functionality. These tasks reflect the changes in
the migrating system and hence should be assessed by the CMP model.


• Network Connection Changes - Within a system S before migration, the
connection between two components A and B is a LAN connection. If only
B is migrated to the Cloud and A is kept in the local data center, the LAN
connection between A and B becomes a WAN connection. If both A and B
are migrated to the Cloud, the LAN connection between A and B becomes
a LAN connection in the Cloud, whose network conditions may differ from
those of the original environment. In all cases, the connection is changed, and
effort is required to ensure security and performance are optimal. The
CMP model will also take these tasks into account.

• Testing - Many different testing activities may be required. Testing to
make sure the system functions properly with no performance issues can
be incorporated into the other categories. For example, testing tasks that
ensure network connection security and performance are optimal are included
in the network connection category. Other formal tests, which have their own
requirements, methodology, and test cases, are no different in the Cloud
migration context from traditional software development. Other size metrics
for traditional software development do not include these testing tasks in
their measurement; similarly, this category is excluded from the CMP model.

From the above analysis, the CMP model is determined to include four main
components: Installation and Configuration, Database Migration, Code
Modification, and Network Connection. These components capture distinct aspects of


a migration project to the Cloud; therefore, the CMP model is intended to cover


all these aspects separately.

5.3 Cloud Migration Project Classification

The cost factors identified in Section 5.2 do not apply to all components of the
system, but only to those components that are affected by the migration.
We classify the components involved in a migration into four different categories:
Migrated, Removed, Unchanged and Added. These categories help us
better understand the dynamics of the migration process, as well as its impact
on the effort as captured in our CMP model. There are two options for an
existing component: it is either migrated to the Cloud or kept in-house. For the
former option, if the component is migrated to the Cloud without any changes, it
belongs to the Migrated category. If it is migrated to the Cloud and then modified,
it can be considered as a Removed component plus a newly Added component.
For the latter option of the component being kept in-house, if nothing changes,

the component belongs to the Unchanged category. If it is changed, it is again


considered as a Removed component and then a newly Added component. If

a component is removed from the system, it belongs to the Removed category.

Similarly, if a new component is added to the system, it belongs to the Added

category. Therefore, these four categories are sufficient to cover all components
related to the migration.

It is important to distinguish between a migrating system and a migration
project. These two concepts were defined in Chapter 1, Section 1.2, and are
repeated here for convenience: a migrating system is the system to be migrated
to the Cloud, defined as a set of components required for the system to function
properly, such as third-party libraries or middleware, system software, databases,
application code, and the network connections amongst its modules. A migration
project is defined as a set of migration tasks to move a migrating system from a
local data center to the Cloud.

We classify a migration project by first denoting its migrating system's states
in the local data center and in the Cloud, before and after the migration, as
summarized in Table 5.1.

                   Local   Remote
Before Migration   L       R
After Migration    L′      R′

Table 5.1: System’s states before and after migration

Table 5.1 depicts the components present at each of the states, with the rows
dividing the components temporally and the columns dividing them spatially.
The sets of components at these states are denoted by L, R, L′ and R′. Note
that the same component may appear in different rows, but it cannot appear
twice in the same row (i.e., a component cannot be both in-house and in-Cloud
at the same time). Hence, L and R are disjoint sets, and similarly, L′ and R′
are also disjoint. The allocation of components to each state can be determined
using the design documents.

Definition 1 A migration project is defined as a full migration if L ⊆ R′;
otherwise it is a partial migration.

The set of components involved in a migration project can be partitioned into

three categories (or disjoint subsets):


• Migrated components (M = L ∩ R′) - Components moved from in-house to
the Cloud. These components are reused with or without modifications. For
example, third-party libraries, database servers, or system software that are
moved to the Cloud (i.e., effort is involved in installation, configuration, and
integration with the rest of the system); application code (i.e., effort is needed
for moving and changing code); and databases (i.e., effort is required for data
transfer and any required modifications to schemas and queries).

• Removed components (R = L \ (L′ ∪ R′)) - Components removed from in-house
as a result of the migration. Removal is not always necessary, because some
components can remain without interfering with or disrupting the functionality
of the system, in which case no effort is required. However, sometimes removal
is necessary to ensure normal operation of the system, in which case effort
will be required.

• Unchanged components (U = L ∩ L′) - Components that remain unchanged
in-house. These components do not participate in the migration process; they
simply continue to operate in-house as usual, hence no effort is required.

In addition to the above, there is also the category of Added components
((L′ ∪ R′) \ (L ∪ R)), which are components added to the system as part of the
migration, such as new libraries in the Cloud, newly added code for extra
functionality, or new middleware to be integrated. For example, when a library
is not suited to the Cloud environment, a similar library is used if one exists
in the Cloud, or the library is rewritten if it is not available. In a partial
migration, if a component remains in-house and is modified to interact with a
component that has been migrated to the Cloud, it can be categorized as removing
the old component and adding a new component.
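The four categories above, and Definition 1, translate directly into set operations. The following sketch (Python, with hypothetical component names) is illustrative only; the expressions mirror the definitions of M, R, U and Added given in the text.

```python
# Sketch: partitioning the components of a migration project into the four
# categories defined above, using Python sets. Component names are
# hypothetical; L, R hold components before migration, L2, R2 stand in for
# L' and R' (the sets after migration).

def partition(L, R, L2, R2):
    """Return (Migrated, Removed, Unchanged, Added) per the set definitions."""
    migrated = L & R2                 # M = L ∩ R'
    removed = L - (L2 | R2)           # R = L \ (L' ∪ R')
    unchanged = L & L2                # U = L ∩ L'
    added = (L2 | R2) - (L | R)       # Added = (L' ∪ R') \ (L ∪ R)
    return migrated, removed, unchanged, added

def is_full_migration(L, R2):
    """Definition 1: a full migration iff L ⊆ R'."""
    return L <= R2

# Hypothetical example: app and db move to the Cloud, a legacy cache is
# dropped, the web front-end stays in-house, and a new Cloud library appears.
L = {"app", "db", "cache", "web"}
R = set()                             # nothing in the Cloud before migration
L2 = {"web"}
R2 = {"app", "db", "cloud-lib"}

M, Rm, U, A = partition(L, R, L2, R2)
print(M, Rm, U, A)
print(is_full_migration(L, R2))       # False: "web" stays in-house (partial)
```

Note that M, Rm and U together recover L, matching Proposition 2 below.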

Proposition 2 If x is a component in the local data center before migration
(i.e., x ∈ L), then after the migration x is one of a migrated component (i.e.,
x ∈ M), a removed component (i.e., x ∈ R), or an unchanged component (i.e.,
x ∈ U).

Proof 3 It suffices to show that (1) M ∪ R ∪ U = L, and that (2) the collection
{M, R, U} is pairwise disjoint.

For (1),

M ∪ R ∪ U ≡ (L ∩ R′) ∪ (L \ (L′ ∪ R′)) ∪ (L ∩ L′)
≡ (L ∩ (L′ ∪ R′)) ∪ (L \ (L′ ∪ R′)) ≡ L.

For (2), there are three cases:

(i)

M ∩ R ≡ (L ∩ R′) ∩ (L \ (L′ ∪ R′))
≡ (L ∩ R′) ∩ (L ∩ (¬L′ ∩ ¬R′)) ≡ ∅;

(ii)

M ∩ U ≡ (L ∩ R′) ∩ (L ∩ L′) ≡ L ∩ (L′ ∩ R′)
≡ L ∩ ∅ ≡ ∅.

Note that (L′ ∩ R′) ≡ ∅ as defined above;

(iii)

R ∩ U ≡ (L \ (L′ ∪ R′)) ∩ (L ∩ L′)
≡ (L ∩ (¬L′ ∩ ¬R′)) ∩ (L ∩ L′) ≡ ∅.


The effort associated with each of the categories defined above is carefully
captured in our CMP model. Roughly speaking, migrating components requires
the most effort, followed by adding and removing components, and then components
with no changes. Each component here can be a piece of code in the application,
a database, or a piece of third-party software that enables the whole system to
function properly. Extra effort may also be required to ensure these components
work together seamlessly.

5.4 Cloud Migration Point

The classification of Cloud migration projects discussed in Section 5.3 can be
seen as a way to allocate the components of a migration project to different types.
Regardless of what type a migration project is, the effort required for the whole
project still aligns with the CMP components defined in Section 5.2.

The CMP metric consists of four main components (each component is a set
of related migration tasks):

• Network Connection Component: CMPconn - covers all migration tasks re-


lated to network connection changes.

• Code Modification Component: CMPcode - is concerned with all application
code changes.

• Installation and Configuration Component: CMPic - includes all tasks to


install and configure the Cloud environment to be suitable for the migrating

system.


• Database Migration Component: CMPdb - considers all database-related

migration tasks.

Each of these CMP components is developed in light of the FP three-step

approach. Particularly:

• Firstly, each migration task in each CMP component is identified and classi-
fied into a pre-defined sub-category. These sub-categories will be discussed

further for each component later in the chapter.

• Secondly, each migration task is evaluated on its complexity level (Low,


Average, or High) based on some pre-defined criteria.

• Thirdly, since each migration task has been classified into a specific type
and evaluated with a complexity level in the first two steps, a weighted value
is assigned to each task accordingly. Finally, the total value of the CMP
component is the sum of the weighted values of all migration tasks in that
component.

Then, the final CMP value is calculated as a weighted sum of its four components
CMPconn, CMPcode, CMPic, and CMPdb, which measure the size of migration
tasks related to connection changes, code changes, installation and configuration,
and database changes, respectively. In the following sub-sections, we delve
further into each component to assess the complexity of each migration task.

The weighted values assigned to each migration task in the third step are
initially derived from our discussions with a group of Cloud engineers who have
carried out different types of Cloud migration projects themselves. These values


will be calibrated further in Chapter 6 with more empirical data. In this chapter

we will present the model with these initial values.
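To make the overall structure concrete before the detailed counting rules, the following sketch shows how the four component values would be combined into a final CMP value. The weights here are illustrative placeholders only, not the calibrated values (which are an empirical matter), with an unweighted sum as the simplest special case.

```python
# Sketch of the overall CMP computation. The four component values would come
# from the counting procedures of Sections 5.4.1-5.4.4; the component weights
# are assumed placeholders for illustration only.

def cmp_total(cmp_conn, cmp_code, cmp_ic, cmp_db,
              weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four CMP components."""
    components = (cmp_conn, cmp_code, cmp_ic, cmp_db)
    return sum(w * c for w, c in zip(weights, components))

print(cmp_total(18, 42, 12, 9))  # unweighted sum: 81
```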

5.4.1 Network Connection Component: CMPconn

CMPconn assesses all migration tasks related to network connections and evaluates

their complexity. It adopts the three-step approach from FP as discussed above.


First, all network connections that will be affected by the migration process

and require effort to optimize performance are identified and classified into three
types:

• LAN-to-LAN: A connection belongs to this type if both ends A and B
of the connection are migrated from the local data center to the Cloud,
i.e., {A, B} ⊆ L ∩ R′. The LAN connection at the local site becomes a
LAN connection in the Cloud. Its performance may be affected by possible
changes in the network environment. Some migration tasks and minor effort
are expected to ensure that security and performance are preserved.

• LAN-to-WAN: A connection is classified into this type if only one end A
of the connection is migrated to the Cloud while the other end B stays in-house
(i.e., A ∈ L ∩ R′ and B ∈ L ∩ L′). The LAN connection at the local
site becomes a WAN connection spanning from in-house to the Cloud over
an Internet connection. Major effort is anticipated for securing the WAN
connection and optimizing its performance.

• WAN-to-LAN: This type of connection occurs if, before migration, a part
of the system is already in the Cloud, i.e., R ≠ ∅. Before the migration, this
is a WAN connection with one end A in the local data center (i.e., A ∈ L) and
the other end B in the Cloud (i.e., B ∈ R). After the migration, both ends A
and B are in the Cloud (i.e., A ∈ L ∩ R′ and B ∈ R ∩ R′). The connection
becomes a LAN connection in the Cloud environment. The migration tasks
for this type undo the security and performance measures applied earlier for
a LAN-to-WAN change. This is necessary because a WAN optimization is
unlikely to be the best option for LAN performance.
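The endpoint-based classification above can be sketched as follows (Python, with hypothetical component names; L2 and R2 stand in for the after-migration sets L′ and R′ of Table 5.1):

```python
# Sketch: classifying one connection by where its two endpoints sit before
# and after the migration, following the three connection types above.

def connection_type(a, b, L, R, L2, R2):
    migrated = L & R2                      # ends moved in-house -> Cloud
    stays = L & L2                         # ends kept in-house
    if a in migrated and b in migrated:
        return "LAN-to-LAN"
    if (a in migrated and b in stays) or (b in migrated and a in stays):
        return "LAN-to-WAN"
    was_wan = (a in L and b in R) or (b in L and a in R)
    if was_wan and a in R2 and b in R2:    # both ends in the Cloud afterwards
        return "WAN-to-LAN"
    return "unaffected"

# Hypothetical system: A and B migrate, C stays in-house, and D was already
# in the Cloud before the migration (so R is non-empty).
L, R, L2, R2 = {"A", "B", "C"}, {"D"}, {"C"}, {"A", "B", "D"}
print(connection_type("A", "B", L, R, L2, R2))  # LAN-to-LAN
print(connection_type("B", "C", L, R, L2, R2))  # LAN-to-WAN
print(connection_type("A", "D", L, R, L2, R2))  # WAN-to-LAN
```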

Second, the complexity level (Low, Average, or High) of the migration tasks
involved in each connection is evaluated based on the connection's requirements
for security and protocol optimization, using Table 5.2. We identified these two
dimensions, Security and Protocol Optimization, as the main cost factors for
connection-related tasks in the Cloud context, based on our Cloud migration
experience with cost breakdown analysis, discussions with Cloud engineers,
analysis of the taxonomy from the previous chapter, and close study of many
Cloud practitioners' blogs and discussions.

Protocol            Security
Optimization        Required    Not Required
Required            High        Average
Not Required        Average     Low

Table 5.2: Complexity evaluation for each connection

Lastly, a weighted value is assigned to each connection, based on its type
identified in the first step and its complexity level evaluated in the second
step, using Table 5.3. For example, if a connection is of LAN-to-WAN type and
of High complexity level (i.e., it requires effort for both security and protocol
optimization), its associated weight value would be 9. The values in Tables 5.2
and 5.3 were defined from our discussions with a group of Cloud engineers
involved in Cloud migration projects.

Connection    Connection's Complexity Level                         Total
Type          Low             Average         High
LAN-to-LAN    ... × 1 = ...   ... × 3 = ...   ... × 4 = ...         ...
LAN-to-WAN    ... × 1 = ...   ... × 6 = ...   ... × 9 = ...         ...
WAN-to-LAN    ... × 1 = ...   ... × 6 = ...   ... × 9 = ...         ...
CMPconn                                                             ...

Table 5.3: Evaluating CMPconn

The value of CMPconn is defined as the weighted sum of all identified connec-
tions:


CMPconn = Σ_{i=0}^{2} Σ_{j=0}^{2} x_ij × w_ij

where x_ij is the number of connections of type i with complexity level j, and
w_ij is the weighted value for connection type i and complexity level j.
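As a worked example of this weighted sum, the following sketch applies the weights of Table 5.3 to a hypothetical set of connection counts:

```python
# Worked example of the CMPconn weighted sum, using the weights of Table 5.3.
# The connection counts are hypothetical.

CONN_WEIGHTS = {
    "LAN-to-LAN": {"Low": 1, "Average": 3, "High": 4},
    "LAN-to-WAN": {"Low": 1, "Average": 6, "High": 9},
    "WAN-to-LAN": {"Low": 1, "Average": 6, "High": 9},
}

def cmp_conn(counts):
    """counts[type][level] = number of connections of that type and level."""
    return sum(CONN_WEIGHTS[t][lvl] * n
               for t, levels in counts.items()
               for lvl, n in levels.items())

# Hypothetical project: two Low-complexity LAN-to-LAN connections and one
# High-complexity LAN-to-WAN connection (security + protocol optimization).
print(cmp_conn({
    "LAN-to-LAN": {"Low": 2},
    "LAN-to-WAN": {"High": 1},
}))  # 2*1 + 1*9 = 11
```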

5.4.2 Code Modification Component: CMPcode

CMPcode assesses all migration tasks relating to code changes. These tasks can
vary from adding new functionality and removing unnecessary code to modifying
code to use new databases or to integrate with new libraries. CMPcode is inherited
from Class Point (Costagliola et al., 2005), but with modifications to address code
changes rather than only newly added functionality. Similar to CMPconn, CMPcode
also follows FP's three-step approach.

First, all classes in application code that require modification efforts are iden-

tified and classified into four types as defined in Class Point (Costagliola et al.,


2005):

• Problem Domain Type (PDT): classes that represent real-world entities in

the application domain of the system.

• Human Interaction Type (HIT): classes designed for information visualiza-

tion and human-computer interaction.

• Data Management Type (DMT): classes that accommodate data storage

and retrieval.

• Task Management Type (TMT): classes that are responsible for the definition
and control of tasks, and for communication between subsystems and with
external systems.

Identify:
  Before changing the code:                After changing the code:
  A - the set of attributes                A′ - the set of attributes
  M - the set of public methods            M′ - the set of public methods
  S - the set of services requested        S′ - the set of services requested
      from other classes                        from other classes
Derive:
  |A \ A′| : number of attributes removed
  |A′ \ A| : number of attributes added
  |M \ M′| : number of methods removed
  |M′ \ M| : number of methods added
  |S \ S′| : number of requested services removed
  |S′ \ S| : number of requested services added
Define the changes:
  CA = |A \ A′| × 0.2 + |A′ \ A| : changes in attributes
  CM = |M \ M′| × 0.2 + |M′ \ M| : changes in methods
  CS = |S \ S′| × 0.2 + |S′ \ S| : changes in services requested

Table 5.4: Elements of each changed class


Second, each class's changes in three dimensions, namely attributes (CA), public
methods (CM), and services requested from other classes (CS), are evaluated.
These changes are computed from the numbers of elements to be removed and
added, following the three steps in Table 5.4.

The sets of the three element kinds (attributes, methods, services requested)
are identified both before and after the code change (e.g., A and A′ are the sets
of attributes before and after the migration, respectively). This information is
already available after the design phase of the development cycle, where all design
decisions have been made.

The number of elements to be removed and added is calculated by taking
the differences between the sets before and after the migration (e.g., |A \ A′| and
|A′ \ A| are the numbers of attributes to be removed and added, respectively). The
final values CA, CM, and CS are determined by applying a factor of 0.2 to removal
tasks and 1 to addition tasks (e.g., CA = |A \ A′| × 0.2 + |A′ \ A|). These factors
were suggested by Niessink and Vliet (Niessink & Vliet, 1997), since a removal
task also requires effort, although not as much as an addition task. An element
that no longer contributes to a system's functionality is better removed, because
its presence may cause unexpected system behaviour.

CA, CM, and CS are defined to capture the aspects of changed classes. A special
circumstance occurs when a class is newly added, i.e., there are no existing sets
of elements before the migration, or A = M = S = ∅. In this case,

CA = |A \ A′| × 0.2 + |A′ \ A| = 0 × 0.2 + |A′| = |A′|,

which is the number of attributes in the new class. Similarly, CM = |M′| and
CS = |S′|, which are the numbers of methods and services requested in the new


class. These three values are the same as Class Point's counts for sizing a new
class for development effort. In other words, CA, CM and CS are also valid for
capturing newly added code.
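The derivation of CA, CM and CS can be sketched as a single set-difference computation (Python; the class members below are hypothetical):

```python
# Sketch: deriving CA, CM and CS for one class from its before/after element
# sets, per Table 5.4 (removals weighted 0.2, additions weighted 1).

def change_size(before, after, removal_factor=0.2):
    removed = len(before - after)
    added = len(after - before)
    return removed * removal_factor + added

# Hypothetical changed class: one attribute renamed (counted as one removal
# plus one addition), one method added, one requested service dropped.
A, A2 = {"host", "port"}, {"endpoint", "port"}
M_, M2 = {"connect"}, {"connect", "retry"}
S, S2 = {"dns_lookup"}, set()

CA = change_size(A, A2)     # 1 * 0.2 + 1 = 1.2
CM = change_size(M_, M2)    # 0 * 0.2 + 1 = 1.0
CS = change_size(S, S2)     # 1 * 0.2 + 0 = 0.2
print(CA, CM, CS)

# Newly added class (empty "before" sets): reduces to the Class Point counts.
print(change_size(set(), {"x", "y", "z"}))  # 3 (number of new elements)
```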

These three dimensions form the basis for evaluating each changed class's
complexity level, as in Table 5.5. The complexity level indicators are inherited
from Class Point.

Changes in        Changes in attributes (CA)
methods (CM)      0−5        6−9        ≥ 10
0−4               Low        Low        Average
5−8               Low        Average    High
≥ 9               Average    High       High
(a) Changes in services requested (CS): 0 − 2

Changes in        Changes in attributes (CA)
methods (CM)      0−4        5−8        ≥ 9
0−3               Low        Low        Average
4−7               Low        Average    High
≥ 8               Average    High       High
(b) Changes in services requested (CS): 3 − 4

Changes in        Changes in attributes (CA)
methods (CM)      0−3        4−7        ≥ 8
0−2               Low        Low        Average
3−6               Low        Average    High
≥ 7               Average    High       High
(c) Changes in services requested (CS): ≥ 5

Table 5.5: Complexity evaluation for each class
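The lookup defined by Table 5.5 can be sketched as follows; the band boundaries transcribe the three sub-tables, and the helper names are our own:

```python
# Sketch implementing the complexity-level lookup of Table 5.5: CS selects one
# of three sub-tables, whose CA/CM band boundaries shift down as CS grows.

def band(value, lo, hi):
    """Map a count to band 0, 1 or 2 given the two upper boundaries."""
    if value <= lo:
        return 0
    return 1 if value <= hi else 2

def class_complexity(ca, cm, cs):
    if cs <= 2:                       # sub-table (a)
        ca_b, cm_b = band(ca, 5, 9), band(cm, 4, 8)
    elif cs <= 4:                     # sub-table (b)
        ca_b, cm_b = band(ca, 4, 8), band(cm, 3, 7)
    else:                             # sub-table (c)
        ca_b, cm_b = band(ca, 3, 7), band(cm, 2, 6)
    grid = [["Low", "Low", "Average"],
            ["Low", "Average", "High"],
            ["Average", "High", "High"]]
    return grid[cm_b][ca_b]           # rows: CM band, columns: CA band

print(class_complexity(ca=2, cm=1, cs=0))   # Low
print(class_complexity(ca=10, cm=9, cs=1))  # High
```

Fractional values of CA, CM and CS (from the 0.2 removal factor) fall into the bands in the obvious way.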

Lastly, a weighted value is assigned to each changed class based on its type
identified in the first step and its complexity level evaluated in the second
step. These weights are also adopted from Class Point (shown in Table 5.6).


Class    Class's Complexity Level                             Total
Type     Low             Average         High
PDT      ... × 3 = ...   ... × 6 = ...   ... × 10 = ...       ...
HIT      ... × 4 = ...   ... × 7 = ...   ... × 12 = ...       ...
DMT      ... × 5 = ...   ... × 8 = ...   ... × 13 = ...       ...
TMT      ... × 4 = ...   ... × 6 = ...   ... × 9 = ...        ...
CMPcode                                                       ...

Table 5.6: Evaluating CMPcode

The value of CMPcode is computed as a weighted sum of all changed classes:


CMPcode = Σ_{i=0}^{3} Σ_{j=0}^{2} x_ij × w_ij

where x_ij is the number of classes of type i with complexity level j, and
w_ij is the weighted value for class type i and complexity level j.
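As a worked example, the following sketch applies the Class Point weights of Table 5.6 to a hypothetical count of changed classes:

```python
# Worked example of the CMPcode weighted sum, using the weights of Table 5.6.
# The class counts are hypothetical.

CODE_WEIGHTS = {
    "PDT": {"Low": 3, "Average": 6, "High": 10},
    "HIT": {"Low": 4, "Average": 7, "High": 12},
    "DMT": {"Low": 5, "Average": 8, "High": 13},
    "TMT": {"Low": 4, "Average": 6, "High": 9},
}

def cmp_code(counts):
    """counts[class_type][level] = number of changed classes."""
    return sum(CODE_WEIGHTS[t][lvl] * n
               for t, levels in counts.items()
               for lvl, n in levels.items())

# Hypothetical migration: three Low-complexity data-management classes change
# (new database access layer) and one Average problem-domain class.
print(cmp_code({"DMT": {"Low": 3}, "PDT": {"Average": 1}}))  # 3*5 + 1*6 = 21
```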

CMPcode is analogous to Class Point in the sense that it also assesses a class's
attributes, public methods, and services requested from other classes. However,
it extends Class Point by evaluating the changes to the elements of a class,
taking into account both addition and removal tasks. Nevertheless, its validity
still holds when an entirely new class is added, in which case its counting
approach is exactly the same as Class Point's, as shown above. As a result, all
complexity levels and weighted values can be sufficiently inherited from Class
Point.

5.4.3 Installation and Configuration Component: CMPic

CMPic assesses all migration tasks related to Installation and Configuration (IC),
such as installation of system software, middleware, database servers, or
third-party libraries; or configuration of environment variables and basic
network information.


CMPic is determined in a similar manner as the previous two components of

CMP.

First, all required installation and configuration tasks are identified and clas-

sified into two types:

• Infrastructure level: software or servers required to set up the environment
belong to this type, for example, setting up an EC2 instance or image,
installing the operating system and middleware, or installing a database server.

• Application level: this type consists of any third-party libraries that the
application requires, for example, JDBC drivers for databases. When an

application relies on an external library to function properly, and that li-


brary does not exist within the Cloud environment, there are two options:

– (1) Rewrite the library from scratch for the Cloud environment - This

is seen by CMP as adding new code into the system and is sufficiently
captured by CMPcode . Hence, the migration tasks related to this option
are excluded from CMPic .

– (2) Reuse a similar library (if one exists) in the Cloud environment,

and change code in the system to preserve functionality and to connect

with the new library seamlessly - The migration tasks involved in this

option are integrating the new library into the system, which will be
assessed by CMPic , and changing code, which is assessed by CMPcode
and excluded from CMPic . If the libraries are available in the Cloud

environment exactly as required, the migration tasks expected are to


integrate them with the system and are measured by CMPic .


Second, we evaluate the complexity of each IC task based on the number
of configuration steps required and the installation method (from binary files
or from source code), as in Table 5.7. Installation and configuration usually go
together for each package or piece of software; for example, when Java is installed,
the JAVA_HOME variable needs to be set accordingly, and when MySQL is installed
in an Ubuntu EC2 instance, it is not accessible from outside the instance by
default, so reconfiguration for accessibility is required. Therefore, installation
and configuration tasks are evaluated together based on the following criteria:

• Installation: is an installation package available, or only the source code?
Or is no installation required at all?

• Configuration: for each installation, how many configuration steps are
required?

                  Installation
Configuration     No installation   Package   Source Code
< 2               Low               Low       Average
2 − 5             Low               Average   High
≥ 6               Average           High      High

Table 5.7: Complexity evaluation for each IC task

For example, the IC task of installing MySQL from an installation file, with
one configuration step to allow global accessibility, is of Low complexity.

Finally, each IC task is assigned a weighted value as in Table 5.8, based
on its type from the first step and its complexity level from the second. This
last step is necessary because an IC task at the Application level requires a
different amount of effort than an IC task of the same complexity at the
Infrastructure level.

The final value of CMPic is determined as:


IC               IC's Complexity Level                           Total
Type             Low             Average         High
Application      ... × 1 = ...   ... × 2 = ...   ... × 7 = ...   ...
Infrastructure   ... × 1 = ...   ... × 3 = ...   ... × 9 = ...   ...
CMPic                                                            ...

Table 5.8: Evaluating CMPic


CMPic = Σ_{i=0}^{1} Σ_{j=0}^{2} x_ij × w_ij

where x_ij is the number of IC tasks of type i with complexity level j, and
w_ij is the weighted value for IC task type i and complexity level j.
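A worked example of this sum, using the weights of Table 5.8 on hypothetical task counts:

```python
# Worked example of the CMPic weighted sum, using the weights of Table 5.8.
# The task counts are hypothetical.

IC_WEIGHTS = {
    "Application": {"Low": 1, "Average": 2, "High": 7},
    "Infrastructure": {"Low": 1, "Average": 3, "High": 9},
}

def cmp_ic(counts):
    """counts[ic_type][level] = number of IC tasks of that type and level."""
    return sum(IC_WEIGHTS[t][lvl] * n
               for t, levels in counts.items()
               for lvl, n in levels.items())

# Hypothetical IaaS migration: install a database server from a package with
# a few configuration steps (Average, Infrastructure) and drop in two
# ready-made application libraries (Low, Application).
print(cmp_ic({"Infrastructure": {"Average": 1},
              "Application": {"Low": 2}}))  # 1*3 + 2*1 = 5
```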

5.4.4 Database Migration Component: CMPdb

CMPdb assesses all migration tasks related to modifying queries and populating
data into new databases, excluding the database server installation tasks and any
required code changes, which are covered by CMPic and CMPcode, respectively.
Since the effort required for each query modification task or data population task
is quite uniform, CMPdb is easier to calculate than the other CMP components.

First, all database related tasks are identified and classified into two types:

• Query modification task: when a database changes in database type (e.g.,

MySQL to MSSQL), or database version, or from relational to NoSQL

database, queries must be modified accordingly.

• Data population task: Data in each table must be packaged and loaded into

the new database.

117
5. CLOUD MIGRATION POINT

Second, the complexity of each task is determined based on the differences

between the database of the local data center and the database in the Cloud: same
type of relational database, same type of relational database but different versions,

different types of relational databases, or relational to NoSQL database. Table


5.9 summarizes these complexity levels.

Database changes Complexity level


Same relational database, same version Low
Same relational database, different version Average
Different relational databases Average
Relational to NoSQL databases High

Table 5.9: Complexity evaluation for each database task

Finally, CMPdb is determined by the number of database tasks and for each
database task its associated weight as in Table 5.10.

Type                 Complexity Level                                  Total
                     Low             Average         High
Query Modification   ... × 1 = ...   ... × 3 = ...   ... × 8 = ...     ...
Data Population      ... × 3 = ...   ... × 4 = ...   ... × 10 = ...    ...
CMPdb                                                                  ...

Table 5.10: Evaluating CMPdb

The final value of CMPdb is calculated as:


CMPdb = Σ_{i=0}^{1} Σ_{j=0}^{2} xij × wij

where xij is the number of database tasks of type i (i.e., the number of queries

to be modified or the number of tables to be populated) with complexity level j,

and wij is the weighted value for database task type i and complexity level j.
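The two steps above can be sketched as follows. The complexity levels follow Table 5.9 and the weights follow Table 5.10; the migration scenario and the task counts are hypothetical:

```python
# A sketch of the CMP_db counting process. Complexity levels follow Table 5.9
# and weights follow Table 5.10; the migration scenario is hypothetical.

def db_complexity(source, target):
    """source/target: (kind, product, version) tuples, e.g. ('relational', 'MySQL', '5.1')."""
    if source[0] == "relational" and target[0] == "nosql":
        return "High"        # relational to NoSQL database
    if source[1] != target[1]:
        return "Average"     # different relational databases
    if source[2] != target[2]:
        return "Average"     # same relational database, different version
    return "Low"             # same relational database, same version

WEIGHTS = {  # Table 5.10: weight per database task type and complexity level
    "query":    {"Low": 1, "Average": 3, "High": 8},
    "populate": {"Low": 3, "Average": 4, "High": 10},
}

def cmp_db(n_queries, n_tables, level):
    # x_ij * w_ij summed over the two task types at the given complexity level
    return n_queries * WEIGHTS["query"][level] + n_tables * WEIGHTS["populate"][level]

# Hypothetical migration: MySQL 5.1 -> MySQL 5.5, with 4 queries to modify
# and 3 tables to populate.
level = db_complexity(("relational", "MySQL", "5.1"), ("relational", "MySQL", "5.5"))
print(level, cmp_db(4, 3, level))  # Average 24
```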


5.4.5 CMP

The final value of CMP is determined as a weighted sum of its four components

CMPi with i ∈ {conn, code, ic, db}:


CMP = Σ_{i=0}^{3} CMPi × wi

where CMPi is the value of CMP type i, and wi is the weighted value for CMP
type i (as shown in Table 5.11).

Type     CMPconn   CMPcode   CMPic   CMPdb
Weight   3         5         2       1

Table 5.11: Weighted values of CMP's components

Conclusion:

In this section, we have presented the CMP model and its counting method
for sizing a Cloud migration project. The greater the CMP value is, the more
complicated the project is, and the more effort is required.

5.5 CMP Application

This section demonstrates how CMP can be applied to size a Cloud migration
project in practice, using the example of PetShop .Net that has been described
in the previous chapter (Section 4.2). For convenience of reference, we
summarize here again our experiment process of migrating PetShop .Net to
Windows Azure and SQL Azure databases as follows:


• We used the existing application PetShop, which was not developed by
ourselves; hence, we needed to learn, understand, and get PetShop to work
on a local machine first.

• PetShop was developed on an older platform than the current version sup-

ported by Windows Azure. Windows 7 is required to deploy applications to


Windows Azure, while PetShop installation file was packaged for Windows

XP and could not run properly on Windows 7. We needed to deploy the

PetShop source code onto Windows 7 manually.

• The same issue applied to the PetShop database. There are existing tools
offering database and data transfer from local servers to SQL Azure; however,
they require SQL Server 2008 to be installed, while PetShop was designed to
work with SQL Server 2005 and could not be installed directly on SQL Server
2008. We had to manually retrieve and run the database script on SQL
Server 2008.

• In order to deploy applications into Windows Azure Cloud, it was important

to create a package file and a configuration file from the existing source code.
Azure plugin for Visual Studio provides a quite straightforward method to
achieve this; however, this method works with “Web application project”

only, while PetShop was created as a WebSite project, where there is no

project file and it relies on ASP.NET dynamic compilation to compile pages

and classes in the application. Effort was also spent on converting the WebSite
project to a Web Application project. Alternatively, the utility tool cspack

provided by Azure can also be used to create the package file.


Our experiment with PetShop .Net includes tasks to enable the application to
work on the local machine prior to migration. These tasks are out of the scope
of our Cloud migration project as outlined in Sections 1.4 and 5.1. The starting
point of the migration project is defined when PetShop .Net is already running
in the local machine and is ready to be migrated, and the ending point of the
migration project is when PetShop has been fully moved to Windows Azure
together with its database. The CMP model for sizing a migration project to the
Cloud only considers migration tasks within the scope of the defined migration
project. Hence, we exclude all tasks to understand the application's source code
and operations, or to install packages to enable the application to work on local
machines.
As a result, migration tasks for PetShop can be selected and categorized into
the four components of the CMP model as follows:

• CMPconn : There are no LAN or WAN connections amongst components
of the PetShop application; hence, no migration tasks are required for the
CMP connection component. As a result,

CMPconn = 0

• CMPcode : Not much code modification is required; however, we need to
modify the database connection string to use the new database in SQL
Azure. Also, SQL Azure does not support distributed transactions, a
feature PetShop utilised for its transactions. Hence, we needed to modify
the code to accommodate this incompatibility. The changes in code are
reported in Table 5.12. The weight values in Table 5.12 are referenced from
Table 5.6.


Classes                             Complexity   Weights
1 class of Data Management Type     Low          5
2 classes of Data Management Type   Average      8
1 class of Task Management Type     High         9

Table 5.12: Code changes for PetShop

The value of CMPcode is computed as a weighted sum of all changed classes:

CMPcode = (1 × 5) + (2 × 8) + (1 × 9) = 30

Total number of hours spent on these tasks was recorded as 10 hours.

• CMPic : This is a migration project to PaaS Cloud, so there was no in-


stallation required in Windows Azure. However, some installations were

required to facilitate the migration of the application and its database to


Windows Azure cloud, including Visual Studio 2010 to modify and compile
PetShop code, cspack utility to convert PetShop from Website to Web Ap-
plication, SQL Server 2008 to convert PetShop database from SQL Server
2005 to compatible format for SQL Azure, codeplex to migrate data to SQL

Azure, and Windows Azure Tools for Visual Studio to create package file
and configuration file from PetShop source code, so that it can be deployed

into Windows Azure platform.

All installation tasks are reported in Table 5.13. The weight values in Table

5.13 are referenced from Table 5.8.

The value of CMPic is computed as a weighted sum of all installation tasks:

CMPic = (1 × 1) + (4 × 3) = 13


Number of Installations   Type             Complexity   Weights
1                         Infrastructure   Low          1
4                         Infrastructure   Average      3

Table 5.13: Installations for PetShop

Total number of hours spent on these tasks was recorded as 14 hours.

• CMPdb : The PetShop database is SQL Server 2005, while SQL Azure
requires a database in SQL Server 2008. This migration is considered as the
same relational database type with a different version. Based on Table 5.9,
the complexity of this database migration is Average. Some query
modification tasks were performed to align the PetShop database to the new
version 2008, and some tasks were done to populate data into the new
database in SQL Azure, including dumping the old database and restoring it
to the new database.

All database-related tasks are reported in Table 5.14. The weight values in

Table 5.14 are referenced from Table 5.10.

Number of Tasks   Type                 Complexity   Weights
5                 Query modification   Average      3
2                 Data population      Average      4

Table 5.14: Database Migration for PetShop

The value of CMPdb is computed as a weighted sum of all database-related


tasks:

CMPdb = (5 × 3) + (2 × 4) = 23

Total number of hours spent on these tasks was recorded as 7 hours.


Components   Value   Weights   Hours
CMPconn      0       3         0
CMPcode      30      5         10
CMPic        13      2         14
CMPdb        23      1         7

Table 5.15: CMP components for PetShop

The values of these four components are summarized in Table 5.15, together
with their weights as referenced from Table 5.11.

The value of CMP is computed as a weighted sum of its four components:

CMP = (0 × 3) + (30 × 5) + (13 × 2) + (23 × 1) = 199

The total number of hours spent on these tasks is: 0 + 10 + 14 + 7 = 31 hours.
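The PetShop arithmetic above can be replayed directly from the component counts and weights reported in Tables 5.11–5.14, as a small sanity-check sketch:

```python
# Replaying the PetShop CMP calculation from Tables 5.11-5.14.
cmp_conn = 0                        # no connection tasks
cmp_code = 1 * 5 + 2 * 8 + 1 * 9   # Table 5.12: code changes
cmp_ic   = 1 * 1 + 4 * 3           # Table 5.13: installation tasks
cmp_db   = 5 * 3 + 2 * 4           # Table 5.14: database tasks

# Table 5.11: component weights (conn, code, ic, db) = (3, 5, 2, 1)
cmp_total = cmp_conn * 3 + cmp_code * 5 + cmp_ic * 2 + cmp_db * 1

print(cmp_code, cmp_ic, cmp_db, cmp_total)  # 30 13 23 199
```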

Conclusion:

This section has demonstrated how the CMP counting process can be applied
to size a Cloud migration project within its scope.

5.6 Reflection and Discussion

In this section, we reflect on our process of developing the CMP model. The
discussion revolves around the structure and methodology of the CMP model.
Discussion of its validity is deferred to Chapter 6 on validation.

The model has been developed through a few iterations. The model presented
in this chapter is the most basic version, which can be used as a foundation for

any tuning on its parameters later on.

There are 37 tunable parameters in the model (reflected in Tables 5.3, 5.6, 5.8,


5.10, and 5.11). In this basic version, the initial values for these parameters were

derived from our discussion with a group of Cloud engineers, who have conducted
some migration projects to the Cloud. Individual discussion was carried out with

each Cloud engineer to determine the value of each parameter. We then derived
the average value from all discussions for each parameter. We employed the expert

judgement approach for the parameter values at this stage because of the lack of

past projects of migration to Cloud. The only data points we had at that stage
were from the migration exercises and projects conducted by our group.

We took a further step to improve our model by, firstly, looking for more
data points. Survey and interviews were conducted with academic and industrial
practitioners, which will be described in more details in the next chapter. These

data points are more general and of larger scope than the initial ones. The data
collection and tuning process will be discussed further in Chapter 6.

Although CMP is developed based on FP, it is different from other FP
extensions in the sense that it does not add more components into the existing
FP model; it only follows the three-step approach of FP. As a result, CMP is
also affected by some limitations for which FP has already been criticized
(Lokan, 1998; Low & Jeffery, 1990; Symons, 1988; Matson et al., 1994;
Kitchenham, 1997). For example, classifying every system component type's
complexity as low, average, or high has the merit of being straightforward, but
has been criticized as oversimplified. The


work by Abran & Maya (1995) addressed this oversimplifying issue by proposing

an extended FPA technique, which subdivides the complexity classification of FP

from three intervals (low, average, and high) into five intermediate subintervals.

This extension proposed a finer granularity for counting FP of a development


project; however, this approach still does not address the upper bound of
counting FP (and similarly for CMP). For example, a system component containing

over 100 data elements is given at most twice the function points of a component
with one data element. Similarly, CMP suffers from the same problem as FP,

e.g., an installation and configuration task containing 100 configuration steps


is given at most three times the points of an installation and configuration task

with one configuration step. However, compared to FP and other extensions, this

limitation is less problematic for CMP, since there are normally many migration
tasks with few steps in each task.

The choice of weights has been derived from the expert judgement method

and, in the next chapter, tuned using a set of projects from external sources, but
it is also reasonable to ask if it will be valid in all circumstances. The threats to
validity discussion will be covered in Chapter 6.

The current CMP model only considers internal factors, but not external
factors. Internal factors are to ensure all necessary migration tasks are counted;

while external factors adjust and assess the complexity of the migration
tasks for each organization. Further work is scheduled to explore external factors
as well. The challenge with external factors is that it is very difficult to identify
a sound list of them. It is extremely hard, if not impossible, to justify whether
they are the right factors, whether the list is complete, and how to identify all
of them.

The CMP model measures the size of a migration project from a local data
center to cloud with the condition L = ∅, as discussed in Section 5.3. However,
the CMP model was developed without any constraint on L. In other words, the
CMP model is also applicable to migration projects with L ≠ ∅, which means
the system can be migrated from cloud back to the local data center. This


characteristic of CMP enables the measurement to expand beyond just two data

centers. When there are more than two data centers (either from local to cloud,
or vice versa) involved in the migration process, the CMP model can be applied
to one pair at a time, repeatedly, until all migration tasks are considered.

5.7 Summary

In this chapter, we have developed the CMP model as an important software


size measure for legacy-to-Cloud migration projects. Our study shows CMP is

more suitable for Cloud migration projects than other existing size metrics in
the literature since it captures special aspects of the Cloud migration context,
as discussed in Section 5.2. Moreover, CMP emphasises the distinct features of
the Cloud migration, as distinct from migrating between two local data centres,
for example, Cloud users (or developers) do not possess full control over the

Cloud environment as they do in a local data centre. This results in a limited

range of actions for each migration task. Therefore, the CMP model takes into
consideration Cloud-specific dependencies for each migration task, for example,

only security and protocol optimisation are assessed for each connection task, and
database tasks are concerned with migrating from relational to NoSQL databases,

and so on.

In a project development cycle, the CMP model fits well before the
implementation phase and after the design phase. One important assumption for
CMP is that all design decisions have been made. These design decisions have a
direct impact

on how CMP is counted, since they define all anticipated migration tasks. The


CMP counting process itself does not require much training and effort; however,

its accuracy relies on the sufficiency and granularity of the migration task list.
Therefore, it is important to carefully analyse the list of expected migration tasks

to ensure it captures the Cloud migration aspects adequately and with as much
detail as possible.

Chapter 6

Validation

“Trying to improve something when you don't have a means of measurement
and performance standards is like setting out on a cross-country trip in a car
without a fuel gauge. You can make calculated guesses and assumptions based
on experience and observations, but without hard data, conclusions are based
on insufficient evidence.”

∼ Mikel Harry.

Validation is an essential process to justify whether a software metric meets


its specification and fulfils its intended purpose (Briand et al., 1996; Costagliola

et al., 2005). It is widely accepted that there are two types of validation required
for software metrics, namely theoretical validation and empirical validation. The

objective of the theoretical validation is to prove that a metric sufficiently satisfies

the necessary conditions of a measurement metric that it claims to be (such

as sizing metrics, complexity metrics, cohesion metrics and coupling metrics),


whereas the empirical validation is to show that the metric is practically useful


within a given context.

The CMP metric, similar to FP, incorporates both size and complexity con-

cepts. In addition, CMP is a metric related to both processes and products.


Briand et al. (1996) proposed a list of mathematical properties for size metrics

and complexity metrics, which focus on products. There exists no set of proper-
ties for both product and process sizing metrics yet; hence, the set of criteria for

product-only size as proposed by Briand et al. (1996) is used in our theoretical

validation, although it is not quite sufficient yet.

Therefore, the main validation of CMP in this chapter is empirically based.


We are challenged to demonstrate that CMP is practically useful in the Cloud

migration context. Data on past Cloud migration projects must be available for
this purpose, including what tasks have been carried out and how much time has
been spent on those tasks. However, there exist no public repositories for such
data unlike traditional software development projects. As a result, a survey has

been conducted at this stage to collect relevant data. More details on this will
be presented later in the sub-sections of this chapter. Also, in this chapter, two
terms CMP weights and CMP parameters will be used interchangeably, and they

both mean the weighted values of each CMP component and their elements as

presented in Chapter 5.

The structure of this chapter is arranged as follows to cover important aspects


of the validation process: Section 6.1 demonstrates how the CMP model satisfies

a set of criteria proposed for product sizing metrics. The empirical validation is

divided into three phases. Section 6.2 describes the first phase of the empirical

validation, where the CMP model is evaluated on the initial set of 6 migration
projects conducted by our group. This section also states the evaluation criteria


and the approach we follow for the empirical validation purpose. The result of

this phase 1 validation shows that CMP is potentially an indicator for Cloud
migration effort estimation. However, more data from external organizations are

necessary to demonstrate that CMP is also externally valid. Section 6.3 presents
the final dataset we obtained from conducting a survey. A similar empirical

validation is performed again on CMP using the new dataset, called Empirical

Validation Phase 2, and is presented in Section 6.4. The result shows that the
parameters (or weights) of CMP need further calibration. Hence, Section 6.5
demonstrates the process of calibrating the CMP weights. In this section, we
also state a list of assumptions made for developing the model, and test their
plausibility using the available data from the survey. This list of assumptions

demonstrates the high complexity and difficulty of validating the metric. Section
6.6 illustrates the Empirical Validation Phase 3, where CMP with the calibrated
weights is validated on the new dataset. The result shows that the calibration

improves the performance of the CMP model significantly, and the model can
be used as a predictor for effort estimation of the Cloud migration. Section 6.7
discusses the threats to the validity of the model. Lastly, Section 6.8 summarizes and

concludes this chapter.

6.1 Theoretical Validation

Briand et al. (1996) proposed a generic mathematical framework that defines

some software measurement concepts, such as size and complexity. The frame-
work provides different sets of convenient and intuitive properties which are used

as necessary conditions for each measurement concept. In this section, we
mathematically validate CMP against three properties of a size concept proposed in

(Briand et al., 1996), since CMP is a sizing metric developed to measure the size
of migration projects.

Based on (Briand et al., 1996), a system S can be represented as a pair ⟨E, R⟩,
where E is the set of elements of S, and R is a binary relation on E (R ⊆ E × E).
In the context of this thesis, a migration project is defined as a set of migration
tasks. In light of this analogy, a migration project can be represented as a system
S = ⟨E, R⟩, where E is the set of migrating tasks of S, and R is the set of
relations between migration tasks e ∈ E. Particularly, if e1 , e2 ∈ E are network
connection tasks, then the relation r ∈ R such that r = ⟨e1 , e2 ⟩ is the common
system component involved in these two network connections. Similarly, if e1 and
e2 are database tasks, then r is the relation or table involved in these tasks, and
so on.

Three properties for a size metric proposed by Briand et al. (1996) are: Non-
negativity, Null Value, and Module Additivity. These properties are formalized

as:

• Property Size 1: Non-negativity - The size of a system S = ⟨E, R⟩ is
non-negative:

Size(S) ≥ 0

Proof 1 Size(S) is the CMP value of the migration project S. CMP is
obtained as a weighted sum of its four components, which in turn are weighted
sums of non-negative numbers. Hence, CMP = Size(S) ≥ 0, or the
Non-Negativity Property is verified.


• Property Size 2: Null Value - The size of a system S = ⟨E, R⟩ is null
if E is empty:

E = ∅ ⇒ Size(S) = 0

Proof 2 CMP is determined by assessing each component to be migrated,


evaluating its migration task’s complexity, and assigning an associated weight

to it. The final value of CMP is the sum of all the weights of the migration

task set. If E = ∅, i.e., there exist no components to be migrated, there is


no weight to be assigned. Hence, Size(S) = CMP = 0, or the Null Value
Property holds.

• Property Size 3: Module Additivity - The size of a system S = ⟨E, R⟩
is equal to the sum of the sizes of two of its modules m1 = ⟨Em1 , Rm1 ⟩ and
m2 = ⟨Em2 , Rm2 ⟩ such that any element of S is an element of either m1 or
m2 :

∀m1 , m2 ((m1 ⊆ S and m2 ⊆ S and

E = Em1 ∪ Em2 and Em1 ∩ Em2 = ∅)

⇒ Size(S) = Size(m1 ) + Size(m2 ))

Proof 3 The CMP calculation examines each element ei (i.e., migrating
task i) of E = {e0 , e1 , ..., en−1 } independently. Each element ei is assessed
and assigned a weight wi , as described in the previous chapter. CMP is
then determined as the sum of all these weights, i.e., CMP = Σ_{i=0}^{n−1} wi .

If E is divided into two disjoint subsets Em1 and Em2 , with no loss of


generality, Em1 and Em2 can be represented as: Em1 = {e0 , e1 , ..., ek−1 } and
Em2 = {ek , ek+1 , ..., en−1 }, where k ≤ n.

Applying the same process of determining CMP, the values CMPm1 and
CMPm2 of these two subsets of migration tasks Em1 and Em2 are:
CMPm1 = Σ_{i=0}^{k−1} wi and CMPm2 = Σ_{i=k}^{n−1} wi .

As a result,

CMPm1 + CMPm2 = Σ_{i=0}^{k−1} wi + Σ_{i=k}^{n−1} wi = Σ_{i=0}^{n−1} wi = CMP

Hence, the Module Additivity Property is satisfied.

We have shown that CMP satisfies all three necessary conditions of a size
measurement proposed by Briand et al. (1996). However, an empirical validation

is also required to demonstrate that CMP is practically useful as a predictor for

effort estimation in the cloud migration context.
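The three proofs all reduce CMP to a sum of per-task weights; under that reduction, the properties can even be checked mechanically. The following is a toy sketch with hypothetical task weights:

```python
# Toy check of the three size properties, modelling a migration project as a
# list of (non-negative) task weights, as in the proofs above.

def cmp_size(task_weights):
    return sum(task_weights)

project = [5, 8, 9, 1, 3, 3, 3, 3, 4, 4]  # hypothetical task weights

# Size 1: Non-negativity
assert cmp_size(project) >= 0
# Size 2: Null value for an empty task set
assert cmp_size([]) == 0
# Size 3: Module additivity, for every disjoint split of the task set
for k in range(len(project) + 1):
    m1, m2 = project[:k], project[k:]
    assert cmp_size(m1) + cmp_size(m2) == cmp_size(project)

print("all three size properties hold")
```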

6.2 Empirical Validation - Phase 1

Empirical validation is necessary to ensure that CMP is practically useful as an


indicator of effort estimation in terms of person-hours. The empirical validation

is divided into three phases. This Phase 1 will evaluate the CMP model with
its initial set of weights as presented in Chapter 5 using our initial set of 6

Cloud migration projects. Because of the limited number of data points publicly


available, the data we use in this first phase of the empirical validation is extracted

from a number of small-scale projects conducted at NICTA.

Although the validity of these data points has not been verified externally with

other research projects, they are suitable for this empirical validation because:

1. We have access to all necessary information required to determine CMP.

2. These projects cover different migration project types. In an actual
migration project, not all aspects of CMP happen at the same time in one
project. Therefore, these data points sufficiently reflect what is likely to

happen in reality.

3. The uniformity of these projects is ensured, because they were carried
out by the same team. Therefore, the external cost factors as discussed in
Section 4.3 have minimal impact on these data points. This is suitable for

validating the CMP model since we focus on internal cost factors only.

In this section, we also state the evaluation criteria and the approach we follow

for the purpose of empirical validation. These are also applied for the other two

phases.

6.2.1 Evaluation Criteria

The details of each migration task are used to calculate the size of the migration

project to the Cloud, using the CMP model. Regression analysis will be used to
determine the relationship between the size of a Cloud migration project and the

effort required.


The reliability of an effort estimation is assessed using the following criteria

as suggested in (Conte et al., 1986):

• Magnitude of Relative Error (MRE):

MRE = |AE − PE| / AE

where AE is the Actual Effort, and PE is the Predicted Effort.

• Mean Magnitude of Relative Error (MMRE):

MMRE = (Σ MRE) / n

where n is the sample size, and MMRE ≤ 0.25 is acceptable.

• Prediction at level l (or PRED(l) in short):

PRED(l) = k / n

where k is the number of observations such that MRE ≤ l. Note that
k ≤ n, hence 0 ≤ PRED(l) ≤ 1. The closer the PRED(l) value is to 1, the
better, and PRED(0.25) ≥ 0.75 is acceptable.
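These criteria are straightforward to compute. The sketch below applies them to the actual and predicted efforts later reported in Table 6.3, and reproduces the MMRE and PRED(0.25) values given there:

```python
# Sketch of the evaluation criteria of Conte et al. (1986), applied to the
# (Actual Effort, Predicted Effort) pairs reported in Table 6.3.

def mre(ae, pe):
    return abs(ae - pe) / ae

def mmre(pairs):
    return sum(mre(ae, pe) for ae, pe in pairs) / len(pairs)

def pred(pairs, level=0.25):
    return sum(1 for ae, pe in pairs if mre(ae, pe) <= level) / len(pairs)

pairs = [(45, 41.116), (4, 3.221), (6, 7.272),
         (9, 12.383), (32, 26.989), (51, 59.63)]
print(round(mmre(pairs), 3), round(pred(pairs), 3))  # 0.199 0.833
```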

6.2.2 Leave-One-Out Cross Validation

We followed a leave-one-out or jackknife approach (Tukey, 1958; Efron & Gong,
1983) to examine the relationship between CMP values and the actual effort
for migrating a system to cloud. This approach is the same as a k-fold
cross-validation in which k is equal to the number of data points. The k-fold
cross-validation has been successfully used to validate cost estimation models in the
validation has been successfully used to validate cost estimation models in the
literature, and is especially recommended for small data sets (Briand et al., 1999;

Costagliola et al., 2005). In the leave-one-out cross validation, each single data
point is used as the validation data, whereas the remaining data are used as

training sets. This is repeated until each data point is used once as the validation

data.
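The procedure can be sketched with a closed-form simple linear regression on the six data points of Table 6.1. This is a sketch only; most of the per-fold predictions agree closely with those reported in Table 6.3, with small discrepancies attributable to rounding in the reported models:

```python
# Sketch of leave-one-out cross-validation with ordinary least squares,
# using the six data points of Table 6.1.

def ols(xs, ys):
    """Closed-form simple linear regression: returns (coefficient, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

cmp_vals = [504, 60, 95, 149, 337, 645]  # CMP (Table 6.1)
efforts  = [45, 4, 6, 9, 32, 51]         # Effort in hours (Table 6.1)

predictions = []
for i in range(len(cmp_vals)):           # each project is left out once
    xs = cmp_vals[:i] + cmp_vals[i + 1:]
    ys = efforts[:i] + efforts[i + 1:]
    slope, intercept = ols(xs, ys)
    predictions.append(slope * cmp_vals[i] + intercept)

mres = [abs(a - p) / a for a, p in zip(efforts, predictions)]
print([round(p, 3) for p in predictions])
print(round(sum(mres) / len(mres), 3))   # MMRE over the six folds
```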

6.2.3 Ordinary Least Square Regression Analysis

Table 6.1 shows the data points extracted from our six projects. For project 1,
the majority of the effort was spent on securing and optimizing the WAN
connection, while projects 2, 3, and 4 required most effort for installation and
population of data. The migration process of projects 5 and 6 involved
installation, data population, and code changes. The table shows the final CMP
value of each project and its associated number of hours spent on migration
tasks.

No   Effort (hours)   CMP
1    45               504
2    4                60
3    6                95
4    9                149
5    32               337
6    51               645

Table 6.1: Empirical validation data points

We followed a leave-one-out cross-validation approach on this dataset. In this

phase, we performed six rounds of validation. Each round uses five projects as

the training set, and one project is left out as the validation set. Descriptive


statistics were computed for each training set, based on which the boxplot and

outliers of each set were analysed. Figure 6.1 shows that there are no outliers in
the training sets of the six validation rounds that might bias the models
derived from regression analysis.

Figure 6.1: The boxplots for the six training datasets of variable CMP

The scatter plots in Figure 6.2 show a positive linear relationship between

CMP and Effort (in hours) of each training set. As a result, an Ordinary Least-

Squares (OLS) regression analysis is then applied on each training set to derive

the equation of the trend line, which can be used as a prediction model for effort
required in hours.

The proficiency of each regression model is determined by the Coefficient of

Determination R2 , representing the proportion of the dependent variable effort


(in hours) explained by the independent variable CMP.

Figure 6.2: The scatter plots for OLS regression

Moreover, the statistical significance of CMP as a predictor of effort is
evaluated with a t-test and is determined by the t-value and p-value of the
coefficient of the prediction model. If p-value < 0.05, the null hypothesis can be
rejected; in other words, it shows that
CMP is a significant predictor of effort. The t-value is then applied to indicate

the reliability of the predictor. If t-value > 1.5, it shows that CMP is a potential
predictor of effort. The results of R2 , t-value, and p-value of the coefficients and
the intercepts of all six validation rounds are summarized in Table 6.2. (Note

that the p = 0.05 critical value of the t-test with 3 degrees of freedom is 3.18 for

a two-sided test and 2.35 for a one-sided test, and coefficients are expected to be

positive (Figure 6.2); hence, one-sided test can be used here.)
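As a sketch of these statistics, an OLS fit on one of the six training sets (here: the five projects of Table 6.1 excluding project 2, which appears to correspond to the first row of Table 6.2) reproduces the reported coefficient, intercept, and R2 up to rounding:

```python
import math

# OLS fit with R^2 and the t-statistic of the slope, on one Phase 1 training
# set (Table 6.1 without project 2); values match row 1 of Table 6.2 up to rounding.

xs = [504, 95, 149, 337, 645]  # CMP
ys = [45, 6, 9, 32, 51]        # Effort (hours)

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

slope = sxy / sxx
intercept = my - slope * mx
r2 = sxy ** 2 / (sxx * syy)              # coefficient of determination

sse = syy - slope * sxy                  # residual sum of squares
se_slope = math.sqrt(sse / (n - 2) / sxx)
t_slope = slope / se_slope               # t-statistic of the coefficient

print(round(slope, 4), round(intercept, 4), round(r2, 4), round(t_slope, 2))
# 0.0869 -1.4664 0.9736 10.52
```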

The result suggests that the coefficients of the models are statistically signifi-

cant and hence CMP is indicated to be a significant predictor of effort. Although


the intercepts are statistically insignificant, each derived model has a high value

of R2, and all the coefficients pass the significance test. In other words, the OLS
regression analysis results still show a strong linear relationship between CMP
and effort (in hours). For example, in the first training set, the derived model
is: Effort = 0.0869 × CMP − 1.4664, with a high value of R2 = 0.9736, and the
coefficient is significant at level 0.05.

ID   Coefficient (Value / t-value / p-value)   Intercept (Value / t-value / p-value)   R2       Derived Model
1    0.0869 / 10.504 / 0.002                   −1.4664 / −0.4399 / 0.6898              0.9736   Effort = 0.0869 × CMP − 1.4664
2    0.0858 / 10.958 / 0.002                   −0.8785 / −0.2789 / 0.7984              0.9756   Effort = 0.0858 × CMP − 0.8785
3    0.0849 / 12.507 / 0.001                   −0.2669 / −0.0985 / 0.9277              0.9812   Effort = 0.0849 × CMP − 0.2669
4    0.086 / 16.339 / 0.000                    −1.9929 / −1.0084 / 0.3876              0.9889   Effort = 0.086 × CMP − 1.9929
5    0.0839 / 12.069 / 0.001                   −1.1697 / −0.501 / 0.6508               0.9798   Effort = 0.0839 × CMP − 1.1697
6    0.0972 / 17.112 / 0.000                   −3.0637 / −1.9008 / 0.1535              0.9899   Effort = 0.0972 × CMP − 3.0637

Table 6.2: Phase 1 - OLS Regression Analysis

The cross-validation result is determined by using the derived models to com-

pute the predicted effort of the left-out project in each validation round (reported

in Table 6.3). The results are then evaluated using the metrics described in Section
6.2.1.

Table 6.3 shows that the MMRE value is 0.199 and the prediction at level

0.25 is 0.833. This result suggests that the CMP model is a good predictor

for effort estimation in the considered Cloud migration projects.


No CMP AE PE MRE
1 504 4541.116 0.086
2 60 4 3.221 0.195
3 95 6 7.272 0.212
4 149 912.383 0.376
5 337 3226.989 0.157
6 645 51 59.63 0.169
MMRE 0.199
PRED(0.25) 0.833

Table 6.3: Phase 1 - Results Evaluation
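The MMRE and PRED(0.25) figures above can be reproduced directly from the AE and PE columns of Table 6.3; the following is a minimal sketch (variable names are ours):

```python
# Actual effort (AE) and predicted effort (PE) in hours, from Table 6.3.
actual = [45, 4, 6, 9, 32, 51]
predicted = [41.116, 3.221, 7.272, 12.383, 26.989, 59.63]

# Magnitude of relative error (MRE) for each left-out project.
mre = [abs(a - p) / a for a, p in zip(actual, predicted)]

# MMRE is the mean MRE; PRED(0.25) is the fraction of projects
# whose MRE does not exceed 0.25.
mmre = sum(mre) / len(mre)
pred_25 = sum(m <= 0.25 for m in mre) / len(mre)

print(round(mmre, 3), round(pred_25, 3))  # → 0.199 0.833
```

The same two metrics are applied unchanged in Phases 2 and 3.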

6.2.4 Conclusion

In this section, we have shown that Phase 1 of the empirical validation yields good results for the CMP model as a predictor of effort in some Cloud migration cases. However, to gain more confidence in the CMP model, more data on external projects is required to validate it further.

6.3 Data Collection

We conducted a survey to collect data on migration projects from external organizations, as described in Chapter 3. The objective of the survey is to collect data on past migration projects to the Cloud in order to determine migration cost factors, including size, and to examine their relationships with the effort required for migration.

Table 6.4 shows the data points obtained from our own projects, the survey, and the interviews, together with the corresponding CMP values calculated with the initial parameters (as presented in Chapter 5).

These data points are calculated for each CMP component separately, then

     Database       Install. & Config.  Connection       Code             Total
ID   CMPdb   Hours  CMPic   Hours       CMPconn   Hours  CMPcode   Hours  CMP    Hours
1    6       2      45      80          0         0      440       250    3232   332
2    0       0      9       3           0         0      0         0      18     3
3    29      25     0       0           0         0      65        40     493    65
4    40      8      0       0           0         0      0         0      56     8
5    0       0      33      50          9         5      44        20     387    75
6    0       0      0       0           9         10     0         0      54     10
7    8       5      0       0           18        20     0         0      118    25
8    0       0      18      24          0         0      0         0      44     24
9    0       0      7       6           0         0      0         0      21     6
10   0       0      0       0           110       100    0         0      480    100
11   0       0      27      50          18        20     0         0      124    70
12   0       0      135     300         2         2      90        80     1158   382
13   6       1      9       7           3         2      0         0      32     10
14   6       2      21      20          22        20     0         0      167    42
15   23      7      13      14          0         0      30        10     207    31
16   84      15     8       10          1         2      89        40     511    67
17   6       2      9       4           2         2      0         0      38     8
18   6       2      8       8           2         2      0         0      38     12
19   0       0      36      48          0         0      0         0      72     48

Table 6.4: Data points from surveys and interviews

the CMP value can be accumulated with the associated weights from the model for each component. Some data points consist of all 4 CMP components, but some have only one or two components of CMP.

CMP considers internal factors only, not external factors, as discussed in Chapter 5, while these data points come from different organizations. As a result, in the survey and interview questions, as well as in the data analysis process, we tried to eliminate the effect of external factors as much as possible, in order to ensure the data points can be normalized for validating CMP.
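The accumulation step described above can be sketched as a weighted sum over whichever components a project actually exercises; the weights below are purely illustrative placeholders, not the calibrated values of this chapter (the real values come from Table 5.11 and the calibration in Section 6.5):

```python
# Hypothetical component weights -- placeholders for illustration only;
# the real values are defined in Table 5.11 and recalibrated in Section 6.5.
WEIGHTS = {"db": 3, "ic": 2, "conn": 2, "code": 1}

def cmp_total(components):
    """Accumulate a final CMP value from the per-component CMP values a
    project actually has; absent components simply contribute nothing."""
    return sum(WEIGHTS[name] * value for name, value in components.items())

# A project with only installation/configuration and connection work:
size = cmp_total({"ic": 9, "conn": 4})  # 2*9 + 2*4 = 26
```

Projects with only one or two components are handled naturally: missing components are simply omitted from the sum.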

6.4 Empirical Validation - Phase 2

In our context, we performed 19 rounds of validation, following the approach outlined in Phase 1. Each round uses 18 projects as the training set, and one project is left out as the validation set. The R², t-values, and p-values of the coefficients and the intercepts of all 19 validation rounds are summarized in Table 6.5.

Table 6.5: Phase 2 - OLS Regression Analysis
(each row's model is Effort = Coefficient × CMP + Intercept)

     Coefficient                     Intercept
ID   Value    t-value  p-value      Value     t-value  p-value  R²
1    0.27476  9.663    4.42e−08     −6.55571  −0.639   0.532    0.8537
2    0.11759  6.067    1.63e−05     25.81171  1.580    0.134    0.697
3    0.1187   6.150    1.40e−05     25.0533   1.555    0.140    0.7027
4    0.11770  6.080    1.59e−05     25.73660  1.577    0.134    0.6979
5    0.11849  6.128    1.46e−05     23.87854  1.474    0.16     0.7012
6    0.11777  6.078    1.60e−05     25.58319  1.566    0.137    0.6978
7    0.11812  6.094    1.55e−05     25.03     1.533    0.145    0.6989
8    0.11830  6.081    1.59e−05     24.53021  1.496    0.154    0.698
9    0.1177   6.067    1.63e−05     25.6207   1.567    0.137    0.697
10   0.11829  6.133    1.44e−05     23.17480  1.438    0.170    0.7015
11   0.11934  6.202    1.27e−05     22.08703  1.363    0.192    0.7063
12   0.09923  16.992   1.16e−11     18.41721  3.955    0.00114  0.9475
13   0.11782  6.070    1.62e−05     25.42006  1.554    0.140    0.6972
14   0.11845  6.109    1.51e−05     24.27992  1.487    0.156    0.7
15   0.11817  6.117    1.49e−05     25.26461  1.554    0.140    0.7004
16   0.1187   6.150    1.40e−05     25.0477   1.555    0.139    0.7028
17   0.11773  6.072    1.62e−05     25.60553  1.567    0.137    0.6974
18   0.11788  6.072    1.62e−05     25.32458  1.548    0.141    0.6974
19   0.1190   6.134    1.44e−05     23.1027   1.413    0.177    0.7016

The cross-validation results were determined by using the derived models to compute the predicted effort of the left-out project in each validation round (reported in Table 6.6). Table 6.6 shows that the MMRE value is 1.5155 and the prediction at level 0.25 is 0.5789. This result suggests that the CMP weights (or parameters) need further calibration.


ID CMP AE PE MRE
1 3232 332 881.4686 1.6550
2 18 3 27.9283 8.3094
3 493 65 83.5724 0.2857
4 56 8 32.3278 3.0410
5 387 75 69.7342 0.0702
6 54 10 31.9428 2.1943
7 118 25 38.9682 0.5587
8 44 24 29.7354 0.2390
9 21 6 28.0924 3.6821
10 480 100 79.9540 0.2005
11 124 70 36.8852 0.4731
12 1158 382 133.3256 0.6510
13 32 10 29.1903 1.9190
14 167 42 44.0611 0.0491
15 207 31 49.7258 0.6041
16 511 67 85.7034 0.2792
17 38 8 30.0793 2.7599
18 38 12 29.8040 1.4837
19 72 48 31.6707 0.3402
MMRE 1.5155
PRED(0.25) 0.5789

Table 6.6: Phase 2 - Results Evaluation

6.5 CMP Parameters Calibration

This section presents our attempt to calibrate the CMP model in order to increase its external validity, so that CMP can be more widely useful.

There are 37 parameters (or weights) in total in the CMP model (reflected in Tables 5.3, 5.6, 5.8, 5.10, and 5.11 of Chapter 5). The original values of these weights were defined in discussion with a group of Cloud engineers who had participated in Cloud migration projects. We asked each Cloud engineer for an individual judgment on each weight value, then averaged the values across all the engineers we consulted to derive a final value for each parameter.

These expert opinion weights can be further refined using data points collected from our survey on Cloud migration projects. Since the questions for each component of the CMP were asked separately, the data collected for each CMP component is also separate from the others; hence, we can split each survey response into the individual components of the CMP (connection, code, installation and configuration, and database). The number of data points available for each weight of each component is summarized in Table 6.7.

Table 6.7 clearly shows that there are 11 parameters without any data points (weight IDs 7, 8, 9, 10, 11, 13, 14, 19, 20, 22, and 23). These weights cannot be calibrated without data points; hence, we keep the expert opinion values for them. These values are candidates for adjustment when more data points become available.

There are 4 weights with only 1 data point each (weight IDs 24, 28, 29, and 30), and 7 weights with 2 data points each (weight IDs 3, 12, 15, 16, 17, 32, and 33). With so few data points, these values could easily be changed to improve the prediction level of the model, but doing so could lead to overfitting. Therefore, we decided that these weights do not have sufficient data points for calibration, and their expert opinion values are also kept for the time being. These values may likewise be subject to change as more data points become available.

The remaining 11 weights in Table 6.7, plus the 4 weights associated with the individual CMP components, have 3 or more data points each; hence, they are considered for the calibration process, although the number of data points for each weight is still not ideal.


Component  Type                Complexity  ID  Weight  # of Data
                                               Value   Points
CMPconn    LAN-to-LAN          Low         1   1       5
                               Average     2   3       3
                               High        3   4       2
           LAN-to-WAN          Low         4   1       3
                               Average     5   6       3
                               High        6   9       4
           WAN-to-LAN          Low         7   1       0
                               Average     8   6       0
                               High        9   9       0
CMPcode    Problem Domain      Low         10  3       0
                               Average     11  6       0
                               High        12  10      2
           Human Interaction   Low         13  4       0
                               Average     14  7       0
                               High        15  12      2
           Data Management     Low         16  5       2
                               Average     17  8       2
                               High        18  13      3
           Task Management     Low         19  4       0
                               Average     20  6       0
                               High        21  9       4
CMPic      Application         Low         22  1       0
                               Average     23  2       0
                               High        24  7       1
           Infrastructure      Low         25  1       5
                               Average     26  3       10
                               High        27  9       5
CMPdb      Query Modification  Low         28  1       1
                               Average     29  3       1
                               High        30  8       1
           Data Population     Low         31  3       6
                               Average     32  4       2
                               High        33  10      2

Table 6.7: Number of data points available to calibrate each weight of the CMP
model


Although, with the current dataset, the calibration process can be performed on at most 15 of the 37 weights in total, it is worth explicitly stating all assumptions made for each CMP component and its sub-elements. It is important to test the plausibility of those assumptions against the available data before performing any calibration. The validation process in the following section relies on the raw data from the survey responses, attached in Appendix B.

6.5.1 CMP Components’ Assumptions

The assumptions on each CMP component and its elements are stated as follows:

Network Connection Component: CMPconn

Assumption 1 There are three types of connection changes: LAN-to-LAN, LAN-


to-WAN, WAN-to-LAN.

Relevant projects in the survey responses show connection changes of the first two types only. None of the responses demonstrated WAN-to-LAN connection changes, or any types other than those proposed. Hence, this assumption is considered valid.

Assumption 2 Two types of connection change (LAN-to-WAN and WAN-to-LAN) have the same impact on the size of a migration task. The reasoning behind this assumption is that any change that makes a WAN connection become a LAN connection is essentially the reverse of a change that makes a LAN connection become a WAN connection, given that the source and destination of these connections remain unchanged. The effort required for carrying out those changes and their reversals is expected to be the same.

None of the survey responses encountered WAN-to-LAN connections; hence, we cannot validate this assumption at this stage. This assumption should be tested for its plausibility when more data on relevant projects becomes available.

Assumption 3 The other type of connection change (LAN-to-LAN) has a significantly different impact on the size of a migration task compared to the two types mentioned above. Essentially, the effort required to amend a LAN connection in the local environment to adapt to the new environment in the Cloud should be much less than the effort required for LAN-to-WAN and WAN-to-LAN connection changes.

This assumption is reflected quite clearly in projects 6, 7, 13, and 16. Projects 6 and 7 contain only LAN-to-WAN connections, while projects 13 and 16 have only LAN-to-LAN connections. The effort required for the connection component of projects 6 and 7 is significantly greater than that of the other two projects (10 and 20 hours vs. 2 and 2 hours). A similar observation holds for projects 7 and 14: both have 2 LAN-to-WAN connections, and project 14 also has an additional LAN-to-LAN connection, yet no difference in effort was observed between the two projects (20 hours each), indicating that the extra LAN-to-LAN connection contributed little. Therefore, this assumption is considered verified.

Assumption 4 The requirements for Protocol Optimization and/or Security each have a significantly different impact on the size and effort of the migration task.

The requirements for Protocol Optimization and/or Security are defined at 3 levels of complexity: Low, Average, and High (Table 5.2). Given the available data for the component CMPconn in Appendix B, project 5 has 1 average-complexity LAN-to-LAN connection and 1 average-complexity LAN-to-WAN connection, whereas project 6 has only 1 high-complexity LAN-to-WAN connection. The effort spent on this single high-complexity LAN-to-WAN connection of project 6 is twice that spent on the 2 average-complexity connection changes of project 5 (10 hours vs. 5 hours). A similar observation can be made for projects 11 and 12, where 4 average-complexity connections required 10 times the effort of 2 low-complexity connections (20 hours vs. 2 hours). Data on several other projects (such as 13 and 18) yield similar results. Therefore, this assumption is plausible.

Assumption 5 The relative impact of the three connection types and of the performance and security requirements can be represented by a set of significantly different weights (weight IDs 1 to 9 in Table 6.7).

Each individual weight embodies a specific assumption. Table 6.7 shows that only 5 weight IDs (1, 2, 4, 5, and 6) can be considered for the calibration exercise, as discussed above. The validation and calibration of these weights is presented in more depth in Section 6.5.2. The other weights, with very few data points, keep the values determined by expert opinion at this stage; these values may be subject to change if more data becomes available in the future.

Code Modification Component: CMPcode

Although this CMP component is mainly inherited from Class Point (Costagliola et al., 2005), in this section we still state all assumptions and validate them on the available data in our Cloud migration context.

Assumption 6 Four different types of class have a significant impact on the size of the migration tasks.

6 out of the 19 projects from our survey responses (projects 1, 3, 5, 12, 15, and 16) involve the code modification component, and they spread over the 4 types of class: Problem Domain, Human Interaction, Data Management, and Task Management. None of the responses suggested any type of class other than these 4. In these 6 projects, the effort required to modify the four types of class constitutes a major part of the total effort for the whole project (e.g., in project 1, 250 of 332 total hours (75%) were spent on code modification; in project 3, 40 of 65 total hours (62%)). Hence, this assumption is considered verified.

Assumption 7 There are three different types of change in a class: attributes, public methods, and services requested.

Data from the 6 corresponding projects show that all these types of class change were actually carried out during their migration, although the supporting data are not very clear and explicit. These types were also inherited from Class Point. Hence, we consider this assumption valid to a certain extent.

Assumption 8 Added and deleted elements, for each class type and each change type, have significantly different impacts on task size and effort, in the ratio 5 to 1.

This assumption is based on the suggestion of Niessink & Vliet (1997) that a removal task requires 0.2 times the effort of an addition task. Unfortunately, our data is not sufficient to test this assumption; it should be tested when more data becomes available in the future.

Assumption 9 The total size of unchanged elements is irrelevant to task size and effort.

The reason behind this assumption is that unchanged elements require no effort at all. None of the responses reported any effort spent on unchanged elements. Therefore, this claim holds.

Assumption 10 The impact of changes can be categorized into complexity levels based on ranges of the individual change counts, and counts greater than the upper value all have the same impact.

This is an important assumption, inherited from Class Point, and it has been validated in (Costagliola et al., 2005). The second part of the claim can cause problems for development effort estimation, a known issue of Function Point. However, in the Cloud migration context, this issue is less problematic, since the data from the survey responses show very few tasks with counts greater than the upper value.

Assumption 11 The differences between class types and complexity levels can be represented as a set of 12 weights, where each individual weight represents a specific assumption.

Table 6.7 shows that only 2 weight IDs (18 and 21) can be considered for the calibration process, as discussed above. The validation and calibration of these weights is presented in more depth in Section 6.5.2. The other weights, with very few data points, keep the values determined by expert opinion at this stage; these values may be subject to change when more data becomes available in the future.

Installation and Configuration Component: CMPic

Assumption 12 Two different types of package to be installed and configured in the Cloud (Application packages and Infrastructure packages) have a significant impact on the size of the migration tasks.

Data from the survey responses show that it is quite common for a migration project to have Infrastructure packages installed and configured in the Cloud (10 out of 19 responses). Only 1 project required Application packages (project 16). The amount of effort spent on these installation and configuration tasks is relatively significant compared to the other CMP components, especially for Infrastructure packages. Therefore, this assumption is considered valid.

Assumption 13 The installation method and the number of parameters to be configured have significantly different impacts on the size and effort of the migration task.

A package may require no installation at all, a simple installation from a binary installer, or a more complicated installation from source code, which requires extra effort to compile. These types require different amounts of effort. Project 1 required 80 hours to install 5 packages from source code with a large number of parameters to be configured, while project 2 required only 3 hours to install 3 packages from binary installers. A similar observation can be made for projects 5 and 13. Project 5 has 2 packages from binary installers and 3 packages from source code, whereas project 13 also has 2 packages from binary installers plus 3 packages requiring no installation at all; the former required 50 hours, the latter only 7. Other observations on other projects give similar results. Therefore, this assumption is considered valid.

Assumption 14 The impact of installation and configuration can be categorized into complexity levels based on the installation method and the number of parameters to be configured.

Although no data explicitly support this claim, and this type of assumption is very hard to verify, it intuitively makes sense given the different impacts of installation methods and the number of configured parameters on the size of migration tasks, as in the previous assumption.

Assumption 15 The differences between package types and complexity levels can be represented as a set of 6 weights, where each individual weight represents a specific assumption.

Table 6.7 shows that only 3 weight IDs (25, 26, and 27) can be considered for the calibration, as discussed above. The validation and calibration of these weights is presented in more depth in Section 6.5.2. The other weights, with very few data points, keep the values determined by expert opinion at this stage; these values may be subject to change when more data is available in the future.

Database Migration Component: CMPdb

Assumption 16 Four different types of database change have a significant impact on the size of the migration tasks.

Some projects have only the database migration component, such as project 4. For some other projects (such as 3, 8, 16, and 17), the database component is one of the major parts of the migration process (about 30% of total effort). These projects represent all four types of database change: same relational database and same version (project 3); same relational database but a different version, or different relational databases (projects 7 and 15); and relational to NoSQL databases (projects 4 and 16). Therefore, this assumption is validated.

Assumption 17 NoSQL databases have a significantly greater impact on the size of the migration tasks than relational databases.

Migrating a relational database to a NoSQL Cloud database requires more migration tasks; for example, populating data into a NoSQL database requires more than just a "sqldump" command, and JOIN operations from the relational database must be modified since NoSQL databases do not support JOIN. Hence, more effort is required for NoSQL databases. This assumption is supported by data from the survey responses. In particular, project 16 required 5 hours to populate data from a relational to a NoSQL database, while project 15 required only 2 hours to populate the same amount of data from a relational to another relational database in the Cloud. Project 1 required 2 hours to populate data from a relational to a relational database, while project 4 required 8 hours to populate twice as much data from a relational to a NoSQL database. All in all, this assumption is reasonable.


Assumption 18 Two different database migration tasks (query modification and data population) have a significant impact on the size of the migration tasks.

Our survey responses only report activities related to query modification, data population, or both. None of them raised any other database-related migration activities. Hence, we consider this assumption plausible, although more types of task might be added if future information suggests so.

Assumption 19 The impact of database migration tasks and complexity levels can be represented as a set of 6 weights, where each individual weight represents a specific assumption.

Table 6.7 shows that only 1 weight (weight ID 31) can be calibrated, as discussed above. The validation and calibration of this weight is presented in more depth in Section 6.5.2. The other weights, with very few data points, keep the values determined by expert opinion at this stage; these values are subject to change as more data becomes available.

Conclusion:
The above assumptions were made during our development of CMP. We have stated only the main, high-level assumptions at this stage; more assumptions can be extracted and tested when more information becomes available. These assumptions are essential because of the high complexity of a size metric for Cloud migration projects.

As can be seen, there are already too many assumptions, and too little information from the survey responses, to properly validate their plausibility. This shows the complexity and difficulty of validating the CMP metric at this stage in the Cloud migration context. Nonetheless, we attempted to test many of the assumptions with the available data from our survey.

6.5.2 The Calibration Process

In this section, the calibration is performed on the 15 weights that have three or more data points from the survey, as discussed in the previous section. The 15 weights from Table 6.7 are:

• Network Connection Component CMPconn: weight IDs 1, 2, 4, 5, and 6

• Code Modification Component CMPcode: weight IDs 18 and 21

• Installation and Configuration Component CMPic: weight IDs 25, 26, and 27

• Database Migration Component CMPdb: weight ID 31

• the 4 main weights, one for each CMP component, used to compute the final CMP value

The calibration is first performed on each CMP component individually, and then on the combination. For each CMP component, we perform a multiple regression on the tunable weights. For projects that also involve un-tunable weights, we use the expert opinion values of those weights. The data used for the calibration are attached in Appendix B.

The results of the multiple regression for each CMP component are presented as follows:

Network Connection Component: CMPconn


The multiple regression for this component uses 11 data points (projects 5, 6, 7, 10, 11, 12, 13, 14, 16, 17, and 18). The 5 tunable weights count as 5 input variables. The multivariate model is:

f = a1 × x1 + a2 × x2 + a3 × x3 + a4 × x4 + a5 × x5

where x1, …, x5 are the weights to be calibrated. The values of f and a1, …, a5 are taken from Table 6.8 (extracted from Table B.1).

Project ID a1 a2 a3 a4 a5 f
5 0 1 0 1 0 5
6 0 0 0 0 1 10
7 0 0 0 0 2 20
10 0 5 0 5 5 100
11 0 2 0 2 0 20
12 1 0 1 0 0 2
13 3 0 0 0 0 2
14 0 0 0 0 2 20
16 1 0 0 0 0 2
17 1 0 1 0 0 2
18 1 0 1 0 0 2

Table 6.8: Data points for calibrating network connection component weights

This multiple regression gives regression coefficients that are essentially new

values for these 5 weights as in Table 6.9.

Weight ID Old Value New Value


1 1 1.2
2 3 9.6
4 1 1.7
5 6 6
6 9 10.5

Table 6.9: Multiple Regression Coefficient Result for CMPconn
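The no-intercept multiple regression can be sketched with a linear least-squares solve over the count matrix of Table 6.8. One caveat is visible in the data itself: the count columns for weight IDs 2 and 5 are identical across these 11 projects, so the system is rank-deficient and numpy's `lstsq` returns the minimum-norm solution among the equally good fits; the calibrated values in Table 6.9 therefore need not coincide exactly with this sketch's output:

```python
import numpy as np

# Connection counts per tunable weight (columns = weight IDs 1, 2, 4, 5, 6)
# and recorded effort f (hours), copied from Table 6.8.
X = np.array([
    [0, 1, 0, 1, 0],   # project 5
    [0, 0, 0, 0, 1],   # project 6
    [0, 0, 0, 0, 2],   # project 7
    [0, 5, 0, 5, 5],   # project 10
    [0, 2, 0, 2, 0],   # project 11
    [1, 0, 1, 0, 0],   # project 12
    [3, 0, 0, 0, 0],   # project 13
    [0, 0, 0, 0, 2],   # project 14
    [1, 0, 0, 0, 0],   # project 16
    [1, 0, 1, 0, 0],   # project 17
    [1, 0, 1, 0, 0],   # project 18
], dtype=float)
f = np.array([5, 10, 20, 100, 20, 2, 2, 20, 2, 2, 2], dtype=float)

# No-intercept least squares: find w minimizing ||X @ w - f||.
w, _, rank, _ = np.linalg.lstsq(X, f, rcond=None)

# By construction, the fitted weights can do no worse in squared error
# than the original expert-opinion weights for IDs 1, 2, 4, 5, and 6.
expert = np.array([1, 3, 1, 6, 9], dtype=float)
sse_fit = float(np.sum((X @ w - f) ** 2))
sse_expert = float(np.sum((X @ expert - f) ** 2))
```

The same mechanics apply to the CMPcode, CMPic, and CMPdb regressions below, only with fewer input columns.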

Code Modification Component: CMPcode


The multiple regression for this component uses 6 data points (projects 1,

3, 5, 12, 15, and 16). The 2 tunable weights count as 2 input variables. This
multiple regression gives regression coefficients that are essentially new values for

these 2 weights as in Table 6.10.

Weight ID Old Value New Value


18 13 11.5
21 9 12.3

Table 6.10: Multiple Regression Coefficient Result for CMPcode

Installation and Configuration Component: CMPic


The multiple regression for this component uses 14 data points (projects 1, 2, 5, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, and 19). The 3 tunable weights count as 3 input variables. This multiple regression gives regression coefficients that are essentially new values for these 3 weights, as in Table 6.11.

Weight ID Old Value New Value


25 1 2.5
26 3 4.5
27 9 20.2

Table 6.11: Multiple Regression Coefficient Result for CMPic

Database Migration Component: CMPdb

This component has only 1 tunable weight. The regression for this component uses 6 data points (projects 1, 3, 13, 14, 17, and 18). The only tunable weight is used as the input variable. This regression gives a regression coefficient that is essentially the new value for this weight, as in Table 6.12.

Final CMP Value Calculation


Weight ID Old Value New Value


31 3 2.3

Table 6.12: Regression Coefficient Result for CMPdb

There are 4 weights used to calculate the final CMP value, as in Table 5.11 of Chapter 5. The multiple regression for them uses all 19 data points, calculated with the new weights from the calibration processes above. The 4 tunable final weights count as 4 input variables. This multiple regression gives regression coefficients that are essentially new values for these 4 weights, as in Table 6.13.

Weight ID Old Value New Value


34 3 0.7
35 5 0.5
36 2 1.1
37 1 0.4

Table 6.13: Regression Coefficient Result for the Final CMP

Conclusion:

There are 15 tunable weights out of 37 in total. The rest of the weights are kept unchanged because they have too few data points for calibration. The 15 calibrated weight values have changed quite significantly from the expert opinion values. The model with the new set of weights needs to be validated again to ensure its performance improves on the original one.

6.6 Empirical Validation - Phase 3

In this section, we perform an empirical validation similar to the first two phases on the new dataset of 19 data points. This dataset essentially originates from the survey, as in Phase 2; however, the final CMP values in this dataset are calculated based on the new set of weights calibrated in the previous section.

The new dataset, computed from the new set of weights, is presented in Table 6.14.

ID CMP Value Total Hours


1 382.3 332
2 9.9 3
3 92.2 65
4 22.4 8
5 82.8 75
6 12.6 10
7 29.2 25
8 24.2 24
9 11.55 6
10 112 100
11 51.1 70
12 293.7 382
13 14.45 10
14 52.6 42
15 47.85 31
16 93.95 67
17 14.9 8
18 14.9 12
19 39.6 48

Table 6.14: New dataset - calculated from the new set of calibrated weights
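As a quick sanity check on the new dataset, a single OLS fit over all 19 points of Table 6.14 (rather than the 19 leave-one-out fits) already exhibits the strong linear relationship that the validation rounds report; the exact coefficients of this sketch are indicative only:

```python
import numpy as np

# CMP values and total hours copied from Table 6.14.
cmp_vals = np.array([382.3, 9.9, 92.2, 22.4, 82.8, 12.6, 29.2, 24.2, 11.55,
                     112.0, 51.1, 293.7, 14.45, 52.6, 47.85, 93.95, 14.9,
                     14.9, 39.6])
hours = np.array([332, 3, 65, 8, 75, 10, 25, 24, 6, 100, 70, 382, 10, 42,
                  31, 67, 8, 12, 48], dtype=float)

# One OLS line over the whole dataset, plus its coefficient of determination.
slope, intercept = np.polyfit(cmp_vals, hours, 1)
r_squared = np.corrcoef(cmp_vals, hours)[0, 1] ** 2
```

With the calibrated weights, the slope moves close to 1 (roughly one hour per CMP unit), whereas the Phase 2 models in Table 6.5 had slopes around 0.1 on the uncalibrated CMP values.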

We perform 19 rounds of cross-validation on the new dataset. The result is


described in Table 6.15.

Table 6.15: Phase 3 - OLS Regression Analysis
(each row's model is Effort = Coefficient × CMP + Intercept)

     Coefficient                     Intercept
ID   Value    t-value  p-value      Value      t-value  p-value  R²
1    1.25749  18.903   2.28e−12     −16.47287  −2.857   0.0114   0.9571
2    1.02783  15.203   6.25e−11     −6.44734   −0.763   0.457    0.9353
3    1.03064  15.777   3.57e−11     −5.39675   −0.671   0.512    0.9396
4    1.02545  15.280   5.79e−11     −5.82892   −0.695   0.497    0.9359
5    1.02829  15.401   5.14e−11     −6.31798   −0.766   0.455    0.9368
6    1.02941  15.249   5.97e−11     −6.80407   −0.806   0.432    0.9356
7    1.02849  15.310   5.62e−11     −6.61778   −0.789   0.442    0.9361
8    1.02977  15.326   5.53e−11     −6.94611   −0.828   0.42     0.9362
9    1.02833  15.220   6.14e−11     −6.55842   −0.777   0.449    0.9354
10   1.03007  15.405   5.12e−11     −6.16652   −0.755   0.461    0.9368
11   1.03133  15.802   3.49e−11     −8.07971   −0.995   0.33     0.9398
12   0.86969  30.930   1.06e−15     −1.55872   −0.532   0.602    0.9836
13   1.02869  15.243   6.01e−11     −6.64232   −0.788   0.442    0.9356
14   1.02739  15.383   5.23e−11     −6.14267   −0.739   0.47     0.9367
15   1.02629  15.42    5.03e−11     −5.71958   −0.69    0.5      0.937
16   1.03086  15.771   3.6e−11      −5.42361   −0.674   0.51     0.9396
17   1.02780  15.231   6.08e−11     −6.43739   −0.763   0.456    0.9355
18   1.02923  15.258   5.92e−11     −6.76964   −0.803   0.434    0.9357
19   1.03091  15.506   4.64e−11     −7.48460   −0.903   0.38     0.9376

The cross-validation results were determined by using the derived models to compute the predicted effort of the left-out project in each validation round (reported in Table 6.16).

Table 6.16 shows that the MMRE value is 0.2947 and the prediction at level 0.25 is 0.9474.


ID    CMP     Actual Effort (AE)   Predicted Effort (PE)   MRE
1     382.3   332                  464.2656                0.3984
2     9.9     3                    3.7282                  0.2427
3     92.2    65                   89.6283                 0.3789
4     22.4    8                    17.1412                 1.1426
5     82.8    75                   78.8244                 0.0510
6     12.6    10                   6.1665                  0.3834
7     29.2    25                   23.4141                 0.0634
8     24.2    24                   17.9743                 0.2511
9     11.55   6                    5.3188                  0.1135
10    112     100                  109.2013                0.0920
11    51.1    70                   44.6213                 0.3626
12    293.7   382                  253.8692                0.3354
13    14.45   10                   8.2223                  0.1778
14    52.6    42                   47.8980                 0.1404
15    47.85   31                   43.3884                 0.3996
16    93.95   67                   91.4257                 0.3646
17    14.9    8                    8.8768                  0.1096
18    14.9    12                   8.5659                  0.2862
19    39.6    48                   33.3394                 0.3054
MMRE                                                       0.2947
PRED(0.25)                                                 0.9474

Table 6.16: Phase 3 - Results Evaluation
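The MRE column follows from MRE = |AE − PE| / AE, and MMRE is its mean over all projects; a minimal sketch, checked against a few rows of Table 6.16:

```python
def mre(actual, predicted):
    """Magnitude of relative error of one prediction."""
    return abs(actual - predicted) / actual

def mmre(actuals, predictions):
    """Mean magnitude of relative error over a set of projects."""
    return sum(mre(a, p) for a, p in zip(actuals, predictions)) / len(actuals)

# Rows 5, 7, and 10 of Table 6.16: (actual effort AE, predicted effort PE)
rows = [(75, 78.8244), (25, 23.4141), (100, 109.2013)]
for ae, pe in rows:
    # Reproduces the table's MRE values 0.0510, 0.0634, 0.0920
    print(round(mre(ae, pe), 4))
```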

Conclusion:

This new MMRE value shows a significant improvement over Phase 2 (MMRE = 1.5155). Although the new MMRE value is still greater than the recommended 0.25 level, it is much closer to this value after the calibration. We strongly believe that when more data on Cloud migration projects become available in the future, further calibration can be performed on the other weights as well. In addition, the prediction at level 0.25 is 0.9474, which is higher than the standard value of 0.75, further supporting the claim that the CMP model can be a potential predictor for Cloud migration effort estimation.


6.7 Threats to Validity and Discussion

The validation process described in this chapter has shown that the CMP model can help enterprises map out the migration tasks for their Cloud migration projects, and that it can be a potential predictor for migration effort estimation. However, in order to generalize this claim to the whole population, we need a much larger dataset to calibrate the parameters, and a different, similarly large dataset to validate the model. We divided the dataset into multiple subsets for the calibration and then used the whole dataset for the validation, which could increase the reliability of the validation results to some extent. Having said that, the results of the validation were still biased. This threat to validity is unavoidable at this stage; in the future, when more data points become available, a full exercise of calibration and validation can be executed again using the same methodology presented here. Moreover, because of the limited number of data points we could secure from the interviews and survey, all 19 data points were used, which could also affect the validity of the dataset.

The four components of the CMP model and their steps were only validated with internal and self-developed projects. Questions in the survey also invited suggestions and insights on additional tasks that may be required in a Cloud migration project; however, no relevant comments were received. The CMP model itself, beyond the weights, would need to be further validated with external projects if these become available in the future.

In order to increase the validity of the model, the quality and quantity of data points from the survey can be further improved by:

• Clarifying ambiguous questions. For example, the question on how much time was spent migrating data was not clear enough: it could be understood as covering only the transfer of data to the Cloud, or as also including idle/waiting time during the transfer, which does not require any effort.

• Some parts of the model require detailed information on specific tasks; however, in almost every case, the respondents did not keep track of it. For example, for questions on how many classes were modified in the Code Modification section, some answers were pure guesses (as commented by the respondents). These answers were discarded.

• Some answers indicated that the total number of hours included learning time, but it was not clear how much time was spent on learning and how much on the actual tasks. This can be overcome by modifying the questions, for example: how much time was spent on this task the first time? How much the second time? However, this would make the question list longer and possibly more tedious.

• Some answers did not indicate whether the effort included learning time or not. This should be made explicit by modifying the questions.

6.8 Summary

In this section, we have presented our process of theoretically and empirically validating the metric over three phases.

In phase 1, the CMP metric was first validated using our initial dataset of 6 small-scale migration projects conducted by our group. The result gives a good


indication that CMP can be a potential predictor for effort estimation in some Cloud migration cases. We then conducted a survey to collect data about past Cloud migration projects from external organizations; the motivation for this study was to further validate the CMP model externally. A survey and some interviews were the best approach for our data collection purpose, because no existing data on this topic are available.

In phase 2, we validated the CMP metric using the dataset from the survey. The result indicates that the CMP metric needs further calibration to improve its performance. In this phase, we also listed a set of assumptions on the structure of the CMP metric and tested their plausibility using the available data from the survey. The tunable weights (15 out of 37) were also calibrated using a multiple regression approach.


In phase 3, the CMP model was validated again with the new set of calibrated weights. The result of this phase improves significantly on phase 2 and gets very close to the standard requirement. This indicates that the CMP model can be a predictor for effort estimation in the Cloud migration context. It also suggests that when more data become available in the future, the performance of the CMP model can be further improved with more calibration on the other weights as well.

Chapter 7

Conclusions and Future Directions

"The more you understand what is wrong with a figure, the more
valuable that figure becomes."

∼ Lord Kelvin.

The main objective of this thesis is to understand Cloud migration projects and the associated cost implications. In particular, migrating a legacy system from a local server to the Cloud requires different migration tasks to be carefully planned and performed. Different types of migration task may have significantly different impacts on the migration effort. It is important to identify possible migration tasks and quantify their impact on the migration effort early in a Cloud migration project, so that enterprises can make well-informed decisions on whether it is worth migrating to the Cloud. On the other hand, this is challenging because Cloud computing is still relatively immature, and there is very little


related work on the topic of interest. Moreover, Cloud migration projects vary in many dimensions (different types of Cloud, different types of system/application, different types of migration requirement), and it is challenging to fully understand them.

In this thesis, we have achieved our research goals to understand Cloud migration projects. We have identified influential cost factors (internal cost factors and external cost factors) of a Cloud migration project. We also proposed a taxonomy of possible migration tasks that a migration project might encounter.


A size metric was developed to measure the size of a Cloud migration project. The size of a migration project gives a good indication of how much effort is anticipated for the project.

This chapter concludes the thesis with the following structure. In Section 7.1, we summarize the main studies and findings of this research. Section 7.2 elaborates on how this research has achieved our research goals and how it contributes to the software engineering domain within the Cloud migration context. The limitations of this research are presented in Section 7.3, and Section 7.4 suggests directions for future research.

7.1 Research Summary

For a better understanding of a Cloud migration project, we undertook several Cloud migration exercises and captured our understanding of how migration projects are conducted, in the form of a list of potential migration tasks that might be involved in a Cloud migration project. Our experiment was to migrate the PetShop .Net application from a local server to Windows Azure and SQL


Azure. The migration of Java PetStore into Amazon EC2 and SimpleDB was also investigated to add more richness to our findings.

The report on our migration experiences helped us identify some influential cost factors that impact the effort of the migration process, both internal and external. The internal cost factors indicate what migration tasks are required, such as compatibility issues, library dependencies, database features, and connection issues. The external cost factors determine how fast those tasks can be accomplished, such as the project team's capabilities, existing knowledge and experience of Cloud providers and technologies, and selecting the correct Cloud platforms and services.

Some of these influential cost factors are specific to migration to the Cloud, because they are not applicable to a conventional migration project from one platform to another. For example, a migration project from Java to .Net is a complete rewrite, and it would not have compatibility, database, connection, or possibly library dependency issues. A migration project from an old version to a newer version of a platform or environment would not have the networking, library dependency, or database feature issues discussed above. These factors, one way or the other, all affect the effort spent on the Cloud migration process.

The list of internal cost factors, together with related work from the literature review and practitioners' blogs, enabled us to generalise and propose a taxonomy of migration tasks that any migration project may encounter. The migration tasks are grouped under six categories: Training and Learning, Installation and Configuration, Database Migration, Code Modification, Network Connection, and Testing. These categories are mutually exclusive, since they cover different aspects of a Cloud migration project; at the same time, they complement each other and together provide a complete picture of migration to the Cloud. These categorized migration tasks need to be carefully planned at the early stage of any migration project. Some tasks may be broken down into more detailed levels, whereas others may be skipped, depending on the specific characteristics of each project.

Amongst the many cost factors affecting traditional software development effort, project size is considered the main cost driver, and it has been used in many cost estimation models. We apply this theory to the context of migration to the Cloud: migration project size also significantly influences migration effort. No size measurement exists in the literature for migration projects to the Cloud; therefore, we developed our CMP model for sizing Cloud migration projects by casting the well-known Function Point (FP) measurement into our context of interest. The difference between these two contexts is that traditional software development focuses on functionality development, whereas the Cloud migration context is concerned with porting an existing system into a Cloud environment. As a result, size metrics for these two contexts also differ. Size metrics for functionality development measure the product (i.e., the components, classes, or functions to be developed), whereas size metrics for migration tasks measure both the process (i.e., the migration tasks to be carried out) and the product (i.e., the related parts of the system to be migrated).

CMP extends FP not by adding more elements to the existing FP method, but by adopting the three-step approach of FP:

1. Classify the basic estimating units (a function in the FP context, a class in the Class Point (Costagliola et al., 2005) context, and a migration task in the CMP context) into different pre-defined categories

2. For each unit, evaluate its complexity level (Low, Average, or High)

3. Finally, compute the final sizing value
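The three steps above reduce the sizing computation to a weighted count of classified tasks. A minimal sketch; the category names and weight values below are hypothetical, not the calibrated CMP weights:

```python
# Hypothetical weight table: category -> complexity level -> weight
WEIGHTS = {
    "database":   {"Low": 3, "Average": 5, "High": 8},
    "code":       {"Low": 4, "Average": 6, "High": 9},
    "connection": {"Low": 2, "Average": 4, "High": 6},
}

def size(tasks):
    """Step 3: sum the weight of each classified and rated migration task."""
    return sum(WEIGHTS[category][complexity] for category, complexity in tasks)

# Steps 1 and 2 happen when each task is assigned a category and a complexity
project = [("database", "High"), ("code", "Average"), ("connection", "Low")]
print(size(project))  # 8 + 6 + 2 = 16
```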

Apart from the FP methodology, CMP is also developed on the basis of the proposed taxonomy of Cloud migration tasks. The CMP model measures the accumulated size of all migration tasks making up the migration project; therefore, the taxonomy can easily be used as the input to the CMP model. After carefully analyzing all categories of the taxonomy, the CMP model was determined to include four main components: Installation and Configuration, Database Migration, Code Modification, and Network Connection. These components capture distinct aspects of a migration project to the Cloud; therefore, the CMP model has been developed to cover all these aspects separately. Each of these CMP components was developed using the FP three-step approach. The final CMP value is then calculated as a weighted sum of its four components CMP_conn, CMP_code, CMP_ic, and CMP_db, which measure the size of migration tasks related to connection changes, code changes, installation and configuration, and database changes, respectively. The 37 weight values assigned to the migration tasks in the CMP model are expert opinion values, initially derived from our discussions with a group of Cloud engineers who have carried out different types of Cloud migration projects themselves.
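The aggregation described above can be sketched as follows; the component values and unit weights here are placeholders, not the expert-opinion weights from the thesis:

```python
def total_cmp(cmp_ic, cmp_db, cmp_code, cmp_conn,
              weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four CMP components (placeholder weights)."""
    w_ic, w_db, w_code, w_conn = weights
    return (w_ic * cmp_ic + w_db * cmp_db
            + w_code * cmp_code + w_conn * cmp_conn)

# Component sizes are measured separately, then aggregated into one CMP value
print(total_cmp(cmp_ic=12.0, cmp_db=20.5, cmp_code=8.0, cmp_conn=4.5))  # 45.0
```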


The CMP model has been developed as an important software size measure for Cloud migration projects. Our study shows that CMP is more suitable for Cloud migration projects than other existing size metrics in the literature, since it captures special aspects of the Cloud migration context. Moreover, CMP emphasises


the specific features of the Cloud migration process, such as the fact that some required third-party libraries are not readily available in the Cloud as they are in the local data centre. This is not so much an issue when migrating between two local data centres, because third-party libraries can usually be reused without major changes. Another Cloud feature reflected in the CMP model is that Cloud users (or developers) do not possess full control over the Cloud environment as they do in a local data centre. This results in a limited range of actions for each migration task. Therefore, the CMP model takes into consideration Cloud-specific dependencies for each migration task; for example, only security and protocol optimisation are assessed for connection tasks, and database tasks are concerned with migrating from relational to NoSQL databases.

In a project development cycle, the CMP model fits into the pre-implementation phase, after the design phase. One important assumption for CMP is that all design decisions have been made. These design decisions have a direct impact on how CMP is counted, since they define all anticipated migration tasks. The CMP counting process itself should not require much training or effort; however, its accuracy relies on the completeness and granularity of the migration task list. Therefore, it is important to carefully analyse the list of expected migration tasks to ensure it captures all the Cloud migration aspects adequately and in as much detail as possible.

Briand et al. (1996) proposed a list of properties for product sizing metrics, while CMP relates to both process and product. The CMP model has been shown to meet all the requirements from (Briand et al., 1996); however, additional properties for process sizing metrics (and hybrid process-product sizing metrics) would be ideal. Therefore, the validation of CMP is mainly empirical.


The empirical validation was to justify the usefulness of the CMP size measurement as a significant indicator of migration effort to the Cloud. The empirical validation is divided into three phases. In phase 1, we evaluated the CMP model with its initial set of weights, as presented in Chapter 5, using our initial set of 6 Cloud migration projects. Because of the limited number of data points publicly available, the data used in this first phase were extracted from a number of small-scale projects conducted at NICTA. We followed a leave-one-out cross-validation approach on this dataset: we performed six rounds of validation, each using five projects as the training set and leaving one project out as the validation set. The cross-validation result shows that the MMRE value is 0.199 and the prediction at level 0.25 is 0.833. This result suggests that CMP is a good predictor of effort for some of the Cloud migration projects that have been considered.
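PRED(l), the prediction at level l, is the proportion of projects whose MRE does not exceed l; a minimal sketch, using made-up effort figures rather than the phase 1 dataset:

```python
def pred(actuals, predictions, level=0.25):
    """Fraction of projects with MRE <= level, i.e. PRED(level)."""
    mres = [abs(a - p) / a for a, p in zip(actuals, predictions)]
    within = sum(1 for m in mres if m <= level)
    return within / len(mres)

# Illustrative: 5 of these 6 predictions fall within 25% of the actual effort
actuals = [10, 20, 30, 40, 50, 60]
predictions = [11, 19, 33, 39, 52, 90]
print(pred(actuals, predictions))  # 5/6, about 0.8333
```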

However, to have more confidence in the CMP model, more data on external projects were required to further validate it. Hence, at the beginning of phase 2, we conducted a survey to collect data on past migration projects to the Cloud from external organizations. We had to conduct this survey because, unlike data on the development effort of traditional software projects, the data of interest do not exist in any public repositories. Data were collected mainly via web surveys, with some additional interviews. The studied population includes project teams from NICTA and individual practitioners who have migrated their systems to the Cloud. The practitioners were identified from the Cloud community and online discussions. Interviews were conducted with NICTA's project teams to gain more insights and more detailed data, and surveys were sent to a list of identified practitioners. The study was conducted on the entire population due


to its limited size.

We sent out more than 300 surveys to different target audiences, including academic researchers, industrial groups and companies, and individual practitioners. We received more than 30 responses (10%), but some of them were incomplete. The main reason for this low response rate is that most of the projects were done for exploration and tutorial purposes; hence, no detailed information was recorded, especially some of the information required for calculating CMP. Most respondents could easily answer general questions on why they migrated to the Cloud, or how they generally did so, but most failed to provide sufficient information at the design level of the migration tasks. After careful analysis, we obtained a new dataset of 19 data points.

In phase 2, we performed the same analysis as in phase 1, with 19 rounds of validation: each round uses 18 projects as the training set and leaves one project out as the validation set. The result shows that the MMRE value is 1.5155 and the prediction at level 0.25 is 0.5789. This result suggests that the CMP weights (or parameters) need further calibration.

There are 37 parameters (or weights) in total in the CMP model. The original values of these weights were defined through discussion with a group of Cloud engineers who have participated in Cloud migration projects. We asked each Cloud engineer for an individual judgment on each weight value, then averaged the values across all the engineers to derive a final value for each parameter. These expert opinion weights can be further refined using data points collected from our survey on Cloud migration projects. The available data show that only 15 of the 37 weights have sufficient information for calibration. The remaining 22 weights are kept unchanged, with their expert opinion values. However, these values would also be subject to change when more data points become available.

Although the calibration process could be performed on at most 15 of the 37 weights in total, we explicitly stated all assumptions made for each CMP component and its sub-elements. It is important to test the plausibility of those assumptions given the available data before performing any calibrations. With the available data, a few assumptions still do not have sufficient information to be tested. We attempted to test as many of the assumptions as possible to show that they are plausible.

Calibration was performed on each CMP component individually, and then on all of them together. For each CMP component, we performed multiple regression on the tunable weights. The 15 calibrated weight values changed quite significantly from the expert opinion values. The model with the new set of weights needed to be validated again to ensure its performance improved over the original one.
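The calibration step can be sketched as an ordinary least-squares fit of tunable weights against observed effort. The two-weight example below, solved via the 2×2 normal equations, is illustrative only; the thesis calibrates up to 15 weights with standard multiple regression:

```python
def calibrate_two_weights(x1, x2, effort):
    """Least-squares fit of effort = w1*x1 + w2*x2 (no intercept),
    solved via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    t1 = sum(a * y for a, y in zip(x1, effort))
    t2 = sum(b * y for b, y in zip(x2, effort))
    det = s11 * s22 - s12 * s12
    w1 = (s22 * t1 - s12 * t2) / det
    w2 = (s11 * t2 - s12 * t1) / det
    return w1, w2

# x1, x2: counts of two task types per project; effort: observed hours (made up)
w1, w2 = calibrate_two_weights([1, 2, 3, 0], [0, 1, 1, 2], [2, 7, 9, 6])
print(w1, w2)  # recovers w1 = 2.0, w2 = 3.0 for this exactly linear data
```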

In phase 3, we performed a similar empirical validation to the first two phases on the new dataset of 19 data points. This dataset originates from the survey as in phase 2; however, the final CMP values in this dataset were calculated using the new set of calibrated weights. The new MMRE value (0.2947) shows a significant improvement over phase 2 (MMRE = 1.5155). Although the new MMRE value is still greater than the standard 0.25 level, it is much closer to this value after the calibration. We strongly believe that when more data on Cloud migration projects become available in the future, further calibration can be performed on the other weights as well. In addition, the prediction at level 0.25 is 0.9474, which is higher than the standard value of 0.75, further supporting the claim that the CMP model can be used as a reliable predictor for Cloud migration effort estimation.


7.2 Research Contribution

This research has answered the research questions stated in Section 1.3. Through the research process described in Chapter 3, we have come to understand what migration tasks are required for a Cloud migration project and how they can be classified. Our understanding is captured in the taxonomy of migration tasks presented in Chapter 4. We have also understood the cost implications of those tasks for the Cloud migration effort; our view on this is illustrated with the CMP model in Chapter 5. The CMP model can be useful for multiple purposes: (1) helping enterprises map out their migration tasks; (2) identifying the complexity of each task, so that staff with the right skills can be assigned tasks accordingly; and (3) estimating the total effort required for the migration project.

To date, no other research has focused on the migration effort aspect of software engineering in Cloud computing. In this section, we elaborate on the primary contributions of this study to the common knowledge of the domain of interest.

Contribution 1 This research has initiated the application of effort estimation and size measurement concepts from traditional software engineering to the Cloud computing domain.

One of our contributions, which can also be seen as one of the difficulties we encountered, is that no related research with the same focus on Cloud migration effort was available; hence, the list of migration tasks, the influential cost factors, and validation data could not be gathered from the literature review. We had to explore and develop everything from scratch. For example, we carried out a series of migration experiments to understand how migration projects happen, what migration tasks are required, and what impact they have on the migration effort. We performed more migration projects ourselves to initially validate our model. We conducted surveys and interviews on external projects to collect more data for further validation. All these activities can be useful to other research with a similar focus, whether as a starting point or for comparative study.

Contribution 2 This research has identified critical cost factors of Cloud migration effort.

We identified different factors that have a significant impact on the migration effort, categorized into internal and external factors. This is aligned with traditional size measurement approaches. This research adds to the existing body of knowledge on size measurement cost drivers, offering further critical factors in the Cloud migration context.

Contribution 3 This research has proposed a taxonomy of migration tasks to the Cloud.

The taxonomy outlines the possible migration tasks that any migration project to the Cloud may encounter. It enables Cloud practitioners to gain an understanding of the combination of tasks involved in a Cloud migration project and their implications for the amount of effort required. We derived these tasks from our series of migration experiments, covering different application types and different Cloud providers.

Contribution 4 This research has developed a size metric, Cloud Migration Point (CMP), for estimating the size of Cloud migration projects, by recasting a well-known software size estimation model called Function Point (FP) into the context of Cloud migration.

We adopted the three-step approach of the FP model to estimate the size of the individual components involved in a migration project. In particular, we focused on the Cloud-relevant components of the migrated systems: connection changes, database migration, code modification, and installation and configuration for the new environment in the Cloud. For each component, we performed the measurement by identifying the relevant activities that contribute to the overall effort required for that component. Finally, we aggregated all individual measurements into a single CMP value by calculating their weighted sum. The CMP value indicates how large the migration project is, and it can be used as an indicator for Cloud migration effort estimation.

Contribution 5 This research has described the survey protocol to collect data
on past Cloud migration projects.

We conducted a survey with external organizations and individuals to collect data on how they migrated their systems to the Cloud and how much time they spent on the migration tasks. The response rate was quite low, because many of the migration exercises were mainly for exploration purposes and not many practitioners kept track of the time spent on each individual task. The survey questionnaire and its protocol can certainly be re-used and improved to collect more data on a wider range of projects.

Contribution 6 This research has demonstrated that the proposed metric is practically useful as an indicator of migration effort estimation.


We validated our CMP model by conducting an empirical evaluation, which shows that the metric is practically useful under a defined set of assumptions. This research has outlined and justified each step of the validation phase. The calibration process has been described, and it can be re-applied to calibrate the model further when a larger dataset is available.

Conclusion:
Our overall contribution is to shed light on Cloud migration and the tasks involved, enabling Cloud practitioners to estimate the amount of effort required for the migration of legacy systems to the Cloud. This contributes towards the cost-benefit analysis and the decision on whether it is worth moving to the Cloud.

7.3 Research Limitations

Several limitations to this research have been identified, for example:

1. This research involved many exploratory activities, and at this stage the results cannot be generalized to the population of Cloud migration projects as a whole, because there is not enough data. However, the process of undertaking all activities in this research has been carefully recorded and justified, and it can certainly be re-applied to a larger set of data to generalize the results.

2. Data collection was done mainly via web surveys, and the questions and responses depend on the respondents' personal interpretation and memory. In-person interviews would give more reliable and accurate responses, because they allow both interviewer and interviewee to clarify any confusion, and they yield more insights from the interviewee. However, we were not able to conduct many interviews, because of time and geographical constraints.

3. The low response rate of the survey (10%), together with the self-selected nature of the sample (not everyone contacted responded to the survey), raises concerns about the reliability of the responses. The results from these responses are not representative enough to be generalized to the entire population.

4. For applications that require code modification for the Cloud environment, CMP only assesses application code changes at the "class" level, employing Class Point for the Code Modification component. Hence, the CMP model is only applicable to object-oriented applications, while there are still numerous legacy applications that are not object-oriented and that could be migrated to the Cloud.

5. The calibration and validation of the CMP model were undertaken with a small number of data points. The response rate from the survey was quite low (around 10%), and some responses were incomplete, because not many respondents actually recorded how long they spent on each migration task. Most of the projects in the responses were small and medium projects; it was very hard to conduct surveys or interviews with large organizations. The model would be validated more effectively with data from larger-scale projects.

6. This research used the same data to calibrate the model parameters as for the final validation. This may result in overfitting, where the accuracy of the model may not carry over to other datasets. We divided the dataset into multiple subsets for the calibration and then used the whole dataset for the validation, which could increase the reliability of the validation results to some extent; having said that, the results of the validation were still biased. This threat to validity is unavoidable at this stage. In the future, when more data points become available, a full exercise of calibration and validation can be executed again using the same methodology, which is a worthwhile future direction for this research.

7. The four components of the CMP model and their steps were only validated with internal and self-developed projects. Questions in the survey also invited suggestions and insights on additional tasks that may be required in a Cloud migration project; however, no relevant comments were received. The CMP model itself, beyond the weights, would need to be further validated with external projects if these become available in the future.

7.4 Future Research Directions

Accurate effort estimation has always attracted a lot of attention in the traditional software engineering community because of its difficulty and complexity. Casting this concept into the Cloud migration context increases the difficulty, because there are even more angles to investigate. This research has investigated several aspects of this problem: exploring the cost implications of Cloud migration projects, identifying internal and external cost factors, proposing a taxonomy of migration tasks, and developing a size metric as an indicator for migration effort estimation. However, other important aspects also require investigation in order to accurately estimate the required effort, and to better assist enterprises in deciding whether it is worth migrating to the Cloud.

Some further research directions worth pursuing have emerged as a result of


this research, for example:

1. This research focuses mainly on internal cost factors of Cloud migration projects. The survey collected data on how some external factors affect the migration effort (Appendix B), but these were not incorporated in our result. Future research can use these data to investigate the list of external cost factors further, determining whether they are genuine cost factors and whether the list is complete. This can be tackled by examining the causal relationship between the cost factors and the required effort. However, a causal relationship is very hard to prove in the software engineering domain because the factors involved are normally closely coupled.

2. An effort estimation model can be developed for Cloud migration projects. The size metric developed in this thesis can be used as an important input variable to such an effort estimation model. All that is required is a sufficient set of data on the effort spent on past migration projects. This future research can be achieved with a wider-ranging survey, more interviews with larger organizations, or proper case studies.
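As an illustration of how the size metric could feed an effort estimation model, the sketch below fits the classic power-law form effort = a × size^b to hypothetical historical data. The figures and the choice of model are assumptions for demonstration, not results from this research.

```python
import numpy as np

# Hypothetical historical data: CMP size values and measured effort
# (person-hours) for past migration projects. Illustrative only.
cmp_size = np.array([12.0, 25.0, 40.0, 55.0, 80.0, 110.0])
effort = np.array([30.0, 70.0, 120.0, 160.0, 260.0, 380.0])

# Fit effort = a * size^b, linearized as
# log(effort) = log(a) + b * log(size).
b, log_a = np.polyfit(np.log(cmp_size), np.log(effort), 1)
a = np.exp(log_a)

def predict_effort(size_cmp: float) -> float:
    """Predicted effort (person-hours) for a project of the given CMP size."""
    return a * size_cmp ** b

print(round(predict_effort(60.0), 1))
```

With real project data, the same fit (or a richer regression including cost drivers) would turn the CMP size into an effort estimate directly.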

3. This research developed one type of size metric for Cloud migration projects.


We believe that the methodology employed is the best suited for this purpose. However, future research can explore different methodologies to build different size metrics for the same context. A comparative study can then be performed to decide which size metric provides the most accurate effort predictions. MMRE has been quite widely criticised for its accuracy when it comes to selecting the best model. Hence, the comparative study should also consider other alternatives, such as MMER (Mean Magnitude of Error Relative to the Estimate) or RSD (Relative Standard Deviation) (Foss et al., 2003).
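For reference, the two magnitude-of-error criteria named above can be computed as shown below. The actual/predicted values are hypothetical, and RSD is omitted for brevity.

```python
def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean(|y - yhat| / y)."""
    return sum(abs(y - yh) / y for y, yh in zip(actual, predicted)) / len(actual)

def mmer(actual, predicted):
    """Mean Magnitude of Error Relative to the Estimate: mean(|y - yhat| / yhat)."""
    return sum(abs(y - yh) / yh for y, yh in zip(actual, predicted)) / len(actual)

# Hypothetical actual vs. predicted effort values (person-hours).
actual = [100.0, 200.0, 400.0]
predicted = [120.0, 180.0, 500.0]

print(round(mmre(actual, predicted), 3))
print(round(mmer(actual, predicted), 3))
```

The two criteria penalise over- and under-estimates differently (MMRE divides by the actual, MMER by the estimate), which is why a comparative study should report more than one of them.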

4. This research only examines object-oriented applications to be migrated to the Cloud. Future research can explore different options for the Code Modification Component rather than employing the "Class Point" measurement, so that the model is applicable to other types of applications and is not limited to object-oriented systems.

5. This research aims to understand the cost of migrating a system to the Cloud. Future research can use this result as a component of a cost-benefit framework to assist the decision of whether one should migrate to the Cloud. The inputs to this framework are the type of system currently in use, the type of Cloud targeted, and the performance requirements. The framework would then quantify both the benefits and the costs of having the system in the Cloud. Its output can help enterprises conclude whether the benefits outweigh the costs and whether moving to the Cloud is a wise decision.
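A deliberately simple sketch of such a cost-benefit comparison is shown below. The break-even rule and all monetary figures are hypothetical; a real framework would also quantify performance requirements, risk, and less tangible benefits.

```python
def should_migrate(migration_cost: float,
                   yearly_cloud_cost: float,
                   yearly_onpremise_cost: float,
                   horizon_years: int) -> bool:
    """Return True if cumulative running-cost savings over the planning
    horizon outweigh the one-off migration cost (a naive break-even rule)."""
    yearly_saving = yearly_onpremise_cost - yearly_cloud_cost
    return yearly_saving * horizon_years > migration_cost

# Hypothetical figures: $50k migration cost, $30k/yr in the Cloud vs.
# $55k/yr on premises, evaluated over a 3-year horizon.
print(should_migrate(50_000, 30_000, 55_000, 3))
```

The migration-cost input here is exactly what a CMP-based effort estimate, multiplied by a labour rate, would provide.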

6. The validation in this research is empirically based, because there is no well-established framework for theoretical validation. Future research can establish a mathematical framework for validating size-complexity hybrid metrics, which relate to both processes and products.

Bibliography

Abadi, D.J. (2009). Data management in the cloud: Limitations and opportu-
nities. IEEE Data Eng. Bull., 32, 3–12.

Abran, A. (1999). Functional size measurement for real time and embedded soft-
ware. In Proceedings of the 4th IEEE International Symposium and Forum on
Software Engineering Standards, 259–, IEEE Computer Society, Washington,

DC, USA. 39, 42

Abran, A. & Maya, M. (1995). A sizing measure for adaptive maintenance

work products. In Proceedings of the International Conference on Software


Maintenance, ICSM ’95, 286–, IEEE Computer Society, Washington, DC, USA.

125

Abran, A. & Robillard, P.N. (1994). Function points: a study of their

measurement processes and scale transformations. J. Syst. Softw., 25, 171–


184. 39

Aggarwal, S. & McCabe, L. (2009). The compelling tco case for cloud com-

puting in smb and mid market enterprises. Whitepaper, sponsored by NetSuite.


3


Agrawal, R., Ailamaki, A., Bernstein, P.A., Brewer, E.A., Carey,

M.J., Chaudhuri, S., Doan, A., Florescu, D., Franklin, M.J.,


Garcia-Molina, H., Gehrke, J., Gruenwald, L., Haas, L.M.,

Halevy, A.Y., Hellerstein, J.M., Ioannidis, Y.E., Korth, H.F.,


Kossmann, D., Madden, S., Magoulas, R., Ooi, B.C., O’Reilly, T.,

Ramakrishnan, R., Sarawagi, S., Stonebraker, M., Szalay, A.S. &

Weikum, G. (2009). The claremont report on database research. Commun.


ACM , 52, 56–65. 4

Albanes, D. (2009). Vitamin supplements and cancer prevention: Where do


randomized controlled trials stand? Journal of the National Cancer Institute,

101, 2–4.

Albrecht, A. & Gaffney, J. (1983). Software function, source lines of code,

and development effort prediction: A software science validation. IEEE Trans-


actions on Software Engineering, 9, 639–648. 36, 37, 67

Amazon (2009). Amazon elastic compute cloud. xvii, 4, 5, 7, 10, 11

Amazon (2011). Amazon web services blog. 13

Antoniol, G., Lokan, C., Caldiera, G. & Fiutem, R. (1999). A function


point-like measure for object-oriented software. Empirical Software Engineer-
ing, 4, 263–287. 39, 40

Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R.H., Kon-

winski, A., Lee, G., Patterson, D.A., Rabkin, A., Stoica, I. & Za-
haria, M. (2009). Above the clouds: A berkeley view of cloud computing.


Tech. rep., Electrical Engineering and Computer Sciences, University of Cali-

fornia at Berkeley. 2, 3, 4, 10, 11

Babar, M.A. & Chauhan, M.A. (2011). A tale of migration to cloud comput-

ing for sharing experiences and observations. In Proceedings of the 2nd Inter-
national Workshop on Software Engineering for Cloud Computing, SECLOUD

’11, 50–56, ACM, New York, NY, USA. 29

Baird, B. (1989). Managerial Decisions Under Uncertainty. Baird, B., John

Wiley & Sons. 33, 34

Banker, R.D., Kauffman, R.J. & Kumar, R. (1991). An empirical test of

object-based output measurement metrics in a computer aided software engi-


neering (case) environment. J. Manage. Inf. Syst., 8, 127–150. 33, 35, 42

Bisbal, J., Lawless, D., Wu, B., Grimson, J., Wade, V., Richard-

son, R. & O’Sullivan, D. (1997). An overview of legacy information sys-


tem migration. In Software Engineering Conference, 1997. Asia Pacific and

International Computer Science Conference 1997. APSEC ’97 and ICSC ’97.
Proceedings, 529 –530. 32

Bisbal, J., Lawless, D., Wu, B. & Grimson, J. (1999). Legacy information

systems: issues and directions. Software, IEEE , 16, 103 –111. 32

Boehm, B., Clark, B., Horowitz, E., Madachy, R., Selby, R. & Westland, C. (1995). Cost Models for Future Software Life Cycle Processes: COCOMO 2.0. Annals of Software Engineering, 1, 57–94. 35, 42


Boehm, B., Abts, C. & Chulani, S. (2000). Software development cost es-

timation approaches a survey. Annals of Software Engineering, 10, 177–205.


33, 40

Boehm, B.W. (1981). Software Engineering Economics. Prentice Hall PTR,

Upper Saddle River, NJ, USA.

Briand, L., Morasca, S. & Basili, V. (1996). Property-based software en-


gineering measurement. IEEE Transactions on Software Engineering, 22, 68

–86. 129, 130, 131, 132, 134, 176

Briand, L., El Emam, K., Surmann, D., Wieczorek, I. & Maxwell,


K. (1999). An assessment and comparison of common software cost estima-
tion modeling techniques. In Proceedings of the International Conference on

Software Engineering ICSE , 313 –323. 137

Buyya, R., Yeo, C.S., Venugopal, S., Broberg, J. & Brandic, I. (2008).

Cloud computing and emerging it platforms: Vision, hype, and reality for
delivering computing as the 5th utility. Future Generation Computer Systems,

25, 599–616. 4

Calheiros, R.N., Ranjan, R., Rose, C.A.F.D. & Buyya, R. (2009).


Cloudsim: A novel framework for modeling and simulation of cloud computing

infrastructures and services. CoRR.

Carriere, J., Kazman, R. & Ozkaya, I. (2010). A cost-benefit framework for

making architectural decisions in a business context. In ICSE ’10: Proceedings


of the 32nd ACM/IEEE International Conference on Software Engineering,

149–157, ACM, New York, NY, USA. 15


Cetin, S., Ilker Altintas, N., Oguztuzun, H., Dogru, A., Tufekci, O.

& Suloglu, S. (2007). Legacy migration to service-oriented computing with


mashups. In Software Engineering Advances, 2007. ICSEA 2007. International

Conference on, 21. 32

Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A. & Gruber, R.E. (2006). Bigtable: A distributed storage system for structured data. In OSDI '06, 205–218. 4

Chappell, D. (2008). Introducing the azure services platform. Whitepaper,

sponsored by Microsoft Corporation. 4

Chappell, D. (2011). Opinari - david chappell’s blog. 14, 51

Chauhan, M.A. & Babar, M.A. (2011). Migrating service-oriented system to


cloud computing: An experience report. Cloud Computing, IEEE International

Conference on, 0, 404–411. 29

Chow, R., Golle, P., Jakobsson, M., Shi, E., Staddon, J., Masuoka,

R. & Molina, J. (2009). Controlling data in the cloud: outsourcing computa-

tion without outsourcing control. In CCSW ’09: Proceedings of the 2009 ACM
workshop on Cloud computing security, 85–90, ACM, New York, NY, USA.

Cleary, D. (2000). Web-based Development and Functional Size Measurement.

In IFPUG 2000 Annual Conference, Charismatek Software Metrics. 40

Conte, S.D., Dunsmore, H.E. & Shen, Y.E. (1986). Software engineering

metrics and models. Benjamin-Cummings Publishing Co., Inc., Redwood City,


CA, USA. 136


Costagliola, G., Ferrucci, F., Tortora, G. & Vitiello, G. (2005).

Class point: An approach for the size estimation of object-oriented systems.


IEEE Transactions on Software Engineering, 31, 52–74. 39, 41, 96, 110, 129,

137, 152, 154, 174

Creswell, J.W. (2002). Research design : qualitative, quantitative, and mixed

methods approaches. Sage Publ., 2nd edn. 47, 57

de Assuncao, M.D., di Costanzo, A. & Buyya, R. (2009). Evaluating


the cost-benefit of using cloud computing to extend the capacity of clusters.

In HPDC ’09: Proceedings of the 18th ACM international symposium on High


performance distributed computing, 141–150, ACM, New York, NY, USA. 15,
29

Dean, J. & Ghemawat, S. (2004). Mapreduce: Simplified data processing on

large clusters. In OSDI ’04 .

Deelman, E., Singh, G., Livny, M., Berriman, B. & Good, J. (2008).
The cost of doing science on the cloud: the montage example. In SC ’08: Pro-

ceedings of the 2008 ACM/IEEE conference on Supercomputing, 1–12, IEEE

Press, Piscataway, NJ, USA.

Dekkers, T. & Vogelezang, F. (2003). COSMIC full function points: Additional to or replacing FPA. In Proceedings of the Ninth International Software Metrics Symposium, ACOSM. 39

Dolado, J.J. (2000). A validation of the component-based method for software


size estimation. IEEE Trans. Softw. Eng., 26, 1006–1021. 36


Efron, B. & Gong, G. (1983). A leisurely look at the bootstrap, the jackknife,

and cross-validation. The American Statistician, 37, 36–48. 136

Elmore, A.J., Das, S., Agrawal, D. & El Abbadi, A. (2011). Zephyr: live
migration in shared nothing databases for elastic cloud platforms. In Proceed-

ings of the 2011 international conference on Management of data, SIGMOD

’11, 301–312, ACM, New York, NY, USA. 31

Elmroth, E. & Larsson, L. (2009). Interfaces for placement, migration, and


monitoring of virtual machines in federated clouds. International Conference

on Grid and Cooperative Computing, 0, 253–260.

Erdogmus, H. (2009). Cloud computing: Does nirvana hide behind the nebula?
Software, IEEE , 26, 4 –6. 1

Finnie, G.R., Wittig, G.E. & Desharnais, J.M. (1997). A comparison


of software effort estimation techniques: Using function points with neural

networks, case-based reasoning and regression models. Journal of Systems and


Software, 39, 281 – 289. 36, 37, 51

Foss, T., Stensrud, E., Kitchenham, B. & Myrtveit, I. (2003). A simu-

lation study of the model evaluation criterion mmre. IEEE Trans. Softw. Eng.,
29, 985–995. 187

Frey, S. & Hasselbring, W. (2011). An extensible architecture for detecting

violations of a cloud environment’s constraints during legacy software system

migration. Software Maintenance and Reengineering, European Conference on,


0, 269–278. 31


Gabner, R., Schwefel, H.P., Hummel, K.A. & Haring, G. (2011). Op-

timal model-based policies for component migration of mobile cloud services.


Network Computing and Applications, IEEE International Symposium on, 0,

195–202.

Ghemawat, S., Gobioff, H. & Leung, S.T. (2003). The google file system.

SIGOPS Oper. Syst. Rev., 37, 29–43. 4

Google (2009). Google app engine. xvii, 4, 6, 7, 9, 10, 11

Google (2011). Google trends. 12

Group, C.X. (2002). Estimating internet development. 41

Hajjat, M., Sun, X., Sung, Y.W.E., Maltz, D., Rao, S., Sripanid-

kulchai, K. & Tawarmalani, M. (2010). Cloudward bound: planning for


beneficial migration of enterprise applications to the cloud. In Proceedings of the
ACM SIGCOMM 2010 conference on SIGCOMM , SIGCOMM ’10, 243–254,

ACM, New York, NY, USA. 12, 14, 27, 32

Hamilton, J. (2011). Perspective - james hamilton’s blog. 14, 51

Hao, W., Yen, I.L. & Thuraisingham, B. (2009). Dynamic service and

data migration in the clouds. Computer Software and Applications Conference,

Annual International , 2, 134–139. 28, 32

Hazelhurst, S. (2008). Scientific computing using virtual high-performance


computing: a case study using the amazon elastic computing cloud. In SAIC-

SIT ’08: Proceedings of the 2008 annual research conference of the South


African Institute of Computer Scientists and Information Technologists on IT

research in developing countries, 94–103, ACM, New York, NY, USA.

Helmer, O. (1966). Social Technology. Helmer, O., Basic Books, New York,

NY, USA. 33, 34

Ho, Y., Liu, P. & Wu, J.J. (2011). Server consolidation algorithms with
bounded migration cost and performance guarantees in cloud computing. Utility and Cloud Computing, IEEE International Conference on, 0, 154–161. 28

IFPUG (2010). Function point counting practices manual. 67

Jayasinghe, D., Malkowski, S., Wang, Q., Li, J., Xiong, P. & Pu, C.

(2011). Variations in performance and scalability when migrating n-tier appli-


cations to different clouds. Cloud Computing, IEEE International Conference
on, 0, 73–80. 32

Ji, W., Ma, J. & Ji, X. (2009). A reference model of cloud operating and
open source software implementation mapping. Enabling Technologies, IEEE
International Workshops on, 0, 63–65. 4

Jorgensen, M. (2004). A review of studies on expert estimation of software


development effort. Journal of Systems and Software, 70, 37 – 60. 34, 51

Jorgensen, M. & Shepperd, M. (2007). A systematic review of software de-

velopment cost estimation studies. IEEE Transactions on Software Engineer-

ing, 33, 33–53. 33, 35

Kanmani, S., Kathiravan, J., Kumar, S.S. & Shanmugam, M. (2007).

Neural network based effort estimation using class points for oo systems. In


Proceedings of the International Conference on Computing: Theory and Appli-

cations, 261–266, IEEE Computer Society, Washington, DC, USA. 41

Kanmani, S., Kathiravan, J., Kumar, S.S. & Shanmugam, M. (2008).


Class point based effort estimation of oo systems using fuzzy subtractive clus-

tering and artificial neural networks. In Proceedings of the 1st India software

engineering conference, ISEC ’08, 141–142, ACM, New York, NY, USA. 42

Karner, G. (1993). Resource Estimation for Objectory Projects. Objectory Sys-


tems. 39

Kazman, R., Asundi, J. & Klein, M. (2001). Quantifying the costs and ben-

efits of architectural decisions. In ICSE ’01: Proceedings of the 23rd Interna-


tional Conference on Software Engineering, 297–306, IEEE Computer Society,
Washington, DC, USA.

Keung, J.W., Kitchenham, B.A. & Jeffery, D.R. (2008). Analogy-x: Pro-

viding statistical inference to analogy-based software cost estimation. IEEE


Trans. Softw. Eng., 34, 471–484. 33

Khajeh-Hosseini, A., Greenwood, D. & Sommerville, I. (2010a). Cloud

migration: A case study of migrating an enterprise it system to iaas. In Cloud


Computing (CLOUD), 2010 IEEE 3rd International Conference on, 450 –457.
12, 14, 26

Khajeh-Hosseini, A., Sommerville, I. & Sriram, I. (2010b). Research

challenges for enterprise cloud computing. Tech. rep., Cloud Computing Co-
laboratory, School of Computer Science, University of St Andrews, UK.


Khajeh-Hosseini, A., Sommerville, I., Bogaerts, J. & Teregowda,

P. (2011). Decision support tools for cloud migration in the enterprise. Cloud
Computing, IEEE International Conference on, 0, 541–548. 27

Kitchenham, B. (1997). Counterpoint: The problem with function points.

IEEE Softw., 14, 29–. 125

Kitchenham, B., Pfleeger, S., Pickard, L., Jones, P., Hoaglin, D.,
El Emam, K. & Rosenberg, J. (2002). Preliminary guidelines for empirical
research in software engineering. Software Engineering, IEEE Transactions on,

28, 721 – 734.

Klems, M., Nimis, J. & Tai, S. (2009). Do clouds compute? a framework


for estimating the value of cloud computing. Designing E-Business Systems.
Markets, Services, and Networks, 22, 110–123. 28

Kundra, V. (2010). State of public sector cloud computing. 12, 13

Lai, R. & Huang, S.J. (2003). A model for estimating the size of a formal com-

munication protocol specification and its implementation. IEEE Trans. Softw.

Eng., 29, 46–62. 36

Leake, G. (2006). Microsoft .net pet shop 4: Migrating an asp.net 1.1 applica-

tion to 2.0. 49, 67

Lederer, A. & Prasad, J. (1998). A causal model for software cost estimating

error. Software Engineering, IEEE Transactions on, 24, 137 –148.


Lenk, A., Klems, M., Nimis, J., Tai, S. & Sandholm, T. (2009). What’s

inside the cloud? an architectural map of the cloud landscape. Software Engi-
neering Challenges of Cloud Computing, ICSE Workshop on, 0, 23–31. 5

Li, H., Zhong, L., Liu, J., Li, B. & Xu, K. (2011a). Cost-effective partial

migration of vod services to content clouds. Cloud Computing, IEEE Interna-


tional Conference on, 0, 203–210. 28

Li, W., Tordsson, J. & Elmroth, E. (2011b). Modeling for dynamic cloud

scheduling via migration of virtual machines. Cloud Computing Technology and


Science, IEEE International Conference on, 0, 163–171.

Li, W.S., Hsiung, W.P., Po, O., Hino, K., Candan, K.S. & Agrawal,
D. (2004). Challenges and practices in deploying web acceleration solutions for

distributed enterprise systems. In WWW ’04: Proceedings of the 13th interna-


tional conference on World Wide Web, 297–308, ACM, New York, NY, USA.
50

Linthicum, D. (2011). Cloud computing - david linthicum’s blog. 14, 51

Lokan, C.J. (1998). An empirical analysis of function point adjustment factors.

Information and Software Technology, 42, 649–660. 125

Low, G.C. & Jeffery, D.R. (1990). Function points in the estimation and
evaluation of the software process. IEEE Trans. Softw. Eng., 16, 64–71. 125

Madachy, R. (1997). Heuristic risk assessment using cost factors. Software,

IEEE , 14, 51 –59. 74


Mark Basler, D.N., Sean Brydon & Singh, I. (2010). Introducing the Java Pet Store 2.0 application.

Mastroeni, L. & Naldi, M. (2011). Long-range evaluation of risk in the migra-


tion to cloud storage. E-Commerce Technology, IEEE International Conference

on, 0, 260–266. 27, 28

Matson, J.E., Barrett, B.E. & Mellichamp, J.M. (1994). Software de-
velopment cost estimation using function points. IEEE Trans. Softw. Eng., 20,
275–287. 39, 125

Mehta, N.R., Medvidovic, N. & Phadke, S. (2000). Towards a taxonomy

of software connectors. In Proceedings of the 22nd international conference on


Software engineering, ICSE ’00, 178–187, ACM, New York, NY, USA. 66

Meng, X., Shi, J., Liu, X., Liu, H. & Wang, L. (2011). Legacy application
migration to cloud. Cloud Computing, IEEE International Conference on, 0,

750–751.

Mens, T. & Gorp, P.V. (2006). A taxonomy of model transformation. Elec-

tronic Notes in Theoretical Computer Science, 152, 125 – 142, proceedings of


the International Workshop on Graph and Model Transformation (GraMoT
2005). 64, 65

Microsoft (2009). Microsoft azure platform. xvii, 4, 6, 7, 8, 10, 11

Microsoft (2012). See how startups are using windows azure today. 13


Mikkilineni, R. & Sarathy, V. (2009). Cloud computing and the lessons

from the past. Enabling Technologies, IEEE International Workshops on, 0,


57–62. 4

Mohagheghi, P. & Saether, T. (2011). Software engineering challenges for

migration to the service cloud paradigm: Ongoing work in the remics project.
Services, IEEE Congress on, 0, 507–514. 31

Mohagheghi, P., Anda, B. & Conradi, R. (2005). Effort estimation of

use cases for incremental large-scale software development. In Proceedings of


the 27th international conference on Software engineering, ICSE ’05, 303–311,
ACM, New York, NY, USA. 39

Mudge, J.C. (2010). Cloud computing: Opportunities and challenges for Australia. Tech. rep., The Australian Academy of Technological Sciences and Engineering, Melbourne, Victoria. 1

Network, S.D. (2010). Java blueprints.

Niessink, F. & Vliet, H.v. (1997). Predicting maintenance effort with func-
tion points. In Proceedings of the International Conference on Software Main-

tenance, 32–39, IEEE Computer Society, Washington, DC, USA. 112, 153

Padioleau, Y., Tan, L. & Zhou, Y. (2009). Listening to programmers. Soft-


ware Engineering, International Conference on, 0, 331–341. 65

Palankar, M.R., Iamnitchi, A., Ripeanu, M. & Garfinkel, S. (2008).

Amazon s3 for science grids: a viable solution? In Proceedings of the 2008


international workshop on Data-aware distributed computing, DADC ’08, 55–

64, ACM, New York, NY, USA. 4

Piao, J.T. & Yan, J. (2010). A network-aware virtual machine placement and

migration approach in cloud computing. Grid and Cloud Computing, Interna-

tional Conference on, 0, 87–92. 31

Reifer, D. (2000). Web development: estimating quick-to-market software. Soft-

ware, IEEE , 17, 57 –64. 39, 40

RightScale (2009). Rightscale cloud management.

Rochwerger, B., Breitgand, D., Levy, E., Galis, A., Nagin, K.,
Llorente, I.M., Montero, R., Wolfsthal, Y., Elmroth, E., Cac-
eres, J., Ben-Yehuda, M., Emmerich, W. & Galan, F. (2009). The

reservoir model and architecture for open federated cloud computing. IBM
Journal of Research and Development, 53.

Rosenberg, J. (1997). Some misconceptions about lines of code. IEEE Inter-

national Symposium on Software Metrics, 0, 137. 36

Ruhe, M., Jeffery, R. & Wieczorek, I. (2003a). Cost estimation for web
applications. In ICSE ’03: Proceedings of the 25th International Conference

on Software Engineering, 285–294, IEEE Computer Society, Washington, DC,


USA. 74

Ruhe, M., Jeffery, R. & Wieczorek, I. (2003b). Using web objects for

estimating software development effort for web applications. In Proceedings of


the 9th International Symposium on Software Metrics, 30–39, IEEE Computer

Society, Washington, DC, USA. 36

SalesForce (2012). Roi for it. 3

Shepperd, M. & Schofield, C. (1997). Estimating software project effort

using analogies. IEEE Transactions on Software Engineering, 23, 736 –743.

33, 51, 53

Singh, I., Stearns, B. & Johnson, M. (2002). Designing enterprise applica-


tions with the J2EE platform. Addison-Wesley Longman Publishing Co., Inc.,

Boston, MA, USA. 50

Smith, D. (2007). Migration of legacy assets to service-oriented architecture

environments. In Software Engineering - Companion, 2007. ICSE 2007 Com-


panion. 29th International Conference on, 174 –175. 32

Smith, J.W. (2009). A comparison of public cloud platforms. Tech. rep., StACC:
St Andrews Cloud Computing Collaboratory.

Sommerville, I. (2006). Software Engineering. Pearson Education, 8th edn. 44

Suen, C.H., Kirchberg, M. & Lee, B.S. (2011). Efficient migration of virtual

machines between public and private cloud. Cloud Computing Technology and
Science, IEEE International Conference on, 0, 549–553. 16

Symons, C.R. (1988). Function point analysis: Difficulties and improvements.

IEEE Trans. Softw. Eng., 14, 2–11. 125

Symons, C.R. (1991). Software sizing and estimating: Mk II FPA (Function


Point Analysis). John Wiley & Sons, Inc., New York, NY, USA. 42


Symons, F.C. & Symons, C. (2001). Come back function point analysis (modernised) – all is forgiven. In Software Measurement Services Ltd, 413–426. 43

Thakar, A. & Szalay, A. (2010). Migrating a (large) science database to

the cloud. In Proceedings of the 19th ACM International Symposium on High

Performance Distributed Computing, HPDC ’10, 430–434, ACM, New York,


NY, USA. 30, 31

Tilley, S. & Parveen, T. (2010). Migrating software testing to the cloud.


Software Maintenance, IEEE International Conference on, 0, 1.

Tran, V., Keung, J., Liu, A. & Fekete, A. (2011a). Application migration
to cloud: A taxonomy of critical factors. In Proceedings of the ICSE Software
Engineering For Cloud Computing Workshop, SECLOUD, ACM, New York,

NY, USA.

Tran, V., Lee, K., Fekete, A., Liu, A. & Keung, J. (2011b). Size estima-
tion of cloud migration projects with cloud migration point (cmp). In Proceed-

ings of the 5th International Symposium on Empirical Software Engineering


and Measurement, ESEM, ACM.

Truong, H.L. & Dustdar, S. (2010). Composable cost estimation and monitoring for computational applications in cloud computing environments. Procedia Computer Science, 1, 2169–2178, ICCS 2010.

Tukey, J.W. (1958). Bias and confidence in not-quite large samples. The Annals

of Mathematical Statistics, 29, 614. 136

UKSMA (1998). Mkii function point analysis counting practices manual. 42


Vaquero, L.M., Rodero-Merino, L., Caceres, J. & Lindner, M.

(2009a). A break in the clouds: towards a cloud definition. SIGCOMM Com-


puter Communication Review , 39, 50–55. 2

Vaquero, L.M., Rodero-Merino, L., Caceres, J. & Lindner, M.

(2009b). A break in the clouds: towards a cloud definition. SIGCOMM Com-

puter Communication Review , 39, 50–55.

Venugopal, S., Desikan, S. & Ganesan, K. (2011). Effective migration of


enterprise applications in multicore cloud. Utility and Cloud Computing, IEEE

International Conference on, 0, 463–468. 31

Verma, A., Kumar, G., Koller, R. & Sen, A. (2011). Cosmig: Modeling
the impact of reconfiguration in a cloud. Modeling, Analysis, and Simulation
of Computer Systems, International Symposium on, 0, 3–11. 15, 28, 32

Verner, J. & Tate, G. (1992). A software size model. IEEE Trans. Softw.

Eng., 18, 265–278. 36

Ward, C., Aravamudan, N., Bhattacharya, K., Cheng, K., Filepp,

R., Kearney, R., Peterson, B., Shwartz, L. & Young, C. (2010).

Workload migration into clouds challenges, experiences, opportunities. In Cloud


Computing (CLOUD), 2010 IEEE 3rd International Conference on, 164 –171.
12

Yam, C.Y., Baldwin, A., Shiu, S. & Ioannidis, C. (2011). Migration

to cloud as real option: Investment decision under uncertainty. IEEE Trust-


Com/IEEE ICESS/FCST, International Joint Conference of , 0, 940–949. 27


Ye, K., Jiang, X., Huang, D., Chen, J. & Wang, B. (2011). Live migration

of multiple virtual machines with resource reservation in cloud computing en-


vironments. Cloud Computing, IEEE International Conference on, 0, 267–274.

28

Yi, S., Andrzejak, A. & Kondo, D. (2011). Monetary cost-aware check-


pointing and migration on amazon cloud spot instances. IEEE Transactions

on Services Computing, 99.

Yin, R.K. (2003). Case study research : design and methods. Sage Publications,
3rd edn.

Youseff, L., Butrico, M. & Da Silva, D. (2008). Toward a unified ontology

of cloud computing. In Grid Computing Environments Workshop, 2008. GCE


’08 , 1–10. 4

Yuan, C., Chen, Y. & Zhang, Z. (2003). Evaluation of edge


caching/offloading for dynamic content delivery. In WWW ’03: Proceedings

of the 12th international conference on World Wide Web, 461–471, ACM, New
York, NY, USA. 50

Zhang, G., Chiu, L. & Liu, L. (2010). Adaptive data migration in multi-

tiered storage based cloud environment. Cloud Computing, IEEE International

Conference on, 0, 148–155. 31

Appendix A

Cloud Migration Projects -


Survey Questionnaire

THE UNIVERSITY OF NEW SOUTH WALES AND NICTA

PARTICIPANT INFORMATION STATEMENT AND CONSENT FORM

Effort Estimation of Migration of Legacy Systems to Cloud


You are invited to participate in a study of factors in migrating legacy software applications to cloud computing
systems. We hope to learn more about important facts and also to check whether some of our models correctly capture
these factors. You were selected as a possible participant in this study because of your experience in migrating legacy
software applications to cloud computing systems.

If you decide to participate, we will conduct one interview with you at a time mutually agreed to. In the unlikely case
that there is a need for a follow-up interview, it will also be conducted at a mutually agreed time. Every interview will
be recorded with a voice recorder and should take no more than one hour to complete.

Results of this study might help you better understand factors in migrating legacy software applications to cloud
computing systems. This in turn might improve your work performance or customer satisfaction with your future
software. However, we cannot and do not guarantee or promise that you will receive any benefits from this study.

Any information that is obtained in connection with this study and that can be identified with you will remain
confidential and will be disclosed only with your permission, except as required by law. If you give us your permission
by signing this document, we plan to publish the summary results in a very general form at scientific conferences. The
purpose of this publication would be to inform the broader scientific community about how migration of legacy
applications to cloud computing systems can be alleviated. In any publication, information will be provided in such a
way that you, your company, the software tools that you used/supported/sold/developed, and the vendors of these
software tools cannot be identified.

Complaints may be directed to the Ethics Secretariat, The University of New South Wales, SYDNEY 2052
AUSTRALIA (phone 9385 4234, fax 9385 6648, email ethics.sec@unsw.edu.au). Any complaint you make will be
investigated promptly and you will be informed about the outcome.

After the completion of the study (likely in the second half of 2011), we will present you (and every other participant)
with summary results of this study (via email as a PDF file) and will ask you for some feedback. Your participation in
the feedback is voluntary (i.e. participation in interviews does not automatically imply participation in the feedback
process). If you are participating in the feedback process, you will be required to spend additional time to familiarize
yourself with study results and to provide some comments. The estimated time needed for the feedback is up to one
hour. If you wish to sign up for the feedback process now, you can do so by ticking the box on the next page. Please
note that you can withdraw from the feedback process any time by contacting us.

I would like to provide my feedback on a draft of the summary results.

Your decision whether or not to participate in this study will not prejudice your future relations with the University of
New South Wales and NICTA. If you decide to participate, you are free to withdraw your consent and to discontinue
participation at any time, without any prejudice. You can decline to answer any question, for whatever reason.

If you have any questions, please feel free to ask Thi Khanh Van Tran (phone: 02 9376 2259; e-mail: ThiKhanhVan.Tran@nicta.com.au) or Kevin Lee (phone: 02 9376 2207, e-mail: Kevin.Lee@nicta.com.au). If you have any additional questions later, Thi Khanh Van Tran or Kevin Lee will be happy to answer them.

You will be given a copy of this form to keep.

Page 1 of 12
THE UNIVERSITY OF NEW SOUTH WALES AND NICTA

PARTICIPANT INFORMATION STATEMENT AND CONSENT FORM (continued)

Effort Estimation of Migration of Legacy Systems to Cloud

You are making a decision whether or not to participate in this research study. Your signature indicates that,
having read the information provided above, you have decided to participate.

…………………………………………………… .…………………………………………………….
Signature of Research Participant Signature of Witness

…………………………………………………… .…………………………………………………….
(Please PRINT name) (Please PRINT name)

…………………………………………………… .…………………………………………………….
Date Nature of Witness

REVOCATION OF CONSENT

Effort Estimation of Migration of Legacy Systems to Cloud


I hereby wish to WITHDRAW my consent to participate in the research proposal described above and understand that
such withdrawal WILL NOT jeopardise any treatment or my relationship with The University of New South Wales
and NICTA.

…………………………………………………… .…………………………………………………….
Signature Date

……………………………………………………
Please PRINT Name

The section for Revocation of Consent should be forwarded to NICTA, Attn: Kevin Lee, Software Systems Research
Group, Locked Bag 9013, Alexandria NSW 1435.

A survey on cost factors for migration effort to the Cloud
This survey is designed to collect data on cloud migration projects in order to determine the
significant cost factors that affect cloud migration effort.

There are 36 questions in this survey.

I. General questions

GQ1: What type of cloud did you migrate to? Please specify.
Check any that apply

IaaS __________________________________
PaaS __________________________________
SaaS __________________________________

GQ2: What components of your system did you migrate to the cloud?


Check any that apply

Web Application
Desktop Software Application
Web Server
Database Server
Database
Operating Systems
Other: _____________________________

GQ3: Did you migrate the whole system to the cloud?


Choose one of the following answers

The entire system was migrated to the cloud


A part of the system was migrated to the cloud; the rest stayed in house
No answer

II. Cost factors
Questions in this section focus on the cost factors that influence migration effort to the cloud.

CF1: Has the development team done any similar projects on the Cloud before?

Yes
No
No answer

CF2: What is the development team's expertise?


Check any that apply

Database
Networking
Software Architecture
Other: ______________________________________

CF3: Please rate the following factors on how they influenced your migration effort to the cloud.
1 - None to minor influence, 5 - Significant influence

1 2 3 4 5 No answer
Developers'
expertise

Experience in
software
development

Experience in cloud

Design quality of
migration tasks

Choice of cloud
services

CF4: Are there any other factors influencing the migration effort?

_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________

III. Database Migration


Questions in this section focus on the database migration part of your system

DB1: Did you migrate your database to the Cloud?


This includes the migration of data, database server, etc...
If yes, there will be a few questions related to database migration tasks.
If no, you will be taken to the next section.

Yes
No

DB2: What database did you use before the migration?


Choose one of the following answers

MySQL
MSSQL 2008 or later
MSSQL 2005 or older
PostgreSQL
MSAccess
Other: _________________________________
No answer

DB3: What database did you migrate to?
Choose one of the following answers. If you installed your own database server in cloud (e.g., in
an EC2 instance), please specify.

Amazon RDS
Amazon SimpleDB
Amazon S3
Microsoft SQL Azure
Google Bigtable
Other: ______________________________
No answer

DB4: How many SQL queries did you modify for your system to adapt to the new database
in the cloud?
Choose one of the following answers
None
1 - 10
More than 10
No answer

DB6: How many GBs of data did you migrate to the cloud?


Only numbers may be entered in this field

DB7: How many person-hours did it take to migrate all data to the cloud?
Only numbers may be entered in this field

DB8: Did you perform any of the following for your database to adapt to the new database
in the cloud?
Check any that apply

Modify database schema


Split data into multiple databases
Replicate data into multiple databases
None
Other: ____________________________________

DB9: How many person-hours did it take to perform those tasks?


Only numbers may be entered in this field

DB10: Did you carry out any other activities for database migration, and how many person-
hours did it take?

_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________

IV. Installation and Configuration


Questions in this section focus on installation and configuration tasks.

IC1: Did you install or configure any software in the cloud?

Yes
No

IC2: How many software packages were installed to set up the environment in the cloud?
e.g., operating systems, database servers, web servers, etc.
Only numbers may be entered in these fields

Installed from binary files


Installed from source code
No installation required

IC3: How many software packages were reconfigured?


Examples of configuration variables are path names, environment variables, etc.
Only numbers may be entered in these fields.

Re-configured with less than 6 configuration variables


Re-configured with 6 configuration variables and more

IC4: How many person-hours did it take to complete all installation and configuration
tasks?
Only numbers may be entered in this field

IC5: Did you carry out any other activities for installation and configuration, and how many
person-hours did it take?

_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________

V. Network Connections
Questions in this section focus on migration tasks related to network connection changes caused
by the migration.

NC1: Do any components in your system connect with each other via the Internet or a local
network?
Yes
No

NC2: Did you carry out any tasks related to these network connections?
e.g., adding security such as VPC, optimizing network performance by changing packet size, etc...

Yes
No

NC3: For how many connections in the cloud did you perform the following tasks?
Add security, i.e., secure a connection with VPC or with a secured protocol such as HTTPS
Optimize protocol for performance, e.g., changing TCP packet size, etc.
Only numbers may be entered in these fields.

Add security
Optimize protocol

NC4: For how many connections across the Internet did you perform the following
tasks?
Only numbers may be entered in these fields

Add security
Optimize protocol

NC5: How many person-hours did it take to complete all tasks related to network connections?
Only numbers may be entered in this field

NC6: Did you carry out any other activities for network connection, and how many person-
hours did it take?

_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________

VI. Code modification


Questions in this section focus on how application code has been modified.

CM1: Did you modify any parts of the application code?


Code modification can be for any purpose: adding new functionality, modifying the data access
layer to adapt to a new database, etc.

Yes
No

CM2: How many classes were modified?


- Problem Domain Type (PDT): classes that represent real-world entities in the application
domain of the system.
- Human Interaction Type (HIT): classes designed for information visualization and human-
computer interaction.
- Data Management Type (DMT): classes that accommodate data storage and retrieval.
- Task Management Type (TMT): classes that are responsible for definition and control of tasks,
communications between subsystems and with external systems.

Only numbers may be entered in these fields

Human interaction classes


Data management classes
Task management classes
Problem domain classes
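The class taxonomy defined in CM2, together with the "more than 5" complexity thresholds used by the follow-up questions CM3-CM6, can be sketched in code. The sketch below is an illustration only and is not part of the original survey; the class and function names are hypothetical:

```python
from dataclasses import dataclass
from enum import Enum

class ClassType(Enum):
    """The four class types defined in question CM2."""
    PDT = "Problem Domain"
    HIT = "Human Interaction"
    DMT = "Data Management"
    TMT = "Task Management"

@dataclass
class ModifiedClass:
    name: str
    ctype: ClassType
    attributes: int
    methods: int
    calls_to_other_classes: int

def is_high_complexity(c: ModifiedClass, threshold: int = 5) -> bool:
    """A modified class exceeds the 'more than 5' threshold of CM3-CM6
    if any one of its three counts is above the threshold."""
    return (c.attributes > threshold
            or c.methods > threshold
            or c.calls_to_other_classes > threshold)

# Hypothetical example: a data-access class touched during the migration.
dao = ModifiedClass("CustomerDao", ClassType.DMT,
                    attributes=3, methods=8, calls_to_other_classes=2)
print(is_high_complexity(dao))  # True: more than 5 methods were modified
```

Counting the modified classes per type, and flagging which ones cross the threshold, yields exactly the per-category tallies that questions CM2-CM6 ask for.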

CM3: How many Human Interaction classes were modified in:
Only numbers may be entered in these fields

More than 5 attributes


More than 5 methods
More than 5 calls to other classes

CM4: How many Data Management classes were modified in:


Only numbers may be entered in these fields

More than 5 attributes


More than 5 methods
More than 5 calls to other classes

CM5: How many Task Management classes were modified in:


Only numbers may be entered in these fields

More than 5 attributes


More than 5 methods
More than 5 calls to other classes

CM6: How many Problem Domain classes were modified in:


Only numbers may be entered in these fields

More than 5 attributes


More than 5 methods
More than 5 calls to other classes

CM7: How many person-hours did it take to complete all code modification?
Only numbers may be entered in this field

CM8: Did you carry out any other activities for code modification, and how many person-
hours did it take?

_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________

End of survey.
Thank you for your time and effort in taking this survey.
Please return the completed survey to thikhanhvan.tran@nicta.com.au or
tyao1801@uni.sydney.edu.au

A. CLOUD MIGRATION PROJECTS - SURVEY
QUESTIONNAIRE

Appendix B

Survey Responses - Raw Data

Network Connection
ID LAN-to-LAN (Low Average High) LAN-to-WAN (Low Average High) WAN-to-LAN (Low Average High) Hours
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0
5 0 1 0 0 1 0 0 0 0 5
6 0 0 0 0 0 1 0 0 0 10
7 0 0 0 0 0 2 0 0 0 20
8 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0
10 0 5 5 0 5 5 0 0 0 100
11 0 2 0 0 2 0 0 0 0 20
12 1 0 0 1 0 0 0 0 0 2
13 3 0 0 0 0 0 0 0 0 2
14 0 0 1 0 0 2 0 0 0 20
15 0 0 0 0 0 0 0 0 0 0
16 1 0 0 0 0 0 0 0 0 2
17 1 0 0 1 0 0 0 0 0 2
18 1 0 0 1 0 0 0 0 0 2
19 0 0 0 0 0 0 0 0 0 0
Table B.1: Survey responses for network connection component
Code Modification
ID Problem Domain (Low Average High) Human Interaction (Low Average High) Data Management (Low Average High) Task Management (Low Average High) Hrs
1 0 0 20 0 0 5 0 0 0 0 0 20 250
2 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 5 0 0 0 40
4 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 1 0 0 1 0 0 1 0 0 1 20
6 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 0 0 0

10 0 0 0 0 0 0 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 0 0 0 0 10 80
13 0 0 0 0 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0 0 0 0 0
15 0 0 0 0 0 0 1 2 0 0 0 1 10
16 0 0 0 0 0 0 1 4 4 0 0 0 40
17 0 0 0 0 0 0 0 0 0 0 0 0 0
18 0 0 0 0 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 0 0 0 0 0 0 0 0

Table B.2: Survey responses for code modification component


Installation and Configuration
ID Application (Low Average High) Infrastructure (Low Average High) Hours
1 0 0 0 0 0 5 80
2 0 0 0 0 3 0 3
3 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0
5 0 0 0 0 2 3 50
6 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0
8 0 0 0 0 3 1 24
9 0 0 0 7 0 0 6
10 0 0 0 0 0 0 0
11 0 0 0 0 3 2 50
12 0 0 0 0 0 15 300
13 0 0 0 3 2 0 7
14 0 0 0 0 7 0 20
15 0 0 0 1 4 0 14
16 0 0 1 1 0 0 10
17 0 0 0 0 3 0 4
18 0 0 0 2 2 0 8
19 0 0 0 0 12 0 48
Table B.3: Survey responses for installation and configuration component
Database Migration
ID Query Modification (Low Average High) Data Population (Low Average High) Hours
1 0 0 0 2 0 0 2
2 0 0 0 0 0 0 0
3 20 0 0 3 0 0 25
4 0 0 0 0 0 4 8
5 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0
7 0 0 0 0 2 0 5
8 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0

10 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0
13 0 0 0 2 0 0 1
14 0 0 0 2 0 0 2
15 0 5 0 0 2 0 7
16 0 0 8 0 0 2 15
17 0 0 0 2 0 0 2
18 0 0 0 2 0 0 2
19 0 0 0 0 0 0 0

Table B.4: Survey responses for database migration component


ID Dev. Expertise Exp. in Soft. Dev. Exp. in Cloud Design Quality of Mig. Tasks Choice of Cloud
1 5 5 1 0 0
2 3 3 4 0 1
3 4 4 5 1 4
4 3 3 3 2 4
5 5 5 5 2 4
6 2 3 5 4 4
7 4 3 5 4 5
8 4 4 3 2 5
9 5 5 5 5 0
10 2 2 3 3 3
11 0 0 0 0 0
12 5 5 5 0 1
13 4 3 2 3 5
14 4 2 5 2 5
15 3 5 5 3 5
16 3 5 5 3 5
17 1 1 2 2 4
18 5 2 5 2 5
19 4 4 5 1 1
Table B.5: Survey responses for external cost factors
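The influence ratings collected by question CF3 and tabulated in Table B.5 can be summarised per factor. The sketch below is illustrative only (it is not part of the thesis analysis) and uses the first three respondents' rows of Table B.5 as sample input:

```python
# Ratings (0-5) from rows 1-3 of Table B.5, one tuple per respondent,
# in the column order of the table.
responses = [
    (5, 5, 1, 0, 0),  # respondent 1
    (3, 3, 4, 0, 1),  # respondent 2
    (4, 4, 5, 1, 4),  # respondent 3
]
factors = ["Dev. Expertise", "Exp. in Soft. Dev.", "Exp. in Cloud",
           "Design Quality of Mig. Tasks", "Choice of Cloud"]

# Mean rating per factor across the sampled respondents.
means = {f: sum(r[i] for r in responses) / len(responses)
         for i, f in enumerate(factors)}
print(means["Dev. Expertise"])  # 4.0 = (5 + 3 + 4) / 3
```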
