
Cost Implications and Size Estimation of Cloud Migration Projects
with Cloud Migration Point

TRAN, Thi Khanh Van

School of Computer Science and Engineering,
Faculty of Engineering
University of New South Wales

A thesis submitted for the degree of
Doctor of Philosophy

March 2012
THE UNIVERSITY OF NEW SOUTH WALES
Thesis/Dissertation Sheet

Surname or Family name: TRAN

First name: THI KHANH VAN Other name/s:

Abbreviation for degree as given in the University calendar: Ph.D. (N.S.W)

School: Computer Science and Engineering Faculty: Engineering

Title: Cost Implications and Size Estimation of Cloud Migration Projects with Cloud Migration Point

Abstract 350 words maximum:

Cloud computing has been a buzzword over the last decade: it offers great potential benefits for enterprises that migrate their computing systems from local data centers to a Cloud environment. One major obstacle to enterprise adoption of Cloud technologies has been the lack of visibility into migration effort and cost, and existing work on this problem in the literature is very limited. This thesis improves our understanding of the matter by identifying critical indicators of Cloud migration effort.

A taxonomy of migration tasks to the Cloud is proposed, outlining the possible tasks that any migration project to the Cloud may encounter. It enables Cloud practitioners to understand the specific tasks involved and their implications for the amount of effort required. A methodology, called Cloud Migration Point (CMP), is presented for estimating the size of Cloud migration projects by recasting a well-known software size estimation model, Function Point, into the context of Cloud migration. The CMP value indicates how large the migration project is, and it can be used as an indicator for Cloud migration effort estimation. The process of calculating CMP also assists in itemizing the migration tasks and identifying the complexity of each task, which is useful for project planning and management. Empirical validation on the set of data points collected from our survey shows that, with some calibration, the CMP metric is practically useful as a predictor for effort estimation under a defined set of assumptions. Besides size, other factors also influence the migration effort. We propose a list of external cost factors, such as the development team's experience in software engineering or with the Cloud, which do not affect how migration tasks are designed but may affect how fast they can be completed.

Our overall contribution is to shed light on Cloud migration and the tasks involved, enabling Cloud practitioners to estimate the amount of effort required to migrate legacy systems into the Cloud. This contributes towards a cost-benefit analysis of whether the benefits of the Cloud exceed the migration effort and other Cloud costs.

Declaration relating to disposition of project thesis/dissertation

I hereby grant to the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or in part in the University libraries in all forms of media, now or hereafter known, subject to the provisions of the Copyright Act 1968. I retain all property rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.

I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral
theses only).

Signature                    Witness                    Date

The University recognises that there may be exceptional circumstances requiring restrictions on copying or conditions on use. Requests for
restriction for a period of up to 2 years must be made in writing. Requests for a longer period of restriction may be considered in exceptional
circumstances and require the approval of the Dean of Graduate Research.

FOR OFFICE USE ONLY Date of completion of requirements for Award:

THIS SHEET IS TO BE GLUED TO THE INSIDE FRONT COVER OF THE THESIS


Abstract

Cloud computing has been a buzzword over the last decade: it offers great potential benefits for enterprises that migrate their computing systems from local data centers to a Cloud environment. One major obstacle to enterprise adoption of Cloud technologies has been the lack of visibility into migration effort and cost, and existing work on this problem in the literature is very limited. This thesis improves our understanding of the matter by identifying critical indicators of Cloud migration effort.

A taxonomy of migration tasks to the Cloud is proposed, outlining the possible tasks that any migration project to the Cloud may encounter. It enables Cloud practitioners to understand the specific tasks involved and their implications for the amount of effort required. A methodology, called Cloud Migration Point (CMP), is presented for estimating the size of Cloud migration projects by recasting a well-known software size estimation model, Function Point, into the context of Cloud migration. The CMP value indicates how large the migration project is, and it can be used as an indicator for Cloud migration effort estimation. The process of calculating CMP also assists in itemizing the migration tasks and identifying the complexity of each task, which is useful for project planning and management. Empirical validation on the set of data points collected from our survey shows that, with some calibration, the CMP metric is practically useful as a predictor for effort estimation under a defined set of assumptions. Besides size, other factors also influence the migration effort. We propose a list of external cost factors, such as the development team's experience in software engineering or with the Cloud, which do not affect how migration tasks are designed but may affect how fast they can be completed.

Our overall contribution is to shed light on Cloud migration and the tasks involved, enabling Cloud practitioners to estimate the amount of effort required to migrate legacy systems into the Cloud. This contributes towards a cost-benefit analysis of whether the benefits of the Cloud exceed the migration effort and other Cloud costs.
Dedication

To my parents and Dan


Acknowledgements

I am most indebted to my two supervisors, Dr. Anna Liu and Dr. Raymond Wong, for their guidance and close supervision over the years. Dr. Raymond Wong was very encouraging and patient in walking me through the very first steps of my research journey. Dr. Anna Liu has inspired me in so many ways; her tremendous support, care and understanding made it possible for me to continue this research. I am especially grateful to my co-supervisor, Dr. Jacky Keung. He has always guided me in the right direction, and his constant support made me feel confident to complete this thesis. Throughout my thesis-writing time, he spent many hours proofreading and providing me with critical feedback. I wish to express my gratitude and thanks to Professor Alan Fekete and Kevin Lee, whose constructive and insightful feedback has benefited this research and myself in many ways. I would like to give my sincere thanks to Professor Barbara Kitchenham for her reviews and expert advice that made great improvements to this research. This thesis would not have been possible without their encouragement and support.

I would like to thank my colleagues and friends from NICTA, especially Liang and Sadeka, for accompanying me and sharing all the ups and downs of our Ph.D. journey. I thank my best friends for their wonderful friendship: especially Jensyn, for spending a lot of time proofreading my thesis and correcting many grammar mistakes; Yolanda, for her enormous care and emotional support; and our badminton group, for all the entertainment and sporting activities that got me through the difficult times.

Last but not least, I am extremely grateful to my beloved parents, my little sister, and my brother and his family, who have always believed in me and supported me unconditionally; and to my husband, Daniel, for his endless love and for always being there for me during both happy and hard times. To them I dedicate this thesis.


Preface

Publications that have contributed to this thesis

• Van Tran, Jacky Keung, Anna Liu, and Alan Fekete: “Application Migration to Cloud: A Taxonomy of Critical Factors”, in Proceedings of the 2nd International Workshop on Software Engineering for Cloud Computing (SECLOUD ’11), Honolulu, Hawaii, USA, May 2011, pp. 22-28.

• Van Tran, Kevin Lee, Alan Fekete, Anna Liu, Jacky Keung: “Size Estimation of Cloud Migration Projects with Cloud Migration Point (CMP)”, in Proceedings of the 5th International Symposium on Empirical Software Engineering and Measurement (ESEM ’11), Banff, Alberta, Canada, Sep 2011, pp. 265-274.

Funding and Grants that have supported the work in this thesis

• The NICTA International Postgraduate Award (NIPA) Scholarship

• The NICTA Research Project Award (NRPA) Scholarship

• Amazon Research Grant


Contents

List of Figures xv

List of Tables xvii

Glossary xxi

1 Introduction 1

1.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . 2

1.1.1 Cloud Computing and Its Offerings . . . . . . . . . . . . . 2

1.1.2 The Urge to Migrate to the Cloud . . . . . . . . . . . . . . 12

1.1.3 The Essentials of Effort Estimation . . . . . . . . . . . . . 14

1.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

1.3 Research Problem and Aims . . . . . . . . . . . . . . . . . . . . . 18

1.4 Research Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

1.5 Research Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 21

1.6 Organisation of the Thesis . . . . . . . . . . . . . . . . . . . . . . 22

2 Literature Review 25

2.1 Cloud Migration Solutions . . . . . . . . . . . . . . . . . . . . . . 26


2.1.1 Decision Making Support . . . . . . . . . . . . . . . . . . . 26

2.1.2 Experience Reports . . . . . . . . . . . . . . . . . . . . . . 29

2.1.3 Cloud Migration Concerns . . . . . . . . . . . . . . . . . . 30

2.2 Effort Estimation in Traditional Software Engineering . . . . . . . 33

2.2.1 Analogy Approach . . . . . . . . . . . . . . . . . . . . . . 33

2.2.2 Expert Judgement Approach . . . . . . . . . . . . . . . . . 34

2.2.3 Algorithmic Model Approach . . . . . . . . . . . . . . . . 35

2.3 Software Size Estimation in Traditional Software Engineering . . . 36

2.3.1 Source Lines of Code (SLOC) . . . . . . . . . . . . . . . . 36

2.3.2 Function Point . . . . . . . . . . . . . . . . . . . . . . . . 37

2.3.3 Function Point Extensions . . . . . . . . . . . . . . . . . . 39

2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3 Research Methodology 47

3.1 Cloud Migration Experiments . . . . . . . . . . . . . . . . . . . . 48

3.1.1 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . 49

3.1.2 Data Collection Strategy . . . . . . . . . . . . . . . . . . . 50

3.2 Discussion with Cloud Engineers . . . . . . . . . . . . . . . . . . 51

3.2.1 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . 52

3.2.2 Discussion Protocols . . . . . . . . . . . . . . . . . . . . . 53

3.2.3 Data Collection and Analysis . . . . . . . . . . . . . . . . 54

3.3 Survey Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

3.3.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

3.3.2 Survey Design . . . . . . . . . . . . . . . . . . . . . . . . . 57

3.3.3 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . 59


3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

4 Taxonomy of Migration Tasks to the Cloud 63

4.1 Taxonomy in other contexts . . . . . . . . . . . . . . . . . . . . . 65

4.2 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

4.2.1 Measured Data and Observations . . . . . . . . . . . . . . 68

4.3 Migration Influential Cost Factors . . . . . . . . . . . . . . . . . . 73

4.4 Taxonomy of Migration Tasks . . . . . . . . . . . . . . . . . . . . 76

4.5 Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

4.6 Reflection and Discussion . . . . . . . . . . . . . . . . . . . . . . 89

4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

5 Cloud Migration Point 95

5.1 CMP Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . 97

5.2 Cloud Migration Cost Factors . . . . . . . . . . . . . . . . . . . . 98

5.3 Cloud Migration Project Classification . . . . . . . . . . . . . . . 102

5.4 Cloud Migration Point . . . . . . . . . . . . . . . . . . . . . . . . 106

5.4.1 Network Connection Component: CM Pconn . . . . . . . . 108

5.4.2 Code Modification Component: CM Pcode . . . . . . . . . . 110

5.4.3 Installation and Configuration Component: CM Pic . . . . 114

5.4.4 Database Migration Component: CM Pdb . . . . . . . . . . 117

5.4.5 CMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

5.5 CMP Application . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

5.6 Reflection and Discussion . . . . . . . . . . . . . . . . . . . . . . 124

5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127


6 Validation 129

6.1 Theoretical Validation . . . . . . . . . . . . . . . . . . . . . . . . 131


6.2 Empirical Validation - Phase 1 . . . . . . . . . . . . . . . . . . . . 134

6.2.1 Evaluation Criteria . . . . . . . . . . . . . . . . . . . . . . 135


6.2.2 Leave-One-Out Cross Validation . . . . . . . . . . . . . . . 136

6.2.3 Ordinary Least Square Regression Analysis . . . . . . . . . 137

6.2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 141


6.3 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.4 Empirical Validation - Phase 2 . . . . . . . . . . . . . . . . . . . . 143
6.5 CMP Parameters Calibration . . . . . . . . . . . . . . . . . . . . 147
6.5.1 CMP Components’ Assumptions . . . . . . . . . . . . . . 150

6.5.2 The Calibration Process . . . . . . . . . . . . . . . . . . . 159


6.6 Empirical Validation - Phase 3 . . . . . . . . . . . . . . . . . . . . 162
6.7 Threats to Validity and Discussion . . . . . . . . . . . . . . . . 168

6.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

7 Conclusions and Future Directions 171


7.1 Research Summary . . . . . . . . . . . . . . . . . . . . . . . . . . 172
7.2 Research Contribution . . . . . . . . . . . . . . . . . . . . . . . . 180

7.3 Research Limitation . . . . . . . . . . . . . . . . . . . . . . . . . 183

7.4 Future Research Directions . . . . . . . . . . . . . . . . . . . . . . 185

Bibliography 189

A Cloud Migration Projects - Survey Questionnaire 209

B Survey Responses - Raw Data 223

List of Figures

1.1 Three main Cloud services layers . . . . . . . . . . . . . . . . . . 5


1.2 Major Cloud service providers . . . . . . . . . . . . . . . . . . . . 7

1.3 Cloud Computing - Google Trends . . . . . . . . . . . . . . . . . 12


1.4 Cost and benefit of migrating existing applications into the Cloud 16

3.1 Steps of the Research Process and Thesis . . . . . . . . . . . . . . 48

4.1 Migration Overhead Cost . . . . . . . . . . . . . . . . . . . . . . . 71

4.2 Diagram of Cloud migration task taxonomy . . . . . . . . . . . . 79

6.1 The boxplots for the six training datasets of variable CMP . . . . 138

6.2 The scatter plots for OLS regression . . . . . . . . . . . . . . . . . 139

List of Tables

1.1 Pricing model comparison: Service charge (Amazon, 2009; Microsoft, 2009; Google, 2009) . . . . . . . . . . . . . . . . . . . . . 10

1.2 Pricing model comparison: Storage Cost (Amazon, 2009; Microsoft, 2009; Google, 2009) . . . . . . . . . . . . . . . . . . . . . 11

1.3 Outline of the research approach . . . . . . . . . . . . . . . . . . . 21

3.1 Mapping between research questions and questionnaire . . . . . . 59

4.1 Recorded overhead efforts of preparing PetShop for migration . . 70

4.2 Recorded overhead efforts of putting PetShop to Cloud platform . 71

4.3 Taxonomy of migration tasks . . . . . . . . . . . . . . . . . . . . 78

4.4 Mapping of the FSO migration tasks and the taxonomy . . . . . . 88

4.5 Efforts comparison for migrating to PaaS and IaaS Clouds . . . . 90

5.1 System’s states before and after migration . . . . . . . . . . . . . 103

5.2 Complexity evaluation for each connection . . . . . . . . . . . . . 109

5.3 Evaluating CMPconn . . . . . . . . . . . . . . . . . . . . . . . . . 110

5.4 Elements of each changed class . . . . . . . . . . . . . . . . . . . 111

5.5 Complexity evaluation for each class . . . . . . . . . . . . . . . . 113


5.6 Evaluating CMPcode . . . . . . . . . . . . . . . . . . . . . . . . . 114

5.7 Complexity evaluation for each IC task . . . . . . . . . . . . . . . 116

5.8 Evaluating CMPic . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

5.9 Complexity evaluation for each database task . . . . . . . . . . . 118

5.10 Evaluating CMPdb . . . . . . . . . . . . . . . . . . . . . . . . . . 118

5.11 Weighted values of CMP’s components . . . . . . . . . . . . . . . 119

5.12 Code changes for PetShop . . . . . . . . . . . . . . . . . . . . . . 122

5.13 Installations for PetShop . . . . . . . . . . . . . . . . . . . . . . . 123

5.14 Database Migration for PetShop . . . . . . . . . . . . . . . . . . . 123

5.15 CMP components for PetShop . . . . . . . . . . . . . . . . . . . . 124

6.1 Empirical validation data points . . . . . . . . . . . . . . . . . . . 137

6.2 Phase 1 - OLS Regression Analysis . . . . . . . . . . . . . . . . . 140

6.3 Phase 1 - Results Evaluation . . . . . . . . . . . . . . . . . . . . . 141

6.4 Data points from surveys and interviews . . . . . . . . . . . . . . 142

6.5 Phase 2 - OLS Regression Analysis . . . . . . . . . . . . . . . . . 143

6.6 Phase 2 - Results Evaluation . . . . . . . . . . . . . . . . . . . . . 147

6.7 Number of data points available to calibrate each weight of the CMP model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

6.8 Data points for calibrating network connection component weights 160

6.9 Multiple Regression Coefficient Result for CM Pconn . . . . . . . . 160

6.10 Multiple Regression Coefficient Result for CM Pcode . . . . . . . . 161

6.11 Multiple Regression Coefficient Result for CM Pic . . . . . . . . . 161

6.12 Regression Coefficient Result for CM Pconn . . . . . . . . . . . . . 162

6.13 Regression Coefficient Result for the Final CMP . . . . . . . . . . 162


6.14 New dataset - calculated from the new set of calibrated weights . 163

6.15 Phase 3 - OLS Regression Analysis . . . . . . . . . . . . . . . . . 163


6.16 Phase 3 - Results Evaluation . . . . . . . . . . . . . . . . . . . . . 167

B.1 Survey responses for network connection component . . . . . . . . 224

B.2 Survey responses for code modification component . . . . . . . . . 225


B.3 Survey responses for installation and configuration component . . 226

B.4 Survey responses for database migration component . . . . . . . . 227


B.5 Survey responses for external cost factors . . . . . . . . . . . . . . 228

Glossary

API     Application Programming Interface
AWS     Amazon Web Service
CMP     Cloud Migration Point
COSMIC FFP   COSMIC Full Function Point
CPU     Central Processing Unit
CRM     Customer Relationship Management
EBS     Elastic Block Store
EC2     Elastic Compute Cloud
EI      External Input
EIF     External Interface File
EO      External Output
EQ      External Inquiry
FFP     Full Function Point
FP      Function Point
FPA     Function Point Analysis
GAE     Google App Engine
GSC     General System Characteristics
IaaS    Infrastructure-as-a-Service
IDE     Integrated Development Environment
ILF     Internal Logical File
IO      Input Output
IP      Internet Protocol
IT      Information Technology
LOC     Line of Code
MKIIFP  Mark II Function Point
MS      Microsoft
NICTA   National ICT Australia
OO      Object-Oriented
OOFP    Object-Oriented Function Point
OP      Object Point
PaaS    Platform-as-a-Service
PC      Personal Computer
RDS     Relational Database Service
S3      Simple Storage Service
SaaS    Software-as-a-Service
SDK     Software Development Kit
SLOC    Source Line of Code
UCP     Use Case Point
UFP     Unadjusted Function Point
VAF     Value Adjustment Factor
WO      Web Object
WP      Web Point
Chapter 1

Introduction

“If you can’t measure it, you can’t manage it”

∼ Tom DeMarco paraphrasing Lord Kelvin.

Cloud computing has recently been the focus of much excitement in the IT1 community, seen by some as the next platform shift (Erdogmus, 2009), with an impact on enterprise computing that could compare to the change from mainframes to minicomputers, or from desktop PCs to networked systems. Major vendors are taking on Cloud computing as a crucial strategy, governments are discussing national agendas for the coming shift, and start-ups are growing to fill niches (Mudge, 2010).

While some software is written from scratch specifically for the Cloud, many organizations also wish to migrate existing applications and systems to a Cloud platform. Such a migration exercise is not easy: changes need to be made to deal with differences in the software environment, such as the programming model and data storage APIs, as well as varying performance qualities. An indication of how much effort the migration process will require is important for project management, particularly project scheduling and budget planning. This stimulating context strongly motivates us1 to investigate the critical factors behind the effort required for the migration process to the Cloud.

1 For all abbreviations, see the glossary on page xxii.

This introductory chapter is structured as follows. The background of Cloud computing and the motivation of this research are elaborated in Section 1.1. Common terms used throughout this thesis are clarified in Section 1.2. A broad overview of our work is presented in Sections 1.3 and 1.4. Section 1.5 introduces our research methodology, and Section 1.6 provides a general layout of how this thesis is structured.

1.1 Background and Motivation

Cloud computing is an attractive environment for enterprises because of its distinctive features and various offerings. As a result, many organizations have expressed interest in deploying their computing systems in the Cloud to take advantage of the potential benefits it offers.

1.1.1 Cloud Computing and Its Offerings

Since its emergence over the last decade, Cloud computing has been well recognized for its ability to provide virtualized resources and services, such as infrastructure, platforms, and software (Vaquero et al., 2009a; Armbrust et al., 2009). It is commonly known as a computing paradigm that delivers resources and services to computers over the Internet.

1 In this thesis, I use “we” to acknowledge the contributions of my colleagues. However, I am the main author of all publications that make up the content of this thesis.

One of the attractions of using Cloud resources, rather than those in an enterprise-scale data center, is that an organization can enjoy cost savings through larger economies of scale: the costs of hardware, power, buildings and administrative support are typically about five times lower for internet-scale systems than for enterprise-scale ones (SalesForce, 2012; Aggarwal & McCabe, 2009). Even more significant to a rapidly-growing business is the elasticity of costs; instead of the up-front purchase of an overprovisioned system, one can pay a Cloud provider ongoing fees that are low at first and that smoothly increase as and when the system needs more capacity. Cloud users are therefore neither required to plan for provisioning nor tied to a huge up-front commitment on hardware resources and infrastructure. This enables companies to start small, acquire more resources only when needed on a short-term basis (e.g. hourly processors and daily storage), and release computing machines and storage when they are no longer required (Armbrust et al., 2009).

For established businesses, there is the potential to use the Cloud as an additional resource (alongside existing data centers) to deal with bursts of load, perhaps seasonal, or due to intermittent activities such as stress testing. Here the Cloud allows the client to delay the large commitment of funds needed to scale up the hardware. Many applications are not extensively used all the time; more often than not, they are under-utilized. In other words, the resource usage pattern is not stable over time: there are times when resources stay idle, and other times (peak times) when they are heavily used. In order to accommodate those peak-time usages, enterprises have no better choice


but to invest a huge amount of resources to be ready for peak periods, resources which at other times stay idle and wasted. Cloud providers address this issue with their on-demand resource offerings: consumers pay only for the resources they use during an average period, and over peak times they can obtain additional resources on demand. For example, online shopping systems see normal use over most of the year, which may accumulate up to around 2-3 months' worth of resource usage in total. With the Cloud, they pay for that 2-3 months of actual usage only, rather than overpaying for a whole year as they would if the applications were managed in house. During the Christmas period, resource demand may increase to more than 10 times the normal level and can be accommodated by Cloud providers promptly. After peak times, resources are released back to the providers, and charges drop back to normal because of the pay-per-use pricing model of the Cloud (Armbrust et al., 2009).
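The pay-per-use arithmetic above can be sketched as a toy calculation. All figures below (a $0.10 hourly server rate, a two-month peak at ten times normal demand) are illustrative assumptions of ours, not real vendor prices:

```python
# Toy cost comparison: provisioning in house for the peak all year
# versus paying per use in the Cloud. All prices and demand figures
# are illustrative assumptions, not real vendor rates.

HOURS_PER_MONTH = 730
PRICE_PER_SERVER_HOUR = 0.10  # assumed Cloud rate, $/server-hour

# Assumed demand profile: 10 servers for ten months of the year,
# 100 servers during a two-month seasonal peak.
monthly_demand = [10] * 10 + [100] * 2

# In house: capacity must cover the peak for all twelve months.
in_house_hours = max(monthly_demand) * HOURS_PER_MONTH * 12

# Cloud: pay only for what each month actually uses.
cloud_hours = sum(d * HOURS_PER_MONTH for d in monthly_demand)

in_house_cost = in_house_hours * PRICE_PER_SERVER_HOUR
cloud_cost = cloud_hours * PRICE_PER_SERVER_HOUR

print(f"in-house (peak-provisioned): ${in_house_cost:,.0f}")
print(f"cloud (pay-per-use):         ${cloud_cost:,.0f}")
print(f"in-house utilisation: {cloud_hours / in_house_hours:.0%}")
```

Under these assumed numbers, peak provisioning leaves the in-house hardware idle three quarters of the time, which is exactly the waste the on-demand model avoids.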

Cloud Services:

Cloud computing has been seen to offer a wide variety of services, such as application services, storage services, compute services, and database services (Amazon, 2009; Google, 2009; Microsoft, 2009; Agrawal et al., 2009; Armbrust et al., 2009; Buyya et al., 2008; Chang et al., 2006; Chappell, 2008; Ghemawat et al., 2003; Palankar et al., 2008). These services are accommodated by different Cloud technologies. Understanding the Cloud technology stacks and their interrelations enables the Cloud community to provide better solutions, portals and gateways for the Cloud, which facilitate the adoption of this emerging computing paradigm. Hence, there exist several attempts to create a reference model of Cloud computing (Ji et al., 2009; Mikkilineni & Sarathy, 2009; Youseff et al., 2008; Lenk et al., 2009) to classify Cloud technologies and services into different layers. Different proposals tackle different aspects of the Cloud ontology; however, they all use the same basic model with three main common layers: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS) (Figure 1.1).

Figure 1.1: Three main Cloud services layers

The three layers in Figure 1.1 are described as follows:

• IaaS: The infrastructure layer provides basic physical resources and data storage with virtualization services. The physical units are hardware resources, such as CPU, memory, storage and network devices. Virtualization software is required at this layer to provide Cloud users with a highly scalable and manageable basic environment. Examples of this layer are Amazon EC2 and Amazon S3 (Amazon, 2009).

• PaaS: The platform layer works independently of the physical resources in the infrastructure layer, which increases the scalability of the Cloud. This layer includes components such as:

  – Kernel - managing the infrastructure resources;

  – Distributed file system - a network file system with data distributed across multiple physical nodes (e.g. Google File System, Hadoop Distributed File System);

  – Cloud IO - facilitating data exchange with various kinds of data protocols;

  – Computing driver and engine - providing domain-specific utilities;

  – Management and UI interface - providing a management console and interface to the Cloud.

  Examples of this layer are Google App Engine and Microsoft Azure (Google, 2009; Microsoft, 2009).

• SaaS: The application layer hosts business-domain-specific applications, which can be system applications that provide services to other applications, or user applications aimed at Cloud end-users. The application layer is the most visible interface to the Cloud for end-users. The applications are deployed on the Cloud provider's computing infrastructure, and users access them through web portals. Fees are usually charged for usage.
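As a compact summary of the three layers, the sketch below records each layer's role and the example services named above. The data structure itself is ours, purely for illustration:

```python
# The three common Cloud service layers described above, with the
# example services named in the text. Illustrative summary only.
CLOUD_LAYERS = {
    "IaaS": {
        "role": "virtualized physical resources: CPU, memory, "
                "storage and network devices",
        "examples": ["Amazon EC2", "Amazon S3"],
    },
    "PaaS": {
        "role": "platform components: kernel, distributed file "
                "system, Cloud IO, computing engines, management UI",
        "examples": ["Google App Engine", "Microsoft Azure"],
    },
    "SaaS": {
        "role": "business-domain applications accessed through "
                "web portals, usually charged per usage",
        "examples": ["system applications", "user applications"],
    },
}

for layer, info in CLOUD_LAYERS.items():
    print(f"{layer}: {info['role']}")
    print(f"  e.g. {', '.join(info['examples'])}")
```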

Cloud Vendors:
There are a few major Cloud vendors in the market, such as Amazon, Microsoft, Google, Salesforce, Rackspace, and GoGrid. Each vendor offers several services, ranging from IaaS and PaaS to SaaS, from compute to storage, and from relational to NoSQL databases. In this section, we describe the three main providers - Amazon, Microsoft, and Google (Figure 1.2) (Amazon, 2009; Microsoft, 2009; Google, 2009).

Figure 1.2: Major Cloud service providers

• Amazon: Amazon offers an IaaS solution for Cloud computing called Amazon Web Services (AWS) (Amazon, 2009). AWS provides a range of Cloud-based services, including compute services (e.g., Amazon Elastic Compute Cloud (EC2), Auto Scaling), content delivery services (e.g., Amazon CloudFront), database services (e.g., Amazon Relational Database Service (RDS), Amazon DynamoDB, Amazon SimpleDB), and storage services (e.g., Amazon Simple Storage Service (S3), Amazon Elastic Block Store (EBS)).

Amazon EC2 is a central service of AWS. It offers a virtual computing environment in which users can launch instances with the operating system of their choice. Users are given complete freedom to manage the application environment running on their EC2 instances. A highlight of EC2 is its elasticity: computing capacity (i.e., the number of instances) can be increased or decreased on demand within minutes.

Database services from AWS are also well known to Cloud users. Amazon RDS provides a full-featured relational database (MySQL or Oracle) running on an Amazon RDS database instance. Connections to the databases can be established in the traditional manner with any database tools or programming languages. Amazon SimpleDB, on the other hand, is a non-relational database optimized for high availability and flexibility. Data is automatically indexed and geographically distributed to enable high availability and data durability. Without any database administration burden, users can focus fully on value-added application development. Last, but not least, Amazon S3 is worth discussing for its offering as a highly scalable storage service over the Internet. This service enables users to store and retrieve data of any size, at any time, from anywhere on the web.

• Microsoft: Microsoft provides the Windows Azure Platform as a PaaS solution for Cloud computing, hosted in Microsoft data centers (Microsoft, 2009). The Windows Azure Platform enables applications to run in Microsoft data centers, and provides a Software Development Kit (SDK) with which to develop these applications. Applications running on the Windows Azure Platform can be delivered as SaaS, owing to the platform's flexibility and scalability. Developing applications for the Windows Azure environment is much like developing standard Windows applications in a local environment. Developers new to the system are supported through templates for Azure applications, provided as part of the Azure SDK for Visual Studio 2008.

Cloud services provided by Microsoft also include SQL Azure Database, a highly available and scalable cloud-based relational database service. SQL Azure is built on SQL Server technologies; hence, it provides a full-featured relational database and can be synchronised with on-premises SQL Server databases.

• Google: Similar to Microsoft, Google also provides a PaaS solution, called Google App Engine (GAE), hosted on Google's existing infrastructure (Google, 2009). Google provides the App Engine SDK for two languages: Java and Python. This SDK is available as a plugin for Eclipse, the most commonly used Integrated Development Environment (IDE) for Java. Common Java features are supported, as long as they do not interfere with the sandbox limitations. Most GAE services can be accessed using standard Java APIs. Python is accommodated in a similar manner to Java.

Cloud services provided by Google also include the Datastore, a schemaless distributed data storage service with a powerful query engine and support for transactions. The Datastore is a non-relational database optimized for read speed. Development and maintenance on the Datastore are done via the Java or Python APIs.

Cloud Pricing Models:

Cloud vendors charge users based on usage, including data storage, compute time (per machine hour), and data transfer into and out of the cloud. In addition, charges may apply for additional administrative tools and services, such as resource analysis and monitoring (e.g., CloudWatch from Amazon). Different charges also apply to different types of Cloud systems: SaaS charges its users subscription fees; IaaS may charge developers for software licenses, or a group license if the software is installed on multiple instances.


Tables 1.1 and 1.2 show a comparative pricing model for some of the main computational and storage charges of the three major Cloud providers, Amazon, Microsoft and Google (as of January 2012).

Cloud vendor              Service Charge
Amazon                    Standard Linux Instances:
                            Small: $0.085/hr, Large: $0.34/hr, Extra Large: $0.68/hr
                          Standard Windows Instances:
                            Small: $0.12/hr, Large: $0.48/hr, Extra Large: $0.96/hr
Microsoft Windows Azure   Compute Instances:
                            Small: $0.12/hr, Medium: $0.24/hr, Large: $0.48/hr,
                            Extra Large: $0.96/hr
Google App Engine         CPU Time: $0.10/CPU hr

Table 1.1: Pricing model comparison: Service charge (Amazon, 2009; Microsoft, 2009; Google, 2009)

This pricing model demonstrates the pay-as-you-go nature of Cloud computing, which is more attractive than conventional web hosting services, where users are charged a fixed monthly or yearly fee. Moreover, additional costs are very likely to occur when using local servers, such as operational costs and upgrade and maintenance costs. Operational costs include power and electricity, premises rental, administration staff, networking infrastructure, and so on. Upgrade and maintenance costs include new hardware and middleware, new software, new licenses, and additional labor for installation and configuration.
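To illustrate the pay-as-you-go arithmetic, the following sketch computes a hypothetical monthly bill from the Table 1.1 rates (the workload figures are invented for illustration only):

```python
# Hypothetical workload: one Small Linux instance running all month
# (~730 hours) plus a Large Linux instance for 100 peak hours,
# priced at the Amazon EC2 rates listed in Table 1.1.
SMALL_RATE = 0.085   # $/hr, Standard Small Linux instance
LARGE_RATE = 0.34    # $/hr, Standard Large Linux instance

monthly_cost = 730 * SMALL_RATE + 100 * LARGE_RATE
print(round(monthly_cost, 2))  # 62.05 + 34.00 = 96.05
```

Under a fixed-fee hosting plan, the 100 peak hours would require provisioning the Large instance for the whole month; per-hour billing is what makes bursty workloads cheaper in the Cloud.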

When a Cloud is made accessible to the public via a pay-per-use pricing model, it is known as a Public Cloud, whereas the internal datacenters of an organization, not available to the public, are referred to as a Private Cloud (Armbrust et al., 2009).


Cloud vendor    Storage Cost                     Data Transfer Cost                   I/O Cost
AWS S3          50TB: $0.125/GB/mth              In: $0.10/GB                         No charge
                Over 5000TB: $0.055/GB/mth       Out: free first 1GB/mth, then        for I/O
                                                 US&EU: $0.15/GB, AP: $0.19/GB
AWS EBS         $0.10/GB/mth                     N/A                                  $0.10/1M
                Snapshots: $0.15/GB/mth                                               requests
AWS RDS         $0.10/GB/mth (plus hourly        Same as AWS S3                       $0.10/1M
                CPU cost)                                                             requests
                Backup: $0.15/GB/mth
Azure Storage   $0.15/GB/mth                     In: US&EU: $0.10/GB, AP: $0.30/GB    $0.01/10K
                                                 Out: US&EU: $0.15/GB, AP: $0.45/GB   requests
Azure SQL       Up to 1GB: $10/mth               Same as Azure Storage                No charge
                Up to 50GB: $500/mth                                                  for I/O
GAE Blobstore   First 1GB: free,                 In: $0.10/GB                         No charge
                then $0.15/GB/mth                Out: $0.12/GB                         for I/O

Table 1.2: Pricing model comparison: Storage cost (Amazon, 2009; Microsoft, 2009; Google, 2009)
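As a quick illustration of how Table 1.2 translates into a monthly bill, the sketch below prices a hypothetical workload on Azure Storage at the US & EU rates (transaction charges are ignored, and all usage figures are invented):

```python
# Azure Storage (US & EU) rates from Table 1.2:
# $0.15/GB/mth stored, $0.10/GB inbound, $0.15/GB outbound.
def azure_storage_bill(stored_gb, in_gb, out_gb):
    """Monthly charge in dollars for a simple storage workload."""
    return round(stored_gb * 0.15 + in_gb * 0.10 + out_gb * 0.15, 2)

# 100 GB stored, 40 GB uploaded, 20 GB downloaded in a month.
print(azure_storage_bill(100, 40, 20))  # 15.00 + 4.00 + 3.00 = 22.0
```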

Summary:

The key to the success of Cloud computing is that it provides a win-win approach for both providers and users. Cloud computing offers its users scalable computing resources, available on demand without up-front commitment, freeing them from the burden of software and hardware installation and configuration at a cost below that of a medium-sized datacenter, while still generating a good profit for Cloud providers (Armbrust et al., 2009).


1.1.2 The Urge to Migrate to the Cloud

The attractive offerings of Cloud computing, as discussed in the previous section, have encouraged many organizations to seriously consider Cloud computing solutions for their IT needs. Figure 1.3 below, from Google Trends (Google, 2011), reflects this with graphs showing increasing interest in Cloud computing from late 2007 to the present. The top graph is the search volume index for “Cloud Computing” in the Google search engine, which represents how many searches have been made for this term relative to the total number of searches on Google over time. The bottom graph shows its news reference volume over the years, which represents the number of times the term appeared in Google News stories.

Figure 1.3: Cloud Computing - Google Trends

As has been discussed in many articles, papers, case studies, and blogs, there are many ways one can use Cloud services (Kundra, 2010; Khajeh-Hosseini et al., 2010a; Hajjat et al., 2010; Ward et al., 2010). For example, many of the most famous Cloud computing stories have been about startups with explosive growth, where the organization wrote or rewrote software specifically to run in the Cloud (Microsoft, 2012; Amazon, 2011). One can also take advantage of already-deployed Cloud applications, or Cloud-enabled systems in the form of SaaS, such as Google Docs or online Customer Relationship Management (CRM) applications. However, there are cases where an organization has existing application software and wants to run it on a Cloud platform. Instead of a complete rewrite, one could say that they are “migrating” the software from a traditional platform, such as .NET or J2EE, to a Cloud-based one, such as Amazon EC2 or Microsoft Windows Azure.

The migration case is quite practical and popular, since currently operating businesses are likely to have their own IT systems already developed and in use, whereas Cloud computing is relatively new. A migration project to the Cloud can be carried out in various forms, as described in the illustrative case studies at the Federal, state and local government levels of the United States (Kundra, 2010). For example, since 2009, the Department of Energy has been exploring cost and energy efficiencies from leveraging Cloud computing, such as deploying mailboxes on Google Federal Premier Apps, Google Docs and Google Sites, as well as evaluating the use of Amazon EC2 to handle peak usage periods. This migration spans a wide range of migration activities, from SaaS to IaaS Clouds. Other case studies, such as the City of Miami, Florida, describe the decision to use the Windows Azure platform for on-demand hosting in Microsoft data centers. This type of migration differs from the Department of Energy case mentioned previously: no installation or environment setup is required for the PaaS Cloud, but certain modifications must be made to align the migrated systems with the Cloud offerings.

Many papers have also illustrated case studies where enterprises are keen on migrating their IT systems to the Cloud. Khajeh-Hosseini et al. (2010a) present a case study of a UK-based organization that provides IT solutions for the Oil and Gas industry. This organization was considering deploying one of its primary service offerings to Amazon EC2, because it preferred no modifications to the application code. The migration was analyzed to be more cost effective for the organization, although only infrastructure costs were considered.

Hajjat et al. (2010) describe the migration process of an Enterprise Resource Planning application used in a large university with tens of thousands of students and several thousand faculty and staff. The application was planned to be migrated to the Windows Azure Cloud. The migration strategy considered various aspects of the system, such as databases and networking.

These are some representative examples illustrating how enterprises are encouraged to move to the Cloud. A more detailed discussion is presented in Chapter 2.

Besides these, many popular practitioners’ blogs (Hamilton, 2011; Linthicum, 2011; Chappell, 2011) also discuss different migration scenarios to the Cloud. In general, the porting of IT systems to the Cloud is quite active and increasingly popular.

1.1.3 The Essentials of Effort Estimation

Although the migration process is a one-off task, it is not automatic, as can be seen from the above migration examples. Some installations in an IaaS Cloud must be performed, or modifications to the existing systems are unavoidable, and the amount of effort required can be significant. This effort is due to discrepancies


between the environment provided by a Cloud platform and that of a traditional platform (Verma et al., 2011). There are often differences in the versions of various infrastructure components, the programming models, the libraries available, and even the semantics of data access; for example, Cloud platforms typically provide eventual consistency rather than transactional guarantees. All these extra tasks mean that the migration process to the Cloud may not be as easy and straightforward as one might think.


As effort is required for undertaking those tasks to migrate an IT system to
the Cloud, and the amount of effort required is diverse, early effort estimation for
a migration project to a Cloud platform is essential for its project management,
particularly project scheduling and budget planning.

Migration costs also contribute towards the Overhead Cost component (Figure
1.4) of the cost-benefit analysis (Carriere et al., 2010; de Assuncao et al., 2009)
and decision making process on whether it is worthwhile to migrate a system to

the Cloud.
Figure 1.4 illustrates the analysis of cost and benefits in two options: (1)
migrating an existing application to the Cloud, and (2) keeping the application

on premise. If one decides to go with option (1), one has to pay a total cost of:
application development cost, migration cost (or overhead cost), and on-going

cost paid to the Cloud providers. Otherwise, keeping the application in-house
incurs costs of application development (which is similar to option (1)), and

operational and maintenance costs (Carriere et al., 2010; de Assuncao et al.,

2009).

Figure 1.4: Cost and benefit of migrating existing applications into the Cloud

Weighing up the two options, if:

Overhead Cost + Pay-as-you-go Cost < Operational and Maintenance Cost

then migrating the application to the Cloud is a wise move. Otherwise, keeping the application in house is more beneficial.

The Overhead Cost component plays an important role in this analysis, and
it is essentially the cost made up from the migration effort. Hence, early effort

estimation is a vital part of this process.
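This break-even comparison can be sketched as a simple check (a minimal illustration; the cost figures and the fixed planning horizon are hypothetical assumptions, not values from this thesis):

```python
def should_migrate(overhead_cost, monthly_cloud_cost,
                   monthly_inhouse_cost, horizon_months):
    """Return True if migrating is cheaper over the planning horizon.

    Application development cost is omitted because it is common to
    both options; only the differing cost components are compared.
    """
    cloud_total = overhead_cost + monthly_cloud_cost * horizon_months
    inhouse_total = monthly_inhouse_cost * horizon_months  # operational + maintenance
    return cloud_total < inhouse_total

# Hypothetical figures: $20,000 migration overhead, $1,500/mth in the
# Cloud vs. $2,500/mth on premise, over a 36-month horizon.
print(should_migrate(20_000, 1_500, 2_500, 36))  # True: 74,000 < 90,000
```

The migration overhead acts as an up-front investment: the larger the migration effort, the longer the pay-as-you-go savings take to recoup it, which is why early estimation of that effort matters.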

1.2 Definitions

There exists related work on migrating a system to the Cloud; however, the notion of Cloud migration can still vary. In Suen et al. (2011), for example, it refers to the live migration of virtual machine images between different Cloud providers, as well as between private and public Cloud offerings. Hence, it is worth clarifying the meaning of the Cloud migration concepts, as well as some other common terms, used throughout this thesis. The following definitions are based on the activities of this research.

Definition 1 Cloud migration


Cloud migration refers to the activities of moving an IT system or an application from local data centers to the Cloud, without sacrificing any performance attributes. The system can be migrated to the Cloud partially (i.e., only a part of the system is moved to the Cloud, the rest is still hosted in-house, and the two parts must work seamlessly together), or as a whole (i.e., the whole system is ported to the Cloud). The former is called a partial migration, and the latter a full migration.

Definition 2 Cloud migration project


Cloud migration project refers to the process of migrating a system to the

Cloud.

Definition 3 Migrating system/application


Migrating system/application refers to the system/application to be migrated

to the Cloud.

Definition 4 Migration task

A migration task is a defined migration activity within a migration project.

For example, when migrating a Microsoft SQL Server database to SQL Azure,

moving the data is a migration task, and any changes to the database schema are
also called migration tasks.

Definition 5 Migration cost and migration effort

Migration cost and migration effort are used interchangeably in this thesis.
They both refer to the amount of effort spent on migration activities.


Definition 6 Overhead cost

Overhead cost is used in our analysis of cost and benefit of migrating a sys-
tem to the Cloud. The overhead cost refers to the cost of the actual migration

activities. It is equivalent to migration cost or migration effort.

These are some common concepts that will be used regularly in this thesis; more concepts are clarified in later chapters where relevant.

1.3 Research Problem and Aims

The decision to migrate applications to Cloud platforms depends on various factors, one of which is an understanding of the cost implications (in terms of the amount of effort required). This is challenging because:

• Applications vary in many dimensions, such as size, complexity, function-


ality, and requirements.

• Migration projects to the Cloud vary in the type of Cloud (IaaS or PaaS), migration requirements (migrating the application to the Cloud as a whole or only partially), and so on.

• Cloud computing is relatively new and different from the traditional soft-
ware engineering paradigm in many aspects, such as characteristics, pricing

models, and security aspects. Porting an application from a traditional

platform to the Cloud may require changes to the application itself or to


the Cloud environment.

To the best of our knowledge, at the time of writing, no effort estimation ap-

proaches have been specifically designed for Cloud migration projects. Existing


traditional effort estimation approaches for software development are not appli-

cable in this context, because the measures employed as predictors in traditional


approaches do not cover all typical features of a migration project to the Cloud.

These features will be discussed further in Chapters 4 and 5.

The overall objective of this thesis, which is to identify cost implications of

migration to Cloud, requires a clear understanding of how migration projects take

place. This strongly motivates us to, firstly, understand and evaluate the critical
cost factors of the migration process, in order to estimate how much effort would

be needed. Amongst those factors, size measurement of the migration project is


considered one of the most important indicators of effort estimation. Hence, the
second aim of our research is to build a size estimation model, which estimates

how large a migration project to the Cloud is, and which will serve as a basic
indicator for effort estimation approaches.

The specific research questions can be identified as:

• RQ1: What activities are needed to migrate a software system to the Cloud?

• RQ2: How can these activities be classified?

• RQ3: What are the cost implications (in terms of staff effort) of those tasks?

1.4 Research Scope

The focus of this thesis is constrained by the following issues:

• It is important to identify the boundaries of a migration project to the Cloud. Such a project starts with an existing application or system, either completely in-house, or partially in-house and partially in the Cloud. The project ends with the same application or system, either completely or partially migrated to the Cloud.

• In a migration process, no new functionality is added, and performance


must be preserved (or improved without much tuning). The focus of this

thesis is on actual migration activities to bring an in-house system to the


Cloud. Therefore, our study does not consider any functional development

tasks to add more functionality to the system or maintenance tasks after

the migration. Having said that, some migration activities may involve code
modification to adapt the system to the new environment without adding
more functionality.

• Our study focuses on the migration effort to the Cloud from the consumer’s point of view; hence, only migration activities carried out by Cloud users are taken into consideration. In a migration project to a SaaS Cloud, consumers only need to upload their databases in a certain format to the SaaS server, and the migration process is handled by the SaaS provider; migrating mailboxes and email accounts to SaaS email providers is one example. SaaS consumers are free from software management responsibilities, which, as an obvious trade-off, restricts their flexibility and control over the systems in the Cloud. Hence, SaaS is deliberately excluded from the scope of our work. On the other hand, migration projects to PaaS and IaaS Clouds are the sole responsibility of Cloud consumers. Therefore, the scope of this thesis is limited to migration projects to PaaS and IaaS Cloud platforms, but not SaaS Clouds.


• The migration is between two data centers only (typically, one in-house

and one in-Cloud). We assume that migration projects are directional (i.e.
components are moved from local to remote data centers in the Cloud). In

the case where two or more data centers are involved, each pair of data
centers will be assumed to form a separate migration project.

• We assume that the Cloud target has already been selected. We only focus
on the migration process itself; hence, the decision on which Cloud platform

to choose is out of the scope of this thesis. Having said that, applying our
study to each Cloud platform could assist this decision.

The items presented above form the scope and assumptions of this research.

1.5 Research Approach

A thorough understanding of different aspects of a Cloud migration process en-


ables us to identify its cost implications. The following table (Table 1.3) indicates

the steps we take to tackle this issue.

Steps Research Tasks

1 Identify influential cost factors


2 Derive a taxonomy of migration tasks
3 Develop Cloud Migration Point (CMP) model
4 Conduct a survey to collect data on Cloud migration effort
5 Empirically validate the CMP measurement

Table 1.3: Outline of the research approach

Actual cost factors in migration effort to the Cloud need to be identified in Step (1), since this type of project involves tasks that differ from those of a traditional software development project. We address this by reviewing various migration case studies in the literature and practitioners’ blogs, as well as conducting a series of migration exercises of different types, which will be discussed further in Chapter 4.
From this exploration, a taxonomy of migration tasks is extracted in Step (2).

A record of the required cost (in terms of effort) is carefully tracked, together

with a note about which tasks require more effort than others.
There are many influential cost factors in Cloud migration effort, amongst which size measurement is seen as one of the most significant for effort estimation. Traditional size measurements, such as Source Lines of Code (SLOC) and Function Points (FP) and its extensions, are not applicable in the context of migration to the Cloud. In Step (3), a Function-Point-like, Cloud-specific metric, called Cloud Migration Point (CMP), is developed to measure the size of a Cloud migration project, which can serve as a basis for Cloud migration effort estimation.
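The shape of such a metric can be sketched as a weighted sum over rated components, in the style of Function Point counting (the component categories follow this thesis, but the complexity levels, counts and weights below are hypothetical placeholders, not the calibrated CMP values):

```python
# Hypothetical complexity weights, in the style of Function Point counting.
WEIGHTS = {"low": 1, "average": 2, "high": 3}

def cmp_size(components):
    """Aggregate per-component complexity ratings into a single size value.

    components maps a migration component to a {complexity: count} dict.
    """
    return sum(WEIGHTS[level] * count
               for ratings in components.values()
               for level, count in ratings.items())

# Invented example project, using the component categories from the thesis.
project = {
    "connection_changes":         {"low": 4, "average": 2},
    "database_migration":         {"average": 3, "high": 1},
    "code_modification":          {"high": 2},
    "installation_configuration": {"low": 5},
}
print(cmp_size(project))  # 4*1 + 2*2 + 3*2 + 1*3 + 2*3 + 5*1 = 28
```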
The validation in Step (5) is to ensure that CMP can be a reliable indicator
of effort estimation for Cloud migration projects. Data for this validation process

is not publicly available; hence, we conduct a survey in Step (4) to facilitate the
validation process.

1.6 Organisation of the Thesis

This thesis is structured as follows:

• Chapter 2 provides an overview of the related work in the literature, includ-


ing other research on application migration to the Cloud, and a review of


different estimation approaches and size measurement methods (i.e., Source

Line of Code, Function Point and its extensions).

• Chapter 3 describes the methodology that we apply in this research. Each

step of the research process will be elaborated and mapped with each com-
ponent of this thesis.

• Chapter 4 outlines a taxonomy of migration tasks to the Cloud. This taxonomy covers the possible migration tasks that any migration project to the Cloud might encounter. Its purpose is to enhance our understanding of the migration process to the Cloud, as well as to enable us to identify the relevant cost factors.

• Chapter 5 tightens our focus on the most dominant indicator of effort estimation: size measurement. This chapter describes our CMP model, built by recasting a well-known software size estimation model, Function Point (FP), into the context of Cloud migration. We adopt the three-phased approach of the FP model to estimate the size of the individual components involved in a migration project. In particular, we focus on the Cloud-relevant components of the migrated systems, including connection changes, database migration, code modification, and installation and configuration for the new environment in the Cloud. For each component, we perform an estimation by identifying the relevant activities that contribute to the overall effort required for that component. Finally, we aggregate all individual estimations into a single CMP value by calculating their weighted sum. The CMP value provides a measure of how large the migration project is, and it can be used as an indicator for Cloud migration effort estimation.


• Chapter 6 validates the CMP model empirically. The empirical validation shows that our metric is practically useful as a basis for effort estimation under a defined set of assumptions. We conducted a survey of Cloud migration projects of various scales, from small to large, and cross-validated these projects to estimate the performance of our model. Data from the survey has allowed us to calibrate the CMP model to increase its external validity. In this chapter, we also state the list of assumptions made in developing the model, and test their plausibility using the available data. This list of assumptions implies the high complexity and difficulty of validating the metric.

• Chapter 7 concludes the thesis by providing a research summary, research

contributions, and limitations. Possible future research directions based on


this thesis are also outlined in this chapter.

Chapter 2

Literature Review

“Not everything that counts can be counted, and not everything that
can be counted counts.”

∼ William Bruce Cameron.

Effort estimation and size measurement of software projects have been interesting and challenging areas in traditional software engineering. There is a great deal of related work in the traditional context; however, none of it has considered the new setting of Cloud computing. The aim of this literature review is to examine existing research related to the Cloud migration topic, as well as effort estimation and size measurement metrics, with consideration of their applicability to Cloud computing.

The following sections cover a number of issues important for this thesis. Section 2.1 reviews other research related to the Cloud migration topic, with regard to its concerns about migration (i.e., risk management, cost savings, and performance). Section 2.2 reviews effort estimation approaches in traditional software engineering, and Section 2.3 explores existing size measurement metrics, including Source Lines of Code, Function Points and its extensions. Section 2.3 also states the requirements for a sizing metric for Cloud migration, and explains why none of the existing approaches meet these requirements, hence why a new metric is needed. Lastly, Section 2.4 summarizes and concludes this chapter.

2.1 Cloud Migration Solutions

There have been many publications and much research dealing with various aspects of Cloud computing, such as Cloud computing architectures, Security and Privacy in Clouds, Monitoring, Management and Maintenance of Clouds, and Performance Modelling for Clouds; but not until 2011 did we see many papers concerning migration to the Cloud. This topic has been of interest both to Cloud practitioners and to researchers, although their concerns about migration are quite diverse. This section reviews existing work on this topic and distinguishes our concern from others. The sub-sections present the different streams of related work in the literature.

2.1.1 Decision Making Support

Although there are many benefits associated with the Cloud, whether it is worth moving an existing working system to the Cloud is still an open question for enterprises. As cost-benefit analysis is an important tool for IT managers to evaluate whether the benefits outweigh the costs of an IT investment, many researchers have attempted to help decision-makers by identifying and weighing the benefits and issues of Cloud migration.


Khajeh-Hosseini et al. (2010a) reported a case study of migrating an enterprise IT system in the oil and gas industry from a local data center to Amazon EC2. Their findings indicate that there are significant risks associated with the organisational dimension, such as decreased job satisfaction among staff, who come to depend on third-party Cloud providers, or the downsizing of IT support departments, because Cloud providers become responsible for their daily tasks, and so on.

They extended their work (Khajeh-Hosseini et al., 2011) to introduce two tools to support decision making during the migration process. These tools assist decision makers by producing cost estimates for using public IaaS Clouds, as well as outlining the benefits and risks of using IaaS Clouds from an enterprise perspective. They also explicitly stated that the limitation of their work is its focus on infrastructure cost alone, ignoring the actual migration work, which could be significant.

Mastroeni & Naldi (2011) also assessed the risks involved in the decision to migrate to Cloud storage against the alternative of buying the storage devices and facilities, based on different decision variables; Yam et al. (2011) and Hajjat et al. (2010) addressed this from the uncertainty angle, including security and business continuity concerns.

Another important criterion that affects the decision to migrate to the Cloud
is cost savings. It is essential to understand how cost effective it can be to migrate

to the Cloud, as opposed to staying in house. The work of Hajjat et al. (2010)

addressed this by proposing a model of a hybrid migration approach, in which a

part of the system is migrated to the Cloud, while the other part stays in house.
This model takes into consideration the cost savings that may result from the


migration. This cost is essentially the Internet communication cost. They briefly

mentioned that the one-time cost of the actual migration process can also be
easily incorporated in the model; however, there was no further discussion of how

this one-time cost can be estimated.

Communication cost is also the cost-related aspect of the framework presented by Hao et al. (2009). This framework was developed to facilitate service migration to the Cloud, with a cost model (i.e., communication cost) and a decision algorithm designed to evaluate the trade-offs in service selection and migration. Apart from communication cost, reconfiguration cost has also caught the attention of some researchers. Verma et al. (2011) designed a model, called CosMig, to model the cost of frequently reconfiguring a Cloud infrastructure and to evaluate its impact on application performance. These factors are considered to be costs of using the Cloud.

Li et al. (2011a) and some other researchers (Ye et al., 2011; Ho et al., 2011;

Mastroeni & Naldi, 2011) identified cost savings from the perspective of Cloud
price and server bandwidth. They compare the price of different Cloud providers,
as well as the cost difference between using the Cloud and staying in house. This

cost is also the cost of using the Cloud.

Klems et al. (2009) proposed a framework to compute the value of cloud by


estimating Cloud computing costs and comparing these costs to conventional IT
solutions, such as hosted service or Grid computing service. Their work defines

cost as the combination of a number of direct costs (e.g. facility, energy, cables and

servers) and indirect costs (e.g. cost from failing to meet business objectives).

However, the list of cost components in this framework is incomplete for both
direct and indirect costs. Furthermore, it does not indicate how these costs can


be computed, or how components in the framework link with one another to

determine the estimated cost of Cloud computing.

The system proposed by de Assuncao et al. (2009) provides various scheduling

strategies to augment the capacity of an organisation’s local cluster with Cloud

resources, and evaluates the trade-off between performance improvement and


monetary cost spent for using the Clouds for each proposed strategy. This work

only considers a portion of Cloud cost and focuses specifically on response time
benefits. This is sufficient to analyse costs and benefits amongst the proposed
scheduling strategies of using Clouds, but cannot be applied to a wider scope of

general application development for Clouds.

Conclusion:

The related work supporting the decision on whether to migrate to the Cloud has mainly focused on security and risks. In addition, some work also looks at the cost of migrating a system to the Cloud. However, that research does not concern the cost of the migration process itself; it refers to the cost of using the Cloud, assuming the migration has already been done. This differentiates the focus of our work from others, since we are concerned with the cost of the actual migration process.

2.1.2 Experience Reports

Apart from decision making support, a few researchers have reported on their

experiences of migrating a system to the Cloud. Babar & Chauhan (2011) and
Chauhan & Babar (2011) reported their experiences and observations of migrating

Hackystat, an Open Source Software Product to the Cloud. The focus of this

migration exercise is on the architecture and design decisions of Hackystat. Their

aim is to provide some guidance for adapting service-based system architecture


to the Cloud.

On the other hand, the experience presented by Thakar & Szalay (2010) dis-

cussed migrating the Sloan Digital Sky Survey science archive, a scientific astro-
nomical database to the Cloud. Their exercise resulted in a strong finding that

it is “very frustrating or impossible” to migrate a database, either large or small,

to the Cloud (such as Amazon EC2 or Microsoft SQL Azure) without changing
either its schema or its settings. Our finding, which will be discussed later in
Chapters 4 and 5, strongly agrees with this observation.

Conclusion:

Current research has attempted to contribute to the knowledge of a migra-


tion process to the Cloud, as there are currently no guidelines or standards on

this topic. However, researchers have only reported preliminary results of their
experiences. It is still necessary to have a guideline for the migration process, in
order to enable practitioners to better plan their own migration process.

2.1.3 Cloud Migration Concerns

This section reviews and categorizes several issues concerning the migration pro-
cess that have been raised in some related research.

• Data Migration

Data transfer between local data centers and the Cloud can affect the overall

application performance significantly. Many researchers have attempted to

address this issue during the migration, for example, Piao & Yan (2010)

proposed a virtual machine placement and a migration approach that can


minimize the total data transfer time consumption; hence, it can help to

optimize the overall application performance.

Zhang et al. (2010) took a closer look into application specific workload

characteristics, deadlines, and I/O profiles in order to build an adaptive


data migration model that can improve the overall system performance

and resource utilization while meeting workload deadlines. On another


aspect of data migration, Thakar & Szalay (2010) emphasized that for all
database sizes, extra work is likely required for changing database schemas

and settings to fit well into the Cloud environment.

Live database migration without service interruption has been proposed by


Elmore et al. (2011) with their technique Zephyr. This technique utilizes on-
demand pull and asynchronous push, and requires minimal synchronization

to achieve its stated goal.

• Performance

The Cloud environment has imposed many constraints and challenges to

the migration of legacy systems to the Cloud (Frey & Hasselbring, 2011;

Mohagheghi & Saether, 2011). Hence, there exists research on configuration


during the migration process intended to overcome these constraints without
sacrificing performance.

Venugopal et al. (2011) stated that enterprises are sometimes required to

re-engineer their applications to utilize the linear scalability of the Cloud.

They proposed a methodology to smoothly migrate and configure the sys-


tem to the Cloud without initial re-engineering effort. Jayasinghe et al.

(2011), on the other hand, found that a configuration that works for some
environments just does not work for other Cloud environments. Hence, during

migration, reconfiguration and possible re-engineering are necessary.

Other performance issues have also been raised and discussed, such as:

networking or Internet communication (Hao et al., 2009; Hajjat et al., 2010),

and Cloud infrastructure configuration (Verma et al., 2011).

• Other Potential Concerns

Migration projects have been undertaken throughout the history of com-


puting as technologies have changed. Although specific considerations for
Cloud migration can be very different from other contexts, the general issues
encountered in other contexts could be relevant and informative.

Legacy Information System migration could encompass different migration


issues. Some issues are common to all software engineering projects (not

just migration projects), including target system development, testing, and


database model selection. Other issues that are specific to migration con-

cerns include target system database population (Bisbal et al., 1997, 1999).

Cetin et al. (2007) mentioned other concerns in legacy migration to Service-

Oriented Computing, including the need to provide a migration roadmap.

Smith (2007) shared this same view in his migration concerns, such as:
identification of specific components to migrate, recommendations on the

ordering of migration efforts, and specific migration paths to follow.

Conclusion:

Some issues related to the migration process may result in extra cost and

require extra effort, such as: data and database migration, networking or Internet
communication, Cloud infrastructure configuration, or re-engineering the application

for the Cloud. It is also essential for any migration project (not just to the Cloud)
to have a roadmap to follow.

2.2 Effort Estimation in Traditional Software Engineering

Effort estimation is essential at the beginning of a new project. In this section,


effort estimation approaches in traditional software engineering are reviewed for
their applicability to the Cloud migration context. There is a diverse range of

effort estimation approaches in the literature of traditional software engineering.

They can be categorized into three general types: analogy, expert judgement, and
algorithmic models (Jorgensen & Shepperd, 2007; Boehm et al., 2000; Shepperd

& Schofield, 1997; Keung et al., 2008; Helmer, 1966; Baird, 1989; Banker et al.,
1991).

2.2.1 Analogy Approach

Effort estimation using analogy is the approach where a problem is solved using
knowledge derived from similar problems (Shepperd & Schofield, 1997; Keung

et al., 2008). It is argued that the analogy approach is capable of handling poorly

understood domains because solutions are based upon what has actually hap-
pened. Even so, this approach is still not applicable for the Cloud context at this

stage because the range of completed migration projects is still limited, and it is
not obvious as to where and how similar projects can be identified.

2.2.2 Expert Judgement Approach

Expert judgement is another well-known approach for estimation (Jorgensen,


2004; Helmer, 1966; Baird, 1989). This approach captures knowledge, experi-
ences, and expertise of practitioners who are recognized as experts within a do-

main of interest, and derives estimates based on historical data that they are well
aware of, or past projects in which they participated. Similar to the analogy-based
approach, because of the recent emergence of the Cloud, there is a lack of
practitioners who have experience with a broad range of migration types to the Cloud.
Nevertheless, this approach shows great potential as the Cloud becomes more
mature in the future.

One popular technique developed to capture expert judgement is the Delphi
technique (Helmer, 1966), which is executed in two rounds. In the first round, a
group of experts are asked individually for their assessment on some matter,
without knowledge of how the other participants respond. In the second round,
each participant is asked for their assessment again, but this time with knowledge
of how the others answered in the first round. This technique narrows the range
of answers from the participants, pointing to a more reasonable middle ground
regarding the issue of interest.

2.2.3 Algorithmic Model Approach

Another popular estimation approach is algorithmic models (Jorgensen & Shep-

perd, 2007; Boehm et al., 1995; Banker et al., 1991). This approach estimates
effort using mathematical formulas to establish the relationship between depen-

dent and independent variables of the models, which are the estimated effort and
influential cost factors, respectively. This approach also requires historical data

to develop the algorithmic model; however, the model itself is more generic than

the other two approaches, which makes model-based technique more suitable to
apply for a broader range of migration projects to the Cloud at this stage.

Amongst existing cost estimation models, the COCOMO (COnstructive COst


MOdel) II (Boehm et al., 1995) is one of the most popular models. COCOMO
II consists of three sub-models, namely Applications Composition, Early Design

and Post-Architecture, which can be combined in various ways to deal with the
current and likely future software practices. These sub-models use FPs and/or
LOCs for their sizing parameters. The size of a project is one of the key factors in

algorithmic models for the project’s effort estimation.
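To illustrate how size drives an algorithmic model, the COCOMO II Post-Architecture effort equation can be sketched as follows. This sketch is illustrative rather than part of the reviewed work: A = 2.94 and B = 0.91 are the COCOMO II.2000 calibrated constants, while the scale-factor and effort-multiplier values below are hypothetical project ratings.

```python
from math import prod

# COCOMO II effort equation: PM = A * Size^E * product(EM_i),
# with E = B + 0.01 * sum(SF_j).  A and B are the COCOMO II.2000
# calibrated constants; the project ratings below are illustrative.
A, B = 2.94, 0.91

def cocomo_ii_effort(ksloc, scale_factors, effort_multipliers):
    """Estimated effort in person-months for `ksloc` thousand lines of code."""
    exponent = B + 0.01 * sum(scale_factors)
    return A * (ksloc ** exponent) * prod(effort_multipliers)

# A hypothetical 20 KSLOC project: five illustrative scale-factor ratings,
# all 17 effort multipliers left at their nominal value of 1.0.
effort = cocomo_ii_effort(20,
                          scale_factors=[3.72, 3.04, 4.24, 3.29, 4.68],
                          effort_multipliers=[1.0] * 17)
print(f"{effort:.1f} person-months")
```

Because the exponent E exceeds 1 for most rating combinations, effort grows slightly faster than size, which is why a credible size measure is the key input to such models.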

Conclusion:

This section has reviewed three popular approaches of traditional software en-

gineering effort estimation. The analogy approach requires a repository of historical

data on similar Cloud migration projects. This can be achieved when the field

of Cloud migration becomes more mature and data on migration projects can be
collected and stored in a repository for future use. The expert judgement approach
relies on practitioners' expertise in Cloud migration. This can be achieved

when there are many experts in the field. The algorithmic approach requires a
mathematical formula to be developed with suitable parameters. This last approach

appears to be the most feasible direction to explore at this stage.

2.3 Software Size Estimation in Traditional Software Engineering

The literature has shown that the effort spent on a development project relies
significantly on the project's complexity. A more complicated project would
typically require more effort on both development and maintenance. Software

size measurement is a conventional way to indicate a project’s complexity. It is


commonly found in the form of metrics that measure either a software's Lines of Code
or its Function Points and their extended variants (Verner & Tate, 1992; Dolado, 2000;

Rosenberg, 1997; Finnie et al., 1997).

2.3.1 Source Lines of Code (SLOC)

SLOC is a traditional size measure that counts the number of lines in a software
product’s source code. SLOC is one of the prime measures which are used as

input into equations for effort estimation (Verner & Tate, 1992; Dolado, 2000;
Rosenberg, 1997). SLOC was popular for its simplicity and straightforwardness.

However, counting SLOC is only possible after the implementation phase when

source code is available, which makes SLOC inapplicable for estimation in the early
phases of the development cycle (Albrecht & Gaffney, 1983; Lai & Huang, 2003).

There are also further concerns about SLOC's validity because of its high dependency
on the programming language and the programmer's skills and coding style (Ruhe

et al., 2003b).
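To make this dependence concrete, the sketch below implements one common (but by no means universal) definition of physical SLOC, counting lines that are neither blank nor comment-only; the counted snippets are illustrative.

```python
# A minimal physical-SLOC counter: lines that are neither blank nor
# comment-only are counted.  Both the comment prefix and the very
# definition of a "line of code" are language- and convention-dependent,
# which is one reason SLOC counts are hard to compare across projects.
def count_sloc(source: str, comment_prefix: str = "#") -> int:
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count

# Two functionally identical snippets, written in different styles.
verbose = "# sum a list\ntotal = 0\nfor x in [1, 2, 3]:\n    total += x\n"
compact = "total = sum([1, 2, 3])\n"
print(count_sloc(verbose), count_sloc(compact))  # 3 1
```

The two snippets behave identically yet count as 3 and 1 SLOC, illustrating how coding style and language idioms distort SLOC-based comparisons.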

2.3.2 Function Point

To overcome these disadvantages of SLOC, FP was developed in 1983 by Albrecht

(Albrecht & Gaffney, 1983) to measure the size of transaction processing systems


in terms of system functionality, independent of implementation languages. FP

incorporates both size and complexity factors in its counting process. There are
many software development effort estimation approaches using function points,
such as regression model, or artificial intelligence model (e.g. artificial neural

networks and case-based reasoning) (Finnie et al., 1997).

FP is used to estimate the amount of functionality a software system provides, based on


how much data it uses and generates. FP is found to be more useful and suitable
in many software projects than the LOC method because of its applicability at an

early stage of software development, when LOC is not yet available. The FP of
a system can be obtained relatively easily from discussions with customers early
in the development process.

FP measures system functionality; it is, therefore, believed to also provide,

in association with staff effort, a general measure for development productivity


with less concern for influences of technologies, code reuse, and unexpected code

expansions. Development productivity can be measured in “function points per

work-month” or “work-hours per function point”.

The Function Point Analysis (FPA) method is considered an empirical estimation
approach: it is a sizing method, and to use it for effort prediction it is
necessary to identify a relationship between the effort required to build a

system and identifiable system features (such as external inputs, interface files,

outputs, inquiries, and logical internal tables). Counts of system features are
adjusted using weighted values and complexity factors to derive the final size of

the system.

The FPA methodology has three steps, given there exists a list of all functions
that the software should provide. Firstly, each function is classified into one of

five types: External Input (EI), External Output (EO), External Inquiry (EQ),
Internal Logical File (ILF), and External Interface File (EIF). A function is
classified as an EI when it involves user input that adds or changes data in an ILF.
A function is an EO when it generates a report or message to the user or other
applications outside the boundary of the application being measured. A function

where an input generates an immediate output with no updates of ILFs is called

an EQ. An ILF is a logical file (as distinct from physical files) or a logical group of
data in a database context, whereas an EIF is a file used to pass or share data between
applications. In the second step, each function is evaluated and assigned a

complexity level of Low, Average, or High. Finally, each function is assigned a

weight value based on its type from the first step and its complexity level from
the second step. The sum of these weight values forms the Unadjusted Function
Point (UFP) of a software. The weighted sum of all five types of functions is

adjusted with an optional Value Adjustment Factor (VAF) obtained by consid-

ering the degree of influence of 14 General System Characteristics (GSC) of the

interested system.
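The three-step counting process described above can be sketched as follows. The weight matrix is the standard IFPUG one; the function list and GSC ratings are illustrative only.

```python
# Sketch of the FPA three-step count.  Standard IFPUG weight matrix:
# each (function type, complexity) pair maps to a weight.
WEIGHTS = {
    "EI":  {"Low": 3, "Average": 4,  "High": 6},
    "EO":  {"Low": 4, "Average": 5,  "High": 7},
    "EQ":  {"Low": 3, "Average": 4,  "High": 6},
    "ILF": {"Low": 7, "Average": 10, "High": 15},
    "EIF": {"Low": 5, "Average": 7,  "High": 10},
}

def unadjusted_fp(functions):
    """Steps 1-3: sum the weight of every classified function (UFP)."""
    return sum(WEIGHTS[ftype][complexity] for ftype, complexity in functions)

def adjusted_fp(ufp, gsc_degrees):
    """Apply the optional Value Adjustment Factor derived from the 14 GSC,
    each rated 0-5: VAF = 0.65 + 0.01 * sum(degrees)."""
    assert len(gsc_degrees) == 14
    return ufp * (0.65 + 0.01 * sum(gsc_degrees))

# Illustrative function list for a hypothetical system.
functions = [("EI", "Low"), ("EO", "Average"), ("EQ", "Low"),
             ("ILF", "High"), ("EIF", "Average")]
ufp = unadjusted_fp(functions)       # 3 + 5 + 3 + 15 + 7 = 33
fp = adjusted_fp(ufp, [3] * 14)      # VAF = 0.65 + 0.42 = 1.07
print(ufp, round(fp, 2))             # 33 35.31
```

Given recorded staff effort, productivity then follows directly: delivering these 35.31 FP in three work-months would correspond to roughly 11.8 function points per work-month.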

Among traditional software size measurements, FP has achieved wide
acceptance in sizing software products, mainly due to its applicability in the early

phases of the software development. However, FP has also been subject to some

criticisms. Abran & Robillard (1994) pointed out a scale type mismatch and

questioned the math behind the FP approach. Thus, from a theoretical point of
view, the FP may not be considered as a measure that is in conformance with

measurement theory. However, from a pragmatic viewpoint, FP has been suc-


cessfully applied in a number of application domains and is considered to be a

significant improvement over traditional software size measures (Matson et al.,

1994). As a result, despite the criticisms, the FP measure has subsequently been
improved and extended.

2.3.3 Function Point Extensions

Although FP is applicable mainly to procedural business systems, it has


formed a firm foundation for a number of extensions suitable for other types

of systems and development paradigms (Costagliola et al., 2005; Abran, 1999;


Dekkers et al., 2003; Antoniol et al., 1999; Mohagheghi et al., 2005; Karner, 1993;
Reifer, 2000).
Over the years, as software technology has evolved with the development of the web
and the Internet, many people have extended FP to adapt to other emerging

systems. For example:

• Use Case Point (UCP)

Karner (1993) proposed the UCP model inspired by FP. It measures a


system's functionality based on use cases, actors, and transactions.

• Full Function Point (FFP)

Abran (1999) extended the applicability of FP to real-time software by

introducing FFP. FFP redefines FP’s function types to capture specific

real-time software characteristics that FP fails to measure, such as a large
number of single-occurrence groups of data, or a fluctuating number of
sub-processes.

• Object-Oriented Function Point (OOFP)

At the same time, Antoniol et al. (1999) developed OOFP for sizing OO
systems. OOFP relies on object models to map FP’s function types into

OO concepts. A remarkable aspect of the OOFP approach is its flexibility


which allows practitioners to experiment with several procedures of OOFP
measurement in order to find the best suited practice for their organization.

• Web Object (WO)

Reifer (2000) extended FP to WO for sizing Web projects, by adding four


new web-specific components: multimedia files, web building blocks, scripts,

and links. The WO measure has been successfully used in an adaptation of


the COCOMO II estimation model called WebMo for estimating the effort
and schedule of web-based development (Boehm et al., 2000).

• Web Point (WP)

Cleary (2000) proposed WP for sizing internet applications. In analogy to


the FP analysis, the WP approach classifies pages in a web site and assigns

weights based on their complexity, where the number of links and words in a

web page determine its complexity. The WP measure focuses on static web

sites and therefore does not consider behavioural and navigational proper-
ties of web applications.

• Internet Point (IP)

Another adaptation of FP for web-based systems is the IP developed at the

Cost Xpert Group, Inc. (Group, 2002). The IP method replaces five types
of constituents of the FP model with seven new types, namely external

interface files, logical internal tables, messages/external queries, reports,


static screens and dynamic screens for measuring the size of web-based

systems. The IP counting process has been automated in a tool called Cost

Xpert that can estimate the equivalent size of a web-based system in LOC
as well as the effort and schedule of its development.

• Class Point

Costagliola et al. (2005) proposed Class Point (CP1 and CP2
for initial size estimation at the beginning of the development process and
further detailed estimation when more data are available later in the de-

velopment process, respectively). Class Point does not apply one-to-one


mappings from FP’s function types to OO concepts like other extensions,
but rather focuses on classes as the basic units. However, Class Point inherits

the three-step approach from Function Point: (1) Classify classes into four
types (Human Interaction, Problem Domain, Data Management, and Task

Management); (2) Evaluate complexity level for each individual class (com-
plexity levels: Low, Average, or High); and finally (3) Assign a complexity

weight for each class based on the previous two steps. The weighted sum

of all four types of classes is adjusted with a VAF obtained by considering

the degree of influence of 18 GSC of the system under assessment. The


Class Point measure has been used successfully in a least-square regression
model (Costagliola et al., 2005), a neural network approach (Kanmani et al.,

2007) and a fuzzy subtractive clustering technique (Kanmani et al., 2008)

for estimating the effort of OO development.

• Object Point (OP)

Despite its name, the Object Point (OP) (Banker et al., 1991) is another

generalised extension to FP which is not tied to OO systems. The OP


counting is very similar to the FP analysis but objects are counted instead.

However, such objects are not directly related to objects in the OO paradigm

but rather refer to screens, reports and third-generation language modules


in software applications. The OP measure has been successfully used in the

COCOMO II cost model for estimating the effort of software development


(Boehm et al., 1995).

• Mark II Function Point (MKIIFP)

In addition to the above specialised extensions, there are other extensions


to FP. Symons (1991) proposed the Mark II Function Point (MKIIFP)
measure as an enhancement to Albrecht's original FP approach. The measure


replaces the five types of constituents of the original approach with logical
transactions and extends the standard set of GSC from 14 to 19 plus any

client defined characteristics (UKSMA, 1998).

• COSMIC Full Function Point (COSMIC FFP)

The Common Software Measurement International Consortium (COSMIC)

proposed another extension to FP called COSMIC Full Function Point

(COSMIC FFP) (Abran, 1999). The COSMIC FFP measure has been
formulated as a refinement of FFP, MKIIFP and the FP models in order

to work equally with data-rich business systems and control-rich real-time

systems. However, the method does not explicitly claim to measure the
size of functionality that includes complex mathematical algorithms. In

contrast to FP, the COSMIC FFP measure does not take the effect of tech-
nical and quality requirements of the system into consideration by claiming

adjustment factors are no longer meaningful (Symons & Symons, 2001).

Conclusion:

SLOC, FP and its extensions have been widely used to measure the size of different
types of systems and development paradigms. However, their applicability is
limited to software functionality development. The main purpose of migrating

a system to the Cloud is not to develop new functionalities, but to reuse the
existing ones while, at the same time, benefiting from the best performance of
Cloud offerings. In light of this stance, none of the existing metrics are suitable

for estimating size and effort of a migration project to the Cloud.

We therefore wish to apply the FP approach to develop a similar size metric for
the Cloud migration context. Although FP is commonly known as a software size

measurement, it is not purely a size metric. The way FP was counted incorporates

both size and complexity concepts. The size metric for Cloud migration projects

that is based on the FP approach will be similar to FP in the sense that they are
both size-complexity hybrid metrics. However, throughout this thesis, this metric
will still be referred to as a size metric to ensure the consistency of terminology.

2.4 Summary

Cost and benefit analysis is an important tool for IT managers to evaluate whether

the benefits outweigh the costs of an IT investment. The determination of cost is


usually the first step to achieve this goal and is often a challenging task for many

project managers, since both overestimating and underestimating would result

in unfavourable impacts to the business’s competitiveness and project resource


planning.

Software costs include tangible costs (hardware and software costs), admin-
istrative costs, and development costs. Most of the time, the dominant cost

is the cost of development staff and managers (Sommerville, 2006). The con-
text of Cloud migration requires a different perspective to understand its effort
costs, given that limited experience is available in the published papers. Amongst

various effort estimation approaches from traditional software engineering, algo-


rithmic approach appears to be the most feasible approach at this stage to adapt
to the context of Cloud migration.

Size measurement is the dominant factor in algorithmic effort estimation.


Different size measurement metrics have been developed and applied successfully

in traditional software engineering. Many of these metrics are not able to ad-
equately capture the unique and different characteristics of a Cloud migration

project. Effort estimation and size measurement of migration to the Cloud are dif-

ferent from those of traditional software development in the sense that the latter

focus on components to be developed, either functions or classes, whereas the for-


mer are more concerned about migration activities, such as code modification for
migrating to PaaS Clouds, or software installation for migrating to IaaS Clouds.

Traditional size metrics were developed for functional development or maintenance
tasks and hence mainly focus on code changes (added/removed/modified).


Cloud migration tasks, on the other hand, not only focus on code changes, but

also on other processes such as network configuration and database modification


tasks, which the measures employed as predictors by traditional size metrics fail

to cover.

As a result, we are strongly motivated to:

• Propose a taxonomy of migration tasks to the Cloud, since the literature


shows that there has not been any guidance or standard on this, while a
migration guideline is essential at this stage.

• Develop a new size measurement for Cloud migration, which can serve
as a predictor for migration effort estimation purposes. We aim to cap-
ture the size of the migration process, rather than the size of the migrated
system; hence, none of the existing metrics are applicable.

The taxonomy will be presented in Chapter 4, and the new sizing metric will

be introduced in Chapter 5.

Chapter 3

Research Methodology

“If you can’t describe what you are doing as a process, you don’t know
what you’re doing.”

∼ W. Edwards Deming.

The literature review in Chapter 2 has shown that there is no related work

on the topic of Cloud migration effort. We, therefore, seek to gain insight into the
Cloud migration tasks and understand their cost implications by carrying out

migration experiments from a Cloud consumer perspective, and consequently,

confirm our findings with projects from external organizations.

This research is a hybrid of qualitative and quantitative research, and it fol-


lows the concurrent procedure strategy as discussed by Creswell (2002). Following

the concurrent procedure strategy, we collect both forms of qualitative and quan-
titative data at the same time during the study and then integrate and analyze

them to achieve the overall results. In particular, the process of this research

can be described in three steps, which are mapped with steps in this thesis, as in

Figure 3.1.

Figure 3.1: Steps of the Research Process and Thesis

The sub-sections in this chapter elaborate the steps of this research process,
as follows: Section 3.1 describes Step 1 in the research process - the experiment
set up for the purpose of exploring possible migration tasks in a Cloud migra-

tion project. Section 3.2 illustrates Step 2 - the discussion protocol with Cloud
engineers from our group to confirm our findings on Cloud migration tasks, and

to develop the CMP metric. Section 3.3 discusses Step 3 - the survey protocol

to obtain more data on Cloud migration projects from external organizations in

order to test the generalizability of this research.

3.1 Cloud Migration Experiments

This is the first step in the research process. I carried out different types of
migration experiments to understand the actual migration activities. The purpose

of the migration experiments is to explore possible migration tasks involved in a

migration project, as well as to understand the cost implication of each task.

3.1.1 Experiment Setup

The experiments should satisfy the following criteria:

• The migration experiments are set up for PaaS and IaaS Clouds only (SaaS
Clouds are ignored as discussed in Section 1.4). PaaS Cloud candidates
can be Windows Azure and SQL Azure, and IaaS Cloud candidates can be

Amazon EC2, Amazon RDS and SimpleDB.

• The applications to be migrated should represent different application types

that are typically used by enterprises.

• The applications should be N-tier applications, with a proper database.

• The applications could be developed by different developers, but all
documentation should be available.

• The applications in the Cloud after the migration process should work prop-

erly, in terms of functionality and performance.

• The same application can be migrated to different Clouds using different
migration strategies.

.Net PetShop (Leake, 2006) is an application designed to show best practices

for building an enterprise, N-tier .Net 2.0 application. It serves to highlight the
key technologies and architecture to build scalable enterprise Web applications.

Its Java version, called Java PetStore, is also well-known for its use as an
illustration of how the Java EE 5 platform can be used to develop an AJAX-enabled


Web 2.0 application. For these reasons, both versions of PetStore have been used

in various research studies (Li et al., 2004; Singh et al., 2002; Yuan et al., 2003)
and we believe the PetShop application represents a broad class of application

types that are typically found in an enterprise organisation, and that is also a

prime candidate application type for running in the Cloud.

Our experiment was to migrate the PetShop application from a local server
to the Cloud. Windows Azure and SQL Azure were selected as the PaaS Cloud
platform for migration since they provide the most similar environment for Pet-

Shop .Net as in the local server. Therefore, it was expected that minimal effort
would be required for migration activities.

The migration of Java PetStore into Amazon EC2 and SimpleDB was also
investigated to add more richness to our findings. Amazon EC2 is an IaaS Cloud,

and SimpleDB is a NoSQL database with limited support for the full SQL statements
required in the PetStore application; therefore, different migration strategies and
more re-engineering efforts were expected.

3.1.2 Data Collection Strategy

All migration tasks should be recorded, together with the time required to com-
plete each task. Each migration task can be divided into multiple tasks with finer

granularity, or grouped with other tasks to form a more general task. This is to
ensure a uniform level of granularity across all tasks.

The migration tasks should be categorized into different groups, such as
installation tasks or code modification tasks, depending on the nature of each task.

The overhead cost of the migration tasks can be derived by comparing the time
spent on each migration task category with the development time of the applica-

tion. The application was not developed by us; hence, the development time can
be estimated using an effort estimation approach in the literature (either analogy,

expert judgement, or algorithmic models (Shepperd & Schofield, 1997; Jorgensen,

2004; Finnie et al., 1997)).
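The overhead comparison described above reduces to a simple ratio, as sketched below; the task categories echo those mentioned earlier, but all hour figures are purely illustrative placeholders rather than measurements from our experiments.

```python
# Migration overhead = time spent on migration tasks relative to the
# (estimated) development time of the application.  All figures here
# are illustrative placeholders, not measured values.
migration_hours = {
    "installation and configuration": 12.0,
    "code modification": 30.0,
    "database migration": 18.0,
}
estimated_development_hours = 400.0  # e.g. from an effort-estimation model

for category, hours in migration_hours.items():
    share = hours / estimated_development_hours
    print(f"{category}: {hours:.0f}h ({share:.1%} of development time)")

total_migration_hours = sum(migration_hours.values())
overhead = total_migration_hours / estimated_development_hours
print(f"total migration overhead: {overhead:.1%}")
```

Reporting per-category shares in this way also highlights which groups of migration tasks dominate the overhead.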

In addition, well-known practitioners’ blogs, such as Hamilton (2011); Linthicum

(2011); Chappell (2011), were also consulted to confirm our list of migration tasks.
Although they did not discuss any specific migration project, there are blog en-

tries on the steps of a migration project and database migration concerns.

The output from this step should be a collection of categorized migration

tasks, gathered from all migration experiments, together with the associated time
spent on each task. This contributes to the taxonomy of migration tasks, and
forms the basic elements of the CMP model to measure the size of a migration
process.

3.2 Discussion with Cloud Engineers

The migration experiments in Step 1 enabled us to form a taxonomy of cate-

gorized migration tasks and the structure of the CMP model. In this Step 2, we
conducted interviews with our group members at NICTA¹ to confirm that the
migration tasks and migration categories in the taxonomy are reasonable, and to
seek their expert opinion on the parameters of the CMP model.

¹ NICTA (National ICT Australia Ltd) is Australia's Information and Communications
Technology Research Centre of Excellence. Since NICTA was founded in 2002, it has
created five new companies, developed a substantial technology and intellectual property
portfolio, and continues to supply new talent to the ICT industry through a
NICTA-supported PhD program. NICTA has five laboratories around the country. With
over 700 people, NICTA is the largest organisation in Australia dedicated to ICT research.

3.2.1 Participants

The discussion was carried out individually with six participants from our group. The participants included:

• Two senior researchers, with 10 years’ experience in software development and 3 years’ experience with Cloud computing

• Two research engineers, with 5 years’ experience in software development and 2 years’ experience with Cloud computing

• Two Ph.D. research students, in the middle and final stages of their Ph.D. studies, whose Ph.D. topics relate to Cloud computing performance.

All participants have good knowledge of Cloud computing. They have a good understanding of state-of-the-art Cloud offerings and technologies, and much hands-on experience with Cloud offerings. As part of their research, the participants have already migrated different types of applications (e.g., benchmarking systems, different types of databases) to different types of Cloud (including Amazon EC2, Amazon RDS, S3, SimpleDB, Windows Azure, SQL Azure, Google App Engine, MongoDB, Rackspace), although these were small and medium projects. In addition to general migration activities, they have also explored other vital aspects of Cloud computing, such as elasticity and database consistency. With their exposure to the Cloud computing environment, they are reliable and valuable participants for our discussion.

3.2.2 Discussion Protocols

We asked each participant similar questions in three steps:

• Firstly, each participant was asked for their opinion on the taxonomy of migration tasks. They could suggest adding tasks, removing tasks, or re-categorizing a task.

• Secondly, the structure of the CMP model was presented to the participants, and they were asked to nominate the numeric value that they considered most suitable for each parameter of the CMP model.

• Thirdly, each participant was asked to describe a Cloud migration project in which they had participated, together with the time spent on each migration task in that project.

The discussion was completed with each participant individually, without knowledge of the other participants’ answers in the first round. A second round of discussion was then conducted with each participant, this time with knowledge of the other participants’ replies, to narrow the range of answers. This is the Delphi technique for combining expert opinions into a better judgement (Shepperd & Schofield, 1997).


3.2.3 Data Collection and Analysis

Participants’ answers were then carefully analyzed. Changes to the taxonomy suggested by most participants were made accordingly. The remaining suggestions, each made by a single participant, were also run past the other participants to seek a consensus.

The value for each parameter of the CMP model was determined by averaging

all expert opinion values for that parameter. This set of values forms the initial
set of parameters for the CMP model, as presented in Chapter 5.
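The two-round aggregation described above can be sketched as a short calculation. This is only an illustration: the parameter values below are hypothetical sample weights, not the actual CMP parameters adopted in Chapter 5.

```python
# Hedged sketch of Delphi-style aggregation of expert estimates for one
# CMP model parameter. The estimate values are hypothetical.
def delphi_round(estimates):
    """Average the experts' estimates for one parameter."""
    return sum(estimates) / len(estimates)

# Round 1: each of the 6 participants answers independently.
round1 = [3, 5, 4, 6, 4, 5]

# Round 2: participants revise their answers after seeing the others',
# which typically narrows the spread of the estimates.
round2 = [4, 5, 4, 5, 4, 5]

initial_value = delphi_round(round2)  # value adopted for the CMP model
print(initial_value)                  # 4.5
```

The same averaging is applied per parameter, and the resulting set of values seeds the CMP model.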

The six migration projects and the associated effort described by each participant were used to validate the CMP model (Chapter 6).

3.3 Survey Protocol

Data on the migration effort and migration tasks of past Cloud migration projects are vital elements of a validation process. Unlike data on the development effort of traditional software development projects, the data of interest do not exist in any public repository. This is anticipated, since Cloud computing is relatively immature and there is no related work on migration effort to the Cloud. This yields both advantages and challenges for our work at this stage. While we enjoy the flexibility to explore different aspects of the Cloud migration topic, we are challenged to collect real data ourselves for validation purposes.

In this section, we describe our process of conducting a survey1 to collect data on past migration projects to the Cloud. The purpose of this step is to collect data to validate our taxonomy of migration tasks and the CMP model with external data points.

1 This survey was conducted with the assistance of an IT Master’s student, Tingting Yao, from The University of Sydney. Tingting assisted me in identifying potential participants and distributing the questionnaire. The task also contributed to the final project of her Master’s degree.

3.3.1 Objectives

The objective of the survey was to collect data on past migration projects to the Cloud for determining migration cost factors, including size, and for examining their relationships with the effort required for migration. Many organisations have been migrating their systems to the Cloud; however, no detailed documentation on migration tasks and effort is publicly available. Therefore, the survey targets practitioners who have experience with migrating a legacy system to the Cloud, to gather information on their migrated systems, migration tasks, and the amount of effort spent (in person-hours), in order to obtain sufficient information for the empirical validation of the CMP model. We are also interested in how practitioners evaluate the effect of external cost factors on their migration projects.

This survey addresses the following research questions:

• RQ1: What migration tasks were carried out?

– RQ1.1: Was database migration carried out?

– RQ1.2: Were any installation and configuration tasks done?

– RQ1.3: Was any code modification required?

– RQ1.4: Were network connections changed?

– RQ1.5: Were any other tasks done?

– RQ1.6: How were the migration tasks carried out?


∗ RQ1.6.1: What type of database migration was done? e.g., relational to NoSQL, or relational to relational (same or different type of relational database? same or different version?)

∗ RQ1.6.2: How many queries required modification?

∗ RQ1.6.3: How much data was migrated?

∗ RQ1.6.4: How many packages were installed from source code and from binary files?

∗ RQ1.6.5: How many configurations were done for each package?

∗ RQ1.6.6: For each network connection, what type is it (LAN or WAN)? And what tasks were done: adding security or optimizing the protocol?

∗ RQ1.6.7: For each modified class, what type is it (Human Interaction Type, Problem Domain Type, Data Management Type, or Task Management Type)? How many attributes, methods, and service calls were changed?

• RQ2: Is the CMP size metric a significant indicator of migration effort to the Cloud?

– RQ2.1: How many person-hours were required for a migration project to the Cloud?

∗ RQ2.1.1: How many person-hours were spent on each migration task?

• RQ3: What external cost factors influence migration effort?

– RQ3.1: How does the development team’s expertise affect migration effort?

– RQ3.2: How does the development team’s experience in software engineering in general affect migration effort?

– RQ3.3: How does the development team’s experience with the Cloud affect migration effort?

– RQ3.4: How does the design quality of migration tasks affect migration effort?

– RQ3.5: How does the choice of Cloud provider affect migration effort?

– RQ3.6: Do any other factors affect migration effort?

3.3.2 Survey Design

This survey is a cross-sectional survey, in which information is gathered on the population at the current state of Cloud computing (Creswell, 2002). Data were collected mainly via a web survey, supplemented by some interviews. We could not conduct in-person interviews with many practitioners because of geographical constraints; hence, the web survey was our main source of data collection.

The studied population included a project team from NICTA and a list of individual practitioners who have migrated their systems to the Cloud. The team from NICTA is different from our group; it migrated its system to the Cloud to take advantage of Cloud elasticity for its project. The practitioners were identified from the Cloud community and online discussions, such as authors of scientific papers on the Cloud and participants in Cloud events (e.g., CloudCamp). Interviews were conducted with the NICTA project team to gain more insight and more detailed data, and surveys were sent to the list of identified practitioners. The study was conducted on this identified population.

A questionnaire was prepared to address the research questions. I prepared the questionnaire to cover all CMP aspects that require information for validation, and also to gain further insight into how the respondents conducted their migration to the Cloud. The questionnaire was piloted with the 6 Cloud engineers from our group (introduced in Section 3.2). In the discussion described in Section 3.2, prior to this survey, each participant was asked to describe a Cloud migration project and the time spent on each migration task of the project. The questionnaire essentially asked for the same information. Answers from the discussion and the questionnaire were then analysed and compared. I found that the participants could correctly interpret the questions, and the answers were almost the same for both the discussion and the questionnaire. The biggest issue with the questionnaire was that participants were confused by questions that were not relevant to their migration tasks. For example, participants who only migrated their database to the Cloud were lost among the questions about code modification, because they did not modify any of their code. To address this issue, we needed to create different branches of the survey, so that respondents would only be asked questions relevant to their migration tasks.

We evaluated different survey software and found that LimeSurvey1 best suited our needs because of its features and pricing scheme. Surveys can be created with different layers and branches. Incomplete survey responses can be saved for later viewing and updating. The question types available in LimeSurvey were sufficient for our needs. Also, we were charged based on the number of responses rather than for a timeframe, as with other online survey software. This pricing scheme suited our needs because we did not expect to receive thousands of responses weekly or monthly.

1 http://limesurvey.org


Our survey was created with LimeSurvey. A link to the web survey was sent via email to the list of practitioners, and responses were recorded by the web survey once respondents finished. To ensure an adequate response rate, a follow-up email was sent after two weeks.

Table 3.1 shows the mapping between the research questions and the questions from the questionnaire (Appendix A).

Research Questions Questions from questionnaire


RQ1 GQ1, GQ2, GQ3
RQ1.1 DB1
RQ1.2 IC1
RQ1.3 CM1
RQ1.4 NC1, NC2
RQ1.5 DB8, DB10, IC5, NC6, CM8
RQ2.1.1 DB5, DB7, DB9, IC4, NC5, DB10, IC5, NC6, CM7, CM8
RQ1.6.1 DB2, DB3
RQ1.6.2 DB4
RQ1.6.3 DB6
RQ1.6.4 IC2
RQ1.6.5 IC3
RQ1.6.6 NC3, NC4
RQ1.6.7 CM2, CM3, CM4, CM5, CM6
RQ3.1 CF2, CF3
RQ3.2 CF3
RQ3.3 CF1, CF3
RQ3.4 CF3
RQ3.5 CF3
RQ3.6 CF4

Table 3.1: Mapping between research questions and questionnaire
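For traceability, the mapping in Table 3.1 can also be kept in machine-readable form and checked mechanically. The sketch below encodes the table as a dictionary (the identifiers are exactly those of Table 3.1; only the idea of checking coverage this way is ours):

```python
# The RQ-to-questionnaire mapping of Table 3.1, encoded as a dictionary so
# that the coverage of each research question can be checked mechanically.
rq_to_questions = {
    "RQ1":     ["GQ1", "GQ2", "GQ3"],
    "RQ1.1":   ["DB1"],
    "RQ1.2":   ["IC1"],
    "RQ1.3":   ["CM1"],
    "RQ1.4":   ["NC1", "NC2"],
    "RQ1.5":   ["DB8", "DB10", "IC5", "NC6", "CM8"],
    "RQ2.1.1": ["DB5", "DB7", "DB9", "IC4", "NC5", "DB10", "IC5", "NC6", "CM7", "CM8"],
    "RQ1.6.1": ["DB2", "DB3"],
    "RQ1.6.2": ["DB4"],
    "RQ1.6.3": ["DB6"],
    "RQ1.6.4": ["IC2"],
    "RQ1.6.5": ["IC3"],
    "RQ1.6.6": ["NC3", "NC4"],
    "RQ1.6.7": ["CM2", "CM3", "CM4", "CM5", "CM6"],
    "RQ3.1":   ["CF2", "CF3"],
    "RQ3.2":   ["CF3"],
    "RQ3.3":   ["CF1", "CF3"],
    "RQ3.4":   ["CF3"],
    "RQ3.5":   ["CF3"],
    "RQ3.6":   ["CF4"],
}

# Every listed research question is covered by at least one questionnaire question.
assert all(qs for qs in rq_to_questions.values())
print(len(rq_to_questions))   # 20 mapped research questions
```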

3.3.3 Data Collection

The data collection process took place over three months. First, we sent out 350 invitation emails to different target audiences, including academic researchers, industrial groups and companies, and individual practitioners. We did not receive replies from all recipients, but we received some very positive replies indicating strong interest in participating in our survey. We then sent out 308 surveys, excluding 42 recipients who had replied to our first invitation that they were not willing to participate, or from whom we received out-of-office auto-replies or failed-delivery notices. In this second round, we received 33 responses (around a 10% response rate), but some of them were incomplete. For example, some responses did not provide enough information to calculate CMP, or lacked information on the total hours spent. The main reason for this low response rate is that most of the projects were done for exploration or tutorial purposes; hence, no detailed information was recorded, especially the information required for calculating CMP. Most respondents could easily answer general questions on why they migrated to the Cloud, or how they generally did so, but most failed to provide sufficient information at the design level of the migration tasks.

After careful analysis to eliminate unreliable and incomplete data, we obtained a total of 19 data points. These data points come from responses that provided sufficient information for CMP calculation. We discarded all responses that were described as a “wide guess” by the project teams. 17 of the 19 data points are small projects with around 100 hours or less in total. Again, this is because we targeted some individual practitioners, and their survey responses were all for example migration projects. We tried to target large groups with larger-scope migration projects, and we could get only 2 corresponding responses.
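The response funnel described above can be summarized numerically (a sketch using only the counts stated in this section):

```python
# Hedged sketch: survey response funnel from the figures reported above.
invited = 350                 # initial invitation emails sent
surveyed = invited - 42       # 308 surveys sent after excluding refusals,
                              # out-of-office replies, and failed deliveries
responses = 33                # responses received in the second round
usable = 19                   # data points left after discarding unreliable data

response_rate = responses / surveyed
usable_rate = usable / responses

print(surveyed)                       # 308
print(round(100 * response_rate, 1))  # 10.7 (the "around 10%" above)
print(round(100 * usable_rate, 1))    # 57.6
```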

The final dataset and the validation process are reported in Chapter 6.


3.4 Summary

In this chapter, we have described the process of undertaking this research. The research requires a mixed-methods approach, combining qualitative and quantitative methods. Data of both forms were collected concurrently for the purposes of exploring Cloud migration tasks, building the taxonomy of migration tasks, developing the CMP model for sizing migration projects, and validating them.

Chapter 4

Taxonomy of Migration Tasks to the Cloud

“Our experience shows that not everything that is observable and measurable is predictable, no matter how complete our past observations may have been.”

∼ Sir William McCrea.

The focus of a Cloud migration project, as discussed in the previous chapter, is both the migration activities involved (i.e., the process) and the system to be migrated (i.e., the product). Successfully migrating a system to the Cloud requires an appropriate and sufficient set of migration tasks to be carried out. Examining a migration task involves examining the related parts of the system to be migrated. Cloud migration tasks are hence defined as the primitive units of our study.

For a better understanding of a Cloud migration project, in this chapter we report on several Cloud migration experiences and capture our understanding of how migration projects are carried out, in the form of a list of potential migration tasks that might be involved in a Cloud migration project. We call this a taxonomy of Cloud migration tasks. A taxonomy, as stated by Mens & Gorp (2006), is defined as:

“A system for naming and organizing things [. . . ] into groups which share similar qualities.” (Cambridge Dictionary Online)

In this chapter, we present the process by which migration tasks are extracted and categorized into different groups to form the taxonomy. Identifying the taxonomy is both necessary and challenging. It is necessary because it enables us to capture various critical aspects of the cost implications of a Cloud migration project; a taxonomy of migration tasks to the Cloud might also help to get new migration projects started. On the other hand, it is challenging because migration projects vary along multiple dimensions, such as the specification of the migrated systems (e.g., programming language, system architecture), the Cloud offerings (e.g., IaaS or PaaS, relational databases or NoSQL), or the requirements of the migration projects (e.g., security, network throughput, parallelism).

This chapter is organized as follows. Section 4.1 describes how taxonomies are usually derived in other contexts. Section 4.2 presents our approach to deriving the taxonomy of Cloud migration tasks. In Section 4.2.1 we report on our migration experiences, with the breakdown of costs (in terms of effort) among categories of tasks, for a case study that migrated a .NET n-tier application to run on Windows Azure; this results in a list of important influential factors that impact the cost of various migration tasks in Section 4.3. The taxonomy of Cloud migration tasks is then described in Section 4.4. Section 4.5 validates the proposed taxonomy on one industrial migration project conducted by our group, and also shows how the taxonomy can be applied in real Cloud migration projects. Section 4.6 reflects on our approach and on other experiences. We conclude the chapter with a summary in Section 4.7.

4.1 Taxonomy in other contexts

A taxonomy is a way to precisely categorize things into pre-defined groups, increasing understanding of the topic of interest while avoiding confusion in terminology. In this section, we review how taxonomies are developed in other contexts, in order to apply the methodology to our Cloud migration context in the next section.

Mens & Gorp (2006) proposed a taxonomy of model transformation, which classifies existing model transformation approaches along multiple dimensions, based on selected concrete criteria. The purpose of the taxonomy is to assist developers in deciding which approaches, tools, and techniques best fit their needs. The taxonomy was derived from the discussions of a working group on Language Engineering for Model-Driven Software Development about the important characteristics of model transformations. Essentially, the taxonomy is a classification of model transformation approaches, and their tools and techniques, on the basis of a group discussion.

Similarly in terms of methodology, the taxonomy proposed by Padioleau et al. (2009) was also obtained from a pool of existing sources. It is a taxonomy of the comments in programmers’ code, intended to reveal their needs, such as new development tools or a language extension. The authors analyzed 1050 comments randomly collected from three open-source operating systems: Linux, FreeBSD, and OpenSolaris. The comments were categorized along different aspects, based on four basic questions: “what is in the comments? whom are the comments written for, and by whom are they written? where are the comments? and when were the comments written?”.

The taxonomy of software connectors by Mehta et al. (2000) was formed from a classification of three atomic elements of software interactions. It was proposed to increase the level of understanding of the fundamental building blocks of software interactions, and of how they combine to create more complex blocks. This work is the only one of the three that showed the “taxonomy in action”, i.e., how the taxonomy is applied to the architecture of an existing system. In other words, this is a form of validation of the taxonomy.

Generally, a taxonomy is obtained from existing unorganized resources, which are then systematically classified according to concrete criteria. The taxonomy can subsequently be validated by demonstrating its usefulness on another system.

4.2 Experiment Setup

For our Cloud migration context, there is no existing pool of migration tasks ready for the classification stage. As a result, we had to create a list of Cloud migration tasks ourselves by conducting migration projects. We carried out an experiment, presented as a case study, for the purpose of understanding the actual migration activities to PaaS and IaaS Clouds (SaaS Clouds are excluded, as discussed in Section 1.4). We report here on our experiences in carrying out this technical migration.

The applications used in our experiments are .Net PetShop (Leake, 2006) and its Java counterpart, Java PetStore, as discussed in Chapter 3. The PetShop application was migrated from the local server to Windows Azure and SQL Azure, and Java PetStore was migrated to Amazon EC2 and SimpleDB. Different migration strategies and effort were required (as reported in Section 4.2.1).

In order to calculate the migration effort as an overhead cost over the original development effort, we needed a figure for the initial development effort. This development effort can be obtained in a conventional manner with Function Point, given that all the required information about the PetShop .Net application is available.

Function Point Analysis (Albrecht & Gaffney, 1983) was applied to the fully functional PetShop application to estimate its size complexity, which can then be used to estimate its development cost. We used this estimated development cost and the migration cost recorded in our PetShop experiment to calculate the overhead cost of migration over development.

Based on the Function Point reference cards provided by IFPUG (2010), PetShop is calculated to have:

• 28 Internal Logical Files (ILF s),

• 28 External Inputs (EIs),

• 32 External Outputs (EOs),

• 36 External Inquiries (EQs),

• and no External Interface Files (EIF s),


and a total of 118 Adjusted Function Points (AFPs). Using settings and resources similar to those of the migration activities, we isolated one feature of PetShop, counted as 3 AFP, and re-developed it. It took us around 4 to 5 hours to completely develop this feature. Hence, we assumed 1.5 hours on average for developing 1 AFP of PetShop. Therefore, the effort for developing the PetShop application, with 118 AFPs, is estimated to be around 177 hours. If PetShop were developed from scratch, as in a development project for the Cloud (as distinct from a migration project to the Cloud), it would be expected to take a roughly similar amount of effort (177 hours) to deliver the same functionality (118 AFPs).

The effort in hours spent on each migration task in our experiment was recorded for later analysis. It is presented in the following section (Section 4.2.1), together with observations made during the study.
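The arithmetic behind the 177-hour estimate can be sketched as a small calculation. This is only an illustration of the derivation above; the IFPUG complexity weights and the value adjustment factor are already folded into the reported AFP total.

```python
# Hedged sketch of the development-effort estimate derived from the
# Function Point counts above. The AFP total (118) is taken from the text;
# the per-AFP rate comes from re-developing one isolated 3-AFP feature.
feature_afp = 3
feature_hours = 4.5                           # midpoint of the observed 4-5 hours
hours_per_afp = feature_hours / feature_afp   # 1.5 hours per AFP

petshop_afp = 118                             # Adjusted Function Points for PetShop
estimated_dev_hours = petshop_afp * hours_per_afp

print(hours_per_afp)         # 1.5
print(estimated_dev_hours)   # 177.0
```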

4.2.1 Measured Data and Observations

This section reports the observations made in the experiments described in the previous section. The observations and experiences from our study provide a basis for the taxonomy of Cloud migration tasks in Section 4.4.

When migrating PetShop to Windows Azure and the SQL Azure database, the following migration issues were observed:

• We used the existing application PetShop, which was not developed by us; hence, effort was required to learn and understand PetShop and to get it working on a local machine first.

• PetShop was developed on an older platform than the current version supported by Windows Azure. This is expected to happen with many other existing applications, since Cloud computing has only recently emerged and is equipped with the latest technologies and tools, which may yield incompatibility issues. In particular, deploying applications to Windows Azure requires Windows 7, while the PetShop installation files were packaged for Windows XP and could not run properly on Windows 7. We needed to deploy the PetShop source code onto Windows 7 manually.

• The same issue applied to the PetShop database. There are existing tools offering database and data transfer from local servers to SQL Azure; however, they require SQL Server 2008 to be installed, while PetShop was designed to work with SQL Server 2005 and cannot be installed directly on SQL Server 2008. We had to manually retrieve and run the database script on SQL Server 2008.

• In order to deploy applications to the Windows Azure Cloud, it was important to create a package file and a configuration file from the existing source code. The Azure plugin for Visual Studio provides a quite straightforward method of achieving this; however, the method works with “Web Application” projects only, while PetShop was created as a “WebSite” project, which has no project file and relies on ASP.NET dynamic compilation to compile the pages and classes in the application. Effort was therefore also spent on converting the WebSite project to a Web Application project. Alternatively, the cspack utility provided by Azure can be used to create the package file.

The effort spent on addressing these issues was recorded in terms of duration, and is summarized in Tables 4.1 and 4.2.

Tasks                                                          Effort (hours)
Install SQL Server 2005 and set up the local environment          5.5
in order to run the PetShop installation file
Get PetShop up and running properly                               3.5
Install SQL Server 2008 to get PetShop running with               2
later technology
Migrate the database from SQL Server 2005 to SQL Server           5
2008 and modify PetShop to work properly with SQL
Server 2008
Install .Net 4 and modify PetShop to work on Windows 7            1.5
and .Net 4
Test PetShop                                                      5
Total                                                            22.5

Table 4.1: Recorded overhead effort of preparing PetShop for migration

PetShop was originally designed to work with Windows XP, .Net Framework 2, and SQL Server 2005. To enable PetShop to run properly for the first time, these prerequisites needed to be installed. The data in Table 4.1 show that most of the time in this activity was spent on setting up the environment to allow PetShop to run. The data in Table 4.2 show that most of the time spent on migration to the Cloud went into overcoming the learning curve. No new features were introduced, and Windows Azure provides a platform similar to the one on which PetShop was developed; therefore, only minimal code modification was required.

In our experiment, learning about the application and the Cloud environment, as well as installation and configuration, contributed most to the overhead cost. The experience required to deal with unforeseen issues also accounted for major additional cost. Once the learning phase is finished, migrating similar types of applications will require less effort. Figure 4.1 shows the overhead cost for each category of migration task for PetShop, which has a complexity of 118 AFPs. The overhead cost is calculated as the percentage of additional effort over the estimated application development effort (177 hours in total).

Tasks                                                          Effort (hours)
Windows Azure tutorials                                           6
Create an Azure account and set up firewall rules                 1.5
Install and explore the MS Azure Training Kit                     5
Tutorials: migrating databases to SQL Azure                       4
Migrate the PetShop database to SQL Azure                         2
Modify PetShop to work with SQL Azure                             4
Test PetShop on local servers against SQL Azure                   2
Modify and package PetShop for Windows Azure                      5.5
Deploy PetShop to Windows Azure                                   1.5
Test PetShop in Windows Azure with SQL Azure                      5
Total                                                            36.5

Table 4.2: Recorded overhead effort of putting PetShop onto the Cloud platform

Figure 4.1: Migration Overhead Cost
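Using the totals from Tables 4.1 and 4.2 and the 177-hour development estimate, the overall overhead percentages behind Figure 4.1 can be reproduced with a short calculation (a sketch; the finer per-category split shown in the figure is not reproduced here):

```python
# Hedged sketch: migration overhead as a percentage of the estimated
# development effort. Totals come from Tables 4.1 and 4.2; the 177-hour
# baseline is 118 AFPs x 1.5 hours per AFP.
dev_effort_hours = 177.0

preparation_hours = 22.5      # Table 4.1: preparing PetShop for migration
cloud_migration_hours = 36.5  # Table 4.2: putting PetShop onto the Cloud platform

def overhead_pct(extra_hours, baseline=dev_effort_hours):
    """Additional effort expressed as a percentage of the development baseline."""
    return 100.0 * extra_hours / baseline

print(round(overhead_pct(preparation_hours), 1))       # 12.7
print(round(overhead_pct(cloud_migration_hours), 1))   # 20.6
print(round(overhead_pct(preparation_hours + cloud_migration_hours), 1))  # 33.3
```

So the full migration (59 hours of recorded effort) amounts to roughly a third of the estimated original development effort.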


The following additional issues were observed when migrating Java PetStore to Amazon EC2 and SimpleDB:

• Java PetStore was developed to work with the JavaDB database, connected via a JDBC driver. It is not straightforward to connect Java PetStore to SimpleDB instead since, at the time we carried out our experiment, there was no JDBC driver written for SimpleDB, and writing a full-featured JDBC driver for SimpleDB from scratch is not feasible.

• Java PetStore uses JPA, which depends heavily on advanced features of JDBC drivers; therefore, SimpleDB could not be connected directly to Java PetStore.

• There exists SimpleJPA, an open-source JPA implementation for SimpleDB. Effort was needed to understand this third-party library.

• SimpleDB is a NoSQL Cloud database and does not support full-featured SQL statements, such as JOIN operations, which were required by Java PetStore. Additional effort was needed to re-write these operations.

• Amazon EC2 is a type of IaaS Cloud, so additional installations were required compared with the experiment on Windows Azure.

These issues required additional effort on top of our experiment with Windows Azure, falling mainly into the categories of installation and code modification.

The measured data and observations presented above create the opportunity for further classification, and for future work on identifying migration issues and effort unique to the Cloud.


4.3 Migration Influential Cost Factors

The report on our migration experiences in Section 4.2.1 helped us identify some influential cost factors that impact the effort of the migration process. We differentiate two types of cost factors, internal and external, defined as follows:

Definition 1 Internal cost factors

Internal cost factors relate to the migrating system itself. These factors essentially refer to what migration tasks are required and how they can be achieved, and they determine the tasks’ complexity, without knowledge of who is carrying out those tasks or under which conditions. An example of an internal cost factor is “database migration”, which consists of modifying schemas and transferring data from a local database to a Cloud database.

Definition 2 External cost factors

External cost factors concern environmental factors that are specific to each organization, such as the development team’s skills and expertise, or its knowledge of Cloud platforms and offerings. External cost factors determine how fast a migration task can be completed. For example, a Cloud-experienced practitioner will usually complete a migration task faster than an inexperienced one.

These two types align well with the fundamental elements of the Function Point approach. The internal cost factors are commonly identified first, to establish what needs to be done and to measure the complexity of a project. The result reflects the characteristics of the project only, without consideration of which organization is responsible for it. The external cost factors are then localized for each organization and applied on top of the previous result to derive an estimate of the total effort required for the project.

Based on our observations from Section 4.2.1, the influential cost factors (both internal and external) are identified as follows. Some factors are similar to traditional software development cost factors (Ruhe et al., 2003a; Madachy, 1997); some are specific to migration to the Cloud.

Internal Cost Factors:

Different migration strategies involve different migration tasks. Hence, the internal cost factors, which reflect what migration tasks are needed, result from the choice of migration strategy.

• Compatibility issues: This factor is affected by the similarity between the Cloud platform and the local servers. If the similarity is high, compatibility issues can be eliminated. The effort spent on resolving these issues varies from case to case.

• Library dependency: When an application relies on a library to function on the local server, it requires a similar library on the Cloud platform. If such a library exists for the Cloud, it can be reused with some minor effort; otherwise, more effort is required to rewrite that library. For example, Java PetStore uses a JDBC driver to connect to its JavaDB database, and it also uses JPA, which depends heavily on advanced features of JDBC drivers. If we migrate PetStore’s database to SimpleDB in the Cloud, we have to implement a full-featured JDBC driver for SimpleDB; otherwise, PetStore’s data access layer must be rewritten.


• Database features: Migrating from a relational database to Amazon RDS
or Azure SQL requires less effort than migrating to a NoSQL database like SimpleDB,
because NoSQL databases do not support full relational features, such as
the Join operation. In the latter case, effort is required to implement Join
operations or to rewrite custom code so that the application does not
require Join features.

• Connection issues: In some Cloud migration cases, when only some components
of the system are migrated to the Cloud while the rest is kept in
house for various reasons (e.g. enterprises may wish to keep their sensitive
data in house), the connection between the two parts of the system - one in
house and the other in the Cloud - may face different issues, such as
security and latency.

External Cost Factors:

• Project team's capabilities: If the project team's development knowledge
and skills are sufficient, training can be picked up quickly and less
effort is required.

• Existing knowledge and experience of Cloud providers and technologies: If
the project team possesses some level of prior knowledge and experience
of Cloud services and available tools, the learning curve can be shortened
significantly, and hence less effort is required. As discussed in the previous
section, the learning curve is a one-time task, but requires significant effort.

• Selecting the correct Cloud platform and services (IaaS or PaaS): This choice
greatly affects the effort and cost required for the rest of the migration
activities; however, the selection itself is not a trivial task. If the selected
Cloud platform is highly similar to the application's environment in the local
server, less effort is required for modification.

• Application's complexity: If the application's complexity is high, more effort
is required to study and, if necessary, modify the application.

Some of these influential cost factors are specific to migration to the Cloud,
because they are not applicable to a conventional migration project from one
platform to another. For example, a migration project from Java to .Net is a
complete rewrite and would not have compatibility, database, connection, or
possibly library dependency issues. Likewise, a migration project from an old
version to a newer version of a platform or environment would not have the
networking, library dependency, or database feature issues discussed above.
These factors, one way or another, all affect the effort spent on the Cloud
migration process.

4.4 Taxonomy of Migration Tasks

The purpose of a Cloud migration project is to port an application from a local
data center to a selected Cloud platform with no changes in functionality or
compromises in performance. In our experiments, migration projects started
from getting familiar with the application and the selected Cloud platform, to
setting up the environment and preparing the application for migration, as well as
modifying and testing to ensure the application functions properly in the Cloud.

Our distinction between internal and external factors suggests that the internal
cost factors (i.e., migration tasks) will form the foundation of the taxonomy.

The list of internal cost factors introduced in Section 4.3, together with related
work from the literature and practitioners' blogs, enables us to generalise
and propose a taxonomy of migration tasks that any migration project
may encounter; the migration tasks are grouped under different categories,
as summarized in Table 4.3. If T is the taxonomy of migration tasks and t is a
migration task, then T is the set of such tasks t, and a migration project P ⊆ T
consists of a subset of migration tasks. Some tasks in the taxonomy can be skipped,
while some tasks can be further broken down to accommodate the different
requirements of each project.
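The set formulation above can be sketched directly. A minimal sketch in Python; the task names are shortened paraphrases of Table 4.3 entries, and the particular project P is purely illustrative:

```python
# The taxonomy T as a set of migration tasks; a project P is a subset of T.
T = {
    "training on existing application",
    "training on cloud platform",
    "install third-party tools",
    "modify database connection",
    "migrate local database to cloud",
    "code modification",
    "network connection tuning",
    "functional testing",
}

# An illustrative full-migration project that skips network-related tasks.
P = {
    "training on cloud platform",
    "modify database connection",
    "migrate local database to cloud",
    "functional testing",
}

assert P <= T          # P ⊆ T: every project task comes from the taxonomy
skipped = T - P        # taxonomy tasks this particular project does not need
print(len(skipped))    # 4
```

The subset check mirrors the definition: a project is valid exactly when every one of its tasks appears in the taxonomy.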

The diagram in Figure 4.2 shows the sequence in which Cloud migration tasks

from the taxonomy could be executed, and the possible iterations that may occur.

The following provides a summary of the taxonomy proposed in Table 4.3
and Figure 4.2. The last three columns in Table 4.3 indicate whether a specific
migration task is supported by examples from the discussions with Cloud engineers
in our group, from the literature, or from practitioners' blogs.

• Training or Learning Curve - In order to ensure compliance with the
Cloud, a basic understanding of the application and the selected Cloud
platform is required.

Effort is needed for analysing the application, understanding its components
and how they are coupled together, and identifying which modules remain
unchanged and which need to be modified. It is important to understand
the initial system environment, specifications and configurations
before planning any changes. Effort and costs spent on this task may not


Categories Tasks Ex. Lit. Blogs

Training on the existing application: Y


Training or Understand system environment, speci-
Learning fications and configurations
Curve Measure system’s size and Estimate sys- Y
tem development effort
Training on the selected cloud platform: Y
Understand its offerings and technolo-
gies used
Identify any compatibility issues Y
Training on third party tools: Iden- Y
tify and understand additional libraries,
tools for data migration, and any re-
quired middlewares
Installation Set up development tools and environ- Y
and ment
Configuration Install and set up environment in IaaS Y Y
Cloud
Install third-party tools Y
Modify database connection Y
Database
Modify database operation query (if us- Y Y Y
Migration
ing NoSQL Cloud database)
Prepare database for migration Y Y Y
Migrate the local database to Cloud Y Y Y
database
Code Modifi- Any required modification for compati- Y Y
cation bility issues
Examine all changes in network connec- Y
Network tions
Connection Tune appropriate parameters for perfor- Y
mance purpose
Ensure connection security Y
Test if local system works with database Y Y
Testing in Cloud
Test if system in Cloud works with Y Y
database in Cloud
Write test cases and test the function- Y Y
ality of the application in Cloud

Table 4.3: Taxonomy of migration tasks


Figure 4.2: Diagram of Cloud migration task taxonomy

be trivial for reasons such as: coding style by other developers may be
difficult to study; confidentiality issues may mean that applications are not
totally transparent; and applications with many modules interacting with
each other are difficult to isolate for migration purposes (if required). Many
applications contain requirements on security or performance that need to
be investigated thoroughly.

When porting applications to Cloud platforms, no new features are introduced
in this study. Therefore, the complexity of the application in terms
of Function Points is unchanged, whereas configuration and database connection
are more likely to be modified. Effort spent on this part is directly
proportional to the complexity of the application. The more complicated
the application is, the more time and skills are required to understand it.
In our experiment, PetShop was measured as 118 Function Points and was

estimated to cost 177 hours for development effort. Its requirements and

configurations were studied to identify which classes were more likely to


expose additional changes when migrating.
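The PetShop figures above imply a productivity rate of roughly 1.5 hours per Function Point (177 / 118). A back-of-the-envelope sketch of that conversion; the rate is derived from this single example, not a general constant:

```python
# PetShop figures from the experiment above.
function_points = 118   # measured size
estimated_hours = 177   # estimated development effort

# Implied productivity rate, in hours per Function Point.
rate = estimated_hours / function_points
print(round(rate, 2))   # 1.5
```

Multiplying a measured size by such a rate is the basic shape of the effort prediction the CMP model in the next chapter builds on.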

There are quite a few major Cloud providers in the market, offering
different services including PaaS and IaaS. Once Cloud services are evaluated
and selected, training on these services is necessary. Some Cloud services
may not fully support features provided by similar on-premise
technologies; for example, SQL Azure is the Cloud database most similar
to SQL Server, yet SQL Azure does not support distributed transactions
as SQL Server does. In our experiment, effort was spent on training with
Windows Azure using the provided Microsoft Azure Training Kit.

There have been great contributions from the Cloud community to sup-

port Cloud services that integrate seamlessly with existing technologies and
applications. Many open-source third-party libraries and tools have been
developed. Training on these libraries and tools is also a one-time task,

although it is not easy to select the appropriate libraries and tools without

knowledge about them beforehand. These tools can be categorized as: additional
libraries (e.g. simpleJPA for SimpleDB, as discussed above), tools
for data migration (e.g. tools from Codeplex for converting and uploading databases
to SQL Azure), and other utilities (e.g. Windows Azure provides the cspack
utility to package a web site project for migration to Azure). In our
experience, before becoming aware of the cspack utility, much effort was spent
on transforming a Web Site into a Web Application, which differ in
structure, so that a Web Role could be formed for migration.

If migrating applications to a specific Cloud platform happens for the first
time, this learning curve is required; otherwise, this step can be skipped. Effort
spent on this learning task depends on the existing skills, knowledge and
experience of the developers, as well as the available documentation from Cloud
providers. Although these training activities are one-time tasks, the effort
required is not negligible.

• Installation and Configuration - Different effort is required for these
tasks, depending on the type of Cloud service selected, either PaaS
or IaaS.

Development tools and environment: The application’s development tools

need to be installed to examine the application’s components and to make

any necessary code modifications.

Environment in Cloud: If the target is an IaaS Cloud, effort is required
for setting up and configuring the application's environment in the Cloud
server to match its local requirements. If the target is a PaaS Cloud,
this step requires less effort, as it is largely handled by the Cloud
provider. This activity is specific to Cloud migration, as distinct from
migrating an application from one platform to another, where there is no
such requirement to replicate the environment.

Third-party tools: Effort is required for installing third-party tools for training
purposes and for the migration tasks mentioned above.

• Database Migration - This category depends on how different the two


databases in house and in the Cloud are.

Database connection and query: The database connection string needs to be
changed to connect to the new database server; in our experiment, the connection
was modified to use SQL Azure. However, more changes are required
if using SimpleDB, a non-relational database, as discussed in Section 4.2.
SimpleDB is a NoSQL database without full support for JOIN operations,
so additional coding is required to provide the same functionality
and operations for the application; this can also be categorized as Code
Modification. Even when the two databases are of the same type but different
versions, changes may be required to syntax or schema. For example,
PetShop .Net version 4 was developed on SQL Server 2005, while SQL Azure
is only compatible with SQL Server 2008. There is no direct way to convert
the PetShop database from SQL Server 2005 to SQL Azure without converting
to SQL Server 2008 first.
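As an illustration of a connection-string change of this kind, the sketch below swaps a local SQL Server target for a SQL Azure one. The server name, credentials, and config shape are hypothetical; the exact string format is dictated by the provider and driver:

```python
# Hypothetical before/after connection strings for the migration.
local_conn = "Server=localhost;Database=PetShop;Integrated Security=True;"
azure_conn = ("Server=tcp:myserver.database.windows.net,1433;"
              "Database=PetShop;User ID=admin@myserver;"
              "Password=<secret>;Encrypt=True;")

def use_azure(config):
    """Return a copy of an app config dict pointing at the Cloud database."""
    config = dict(config)
    config["connection_string"] = azure_conn
    return config

cfg = use_azure({"connection_string": local_conn})
print("database.windows.net" in cfg["connection_string"])  # True
```

The point of the sketch is that a relational-to-relational move can be as small as this one setting, whereas a move to a NoSQL store ripples into the data access code itself.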

Prepare database for migration: SQL scripts need to be transformed appropriately
to align with the third-party tools' requirements for database migration.

Migrate the database: If the previous tasks have been properly completed, the
effort required for this task is trivial, as it is handled by the third-party
tools; otherwise, plans and actions for the previous tasks must be revised.
Nevertheless, the size of the database also affects how fast this task can
be completed: the bigger the database, the longer it takes to
migrate. Although most of this time is waiting time and may not require
extra effort, some effort may be necessary for dividing large databases
into smaller chunks for data transfer purposes.
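The chunking mentioned above can be sketched as a simple batching generator; the chunk size and row stand-ins are arbitrary illustrative values:

```python
def chunk_rows(rows, chunk_size=1000):
    """Yield successive batches of rows for piecewise data transfer."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]

rows = list(range(2500))            # stand-in for database rows
batches = list(chunk_rows(rows))
print(len(batches))                 # 3 batches: 1000 + 1000 + 500
```

Transferring in bounded batches like this also makes a failed upload restartable from the last completed chunk rather than from the beginning.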

• Code Modification - This category depends on how different the two

environments in house and in the Cloud are.

Code changes: If the selected Cloud platform provides services and technologies
similar to the application's in-house environment, not much code modification
is required. This was the case for the combination of PetShop .Net
and Windows Azure in our experiment.

Configuration changes: This involves configuration changes in both the application
and the Cloud platform. Similar to code changes, configuration changes
in our experiment were minimal, although it was necessary to package our
application together with its configuration file for Azure. In the case of migrating
to an IaaS Cloud, additional configuration effort is required, including
installation activities to create a similar environment on the Cloud platform.

Compatibility issues also require major modification and reconfiguration
effort, depending on how compatible the two environments are. Cloud
technologies are generally the latest ones, while existing applications
may have been developed a few years previously. During that gap, technologies
may have gone through many changes and updates. There may
not be a direct method to update from the old technologies to the latest
ones, meaning that more intermediate steps will be necessary. Also, Cloud
technologies may not provide full support for services and features offered
by local servers. Although SQL Azure is similar to SQL Server 2008, it does
not support distributed transactions, while SQL Server 2005 does, and
PetShop .Net utilised this feature for its transactions. Code changes are required
to accommodate this compatibility issue.

• Network Connection - This category applies only to partial migration
projects, where only a part of the system is migrated to the Cloud, while

the rest is still hosted in house. Connections amongst system components


are certainly affected, which may lead to performance issues. Connection

security may also require extra attention. For full migration projects, where
the migrating system is ported as a whole, this category can be safely

skipped.

• Testing - This step is one of the most important and essential activities. It
happens during migration to ensure each of the previous steps is completed
correctly, and a full testing process needs to be carried out after migration.

If test cases have already been created for local servers, they can be reused
on Clouds to ensure the application works properly. More test cases specific

for Clouds may need to be considered. Testing needs to be done for each
of the actions taken; however, major milestones for testing can be grouped
as follows:

If using PaaS Clouds, migrating the database to Cloud database is required


first, then testing the application in local servers with the Cloud database.

The application can then be migrated to the selected Cloud platform, which

allows testing in the Cloud environment.

If using IaaS Clouds, developers can choose to skip testing the application
in the local server against the Cloud database, depending on how the environment
is set up and configured.

If migrating only some components of the system to Cloud platforms, either
PaaS or IaaS, intensive testing needs to be performed to ensure the entire
system is integrated seamlessly and meets important requirements, such
as security levels and performance quality. The effort required for this task is
relatively large.

These categories are mutually exclusive, since they cover different aspects of
a Cloud migration project; on the other hand, they complement each other
and together provide a complete picture of migration to the Cloud. These
categorized migration tasks need to be carefully planned at the early stage of any
migration project. Some tasks may be broken down into more detailed levels,
whereas some tasks may be skipped, depending on the specific characteristics
of each project.

4.5 Validation

As discussed in Chapter 3, the discussion with Cloud engineers in our group,

and the input from the literature and practitioners’ blogs have confirmed the
validity of the taxonomy to some extent. This section attempts to validate our

proposed taxonomy using one industrial migration project to the Cloud that was
conducted by two researchers in our group. This is a consulting project with a
large Australian Financial Service Organisation (FSO) that wished to migrate a part
of its system into the Cloud without any changes to its existing application
code. Although the FSO has, for the time being, no plan to migrate its production
system to a Cloud computing platform, the main purpose of this migration
exercise is to reduce the operational cost of the development environment. The
development environment is used for only about two months annually.
Therefore, the cost of owning and maintaining the development environment is
expected to be reduced by migrating to a pay-per-use payment model. However,
since the environment is re-activated often, but only for short periods of time such as
a week, the cost and time of re-activating the development environment must be
small. Moreover, the licensing fee that the FSO currently pays for its software
is expected to be reduced by migrating to a pay-per-use payment model as well.
The steps taken in this FSO project are summarised below:

• Step 1 - Analyse the in-house system to understand its components, op-


erations, and functionalities: The system consists of four main components,
one of which is to be migrated to the Cloud, whereas the other three com-

ponents are kept in house because of security concerns.

• Step 2 - Understand the migration requirements of the FSO in order to
define the best strategies for migration: This migration requires seamless
integration between the migrated component in the Cloud and the existing
environment in house, and no changes to the application code. Therefore, it is
best to migrate the system to an IaaS Cloud. After careful consideration of
all possible alternatives, based on the system specification and the migration
requirements, it was decided that EC2 was the most suitable Cloud platform
for this migration.

• Step 3 - Understand EC2 and its offerings in order to identify any
compatibility issues: The tasks involved mirroring the system environment
in the EC2 environment. This may seem straightforward at first,
since EC2 provides infrastructure services and all installation and configuration
should be possible. However, the existing FSO system is currently
operated on Windows Server 2003 x64 Enterprise Edition, whereas Amazon
Web Services (AWS) at the time of this project only supported the Datacenter
Edition of Windows. Their differences are subtle, and the Datacenter Edition
is considered to be a superset of the Enterprise Edition; therefore, the difference
in editions does not affect the operation of the system. Similarly,
the current system works with SQL Server 2005 x64 Enterprise Edition,
while AWS at the time of this project supported only the Standard Edition. The
main difference between the two editions is the support for clustering. The
development environment of the FSO system does not require database
clustering; therefore, the difference between editions is not a factor.

• Step 4 - Design strategies: These include network design, system design,
security design, and monitoring and management controls for migration.

• Step 5 - Setting up the Amazon Cloud: This includes signing up for an AWS
account, signing up for Amazon EC2, setting up the Amazon EC2 command line
tools, setting up an Amazon Virtual Private Cloud (VPC) for security purposes,
obtaining EC2 instances, and finally adding disks to Windows instances
with pre-installed operating systems and required middleware.

• Step 6 - Setting up the migrated system: Since all required operating
systems and middleware are pre-installed on the machine images, only some
additional components, such as IIS Server and SQL Server, are installed at
this step for the migrated system to function properly.


• Step 7 - Functional test: A series of functional tests provided by the FSO
was performed to ensure that the various components of the system were
functioning properly and to discover potential problems that might be due to
the migration to AWS. Performance issues were discovered; the network
connection between the migrated components and the others was the bottleneck.
Extra effort was spent on tuning performance parameters and securing
the connections.

These steps can be mapped to the proposed taxonomy as in Table 4.4.

Taxonomy Categories              FSO Migration Tasks
Training or Learning Curve       Steps 1, 2, 3, 4
Installation and Configuration   Steps 5, 6
Database Migration               None
Code Modification                None
Network Connection               Step 7
Testing                          Step 7

Table 4.4: Mapping of the FSO migration tasks and the taxonomy
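The mapping in Table 4.4 can be restated as a simple lookup; the sketch below encodes the table (category labels follow the taxonomy) and derives which taxonomy categories this particular project skipped:

```python
# Table 4.4 restated as a dictionary from taxonomy category to FSO steps.
fso_mapping = {
    "Training or Learning Curve":     ["Step 1", "Step 2", "Step 3", "Step 4"],
    "Installation and Configuration": ["Step 5", "Step 6"],
    "Database Migration":             [],
    "Code Modification":              [],
    "Network Connection":             ["Step 7"],
    "Testing":                        ["Step 7"],
}

# Categories with no mapped steps were not needed in this project.
skipped = [cat for cat, steps in fso_mapping.items() if not steps]
print(skipped)  # ['Database Migration', 'Code Modification']
```

The two skipped categories reflect the FSO requirements: no application code changes, and no database move.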

The mapping described in Table 4.4 shows that the proposed taxonomy is
general enough to cover different types of migration tasks to the Cloud. However,
it can also be further broken down to better fit specific migration tasks in more
detail; for example, the handling of network connection and security in Step 5,
where the Amazon VPC is set up, could be separated into a more detailed category
rather than falling under the general installation and configuration category.


4.6 Reflection and Discussion

In this section, we reflect on our methodology for building the taxonomy of Cloud
migration tasks, and discuss its threats to validity.

A taxonomy is normally obtained by identifying a list of criteria for a topic of

interest, and then classifying existing elements according to those criteria. In our
context of Cloud migration, the fundamental elements are migration tasks and

they have never been officially identified or organised into a collection. Therefore,

the taxonomy of Cloud migration tasks was derived mainly from our experience
of migrating PetShop .Net to Windows Azure, a PaaS type of Cloud. We also

considered the case of migrating Java PetStore to Amazon EC2, an IaaS type of
Cloud, in an attempt to add more richness to the taxonomy.

In addition, it would be ideal to have external participants for the validation
process, rather than just NICTA projects and participants. However, it was not
easy to locate an external migration project that covers all aspects to be validated
in the taxonomy. It was also not feasible to locate multiple external projects for
this stage's validation, given that we also had to find data points for the next phase.
As a result, the taxonomy proposed in this chapter is exposed to a threat to
external validity. Although the validation in Section 4.5 demonstrates that the
taxonomy fits well with a common case of Cloud migration, there is no
guarantee that the taxonomy can be sufficiently applied to every other migration
project. This is because of the wide variety of Cloud migration project types;
it is not possible to anticipate all migration tasks that could occur in
reality. The taxonomy can only cover general migration tasks that are likely to
occur in a common migration case.


However, the structure of the taxonomy is general and flexible enough that
categories can be broken down further or extended to include new migration
tasks. This characteristic enables the taxonomy to be applicable and adaptable
to any new type of Cloud migration.

In this study, the assumption was that the Cloud target had already been
selected and that its selection is outside the scope of a migration project. However, our
experience shows that major effort is required for selecting Cloud providers and
services. Also, for security reasons, large enterprises tend to keep sensitive data
and applications in their local data centers, and migrate only some components
to Cloud platforms. Therefore, enterprises may also encounter challenging
post-migration tasks to ensure the entire system functions seamlessly.

The taxonomy is applicable to both PaaS and IaaS Clouds. Due to the
differences between the PaaS and IaaS types of Cloud, the effort required for each
migration task also differs. Table 4.5 below shows a side-by-side comparison of
the effort required for migrating to PaaS and IaaS Clouds.

Tasks                            PaaS     IaaS
Training or Learning Curve       major    major
Installation and Configuration   minor    major
Database Migration               major    major
Code Modification                major    none
Network Connection               none     minor
Testing                          major    major

Table 4.5: Efforts comparison for migrating to PaaS and IaaS Clouds


• Training or Learning Curve - Both PaaS and IaaS Clouds require
significant learning effort for several reasons: the Cloud offers the latest
technologies, with which one may not be familiar; new offerings and services are
rapidly created; and the Cloud has a broad community that contributes
numerous third-party tools. The tasks in this category can take up a
huge amount of time for both IaaS and PaaS Clouds at the beginning.

• Installation and Configuration - Creating a similar environment to

the local server in IaaS Clouds requires significant effort compared to PaaS
Clouds. In PaaS Clouds, the environment is handled by Cloud providers.

• Database Migration - Changing and migrating a database to either an IaaS
or a PaaS Cloud can be very hard if the local database and the Cloud
database are different. This can require major effort.

• Code Modification - IaaS Clouds provide a more flexible environment to


deploy and manage applications; therefore, no major code modification is
required if the environment in IaaS Clouds has been installed and configured

similarly to that in the local servers. There is no such flexibility in PaaS

Clouds; hence, code modification is needed for the application to run in

PaaS Clouds.

• Network Connection - PaaS Clouds free their users from the burden of
infrastructure management tasks; hence, their flexibility is lower than that of
IaaS Clouds. Therefore, PaaS Cloud users need not be concerned about
application network connections, whereas IaaS Cloud users are responsible
for those of their systems, although only minor effort is anticipated.

91
4. TAXONOMY OF MIGRATION TASKS TO THE CLOUD

• Testing - Testing is unavoidable when changes are made to the application,
whether code changes or configuration changes. The migration team
may need to undertake full testing of the application in the Cloud, both
IaaS and PaaS, to make sure the application functions properly. This effort
depends on the complexity of the application.

Table 4.5 shows a side-by-side comparison of whether no, minor, or major
effort is required for each migration category in PaaS versus IaaS Clouds. IaaS
Clouds provide a more flexible environment to deploy and manage applications;
therefore, no code modification is required. However, installation and configuration
tasks in IaaS Clouds, to create settings similar to the applications' local
environment, require significant effort compared to PaaS Clouds. Also, both PaaS
and IaaS Clouds require significant learning effort and testing.
The effort required for a migration project to a Cloud platform, whether a PaaS
or an IaaS type of Cloud, depends on various factors as illustrated above. The study
in this chapter enables us to understand these influential aspects and forms the
background for quantifying Cloud migration tasks in the next chapter.

4.7 Summary

Migrating applications to Cloud platforms requires extra effort to perform migration
tasks, as demonstrated in the previous sections. Application migration from
local servers to Cloud platforms is a one-time task and may seem straightforward
at first. However, our experience showed that this process is not automatic and
the effort spent on migration may not be trivial.


In this chapter, we experimented with the re-engineering and migration of a

software application, and successfully deployed it into a Cloud platform. The


experience allowed us to identify important influential cost factors for migrating

a system to the Cloud, which provides the basis for understanding the cost im-
plications of a Cloud migration project. A taxonomy of migration tasks has been

developed and tailored specifically for our Cloud migration context, and applied

to one validation project using different strategies. It will be used as input into
our size measurement model for migration projects to the Cloud in the following
chapter.
The taxonomy consists of six main categories, namely: Training or Learning
Curve, Installation and Configuration, Database Migration, Code Modification,
Network Connection, and Testing. These categories resulted from the internal
cost factors identified in our experiment. We have also identified
external cost factors, which are environmental aspects of organizations intending
to conduct migration projects. While the taxonomy and the internal cost factors
indicate what migration tasks are required, and how those tasks are completed,
the external cost factors determine how fast those tasks can be achieved.

Chapter 5

Cloud Migration Point

“Measuring programming progress by lines of code is like measuring


aircraft building progress by weight.”

∼ Bill Gates.

The taxonomy of Cloud migration tasks outlined in the previous chapter helps
Cloud consumers to form their migration plans. A Cloud migration project con-
sists of a list of migration tasks from the taxonomy. As a result, the amount of

effort required for a migration project to Cloud is accumulated from the effort
spent on each migration activity or migration task.

In this chapter, we introduce our Cloud Migration Point (CMP) model and

how it can further assist the Cloud consumers in estimating the size of those

migration tasks in their plans, which will facilitate the prediction of the amount

of effort required. We also describe the counting method of the CMP model,
illustrated with examples to help practitioners apply it easily.

CMP is a size metric for Cloud migration projects, which is expected to be


applicable early in the migration process. Additionally, Cloud migration projects
consider not only the system to be migrated, but also the migration process,
where various aspects of the system, besides lines of code, are involved. For
these reasons, the Function Point (FP) approach, which has proven a successful
foundation for many extensions, is more suitable as a basis for the CMP model
than SLOC. We decided to develop the CMP model by taking the well-known
FP approach and applying it in our Cloud migration context. It is worth
noting that CMP extends FP not by adding more elements to the existing FP
method, but by adopting the three-step approach of FP:

1. Classify the basic estimating units (a function in the FP context, a class in
the Class Point (Costagliola et al., 2005) context, and a migration task in
the CMP context) into different pre-defined categories

2. Then for each unit, evaluate its complexity level (Low, Average, or High)

3. Finally, compute the final sizing value

Apart from the FP methodology, the CMP model is also developed on the
basis of the taxonomy presented in Chapter 4. Each category from the taxonomy
is carefully analyzed before being selected as a CMP component, as discussed in
further detail later in this chapter.

The sub-sections of this chapter explain and cover different aspects of the CMP
model, and are arranged as follows: Section 5.1 states the underlying assumptions
of the CMP model. Section 5.2 analyzes the cost factors from the taxonomy
in Chapter 4 to consolidate the fundamental components of the CMP model.
Section 5.3 classifies Cloud migration projects into different types based on their


characteristics. The purpose of this classification is to show, later on, that the
CMP model can be applied to different migration project types. Section 5.4 describes
our CMP metric and its counting process. Section 5.5 demonstrates how CMP

can be applied to size an example Cloud migration project. A reflection on


our process of building the CMP model is presented in Section 5.6. Section 5.7

summarizes and concludes the chapter.

5.1 CMP Assumptions

This section explains the CMP model's alignment with the broader scope of our
work (presented in Section 1.4). Some specific assumptions for the CMP model
itself are also stated.

• We consider migration cases between two data centers only (typically, one

in-house and one in-Cloud). In the case where two or more data centers are
involved, CMP can be applied repeatedly for each pair of data centers.

• Our work only focuses on PaaS and IaaS Clouds. Hence, CMP considers
only IaaS and PaaS, although some parts of our cost model might still be

applicable to other Cloud offerings.

• If it is required to modify the application code for the Cloud environment,

CMP is only applicable for object-oriented applications. It assesses appli-


cation code changes at “class” level.

• We assume that the decision on the Cloud target is not a part of the migration
process. CMP estimates the complexity of migrating to a specific
Cloud platform, excluding the process of determining the most suitable
Cloud technologies/providers, and the need to get familiar with the specific
Cloud technology and offering.

• We assume that the design decisions for the migration have been made, such as
which components of the system are to be migrated to the Cloud, which components
stay in the local data centre, which pieces of code require modification
for the Cloud environment, which network connections are to be modified, and
what requirements must be satisfied. CMP requires inputs from the design
phase and is most appropriately applied before the implementation phase of
a migration.

• CMP takes it for granted that all migration tasks have already been outlined.
Since CMP measures the size and complexity of migration tasks, these tasks
must be outlined in advance (i.e., the migration plan is sufficiently complete).

The above presented items form the scope and assumptions of the CMP model
in this chapter.

5.2 Cloud Migration Cost Factors

In Chapter 4, we defined two types of cost factors of a Cloud migration
project, namely internal and external cost factors. Internal cost factors refer
to what migration tasks are required and how they can be achieved, and they
determine the tasks' complexity, regardless of who is carrying out those tasks
and under which conditions those tasks are done. External cost factors are
concerned with environmental factors that are specific to each organization,
such as the development team's skills and expertise, or its knowledge of Cloud
platforms and offerings. External cost factors determine how fast a migration
task can be completed.

The CMP model aims at sizing Cloud migration projects. In other words, the
CMP model measures the size of all migration tasks involved in a migration
project. As a result, our CMP model focuses only on the internal cost factors and
identifies them as the sole indicators of the migration tasks' complexity,
regardless of who conducts those tasks and under what conditions the tasks are
carried out. The internal cost factors essentially equate to the taxonomy of
Cloud migration tasks.

The CMP model measures the accumulated size of all migration tasks making
up the migration project. Therefore, the taxonomy can comfortably be fed as
the input into the CMP model. This section repeats all categories of the
taxonomy for convenience of reading, analyzes each category, and determines
which categories are suitable for the CMP model, taking into consideration the
assumptions stated in Section 5.1.

• Training or Learning Curve - The tasks in this category rely heavily on

the Cloud experience of developers and their learning abilities, which are
external cost factors. Although this category contributes significantly to the

total effort required, we exclude this category from the scope of the CMP

model. This category itself should be treated in a separate study since it is

also concerned with the learning ability of different individuals.

• Installation and Configuration - When migrating to an IaaS Cloud such as

Amazon EC2, effort is required to install the necessary system software,


database servers, or middleware; environment variables and settings also

need to be configured. When migrating to a PaaS Cloud such as Microsoft


Azure, installation and configuration effort lies in the application layer,

such as libraries or plugins. If the application before migration relies on


some third-party libraries, similar libraries are required in the Cloud as

well. Effort is required to integrate the new libraries with the application

after migration. Hence, the tasks in this category should be included in the
CMP model.

• Database Migration - Migrating a database to the Cloud can result in


database schema changes and query changes because of differences in ver-
sions, variants (MySQL vs. MSSQL), or database types (Relational vs.

NoSQL). Effort is needed to change schemas, modify queries, transfer and


populate databases. The tasks in this group should be covered by the CMP
model.

• Code Modification - In some migration cases, code modification is required
to adapt to the new programming model in the Cloud, or the database access
layer needs to be changed to work seamlessly with a different database in the
Cloud. If a relational database is migrated to a NoSQL Cloud database, JOIN
operations may need to be added to the application's code to preserve the
system's functionality. If required libraries are not available in the Cloud,
a rewrite of the libraries is necessary; or, if similar libraries exist, code
needs to be changed so that the application-library integration does not
interfere with the system's functionality. These tasks reflect the changes in
the migrating system and hence should be assessed by the CMP model.


• Network Connection Changes - Within a system S before migration, the
connection between two components A and B is a LAN connection. If only
B is migrated to the Cloud and A is kept in the local data center, the LAN
connection between A and B becomes a WAN connection. If both A and B
are migrated to the Cloud, the LAN connection between A and B becomes
a LAN connection in the Cloud, whose network conditions may differ from
those of the original environment. In all cases, the connection is changed, and
effort is required to ensure security and performance are optimal. The
CMP model will also take these tasks into account.

• Testing - Many different testing activities may be required. Testing to
make sure the system functions properly with no performance issues can
be incorporated into the other categories. For example, testing tasks that
ensure network connection security and performance are optimal are included
in the network connection category. Other formal tests, which have their own
requirements, methodology, and test cases, are no different in the Cloud
migration context from traditional software development. Other size metrics
for traditional software development do not include these testing tasks in
their measurement; similarly, this category is excluded from the CMP model.

From the above analysis, the CMP model is determined to include four main
components: Installation and Configuration, Database Migration, Code
Modification, and Network Connection. These components capture distinct aspects of


a migration project to the Cloud; therefore, the CMP model is intended to cover


all these aspects separately.

5.3 Cloud Migration Project Classification

The cost factors identified in Section 5.2 do not apply to all components of the
system, but only to those components that are affected by the migration.
We classify the components involved in a migration into four different categories:
Migrated, Removed, Unchanged and Added. These categories help us
better understand the dynamics of the migration process, as well as its impact
on the effort as captured in our CMP model. There are two options for an
existing component: it is either migrated to the Cloud or kept in-house. For the
former option, if the component is migrated to the Cloud without any changes, it
belongs to the Migrated category. If it is migrated to the Cloud and then modified,
it can be considered as a Removed component plus a newly Added component.
For the latter option of the component being kept in-house, if nothing changes,

the component belongs to the Unchanged category. If it is changed, it is again


considered as a Removed component and then a newly Added component. If

a component is removed from the system, it belongs to the Removed category.

Similarly, if a new component is added to the system, it belongs to the Added

category. Therefore, these four categories are sufficient to cover all components
related to the migration.

It is important to distinguish between a migrating system and a migration
project. These two concepts were defined in Chapter 1, Section 1.2, and are
repeated here for convenience: a migrating system is the system to be migrated
to the Cloud, defined as a set of components required for the system to function
properly, such as third-party libraries or middleware, system software, databases,
application code, and the network connections amongst its modules. A migration
project is defined as a set of migration tasks to move a migrating system from a
local data center to the Cloud.

We classify a migration project by first denoting its migrating system's states
in the local data center and in the Cloud, before and after the migration, as
summarized in Table 5.1.

                   Local   Remote
Before Migration   L       R
After Migration    L′      R′

Table 5.1: System’s states before and after migration

Table 5.1 depicts the components present at each of the states, with the rows
dividing the components temporally and the columns dividing them spatially.
The sets of components at these states are denoted by L, R, L′ and R′. Note
that the same component may appear in different rows, but it cannot appear
twice in the same row (i.e., a component cannot be both in-house and in-Cloud
at the same time). Hence, L and R are disjoint sets, and similarly, L′ and R′
are also disjoint. The allocation of components to each state can be determined
using the design documents.

Definition 1 A migration project is defined as a full migration if L ⊆ R′;
otherwise it is a partial migration.

The set of components involved in a migration project can be partitioned into

three categories (or disjoint subsets):


• Migrated components (M = L ∩ R′) - Components moved from in-house to
the Cloud. These components are reused with or without modifications. For
example, third-party libraries, database servers, or system software that are
moved to the Cloud (i.e., effort is involved in installation, configuration, and
integration with the rest of the system); application code (i.e., effort is needed
for moving and changing code); and databases (i.e., effort is required for data
transfer and any required modifications to schemas and queries).

• Removed components (R = L \ (L′ ∪ R′)) - Components removed from in-house
as a result of the migration. Removal is not always necessary, because some
components can remain without interfering with or disrupting the functionality
of the system, in which case no effort is required. However, sometimes removal
is necessary to ensure normal operation of the system, in which case effort
will be required.

• Unchanged components (U = L ∩ L′) - Components that remain unchanged
in-house. These components do not participate in the migration process; they
simply continue to operate in-house as usual, hence no effort is required.

In addition to the above, there is also the category of Added components
((L′ ∪ R′) \ (L ∪ R)), which are components added to the system as part of the
migration, such as new libraries in the Cloud, newly added code for extra
functionality, or new middleware to be integrated. For example, when a library
is not suited to the Cloud environment, a similar library is used if one exists
in the Cloud, or the library is rewritten if it is not available. In a partial
migration, if a component remains in-house and is modified to interact with a
component that has been migrated to the Cloud, it can be categorized as removing
the old component and adding a new component.
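The four categories above, and Definition 1, translate directly into set operations. The following sketch (Python, with hypothetical component names) is illustrative only; the expressions mirror the definitions of M, R, U and Added given in the text.

```python
# Sketch: partitioning the components of a migration project into the four
# categories defined above, using Python sets. Component names are
# hypothetical; L, R hold components before migration, L2, R2 stand in for
# L' and R' (the sets after migration).

def partition(L, R, L2, R2):
    """Return (Migrated, Removed, Unchanged, Added) per the set definitions."""
    migrated = L & R2                 # M = L ∩ R'
    removed = L - (L2 | R2)           # R = L \ (L' ∪ R')
    unchanged = L & L2                # U = L ∩ L'
    added = (L2 | R2) - (L | R)       # Added = (L' ∪ R') \ (L ∪ R)
    return migrated, removed, unchanged, added

def is_full_migration(L, R2):
    """Definition 1: a full migration iff L ⊆ R'."""
    return L <= R2

# Hypothetical example: app and db move to the Cloud, a legacy cache is
# dropped, the web front-end stays in-house, and a new Cloud library appears.
L = {"app", "db", "cache", "web"}
R = set()                             # nothing in the Cloud before migration
L2 = {"web"}
R2 = {"app", "db", "cloud-lib"}

M, Rm, U, A = partition(L, R, L2, R2)
print(M, Rm, U, A)
print(is_full_migration(L, R2))       # False: "web" stays in-house (partial)
```

Note that M, Rm and U together recover L, matching Proposition 2 below.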

Proposition 2 If x is a component in the local data center before migration
(i.e., x ∈ L), then after the migration x is one of a migrated component (i.e.,
x ∈ M), a removed component (i.e., x ∈ R), or an unchanged component (i.e.,
x ∈ U).

Proof 3 It suffices to show that (1) M ∪ R ∪ U = L, and that (2) the collection
{M, R, U} is pairwise disjoint.

For (1),

M ∪ R ∪ U ≡ (L ∩ R′) ∪ (L \ (L′ ∪ R′)) ∪ (L ∩ L′)
≡ (L ∩ (L′ ∪ R′)) ∪ (L \ (L′ ∪ R′)) ≡ L.

For (2), there are three cases:

(i)

M ∩ R ≡ (L ∩ R′) ∩ (L \ (L′ ∪ R′))
≡ (L ∩ R′) ∩ (L ∩ (¬L′ ∩ ¬R′)) ≡ ∅;

(ii)

M ∩ U ≡ (L ∩ R′) ∩ (L ∩ L′) ≡ L ∩ (L′ ∩ R′)
≡ L ∩ ∅ ≡ ∅.

Note that (L′ ∩ R′) ≡ ∅ as defined above;

(iii)

R ∩ U ≡ (L \ (L′ ∪ R′)) ∩ (L ∩ L′)
≡ (L ∩ (¬L′ ∩ ¬R′)) ∩ (L ∩ L′) ≡ ∅.


The effort associated with each of the categories defined above is carefully
captured in our CMP model. Roughly speaking, migrating components requires
the most effort, followed by adding and removing components, and then components
with no changes. Each component here can be a piece of code in the application,
a database, or a piece of third-party software that enables the whole system to
function properly. Extra effort may also be required to ensure these components
work together seamlessly.

5.4 Cloud Migration Point

The classification of Cloud migration projects discussed in Section 5.3 can be
seen as a way to allocate the components of a migration project to different types.
Regardless of what type a migration project is, the effort required for the whole
project still aligns with the CMP components defined in Section 5.2.

The CMP metric consists of four main components (each component is a set
of related migration tasks):

• Network Connection Component: CMPconn - covers all migration tasks re-


lated to network connection changes.

• Code Modification Component: CMPcode - is concerned with all application
code changes.

• Installation and Configuration Component: CMPic - includes all tasks to


install and configure the Cloud environment to be suitable for the migrating

system.


• Database Migration Component: CMPdb - considers all database-related

migration tasks.

Each of these CMP components is developed in light of the FP three-step

approach. Particularly:

• Firstly, each migration task in each CMP component is identified and classi-
fied into a pre-defined sub-category. These sub-categories will be discussed

further for each component later in the chapter.

• Secondly, each migration task is evaluated on its complexity level (Low,


Average, or High) based on some pre-defined criteria.

• Thirdly, since each migration task has been classified into a specific type
and evaluated with a complexity level in the first two steps, a weighted value
is assigned to each task accordingly. Finally, the total value of the CMP
component is the sum of the weighted values of all migration tasks in that
component.

Then, the final CMP value is calculated as a weighted sum of its four components
CMPconn, CMPcode, CMPic, and CMPdb, which measure the size of migration
tasks related to connection changes, code changes, installation and configuration,
and database changes, respectively. In the following sub-sections, we delve
further into each component to assess the complexity of each migration task.

The weighted values assigned to each migration task in the third step are
initially derived from our discussions with a group of Cloud engineers who have
carried out different types of Cloud migration projects themselves. These values


will be calibrated further in Chapter 6 with more empirical data. In this chapter

we will present the model with these initial values.
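To make the overall structure concrete before the detailed counting rules, the following sketch shows how the four component values would be combined into a final CMP value. The weights here are illustrative placeholders only, not the calibrated values (which are an empirical matter), with an unweighted sum as the simplest special case.

```python
# Sketch of the overall CMP computation. The four component values would come
# from the counting procedures of Sections 5.4.1-5.4.4; the component weights
# are assumed placeholders for illustration only.

def cmp_total(cmp_conn, cmp_code, cmp_ic, cmp_db,
              weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four CMP components."""
    components = (cmp_conn, cmp_code, cmp_ic, cmp_db)
    return sum(w * c for w, c in zip(weights, components))

print(cmp_total(18, 42, 12, 9))  # unweighted sum: 81
```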

5.4.1 Network Connection Component: CMPconn

CMPconn assesses all migration tasks related to network connections and evaluates

their complexity. It adopts the three-step approach from FP as discussed above.


First, all network connections that will be affected by the migration process

and require effort to optimize performance are identified and classified into three
types:

• LAN-to-LAN: A connection belongs to this type if both ends A and B
of the connection are migrated from the local data center to the Cloud,
i.e., {A, B} ⊆ L ∩ R′. The LAN connection at the local site becomes a
LAN connection in the Cloud. Its performance may be affected by possible
changes in the network environment. Some migration tasks and minor effort
are expected to ensure that security and performance are preserved.

• LAN-to-WAN: A connection is classified into this type if only one end A
of the connection is migrated to the Cloud while the other end B stays in-house
(i.e., A ∈ L ∩ R′ and B ∈ L ∩ L′). The LAN connection at the local
site becomes a WAN connection spanning from in-house to the Cloud over
an Internet connection. Major effort is anticipated for securing the WAN
connection and optimizing its performance.

• WAN-to-LAN: This type of connection occurs if, before migration, a part
of the system is already in the Cloud, i.e., R ≠ ∅. Before the migration, this
is a WAN connection with one end A in the local data center (i.e., A ∈ L) and
the other end B in the Cloud (i.e., B ∈ R). After the migration, both ends A
and B are in the Cloud (i.e., A ∈ L ∩ R′ and B ∈ R ∩ R′). The connection
becomes a LAN connection in the Cloud environment. The migration tasks
for this type undo the security and performance measures applied earlier for
a LAN-to-WAN change. This is necessary because a WAN optimization is
unlikely to be the best option for LAN performance.
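The endpoint-based classification above can be sketched as follows (Python, with hypothetical component names; L2 and R2 stand in for the after-migration sets L′ and R′ of Table 5.1):

```python
# Sketch: classifying one connection by where its two endpoints sit before
# and after the migration, following the three connection types above.

def connection_type(a, b, L, R, L2, R2):
    migrated = L & R2                      # ends moved in-house -> Cloud
    stays = L & L2                         # ends kept in-house
    if a in migrated and b in migrated:
        return "LAN-to-LAN"
    if (a in migrated and b in stays) or (b in migrated and a in stays):
        return "LAN-to-WAN"
    was_wan = (a in L and b in R) or (b in L and a in R)
    if was_wan and a in R2 and b in R2:    # both ends in the Cloud afterwards
        return "WAN-to-LAN"
    return "unaffected"

# Hypothetical system: A and B migrate, C stays in-house, and D was already
# in the Cloud before the migration (so R is non-empty).
L, R, L2, R2 = {"A", "B", "C"}, {"D"}, {"C"}, {"A", "B", "D"}
print(connection_type("A", "B", L, R, L2, R2))  # LAN-to-LAN
print(connection_type("B", "C", L, R, L2, R2))  # LAN-to-WAN
print(connection_type("A", "D", L, R, L2, R2))  # WAN-to-LAN
```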

Second, the complexity level (Low, Average, or High) of the migration tasks
involved in each connection is evaluated based on the connection's requirements
for security and protocol optimization, using Table 5.2. We identified these two
dimensions, Security and Protocol Optimization, as the main cost factors for
connection-related tasks in the Cloud context, based on our Cloud migration
experience with cost breakdown analysis, discussions with Cloud engineers,
analysis of the taxonomy from the previous chapter, and close study of many
Cloud practitioners' blogs and discussions.

Protocol            Security
Optimization        Required    Not Required
Required            High        Average
Not Required        Average     Low

Table 5.2: Complexity evaluation for each connection

Lastly, a weighted value is assigned to each connection, based on its type
identified in the first step and its complexity level evaluated in the second
step, using Table 5.3. For example, if a connection is of LAN-to-WAN type and
of High complexity level (i.e., it requires effort for both security and protocol
optimization), its associated weight value would be 9. The values in Tables 5.2
and 5.3 were defined from our discussions with a group of Cloud engineers
involved in Cloud migration projects.

Connection    Connection's Complexity Level                         Total
Type          Low             Average         High
LAN-to-LAN    ... × 1 = ...   ... × 3 = ...   ... × 4 = ...         ...
LAN-to-WAN    ... × 1 = ...   ... × 6 = ...   ... × 9 = ...         ...
WAN-to-LAN    ... × 1 = ...   ... × 6 = ...   ... × 9 = ...         ...
CMPconn                                                             ...

Table 5.3: Evaluating CMPconn

The value of CMPconn is defined as the weighted sum of all identified connec-
tions:


CMPconn = Σ_{i=0}^{2} Σ_{j=0}^{2} x_ij × w_ij

where x_ij is the number of connections of type i with complexity level j, and
w_ij is the weighted value for connection type i and complexity level j.
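As a worked example of this weighted sum, the following sketch applies the weights of Table 5.3 to a hypothetical set of connection counts:

```python
# Worked example of the CMPconn weighted sum, using the weights of Table 5.3.
# The connection counts are hypothetical.

CONN_WEIGHTS = {
    "LAN-to-LAN": {"Low": 1, "Average": 3, "High": 4},
    "LAN-to-WAN": {"Low": 1, "Average": 6, "High": 9},
    "WAN-to-LAN": {"Low": 1, "Average": 6, "High": 9},
}

def cmp_conn(counts):
    """counts[type][level] = number of connections of that type and level."""
    return sum(CONN_WEIGHTS[t][lvl] * n
               for t, levels in counts.items()
               for lvl, n in levels.items())

# Hypothetical project: two Low-complexity LAN-to-LAN connections and one
# High-complexity LAN-to-WAN connection (security + protocol optimization).
print(cmp_conn({
    "LAN-to-LAN": {"Low": 2},
    "LAN-to-WAN": {"High": 1},
}))  # 2*1 + 1*9 = 11
```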

5.4.2 Code Modification Component: CMPcode

CMPcode assesses all migration tasks relating to code changes. These tasks can
vary from adding new functionality and removing unnecessary code to modifying
code to use new databases or to integrate with new libraries. CMPcode is inherited
from Class Point (Costagliola et al., 2005), but with modifications to address code
changes rather than only newly added functionality. Similar to CMPconn, CMPcode
also follows FP's three-step approach.

First, all classes in application code that require modification efforts are iden-

tified and classified into four types as defined in Class Point (Costagliola et al.,


2005):

• Problem Domain Type (PDT): classes that represent real-world entities in

the application domain of the system.

• Human Interaction Type (HIT): classes designed for information visualiza-

tion and human-computer interaction.

• Data Management Type (DMT): classes that accommodate data storage

and retrieval.

• Task Management Type (TMT): classes that are responsible for the definition
and control of tasks, and for communication between subsystems and with
external systems.

Identify:
  Before changing the code:                After changing the code:
  A - the set of attributes                A′ - the set of attributes
  M - the set of public methods            M′ - the set of public methods
  S - the set of services requested        S′ - the set of services requested
      from other classes                        from other classes
Derive:
  |A \ A′| : number of attributes removed
  |A′ \ A| : number of attributes added
  |M \ M′| : number of methods removed
  |M′ \ M| : number of methods added
  |S \ S′| : number of requested services removed
  |S′ \ S| : number of requested services added
Define the changes:
  CA = |A \ A′| × 0.2 + |A′ \ A| : changes in attributes
  CM = |M \ M′| × 0.2 + |M′ \ M| : changes in methods
  CS = |S \ S′| × 0.2 + |S′ \ S| : changes in services requested

Table 5.4: Elements of each changed class


Second, each class's changes in three dimensions, namely attributes (CA), public
methods (CM), and services requested from other classes (CS), are evaluated.
These changes are computed from the numbers of elements to be removed and
added, following the three steps in Table 5.4.

The sets of the three element kinds (attributes, methods, services requested)
are identified both before and after the code change (e.g., A and A′ are the sets
of attributes before and after the migration, respectively). This information is
already available after the design phase of the development cycle, where all design
decisions have been made.

The number of elements to be removed and added is calculated by taking
the differences between the sets before and after the migration (e.g., |A \ A′| and
|A′ \ A| are the numbers of attributes to be removed and added, respectively). The
final values CA, CM, and CS are determined by applying a factor of 0.2 to removal
tasks and 1 to addition tasks (e.g., CA = |A \ A′| × 0.2 + |A′ \ A|). These factors
were suggested by Niessink and Vliet (Niessink & Vliet, 1997), since a removal
task also requires effort, although not as much as an addition task. An element
that no longer contributes to a system's functionality is better removed, because
its presence may cause unexpected system behaviour.

CA, CM, and CS are defined to capture the aspects of changed classes. A special
circumstance occurs when a class is newly added, i.e., there are no existing sets
of elements before the migration, or A = M = S = ∅. In this case,

CA = |A \ A′| × 0.2 + |A′ \ A| = 0 × 0.2 + |A′| = |A′|,

which is the number of attributes in the new class. Similarly, CM = |M′| and
CS = |S′|, which are the numbers of methods and services requested in the new


class. These three values are the same as Class Point's counts for sizing a new
class for development effort. In other words, CA, CM and CS are also valid for
capturing newly added code.
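The derivation of CA, CM and CS can be sketched as a single set-difference computation (Python; the class members below are hypothetical):

```python
# Sketch: deriving CA, CM and CS for one class from its before/after element
# sets, per Table 5.4 (removals weighted 0.2, additions weighted 1).

def change_size(before, after, removal_factor=0.2):
    removed = len(before - after)
    added = len(after - before)
    return removed * removal_factor + added

# Hypothetical changed class: one attribute renamed (counted as one removal
# plus one addition), one method added, one requested service dropped.
A, A2 = {"host", "port"}, {"endpoint", "port"}
M_, M2 = {"connect"}, {"connect", "retry"}
S, S2 = {"dns_lookup"}, set()

CA = change_size(A, A2)     # 1 * 0.2 + 1 = 1.2
CM = change_size(M_, M2)    # 0 * 0.2 + 1 = 1.0
CS = change_size(S, S2)     # 1 * 0.2 + 0 = 0.2
print(CA, CM, CS)

# Newly added class (empty "before" sets): reduces to the Class Point counts.
print(change_size(set(), {"x", "y", "z"}))  # 3 (number of new elements)
```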

These three dimensions form the basis for evaluating each changed class's
complexity level, as in Table 5.5. The complexity level indicators are inherited
from Class Point.

Changes in        Changes in attributes (CA)
methods (CM)      0−5        6−9        ≥ 10
0−4               Low        Low        Average
5−8               Low        Average    High
≥ 9               Average    High       High
(a) Changes in services requested (CS): 0 − 2

Changes in        Changes in attributes (CA)
methods (CM)      0−4        5−8        ≥ 9
0−3               Low        Low        Average
4−7               Low        Average    High
≥ 8               Average    High       High
(b) Changes in services requested (CS): 3 − 4

Changes in        Changes in attributes (CA)
methods (CM)      0−3        4−7        ≥ 8
0−2               Low        Low        Average
3−6               Low        Average    High
≥ 7               Average    High       High
(c) Changes in services requested (CS): ≥ 5

Table 5.5: Complexity evaluation for each class
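The lookup defined by Table 5.5 can be sketched as follows; the band boundaries transcribe the three sub-tables, and the helper names are our own:

```python
# Sketch implementing the complexity-level lookup of Table 5.5: CS selects one
# of three sub-tables, whose CA/CM band boundaries shift down as CS grows.

def band(value, lo, hi):
    """Map a count to band 0, 1 or 2 given the two upper boundaries."""
    if value <= lo:
        return 0
    return 1 if value <= hi else 2

def class_complexity(ca, cm, cs):
    if cs <= 2:                       # sub-table (a)
        ca_b, cm_b = band(ca, 5, 9), band(cm, 4, 8)
    elif cs <= 4:                     # sub-table (b)
        ca_b, cm_b = band(ca, 4, 8), band(cm, 3, 7)
    else:                             # sub-table (c)
        ca_b, cm_b = band(ca, 3, 7), band(cm, 2, 6)
    grid = [["Low", "Low", "Average"],
            ["Low", "Average", "High"],
            ["Average", "High", "High"]]
    return grid[cm_b][ca_b]           # rows: CM band, columns: CA band

print(class_complexity(ca=2, cm=1, cs=0))   # Low
print(class_complexity(ca=10, cm=9, cs=1))  # High
```

Fractional values of CA, CM and CS (from the 0.2 removal factor) fall into the bands in the obvious way.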

Lastly, a weighted value is assigned to each changed class based on its type
identified in the first step and its complexity level evaluated in the second
step. These weights are also adopted from Class Point (shown in Table 5.6).


Class    Class's Complexity Level                             Total
Type     Low             Average         High
PDT      ... × 3 = ...   ... × 6 = ...   ... × 10 = ...       ...
HIT      ... × 4 = ...   ... × 7 = ...   ... × 12 = ...       ...
DMT      ... × 5 = ...   ... × 8 = ...   ... × 13 = ...       ...
TMT      ... × 4 = ...   ... × 6 = ...   ... × 9 = ...        ...
CMPcode                                                       ...

Table 5.6: Evaluating CMPcode

The value of CMPcode is computed as a weighted sum of all changed classes:


CMPcode = Σ_{i=0}^{3} Σ_{j=0}^{2} x_ij × w_ij

where x_ij is the number of classes of type i with complexity level j, and
w_ij is the weighted value for class type i and complexity level j.
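As a worked example, the following sketch applies the Class Point weights of Table 5.6 to a hypothetical count of changed classes:

```python
# Worked example of the CMPcode weighted sum, using the weights of Table 5.6.
# The class counts are hypothetical.

CODE_WEIGHTS = {
    "PDT": {"Low": 3, "Average": 6, "High": 10},
    "HIT": {"Low": 4, "Average": 7, "High": 12},
    "DMT": {"Low": 5, "Average": 8, "High": 13},
    "TMT": {"Low": 4, "Average": 6, "High": 9},
}

def cmp_code(counts):
    """counts[class_type][level] = number of changed classes."""
    return sum(CODE_WEIGHTS[t][lvl] * n
               for t, levels in counts.items()
               for lvl, n in levels.items())

# Hypothetical migration: three Low-complexity data-management classes change
# (new database access layer) and one Average problem-domain class.
print(cmp_code({"DMT": {"Low": 3}, "PDT": {"Average": 1}}))  # 3*5 + 1*6 = 21
```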

CMPcode is analogous to Class Point in the sense that it also assesses a class's
attributes, public methods, and services requested from other classes. However,
it extends Class Point by evaluating the changes to the elements of a class,
taking into account both addition and removal tasks. Nevertheless, its validity
still holds when an entirely new class is added, in which case its counting
approach is exactly the same as Class Point's, as shown above. As a result, all
complexity levels and weighted values can be sufficiently inherited from Class
Point.

5.4.3 Installation and Configuration Component: CMPic

CMPic assesses all migration tasks related to Installation and Configuration (IC),
such as installation of system software, middleware, database servers, or
third-party libraries; or configuration of environment variables and basic
network information.


CMPic is determined in a similar manner as the previous two components of

CMP.

First, all required installation and configuration tasks are identified and clas-

sified into two types:

• Infrastructure level: software or servers required to set up the environment
belong to this type, for example, setting up an EC2 instance or image,
installing the operating system and middleware, or installing a database server.

• Application level: this type consists of any third-party libraries that the
application requires, for example, JDBC drivers for databases. When an

application relies on an external library to function properly, and that li-


brary does not exist within the Cloud environment, there are two options:

– (1) Rewrite the library from scratch for the Cloud environment - This

is seen by CMP as adding new code into the system and is sufficiently
captured by CMPcode . Hence, the migration tasks related to this option
are excluded from CMPic .

– (2) Reuse a similar library (if one exists) in the Cloud environment,

and change code in the system to preserve functionality and to connect

with the new library seamlessly - The migration tasks involved in this

option are integrating the new library into the system, which will be
assessed by CMPic , and changing code, which is assessed by CMPcode
and excluded from CMPic . If the libraries are available in the Cloud

environment exactly as required, the migration tasks expected are to


integrate them with the system and are measured by CMPic .


Second, we evaluate the complexity of each IC task based on the number
of configuration steps required and the installation method (from binary files
or from source code), as in Table 5.7. Installation and configuration usually go
together for each package or piece of software; for example, when Java is installed,
the JAVA_HOME variable needs to be set accordingly, and when MySQL is installed
in an Ubuntu EC2 instance, it is not accessible from outside the instance by
default, so reconfiguration for accessibility is required. Therefore, installation
and configuration tasks are evaluated together based on the following criteria:

• Installation: is an installation package available, or only the source code?
Or is no installation required at all?

• Configuration: for each installation, how many configuration steps are
required?

                  Installation
Configuration     No installation   Package   Source Code
< 2               Low               Low       Average
2 − 5             Low               Average   High
≥ 6               Average           High      High

Table 5.7: Complexity evaluation for each IC task

For example, the IC task of installing MySQL from an installation file, with
one configuration step to allow global accessibility, is of Low complexity.

Finally, each IC task is assigned a weighted value as in Table 5.8, based
on its type from the first step and its complexity level from the second. This
last step is necessary because an IC task at the Application level requires a
different amount of effort than an IC task of the same complexity at the
Infrastructure level.

The final value of CMPic is determined as:


IC               IC's Complexity Level                           Total
Type             Low             Average         High
Application      ... × 1 = ...   ... × 2 = ...   ... × 7 = ...   ...
Infrastructure   ... × 1 = ...   ... × 3 = ...   ... × 9 = ...   ...
CMPic                                                            ...

Table 5.8: Evaluating CMPic


CMPic = Σ_{i=0}^{1} Σ_{j=0}^{2} x_ij × w_ij

where x_ij is the number of IC tasks of type i with complexity level j, and
w_ij is the weighted value for IC task type i and complexity level j.
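A worked example of this sum, using the weights of Table 5.8 on hypothetical task counts:

```python
# Worked example of the CMPic weighted sum, using the weights of Table 5.8.
# The task counts are hypothetical.

IC_WEIGHTS = {
    "Application": {"Low": 1, "Average": 2, "High": 7},
    "Infrastructure": {"Low": 1, "Average": 3, "High": 9},
}

def cmp_ic(counts):
    """counts[ic_type][level] = number of IC tasks of that type and level."""
    return sum(IC_WEIGHTS[t][lvl] * n
               for t, levels in counts.items()
               for lvl, n in levels.items())

# Hypothetical IaaS migration: install a database server from a package with
# a few configuration steps (Average, Infrastructure) and drop in two
# ready-made application libraries (Low, Application).
print(cmp_ic({"Infrastructure": {"Average": 1},
              "Application": {"Low": 2}}))  # 1*3 + 2*1 = 5
```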

5.4.4 Database Migration Component: CMPdb

CMPdb assesses all migration tasks related to modifying queries and populating
data into new databases, excluding the database server installation tasks and any
required code changes, which are covered by CMPic and CMPcode, respectively.
Since the effort required for each query modification task or data population task
is quite uniform, CMPdb is easier to calculate than the other CMP components.

First, all database related tasks are identified and classified into two types:

• Query modification task: when a database changes in database type (e.g.,

MySQL to MSSQL), or database version, or from relational to NoSQL

database, queries must be modified accordingly.

• Data population task: Data in each table must be packaged and loaded into

the new database.

117
5. CLOUD MIGRATION POINT

Second, the complexity of each task is determined based on the differences

between the database of the local data center and the database in the Cloud: same
type of relational database, same type of relational database but different versions,

different types of relational databases, or relational to NoSQL database. Table


5.9 summarizes these complexity levels.

Database changes Complexity level


Same relational database, same version Low
Same relational database, different version Average
Different relational databases Average
Relational to NoSQL databases High

Table 5.9: Complexity evaluation for each database task

Finally, CMPdb is determined by the number of database tasks and for each
database task its associated weight as in Table 5.10.

Type                 Complexity Level                                  Total
                     Low             Average         High
Query Modification   ... × 1 = ...   ... × 3 = ...   ... × 8 = ...     ...
Data Population      ... × 3 = ...   ... × 4 = ...   ... × 10 = ...    ...
CMPdb                                                                  ...

Table 5.10: Evaluating CMPdb

The final value of CMPdb is calculated as:


CMPdb = Σ_{i=0}^{1} Σ_{j=0}^{2} xij × wij

where xij is the number of database tasks of type i (i.e., the number of queries

to be modified or the number of tables to be populated) with complexity level j,

and wij is the weighted value for database task type i and complexity level j.
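The two steps above can be sketched as follows. The complexity levels follow Table 5.9 and the weights follow Table 5.10; the migration scenario and the task counts are hypothetical:

```python
# A sketch of the CMP_db counting process. Complexity levels follow Table 5.9
# and weights follow Table 5.10; the migration scenario is hypothetical.

def db_complexity(source, target):
    """source/target: (kind, product, version) tuples, e.g. ('relational', 'MySQL', '5.1')."""
    if source[0] == "relational" and target[0] == "nosql":
        return "High"        # relational to NoSQL database
    if source[1] != target[1]:
        return "Average"     # different relational databases
    if source[2] != target[2]:
        return "Average"     # same relational database, different version
    return "Low"             # same relational database, same version

WEIGHTS = {  # Table 5.10: weight per database task type and complexity level
    "query":    {"Low": 1, "Average": 3, "High": 8},
    "populate": {"Low": 3, "Average": 4, "High": 10},
}

def cmp_db(n_queries, n_tables, level):
    # x_ij * w_ij summed over the two task types at the given complexity level
    return n_queries * WEIGHTS["query"][level] + n_tables * WEIGHTS["populate"][level]

# Hypothetical migration: MySQL 5.1 -> MySQL 5.5, with 4 queries to modify
# and 3 tables to populate.
level = db_complexity(("relational", "MySQL", "5.1"), ("relational", "MySQL", "5.5"))
print(level, cmp_db(4, 3, level))  # Average 24
```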


5.4.5 CMP

The final value of CMP is determined as a weighted sum of its four components

CMPi with i ∈ {conn, code, ic, db}:


CMP = Σ_{i=0}^{3} CMPi × wi

where CMPi is the value of CMP type i, and wi is the weighted value for CMP
type i (as shown in Table 5.11).

Type     CMPconn   CMPcode   CMPic   CMPdb
Weight   3         5         2       1

Table 5.11: Weighted values of CMP's components

Conclusion:

In this section, we have presented the CMP model and its counting method
for sizing a Cloud migration project. The greater the CMP value is, the more
complicated the project is, and the more effort is required.

5.5 CMP Application

This section demonstrates how CMP can be applied to size a Cloud migration
project in practice, using the example of PetShop .Net that has been described
in the previous chapter (Section 4.2). For convenience of reference, we
summarize here again our experiment process of migrating PetShop .Net to
Windows Azure and SQL Azure databases as follows:


• We used the existing application PetShop, which was not developed by
ourselves; hence, we needed to learn, understand, and get PetShop to work
on a local machine first.

• PetShop was developed on an older platform than the current version sup-

ported by Windows Azure. Windows 7 is required to deploy applications to


Windows Azure, while PetShop installation file was packaged for Windows

XP and could not run properly on Windows 7. We needed to deploy the

PetShop source code onto Windows 7 manually.

• The same issue applied to the PetShop database. There are existing tools
offering database and data transfer from local servers to SQL Azure; however,
they require SQL Server 2008 to be installed, while PetShop was designed to
work with SQL Server 2005 and could not be installed directly on SQL Server
2008. We had to manually retrieve and run the database script on SQL
Server 2008.

• In order to deploy applications into Windows Azure Cloud, it was important

to create a package file and a configuration file from the existing source code.
Azure plugin for Visual Studio provides a quite straightforward method to
achieve this; however, this method works with “Web application project”

only, while PetShop was created as a WebSite project, where there is no

project file and it relies on ASP.NET dynamic compilation to compile pages

and classes in the application. Effort was also spent on converting the WebSite
project to a Web Application project. Alternatively, the utility tool cspack

provided by Azure can also be used to create the package file.


Our experiment with PetShop .Net includes tasks to enable the application to
work on the local machine prior to migration. These tasks are out of the scope
of our Cloud migration project as outlined in Sections 1.4 and 5.1. The starting
point of the migration project is defined when PetShop .Net is already running
in the local machine and is ready to be migrated, and the ending point of the
migration project is when PetShop has been fully moved to Windows Azure
together with its database. The CMP model for sizing a migration project to the
Cloud only considers migration tasks within the scope of the defined migration
project. Hence, we exclude all tasks to understand the application's source code
and operations, or to install packages to enable the application to work on local
machines.
As a result, migration tasks for PetShop can be selected and categorized into
the four components of the CMP model as follows:

• CMPconn : There are no LAN or WAN connections amongst components
of the PetShop application; hence, no migration tasks are required for the
CMP connection component. As a result,

CMPconn = 0

• CMPcode : Not much code modification is required; however, we need to
modify the database connection string to use the new database in SQL
Azure. Also, SQL Azure does not support distributed transactions, a
feature PetShop utilised for its transactions. Hence, we needed to modify
the code to accommodate this incompatibility. The changes in code are
reported in Table 5.12. The weight values in Table 5.12 are referenced from
Table 5.6.


Classes                             Complexity   Weights
1 class of Data Management Type     Low          5
2 classes of Data Management Type   Average      8
1 class of Task Management Type     High         9

Table 5.12: Code changes for PetShop

The value of CMPcode is computed as a weighted sum of all changed classes:

CMPcode = (1 × 5) + (2 × 8) + (1 × 9) = 30

Total number of hours spent on these tasks was recorded as 10 hours.

• CMPic : This is a migration project to PaaS Cloud, so there was no in-


stallation required in Windows Azure. However, some installations were

required to facilitate the migration of the application and its database to


Windows Azure cloud, including Visual Studio 2010 to modify and compile
PetShop code, cspack utility to convert PetShop from Website to Web Ap-
plication, SQL Server 2008 to convert PetShop database from SQL Server
2005 to compatible format for SQL Azure, codeplex to migrate data to SQL

Azure, and Windows Azure Tools for Visual Studio to create package file
and configuration file from PetShop source code, so that it can be deployed

into Windows Azure platform.

All installation tasks are reported in Table 5.13. The weight values in Table

5.13 are referenced from Table 5.8.

The value of CMPic is computed as a weighted sum of all installation tasks:

CMPic = (1 × 1) + (4 × 3) = 13


Number of Installations   Type             Complexity   Weights
1                         Infrastructure   Low          1
4                         Infrastructure   Average      3

Table 5.13: Installations for PetShop

Total number of hours spent on these tasks was recorded as 14 hours.

• CMPdb : The PetShop database is SQL Server 2005, while SQL Azure
requires a database in SQL Server 2008. This migration is considered as the
same relational database type with a different version. Based on Table 5.9,
the complexity of this database migration is Average. Some query
modification tasks were performed to align the PetShop database to the new
version 2008, and some tasks were done to populate data into the new
database in SQL Azure, including dumping the old database and restoring it
to the new database.

All database-related tasks are reported in Table 5.14. The weight values in

Table 5.14 are referenced from Table 5.10.

Number of Tasks   Type                 Complexity   Weights
5                 Query modification   Average      3
2                 Data population      Average      4

Table 5.14: Database Migration for PetShop

The value of CMPdb is computed as a weighted sum of all database-related


tasks:

CMPdb = (5 × 3) + (2 × 4) = 23

Total number of hours spent on these tasks was recorded as 7 hours.


Components   Value   Weights   Hours
CMPconn      0       3         0
CMPcode      30      5         10
CMPic        13      2         14
CMPdb        23      1         7

Table 5.15: CMP components for PetShop

The values of these four components are summarized in Table 5.15, together
with their weights as referenced from Table 5.11.

The value of CMP is computed as a weighted sum of its four components:

CMP = (0 × 3) + (30 × 5) + (13 × 2) + (23 × 1) = 199

The total number of hours spent on these tasks is: 0 + 10 + 14 + 7 = 31 hours.
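The PetShop arithmetic above can be replayed directly from the component counts and weights reported in Tables 5.11–5.14, as a small sanity-check sketch:

```python
# Replaying the PetShop CMP calculation from Tables 5.11-5.14.
cmp_conn = 0                        # no connection tasks
cmp_code = 1 * 5 + 2 * 8 + 1 * 9   # Table 5.12: code changes
cmp_ic   = 1 * 1 + 4 * 3           # Table 5.13: installation tasks
cmp_db   = 5 * 3 + 2 * 4           # Table 5.14: database tasks

# Table 5.11: component weights (conn, code, ic, db) = (3, 5, 2, 1)
cmp_total = cmp_conn * 3 + cmp_code * 5 + cmp_ic * 2 + cmp_db * 1

print(cmp_code, cmp_ic, cmp_db, cmp_total)  # 30 13 23 199
```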

Conclusion:

This section has demonstrated how the CMP counting process can be applied
to size a Cloud migration project within its scope.

5.6 Reflection and Discussion

In this section, we reflect on our process of developing the CMP model. The
discussion revolves around the structure and methodology of the CMP model.
Discussion of its validity is deferred to Chapter 6 on validation.

The model has been developed through a few iterations. The model presented
in this chapter is the most basic version, which can be used as a foundation for

any tuning on its parameters later on.

There are 37 tunable parameters in the model (reflected in Tables 5.3, 5.6, 5.8,


5.10, and 5.11). In this basic version, the initial values for these parameters were

derived from our discussion with a group of Cloud engineers, who have conducted
some migration projects to the Cloud. Individual discussion was carried out with

each Cloud engineer to determine the value of each parameter. We then derived
the average value from all discussions for each parameter. We employed the expert

judgement approach for the parameter values at this stage because of the lack of

past projects of migration to Cloud. The only data points we had at that stage
were from the migration exercises and projects conducted by our group.

We took a further step to improve our model by, firstly, looking for more
data points. Survey and interviews were conducted with academic and industrial
practitioners, which will be described in more details in the next chapter. These

data points are more general and of larger scope than the initial ones. The data
collection and tuning process will be discussed further in Chapter 6.

Although CMP is developed based on FP, it is different from other FP
extensions in the sense that it does not add more components into the existing
FP model; it only follows the three-step approach of FP. As a result, CMP is
also affected by some limitations for which FP has already been criticized
(Lokan, 1998; Low & Jeffery, 1990; Symons, 1988; Matson et al., 1994;
Kitchenham, 1997). For example, classifying every system component type's
complexity as low, average, or high has the merit of being straightforward, but
has been criticized as oversimplified. The


work by Abran & Maya (1995) addressed this oversimplifying issue by proposing

an extended FPA technique, which subdivides the complexity classification of FP

from three intervals (low, average, and high) into five intermediate subintervals.

This extension proposed a finer granularity for counting FP of a development


project; however, this approach still does not address the upper bound of
counting FP (and similarly for CMP). For example, a system component containing

over 100 data elements is given at most twice the function points of a component
with one data element. Similarly, CMP suffers from the same problem as FP,

e.g., an installation and configuration task containing 100 configuration steps


is given at most three times the points of an installation and configuration task

with one configuration step. However, compared to FP and other extensions, this

limitation is less problematic for CMP, since there are normally many migration
tasks with few steps in each task.

The choice of weights has been derived from the expert judgement method

and, in the next chapter, tuned using a set of projects from external sources, but
it is also reasonable to ask if it will be valid in all circumstances. The threats to
validity discussion will be covered in Chapter 6.

The current CMP model only considers internal factors, but not external
factors. Internal factors are to ensure all necessary migration tasks are counted;

while external factors adjust and assess the complexity of the migration
tasks for each organization. Further work is scheduled to explore external factors
as well. The challenge with external factors is that it is very difficult to identify
a sound list of them. It is extremely hard, if not impossible, to justify whether
they are the right factors, whether the list is complete, and how to identify all
of them.

The CMP model measures the size of a migration project from a local data
center to cloud with the condition L = ∅, as discussed in Section 5.3. However,
the CMP model was developed without any constraint on L. In other words, the
CMP model is also applicable to migration projects with L ≠ ∅, which means
the system can be migrated from cloud back to the local data center. This


characteristic of CMP enables the measurement to expand beyond just two data

centers. When there are more than two data centers (either from local to cloud,
or vice versa) involved in the migration process, the CMP model can be applied
to one pair at a time, repeatedly, until all migration tasks are considered.

5.7 Summary

In this chapter, we have developed the CMP model as an important software


size measure for legacy-to-Cloud migration projects. Our study shows CMP is

more suitable for Cloud migration projects than other existing size metrics in
the literature since it captures special aspects of the Cloud migration context,
as discussed in Section 5.2. Moreover, CMP emphasises the distinct features of
the Cloud migration, as distinct from migrating between two local data centres,
for example, Cloud users (or developers) do not possess full control over the

Cloud environment as they do in a local data centre. This results in a limited

range of actions for each migration task. Therefore, the CMP model takes into
consideration Cloud-specific dependencies for each migration task, for example,

only security and protocol optimisation are assessed for each connection task, and
database tasks are concerned with migrating from relational to NoSQL databases,

and so on.

In a project development cycle, the CMP model fits well before the
implementation phase and after the design phase. One important assumption for
CMP is that all design decisions have been made. These design decisions have a
direct impact

on how CMP is counted, since they define all anticipated migration tasks. The


CMP counting process itself does not require much training and effort; however,

its accuracy relies on the sufficiency and granularity of the migration task list.
Therefore, it is important to carefully analyse the list of expected migration tasks

to ensure it captures the Cloud migration aspects adequately and with as much
detail as possible.

Chapter 6

Validation

“Trying to improve something when you don't have a means of measurement
and performance standards is like setting out on a cross-country trip in a car
without a fuel gauge. You can make calculated guesses and assumptions based
on experience and observations, but without hard data, conclusions are based
on insufficient evidence.”

∼ Mikel Harry.

Validation is an essential process to justify whether a software metric meets


its specification and fulfils its intended purpose (Briand et al., 1996; Costagliola

et al., 2005). It is widely accepted that there are two types of validation required
for software metrics, namely theoretical validation and empirical validation. The

objective of the theoretical validation is to prove that a metric sufficiently satisfies

the necessary conditions of a measurement metric that it claims to be (such

as sizing metrics, complexity metrics, cohesion metrics and coupling metrics),


whereas the empirical validation is to show that the metric is practically useful


within a given context.

The CMP metric, similar to FP, incorporates both size and complexity con-

cepts. In addition, CMP is a metric related to both processes and products.


Briand et al. (1996) proposed a list of mathematical properties for size metrics

and complexity metrics, which focus on products. There exists no set of proper-
ties for both product and process sizing metrics yet; hence, the set of criteria for

product-only size as proposed by Briand et al. (1996) is used in our theoretical

validation, although it is not quite sufficient yet.

Therefore, the main validation of CMP in this chapter is empirically based.


We are challenged to demonstrate that CMP is practically useful in the Cloud

migration context. Data on past Cloud migration projects must be available for
this purpose, including what tasks have been carried out and how much time has
been spent on those tasks. However, there exist no public repositories for such
data unlike traditional software development projects. As a result, a survey has

been conducted at this stage to collect relevant data. More details on this will
be presented later in the sub-sections of this chapter. Also, in this chapter, two
terms CMP weights and CMP parameters will be used interchangeably, and they

both mean the weighted values of each CMP component and their elements as

presented in Chapter 5.

The structure of this chapter is arranged as follows to cover important aspects


of the validation process: Section 6.1 demonstrates how the CMP model satisfies

a set of criteria proposed for product sizing metrics. The empirical validation is

divided into three phases. Section 6.2 describes the first phase of the empirical

validation, where the CMP model is evaluated on the initial set of 6 migration
projects conducted by our group. This section also states the evaluation criteria


and the approach we follow for the empirical validation purpose. The result of

this phase 1 validation shows that CMP is potentially an indicator for Cloud
migration effort estimation. However, more data from external organizations are

necessary to demonstrate that CMP is also externally valid. Section 6.3 presents
the final dataset we obtained from conducting a survey. A similar empirical

validation is performed again on CMP using the new dataset, called Empirical

Validation Phase 2, and is presented in Section 6.4. The result shows that the
parameters (or weights) of CMP need further calibration. Hence, Section 6.5
demonstrates the process of calibrating the CMP weights. In this section, we
also state a list of assumptions made for developing the model, and test their
plausibility using the available data from the survey. This list of assumptions

demonstrates the high complexity and difficulty of validating the metric. Section
6.6 illustrates the Empirical Validation Phase 3, where CMP with the calibrated
weights is validated on the new dataset. The result shows that the calibration

improves the performance of the CMP model significantly, and the model can
be used as a predictor for effort estimation of the Cloud migration. Section 6.7
discusses the threats to the validity of the model. Lastly, Section 6.8 summarizes and

concludes this chapter.

6.1 Theoretical Validation

Briand et al. (1996) proposed a generic mathematical framework that defines

some software measurement concepts, such as size and complexity. The frame-
work provides different sets of convenient and intuitive properties which are used

as necessary conditions for each measurement concept. In this section, we
mathematically validate CMP against three properties of a size concept proposed in

(Briand et al., 1996), since CMP is a sizing metric developed to measure the size
of migration projects.

Based on (Briand et al., 1996), a system S can be represented as a pair ⟨E, R⟩,
where E is the set of elements of S, and R is a binary relation on E (R ⊆ E × E).
In the context of this thesis, a migration project is defined as a set of migration
tasks. In light of this analogy, a migration project can be represented as a system
S = ⟨E, R⟩, where E is the set of migrating tasks of S, and R is the set of
relations between migration tasks e ∈ E. Particularly, if e1 , e2 ∈ E are network
connection tasks, then the relation r ∈ R such that r = ⟨e1 , e2 ⟩ is the common
system component involved in these two network connections. Similarly, if e1 and
e2 are database tasks, then r is the relation or table involved in these tasks, and
so on.

Three properties for a size metric proposed by Briand et al. (1996) are: Non-
negativity, Null Value, and Module Additivity. These properties are formalized

as:

• Property Size 1: Non-negativity - The size of a system S = ⟨E, R⟩ is
non-negative:

Size(S) ≥ 0

Proof 1 Size(S) is the CMP value of the migration project S. CMP is
obtained as a weighted sum of its four components, which in turn are weighted
sums of non-negative numbers. Hence, CMP = Size(S) ≥ 0, or the
Non-Negativity Property is verified.


• Property Size 2: Null Value - The size of a system S = ⟨E, R⟩ is null
if E is empty:

E = ∅ ⇒ Size(S) = 0

Proof 2 CMP is determined by assessing each component to be migrated,


evaluating its migration task’s complexity, and assigning an associated weight

to it. The final value of CMP is the sum of all the weights of the migration

task set. If E = ∅, i.e., there exist no components to be migrated, there is


no weight to be assigned. Hence, Size(S) = CMP = 0, or the Null Value
Property holds.

• Property Size 3: Module Additivity - The size of a system S = ⟨E, R⟩
is equal to the sum of the sizes of two of its modules m1 = ⟨Em1 , Rm1 ⟩ and
m2 = ⟨Em2 , Rm2 ⟩ such that any element of S is an element of either m1 or
m2 :

∀m1 , m2 ((m1 ⊆ S and m2 ⊆ S and

E = Em1 ∪ Em2 and Em1 ∩ Em2 = ∅)

⇒ Size(S) = Size(m1 ) + Size(m2 ))

Proof 3 The CMP calculation examines each element ei (i.e., migrating
task i) of E = {e0 , e1 , ..., en−1 } independently. Each element ei is assessed
and assigned a weight wi , as described in the previous chapter. CMP is
then determined as the sum of all these weights, i.e., CMP = Σ_{i=0}^{n−1} wi .

If E is divided into two disjoint subsets Em1 and Em2 , with no loss of


generality, Em1 and Em2 can be represented as: Em1 = {e0 , e1 , ..., ek−1 } and
Em2 = {ek , ek+1 , ..., en−1 }, where k ≤ n.

Applying the same process of determining CMP, the values CMPm1 and
CMPm2 of these two subsets of migration tasks Em1 and Em2 are:
CMPm1 = Σ_{i=0}^{k−1} wi and CMPm2 = Σ_{i=k}^{n−1} wi .

As a result,

CMPm1 + CMPm2 = Σ_{i=0}^{k−1} wi + Σ_{i=k}^{n−1} wi = Σ_{i=0}^{n−1} wi = CMP

Hence, the Module Additivity Property is satisfied.

We have shown that CMP satisfies all three necessary conditions of a size
measurement proposed by Briand et al. (1996). However, an empirical validation

is also required to demonstrate that CMP is practically useful as a predictor for

effort estimation in the cloud migration context.
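The three proofs all reduce CMP to a sum of per-task weights; under that reduction, the properties can even be checked mechanically. The following is a toy sketch with hypothetical task weights:

```python
# Toy check of the three size properties, modelling a migration project as a
# list of (non-negative) task weights, as in the proofs above.

def cmp_size(task_weights):
    return sum(task_weights)

project = [5, 8, 9, 1, 3, 3, 3, 3, 4, 4]  # hypothetical task weights

# Size 1: Non-negativity
assert cmp_size(project) >= 0
# Size 2: Null value for an empty task set
assert cmp_size([]) == 0
# Size 3: Module additivity, for every disjoint split of the task set
for k in range(len(project) + 1):
    m1, m2 = project[:k], project[k:]
    assert cmp_size(m1) + cmp_size(m2) == cmp_size(project)

print("all three size properties hold")
```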

6.2 Empirical Validation - Phase 1

Empirical validation is necessary to ensure that CMP is practically useful as an


indicator of effort estimation in terms of person-hours. The empirical validation

is divided into three phases. This Phase 1 will evaluate the CMP model with
its initial set of weights as presented in Chapter 5 using our initial set of 6

Cloud migration projects. Because of the limited number of data points publicly


available, the data we use in this first phase of the empirical validation is extracted

from a number of small-scale projects conducted at NICTA.

Although the validity of these data points has not been verified externally with

other research projects, they are suitable for this empirical validation because:

1. We have access to all necessary information required to determine CMP.

2. These projects cover different migration project types. In an actual
migration project, not all aspects of CMP happen at the same time in one
project. Therefore, these data points sufficiently reflect what is likely to

happen in reality.

3. The uniformity of these projects is ensured, because they were carried
out by the same team. Therefore, the external cost factors as discussed in
Section 4.3 have minimal impact on these data points. This is suitable for

validating the CMP model since we focus on internal cost factors only.

In this section, we also state the evaluation criteria and the approach we follow

for the purpose of empirical validation. These are also applied for the other two

phases.

6.2.1 Evaluation Criteria

The details of each migration task are used to calculate the size of the migration

project to the Cloud, using the CMP model. Regression analysis will be used to
determine the relationship between the size of a Cloud migration project and the

effort required.


The reliability of an effort estimation is assessed using the following criteria

as suggested in (Conte et al., 1986):

• Magnitude of Relative Error (MRE):

MRE = |AE − PE| / AE

where AE is the Actual Effort, and PE is the Predicted Effort.

• Mean Magnitude of Relative Error (MMRE):

MMRE = (Σ MRE) / n

where n is the sample size, and MMRE ≤ 0.25 is acceptable.

• Prediction at level l (or PRED(l) in short):

PRED(l) = k / n

where k is the number of observations such that MRE ≤ l. Note that
k ≤ n, hence 0 ≤ PRED(l) ≤ 1. The closer the PRED(l) value is to 1, the
better, and PRED(0.25) ≥ 0.75 is acceptable.
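These criteria are straightforward to compute. The sketch below applies them to the actual and predicted efforts later reported in Table 6.3, and reproduces the MMRE and PRED(0.25) values given there:

```python
# Sketch of the evaluation criteria of Conte et al. (1986), applied to the
# (Actual Effort, Predicted Effort) pairs reported in Table 6.3.

def mre(ae, pe):
    return abs(ae - pe) / ae

def mmre(pairs):
    return sum(mre(ae, pe) for ae, pe in pairs) / len(pairs)

def pred(pairs, level=0.25):
    return sum(1 for ae, pe in pairs if mre(ae, pe) <= level) / len(pairs)

pairs = [(45, 41.116), (4, 3.221), (6, 7.272),
         (9, 12.383), (32, 26.989), (51, 59.63)]
print(round(mmre(pairs), 3), round(pred(pairs), 3))  # 0.199 0.833
```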

6.2.2 Leave-One-Out Cross Validation

We followed a leave-one-out or jackknife approach (Tukey, 1958; Efron & Gong,
1983) to examine the relationship between CMP values and the actual effort
for migrating a system to cloud. This approach is the same as a k-fold
cross-validation in which k is equal to the number of data points. The k-fold
cross-validation has been successfully used to validate cost estimation models in the
validation has been successfully used to validate cost estimation models in the
literature, and is especially recommended for small data sets (Briand et al., 1999;

Costagliola et al., 2005). In the leave-one-out cross validation, each single data
point is used as the validation data, whereas the remaining data are used as

training sets. This is repeated until each data point is used once as the validation

data.
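The procedure can be sketched with a closed-form simple linear regression on the six data points of Table 6.1. This is a sketch only; most of the per-fold predictions agree closely with those reported in Table 6.3, with small discrepancies attributable to rounding in the reported models:

```python
# Sketch of leave-one-out cross-validation with ordinary least squares,
# using the six data points of Table 6.1.

def ols(xs, ys):
    """Closed-form simple linear regression: returns (coefficient, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

cmp_vals = [504, 60, 95, 149, 337, 645]  # CMP (Table 6.1)
efforts  = [45, 4, 6, 9, 32, 51]         # Effort in hours (Table 6.1)

predictions = []
for i in range(len(cmp_vals)):           # each project is left out once
    xs = cmp_vals[:i] + cmp_vals[i + 1:]
    ys = efforts[:i] + efforts[i + 1:]
    slope, intercept = ols(xs, ys)
    predictions.append(slope * cmp_vals[i] + intercept)

mres = [abs(a - p) / a for a, p in zip(efforts, predictions)]
print([round(p, 3) for p in predictions])
print(round(sum(mres) / len(mres), 3))   # MMRE over the six folds
```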

6.2.3 Ordinary Least Square Regression Analysis

Table 6.1 shows the data points extracted from our six projects. For project 1,
the majority of the effort was spent on securing and optimizing the WAN
connection, while projects 2, 3, and 4 required most effort for installation and
population of data. The migration process of projects 5 and 6 involved
installation, data population, and code changes. The table shows the final CMP
value of each project and its associated number of hours spent on migration
tasks.

No   Effort (hours)   CMP
1    45               504
2    4                60
3    6                95
4    9                149
5    32               337
6    51               645

Table 6.1: Empirical validation data points

We followed a leave-one-out cross-validation approach on this dataset. In this

phase, we performed six rounds of validation. Each round uses five projects as

the training set, and one project is left out as the validation set. Descriptive


statistics were computed for each training set, based on which the boxplot and

outliers of each set were analysed. Figure 6.1 shows that there are no outliers in
the training sets of the six validation rounds that might bias the models
derived from regression analysis.

Figure 6.1: The boxplots for the six training datasets of variable CMP

The scatter plots in Figure 6.2 show a positive linear relationship between

CMP and Effort (in hours) of each training set. As a result, an Ordinary Least-

Squares (OLS) regression analysis is then applied on each training set to derive

the equation of the trend line, which can be used as a prediction model for effort
required in hours.

The proficiency of each regression model is determined by the Coefficient of

Determination R2 , representing the proportion of the dependent variable effort


(in hours) explained by the independent variable CMP.

Figure 6.2: The scatter plots for OLS regression

Moreover, the statistical significance of CMP as a predictor of effort is
evaluated with a t-test and is determined by the t-value and p-value of the
coefficient of the prediction model. If p-value < 0.05, the null hypothesis can be
rejected; in other words, it shows that
CMP is a significant predictor of effort. The t-value is then applied to indicate

the reliability of the predictor. If t-value > 1.5, it shows that CMP is a potential
predictor of effort. The results of R2 , t-value, and p-value of the coefficients and
the intercepts of all six validation rounds are summarized in Table 6.2. (Note

that the p = 0.05 critical value of the t-test with 3 degrees of freedom is 3.18 for

a two-sided test and 2.35 for a one-sided test, and coefficients are expected to be

positive (Figure 6.2); hence, one-sided test can be used here.)
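As a sketch of these statistics, an OLS fit on one of the six training sets (here: the five projects of Table 6.1 excluding project 2, which appears to correspond to the first row of Table 6.2) reproduces the reported coefficient, intercept, and R2 up to rounding:

```python
import math

# OLS fit with R^2 and the t-statistic of the slope, on one Phase 1 training
# set (Table 6.1 without project 2); values match row 1 of Table 6.2 up to rounding.

xs = [504, 95, 149, 337, 645]  # CMP
ys = [45, 6, 9, 32, 51]        # Effort (hours)

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

slope = sxy / sxx
intercept = my - slope * mx
r2 = sxy ** 2 / (sxx * syy)              # coefficient of determination

sse = syy - slope * sxy                  # residual sum of squares
se_slope = math.sqrt(sse / (n - 2) / sxx)
t_slope = slope / se_slope               # t-statistic of the coefficient

print(round(slope, 4), round(intercept, 4), round(r2, 4), round(t_slope, 2))
# 0.0869 -1.4664 0.9736 10.52
```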

The result suggests that the coefficients of the models are statistically signifi-

cant and hence CMP is indicated to be a significant predictor of effort. Although


the intercepts are statistically insignificant, each derived model has a high value

of R2, and all the coefficients pass the significance test. In other words, the OLS
regression analysis results still show a strong linear relationship between CMP
and effort (in hours). For example, in the first training set, the derived model
is: Effort = 0.0869 × CMP − 1.4664, with a high value of R2 = 0.9736, and the
coefficient is significant at level 0.05.

ID   Coefficient (Value / t-value / p-value)   Intercept (Value / t-value / p-value)   R2       Derived Model
1    0.0869 / 10.504 / 0.002                   −1.4664 / −0.4399 / 0.6898              0.9736   Effort = 0.0869 × CMP − 1.4664
2    0.0858 / 10.958 / 0.002                   −0.8785 / −0.2789 / 0.7984              0.9756   Effort = 0.0858 × CMP − 0.8785
3    0.0849 / 12.507 / 0.001                   −0.2669 / −0.0985 / 0.9277              0.9812   Effort = 0.0849 × CMP − 0.2669
4    0.086 / 16.339 / 0.000                    −1.9929 / −1.0084 / 0.3876              0.9889   Effort = 0.086 × CMP − 1.9929
5    0.0839 / 12.069 / 0.001                   −1.1697 / −0.501 / 0.6508               0.9798   Effort = 0.0839 × CMP − 1.1697
6    0.0972 / 17.112 / 0.000                   −3.0637 / −1.9008 / 0.1535              0.9899   Effort = 0.0972 × CMP − 3.0637

Table 6.2: Phase 1 - OLS Regression Analysis

The cross-validation result is determined by using the derived models to com-

pute the predicted effort of the left-out project in each validation round (reported

in Table 6.3). The results are then evaluated using the metrics described in Section
6.2.1.

Table 6.3 shows that the MMRE value is 0.199 and the prediction at level

0.25 is 0.833. This result suggests that the CMP model is a good predictor

for effort estimation in the considered Cloud migration projects.


No CMP AE PE MRE
1 504 4541.116 0.086
2 60 4 3.221 0.195
3 95 6 7.272 0.212
4 149 912.383 0.376
5 337 3226.989 0.157
6 645 51 59.63 0.169
MMRE 0.199
PRED(0.25) 0.833

Table 6.3: Phase 1 - Results Evaluation
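The MMRE and PRED(0.25) figures above can be reproduced directly from the AE and PE columns of Table 6.3; the following is a minimal sketch (variable names are ours):

```python
# Actual effort (AE) and predicted effort (PE) in hours, from Table 6.3.
actual = [45, 4, 6, 9, 32, 51]
predicted = [41.116, 3.221, 7.272, 12.383, 26.989, 59.63]

# Magnitude of relative error (MRE) for each left-out project.
mre = [abs(a - p) / a for a, p in zip(actual, predicted)]

# MMRE is the mean MRE; PRED(0.25) is the fraction of projects
# whose MRE does not exceed 0.25.
mmre = sum(mre) / len(mre)
pred_25 = sum(m <= 0.25 for m in mre) / len(mre)

print(round(mmre, 3), round(pred_25, 3))  # → 0.199 0.833
```

The same two metrics are applied unchanged in Phases 2 and 3.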

6.2.4 Conclusion

In this section, we have shown that Phase 1 of the empirical validation yields good results for the CMP model as a predictor of effort in some Cloud migration cases. However, to gain more confidence in the CMP model, more data on external projects is required to validate it further.

6.3 Data Collection

We conducted a survey to collect data on migration projects from external organizations, as described in Chapter 3. The objective of the survey is to collect data on past migration projects to the Cloud in order to determine migration cost factors, including size, and to examine their relationships with the effort required for migration.

Table 6.4 shows the data points obtained from our own projects, the survey, and the interviews, together with the corresponding CMP values calculated with the initial parameters (as presented in Chapter 5).

These data points are calculated for each CMP component separately, then

     Database       Install. & Config.  Connection       Code             Total
ID   CMPdb   Hours  CMPic   Hours       CMPconn   Hours  CMPcode   Hours  CMP    Hours
1    6       2      45      80          0         0      440       250    3232   332
2    0       0      9       3           0         0      0         0      18     3
3    29      25     0       0           0         0      65        40     493    65
4    40      8      0       0           0         0      0         0      56     8
5    0       0      33      50          9         5      44        20     387    75
6    0       0      0       0           9         10     0         0      54     10
7    8       5      0       0           18        20     0         0      118    25
8    0       0      18      24          0         0      0         0      44     24
9    0       0      7       6           0         0      0         0      21     6
10   0       0      0       0           110       100    0         0      480    100
11   0       0      27      50          18        20     0         0      124    70
12   0       0      135     300         2         2      90        80     1158   382
13   6       1      9       7           3         2      0         0      32     10
14   6       2      21      20          22        20     0         0      167    42
15   23      7      13      14          0         0      30        10     207    31
16   84      15     8       10          1         2      89        40     511    67
17   6       2      9       4           2         2      0         0      38     8
18   6       2      8       8           2         2      0         0      38     12
19   0       0      36      48          0         0      0         0      72     48

Table 6.4: Data points from surveys and interviews

the CMP value can be accumulated with the associated weights from the model for each component. Some data points consist of all 4 CMP components, but some have only one or two components of CMP.

CMP considers internal factors only, not external factors, as discussed in Chapter 5, while these data points come from different organizations. As a result, in the survey and interview questions, as well as in the data analysis process, we tried to eliminate the effect of external factors as much as possible, in order to ensure the data points can be normalized for validating CMP.
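The accumulation step described above can be sketched as a weighted sum over whichever components a project actually exercises; the weights below are purely illustrative placeholders, not the calibrated values of this chapter (the real values come from Table 5.11 and the calibration in Section 6.5):

```python
# Hypothetical component weights -- placeholders for illustration only;
# the real values are defined in Table 5.11 and recalibrated in Section 6.5.
WEIGHTS = {"db": 3, "ic": 2, "conn": 2, "code": 1}

def cmp_total(components):
    """Accumulate a final CMP value from the per-component CMP values a
    project actually has; absent components simply contribute nothing."""
    return sum(WEIGHTS[name] * value for name, value in components.items())

# A project with only installation/configuration and connection work:
size = cmp_total({"ic": 9, "conn": 4})  # 2*9 + 2*4 = 26
```

Projects with only one or two components are handled naturally: missing components are simply omitted from the sum.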

6.4 Empirical Validation - Phase 2

In our context, we performed 19 rounds of validation, following the approach outlined in Phase 1. Each round uses 18 projects as the training set, and one project is left out as the validation set. The R², t-values, and p-values of the coefficients and the intercepts of all 19 validation rounds are summarized in Table 6.5.

Table 6.5: Phase 2 - OLS Regression Analysis
(each row's model is Effort = Coefficient × CMP + Intercept)

     Coefficient                     Intercept
ID   Value    t-value  p-value      Value     t-value  p-value  R²
1    0.27476  9.663    4.42e−08     −6.55571  −0.639   0.532    0.8537
2    0.11759  6.067    1.63e−05     25.81171  1.580    0.134    0.697
3    0.1187   6.150    1.40e−05     25.0533   1.555    0.140    0.7027
4    0.11770  6.080    1.59e−05     25.73660  1.577    0.134    0.6979
5    0.11849  6.128    1.46e−05     23.87854  1.474    0.16     0.7012
6    0.11777  6.078    1.60e−05     25.58319  1.566    0.137    0.6978
7    0.11812  6.094    1.55e−05     25.03     1.533    0.145    0.6989
8    0.11830  6.081    1.59e−05     24.53021  1.496    0.154    0.698
9    0.1177   6.067    1.63e−05     25.6207   1.567    0.137    0.697
10   0.11829  6.133    1.44e−05     23.17480  1.438    0.170    0.7015
11   0.11934  6.202    1.27e−05     22.08703  1.363    0.192    0.7063
12   0.09923  16.992   1.16e−11     18.41721  3.955    0.00114  0.9475
13   0.11782  6.070    1.62e−05     25.42006  1.554    0.140    0.6972
14   0.11845  6.109    1.51e−05     24.27992  1.487    0.156    0.7
15   0.11817  6.117    1.49e−05     25.26461  1.554    0.140    0.7004
16   0.1187   6.150    1.40e−05     25.0477   1.555    0.139    0.7028
17   0.11773  6.072    1.62e−05     25.60553  1.567    0.137    0.6974
18   0.11788  6.072    1.62e−05     25.32458  1.548    0.141    0.6974
19   0.1190   6.134    1.44e−05     23.1027   1.413    0.177    0.7016

The cross-validation results were determined by using the derived models to compute the predicted effort of the left-out project in each validation round (reported in Table 6.6). Table 6.6 shows that the MMRE value is 1.5155 and the prediction at level 0.25 is 0.5789. This result suggests that the CMP weights (or parameters) need further calibration.


ID CMP AE PE MRE
1 3232 332 881.4686 1.6550
2 18 3 27.9283 8.3094
3 493 65 83.5724 0.2857
4 56 8 32.3278 3.0410
5 387 75 69.7342 0.0702
6 54 10 31.9428 2.1943
7 118 25 38.9682 0.5587
8 44 24 29.7354 0.2390
9 21 6 28.0924 3.6821
10 480 100 79.9540 0.2005
11 124 70 36.8852 0.4731
12 1158 382 133.3256 0.6510
13 32 10 29.1903 1.9190
14 167 42 44.0611 0.0491
15 207 31 49.7258 0.6041
16 511 67 85.7034 0.2792
17 38 8 30.0793 2.7599
18 38 12 29.8040 1.4837
19 72 48 31.6707 0.3402
MMRE 1.5155
PRED(0.25) 0.5789

Table 6.6: Phase 2 - Results Evaluation

6.5 CMP Parameters Calibration

This section presents our attempt to calibrate the CMP model in order to increase its external validity, so that CMP can be more widely useful.

There are 37 parameters (or weights) in total in the CMP model (reflected in Tables 5.3, 5.6, 5.8, 5.10, and 5.11 of Chapter 5). The original values of these weights were defined in discussion with a group of Cloud engineers who had participated in Cloud migration projects. We asked each Cloud engineer for an individual judgment on each weight value, then averaged the values across all the engineers we consulted to derive a final value for each parameter.

These expert opinion weights can be further refined using data points collected from our survey on Cloud migration projects. Since the questions for each component of the CMP were asked separately, the data collected for each CMP component is also separate from the others; hence, we can split each survey response into the individual components of the CMP (connection, code, installation and configuration, and database). The number of data points available for each weight of each component is summarized in Table 6.7.

Table 6.7 clearly shows that there are 11 parameters without any data points (weight IDs 7, 8, 9, 10, 11, 13, 14, 19, 20, 22, and 23). These weights cannot be calibrated without data points; hence, we keep the expert opinion values for them. These values are candidates for adjustment when more data points become available.

There are 4 weights with only 1 data point each (weight IDs 24, 28, 29, and 30), and 7 weights with 2 data points each (weight IDs 3, 12, 15, 16, 17, 32, and 33). With so few data points, these values could easily be changed to improve the prediction level of the model, but doing so could lead to overfitting. Therefore, we decided that these weights do not have sufficient data points for calibration, and their expert opinion values are also kept for the time being. These values may likewise be subject to change as more data points become available.

The remaining 11 weights in Table 6.7, plus the 4 weights associated with the individual CMP components, have 3 or more data points each; hence, they are considered for the calibration process, although the number of data points for each weight is still not ideal.


Component  Type                Complexity  ID  Weight  # of Data
                                               Value   Points
CMPconn    LAN-to-LAN          Low         1   1       5
                               Average     2   3       3
                               High        3   4       2
           LAN-to-WAN          Low         4   1       3
                               Average     5   6       3
                               High        6   9       4
           WAN-to-LAN          Low         7   1       0
                               Average     8   6       0
                               High        9   9       0
CMPcode    Problem Domain      Low         10  3       0
                               Average     11  6       0
                               High        12  10      2
           Human Interaction   Low         13  4       0
                               Average     14  7       0
                               High        15  12      2
           Data Management     Low         16  5       2
                               Average     17  8       2
                               High        18  13      3
           Task Management     Low         19  4       0
                               Average     20  6       0
                               High        21  9       4
CMPic      Application         Low         22  1       0
                               Average     23  2       0
                               High        24  7       1
           Infrastructure      Low         25  1       5
                               Average     26  3       10
                               High        27  9       5
CMPdb      Query Modification  Low         28  1       1
                               Average     29  3       1
                               High        30  8       1
           Data Population     Low         31  3       6
                               Average     32  4       2
                               High        33  10      2

Table 6.7: Number of data points available to calibrate each weight of the CMP
model


Although, with the current dataset, the calibration process can be performed on at most 15 of the 37 weights in total, it is worth explicitly stating all assumptions made for each CMP component and its sub-elements. It is important to test the plausibility of those assumptions against the available data before performing any calibration. The validation process in the following section relies on the raw data from the survey responses, attached in Appendix B.

6.5.1 CMP Components’ Assumptions

The assumptions on each CMP component and its elements are stated as follows:

Network Connection Component: CMPconn

Assumption 1 There are three types of connection changes: LAN-to-LAN, LAN-


to-WAN, WAN-to-LAN.

Relevant projects in the survey responses show connection changes of the first two types only. None of the responses demonstrated WAN-to-LAN connection changes, or any types other than those proposed. Hence, this assumption is considered valid.

Assumption 2 Two types of connection change (LAN-to-WAN and WAN-to-LAN) have the same impact on the size of a migration task. The reasoning behind this assumption is that any change that makes a WAN connection become a LAN connection is essentially the reverse of a change that makes a LAN connection become a WAN connection, given that the source and destination of these connections remain unchanged. The effort required for carrying out those changes and their reversals is expected to be the same.

None of the survey responses encountered WAN-to-LAN connections; hence, we cannot validate this assumption at this stage. This assumption should be tested for its plausibility when more data on relevant projects becomes available.

Assumption 3 The other type of connection change (LAN-to-LAN) has a significantly different impact on the size of a migration task compared to the two types mentioned above. Essentially, the effort required to amend a LAN connection in the local environment to adapt to the new environment in the Cloud should be much less than the effort required for LAN-to-WAN and WAN-to-LAN connection changes.

This assumption is reflected quite clearly in projects 6, 7, 13, and 16. Projects 6 and 7 contain only LAN-to-WAN connections, while projects 13 and 16 have only LAN-to-LAN connections. The effort required for the connection component of projects 6 and 7 is significantly greater than that of the other two projects (10 and 20 hours vs. 2 and 2 hours). A similar observation holds for projects 7 and 14: both have 2 LAN-to-WAN connections, and project 14 also has an additional LAN-to-LAN connection, yet no difference in effort was observed between the two projects (20 hours each), indicating that the extra LAN-to-LAN connection contributed little. Therefore, this assumption is considered verified.

Assumption 4 The requirements for Protocol Optimization and/or Security each have a significantly different impact on the size and effort of the migration task.

The requirements for Protocol Optimization and/or Security are defined at 3 levels of complexity: Low, Average, and High (Table 5.2). Given the available data for the component CMPconn in Appendix B, project 5 has 1 average-complexity LAN-to-LAN connection and 1 average-complexity LAN-to-WAN connection, whereas project 6 has only 1 high-complexity LAN-to-WAN connection. The effort spent on this single high-complexity LAN-to-WAN connection of project 6 is twice that spent on the 2 average-complexity connection changes of project 5 (10 hours vs. 5 hours). A similar observation can be made for projects 11 and 12, where 4 average-complexity connections required 10 times the effort of 2 low-complexity connections (20 hours vs. 2 hours). Data on several other projects (such as 13 and 18) yield similar results. Therefore, this assumption is plausible.

Assumption 5 The relative impact of the three connection types and of the performance and security requirements can be represented by a set of significantly different weights (weight IDs 1 to 9 in Table 6.7).

Each individual weight embodies a specific assumption. Table 6.7 shows that only 5 weight IDs (1, 2, 4, 5, and 6) can be considered for the calibration exercise, as discussed above. The validation and calibration of these weights is presented in more depth in Section 6.5.2. The other weights, with very few data points, keep the values determined by expert opinion at this stage; these values may be subject to change if more data becomes available in the future.

Code Modification Component: CMPcode

Although this CMP component is mainly inherited from Class Point (Costagliola et al., 2005), in this section we still state all assumptions and validate them on the available data in our Cloud migration context.

Assumption 6 Four different types of class have a significant impact on the size of the migration tasks.

6 out of the 19 projects from our survey responses (projects 1, 3, 5, 12, 15, and 16) involve the code modification component, and they spread over the 4 types of class: Problem Domain, Human Interaction, Data Management, and Task Management. None of the responses suggested any type of class other than these 4. In these 6 projects, the effort required to modify the four types of class constitutes a major part of the total effort for the whole project (e.g., in project 1, 250 of 332 total hours (75%) were spent on code modification; in project 3, 40 of 65 total hours (62%)). Hence, this assumption is considered verified.

Assumption 7 There are three different types of change in a class: attributes, public methods, and services requested.

Data from the 6 corresponding projects show that all these types of class change were actually carried out during their migration, although the supporting data are not very clear and explicit. These types were also inherited from Class Point. Hence, we consider this assumption valid to a certain extent.

Assumption 8 Added and deleted elements, for each class type and each change type, have significantly different impacts on task size and effort, in the ratio 5 to 1.

This assumption is based on the suggestion of Niessink & Vliet (1997) that a removal task requires 0.2 times the effort of an addition task. Unfortunately, our data is not sufficient to test this assumption; it should be tested when more data becomes available in the future.

Assumption 9 The total size of unchanged elements is irrelevant to task size and effort.

The reason behind this assumption is that unchanged elements require no effort at all. None of the responses reported any effort spent on unchanged elements. Therefore, this claim holds.

Assumption 10 The impact of changes can be categorized into complexity levels based on ranges of the individual change counts, and counts greater than the upper value all have the same impact.

This is an important assumption, inherited from Class Point, and it has been validated in (Costagliola et al., 2005). The second part of the claim can cause problems for development effort estimation, a known issue of Function Point. However, in the Cloud migration context, this issue is less problematic, since the data from the survey responses show very few tasks with counts greater than the upper value.

Assumption 11 The differences between class types and complexity levels can be represented as a set of 12 weights, where each individual weight represents a specific assumption.

Table 6.7 shows that only 2 weight IDs (18 and 21) can be considered for the calibration process, as discussed above. The validation and calibration of these weights is presented in more depth in Section 6.5.2. The other weights, with very few data points, keep the values determined by expert opinion at this stage; these values may be subject to change when more data becomes available in the future.

Installation and Configuration Component: CMPic

Assumption 12 Two different types of package to be installed and configured in the Cloud (Application packages and Infrastructure packages) have a significant impact on the size of the migration tasks.

Data from the survey responses show that it is quite common for a migration project to have Infrastructure packages installed and configured in the Cloud (10 out of 19 responses). Only 1 project required Application packages (project 16). The amount of effort spent on these installation and configuration tasks is relatively significant compared to the other CMP components, especially for Infrastructure packages. Therefore, this assumption is considered valid.

Assumption 13 The installation method and the number of parameters to be configured have significantly different impacts on the size and effort of the migration task.

A package may require no installation at all, a simple installation from a binary installer, or a more complicated installation from source code, which requires extra effort to compile. These types require different amounts of effort. Project 1 required 80 hours to install 5 packages from source code with a large number of parameters to be configured, while project 2 required only 3 hours to install 3 packages from binary installers. A similar observation can be made for projects 5 and 13. Project 5 has 2 packages from binary installers and 3 packages from source code, whereas project 13 also has 2 packages from binary installers plus 3 packages requiring no installation at all; the former required 50 hours, the latter only 7. Other observations on other projects give similar results. Therefore, this assumption is considered valid.

Assumption 14 The impact of installation and configuration can be categorized into complexity levels based on the installation method and the number of parameters to be configured.

Although no data explicitly support this claim, and this type of assumption is very hard to verify, it intuitively makes sense given the different impacts of installation methods and the number of configured parameters on the size of migration tasks, as in the previous assumption.

Assumption 15 The differences between package types and complexity levels can be represented as a set of 6 weights, where each individual weight represents a specific assumption.

Table 6.7 shows that only 3 weight IDs (25, 26, and 27) can be considered for the calibration, as discussed above. The validation and calibration of these weights is presented in more depth in Section 6.5.2. The other weights, with very few data points, keep the values determined by expert opinion at this stage; these values may be subject to change when more data is available in the future.

Database Migration Component: CMPdb

Assumption 16 Four different types of database change have a significant impact on the size of the migration tasks.

Some projects have only the database migration component, such as project 4. For some other projects (such as 3, 8, 16, and 17), the database component is one of the major parts of the migration process (about 30% of total effort). These projects represent all four types of database change: same relational database and same version (project 3); same relational database but a different version, or different relational databases (projects 7 and 15); and relational to NoSQL databases (projects 4 and 16). Therefore, this assumption is validated.

Assumption 17 NoSQL databases have a significantly greater impact on the size of the migration tasks than relational databases.

Migrating a relational database to a NoSQL Cloud database requires more migration tasks; for example, populating data into a NoSQL database requires more than just a "sqldump" command, and JOIN operations from the relational database must be modified since NoSQL databases do not support JOIN. Hence, more effort is required for NoSQL databases. This assumption is supported by data from the survey responses. In particular, project 16 required 5 hours to populate data from a relational to a NoSQL database, while project 15 required only 2 hours to populate the same amount of data from a relational to another relational database in the Cloud. Project 1 required 2 hours to populate data from a relational to a relational database, while project 4 required 8 hours to populate twice as much data from a relational to a NoSQL database. All in all, this assumption is reasonable.


Assumption 18 Two different database migration tasks (query modification and data population) have a significant impact on the size of the migration tasks.

Our survey responses only report activities related to query modification, data population, or both. None of them raised any other database-related migration activities. Hence, we consider this assumption plausible, although more types of task might be added if future information suggests so.

Assumption 19 The impact of database migration tasks and complexity levels can be represented as a set of 6 weights, where each individual weight represents a specific assumption.

Table 6.7 shows that only 1 weight (weight ID 31) can be calibrated, as discussed above. The validation and calibration of this weight is presented in more depth in Section 6.5.2. The other weights, with very few data points, keep the values determined by expert opinion at this stage; these values are subject to change as more data becomes available.

Conclusion:
The above assumptions were made during our development of CMP. We have stated only the main, high-level assumptions at this stage; more assumptions can be extracted and tested when more information becomes available. These assumptions are essential because of the high complexity of a size metric for Cloud migration projects.

As can be seen, there are already too many assumptions, and too little information from the survey responses, to properly validate their plausibility. This shows the complexity and difficulty of validating the CMP metric at this stage in the Cloud migration context. Nonetheless, we attempted to test many of the assumptions with the available data from our survey.

6.5.2 The Calibration Process

In this section, the calibration is performed on the 15 weights that have three or more data points from the survey, as discussed in the previous section. The 15 weights from Table 6.7 are:

• Network Connection Component CMPconn: weight IDs 1, 2, 4, 5, and 6

• Code Modification Component CMPcode: weight IDs 18 and 21

• Installation and Configuration Component CMPic: weight IDs 25, 26, and 27

• Database Migration Component CMPdb: weight ID 31

• the 4 main weights, one for each CMP component, used to compute the final CMP value

The calibration is first performed on each CMP component individually, and then on the combination. For each CMP component, we perform a multiple regression on the tunable weights. For projects that also involve un-tunable weights, we use the expert opinion values of those weights. The data used for the calibration are attached in Appendix B.

The results of the multiple regression for each CMP component are presented as follows:

Network Connection Component: CMPconn


The multiple regression for this component uses 11 data points (projects 5, 6, 7, 10, 11, 12, 13, 14, 16, 17, and 18). The 5 tunable weights count as 5 input variables. The multivariate model is:

f = a1 × x1 + a2 × x2 + a3 × x3 + a4 × x4 + a5 × x5

where x1, …, x5 are the weights to be calibrated. The values of f and a1, …, a5 are taken from Table 6.8 (extracted from Table B.1).

Project ID a1 a2 a3 a4 a5 f
5 0 1 0 1 0 5
6 0 0 0 0 1 10
7 0 0 0 0 2 20
10 0 5 0 5 5 100
11 0 2 0 2 0 20
12 1 0 1 0 0 2
13 3 0 0 0 0 2
14 0 0 0 0 2 20
16 1 0 0 0 0 2
17 1 0 1 0 0 2
18 1 0 1 0 0 2

Table 6.8: Data points for calibrating network connection component weights

This multiple regression gives regression coefficients that are essentially new

values for these 5 weights as in Table 6.9.

Weight ID Old Value New Value


1 1 1.2
2 3 9.6
4 1 1.7
5 6 6
6 9 10.5

Table 6.9: Multiple Regression Coefficient Result for CMPconn
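The no-intercept multiple regression can be sketched with a linear least-squares solve over the count matrix of Table 6.8. One caveat is visible in the data itself: the count columns for weight IDs 2 and 5 are identical across these 11 projects, so the system is rank-deficient and numpy's `lstsq` returns the minimum-norm solution among the equally good fits; the calibrated values in Table 6.9 therefore need not coincide exactly with this sketch's output:

```python
import numpy as np

# Connection counts per tunable weight (columns = weight IDs 1, 2, 4, 5, 6)
# and recorded effort f (hours), copied from Table 6.8.
X = np.array([
    [0, 1, 0, 1, 0],   # project 5
    [0, 0, 0, 0, 1],   # project 6
    [0, 0, 0, 0, 2],   # project 7
    [0, 5, 0, 5, 5],   # project 10
    [0, 2, 0, 2, 0],   # project 11
    [1, 0, 1, 0, 0],   # project 12
    [3, 0, 0, 0, 0],   # project 13
    [0, 0, 0, 0, 2],   # project 14
    [1, 0, 0, 0, 0],   # project 16
    [1, 0, 1, 0, 0],   # project 17
    [1, 0, 1, 0, 0],   # project 18
], dtype=float)
f = np.array([5, 10, 20, 100, 20, 2, 2, 20, 2, 2, 2], dtype=float)

# No-intercept least squares: find w minimizing ||X @ w - f||.
w, _, rank, _ = np.linalg.lstsq(X, f, rcond=None)

# By construction, the fitted weights can do no worse in squared error
# than the original expert-opinion weights for IDs 1, 2, 4, 5, and 6.
expert = np.array([1, 3, 1, 6, 9], dtype=float)
sse_fit = float(np.sum((X @ w - f) ** 2))
sse_expert = float(np.sum((X @ expert - f) ** 2))
```

The same mechanics apply to the CMPcode, CMPic, and CMPdb regressions below, only with fewer input columns.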

Code Modification Component: CMPcode


The multiple regression for this component uses 6 data points (projects 1,

3, 5, 12, 15, and 16). The 2 tunable weights count as 2 input variables. This
multiple regression gives regression coefficients that are essentially new values for

these 2 weights as in Table 6.10.

Weight ID Old Value New Value


18 13 11.5
21 9 12.3

Table 6.10: Multiple Regression Coefficient Result for CMPcode

Installation and Configuration Component: CMPic


The multiple regression for this component uses 14 data points (projects 1, 2, 5, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, and 19). The 3 tunable weights count as 3 input variables. This multiple regression gives regression coefficients that are essentially new values for these 3 weights, as in Table 6.11.

Weight ID Old Value New Value


25 1 2.5
26 3 4.5
27 9 20.2

Table 6.11: Multiple Regression Coefficient Result for CMPic

Database Migration Component: CMPdb

This component has only 1 tunable weight. The regression for this component uses 6 data points (projects 1, 3, 13, 14, 17, and 18). The only tunable weight is used as the input variable. This regression gives a regression coefficient that is essentially the new value for this weight, as in Table 6.12.

Final CMP Value Calculation


Weight ID Old Value New Value


31 3 2.3

Table 6.12: Regression Coefficient Result for CMPdb

There are 4 weights used to calculate the final CMP value, as in Table 5.11 of Chapter 5. The multiple regression for them uses all 19 data points, calculated with the new weights from the calibration processes above. The 4 tunable final weights count as 4 input variables. This multiple regression gives regression coefficients that are essentially new values for these 4 weights, as in Table 6.13.

Weight ID Old Value New Value


34 3 0.7
35 5 0.5
36 2 1.1
37 1 0.4

Table 6.13: Regression Coefficient Result for the Final CMP

Conclusion:

There are 15 tunable weights out of 37 in total. The rest of the weights are kept unchanged because they have too few data points for calibration. The 15 calibrated weight values have changed quite significantly from the expert opinion values. The model with the new set of weights needs to be validated again to ensure its performance improves on the original one.

6.6 Empirical Validation - Phase 3

In this section, we perform an empirical validation similar to the first two phases on the new dataset of 19 data points. This dataset essentially originates from the survey, as in Phase 2; however, the final CMP values in this dataset are calculated based on the new set of weights calibrated in the previous section.

The new dataset, computed from the new set of weights, is presented in Table 6.14.

ID CMP Value Total Hours


1 382.3 332
2 9.9 3
3 92.2 65
4 22.4 8
5 82.8 75
6 12.6 10
7 29.2 25
8 24.2 24
9 11.55 6
10 112 100
11 51.1 70
12 293.7 382
13 14.45 10
14 52.6 42
15 47.85 31
16 93.95 67
17 14.9 8
18 14.9 12
19 39.6 48

Table 6.14: New dataset - calculated from the new set of calibrated weights
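As a quick sanity check on the new dataset, a single OLS fit over all 19 points of Table 6.14 (rather than the 19 leave-one-out fits) already exhibits the strong linear relationship that the validation rounds report; the exact coefficients of this sketch are indicative only:

```python
import numpy as np

# CMP values and total hours copied from Table 6.14.
cmp_vals = np.array([382.3, 9.9, 92.2, 22.4, 82.8, 12.6, 29.2, 24.2, 11.55,
                     112.0, 51.1, 293.7, 14.45, 52.6, 47.85, 93.95, 14.9,
                     14.9, 39.6])
hours = np.array([332, 3, 65, 8, 75, 10, 25, 24, 6, 100, 70, 382, 10, 42,
                  31, 67, 8, 12, 48], dtype=float)

# One OLS line over the whole dataset, plus its coefficient of determination.
slope, intercept = np.polyfit(cmp_vals, hours, 1)
r_squared = np.corrcoef(cmp_vals, hours)[0, 1] ** 2
```

With the calibrated weights, the slope moves close to 1 (roughly one hour per CMP unit), whereas the Phase 2 models in Table 6.5 had slopes around 0.1 on the uncalibrated CMP values.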

We perform 19 rounds of cross-validation on the new dataset. The result is


described in Table 6.15.

Table 6.15: Phase 3 - OLS Regression Analysis
(each row's model is Effort = Coefficient × CMP + Intercept)

     Coefficient                     Intercept
ID   Value    t-value  p-value      Value      t-value  p-value  R²
1    1.25749  18.903   2.28e−12     −16.47287  −2.857   0.0114   0.9571
2    1.02783  15.203   6.25e−11     −6.44734   −0.763   0.457    0.9353
3    1.03064  15.777   3.57e−11     −5.39675   −0.671   0.512    0.9396
4    1.02545  15.280   5.79e−11     −5.82892   −0.695   0.497    0.9359
5    1.02829  15.401   5.14e−11     −6.31798   −0.766   0.455    0.9368
6    1.02941  15.249   5.97e−11     −6.80407   −0.806   0.432    0.9356
7    1.02849  15.310   5.62e−11     −6.61778   −0.789   0.442    0.9361
8    1.02977  15.326   5.53e−11     −6.94611   −0.828   0.42     0.9362
9    1.02833  15.220   6.14e−11     −6.55842   −0.777   0.449    0.9354
10   1.03007  15.405   5.12e−11     −6.16652   −0.755   0.461    0.9368
11   1.03133  15.802   3.49e−11     −8.07971   −0.995   0.33     0.9398
12   0.86969  30.930   1.06e−15     −1.55872   −0.532   0.602    0.9836
13   1.02869  15.243   6.01e−11     −6.64232   −0.788   0.442    0.9356
14   1.02739  15.383   5.23e−11     −6.14267   −0.739   0.47     0.9367
15   1.02629  15.42    5.03e−11     −5.71958   −0.69    0.5      0.937
16   1.03086  15.771   3.6e−11      −5.42361   −0.674   0.51     0.9396
17   1.02780  15.231   6.08e−11     −6.43739   −0.763   0.456    0.9355
18   1.02923  15.258   5.92e−11     −6.76964   −0.803   0.434    0.9357
19   1.03091  15.506   4.64e−11     −7.48460   −0.903   0.38     0.9376

The cross-validation results were determined by using the derived models to compute the predicted effort of the left-out project in each validation round (reported in Table 6.16).

Table 6.16 shows that the MMRE value is 0.2947 and the prediction at level 0.25 is 0.9474.


ID    CMP     Actual Effort (AE)   Predicted Effort (PE)   MRE
1     382.3   332                  464.2656                0.3984
2     9.9     3                    3.7282                  0.2427
3     92.2    65                   89.6283                 0.3789
4     22.4    8                    17.1412                 1.1426
5     82.8    75                   78.8244                 0.0510
6     12.6    10                   6.1665                  0.3834
7     29.2    25                   23.4141                 0.0634
8     24.2    24                   17.9743                 0.2511
9     11.55   6                    5.3188                  0.1135
10    112     100                  109.2013                0.0920
11    51.1    70                   44.6213                 0.3626
12    293.7   382                  253.8692                0.3354
13    14.45   10                   8.2223                  0.1778
14    52.6    42                   47.8980                 0.1404
15    47.85   31                   43.3884                 0.3996
16    93.95   67                   91.4257                 0.3646
17    14.9    8                    8.8768                  0.1096
18    14.9    12                   8.5659                  0.2862
19    39.6    48                   33.3394                 0.3054
MMRE                                                       0.2947
PRED(0.25)                                                 0.9474

Table 6.16: Phase 3 - Results Evaluation
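The MRE column follows from MRE = |AE − PE| / AE, and MMRE is its mean over all projects; a minimal sketch, checked against a few rows of Table 6.16:

```python
def mre(actual, predicted):
    """Magnitude of relative error of one prediction."""
    return abs(actual - predicted) / actual

def mmre(actuals, predictions):
    """Mean magnitude of relative error over a set of projects."""
    return sum(mre(a, p) for a, p in zip(actuals, predictions)) / len(actuals)

# Rows 5, 7, and 10 of Table 6.16: (actual effort AE, predicted effort PE)
rows = [(75, 78.8244), (25, 23.4141), (100, 109.2013)]
for ae, pe in rows:
    # Reproduces the table's MRE values 0.0510, 0.0634, 0.0920
    print(round(mre(ae, pe), 4))
```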

Conclusion:

This new MMRE value shows a significant improvement over Phase 2 (MMRE = 1.5155). Although the new MMRE value is still greater than the recommended 0.25 level, it is much closer to this value after the calibration. We strongly believe that when more data on Cloud migration projects become available in the future, further calibration can be performed on the other weights as well. In addition, the prediction at level 0.25 is 0.9474, which is higher than the standard value of 0.75, further supporting the claim that the CMP model can be a potential predictor for Cloud migration effort estimation.


6.7 Threats to Validity and Discussion

The validation process described in this chapter has shown that the CMP model can help enterprises map out the migration tasks for their Cloud migration projects, and that it can be a potential predictor for migration effort estimation. However, in order to generalize this claim to the whole population, we need a much larger dataset to calibrate the parameters, and a different, similarly large dataset to validate the model. We divided the dataset into multiple subsets for the calibration and then used the whole dataset for the validation, which could increase the reliability of the validation results to some extent. Having said that, the results of the validation were still biased. This threat to validity is unavoidable at this stage; in the future, when more data points become available, a full exercise of calibration and validation can be executed again using the same methodology presented here. Moreover, because of the limited number of data points we could secure from the interviews and survey, all 19 data points were used, which could also affect the validity of the dataset.

The four components of the CMP model and their steps were only validated with internal and self-developed projects. Questions in the survey also invited suggestions and insights on additional tasks that may be required in a Cloud migration project; however, no relevant comments were received. The CMP model itself, beyond the weights, would need to be further validated with external projects if these become available in the future.

In order to increase the validity of the model, the quality and quantity of data points from the survey can be further improved by:

• Clarifying ambiguous questions. For example, the question on how much time was spent migrating data was not clear enough: it could be understood as covering only the transfer of data to the Cloud, or as also including idle/waiting time during the transfer, which does not require any effort.

• Some parts of the model require detailed information on specific tasks; however, in almost every case, the respondents did not keep track of it. For example, for questions on how many classes were modified in the Code Modification section, some answers were pure guesses (as commented by the respondents). These answers were discarded.

• Some answers indicated that the total number of hours included learning time, but it was not clear how much time was spent on learning and how much on the actual tasks. This can be overcome by modifying the questions, for example: how much time was spent on this task the first time? How much the second time? However, this would make the question list longer and possibly more tedious.

• Some answers did not indicate whether the effort included learning time or not. This should be made explicit by modifying the questions.

6.8 Summary

In this section, we have presented our process of theoretically and empirically validating the metric over three phases.

In phase 1, the CMP metric was first validated using our initial dataset of 6 small-scale migration projects conducted by our group. The result gives a good


indication that CMP can be a potential predictor for effort estimation in some Cloud migration cases. We then conducted a survey to collect data about past Cloud migration projects from external organizations; the motivation for this study was to further validate the CMP model externally. A survey and some interviews were the best approach for our data collection purpose, because no existing data on this topic are available.

In phase 2, we validated the CMP metric using the dataset from the survey. The result indicates that the CMP metric needs further calibration to improve its performance. In this phase, we also listed a set of assumptions on the structure of the CMP metric and tested their plausibility using the available data from the survey. The tunable weights (15 out of 37) were also calibrated using a multiple regression approach.


In phase 3, the CMP model was validated again with the new set of calibrated weights. The result of this phase improves significantly on phase 2 and gets very close to the standard requirement. This indicates that the CMP model can be a predictor for effort estimation in the Cloud migration context. It also suggests that when more data become available in the future, the performance of the CMP model can be further improved with more calibration on the other weights as well.

Chapter 7

Conclusions and Future Directions

"The more you understand what is wrong with a figure, the more
valuable that figure becomes."

∼ Lord Kelvin.

The main objective of this thesis is to understand Cloud migration projects and the associated cost implications. In particular, migrating a legacy system from a local server to the Cloud requires different migration tasks to be carefully planned and performed. Different types of migration task may have significantly different impacts on the migration effort. It is important to identify possible migration tasks and quantify their impact on the migration effort early in a Cloud migration project, so that enterprises can make well-informed decisions on whether it is worth migrating to the Cloud. On the other hand, this is challenging because Cloud computing is still relatively immature, and there is very little


related work on the topic of interest. Moreover, Cloud migration projects vary in many dimensions (different types of Cloud, different types of system/application, different types of migration requirement), and it is challenging to fully understand them.

In this thesis, we have achieved our research goals to understand Cloud migration projects. We have identified influential cost factors (internal cost factors and external cost factors) of a Cloud migration project. We also proposed a taxonomy of possible migration tasks that a migration project might encounter.


A size metric was developed to measure the size of a Cloud migration project. The size of a migration project gives a good indication of how much effort is anticipated for the project.

This chapter concludes the thesis with the following structure. In Section 7.1, we summarize the main studies and findings of this research. Section 7.2 elaborates on how this research has achieved our research goals and how it contributes to the software engineering domain within the Cloud migration context. The limitations of this research are presented in Section 7.3, and Section 7.4 suggests directions for future research.

7.1 Research Summary

For a better understanding of a Cloud migration project, we undertook several Cloud migration exercises and captured our understanding of how migration projects are conducted, in the form of a list of potential migration tasks that might be involved in a Cloud migration project. Our experiment was to migrate the PetShop .Net application from a local server to Windows Azure and SQL


Azure. The migration of Java PetStore into Amazon EC2 and SimpleDB was also investigated to add more richness to our findings.

The report on our migration experiences helped us identify some influential cost factors that impact the effort of the migration process, both internal and external. The internal cost factors indicate what migration tasks are required, such as compatibility issues, library dependencies, database features, and connection issues. The external cost factors determine how fast those tasks can be accomplished, such as the project team's capabilities, existing knowledge and experience of Cloud providers and technologies, and selecting the correct Cloud platforms and services.

Some of these influential cost factors are specific to migration to the Cloud, because they are not applicable to a conventional migration project from one platform to another. For example, a migration project from Java to .Net is a complete rewrite, and it would not have compatibility, database, connection, or possibly library dependency issues. A migration project from an old version to a newer version of a platform or environment would not have the networking, library dependency, or database feature issues discussed above. These factors, one way or the other, all affect the effort spent on the Cloud migration process.

The list of internal cost factors, together with related work from the literature review and practitioners' blogs, enabled us to generalise and propose a taxonomy of migration tasks that any migration project may encounter. The migration tasks are grouped under six categories: Training and Learning, Installation and Configuration, Database Migration, Code Modification, Network Connection, and Testing. These categories are mutually exclusive, since they cover different aspects of a Cloud migration project; at the same time, they complement each other and together provide a complete picture of migration to the Cloud. These categorized migration tasks need to be carefully planned at the early stage of any migration project. Some tasks may be broken down into more detailed levels, whereas others may be skipped, depending on the specific characteristics of each project.

Amongst the many cost factors affecting traditional software development effort, project size is considered the main cost driver, and it has been used in many cost estimation models. We apply this theory to the context of migration to the Cloud: migration project size also significantly influences migration effort. No size measurement exists in the literature for migration projects to the Cloud; therefore, we developed our CMP model for sizing Cloud migration projects by casting the well-known Function Point (FP) measurement into our context of interest. The difference between these two contexts is that traditional software development focuses on functionality development, whereas the Cloud migration context is concerned with porting an existing system into a Cloud environment. As a result, size metrics for these two contexts also differ. Size metrics for functionality development measure the product (i.e., the components, classes, or functions to be developed), whereas size metrics for migration tasks measure both the process (i.e., the migration tasks to be carried out) and the product (i.e., the related parts of the system to be migrated).

CMP extends FP not by adding more elements to the existing FP method, but by adopting the three-step approach of FP:

1. Classify the basic estimating units (a function in the FP context, a class in the Class Point (Costagliola et al., 2005) context, and a migration task in the CMP context) into different pre-defined categories

2. For each unit, evaluate its complexity level (Low, Average, or High)

3. Finally, compute the final sizing value
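The three steps above reduce the sizing computation to a weighted count of classified tasks. A minimal sketch; the category names and weight values below are hypothetical, not the calibrated CMP weights:

```python
# Hypothetical weight table: category -> complexity level -> weight
WEIGHTS = {
    "database":   {"Low": 3, "Average": 5, "High": 8},
    "code":       {"Low": 4, "Average": 6, "High": 9},
    "connection": {"Low": 2, "Average": 4, "High": 6},
}

def size(tasks):
    """Step 3: sum the weight of each classified and rated migration task."""
    return sum(WEIGHTS[category][complexity] for category, complexity in tasks)

# Steps 1 and 2 happen when each task is assigned a category and a complexity
project = [("database", "High"), ("code", "Average"), ("connection", "Low")]
print(size(project))  # 8 + 6 + 2 = 16
```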

Apart from the FP methodology, CMP is also developed on the basis of the proposed taxonomy of Cloud migration tasks. The CMP model measures the accumulated size of all migration tasks making up the migration project; therefore, the taxonomy can easily be used as the input to the CMP model. After carefully analyzing all categories of the taxonomy, the CMP model was determined to include four main components: Installation and Configuration, Database Migration, Code Modification, and Network Connection. These components capture distinct aspects of a migration project to the Cloud; therefore, the CMP model has been developed to cover all these aspects separately. Each of these CMP components was developed using the FP three-step approach. The final CMP value is then calculated as a weighted sum of its four components CMP_conn, CMP_code, CMP_ic, and CMP_db, which measure the size of migration tasks related to connection changes, code changes, installation and configuration, and database changes, respectively. The 37 weight values assigned to the migration tasks in the CMP model are expert opinion values, initially derived from our discussions with a group of Cloud engineers who have carried out different types of Cloud migration projects themselves.
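The aggregation described above can be sketched as follows; the component values and unit weights here are placeholders, not the expert-opinion weights from the thesis:

```python
def total_cmp(cmp_ic, cmp_db, cmp_code, cmp_conn,
              weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four CMP components (placeholder weights)."""
    w_ic, w_db, w_code, w_conn = weights
    return (w_ic * cmp_ic + w_db * cmp_db
            + w_code * cmp_code + w_conn * cmp_conn)

# Component sizes are measured separately, then aggregated into one CMP value
print(total_cmp(cmp_ic=12.0, cmp_db=20.5, cmp_code=8.0, cmp_conn=4.5))  # 45.0
```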


The CMP model has been developed as an important software size measure for Cloud migration projects. Our study shows that CMP is more suitable for Cloud migration projects than other existing size metrics in the literature, since it captures special aspects of the Cloud migration context. Moreover, CMP emphasises


the specific features of the Cloud migration process, such as the fact that some required third-party libraries are not readily available in the Cloud as they are in the local data centre. This is not so much an issue when migrating between two local data centres, because third-party libraries can usually be reused without major changes. Another Cloud feature reflected in the CMP model is that Cloud users (or developers) do not possess full control over the Cloud environment as they do in a local data centre. This results in a limited range of actions for each migration task. Therefore, the CMP model takes into consideration Cloud-specific dependencies for each migration task; for example, only security and protocol optimisation are assessed for connection tasks, and database tasks are concerned with migrating from relational to NoSQL databases.

In a project development cycle, the CMP model fits into the pre-implementation phase, after the design phase. One important assumption for CMP is that all design decisions have been made. These design decisions have a direct impact on how CMP is counted, since they define all anticipated migration tasks. The CMP counting process itself should not require much training or effort; however, its accuracy relies on the completeness and granularity of the migration task list. Therefore, it is important to carefully analyse the list of expected migration tasks to ensure it captures all the Cloud migration aspects adequately and in as much detail as possible.

Briand et al. (1996) proposed a list of properties for product sizing metrics, while CMP relates to both process and product. The CMP model has been shown to meet all the requirements from (Briand et al., 1996); however, additional properties for process sizing metrics (and hybrid process-product sizing metrics) would be ideal. Therefore, the validation of CMP is mainly empirical.


The empirical validation was to justify the usefulness of the CMP size measurement as a significant indicator of migration effort to the Cloud. The empirical validation is divided into three phases. In phase 1, we evaluated the CMP model with its initial set of weights, as presented in Chapter 5, using our initial set of 6 Cloud migration projects. Because of the limited number of data points publicly available, the data used in this first phase were extracted from a number of small-scale projects conducted at NICTA. We followed a leave-one-out cross-validation approach on this dataset: we performed six rounds of validation, each using five projects as the training set and leaving one project out as the validation set. The cross-validation result shows that the MMRE value is 0.199 and the prediction at level 0.25 is 0.833. This result suggests that CMP is a good predictor of effort for some of the Cloud migration projects that have been considered.
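PRED(l), the prediction at level l, is the proportion of projects whose MRE does not exceed l; a minimal sketch, using made-up effort figures rather than the phase 1 dataset:

```python
def pred(actuals, predictions, level=0.25):
    """Fraction of projects with MRE <= level, i.e. PRED(level)."""
    mres = [abs(a - p) / a for a, p in zip(actuals, predictions)]
    within = sum(1 for m in mres if m <= level)
    return within / len(mres)

# Illustrative: 5 of these 6 predictions fall within 25% of the actual effort
actuals = [10, 20, 30, 40, 50, 60]
predictions = [11, 19, 33, 39, 52, 90]
print(pred(actuals, predictions))  # 5/6, about 0.8333
```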

However, to have more confidence in the CMP model, more data on external projects were required to further validate it. Hence, at the beginning of phase 2, we conducted a survey to collect data on past migration projects to the Cloud from external organizations. We had to conduct this survey because, unlike data on the development effort of traditional software projects, the data of interest do not exist in any public repositories. Data were collected mainly via web surveys, with some additional interviews. The studied population includes project teams from NICTA and individual practitioners who have migrated their systems to the Cloud. The practitioners were identified from the Cloud community and online discussions. Interviews were conducted with NICTA's project teams to gain more insights and more detailed data, and surveys were sent to a list of identified practitioners. The study was conducted on the entire population due


to its limited size.

We sent out more than 300 surveys to different target audiences, including academic researchers, industrial groups and companies, and individual practitioners. We received more than 30 responses (10%), but some of them were incomplete. The main reason for this low response rate is that most of the projects were done for exploration and tutorial purposes; hence, no detailed information was recorded, especially some of the information required for calculating CMP. Most respondents could easily answer general questions on why they migrated to the Cloud, or how they generally did so, but most failed to provide sufficient information at the design level of the migration tasks. After careful analysis, we obtained a new dataset of 19 data points.

In phase 2, we performed the same analysis as in phase 1, with 19 rounds of validation: each round uses 18 projects as the training set and leaves one project out as the validation set. The result shows that the MMRE value is 1.5155 and the prediction at level 0.25 is 0.5789. This result suggests that the CMP weights (or parameters) need further calibration.

There are 37 parameters (or weights) in total in the CMP model. The original values of these weights were defined through discussion with a group of Cloud engineers who have participated in Cloud migration projects. We asked each Cloud engineer for an individual judgment on each weight value, then averaged the values across all the engineers to derive a final value for each parameter. These expert opinion weights can be further refined using data points collected from our survey on Cloud migration projects. The available data show that only 15 of the 37 weights have sufficient information for calibration. The remaining 22 weights are kept unchanged, with their expert opinion values. However, these values would also be subject to change when more data points become available.

Although the calibration process could be performed on at most 15 of the 37 weights in total, we explicitly stated all assumptions made for each CMP component and its sub-elements. It is important to test the plausibility of those assumptions given the available data before performing any calibrations. With the available data, a few assumptions still do not have sufficient information to be tested. We attempted to test as many of the assumptions as possible to show that they are plausible.

Calibration was performed on each CMP component individually, and then on all of them together. For each CMP component, we performed multiple regression on the tunable weights. The 15 calibrated weight values changed quite significantly from the expert opinion values. The model with the new set of weights needed to be validated again to ensure its performance improved over the original one.
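The calibration step can be sketched as an ordinary least-squares fit of tunable weights against observed effort. The two-weight example below, solved via the 2×2 normal equations, is illustrative only; the thesis calibrates up to 15 weights with standard multiple regression:

```python
def calibrate_two_weights(x1, x2, effort):
    """Least-squares fit of effort = w1*x1 + w2*x2 (no intercept),
    solved via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    t1 = sum(a * y for a, y in zip(x1, effort))
    t2 = sum(b * y for b, y in zip(x2, effort))
    det = s11 * s22 - s12 * s12
    w1 = (s22 * t1 - s12 * t2) / det
    w2 = (s11 * t2 - s12 * t1) / det
    return w1, w2

# x1, x2: counts of two task types per project; effort: observed hours (made up)
w1, w2 = calibrate_two_weights([1, 2, 3, 0], [0, 1, 1, 2], [2, 7, 9, 6])
print(w1, w2)  # recovers w1 = 2.0, w2 = 3.0 for this exactly linear data
```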

In phase 3, we performed a similar empirical validation to the first two phases on the new dataset of 19 data points. This dataset originates from the survey as in phase 2; however, the final CMP values in this dataset were calculated using the new set of calibrated weights. The new MMRE value (0.2947) shows a significant improvement over phase 2 (MMRE = 1.5155). Although the new MMRE value is still greater than the standard 0.25 level, it is much closer to this value after the calibration. We strongly believe that when more data on Cloud migration projects become available in the future, further calibration can be performed on the other weights as well. In addition, the prediction at level 0.25 is 0.9474, which is higher than the standard value of 0.75, further supporting the claim that the CMP model can be used as a reliable predictor for Cloud migration effort estimation.


7.2 Research Contribution

This research has answered the research questions stated in Section 1.3. Through the research process described in Chapter 3, we have come to understand what migration tasks are required for a Cloud migration project and how they can be classified. Our understanding is captured in the taxonomy of migration tasks presented in Chapter 4. We have also understood the cost implications of those tasks for the Cloud migration effort; our view on this is illustrated with the CMP model in Chapter 5. The CMP model can be useful for multiple purposes: (1) helping enterprises map out their migration tasks; (2) identifying the complexity of each task, so that staff with the right skills can be assigned tasks accordingly; and (3) estimating the total effort required for the migration project.

To date, no other research has focused on the migration effort aspect of software engineering in Cloud computing. In this section, we elaborate on the primary contributions of this study to the common knowledge of the domain of interest.

Contribution 1 This research has initiated the application of effort estimation and size measurement concepts from traditional software engineering to the Cloud computing domain.

One of our contributions, which can also be seen as one of the difficulties we encountered, is that no related research with the same focus on Cloud migration effort was available; hence, the list of migration tasks, the influential cost factors, and validation data could not be gathered from the literature review. We had to explore and develop everything from scratch. For example, we carried out a series of migration experiments to understand how migration projects happen, what migration tasks are required, and what impact they have on the migration effort. We performed more migration projects ourselves to initially validate our model. We conducted surveys and interviews on external projects to collect more data for further validation. All these activities can be useful to other research with a similar focus, whether as a starting point or for comparative study.

Contribution 2 This research has identified critical cost factors of Cloud migration effort.

We identified different factors that have a significant impact on the migration effort, categorized into internal and external factors. This is aligned with traditional size measurement approaches. This research adds to the existing body of knowledge on size measurement cost drivers, offering further critical factors in the Cloud migration context.

Contribution 3 This research has proposed a taxonomy of migration tasks to the Cloud.

The taxonomy outlines the possible migration tasks that any migration project to the Cloud may encounter. It enables Cloud practitioners to gain an understanding of the combination of tasks involved in a Cloud migration project and their implications for the amount of effort required. We derived these tasks from our series of migration experiments, covering different application types and different Cloud providers.

Contribution 4 This research has developed a size metric, Cloud Migration Point (CMP), for estimating the size of Cloud migration projects, by recasting a well-known software size estimation model called Function Point (FP) into the context of Cloud migration.

We adopted the three-step approach of the FP model to estimate the size of the individual components involved in a migration project. In particular, we focused on the Cloud-relevant components of the migrated systems: connection changes, database migration, code modification, and installation and configuration for the new environment in the Cloud. For each component, we performed the measurement by identifying the relevant activities that contribute to the overall effort required for that component. Finally, we aggregated all individual measurements into a single CMP value by calculating their weighted sum. The CMP value indicates how large the migration project is, and it can be used as an indicator for Cloud migration effort estimation.

Contribution 5 This research has described the survey protocol to collect data
on past Cloud migration projects.

We conducted a survey with external organizations and individuals to collect data on how they migrated their systems to the Cloud and how much time they spent on the migration tasks. The response rate was quite low, because many of the migration exercises were mainly for exploration purposes and not many practitioners kept track of the time spent on each individual task. The survey questionnaire and its protocol can certainly be re-used and improved to collect more data on a wider range of projects.

Contribution 6 This research has demonstrated that the proposed metric is practically useful as an indicator of migration effort estimation.


We validated our CMP model by conducting an empirical evaluation, which shows that the metric is practically useful under a defined set of assumptions. This research has outlined and justified each step of the validation phase. The calibration process has been described, and it can be re-applied to calibrate the model further when a larger dataset is available.

Conclusion:
Our overall contribution is to shed light on Cloud migration and the tasks involved, enabling Cloud practitioners to estimate the amount of effort required for the migration of legacy systems to the Cloud. This contributes towards the cost-benefit analysis and the decision on whether it is worth moving to the Cloud.

7.3 Research Limitations

Several limitations to this research have been identified, for example:

1. This research involved many exploratory activities, and at this stage the results cannot be generalized to the population of Cloud migration projects as a whole, because there is not enough data. However, the process of undertaking all activities in this research has been carefully recorded and justified, and it can certainly be re-applied to a larger set of data to generalize the results.

2. Data collection was done mainly via web surveys, and the questions and responses depend on the respondents' personal interpretation and memory. In-person interviews would give more reliable and accurate responses, because they allow both interviewer and interviewee to clarify any confusion, and they yield more insights from the interviewee. However, we were not able to conduct many interviews, because of time and geographical constraints.

3. The low response rate of the survey (10%), together with the self-selected nature of the sample (not everyone contacted responded to the survey), raises concerns about the reliability of the responses. The results from these responses are not representative enough to be generalized to the entire population.

4. For applications that require code modification for the Cloud environment, CMP only assesses application code changes at the "class" level, employing Class Point for the Code Modification component. Hence, the CMP model is only applicable to object-oriented applications, while there are still numerous legacy applications that are not object-oriented and that could be migrated to the Cloud.

5. The calibration and validation of the CMP model were undertaken with a small number of data points. The response rate from the survey was quite low (around 10%), and some responses were incomplete, because not many respondents actually recorded how long they spent on each migration task. Most of the projects in the responses were small and medium projects; it was very hard to conduct surveys or interviews with large organizations. The model would be validated more effectively with data from larger-scale projects.

6. This research used the same data to calibrate the model parameters as for the final validation. This may result in overfitting, where the accuracy of the model may not carry over to other datasets. We divided the dataset into multiple subsets for the calibration and then used the whole dataset for the validation, which could increase the reliability of the validation results to some extent; having said that, the results of the validation were still biased. This threat to validity is unavoidable at this stage. In the future, when more data points become available, a full exercise of calibration and validation can be executed again using the same methodology, which is a worthwhile future direction for this research.

7. The four components of the CMP model and their steps were only validated with internal and self-developed projects. Questions in the survey also invited suggestions and insights on additional tasks that may be required in a Cloud migration project; however, no relevant comments were received. The CMP model itself, beyond the weights, would need to be further validated with external projects if these become available in the future.

7.4 Future Research Directions

Accurate effort estimation has always attracted a lot of attention in the traditional software engineering community because of its difficulty and complexity. Casting this concept into the Cloud migration context increases the difficulty, because there are even more angles to investigate. This research has investigated several aspects of this problem: exploring the cost implications of Cloud migration projects, identifying internal and external cost factors, proposing a taxonomy of migration tasks, and developing a size metric as an indicator for migration effort estimation. However, other important aspects also require investigation in order to accurately estimate the required effort, and to better assist enterprises in deciding whether it is worth migrating to the Cloud.

Some further research directions worth pursuing have emerged as a result of


this research, for example:

1. This research focuses mainly on internal cost factors of Cloud migration projects. The survey collected data on how some external factors affect the migration effort (Appendix B), but these were not incorporated in our result. Future research can use these data to investigate the list of external cost factors further, determining whether they are genuine cost factors and whether the list is complete. This can be tackled by examining the causal relationship between the cost factors and the required effort. However, a causal relationship is very hard to prove in the software engineering domain because the factors involved are normally closely coupled.

2. An effort estimation model can be developed for Cloud migration projects. The size metric developed in this thesis can be used as an important input variable to such an effort estimation model. All that is required is a sufficient set of data on the effort spent on past migration projects. This future research can be achieved with a wider-ranging survey, more interviews with larger organizations, or proper case studies.
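As an illustration of how the size metric could feed an effort estimation model, the sketch below fits the classic power-law form effort = a × size^b to hypothetical historical data. The figures and the choice of model are assumptions for demonstration, not results from this research.

```python
import numpy as np

# Hypothetical historical data: CMP size values and measured effort
# (person-hours) for past migration projects. Illustrative only.
cmp_size = np.array([12.0, 25.0, 40.0, 55.0, 80.0, 110.0])
effort = np.array([30.0, 70.0, 120.0, 160.0, 260.0, 380.0])

# Fit effort = a * size^b, linearized as
# log(effort) = log(a) + b * log(size).
b, log_a = np.polyfit(np.log(cmp_size), np.log(effort), 1)
a = np.exp(log_a)

def predict_effort(size_cmp: float) -> float:
    """Predicted effort (person-hours) for a project of the given CMP size."""
    return a * size_cmp ** b

print(round(predict_effort(60.0), 1))
```

With real project data, the same fit (or a richer regression including cost drivers) would turn the CMP size into an effort estimate directly.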

3. This research developed one type of size metric for Cloud migration projects.


We believe that the methodology employed is the best suited for this purpose. However, future research can explore different methodologies to build different size metrics for the same context. A comparative study can then be performed to decide which size metric provides the most accurate effort predictions. MMRE has been quite widely criticised for its accuracy when it comes to selecting the best model. Hence, the comparative study should also consider other alternatives, such as MMER (Mean Magnitude of Error Relative to the Estimate) or RSD (Relative Standard Deviation) (Foss et al., 2003).
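For reference, the two magnitude-of-error criteria named above can be computed as shown below. The actual/predicted values are hypothetical, and RSD is omitted for brevity.

```python
def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean(|y - yhat| / y)."""
    return sum(abs(y - yh) / y for y, yh in zip(actual, predicted)) / len(actual)

def mmer(actual, predicted):
    """Mean Magnitude of Error Relative to the Estimate: mean(|y - yhat| / yhat)."""
    return sum(abs(y - yh) / yh for y, yh in zip(actual, predicted)) / len(actual)

# Hypothetical actual vs. predicted effort values (person-hours).
actual = [100.0, 200.0, 400.0]
predicted = [120.0, 180.0, 500.0]

print(round(mmre(actual, predicted), 3))
print(round(mmer(actual, predicted), 3))
```

The two criteria penalise over- and under-estimates differently (MMRE divides by the actual, MMER by the estimate), which is why a comparative study should report more than one of them.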

4. This research only examines object-oriented applications to be migrated to the Cloud. Future research can explore different options for the Code Modification Component rather than employing the "Class Point" measurement, so that the model is applicable to other types of applications and is not limited to object-oriented systems.

5. This research aims to understand the cost of migrating a system to the Cloud. Future research can use this result as a component of a cost-benefit framework to assist the decision of whether one should migrate to the Cloud. The inputs to this framework are the type of system currently in use, the type of Cloud targeted, and the performance requirements. The framework would then quantify both the benefits and the costs of having the system in the Cloud. Its output can help enterprises conclude whether the benefits outweigh the costs and whether moving to the Cloud is a wise decision.
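A deliberately simple sketch of such a cost-benefit comparison is shown below. The break-even rule and all monetary figures are hypothetical; a real framework would also quantify performance requirements, risk, and less tangible benefits.

```python
def should_migrate(migration_cost: float,
                   yearly_cloud_cost: float,
                   yearly_onpremise_cost: float,
                   horizon_years: int) -> bool:
    """Return True if cumulative running-cost savings over the planning
    horizon outweigh the one-off migration cost (a naive break-even rule)."""
    yearly_saving = yearly_onpremise_cost - yearly_cloud_cost
    return yearly_saving * horizon_years > migration_cost

# Hypothetical figures: $50k migration cost, $30k/yr in the Cloud vs.
# $55k/yr on premises, evaluated over a 3-year horizon.
print(should_migrate(50_000, 30_000, 55_000, 3))
```

The migration-cost input here is exactly what a CMP-based effort estimate, multiplied by a labour rate, would provide.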

6. The validation in this research is empirically based, because there is no well-established framework for theoretical validation. Future research can establish a mathematical framework for validating size-complexity hybrid metrics, which relate to both processes and products.

Bibliography

Abadi, D.J. (2009). Data management in the cloud: Limitations and opportu-
nities. IEEE Data Eng. Bull., 32, 3–12.

Abran, A. (1999). Functional size measurement for real time and embedded soft-
ware. In Proceedings of the 4th IEEE International Symposium and Forum on
Software Engineering Standards, 259–, IEEE Computer Society, Washington,

DC, USA. 39, 42

Abran, A. & Maya, M. (1995). A sizing measure for adaptive maintenance

work products. In Proceedings of the International Conference on Software


Maintenance, ICSM ’95, 286–, IEEE Computer Society, Washington, DC, USA.

125

Abran, A. & Robillard, P.N. (1994). Function points: a study of their

measurement processes and scale transformations. J. Syst. Softw., 25, 171–


184. 39

Aggarwal, S. & McCabe, L. (2009). The compelling tco case for cloud com-

puting in smb and mid market enterprises. Whitepaper, sponsored by NetSuite.


3


Agrawal, R., Ailamaki, A., Bernstein, P.A., Brewer, E.A., Carey,

M.J., Chaudhuri, S., Doan, A., Florescu, D., Franklin, M.J.,


Garcia-Molina, H., Gehrke, J., Gruenwald, L., Haas, L.M.,

Halevy, A.Y., Hellerstein, J.M., Ioannidis, Y.E., Korth, H.F.,


Kossmann, D., Madden, S., Magoulas, R., Ooi, B.C., O’Reilly, T.,

Ramakrishnan, R., Sarawagi, S., Stonebraker, M., Szalay, A.S. &

Weikum, G. (2009). The claremont report on database research. Commun.


ACM , 52, 56–65. 4

Albanes, D. (2009). Vitamin supplements and cancer prevention: Where do


randomized controlled trials stand? Journal of the National Cancer Institute,

101, 2–4.

Albrecht, A. & Gaffney, J. (1983). Software function, source lines of code,

and development effort prediction: A software science validation. IEEE Trans-


actions on Software Engineering, 9, 639–648. 36, 37, 67

Amazon (2009). Amazon elastic compute cloud. xvii, 4, 5, 7, 10, 11

Amazon (2011). Amazon web services blog. 13

Antoniol, G., Lokan, C., Caldiera, G. & Fiutem, R. (1999). A function


point-like measure for object-oriented software. Empirical Software Engineer-
ing, 4, 263–287. 39, 40

Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R.H., Kon-

winski, A., Lee, G., Patterson, D.A., Rabkin, A., Stoica, I. & Za-
haria, M. (2009). Above the clouds: A berkeley view of cloud computing.


Tech. rep., Electrical Engineering and Computer Sciences, University of Cali-

fornia at Berkeley. 2, 3, 4, 10, 11

Babar, M.A. & Chauhan, M.A. (2011). A tale of migration to cloud comput-

ing for sharing experiences and observations. In Proceedings of the 2nd Inter-
national Workshop on Software Engineering for Cloud Computing, SECLOUD

’11, 50–56, ACM, New York, NY, USA. 29

Baird, B. (1989). Managerial Decisions Under Uncertainty. Baird, B., John

Wiley & Sons. 33, 34

Banker, R.D., Kauffman, R.J. & Kumar, R. (1991). An empirical test of

object-based output measurement metrics in a computer aided software engi-


neering (case) environment. J. Manage. Inf. Syst., 8, 127–150. 33, 35, 42

Bisbal, J., Lawless, D., Wu, B., Grimson, J., Wade, V., Richard-

son, R. & O’Sullivan, D. (1997). An overview of legacy information sys-


tem migration. In Software Engineering Conference, 1997. Asia Pacific and

International Computer Science Conference 1997. APSEC ’97 and ICSC ’97.
Proceedings, 529 –530. 32

Bisbal, J., Lawless, D., Wu, B. & Grimson, J. (1999). Legacy information

systems: issues and directions. Software, IEEE , 16, 103 –111. 32

Boehm, B., Clark, B., Horowitz, E., Madachy, R., Selby, R. & Westland, C. (1995). Cost Models for Future Software Life Cycle Processes: COCOMO 2.0. Annals of Software Engineering, 1, 57–94. 35, 42


Boehm, B., Abts, C. & Chulani, S. (2000). Software development cost es-

timation approaches a survey. Annals of Software Engineering, 10, 177–205.


33, 40

Boehm, B.W. (1981). Software Engineering Economics. Prentice Hall PTR,

Upper Saddle River, NJ, USA.

Briand, L., Morasca, S. & Basili, V. (1996). Property-based software en-


gineering measurement. IEEE Transactions on Software Engineering, 22, 68

–86. 129, 130, 131, 132, 134, 176

Briand, L., El Emam, K., Surmann, D., Wieczorek, I. & Maxwell,


K. (1999). An assessment and comparison of common software cost estima-
tion modeling techniques. In Proceedings of the International Conference on

Software Engineering ICSE , 313 –323. 137

Buyya, R., Yeo, C.S., Venugopal, S., Broberg, J. & Brandic, I. (2008).

Cloud computing and emerging it platforms: Vision, hype, and reality for
delivering computing as the 5th utility. Future Generation Computer Systems,

25, 599–616. 4

Calheiros, R.N., Ranjan, R., Rose, C.A.F.D. & Buyya, R. (2009).


Cloudsim: A novel framework for modeling and simulation of cloud computing

infrastructures and services. CoRR.

Carriere, J., Kazman, R. & Ozkaya, I. (2010). A cost-benefit framework for

making architectural decisions in a business context. In ICSE ’10: Proceedings


of the 32nd ACM/IEEE International Conference on Software Engineering,

149–157, ACM, New York, NY, USA. 15


Cetin, S., Ilker Altintas, N., Oguztuzun, H., Dogru, A., Tufekci, O.

& Suloglu, S. (2007). Legacy migration to service-oriented computing with


mashups. In Software Engineering Advances, 2007. ICSEA 2007. International

Conference on, 21. 32

Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A. & Gruber, R.E. (2006). Bigtable: A distributed storage system for structured data. In OSDI '06, 205–218. 4

Chappell, D. (2008). Introducing the azure services platform. Whitepaper,

sponsored by Microsoft Corporation. 4

Chappell, D. (2011). Opinari - david chappell’s blog. 14, 51

Chauhan, M.A. & Babar, M.A. (2011). Migrating service-oriented system to


cloud computing: An experience report. Cloud Computing, IEEE International

Conference on, 0, 404–411. 29

Chow, R., Golle, P., Jakobsson, M., Shi, E., Staddon, J., Masuoka,

R. & Molina, J. (2009). Controlling data in the cloud: outsourcing computa-

tion without outsourcing control. In CCSW ’09: Proceedings of the 2009 ACM
workshop on Cloud computing security, 85–90, ACM, New York, NY, USA.

Cleary, D. (2000). Web-based Development and Functional Size Measurement.

In IFPUG 2000 Annual Conference, Charismatek Software Metrics. 40

Conte, S.D., Dunsmore, H.E. & Shen, Y.E. (1986). Software engineering

metrics and models. Benjamin-Cummings Publishing Co., Inc., Redwood City,


CA, USA. 136


Costagliola, G., Ferrucci, F., Tortora, G. & Vitiello, G. (2005).

Class point: An approach for the size estimation of object-oriented systems.


IEEE Transactions on Software Engineering, 31, 52–74. 39, 41, 96, 110, 129,

137, 152, 154, 174

Creswell, J.W. (2002). Research design : qualitative, quantitative, and mixed

methods approaches. Sage Publ., 2nd edn. 47, 57

de Assuncao, M.D., di Costanzo, A. & Buyya, R. (2009). Evaluating


the cost-benefit of using cloud computing to extend the capacity of clusters.

In HPDC ’09: Proceedings of the 18th ACM international symposium on High


performance distributed computing, 141–150, ACM, New York, NY, USA. 15,
29

Dean, J. & Ghemawat, S. (2004). Mapreduce: Simplified data processing on

large clusters. In OSDI ’04 .

Deelman, E., Singh, G., Livny, M., Berriman, B. & Good, J. (2008).
The cost of doing science on the cloud: the montage example. In SC ’08: Pro-

ceedings of the 2008 ACM/IEEE conference on Supercomputing, 1–12, IEEE

Press, Piscataway, NJ, USA.

Dekkers, T. & Vogelezang, F. (2003). COSMIC full function points: Additional to or replacing FPA. In Proceedings of the Ninth International Software Metrics Symposium, ACOSM. 39

Dolado, J.J. (2000). A validation of the component-based method for software


size estimation. IEEE Trans. Softw. Eng., 26, 1006–1021. 36


Efron, B. & Gong, G. (1983). A leisurely look at the bootstrap, the jackknife,

and cross-validation. The American Statistician, 37, 36–48. 136

Elmore, A.J., Das, S., Agrawal, D. & El Abbadi, A. (2011). Zephyr: live
migration in shared nothing databases for elastic cloud platforms. In Proceed-

ings of the 2011 international conference on Management of data, SIGMOD

’11, 301–312, ACM, New York, NY, USA. 31

Elmroth, E. & Larsson, L. (2009). Interfaces for placement, migration, and


monitoring of virtual machines in federated clouds. International Conference

on Grid and Cooperative Computing, 0, 253–260.

Erdogmus, H. (2009). Cloud computing: Does nirvana hide behind the nebula?
Software, IEEE , 26, 4 –6. 1

Finnie, G.R., Wittig, G.E. & Desharnais, J.M. (1997). A comparison


of software effort estimation techniques: Using function points with neural

networks, case-based reasoning and regression models. Journal of Systems and


Software, 39, 281 – 289. 36, 37, 51

Foss, T., Stensrud, E., Kitchenham, B. & Myrtveit, I. (2003). A simu-

lation study of the model evaluation criterion mmre. IEEE Trans. Softw. Eng.,
29, 985–995. 187

Frey, S. & Hasselbring, W. (2011). An extensible architecture for detecting

violations of a cloud environment’s constraints during legacy software system

migration. Software Maintenance and Reengineering, European Conference on,


0, 269–278. 31


Gabner, R., Schwefel, H.P., Hummel, K.A. & Haring, G. (2011). Op-

timal model-based policies for component migration of mobile cloud services.


Network Computing and Applications, IEEE International Symposium on, 0,

195–202.

Ghemawat, S., Gobioff, H. & Leung, S.T. (2003). The google file system.

SIGOPS Oper. Syst. Rev., 37, 29–43. 4

Google (2009). Google app engine. xvii, 4, 6, 7, 9, 10, 11

Google (2011). Google trends. 12

Group, C.X. (2002). Estimating internet development. 41

Hajjat, M., Sun, X., Sung, Y.W.E., Maltz, D., Rao, S., Sripanid-

kulchai, K. & Tawarmalani, M. (2010). Cloudward bound: planning for


beneficial migration of enterprise applications to the cloud. In Proceedings of the
ACM SIGCOMM 2010 conference on SIGCOMM , SIGCOMM ’10, 243–254,

ACM, New York, NY, USA. 12, 14, 27, 32

Hamilton, J. (2011). Perspective - james hamilton’s blog. 14, 51

Hao, W., Yen, I.L. & Thuraisingham, B. (2009). Dynamic service and

data migration in the clouds. Computer Software and Applications Conference,

Annual International , 2, 134–139. 28, 32

Hazelhurst, S. (2008). Scientific computing using virtual high-performance


computing: a case study using the amazon elastic computing cloud. In SAIC-

SIT ’08: Proceedings of the 2008 annual research conference of the South


African Institute of Computer Scientists and Information Technologists on IT

research in developing countries, 94–103, ACM, New York, NY, USA.

Helmer, O. (1966). Social Technology. Helmer, O., Basic Books, New York,

NY, USA. 33, 34

Ho, Y., Liu, P. & Wu, J.J. (2011). Server consolidation algorithms with
bounded migration cost and performance guarantees in cloud computing. Utility and Cloud Computing, IEEE International Conference on, 0, 154–161. 28

IFPUG (2010). Function point counting practices manual. 67

Jayasinghe, D., Malkowski, S., Wang, Q., Li, J., Xiong, P. & Pu, C.

(2011). Variations in performance and scalability when migrating n-tier appli-


cations to different clouds. Cloud Computing, IEEE International Conference
on, 0, 73–80. 32

Ji, W., Ma, J. & Ji, X. (2009). A reference model of cloud operating and
open source software implementation mapping. Enabling Technologies, IEEE
International Workshops on, 0, 63–65. 4

Jorgensen, M. (2004). A review of studies on expert estimation of software


development effort. Journal of Systems and Software, 70, 37 – 60. 34, 51

Jorgensen, M. & Shepperd, M. (2007). A systematic review of software de-

velopment cost estimation studies. IEEE Transactions on Software Engineer-

ing, 33, 33–53. 33, 35

Kanmani, S., Kathiravan, J., Kumar, S.S. & Shanmugam, M. (2007).

Neural network based effort estimation using class points for oo systems. In


Proceedings of the International Conference on Computing: Theory and Appli-

cations, 261–266, IEEE Computer Society, Washington, DC, USA. 41

Kanmani, S., Kathiravan, J., Kumar, S.S. & Shanmugam, M. (2008).


Class point based effort estimation of oo systems using fuzzy subtractive clus-

tering and artificial neural networks. In Proceedings of the 1st India software

engineering conference, ISEC ’08, 141–142, ACM, New York, NY, USA. 42

Karner, G. (1993). Resource Estimation for Objectory Projects. Objectory Sys-


tems. 39

Kazman, R., Asundi, J. & Klein, M. (2001). Quantifying the costs and ben-

efits of architectural decisions. In ICSE ’01: Proceedings of the 23rd Interna-


tional Conference on Software Engineering, 297–306, IEEE Computer Society,
Washington, DC, USA.

Keung, J.W., Kitchenham, B.A. & Jeffery, D.R. (2008). Analogy-x: Pro-

viding statistical inference to analogy-based software cost estimation. IEEE


Trans. Softw. Eng., 34, 471–484. 33

Khajeh-Hosseini, A., Greenwood, D. & Sommerville, I. (2010a). Cloud

migration: A case study of migrating an enterprise it system to iaas. In Cloud


Computing (CLOUD), 2010 IEEE 3rd International Conference on, 450 –457.
12, 14, 26

Khajeh-Hosseini, A., Sommerville, I. & Sriram, I. (2010b). Research

challenges for enterprise cloud computing. Tech. rep., Cloud Computing Co-
laboratory, School of Computer Science, University of St Andrews, UK.


Khajeh-Hosseini, A., Sommerville, I., Bogaerts, J. & Teregowda,

P. (2011). Decision support tools for cloud migration in the enterprise. Cloud
Computing, IEEE International Conference on, 0, 541–548. 27

Kitchenham, B. (1997). Counterpoint: The problem with function points.

IEEE Softw., 14, 29–. 125

Kitchenham, B., Pfleeger, S., Pickard, L., Jones, P., Hoaglin, D.,
El Emam, K. & Rosenberg, J. (2002). Preliminary guidelines for empirical
research in software engineering. Software Engineering, IEEE Transactions on,

28, 721 – 734.

Klems, M., Nimis, J. & Tai, S. (2009). Do clouds compute? a framework


for estimating the value of cloud computing. Designing E-Business Systems.
Markets, Services, and Networks, 22, 110–123. 28

Kundra, V. (2010). State of public sector cloud computing. 12, 13

Lai, R. & Huang, S.J. (2003). A model for estimating the size of a formal com-

munication protocol specification and its implementation. IEEE Trans. Softw.

Eng., 29, 46–62. 36

Leake, G. (2006). Microsoft .net pet shop 4: Migrating an asp.net 1.1 applica-

tion to 2.0. 49, 67

Lederer, A. & Prasad, J. (1998). A causal model for software cost estimating

error. Software Engineering, IEEE Transactions on, 24, 137 –148.


Lenk, A., Klems, M., Nimis, J., Tai, S. & Sandholm, T. (2009). What’s

inside the cloud? an architectural map of the cloud landscape. Software Engi-
neering Challenges of Cloud Computing, ICSE Workshop on, 0, 23–31. 5

Li, H., Zhong, L., Liu, J., Li, B. & Xu, K. (2011a). Cost-effective partial

migration of vod services to content clouds. Cloud Computing, IEEE Interna-


tional Conference on, 0, 203–210. 28

Li, W., Tordsson, J. & Elmroth, E. (2011b). Modeling for dynamic cloud

scheduling via migration of virtual machines. Cloud Computing Technology and


Science, IEEE International Conference on, 0, 163–171.

Li, W.S., Hsiung, W.P., Po, O., Hino, K., Candan, K.S. & Agrawal,
D. (2004). Challenges and practices in deploying web acceleration solutions for

distributed enterprise systems. In WWW ’04: Proceedings of the 13th interna-


tional conference on World Wide Web, 297–308, ACM, New York, NY, USA.
50

Linthicum, D. (2011). Cloud computing - david linthicum’s blog. 14, 51

Lokan, C.J. (1998). An empirical analysis of function point adjustment factors.

Information and Software Technology, 42, 649–660. 125

Low, G.C. & Jeffery, D.R. (1990). Function points in the estimation and
evaluation of the software process. IEEE Trans. Softw. Eng., 16, 64–71. 125

Madachy, R. (1997). Heuristic risk assessment using cost factors. Software,

IEEE , 14, 51 –59. 74


Mark Basler, D.N., Sean Brydon & Singh, I. (2010). Introducing the Java Pet Store 2.0 application.

Mastroeni, L. & Naldi, M. (2011). Long-range evaluation of risk in the migra-


tion to cloud storage. E-Commerce Technology, IEEE International Conference

on, 0, 260–266. 27, 28

Matson, J.E., Barrett, B.E. & Mellichamp, J.M. (1994). Software de-
velopment cost estimation using function points. IEEE Trans. Softw. Eng., 20,
275–287. 39, 125

Mehta, N.R., Medvidovic, N. & Phadke, S. (2000). Towards a taxonomy

of software connectors. In Proceedings of the 22nd international conference on


Software engineering, ICSE ’00, 178–187, ACM, New York, NY, USA. 66

Meng, X., Shi, J., Liu, X., Liu, H. & Wang, L. (2011). Legacy application
migration to cloud. Cloud Computing, IEEE International Conference on, 0,

750–751.

Mens, T. & Gorp, P.V. (2006). A taxonomy of model transformation. Elec-

tronic Notes in Theoretical Computer Science, 152, 125 – 142, proceedings of


the International Workshop on Graph and Model Transformation (GraMoT
2005). 64, 65

Microsoft (2009). Microsoft azure platform. xvii, 4, 6, 7, 8, 10, 11

Microsoft (2012). See how startups are using windows azure today. 13


Mikkilineni, R. & Sarathy, V. (2009). Cloud computing and the lessons

from the past. Enabling Technologies, IEEE International Workshops on, 0,


57–62. 4

Mohagheghi, P. & Saether, T. (2011). Software engineering challenges for

migration to the service cloud paradigm: Ongoing work in the remics project.
Services, IEEE Congress on, 0, 507–514. 31

Mohagheghi, P., Anda, B. & Conradi, R. (2005). Effort estimation of

use cases for incremental large-scale software development. In Proceedings of


the 27th international conference on Software engineering, ICSE ’05, 303–311,
ACM, New York, NY, USA. 39

Mudge, J.C. (2010). Cloud computing: Opportunities and challenges for Australia. Tech. rep., The Australian Academy of Technological Sciences and Engineering, Melbourne, Victoria. 1

Network, S.D. (2010). Java blueprints.

Niessink, F. & Vliet, H.v. (1997). Predicting maintenance effort with func-
tion points. In Proceedings of the International Conference on Software Main-

tenance, 32–39, IEEE Computer Society, Washington, DC, USA. 112, 153

Padioleau, Y., Tan, L. & Zhou, Y. (2009). Listening to programmers. Soft-


ware Engineering, International Conference on, 0, 331–341. 65

Palankar, M.R., Iamnitchi, A., Ripeanu, M. & Garfinkel, S. (2008).

Amazon s3 for science grids: a viable solution? In Proceedings of the 2008


international workshop on Data-aware distributed computing, DADC ’08, 55–

64, ACM, New York, NY, USA. 4

Piao, J.T. & Yan, J. (2010). A network-aware virtual machine placement and

migration approach in cloud computing. Grid and Cloud Computing, Interna-

tional Conference on, 0, 87–92. 31

Reifer, D. (2000). Web development: estimating quick-to-market software. Soft-

ware, IEEE , 17, 57 –64. 39, 40

RightScale (2009). Rightscale cloud management.

Rochwerger, B., Breitgand, D., Levy, E., Galis, A., Nagin, K.,
Llorente, I.M., Montero, R., Wolfsthal, Y., Elmroth, E., Cac-
eres, J., Ben-Yehuda, M., Emmerich, W. & Galan, F. (2009). The

reservoir model and architecture for open federated cloud computing. IBM
Journal of Research and Development, 53.

Rosenberg, J. (1997). Some misconceptions about lines of code. IEEE Inter-

national Symposium on Software Metrics, 0, 137. 36

Ruhe, M., Jeffery, R. & Wieczorek, I. (2003a). Cost estimation for web
applications. In ICSE ’03: Proceedings of the 25th International Conference

on Software Engineering, 285–294, IEEE Computer Society, Washington, DC,


USA. 74

Ruhe, M., Jeffery, R. & Wieczorek, I. (2003b). Using web objects for

estimating software development effort for web applications. In Proceedings of


the 9th International Symposium on Software Metrics, 30–39, IEEE Computer

Society, Washington, DC, USA. 36

SalesForce (2012). Roi for it. 3

Shepperd, M. & Schofield, C. (1997). Estimating software project effort

using analogies. IEEE Transactions on Software Engineering, 23, 736 –743.

33, 51, 53

Singh, I., Stearns, B. & Johnson, M. (2002). Designing enterprise applica-


tions with the J2EE platform. Addison-Wesley Longman Publishing Co., Inc.,

Boston, MA, USA. 50

Smith, D. (2007). Migration of legacy assets to service-oriented architecture

environments. In Software Engineering - Companion, 2007. ICSE 2007 Com-


panion. 29th International Conference on, 174 –175. 32

Smith, J.W. (2009). A comparison of public cloud platforms. Tech. rep., StACC:
St Andrews Cloud Computing Collaboratory.

Sommerville, I. (2006). Software Engineering. Pearson Education, 8th edn. 44

Suen, C.H., Kirchberg, M. & Lee, B.S. (2011). Efficient migration of virtual

machines between public and private cloud. Cloud Computing Technology and
Science, IEEE International Conference on, 0, 549–553. 16

Symons, C.R. (1988). Function point analysis: Difficulties and improvements.

IEEE Trans. Softw. Eng., 14, 2–11. 125

Symons, C.R. (1991). Software sizing and estimating: Mk II FPA (Function


Point Analysis). John Wiley & Sons, Inc., New York, NY, USA. 42


Symons, F.C. & Symons, C. (2001). Come back function point analysis (modernised) – all is forgiven. In Software Measurement Services Ltd, 413–426. 43

Thakar, A. & Szalay, A. (2010). Migrating a (large) science database to

the cloud. In Proceedings of the 19th ACM International Symposium on High

Performance Distributed Computing, HPDC ’10, 430–434, ACM, New York,


NY, USA. 30, 31

Tilley, S. & Parveen, T. (2010). Migrating software testing to the cloud.


Software Maintenance, IEEE International Conference on, 0, 1.

Tran, V., Keung, J., Liu, A. & Fekete, A. (2011a). Application migration
to cloud: A taxonomy of critical factors. In Proceedings of the ICSE Software
Engineering For Cloud Computing Workshop, SECLOUD, ACM, New York,

NY, USA.

Tran, V., Lee, K., Fekete, A., Liu, A. & Keung, J. (2011b). Size estima-
tion of cloud migration projects with cloud migration point (cmp). In Proceed-

ings of the 5th International Symposium on Empirical Software Engineering


and Measurement, ESEM, ACM.

Truong, H.L. & Dustdar, S. (2010). Composable cost estimation and monitoring for computational applications in cloud computing environments. Procedia Computer Science, 1, 2169–2178, ICCS 2010.

Tukey, J.W. (1958). Bias and confidence in not-quite large samples. The Annals

of Mathematical Statistics, 29, 614. 136

UKSMA (1998). Mkii function point analysis counting practices manual. 42


Vaquero, L.M., Rodero-Merino, L., Caceres, J. & Lindner, M.

(2009a). A break in the clouds: towards a cloud definition. SIGCOMM Com-


puter Communication Review , 39, 50–55. 2

Vaquero, L.M., Rodero-Merino, L., Caceres, J. & Lindner, M.

(2009b). A break in the clouds: towards a cloud definition. SIGCOMM Com-

puter Communication Review , 39, 50–55.

Venugopal, S., Desikan, S. & Ganesan, K. (2011). Effective migration of


enterprise applications in multicore cloud. Utility and Cloud Computing, IEEE

International Conference on, 0, 463–468. 31

Verma, A., Kumar, G., Koller, R. & Sen, A. (2011). Cosmig: Modeling
the impact of reconfiguration in a cloud. Modeling, Analysis, and Simulation
of Computer Systems, International Symposium on, 0, 3–11. 15, 28, 32

Verner, J. & Tate, G. (1992). A software size model. IEEE Trans. Softw.

Eng., 18, 265–278. 36

Ward, C., Aravamudan, N., Bhattacharya, K., Cheng, K., Filepp,

R., Kearney, R., Peterson, B., Shwartz, L. & Young, C. (2010).

Workload migration into clouds challenges, experiences, opportunities. In Cloud


Computing (CLOUD), 2010 IEEE 3rd International Conference on, 164 –171.
12

Yam, C.Y., Baldwin, A., Shiu, S. & Ioannidis, C. (2011). Migration

to cloud as real option: Investment decision under uncertainty. IEEE Trust-


Com/IEEE ICESS/FCST, International Joint Conference of , 0, 940–949. 27


Ye, K., Jiang, X., Huang, D., Chen, J. & Wang, B. (2011). Live migration

of multiple virtual machines with resource reservation in cloud computing en-


vironments. Cloud Computing, IEEE International Conference on, 0, 267–274.

28

Yi, S., Andrzejak, A. & Kondo, D. (2011). Monetary cost-aware check-


pointing and migration on amazon cloud spot instances. IEEE Transactions

on Services Computing, 99.

Yin, R.K. (2003). Case study research : design and methods. Sage Publications,
3rd edn.

Youseff, L., Butrico, M. & Da Silva, D. (2008). Toward a unified ontology

of cloud computing. In Grid Computing Environments Workshop, 2008. GCE


’08 , 1–10. 4

Yuan, C., Chen, Y. & Zhang, Z. (2003). Evaluation of edge


caching/offloading for dynamic content delivery. In WWW ’03: Proceedings

of the 12th international conference on World Wide Web, 461–471, ACM, New
York, NY, USA. 50

Zhang, G., Chiu, L. & Liu, L. (2010). Adaptive data migration in multi-

tiered storage based cloud environment. Cloud Computing, IEEE International

Conference on, 0, 148–155. 31

Appendix A

Cloud Migration Projects -


Survey Questionnaire

THE UNIVERSITY OF NEW SOUTH WALES AND NICTA

PARTICIPANT INFORMATION STATEMENT AND CONSENT FORM

Effort Estimation of Migration of Legacy Systems to Cloud


You are invited to participate in a study of factors in migrating legacy software applications to cloud computing
systems. We hope to learn more about important facts and also to check whether some of our models correctly capture
these factors. You were selected as a possible participant in this study because of your experience in migrating legacy
software applications to cloud computing systems.

If you decide to participate, we will conduct one interview with you at a time mutually agreed to. In the unlikely case
that there is a need for a follow-up interview, it will also be conducted at a mutually agreed time. Every interview will
be recorded with a voice recorder and should take no more than one hour to complete.

Results of this study might help you better understand factors in migrating legacy software applications to cloud
computing systems. This in turn might improve your work performance or customer satisfaction with your future
software. However, we cannot and do not guarantee or promise that you will receive any benefits from this study.

Any information that is obtained in connection with this study and that can be identified with you will remain
confidential and will be disclosed only with your permission, except as required by law. If you give us your permission
by signing this document, we plan to publish the summary results in a very general form at scientific conferences. The
purpose of this publication would be to inform the broader scientific community about how migration of legacy
applications to cloud computing systems can be alleviated. In any publication, information will be provided in such a
way that you, your company, the software tools that you used/supported/sold/developed, and the vendors of these
software tools cannot be identified.

Complaints may be directed to the Ethics Secretariat, The University of New South Wales, SYDNEY 2052
AUSTRALIA (phone 9385 4234, fax 9385 6648, email ethics.sec@unsw.edu.au). Any complaint you make will be
investigated promptly and you will be informed about the outcome.

After the completion of the study (likely in the second half of 2011), we will present you (and every other participant)
with summary results of this study (via email as a PDF file) and will ask you for some feedback. Your participation in
the feedback is voluntary (i.e. participation in interviews does not automatically imply participation in the feedback
process). If you are participating in the feedback process, you will be required to spend additional time to familiarize
yourself with study results and to provide some comments. The estimated time needed for the feedback is up to one
hour. If you wish to sign up for the feedback process now, you can do so by ticking the box on the next page. Please
note that you can withdraw from the feedback process any time by contacting us.

I would like to provide my feedback on a draft of the summary results.

Your decision whether or not to participate in this study will not prejudice your future relations with the University of
New South Wales and NICTA. If you decide to participate, you are free to withdraw your consent and to discontinue
participation at any time, without any prejudice. You can decline to answer any question, for whatever reason.

If you have any questions, please feel free to ask Thi Khanh Van Tran (phone: 02 9376 2259; e-mail: ThiKhanhVan.Tran@nicta.com.au) or Kevin Lee (phone: 02 9376 2207, e-mail: Kevin.Lee@nicta.com.au). If you have any additional questions later, Thi Khanh Van Tran or Kevin Lee will be happy to answer them.

You will be given a copy of this form to keep.

Page 1 of 12
THE UNIVERSITY OF NEW SOUTH WALES AND NICTA

PARTICIPANT INFORMATION STATEMENT AND CONSENT FORM (continued)

Effort Estimation of Migration of Legacy Systems to Cloud

You are making a decision whether or not to participate in this research study. Your signature indicates that,
having read the information provided above, you have decided to participate.

…………………………………………………… .…………………………………………………….
Signature of Research Participant Signature of Witness

…………………………………………………… .…………………………………………………….
(Please PRINT name) (Please PRINT name)

…………………………………………………… .…………………………………………………….
Date Nature of Witness

REVOCATION OF CONSENT

Effort Estimation of Migration of Legacy Systems to Cloud


I hereby wish to WITHDRAW my consent to participate in the research proposal described above and understand that
such withdrawal WILL NOT jeopardise any treatment or my relationship with The University of New South Wales
and NICTA.

…………………………………………………… .…………………………………………………….
Signature Date

……………………………………………………
Please PRINT Name

The section for Revocation of Consent should be forwarded to NICTA, Attn: Kevin Lee, Software Systems Research
Group, Locked Bag 9013, Alexandria NSW 1435.

A survey on cost factors for migration effort to the Cloud
This survey is designed to collect data on cloud migration projects in order to determine the
significant cost factors that affect cloud migration effort.

There are 36 questions in this survey.

I. General questions

GQ1: What type of cloud did you migrate to? Please specify.
Check any that apply

IaaS __________________________________
PaaS __________________________________
SaaS __________________________________

GQ2: What components of your system did you migrate to the cloud?


Check any that apply

Web Application
Desktop Software Application
Web Server
Database Server
Database
Operating Systems
Other: _____________________________

GQ3: Did you migrate the whole system to the cloud?


Choose one of the following answers

The entire system was migrated to the cloud


A part of the system was migrated to the cloud; the rest stayed in house
No answer

II. Cost factors
Questions in this section focus on the cost factors that influence migration effort to the cloud.

CF1: Has the development team done any similar projects on the Cloud before?

Yes
No
No answer

CF2: What is the development team's expertise?


Check any that apply

Database
Networking
Software Architecture
Other: ______________________________________

CF3: Please rate the following factors on how they influenced your migration effort to the cloud.
1 - None to minor influence, 5 - Significant influence

1 2 3 4 5 No answer
Developers'
expertise

Experience in
software
development

Experience in cloud

Design quality of
migration tasks

Choice of cloud
services

CF4: Are there any other factors influencing the migration effort?

_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________

III. Database Migration


Questions in this section focus on the database migration part of your system

DB1: Did you migrate your database to the Cloud?


This includes the migration of data, database server, etc...
If yes, there will be a few questions related to database migration tasks.
If no, you will be taken to the next section.

Yes
No

DB2: What database did you use before the migration?


Choose one of the following answers

MySQL
MSSQL 2008 or later
MSSQL 2005 or older
PostgreSQL
MSAccess
Other: _________________________________
No answer

DB3: What database did you migrate to?
Choose one of the following answers. If you installed your own database server in cloud (e.g., in
an EC2 instance), please specify.

Amazon RDS
Amazon SimpleDB
Amazon S3
Microsoft SQL Azure
Google Bigtable
Other: ______________________________
No answer

DB4: How many SQL queries did you modify for your system to adapt to the new database
in the cloud?
Choose one of the following answers
None
1 - 10
More than 10
No answer

DB6: How many GBs of data did you migrate to the cloud?


Only numbers may be entered in this field

DB7: How many person-hours did it take to migrate all data to the cloud?
Only numbers may be entered in this field

DB8: Did you perform any of the following for your database to adapt to the new database
in the cloud?
Check any that apply

Modify database schema


Split data into multiple databases
Replicate data into multiple databases
None
Other: ____________________________________

DB9: How many person-hours did it take to perform those tasks?


Only numbers may be entered in this field

DB10: Did you carry out any other activities for database migration, and how many person-
hours did it take?

_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________

IV. Installation and Configuration


Questions in this section focus on installation and configuration tasks.

IC1: Did you install or configure any software in the cloud?

Yes
No

IC2: How many software packages were installed to set up the environment in the cloud?
e.g., operating systems, database servers, web servers, etc.
Only numbers may be entered in these fields

Installed from binary files


Installed from source code
No installation required

IC3: How many software packages were reconfigured?


Examples of configuration variables are path names, environment variables, etc.
Only numbers may be entered in these fields.

Re-configured with less than 6 configuration variables


Re-configured with 6 configuration variables and more

IC4: How many person-hours did it take to complete all installation and configuration
tasks?
Only numbers may be entered in this field

IC5: Did you carry out any other activities for installation and configuration, and how many
person-hours did it take?

_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________

V. Network Connections
Questions in this section focus on migration tasks related to network connection changes caused
by the migration.

NC1: Do any components in your system connect with each other via the Internet or a local
network?
Yes
No

NC2: Did you carry out any tasks related to these network connections?
e.g., adding security such as VPC, optimizing network performance by changing packet size, etc...

Yes
No

NC3: For how many connections in the cloud did you perform the following tasks?
Add security, i.e., secure a connection with VPC or with a secured protocol such as HTTPS
Optimize protocol for performance, e.g., changing TCP packet size, etc.
Only numbers may be entered in these fields.

Add security
Optimize protocol

NC4: For how many connections across the Internet did you perform the following
tasks?
Only numbers may be entered in these fields

Add security
Optimize protocol

NC5: How many person-hours did it take to complete all tasks related to network connections?
Only numbers may be entered in this field

NC6: Did you carry out any other activities for network connection, and how many person-
hours did it take?

_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________

VI. Code modification


Questions in this section focus on how application code has been modified.

CM1: Did you modify any parts of the application code?


Code modification can be for any purpose: adding new functionality, modifying the data access
layer to adapt to a new database, etc.

Yes
No

CM2: How many classes were modified?


- Problem Domain Type (PDT): classes that represent real-world entities in the application
domain of the system.
- Human Interaction Type (HIT): classes designed for information visualization and human-
computer interaction.
- Data Management Type (DMT): classes that accommodate data storage and retrieval.
- Task Management Type (TMT): classes that are responsible for definition and control of tasks,
communications between subsystems and with external systems.

Only numbers may be entered in these fields

Human interaction classes


Data management classes
Task management classes
Problem domain classes
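The class taxonomy defined in CM2, together with the "more than 5" complexity thresholds used by the follow-up questions CM3-CM6, can be sketched in code. The sketch below is an illustration only and is not part of the original survey; the class and function names are hypothetical:

```python
from dataclasses import dataclass
from enum import Enum

class ClassType(Enum):
    """The four class types defined in question CM2."""
    PDT = "Problem Domain"
    HIT = "Human Interaction"
    DMT = "Data Management"
    TMT = "Task Management"

@dataclass
class ModifiedClass:
    name: str
    ctype: ClassType
    attributes: int
    methods: int
    calls_to_other_classes: int

def is_high_complexity(c: ModifiedClass, threshold: int = 5) -> bool:
    """A modified class exceeds the 'more than 5' threshold of CM3-CM6
    if any one of its three counts is above the threshold."""
    return (c.attributes > threshold
            or c.methods > threshold
            or c.calls_to_other_classes > threshold)

# Hypothetical example: a data-access class touched during the migration.
dao = ModifiedClass("CustomerDao", ClassType.DMT,
                    attributes=3, methods=8, calls_to_other_classes=2)
print(is_high_complexity(dao))  # True: more than 5 methods were modified
```

Counting the modified classes per type, and flagging which ones cross the threshold, yields exactly the per-category tallies that questions CM2-CM6 ask for.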

CM3: How many Human Interaction classes were modified in:
Only numbers may be entered in these fields

More than 5 attributes


More than 5 methods
More than 5 calls to other classes

CM4: How many Data Management classes were modified in:


Only numbers may be entered in these fields

More than 5 attributes


More than 5 methods
More than 5 calls to other classes

CM5: How many Task Management classes were modified in:


Only numbers may be entered in these fields

More than 5 attributes


More than 5 methods
More than 5 calls to other classes

CM6: How many Problem Domain classes were modified in:


Only numbers may be entered in these fields

More than 5 attributes


More than 5 methods
More than 5 calls to other classes

CM7: How many person-hours did it take to complete all code modification?
Only numbers may be entered in this field

CM8: Did you carry out any other activities for code modification, and how many person-
hours did it take?

_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________

End of survey.
Thank you for your time and effort in taking this survey.
Please return the completed survey to thikhanhvan.tran@nicta.com.au or
tyao1801@uni.sydney.edu.au

A. CLOUD MIGRATION PROJECTS - SURVEY
QUESTIONNAIRE

Appendix B

Survey Responses - Raw Data

Network Connection
ID LAN-to-LAN (Low Average High) LAN-to-WAN (Low Average High) WAN-to-LAN (Low Average High) Hours
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0
5 0 1 0 0 1 0 0 0 0 5
6 0 0 0 0 0 1 0 0 0 10
7 0 0 0 0 0 2 0 0 0 20
8 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0
10 0 5 5 0 5 5 0 0 0 100
11 0 2 0 0 2 0 0 0 0 20
12 1 0 0 1 0 0 0 0 0 2
13 3 0 0 0 0 0 0 0 0 2
14 0 0 1 0 0 2 0 0 0 20
15 0 0 0 0 0 0 0 0 0 0
16 1 0 0 0 0 0 0 0 0 2
17 1 0 0 1 0 0 0 0 0 2
18 1 0 0 1 0 0 0 0 0 2
19 0 0 0 0 0 0 0 0 0 0
Table B.1: Survey responses for network connection component
Code Modification
ID Problem Domain (Low Average High) Human Interaction (Low Average High) Data Management (Low Average High) Task Management (Low Average High) Hrs
1 0 0 20 0 0 5 0 0 0 0 0 20 250
2 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 5 0 0 0 40
4 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 1 0 0 1 0 0 1 0 0 1 20
6 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 0 0 0

10 0 0 0 0 0 0 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 0 0 0 0 10 80
13 0 0 0 0 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0 0 0 0 0
15 0 0 0 0 0 0 1 2 0 0 0 1 10
16 0 0 0 0 0 0 1 4 4 0 0 0 40
17 0 0 0 0 0 0 0 0 0 0 0 0 0
18 0 0 0 0 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 0 0 0 0 0 0 0 0

Table B.2: Survey responses for code modification component


Installation and Configuration
ID Application (Low Average High) Infrastructure (Low Average High) Hours
1 0 0 0 0 0 5 80
2 0 0 0 0 3 0 3
3 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0
5 0 0 0 0 2 3 50
6 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0
8 0 0 0 0 3 1 24
9 0 0 0 7 0 0 6
10 0 0 0 0 0 0 0
11 0 0 0 0 3 2 50
12 0 0 0 0 0 15 300
13 0 0 0 3 2 0 7
14 0 0 0 0 7 0 20
15 0 0 0 1 4 0 14
16 0 0 1 1 0 0 10
17 0 0 0 0 3 0 4
18 0 0 0 2 2 0 8
19 0 0 0 0 12 0 48
Table B.3: Survey responses for installation and configuration component
Database Migration
ID Query Modification (Low Average High) Data Population (Low Average High) Hours
1 0 0 0 2 0 0 2
2 0 0 0 0 0 0 0
3 20 0 0 3 0 0 25
4 0 0 0 0 0 4 8
5 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0
7 0 0 0 0 2 0 5
8 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0

10 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0
13 0 0 0 2 0 0 1
14 0 0 0 2 0 0 2
15 0 5 0 0 2 0 7
16 0 0 8 0 0 2 15
17 0 0 0 2 0 0 2
18 0 0 0 2 0 0 2
19 0 0 0 0 0 0 0

Table B.4: Survey responses for database migration component


ID Dev. Expertise Exp. in Soft. Dev. Exp. in Cloud Design Quality of Mig. Tasks Choice of Cloud
1 5 5 1 0 0
2 3 3 4 0 1
3 4 4 5 1 4
4 3 3 3 2 4
5 5 5 5 2 4
6 2 3 5 4 4
7 4 3 5 4 5
8 4 4 3 2 5
9 5 5 5 5 0
10 2 2 3 3 3
11 0 0 0 0 0
12 5 5 5 0 1
13 4 3 2 3 5
14 4 2 5 2 5
15 3 5 5 3 5
16 3 5 5 3 5
17 1 1 2 2 4
18 5 2 5 2 5
19 4 4 5 1 1
Table B.5: Survey responses for external cost factors
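The influence ratings collected by question CF3 and tabulated in Table B.5 can be summarised per factor. The sketch below is illustrative only (it is not part of the thesis analysis) and uses the first three respondents' rows of Table B.5 as sample input:

```python
# Ratings (0-5) from rows 1-3 of Table B.5, one tuple per respondent,
# in the column order of the table.
responses = [
    (5, 5, 1, 0, 0),  # respondent 1
    (3, 3, 4, 0, 1),  # respondent 2
    (4, 4, 5, 1, 4),  # respondent 3
]
factors = ["Dev. Expertise", "Exp. in Soft. Dev.", "Exp. in Cloud",
           "Design Quality of Mig. Tasks", "Choice of Cloud"]

# Mean rating per factor across the sampled respondents.
means = {f: sum(r[i] for r in responses) / len(responses)
         for i, f in enumerate(factors)}
print(means["Dev. Expertise"])  # 4.0 = (5 + 3 + 4) / 3
```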
